The Retrieval of Forest and Grass Fractional Vegetation Coverage in Mountain Regions Based on Spatio-Temporal Transfer Learning

Abstract: The vegetation cover of forests and grasslands in mountain regions plays a crucial role in regulating climate at both regional and global scales. Thus, it is necessary to develop accurate methods for estimating and monitoring fractional vegetation cover (FVC) in mountain areas. However, complex topographic and climatic factors pose significant challenges to accurately estimating the FVC of mountain forests and grasslands. Existing remote sensing products, FVC retrieval methods, and FVC samples may fail to meet the required accuracy standards. In this study, we propose a method based on spatio-temporal transfer learning for the retrieval of FVC in mountain forests and grasslands, using the mountain region of Huzhu County, Qinghai Province, as the study area. The method combines simulated FVC samples, Sentinel-2 images


Introduction
Mountains cover approximately 30% of the Earth's surface. Forest and grassland, as important components of mountain ecosystems, provide critical resources for human survival and development and prevent natural disasters such as landslides and debris flows. Additionally, they play a crucial role in maintaining regional and global ecological balance and regulating climate change [1][2][3]. Thus, monitoring and protecting forest and grass vegetation cover in mountain areas is essential. Fractional Vegetation Cover (FVC) is defined as the vertical projection of the areal proportion of a landscape occupied by green vegetation [4]. It characterizes vegetation coverage and is often used to quantify the dynamic changes of vegetation on a regional or global scale [5]. With global climate change and the intensification of human activities, forest and grass vegetation coverage in the mountains, particularly in plateau areas with relatively fragile ecological environments, has been damaged. Therefore, the dynamic monitoring and protection of FVC in mountain areas is a critical and long-term research topic that requires sustained attention.
Remote sensing Earth observation provides key technology for large-scale, multitemporal FVC retrieval and monitoring at regional and global scales, and it is also applicable in mountain areas. At present, remote sensing-based FVC products, including GEOV2 FVC [6], GEOV3 FVC [7], and GLASS FVC [8], have been extensively utilized for global FVC monitoring and have achieved positive results [9][10][11]. However, these products typically have coarse spatial resolutions ranging from hundreds of meters to kilometers [12], and higher-resolution FVC products are relatively scarce. The high diversity of forest and grass types, frequent cloud cover, and rugged topography in mountain areas make it challenging for existing FVC products to capture the fine-scale heterogeneity of mountain topography, which results in increased uncertainties in FVC estimation [13][14][15][16]. Therefore, to ensure the accuracy of FVC estimation in mountain regions, it is necessary to adopt higher-resolution remote sensing images and consider topographic factors [17,18].
FVC estimation methods include physically based models, linear spectral mixture analysis (LSMA) based on vegetation indices (VIs), and machine learning (ML). The physical approach mainly establishes a look-up table (LUT) based on PROSAIL [19], a vegetation radiative transfer (RT) model, to estimate the FVC. However, physical methods require more input parameters and prior knowledge of the vegetation canopy. The complex topography and lack of prior knowledge of vegetation in mountain areas increase the difficulty of FVC estimation. The LSMA model [20,21], known as the pixel dichotomy model, is the simplest FVC estimation method and is commonly used with medium-resolution or high-resolution images. Nevertheless, the normalized difference vegetation index (NDVI), the variable of the pixel dichotomy model, usually exhibits "saturation" in mountain areas with dense vegetation canopy cover, resulting in an overestimation of FVC [22,23]. The ML [24] method establishes an FVC estimation model by training on a large number of high-precision samples composed of FVC values and spectral reflectance or VIs. This method has been widely applied due to its excellent ability to handle nonlinear problems and its reliable results. Typically, high-precision FVC samples come from drone images acquired in the field or satellite images with a fine pixel scale (resolution ≤ 1 m) [25,26]. However, mountain areas are easily affected by cloud and fog cover, and the topographic effect causes large differences in the spectral reflectance of slope vegetation, so high-precision FVC samples are scarce and difficult to obtain. In addition, traditional ML algorithms are only applicable to vegetation bands or VIs in specific areas. For images with different resolutions and temporal phases, the dataset must be rebuilt and the model retrained, resulting in poor transferability.
Transfer learning (TL) is used to improve a learner in one domain by transferring information from a related domain [27][28][29][30]. In the quantitative retrieval of vegetation parameters, TL achieves model transfer by applying a model pre-trained on high-precision samples from one region to a new region or a new remote sensing data source. Research has reported that TL models are more robust than traditional ML models in crop FVC and yield estimation [31][32][33]. Astola [34] used a DNN-based TL model to predict structure variables of several forests, revealing that TL can solve the problem of model performance degradation caused by insufficient reference data in the field. When high-precision samples are lacking, TL can be realized by pre-training on simulated samples. The RT model can provide sufficient training samples for the TL model by generating simulated vegetation parameters. Yu [35] estimated the FVC of winter wheat in Sentinel-2 images (RMSE = 0.06) by using an LSTM model pre-trained on simulated FVC samples generated with PROSAIL.
The TL model trained with simulated FVC holds great potential in predicting spatio-temporal variations of FVC in complex high-altitude mountain regions. Estimating FVC in mountain regions requires the consideration of not only vegetation spectra and VIs but also topographic gradients, incorporating slope and aspect. At present, the RT models employed in generating vegetation parameter samples seldom account for the effects of topographic undulation, leading to limited sample accuracy in complex mountain regions. The accuracy of these samples can also impact the model's precision in predicting FVC spatio-temporal variations.
To address the scarcity of FVC training samples in mountain areas and the difficulty of obtaining them, this paper proposes a method for estimating forest and grassland FVC distribution in mountain areas based on spatio-temporal TL, considering topographic factors. The flowchart of the proposed approach is shown in Figure 1: (1) We established a sample classification system based on the characteristics of mountain areas, considering the type of surface features and the characteristics of altitude gradients. (2) Based on the high-resolution remote sensing data, we established the FVC samples by using the PROSAIL model. (3) We used the FVC samples to train 1D-CNN and LSTM models. (4) We fine-tuned the pre-trained models and obtained the distribution of FVC.

Figure 1. Flowchart of the proposed approach. Labels recoverable from the figure: Data Pre-processing and FVC Samples Generation; Training Samples Groups (Vegetation Type, Altitudes); Model Training and Transfer Learning; Regional scale FVC distribution; Model performance comparison.

Study Area
The study area (Figure 2) is located in the mountain area (101°48′~102°42′E, 36°29′~37°11′N) of Huzhu Tu Autonomous County, Qinghai Province, China. It covers 3321 km², with an elevation ranging from 2100 m to 4360 m. The southwest part of Huzhu County is dominated by a basin with cropland and urban areas, while the northeast part is a mountain area covered by grassland and forest. The region has a continental cold temperate climate, with an average annual temperature of 5.8 °C and an average annual precipitation of 447 mm, and its mountain vegetation shows obvious seasonal changes.



Data Sources

Remote Sensing Image Data and Preprocessing
In this study, multispectral time series remote sensing images from Sentinel-2 (10 m resolution) and HJ-2A/B (16 m resolution) were selected. Sentinel-2 imagery was used to construct the TL sample dataset in the study area. To reflect the seasonal variation of FVC in the study area, Sentinel-2 data with cloud cover of less than 20% from May to October during 2019-2022 were selected. The Sentinel-2 data were sourced from the Copernicus Open Access Hub, ESA (https://scihub.copernicus.eu/dhus/#/home, accessed on 1 October 2023). HJ-2A/B data served as the target images for FVC retrieval and were obtained from the China Centre for Resources Satellite Data and Application (https://data.cresda.cn/#/2dMap, accessed on 1 October 2023). The digital elevation model (DEM) of the study area was obtained from the Shuttle Radar Topography Mission (SRTM) with a 30 m spatial resolution and was used to calculate slope and aspect.
Preprocessing operations, including radiometric calibration, atmospheric correction, and geometric correction, were conducted to obtain a reflectance dataset of the satellite images. We used the monthly maximum synthesis method [36] to remove cloud and shadow interference and obtained 24 phases of Sentinel-2 and 6 phases of HJ-2A/B cloud-free time series images. The DEM was resampled to the resolutions of the Sentinel-2 and HJ-2A/B images, respectively, and topographic slope and aspect were calculated.
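As an illustration of the last preprocessing step, the following is a minimal sketch of deriving slope and aspect from a gridded DEM with central differences. It is not the processing chain used in this study. The function name `slope_aspect`, the north-up grid layout (row 0 is the northern edge), and the convention that aspect is the downslope direction measured clockwise from north are all assumptions made for the example.

```python
import math


def slope_aspect(dem, cell_size):
    """Return (slope, aspect) grids in degrees for the interior cells of a DEM.

    dem       : list of rows of elevations in metres; row 0 is assumed to be north
    cell_size : ground size of one square cell in metres
    Border cells and flat cells keep None where the value is undefined.
    """
    rows, cols = len(dem), len(dem[0])
    slope = [[None] * cols for _ in range(rows)]
    aspect = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Elevation change per metre towards the east and towards the north
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2.0 * cell_size)
            dz_dy = (dem[r - 1][c] - dem[r + 1][c]) / (2.0 * cell_size)
            slope[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
            if dz_dx != 0 or dz_dy != 0:
                # Azimuth of the steepest descent, clockwise from north
                aspect[r][c] = math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360.0
    return slope, aspect
```

With a plane that rises by one cell size per cell towards the east, the centre cell has a slope of 45 degrees and faces west (aspect 270 degrees).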

Site FVC
Site FVC for validation was measured in the vegetated mountain area of Huzhu County from 10 to 12 September 2022. At all measurement points, FVC was recorded by analyzing digital photographs and by visual estimation, and a total of 43 ground measurement points were obtained. The measured FVC is the ratio of green pixels to all pixels inside the white rectangular frame in the digital photographs taken on site (Figure 3b). The green and non-green pixels in the digital photographs were extracted with Marcial-Pablo's method [37], and the measured FVC was calculated as follows:
$$\mathrm{FVC}_{\mathrm{measured}} = \frac{\mathrm{pixel}_{\mathrm{green}}}{\mathrm{pixel}_{\mathrm{total}}}$$
where $\mathrm{FVC}_{\mathrm{measured}}$ is the measured FVC, and $\mathrm{pixel}_{\mathrm{green}}$ and $\mathrm{pixel}_{\mathrm{total}}$ are the number of green pixels and the total number of pixels in the digital photograph, respectively.
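The green-pixel ratio can be sketched in a few lines. The example below is only a stand-in: it separates green from non-green pixels with a fixed threshold on the excess green index computed from chromatic coordinates, which is an assumption and not necessarily the rule used in Marcial-Pablo's method [37]. The function names and the threshold value of 0.1 are hypothetical.

```python
def excess_green(r, g, b):
    """Excess green index 2g - r - b on chromatic (sum-normalised) coordinates."""
    total = r + g + b
    if total == 0:
        return 0.0  # a black pixel carries no colour information
    return (2.0 * g - r - b) / total


def measured_fvc(pixels, threshold=0.1):
    """Fraction of green pixels among the (R, G, B) pixels inside the frame.

    threshold is a hypothetical cut-off; in practice it would be tuned or
    derived automatically for each photograph.
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("no pixels inside the frame")
    green = sum(1 for r, g, b in pixels if excess_green(r, g, b) > threshold)
    return green / len(pixels)
```

For a frame with three clearly green pixels and one brownish soil pixel, the function returns 0.75.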


Vegetation Cover Classification
In this study, the Global Land Cover with Fine Classification System at 30 m in 2020 [38] was utilized as a reference dataset for land cover classification. The experimental area includes various land cover types, such as grassland, forest, cropland, water bodies, and urban areas, as illustrated in Figure 2c. For this study, we focused on two types of vegetation, namely forest and grassland. Based on the distribution of vegetation at different elevations, we divided the mountain area into 9 regions by using an elevation gradient of 500 m for forest and grassland classification.
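The grouping of samples by vegetation type and 500 m elevation band can be sketched as follows. The band edges (aligned to multiples of 500 m) and the label format are assumptions made for illustration, since the exact boundaries of the 9 regions are not listed here.

```python
def elevation_band(elevation_m, step=500):
    """Return the (lower, upper) edges of the band containing elevation_m.

    Bands are assumed to start at multiples of `step` metres.
    """
    lower = int(elevation_m // step) * step
    return lower, lower + step


def sample_group(land_cover, elevation_m, step=500):
    """Label a sample by vegetation type and elevation band.

    Only forest and grassland are grouped; other land cover types return None.
    """
    if land_cover not in ("forest", "grassland"):
        return None
    lower, upper = elevation_band(elevation_m, step)
    return f"{land_cover}_{lower}-{upper}"
```

For example, a forest pixel at 3120 m falls into the group `forest_3000-3500`, while a cropland pixel is excluded.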

FVC Training Samples of Mountain Area
Due to topographic and climatic conditions in plateau mountain areas [39], it is difficult to obtain vegetation coverage samples in the field, and the existing vegetation coverage samples are neither accurate enough nor sufficiently representative of mountain characteristics. This paper therefore uses PROSAIL simulation data to establish highly accurate vegetation coverage training samples [40][41][42]. By inputting the physical and chemical parameters of the leaves, structural parameters, and parameters such as light and soil (Table 1), the spectral reflectance of the vegetation canopy in the range of 400-2500 nm was simulated. Since there is a functional relationship between vegetation coverage and LAI, the simulated vegetation coverage can be obtained from the LAI input to the PROSAIL model, and the calculation method is as follows:
$$\mathrm{FVC} = 1 - P_0(0), \qquad P_0(\theta) = \exp\left(-\frac{\lambda_0\, G(\theta, \theta_1)\, \mathrm{LAI}}{\cos\theta}\right)$$
where $P_0(\theta)$ is the gap fraction function, $\theta$ is the sun zenith angle, $\theta_1$ is the average leaf angle, and $G(\theta, \theta_1)$ is a function of the solar zenith angle and mean leaf inclination, expressed as the orthographic projection of a unit leaf area. $\lambda_0$ is the leaf dispersion or clumping parameter. In this study, the canopy leaves are considered to be randomly distributed, that is, $\lambda_0 = 1$.
The simulated spectra and FVC samples were first obtained through the PROSAIL model. The relationship between FVC and four bands, namely the three visible bands (red, green, blue) and the near-infrared band, was then established through an artificial neural network, which serves as a generator of high-precision FVC samples. Finally, FVC training labels based on the Sentinel-2 multispectral data were generated by using the sample generator.
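A minimal sketch of the LAI-to-FVC relationship is given below. It assumes the commonly used Beer-Lambert form of the gap fraction, exp(−λ0 · G · LAI / cos θ), evaluated at nadir, and it replaces the angle-dependent projection function with a constant G = 0.5, the value for a spherical leaf angle distribution. Both simplifications and the function names are assumptions for illustration, not the PROSAIL implementation.

```python
import math


def gap_fraction(lai, theta_deg=0.0, g=0.5, clumping=1.0):
    """Gap fraction P0(theta) under a Beer-Lambert canopy model.

    g        : leaf projection function, held constant here (assumption)
    clumping : leaf dispersion parameter; 1.0 means randomly distributed leaves
    """
    theta = math.radians(theta_deg)
    return math.exp(-clumping * g * lai / math.cos(theta))


def fvc_from_lai(lai, g=0.5, clumping=1.0):
    """FVC as the complement of the nadir gap fraction, 1 - P0(0)."""
    return 1.0 - gap_fraction(lai, 0.0, g, clumping)
```

Under these assumptions an LAI of 0 gives an FVC of 0, and an LAI of 2 gives 1 − e⁻¹, roughly 0.63. The curve flattens as LAI grows, which is why dense canopies differ little in FVC.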

Features Extraction
In this study, several vegetation indices (VIs) representing vegetation growth, together with the topographic parameters of the sample points, were used as feature variables. The VIs include the Normalized Difference Vegetation Index (NDVI) [43], Green Normalized Difference Vegetation Index (GNDVI) [44], Enhanced Vegetation Index (EVI) [45], Difference Vegetation Index (DVI) [46], and Modified Soil-Adjusted Vegetation Index (MSAVI) [47], which were calculated from the Sentinel-2 spectra. Topographic features included the slope and aspect of the mountains in the study area.
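For reference, the five indices can be computed from surface reflectance as in the sketch below. It uses the widely published formulas with the standard EVI coefficients. The function name and the assumption that inputs are reflectances scaled to 0-1 are choices made for the example, and the exact coefficients used in this study are not confirmed by the text.

```python
import math


def vegetation_indices(blue, green, red, nir):
    """Compute NDVI, GNDVI, EVI, DVI and MSAVI from reflectances in [0, 1].

    The caller is expected to mask pixels where a denominator would be zero
    (for example, nir + red == 0 over no-data areas).
    """
    ndvi = (nir - red) / (nir + red)
    gndvi = (nir - green) / (nir + green)
    evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
    dvi = nir - red
    msavi = (2.0 * nir + 1.0
             - math.sqrt((2.0 * nir + 1.0) ** 2 - 8.0 * (nir - red))) / 2.0
    return {"NDVI": ndvi, "GNDVI": gndvi, "EVI": evi, "DVI": dvi, "MSAVI": msavi}
```

For a vegetated pixel with blue = 0.03, green = 0.06, red = 0.04 and NIR = 0.40, this gives an NDVI of about 0.82, a DVI of 0.36 and an MSAVI of 0.60.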

Deep Transfer Learning Method
Deep transfer learning is a method of fine-tuning pre-trained weights by establishing a neural network model. In this study, a one-dimensional convolutional neural network (1D-CNN) and a Long Short-Term Memory (LSTM) network were used to fine-tune the pre-trained weights to obtain an FVC retrieval model suitable for remote sensing time series images and to realize the temporal and spatial migration of the model.

1D-CNN Neural Network Model
The 1D-CNN [48,49] is a convolutional neural network variant that is primarily utilized for one-dimensional sequence data, such as audio and text, among others. It has also been applied in medicine for patient ECG classification [50]. Compared to traditional fully connected neural networks, 1D-CNN can more effectively handle the local relationships in sequence data, thereby yielding superior performance in processing remote sensing time series data. This neural network is composed of five parts: input layer, convolution layer, pooling layer, fully connected layer, and output layer. The convolution layer carries out a convolution operation on the input data by sliding a fixed-size convolution kernel, which extracts the features within the kernel window and maps them to the next layer. The 1D-CNN network can reduce the dimensionality and computation of the feature map by placing a pooling layer after the convolutional layer. Finally, a fully connected layer is employed to map the extracted features to the output layer.
Figure 4a depicts the 1D-CNN structure, where the hidden layer is composed of four convolutional layers with 64, 128, 256, and 64 neurons. A dropout layer is added before the fully connected layer of the model to randomly remove a small number of neuron nodes and prevent overfitting. The time series data of VIs and topographic factors serve as input for the model, and the predicted value of FVC is the output.
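The two core operations described above, sliding a fixed-size kernel along a sequence and then pooling, can be illustrated in plain Python. This is a single-channel, single-filter sketch with hand-picked kernel values, not the four-layer network of Figure 4a. The ReLU activation, the unpadded (valid) convolution, and the function names are assumptions.

```python
def conv1d(sequence, kernel, bias=0.0):
    """Valid (unpadded) 1-D convolution of a sequence, followed by ReLU.

    As in most deep learning libraries, the kernel is applied without
    flipping (cross-correlation).
    """
    k = len(kernel)
    out = []
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        value = sum(w * x for w, x in zip(kernel, window)) + bias
        out.append(max(0.0, value))  # ReLU keeps only positive responses
    return out


def max_pool1d(sequence, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(sequence[i:i + size])
            for i in range(0, len(sequence) - size + 1, size)]
```

Applied to a monthly NDVI series such as `[0.2, 0.5, 0.8, 0.8, 0.6, 0.3]` with the kernel `[-1, 0, 1]`, the convolution responds only during green-up, and pooling halves the length of the resulting feature map.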


LSTM Neural Network Model
The LSTM neural network [51] is a commonly used neural network model for processing remote sensing time series image data. Time series correlation can be used to reflect the temporal characteristics of the forecasting targets. In the LSTM network structure, each storage unit includes forget gate, input gate, and output gate control mechanisms [52,53], which can effectively avoid the vanishing gradient problem ("gradient disappearance") when training on time series data [54].
The data source includes VIs with time series features and slope and aspect with non-time series features. Thus, the training features are divided into time series data and non-time series data, and the LSTM network of Figure 4b is constructed with reference to the crop prediction model of Cao [55]. The FVC prediction model includes an input layer, an LSTM layer, and a dense layer. The LSTM layer consists of 5 layers of LSTM units. These layers of LSTM units contain 64, 128, 128, and 64 storage units (cells). First, the time series features are extracted by the LSTM; they are then fed into the fully connected layer together with the topographic features (slope and aspect), and finally, the predicted FVC value is obtained.
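The gate mechanism and the late fusion of static topographic features can be illustrated with a scalar LSTM cell. This sketch uses one hidden unit and one VI per time step, whereas the network in Figure 4b is much larger. The parameter names, the sigmoid output, and the expectation that slope and aspect are already scaled to a small range are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell with parameters in dict p."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate state
    c = f * c_prev + i * g       # keep part of the old memory, add new content
    h = o * math.tanh(c)         # expose part of the memory as hidden state
    return h, c


def predict_fvc(vi_series, slope, aspect, lstm_params, head):
    """Run the LSTM over the VI series, then fuse with static terrain features."""
    h, c = 0.0, 0.0
    for x in vi_series:
        h, c = lstm_step(x, h, c, lstm_params)
    z = (head["w_h"] * h + head["w_slope"] * slope
         + head["w_aspect"] * aspect + head["b"])
    return sigmoid(z)            # squashes the estimate into (0, 1)
```

Only the VI series passes through the recurrent part. Slope and aspect do not change over time, so they enter at the dense layer, mirroring the split between time series and non-time series features described above.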

Pre-Training Samples and Valid Samples
The high-precision pre-training samples used in the model include over 150,000 randomly selected pixels from remote sensing images.
To achieve TL of the model on HJ-2A/B images, we initialized the LSTM and 1D-CNN models with the neural network weights pre-trained on the FVC samples. Before inputting the feature variables of the HJ-2A/B images, the fully connected layer of the pre-trained model was replaced with a new one. The front layers of the network were frozen, and the fully connected layer was retrained with the features of the target image to accomplish fine-tuning.
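The freeze-and-retrain step can be illustrated with a toy model. In the sketch below, a fixed feature extractor stands in for the frozen front layers, and a new linear head is fitted on target-domain samples by gradient descent on the mean squared error. The weights, learning rate, and function names are hypothetical, and the real models are the LSTM and 1D-CNN networks described above.

```python
import math

# Hypothetical "pre-trained" front-layer weights; never updated during fine-tuning
FROZEN_WEIGHTS = (1.5, -0.5)


def frozen_features(x):
    """Front layers of the network, kept fixed while the head is retrained."""
    return [math.tanh(w * x) for w in FROZEN_WEIGHTS]


def predict(x, w, b):
    return sum(wi * fi for wi, fi in zip(w, frozen_features(x))) + b


def mse(samples, w, b):
    return sum((predict(x, w, b) - y) ** 2 for x, y in samples) / len(samples)


def fine_tune_head(samples, epochs=500, lr=0.1):
    """Fit a freshly initialised linear head on (input, target FVC) pairs."""
    w = [0.0] * len(FROZEN_WEIGHTS)
    b = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in samples:
            feats = frozen_features(x)
            err = predict(x, w, b) - y
            for j, fj in enumerate(feats):
                grad_w[j] += 2.0 * err * fj / n
            grad_b += 2.0 * err / n
        # Only the head parameters move; the front layers are untouched
        w = [wi - lr * gj for wi, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b
```

Because only the small head is trained, relatively few target-domain samples are needed, which is the practical motivation for freezing the front layers.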

Validation
To evaluate the accuracy of the simulated FVC and the deep transfer learning FVC retrieval model, we used the coefficient of determination ($R^2$) and the root mean square error (RMSE). They were calculated as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
where $y_i$ and $\hat{y}_i$ are the observed FVC values and the FVC values predicted with the FVC retrieval models, $\bar{y}$ is the mean FVC value on the remote sensing image, and $n$ is the total number of samples.
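Both metrics are straightforward to compute. The sketch below assumes the residual-based definition of the coefficient of determination (one minus the ratio of the residual to the total sum of squares); the function names are illustrative.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted FVC."""
    n = len(observed)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot
```

For observed values `[0.2, 0.4, 0.6, 0.8]` and predictions `[0.25, 0.35, 0.65, 0.75]`, every residual is 0.05, so the RMSE is 0.05 and the coefficient of determination is 0.95.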

Feature Importance
Feature importance serves as the criterion for analyzing how each feature affects the FVC estimates. SHAP (Shapley Additive Explanations) is a method for explaining the prediction results of machine learning models. It is based on the concept of Shapley values in game theory and explains the predictions of the model by calculating the contribution of each feature to the prediction results [56]. The core idea of the SHAP method is to decompose a model prediction into per-feature contributions, where each contribution is a weighted average of that feature's marginal contributions over combinations of the other features, so that the contributions together with a baseline value add up to the final prediction [57].
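The decomposition can be made concrete with an exact Shapley computation for a small model. The sketch below averages marginal contributions over all feature orderings and replaces "absent" features with a single baseline vector. This is a simplification of what SHAP libraries do with background data, and it is only feasible for a handful of features. The toy model and feature values in the usage note are hypothetical.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    model    : callable taking a list of feature values
    x        : feature values of the sample to explain
    baseline : reference values used for features that are "absent"
    """
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        previous = model(current)
        for j in order:
            current[j] = x[j]              # switch feature j on
            value = model(current)
            phi[j] += value - previous     # marginal contribution of j
            previous = value
    return [p / len(orders) for p in phi]
```

For a toy model `0.6 * f[0] + 0.1 * f[1] + 0.2 * f[0] * f[2]` evaluated at `[0.8, 0.3, 0.5]` with an all-zero baseline, the contributions are 0.52, 0.03 and 0.04. They sum to 0.59, the difference between the prediction and the baseline output, and the interaction term is split equally between the two features involved.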

Result of FVC Retrieval
Figure 5A,B, respectively, show the spatio-temporal distribution of forest and grassland FVC in the mountain areas of Huzhu County for six periods in 2022, using the TL method based on LSTM and 1D-CNN networks. The FVC in the mountain areas of the study region presents distinct seasonal variations from May to October, with FVC gradually decreasing with increasing altitude. Comparing the vegetation cover extraction results of the LSTM and 1D-CNN models, the FVC estimates for June, July, and August are very similar. However, the FVC retrieval results of the two models show significant differences during periods of sparse vegetation in May and October.
Remote Sens. 2023, 15, x FOR PEER REVIEW
Figure 6 shows the temporal trend of the mean FVC for the reference FVC and for the retrievals of the LSTM and 1D-CNN models. The FVC retrieval results of both models show a trend consistent with the variation of the reference FVC, with the LSTM predictions being closer to the reference FVC.
Table 3 presents the FVC prediction accuracy of the LSTM and 1D-CNN models in grassland and forest areas, obtained by randomly selecting 9590 points in the mountain region of the study area for prediction. The mean R² and RMSE values show that the performance of the LSTM model (grassland: R² = 0.9108, RMSE = 0.0714; forest area: R² = 0.8809, RMSE = 0.0581) is better than that of the 1D-CNN model (grassland: R² = 0.8722, RMSE = 0.0806; forest area: R² = 0.847, RMSE = 0.065). Figure 7 shows the scatter plots of the reference FVC vs. predicted FVC based on the LSTM model (Figure 7a) and the 1D-CNN model (Figure 7b) from May to October 2022. The FVC retrieval accuracy for the June and July images is higher, and the accuracy difference between the two models is small. The FVC retrieval accuracy for the May image is the lowest, and the RMSE difference between the two models is approximately 0.03.

Importance Ranking of Features on FVC Retrieval
The importance ranking of features across elevation gradients and months indicated that VIs and topographic factors had different impacts on FVC retrieval (Figure 8).
Under different altitude gradients, the four most important features in grassland are NDVI, aspect, GNDVI, and slope (Figure 8a), while in forest areas, the four most important features are aspect, NDVI, GNDVI, and slope (Figure 8b). MSAVI, EVI, and DVI are ranked as the least important features in both vegetation types. In grassland, NDVI and GNDVI exhibit a similar trend in importance, significantly influencing FVC prediction results. However, their importance gradually diminishes above 3000 m. The importance of slope increases with elevation, while the importance of aspect first decreases and then rises with increasing elevation.
Regarding the forest area, the importance of each feature showed a similar trend with the increase of the altitude gradient. Among the VIs, NDVI and GNDVI are the most important, and their importance is greatest at an altitude of 3000-3500 m. Concerning topographic factors, the aspect has the most significant impact on vegetation coverage. The slope is of little importance below an altitude of 3000 m, but it has a significant impact on FVC extraction in high-altitude areas above 3000 m.
Figure 8c,d shows the importance of topographic factors and VIs for the FVC retrieval results in each month of 2022, which reveals that the importance of the feature variables also follows a seasonal pattern like the vegetation growth cycle. The change in the importance of VIs broadly follows vegetation growth within a year, while the importance of topographic factors shows the opposite trend, which is more pronounced in forest areas than in grassland.

Model Performance
The method, which uses PROSAIL and other RT models to establish the correspondence between Earth surface vegetation parameters and spectral features, compensates for the lack of high-precision samples in ML models, and it has begun to attract widespread attention in the quantitative retrieval of vegetation parameters [58]. In this paper, the FVC samples generated by the coupling model based on PROSAIL are validated with ground data, and their validation accuracy is compared with the accuracy of the high-resolution FVC samples generated by the dichotomy method.
Figure 9 displays scatter plots comparing the accuracy of FVC samples generated through the proposed approach and the dichotomy model. In Figure 9a, the fitting line between the FVC samples generated by the proposed approach and the site FVC is closer to the 1:1 line (R² = 0.7536, RMSE = 0.0596, Bias = −0.0308). This indicates that the accuracy of the proposed approach is better than that of the dichotomy method (R² = 0.7536, RMSE = 0.0596, Bias = −0.0308), as shown in Figure 9b. These results demonstrate that the samples produced by the PROSAIL-based sample generator are sufficiently accurate to provide high-precision training samples for the model in this paper.
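The three validation statistics reported above can be computed as in the following minimal sketch. It assumes that R² is the squared Pearson correlation of the linear fit between site and generated FVC and that Bias is the mean of generated minus site FVC; the section does not state these conventions, and the function name and sample values are hypothetical.

```python
import math

def validation_metrics(observed, predicted):
    """Return (R^2, RMSE, bias) for paired site and generated FVC values.

    Assumptions: R^2 is the squared Pearson correlation, and bias is
    mean(predicted - observed), so a negative bias means underestimation.
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r2 = cov ** 2 / (var_o * var_p)
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)
    bias = sum(p - o for o, p in zip(observed, predicted)) / n
    return r2, rmse, bias

# Hypothetical site FVC and a generated sample with a constant offset.
site_fvc = [0.20, 0.40, 0.60, 0.80]
sample_fvc = [v - 0.03 for v in site_fvc]
r2, rmse, bias = validation_metrics(site_fvc, sample_fvc)
```

Note that a constant offset leaves R² at 1 while RMSE and Bias are nonzero, which is why the three statistics are reported together.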
By utilizing numerous simulated FVC samples and considering the mountain VIs and topographic factors such as slope and aspect, pre-trained LSTM and 1DCNN networks were used to create an FVC estimation model. This model was applied to multi-temporal HJ-2A/B remote sensing data. An analysis of the estimation results for forest and grass coverage showed that both the LSTM and 1DCNN methods achieved high accuracy in FVC estimation for mountain areas. The model performed best from June to August, when vegetation growth was at its peak. The complex structure of the LSTM network allowed it to extract the interdependence between different time steps of the time series data, giving it an advantage over the 1DCNN method in estimation accuracy. However, the 1DCNN method uses convolution and pooling operations that reduce training time compared to the LSTM method.
The model displayed the lowest RMSE in May; however, some sample points with low FVC are significantly overestimated. The image shows that snow covers a significant portion of the area where the elevation exceeds 3500 m. This snow cover negatively impacts the distribution of sample points, which in turn affects the model's ability to extract features in the snow-covered region and causes a deviation in the estimated FVC. To enhance the model's performance in future studies, it would be beneficial to incorporate additional variables such as temperature and precipitation.

Influence of Topographic Features on FVC Retrieval
The vegetation in high-altitude mountain regions is sensitive to various factors such as temperature, precipitation, soil, and illumination [59][60][61]. These factors are distributed differently across regions, and their distribution is indirectly affected by topographical features. The spatio-temporal statistical analysis of forest and grassland was carried out according to the classification standard in Table 4. The results showed that the importance of topographical features varies with altitude and season in mountain regions. Moreover, the contributions of these features to the estimation of FVC for forest and grassland are not the same.
In general, slope significantly impacts the distribution of soil moisture and nutrients [62]. Figure 10a,b illustrate the distribution ratio of forest and grassland samples across various slope gradients. In regions below an altitude of 3000 m, the topography is generally characterized by gentle slopes with minimal undulations, resulting in effective water and nutrient retention in the soil. The slope gradient has a negligible effect on the FVC of forest and grassland in such areas. Shen [63] and De Castilho [64] also reported that topography has little influence on vegetation parameters and biomass estimation results in regions with low topographic relief. Above 3000 m, the topographic undulation becomes more prominent with increasing altitude, resulting in reduced soil and water conservation capacity and a significant increase in the importance of slope gradient for the FVC of forests and grassland [61]. Furthermore, the impact of slope gradient on grassland is typically greater than that on forest land. This is attributed to the shallower root systems of grassland, which make it less reliant on soil and, hence, more susceptible to slope gradients.
Across the altitude gradients, aspect plays a significant role in determining forest and grass coverage. Aspect can influence the distribution and duration of solar radiation in mountain regions, leading to differences in FVC between sun and shadow slopes [62,65]. Figure 11 displays the average FVC distribution for eight orientations of forest and grassland between May and October 2022. At elevations below 3000 m, the vegetation cover on the shadow slopes (NE, NW) of the forested area is slightly higher than that on the sun slopes (SE, SW) (Figure 11b). Since the study area is located in the semi-arid area of the northern hemisphere, relatively high temperatures and dry microclimates tend to form on sun slopes [66,67], which promotes vegetation transpiration and soil water evaporation and inhibits the growth of vegetation on sunny slopes. However, the distribution of FVC on grassland slopes is contrary to that of forested slopes (Figure 11a). This is due to the dominance of forests as the vegetation cover type below 3000 m in the mountain area; the forest shades the low grassland and captures more sunlight than the grassland [68,69], thus inhibiting the vegetation cover of grassland. At elevations above 3000 m, the decline in temperature causes both forests and grasses to develop more on the sun slopes. With increasing elevation, alpine meadow gradually becomes the main vegetation cover type, which is less impacted by forest, and the tendency to grow on the sun slopes becomes more prominent.
Moreover, vegetation growth can impact the importance of topographic features. Figures 10c,d and 12 show highly similar trends in the variation of FVC and VI. The VI serves as a responsive indicator of climate change [5,70]. During the growing season (May to August), the temperature and precipitation in mountain regions are abundant, resulting in luxuriant vegetation. The VI has a direct influence on vegetation coverage, thus diminishing the importance of topographic factors. However, in September, the seasonal effect causes vegetation to become sparse, and it tends to thrive on sunlit slopes with adequate warmth and moisture [71], thereby gradually increasing the importance of topographic factors.
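The per-orientation averaging behind the aspect analysis (mean FVC for eight orientations) can be sketched as follows. This is a minimal illustration that assumes aspect is given in degrees clockwise from north and that each orientation is a 45° sector centred on its compass direction; the actual class limits are defined in Table 4, which is not reproduced here, and the sample values are hypothetical.

```python
from collections import defaultdict

ORIENTATIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

def orientation(aspect_deg):
    """Map an aspect angle (degrees clockwise from north) to one of eight
    45-degree sectors centred on the compass directions (an assumption)."""
    index = int(((aspect_deg % 360.0) + 22.5) // 45.0) % 8
    return ORIENTATIONS[index]

def mean_fvc_by_orientation(samples):
    """Average FVC per orientation for an iterable of (aspect_deg, fvc) pairs."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for aspect_deg, fvc in samples:
        key = orientation(aspect_deg)
        sums[key] += fvc
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Hypothetical sample points: (aspect in degrees, FVC).
points = [(10.0, 0.5), (350.0, 0.7), (180.0, 0.2), (135.0, 0.4)]
mean_fvc = mean_fvc_by_orientation(points)
```

Flat cells, which many DEM tools encode with a negative aspect value, would need to be filtered out before calling `orientation`, because the modulo operation would otherwise assign them to a compass sector.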

Influence of Pre-Training Sample Size
The pre-training of TL typically requires a significant amount of data, which directly affects the accuracy of quantitative retrieval of vegetation parameters. Although a large volume of training data can improve estimation accuracy, excessive data can result in issues such as information redundancy and prolonged training time, which can compromise both the accuracy and efficiency of the model [35]. To address these problems, Sentinel-2 time series datasets with varying data volumes (10%, 20%, 30%, …, and 100%), drawn by random sampling, were used as input for pre-training the LSTM and 1D-CNN models. Subsequently, the model network weight parameters were fine-tuned using HJ-2A/B feature variables and corresponding FVC labels.
Figure 13 illustrates the trend of the RMSE of the FVC prediction results with varying pre-training sample sizes. Table 5 shows the FVC estimation accuracy of the LSTM and 1DCNN models in all vegetation, grassland, and forest areas under different pre-training sample sizes. The RMSE of FVC prediction for HJ-2A/B images in all phases was averaged for the study area. The FVC prediction accuracy improved as the pre-training sample size increased, with a sufficient sample size allowing for more comprehensive model training. Importantly, when the pre-training sample size exceeded 60% of the total sample size (approximately 90,000 sample points), the RMSE of FVC prediction no longer exhibited significant changes and gradually stabilized, indicating that the model had been fully trained.
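The sample-size experiment described above can be sketched in two steps: drawing random subsets at 10% increments, then locating the percentage beyond which the RMSE stops changing appreciably. The code below is a hedged illustration rather than the authors' procedure. It assumes nested subsets taken from a single shuffle and a user-chosen tolerance `tol` for "no significant change"; both function names are hypothetical.

```python
import random

def pretraining_subsets(samples, percents=range(10, 101, 10), seed=0):
    """Return {percent: subset} using nested random subsets of one shuffle.

    Nesting (each larger subset contains the smaller ones) is an assumption;
    the text only states that random sampling was used.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return {p: shuffled[: len(shuffled) * p // 100] for p in percents}

def first_stable_percent(rmse_by_percent, tol):
    """Smallest percentage from which all successive RMSE changes stay below tol."""
    percents = sorted(rmse_by_percent)
    for i, percent in enumerate(percents):
        tail = percents[i:]
        if all(abs(rmse_by_percent[a] - rmse_by_percent[b]) < tol
               for a, b in zip(tail, tail[1:])):
            return percent
    return None  # only reached for an empty input

# Hypothetical sample identifiers standing in for the pre-training set.
subsets = pretraining_subsets(range(1000))
print({p: len(s) for p, s in subsets.items()})
```

In practice, each subset would be used to pre-train a model, the fine-tuned model's mean RMSE would be recorded per percentage, and `first_stable_percent` would then be applied to those values.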

Implications and Limitations
We propose an FVC estimation method based on a spatio-temporal transfer learning (TL) model, which utilizes simulated high-precision samples to pre-train the neural network and enables the estimation of FVC from high-resolution time series images in mountain regions. Both the LSTM and 1D-CNN models demonstrated high accuracy in extracting the FVC of forest and grassland with varying topographic gradients in mountain regions, exhibiting strong cross-temporal sequence transferability. This transfer learning approach requires only a small number of samples for fine-tuning the pre-trained models, enabling the extraction of FVC from mountain remote sensing images in different temporal and spatial dimensions. The method can be further extended to extract high-precision vegetation parameters at larger regional scales.
The method proposed in this paper has certain limitations. While it can provide the model with a large number of training samples, generating a considerable number of simulated samples can result in data redundancy, which in turn may adversely impact the efficiency of model training and prediction [72]. Therefore, determining an appropriate sample size is a key research question that warrants further attention. Furthermore, due to geographic and seasonal constraints, this study examined only the peak period of vegetation growth (May to October). To extend the applicability of the model to other regions, a more comprehensive investigation of annual vegetation change patterns and consideration of various factors such as climate are needed to realize the model's full potential.

Conclusions
In this study, we propose a novel method for estimating the FVC of forests and grassland in mountain areas based on spatio-temporal TL to address the issues of low sample accuracy and insufficient representation in current FVC retrieval models. The method first combines the simulated FVC data generated by the PROSAIL model, the VIs computed from the Sentinel-2 time series images, and the topographic data of the study area to establish a high-precision sample library for mountain areas. The samples are then divided by altitude and vegetation type and used to pre-train the LSTM and 1D-CNN models. Subsequently, the vegetation indices and topographic data corresponding to the HJ-2A/B time series remote sensing images in 2022 are used to fine-tune the pre-trained models, enabling temporal and spatial transfer across images in mountain areas.
Based on the results, the simulated FVC samples demonstrate high accuracy. The LSTM and 1D-CNN models exhibit exceptional performance in estimating FVC from mountain image data with six temporal phases in 2022. However, due to their distinct structural characteristics, the LSTM model achieves higher estimation accuracy but is more time-consuming during the retrieval process. The models achieve the highest prediction accuracy from June to August, when mountain vegetation exhibits vigorous growth and minimal changes, enabling effective extraction of sample features. Conversely, from May to June, forest and grass coverage undergo rapid changes, particularly with high-altitude snowline fluctuations, resulting in noticeable differences in the data between the two temporal phases. The resulting deviation in the features extracted by the model leads to relatively large errors in FVC retrieval. Therefore, future studies could enhance the model by incorporating factors such as climate, snow cover, and precipitation to adapt to seasonal changes.
Topographic features are of great significance to FVC retrieval in mountain areas. The analysis of the importance of feature variables demonstrates that slope and aspect are more crucial in areas with higher altitudes and relatively lower FVC. Furthermore, topographic factors have a greater impact on the FVC retrieval results during the period in which vegetation is dormant or entering dormancy (May and October). Due to the differences between forest and grassland in mountain areas, the importance of topographic factors in FVC retrieval differs between the two vegetation types. The above conclusions explain why topographic factors need to be considered in the retrieval of mountain FVC.
A subset of the pre-training samples can ensure the efficacy of the proposed method, indicating that the LSTM and 1D-CNN models can achieve stable performance using approximately 90,000 pre-training samples in the study region. Notably, the LSTM-based model exhibits significantly superior predictive accuracy compared to the 1D-CNN-based model.
In summary, the approach proposed in this paper is advantageous for precisely estimating FVC in mountain areas that exhibit intricate climate patterns, substantial topographic fluctuations, and limited access to high-precision samples. The method enables the dynamic monitoring of mountain FVC in high-resolution remote sensing images and has the potential to expand its range, thereby furnishing more comprehensive data support for protecting vegetation in the fragile ecological regions of plateau mountains.
(1) We established a sample classification system based on the characteristics of mountain areas, considering the type of surface features and the characteristics of altitude gradients. (2) Based on the high-resolution remote sensing data, we established the FVC samples by using the PROSAIL model. (3) We used the FVC samples to train the 1DCNN and LSTM models. (4) We fine-tuned the pre-trained model and obtained the distribution of FVC.

Figure 1. Flowchart of the mountain FVC retrieval based on TL methods.

Figure 2. The remote sensing image (a), DEM map (b), and land cover map (c) of Huzhu County, Qinghai Province, China.

Figure 3. The geographic location of the Huzhu measurement area. (a) The green boxes are the FVC measurement areas. (b) Field photograph of an FVC measurement sample point. (c) The yellow dots are the site FVC points.
randomly to prevent overfitting. The time series data of VI and topographic factors serve as input for the model, and the predicted value of FVC is the output.

Figure 5. Spatial and temporal distribution pattern of estimated FVC in the mountain area of Huzhu County from May to October 2022 based on the LSTM (A) and 1DCNN (B) models.

Figure 6 shows the temporal variation trend of the mean FVC time series inverted by the reference FVC, LSTM model, and 1D-CNN model. It can be observed that the FVC inversion results based on both models show a consistent trend with the actual reference FVC variation, with the LSTM method predicting FVC results that are closer to the reference FVC.

Figure 6. The FVC spatio-temporal variation charts of alpine meadow (a,b) and forest area (c,d) from May to October in 2019-2022. (b,d), respectively, show the predicted FVC of the LSTM model and 1DCNN model in the test set (a,b) and the change of the reference FVC.

Figure 7. The scatter plots of the reference FVC vs. predicted FVC based on the LSTM model (a) and 1DCNN model (b), respectively. A total of 9590 random points were generated in the study area, and a linear regression method was used to fit the reference FVC and predicted FVC.

Figure 8. Importance ranking of the features in FVC retrieval for grassland and forest areas within different elevation areas (a,b) and bar plot of the importance of features for grassland and forest areas from May to October in 2022 (c,d).

Figure 9. The scatter plots of site FVC vs. predicted FVC (training samples) with the PROSAIL method (a) and dichotomy method (b). The black diagonal is the 1:1 line, and the red line is the linear regression fitting line of the scatter points.

Figure 10. (a,b) Sample number ratio of grassland and forest area sample points in different slope ranges in each altitude region. (c,d) Monthly mean FVC changes of grassland and forest area in each altitude region.

Figure 11. Polar coordinates of FVC distribution of grassland (a) and forest area (b) in different aspects from May to October 2022.

Figure 12. Monthly trend chart of grassland (a) and forest (b) vegetation index with increasing altitude. In the figure, the horizontal axis represents the value of the vegetation index, and the vertical axis represents the DEM. The value of the vegetation index is the average of the vegetation index values of all sample points within each interval when the DEM is divided into 100 m intervals. The black dotted line X = 3000 is the dividing line of vegetation index change with altitude.

Figure 13. Mean RMSE of the LSTM and 1D-CNN methods on HJ-2A/B data under different sample sizes of the pre-training dataset.

Table 1. Input parameters in the PROSAIL model.

Table 2 displays the sample points of different types of vegetation at various altitudes. To facilitate pre-training, the sample set is divided into a training set and a validation set, with the training set comprising 70% of the samples and the validation set containing the remaining 30%.
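A 70/30 division of the pre-training samples can be sketched as below. The sketch splits within each group of vegetation type and altitude band so that both sets keep the proportions of Table 2; that stratification, the tuple layout of a sample, and the band labels are assumptions made for illustration, as the text only specifies the 70%/30% ratio.

```python
import random
from collections import defaultdict

def stratified_split(samples, key, train_fraction=0.7, seed=0):
    """Split samples into (train, validation) lists, group by group.

    `key` maps a sample to its stratum, e.g. (vegetation type, altitude band).
    Stratifying is an assumption; the source only gives the 70/30 ratio.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for sample in samples:
        groups[key(sample)].append(sample)
    train, validation = [], []
    for stratum in sorted(groups):
        members = groups[stratum]
        rng.shuffle(members)
        cut = round(train_fraction * len(members))
        train.extend(members[:cut])
        validation.extend(members[cut:])
    return train, validation

# Hypothetical samples: (vegetation type, altitude band, sample id).
samples = [("grassland", "<3000 m", i) for i in range(10)]
samples += [("forest", ">=3000 m", i) for i in range(20)]
train_set, validation_set = stratified_split(samples, key=lambda s: s[:2])
```

Fixing the seed keeps the split reproducible across pre-training runs, which matters when comparing models trained on the same sample library.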

Table 2. The number distribution of pre-training samples in grassland and forest areas.

Table 3. The R² and RMSE of FVC retrieval based on the LSTM model and 1DCNN model in grassland and forest, respectively.

Table 4. Classification of topographic variables in the study area.

Table 5. Retrieval accuracy of FVC of the LSTM and 1DCNN models in all vegetation, grassland, and forest areas under different pre-training sample sizes. The values in the table are the average RMSE based on the FVC retrieval results of HJ-2A/B image data in 2022.