3.4.1. Evaluation Criteria
In this study, the random forest model, the PT-Former model, and the kNDVI-PT-Former model were built and compared for monitoring vegetation change in grassland desertification ecological restoration. The vegetation change feature dataset was fed into each of the three models to obtain their vegetation change monitoring results, and the accuracy of each model was then evaluated. Semantic segmentation evaluation indices mainly measure the classification accuracy of each pixel in an image and the accuracy of the segmented regions. Common metrics include Pixel Accuracy (PA), Class Pixel Accuracy (CPA), Mean Pixel Accuracy (MPA), Intersection over Union (IoU), and Mean IoU (mIoU) [31,32]. This study mainly uses CPA and IoU to evaluate model accuracy.
- 1. CPA
Class Pixel Accuracy is a commonly used evaluation index in semantic segmentation tasks. It measures the accuracy of the model’s pixel-level prediction for each category, i.e., the proportion of pixels predicted as a specific category that truly belong to that category. The formula is as follows:

$\mathrm{CPA}_i = \frac{TP_i}{TP_i + FP_i}$

The following are used in the formula:
TP_i: the number of pixels the model correctly predicts as class i.
FP_i: the number of background pixels the model incorrectly predicts as class i.
FN_i: the number of class i pixels the model incorrectly predicts as the background class.
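As an illustration, the following minimal NumPy sketch computes CPA for one class from predicted and reference label maps; the function and variable names are ours, not from the paper’s implementation:

```python
import numpy as np

def class_pixel_accuracy(pred, ref, cls):
    """CPA for class `cls`: TP / (TP + FP) over all pixels."""
    tp = np.sum((pred == cls) & (ref == cls))  # pixels correctly predicted as cls
    fp = np.sum((pred == cls) & (ref != cls))  # background pixels mislabeled as cls
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

# Toy 2 x 2 change maps: 0 = unchanged, 1 = changed
pred = np.array([[1, 0], [1, 1]])
ref  = np.array([[1, 0], [0, 1]])
print(class_pixel_accuracy(pred, ref, cls=1))  # TP = 2, FP = 1 -> 0.667
```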
- 2. IoU
Intersection over Union is a commonly used evaluation index in semantic segmentation, which measures the degree of overlap between the prediction results and the reference annotations. The higher the IoU value, the more accurate the model’s predictions. The formula is as follows:

$\mathrm{IoU} = \frac{TP}{TP + FP + FN}$

The following are used in the formula:
TP (True Positive): the number of pixels correctly predicted as the target class.
TN (True Negative): the number of pixels correctly predicted as the background class.
FP (False Positive): the number of background pixels incorrectly predicted as the target class.
FN (False Negative): the number of target-class pixels incorrectly predicted as the background class.
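A corresponding sketch for IoU, again using illustrative names rather than the paper’s code:

```python
import numpy as np

def intersection_over_union(pred, ref, cls=1):
    """IoU for class `cls`: TP / (TP + FP + FN)."""
    inter = np.sum((pred == cls) & (ref == cls))  # TP
    union = np.sum((pred == cls) | (ref == cls))  # TP + FP + FN
    return inter / union if union > 0 else 0.0

pred = np.array([[1, 0], [1, 1]])
ref  = np.array([[1, 0], [0, 1]])
print(intersection_over_union(pred, ref))  # intersection 2, union 3 -> 0.667
```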
- 3. Loss Function
In machine learning and deep learning, the loss function (or cost function) quantifies the discrepancy between the predicted output and the true target. It serves as the optimization objective during model training, guiding parameter updates via gradient descent or other optimization algorithms. The formulation of the hybrid loss function is given as follows:

$\mathcal{L} = \mathcal{L}_{\mathrm{BCE}} + \lambda \mathcal{L}_{\mathrm{Dice}}$

where λ is the coefficient defined in the hybrid loss function.
To explain the concepts of BCE loss and Dice loss, we consider the predicted change map $\hat{Y}$ and the reference map $Y$ as sets of pixels, represented, respectively, as $\hat{Y} = \{\hat{y}_i,\ i = 1, 2, \ldots, N\}$ and $Y = \{y_i,\ i = 1, 2, \ldots, N\}$. Here, $\hat{y}_i$ denotes the probability of change in the $i$th pixel, while $y_i$ represents the reference value of the $i$th pixel. In this context, a value of 0 indicates an unchanged pixel, whereas a value of 1 signifies a changed pixel. The total count of pixels in the change map is denoted by $N$. The combined objective function of these two loss functions can be formulated as

$\mathcal{L}_{\mathrm{BCE}} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right], \quad \mathcal{L}_{\mathrm{Dice}} = 1 - \frac{2\sum_{i=1}^{N} y_i \hat{y}_i}{\sum_{i=1}^{N} y_i + \sum_{i=1}^{N} \hat{y}_i}, \quad \mathcal{L} = \mathcal{L}_{\mathrm{BCE}} + \lambda\,\mathcal{L}_{\mathrm{Dice}}.$
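To make the combined objective concrete, here is a minimal PyTorch sketch of a BCE + Dice hybrid loss; the class name, the λ value, and the smoothing term eps are illustrative assumptions, not the paper’s implementation:

```python
import torch
import torch.nn as nn

class HybridLoss(nn.Module):
    """L = L_BCE + lambda * L_Dice over predicted change probabilities."""
    def __init__(self, lam=0.5, eps=1e-6):
        super().__init__()
        self.lam = lam  # illustrative coefficient; the paper defines its own
        self.eps = eps  # smoothing term to avoid division by zero
        self.bce = nn.BCELoss()

    def forward(self, y_hat, y):
        # y_hat: probabilities of change in [0, 1]; y: binary reference map
        bce = self.bce(y_hat, y)
        inter = (y_hat * y).sum()
        dice = 1.0 - (2.0 * inter + self.eps) / (y_hat.sum() + y.sum() + self.eps)
        return bce + self.lam * dice

# Usage sketch on random tensors shaped (batch, 1, H, W)
y_hat = torch.rand(4, 1, 64, 64, requires_grad=True)
y = (torch.rand(4, 1, 64, 64) > 0.5).float()
HybridLoss()(y_hat, y).backward()
```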
- 4. Cross-Validation
Cross-validation is a common method for evaluating the generalization ability and stability of a model. The core idea is to repeatedly divide the dataset into a training set and a validation set, avoiding the bias of a single split through multiple rounds of training. This study uses a 5-fold cross-validation method (a minimal code sketch follows the steps below):
- (1) Dataset Division
The preprocessed remote sensing dataset (including images and labels) is randomly divided into 5 subsets (Subsets 1–5) of similar size, ensuring sample independence and uniform distribution across subsets.
Note: If the data contains spatio-temporal correlation (e.g., multi-temporal images of the same region), spatial stratified sampling is used to avoid allocating samples from the same region to different subsets.
- (2) Multi-Round Training and Validation
Round 1: Train the model using Subsets 2–5, validate with Subset 1, and record validation metrics (e.g., overall accuracy, CPA, F1-score).
Round 2: Train with Subsets 1, 3–5, validate with Subset 2, and repeat metric recording.
This process is repeated for 5 rounds to obtain 5 sets of metric results.
- (3) Result Analysis
Calculate the mean (to measure the model’s average performance) and standard deviation (to measure stability) of the metrics.
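The sketch below illustrates this procedure with scikit-learn; the random forest stands in for any of the three models, the data are synthetic placeholders, and GroupKFold shows one way to implement the spatial stratification noted in step (1):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, GroupKFold

# Synthetic placeholders for per-sample feature vectors and change labels.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 8))
y = rng.integers(0, 2, size=200)
region = rng.integers(0, 10, size=200)  # region ID per sample

scores = []
for tr, va in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[tr], y[tr])
    scores.append(model.score(X[va], y[va]))  # stand-in for CPA / IoU per fold

print(f"mean = {np.mean(scores):.3f}, std = {np.std(scores):.3f}")

# When samples are spatially correlated, keep each region inside one fold:
spatial_splits = GroupKFold(n_splits=5).split(X, y, groups=region)
```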
3.4.5. Comparative Analysis of Models
The random forest model, the PT-Former model, and the kNDVI-PT-Former model were validated and compared on the vegetation change monitoring dataset of grassland desertification ecological restoration in the Wuzhumuqin study area. Random forest was chosen as a representative machine learning method to establish a baseline, while PT-Former, which shares the Transformer architecture, allows direct assessment of the effect of kNDVI feature integration. This highlights the synergistic effect of spectral indices (kNDVI) and spatio-temporal modeling.
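For reference, kNDVI is commonly computed as tanh(NDVI²) under the heuristic kernel width σ = 0.5(NIR + Red); whether the paper uses this exact kernel width is not stated in this section. A minimal NumPy sketch (the function name and epsilon guard are ours):

```python
import numpy as np

def kndvi(nir, red, eps=1e-9):
    """kNDVI = tanh(NDVI^2), assuming sigma = 0.5 * (NIR + Red)."""
    ndvi = (nir - red) / (nir + red + eps)  # eps guards against division by zero
    return np.tanh(ndvi ** 2)

# Example with reflectance values typical of vegetated vs. sparse pixels
print(kndvi(np.array([0.45, 0.30]), np.array([0.08, 0.20])))
```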
Figure 15 and Table 8 present the accuracy comparison of the three models. The kNDVI-PT-Former model achieves the best accuracy, followed by the PT-Former model, while the random forest model performs relatively poorly. The deep-learning-based PT-Former and kNDVI-PT-Former models can therefore extract complex features from multi-spectral remote sensing data more accurately than the machine-learning-based random forest model. In particular, the kNDVI-PT-Former model can effectively identify small vegetation changes, improving overall monitoring accuracy by 11%. The kNDVI-PT-Former model thus has significant advantages for vegetation change monitoring in complex environments and produces more accurate monitoring results.
Combined with the areas and types of grassland desertification ecological restoration projects carried out by the Xiwuzhumuqin Banner Forestry and Grassland Bureau over the years, field investigation and verification were conducted in mid-November 2024, and some ecological restoration projects from each year were selected in this study area for monitoring verification. At the beginning of 2021, the local relevant departments applied fencing and artificial grass-planting ecological restoration measures to 2.13 km² of sandy land in the study area. As shown in Figure 16, the restored vegetation is growing well and the ecological restoration effect is remarkable.
At the beginning of 2022, fencing and artificial afforestation ecological restoration measures were carried out on 1.33 km² of sandy land. As shown in Figure 17, the vegetation grew gradually and the ecological restoration effect was obvious.
At the beginning of 2023, artificial afforestation ecological restoration measures were carried out on 0.691 km² of sandy land. As shown in Figure 18, because the vegetation is still short, it is difficult to identify in remote sensing images, and only part of the vegetation can be monitored.
At the beginning of 2024, artificial afforestation ecological restoration measures were carried out on 0.4 km² of sandy land, as shown in Figure 19. The vegetation restored in 2024 is shorter and smaller, making it difficult to identify in remote sensing images; only a small part of the vegetation can be monitored. It is therefore difficult to monitor the presence of vegetation within two years after ecological restoration in grassland desertification areas, especially where vegetation such as Caragana korshinskii and saplings is used for restoration, since the effect is not obvious in the short term.
Taking the vegetation change monitoring result map of the study area output by the kNDVI-PT-Former model as an example (Figure 20), the ecological restoration project areas of recent years are marked on the map; the monitoring result maps of each model in these project areas are then extracted, and the monitoring performance of each model is verified by comparative analysis. The yellow box indicates the 2021 ecological restoration project area; the blue box indicates the 2022 project area; the purple and orange boxes represent the 2023 project areas (differing only in restoration measures); and the light blue box indicates the 2024 project area.
The monitoring results of the three models for the 2021 ecological restoration area are extracted, as shown in Figure 21: (a) is the random forest model output, (b) is the PT-Former model output, and (c) is the kNDVI-PT-Former model output. Comparing the three maps shows that the kNDVI-PT-Former model has the best monitoring effect.
The monitoring results of the three models for the 2022 ecological restoration area are extracted, as shown in Figure 22: (a) is the random forest model output, (b) is the PT-Former model output, and (c) is the kNDVI-PT-Former model output. Again, the comparison shows that the kNDVI-PT-Former model has the best monitoring effect.
The areas restored in 2023 and 2024 are not compared further because relatively little of their vegetation can be monitored. The time series maps maintain the same geographical range, projection, and color scheme to ensure spatial comparability. The coordinate system for all the above model output figures is WGS84 (EPSG:4326). The scale of Figures 9–15 and Figure 20 is 1:200,000, while the scale of Figures 21 and 22 is 1:20,000.
Through comparative analysis with random forest and PT-Former, the kNDVI-PT-Former model, based on an advanced deep learning architecture, integrates time-series analysis and spatial feature extraction techniques. It can break through the limitations of traditional monitoring methods and achieve high-precision dynamic monitoring of vegetation for grassland and sandy land ecological restoration. In practical applications, this model, with its high-resolution remote sensing image processing capability, can cover vast grassland and sandy areas, quickly identify the vegetation changes at different succession stages, and accurately capture the tiny fluctuations in vegetation coverage and biomass, as well as the evolution trend of community structure. Whether it is the assessment of the vegetation restoration process in the ecological restoration project area or the early warning of potential desertification areas, the kNDVI-PT-Former model can provide detailed and reliable data support. This enables relevant departments and research institutions to promptly grasp the effectiveness of ecological restoration, scientifically adjust strategies for combating desertification, and rationally allocate human and material resources, laying a solid technical foundation for establishing a long-term ecological protection mechanism and promoting the sustainable development of grassland and sandy land ecosystems.
Therefore, in complex environments, combining kNDVI with the Transformer model for vegetation change monitoring has significant advantages: it can capture long-term vegetation dynamics and support ecological monitoring and climate change research. In this paper, however, no comparative experiments were conducted with other models; in the future, more advanced models may be applied to vegetation change monitoring in ecological restoration to enhance model diversity. This study achieved good results in monitoring vegetation change in grassland desertification ecological restoration, but the algorithm for extracting vegetation change features in the network layers still needs refinement, which points to directions for future research on grassland vegetation change monitoring. While the kNDVI-PT-Former model demonstrates consistent performance improvements across cross-validation folds (e.g., an 11% CPA gain over PT-Former, Section 3.4.5), the absence of formal statistical tests reflects the study’s focus on methodological innovation rather than exhaustive comparative inference. Meanwhile, although this study is based on single-phase data, cross-seasonal validation could enhance the model’s robustness. In future research, multi-seasonal imagery, such as data collected during the spring green-up period and the autumn biomass peak, will be integrated with correlation analysis of climatic factors to enhance the comprehensiveness and accuracy of monitoring. The study also plans to incorporate additional quantitative validation experiments and analyses to strengthen the applicability and reliability of the monitoring model across diverse environments and conditions.
However, vegetation change monitoring via remote sensing imagery also has certain limitations. Key limitations include the following aspects: First, optical remote sensing data are significantly perturbed by cloud cover, atmospheric scattering, and climatic influences (e.g., precipitation events), which introduce noise into vegetation index calculations. Furthermore, the compromised temporal continuity of time-series datasets often limits the ability to capture rapid vegetation growth phases during critical phenological windows (e.g., spring budburst or monsoon-driven biomass accumulation). Most critically, low-spatial-resolution remote sensing data (e.g., >10 m) are increasingly inadequate for high-precision vegetation monitoring requirements, while high-resolution datasets (e.g., 0.5 m resolution) remain prohibitively expensive and lack universal accessibility for large-scale applications. Therefore, how to utilize remote sensing techniques for more efficient and precise vegetation change monitoring remains a research topic warranting in-depth investigation. Additionally, this study identifies several commonalities with prior research. First, the integration of multi-source data enriches the dataset and establishes a robust foundation for subsequent research. For example, Fan et al. (2024) achieved effective dynamic vegetation cover monitoring in rare earth mines by utilizing multi-source remote sensing data [23]. Second, model comparisons highlight the superiority of specific frameworks. For instance, Li et al. (2024) evaluated four object detection models (classical SSD, RetinaNet, YOLOv3, and Faster R-CNN) for tree detection and found that the SSD model demonstrated the best balance between inference speed and performance on their dataset, positioning it as a powerful target detection solution [33]. The unique contribution of this study lies in its innovative integration of vegetation indices into deep learning algorithms for vegetation change monitoring, a methodology that outperforms traditional models (random forest and PT-Former) in detection accuracy.