remote sensing Grassland Conservation Effectiveness of National Nature Reserves in Northern China

: Grasslands are crucial ecosystem biomes for breeding livestock and combatting climate change. By 2018, the national nature reserves (NNRs) in the Inner Mongolia Autonomous Region (IMAR) had constituted 8.55% of the land area. However, there is still a knowledge gap about their effectiveness in grasslands. Based on a multiyear time series of the growing season composite from 2000 to 2020, we proposed an effectiveness score to assess the effectiveness of the NNRs, using the 250 m MOD13Q1 NDVI data with Theil–Sen and Mann–Kendall trend analysis methods. We found the following: 22 of 30 NNRs were deemed effective in protecting the Inner Mongolian grasslands. The NNRs increased pixels with a sustainable trend 19.26% and 20.55% higher than the unprotected areas and the IMAR, respectively. The pixels with a CV NDVI < 0.1 (i.e., NDVI coefﬁcient of variation) in the NNRs increased >35.22% more than those in the unprotected areas and the IMAR. The NDVI changes within the NNRs showed that 63.64% of NNRs had a more signiﬁcant trend of greening than before the change point, which suggests a general greening in NNRs. We also found that the NNRs achieved heterogeneous effectiveness scores across protection types. Forest ecology protection and wildlife animal protection types are the most efﬁcient, whereas wildlife vegetation protection is the least effective type. This study enriches the understanding of grassland conservation and sheds light on the future direction of the sustainable management of NNRs.


Introduction
Grassland, as the largest terrestrial ecosystem in China, is deemed vital for natural resource reservation, carbon balance, peoples' livelihoods, and animal husbandry [1][2][3]. China has grassland coverage of 393 million hectares composed of major types of meadow steppes, typical steppes, desert steppes, and alpine steppes, which makes up 40% of China's megadiverse land area and 12% of the world's grasslands [4]. Grassland ecosystems provide crucial ecological functions for human society and ecosystem services. Specifically, regarding the benefits to human society, grassland ecosystems offer irreplaceable constituents for food safety based on the support of livestock production [5]. For ecosystem services, grassland ecosystems exert ecosystem functions, including but not limited to soil and water conservation, the soil organic carbon pool, air purification, biodiversity maintenance, and so on [6,7]. Despite the important role of grasslands in the ecosystem, several studies have declared that there is little attention to grassland conservation, although the threats still occur [8,9].
In recent decades, the grassland degradation problem has remained the most crucial ongoing ecological threat in China, and represents both an ecosystem threat and a human welfare crisis [10]. Grassland degradation refers to unfavorable changes in both vegetation and soil quality that contribute to undermining the height, coverage, biomass, and biodiversity of vegetation, and simultaneously cause the deterioration of soil nutrients as well as further soil erosion [11]. Due to the need of socioeconomic development, both climate change and human activities have exacerbated grassland degradation in recent decades [12,13]. Grassland degradation is especially severe in the Inner Mongolia Autonomous Region (IMAR), and almost 90% of the IMAR grassland suffers from degradation [14]. Substantial grassland degradation might induce both declining productivity and decreasing environmental carrying capacity [15].
To combat the ecological problems caused by grassland degradation, the Chinese government has launched two main measures with particular emphasis on grassland conservation, i.e., large-scale ecological engineering projects (EEPs) and protected areas (PAs). EEPs such as the Grain to Green Program (GTGP), the Beijing-Tianjin Sand Source Control Program (BSSCP), and the Natural Forest Conservation Program (NFCP) were launched coherently from the late 1990s to 2000s [16][17][18]. Other successive EEPs, such as the Subsidy and Incentive for Grassland Conservation (SISGC), which was launched in 2011, were also targeted at severe grassland degradation areas, similar to the aforementioned projects [19]. Apart from constructing EEPs, PAs are also renowned as dominant ecological restoration practices combatting grassland degradation. PAs are areas with clearly defined geographic boundaries that are recognized and scientifically managed by legal or other effective means for the long-term conservation of nature and associated ecosystem services and cultural values [20]. Compared with the more project-based EEPs [21], establishing protected areas is more inclined to be part of an area-based, self-supporting system [22,23]. As one type of PA, national nature reserves (NNRs) have the strictest protection efforts and all Chinese NNRs are included in the Priority Areas for Biodiversity Conservation (PABC) [24,25]. Therefore, it is crucial to understand whether NNRs can sufficiently conserve grassland ecosystems and what further knowledge is needed for policy-making.
Remote sensing observations, coordinated with a geographic information system, are now the primary tool for monitoring vegetation changes. Remote sensing-based indicators such as the normalized difference vegetation index (NDVI), fractional vegetation cover (FVC), and net primary productivity (NPP) derived from remote sensing data, are widely used for monitoring grassland ecosystems [19,26,27]. Among all indicators, the effectiveness of ecological processes has been assessed using the spatiotemporal NDVI changes as one of the most cost-effective proxies in many studies [28][29][30]. As the most widely used single indicator for characterizing vegetation conditions, the NDVI can be used to represent vegetated regions and differentiate non-vegetated areas, as computed by the normalized interpretation of the near-infrared to red ratio [31]. With regard to the effectiveness of vegetation restoration as well as the ecological monitoring within PAs, studies have increasingly used the NDVI as a proxy to analyze the ecosystem dynamics within PAs [32][33][34][35].
Although an increasing number of studies have been conducted to analyze PA effectiveness, to the best of our knowledge, few studies have considered the effectiveness of NNRs on grasslands in the IMAR [36]. It is no wonder that assessing the effectiveness of NNRs is still the bedrock for directing adaptive PA management [37], especially for understanding the strict NNR management of ecological barriers such as the IMAR. Given the above, this study's overarching goal can be divided into the following components: (1) Are the NNRs in the IMAR effectively enhancing greenness at the provincial level as well as in the unprotected areas based on growing season imagery? We would expect that the NDVI trend would become greener or possibly more stable within the NNRs. (2) Are the NNRs in the IMAR significantly buffering the variations in the grasslands compared with IMAR and unprotected areas? We would expect that the NDVI trend would show less variability within the NNR bounds. (3) Do the NNRs in the IMAR show heterogeneous effectiveness with the different protection types? We would expect that the effectiveness of NNRs varies among the different protection types; if so, which type is more effective?

Study Area
The study areas are the NNRs, which are located in the IMAR and are situated in the climatic transition zone between the semiarid and arid environments of northern China with an annual precipitation of 340 mm [38]. With a longitude that stretches from 97 • 10 E to 126 • 09 E, together with a latitude that ranges from 37 • 24 N to 53 • 20 N, the IMAR has diversified land cover types under a total area of 1.18 million km 2 due to both climatic and anthropogenic factors [39]. In this study, we categorized the land cover types into seven major types according to Table 1 of Friedl et al. (2010) [40], which are shown in Figure 1 below. Figure 1 demonstrates the Land Use and Land Cover (LULC) map by 2001. The land cover of the IMAR is dominated by vast grasslands stretching from west to east, and although the forest is in the northeastern region, cropland is on the eastern side and unvegetated zones are on the western side [39].
By 2018, the NNRs accounted for 8.55% of the total area of the IMAR, with the distribution lying from west to east. In this study, a total of 30 NNRs were investigated to assess their effectiveness in protecting temperate grasslands (see Table 1

Remotely Sensed Datasets
MOD13Q1, archived on the Google Earth Engine cloud platform, has a 16-day NDVI composite with a spatial resolution of 250 m. MOD13Q1 was calculated using the maximum value composite method (MVC) with the MODIS surface reflectance data product [41]. We considered this dataset because previous studies have widely and successfully used MOD13Q1 data to analyze the vegetation dynamic related to protected areas [42]. In this study, the MOD13Q1 NDVI was obtained from the Google Earth Engine for the period from 2000 to 2020 in the growing season ranging from April to October. April to October was considered the growing season for the IMAR, as previous research has noted that the vegetation in the IMAR reinvigorates in April and languishes in October [43,44]. In order to assure the reliability of the pixels, we have used the summary QA band to filter the growing season images, leaving only the remote sensing images with confidence.
Throughout the growing season, the median value within each month was obtained, and the mean yearly NDVI was calculated to retrieve the grassland growth. Both the monthly median and yearly mean values were obtained on a per-pixel-based level using the Google Earth Engine cloud computing platform to mitigate the influence of abnormal climate events on vegetation dynamic detection [41]. Note that pixels with a mean annual NDVI less than 0.1 were excluded, as they represented sparse aboveground vegetation, before further analysis [45].
The land cover data used in this study can also be derived from the 500 m MCD12Q1 dataset from the open-access Google Earth Engine. The pixels that were consistent with unchanged grassland land cover during the study period were filtered and were assumed to be unchanged grassland without land cover change (LCC) with other types. Both the boundaries of the national nature reserves and administrative boundaries were derived from the Resource and Environment Science and Data Center, CAS.

Trend Analysis
The Theil-Sen estimator can describe the concrete magnitude of the monotonic trend obtained by the Mann-Kendall test [46,47]. Thus, the Theil-Sen estimator was combined with the Mann-Kendall test to indicate the per-pixel vegetation trend in previous studies [48][49][50]. Compared with the traditional linear regression analysis method, which is sensitive to outliers, Theil-Sen trend analysis is mainly used to obtain the median of the slope between N pairs of given series, exposing less sensitivity to the autocorrelation among different time scales. As a result, the interannual vegetation trend obtained is more reliable. The specific calculation of the Theil-Sen estimator can be calculated as follows: The Mann-Kendall test is a widely applicable method when utilized to diagnose the significance of the Theil-Sen detected trend for the duration of the given time series data. The main characteristic of the Mann-Kendall test is that it is a nonparametric detection method that is robust to outliers as well as permitting non-Gaussian distribution [51]. The null assumption of the Mann-Kendall test is that there is no monotonic trend in a given time series. If the significance test is passed, the change in grassland pixels in a given pixel in the time series shows a monotonous trend. If the significance test is not passed, it indicates that the trend is stable (i.e., no significant trend). In this study, Mann- , Ts < 0 (2) where Ts is the NDVI time series from 2000 to 2020 and can be calculated as follows: Var(Ts) = n(n − 1)(2n + 5) 18 (5) where NDVI j and NDVI i denote the values of pixels i and j, respectively. n denotes the length of the time series. In addition, sgn denotes the sign function. The Z value can range from (−∞, +∞), and the significance level in this study is considered p = 0.05; only when under the condition of |Z| > u 1−α/2 could the given pixel pass the Mann-Kendall test, showing a significant trend. Aside from the Theil-Sen estimator and its significance test, we were also interested in determining whether the grassland pixel trend would be retained in the future. In this study, we obtained the Hurst index at the per-pixel level to explore whether the trend of the given pixel would be sustained in the future. The Hurst index method was calculated using rescaled range analysis, and this method has proven to be applicable at both regional and domestic scales for forecasting future vegetation dynamics based on past time-series information [52,53]. The Hurst index can be calculated as follows [54]: First, slice the time series data |NDVI Then, define the mean series as follows: Calculate the cumulative deviation X(t,T), range R(T), and standard deviation S(T) as follows: The actual value of the Hurst index can be calculated using the fitting equation log(R/S)n = a + H × log(n) using the least-squares method. Its value range generally fluctuates from 0 to 1 and can be interpreted in three ways [55] to explain the dependence. If the Hurst value is within 0.5~1, it demonstrates that the trend of the given pixel from 2000 to 2020 is likely to be retained in the future. If the Hurst value is within 0~0.5, it demonstrates that the trend of the given pixel from 2000 to 2020 is likely to reverse in the future. Furthermore, the closer the absolute of the Hurst value is to 1, a more powerful dependence is returned based on the known series, whereas the closer the absolute of the Hurst value is to 0.5, less dependence power is used to forecast the future trend based on the known series.

Coefficient of Variation
The coefficient of variation (CV, CoV, or coefficient of variation) is a widely used metric to characterize the discreteness of vegetation NDVI time-series data and the fluctuation of interannual vegetation dynamics [49,56]. In this paper, the coefficient of variation was used to analyze the volatility of vegetation dynamics and its spatial distribution pattern at different scales, i.e., pixel-based level, NNR level, and provincial level. The coefficient of variation calculated in this paper can be calculated as follows: where CV NDVI represents the coefficient of variation of NDVI values in the growing season between 2000 and 2020, σNDVI represents the annual standard deviation of NDVI values, and NDVI represents the annual mean NDVI value. A larger CV NDVI indicates that the NDVI time series of the given pixel is more volatile and unstable, with a more discrete distribution, and vice versa. According to the categorization of fluctuations in previous research [50], five CV NDVI level-based categorizations were adopted. They were high fluctuations (CV NDVI ≥ 0.2), relatively high fluctuations (0.15 ≤ CV NDVI < 0.2), medium fluctuations (0.10 ≤ CV NDVI < 0.15), relatively low fluctuations (0.05 ≤ CV NDVI < 0.10), and low fluctuations (CV NDVI < 0.05).

Effectiveness Score Calculation
This study proposed a composite metric of the effectiveness score to conduct the assessment of Inner Mongolian NNRs' effectiveness in halting the grassland degradation. The effectiveness score can be calculated using three sub-metrics, i.e., the t test, the ecological change of NDVI slope, and CV NDVI before and after the change point. The submetrics were calculated using the following steps: Step 1: calculate the t test value to compare the NDVI mean within NNRs with their outside value.
The annual NDVI means for four ranges of buffer zones (i.e., 0-5 km, 6-10 km, 11-15 km, and 16-20 km buffers) were each compared with the values within the NNRs, taken using a t test. The t test value was used as the first sub-metric because t > 0 indicated that the given NNR had a relatively protective effect compared with outside buffer areas, but if t < 0, the annual mean of the NDVI outside the NNR was higher than inside, implying that the NNR did not have a significant protective effect on the grasslands and that there was greater grassland degradation. The range of the radius of the buffer zone referred to the literature by Knorn et al. and Kleemann et al. [57,58]. This step was to test whether the annual NDVI inside the NNR boundary significantly surpassed that outside the buffer regions; if so, grassland pixels within the NNR were in better growing condition than unprotected buffer areas at the 95% confidence level. Before conducting the t test, we delineated the buffer zones for each radius range and for each NNR to ensure that further calculation would only be assessed once and to eliminate the overlay from other NNR boundaries. The calculation formula is as follows.
where n is the number of sample observations, NDVI mean corresponds to the interannual NDVI mean of the NNRs and their buffer zones from 2000 to 2020, k is the number of controllable variables, and n−k−2 is the degree of freedom (df). A t test was passed if p < 0.05. As we used the buffers of different radius, the mean value t test of the four radius was also attained for further analysis and to describe the average difference level between NNR and NNR's outside, hereafter referred to as average t test outcome.
Step 2: Calculate the change point and its statistical significance using the Pettitt test. Specifically, the Pettitt test combined with the temporal trend of NDVI change has been proven to be successful in detecting pre-and post-ecological changes and thus in evaluating the effectiveness of ecological restoration [59]. We also used the Pettitt test to conduct change-point detection within the time series because we were curious about how many of the NNRs possessed a change point. Further, how many of the NNRs have seen ecological restoration after the change point, with their post-NDVI time-series change slope surpassing the pre-change point? How many of the NNRs became worse after the change point and experienced ecological degradation? In terms of the Pettitt test's features, it is a nonparametric test for detecting a change point in the time series. The method does not require the time series to meet a certain distribution and can detect the time and number of change points in the time series [60]. The principles and calculation items were described as follows: where K T is the change point of the time series, and the significance level of K T detection is set at p = 0.05. The null assumption is that the given series does not exhibit its change point. If p ≤ 0.05, the detected change point would be significant, and it returns to the year that the most likely change point could happen, as well as the location of K T in the time series. This part of the analysis was conducted with the assistance of the trend v1.1.4 package of R. K T was calculated as follows: U t,T and sgn(x t −x k ) can be calculated as follows: sgn The significance of the returned x k should be tested using the p value, calculated as follows: Step 3: Calculate the ecological change within the NNR boundary. The second and third sub-metrics were the pre-and post-ecological change comparisons within the NNRs of the NDVI slope and CV NDVI , respectively. According to the latest release of the "Criteria for Evaluating the Effectiveness of Ecological Protection of Nature Reserves (Trial) for Public Comments in 2020 (in Chinese)", we applied the net change pixel ratio using the significant greening pixel ratio within the given NNRs to differentiate the significant browning pixel ratio in sub-time series of both T 1 and T 2 . We considered the ecological change between the two time periods before and after the change point. Inspired by [21], ecological restoration and ecological degradation were mainly represented as greening and browning, respectively. Thus, the comparison of the time series T 1 (i.e., before the change point) and T 2 (after the change point) was also taken into account. Noting that the significant greening ratio was calculated using the total area of pixels with a significant greening trend within NNRs (i.e., the significant greening pixels being summed together and aggregating their total area within the NNR boundary) to divide the administrative areas of the NNRs, similarly, the significant browning ratio was calculated using the total area of pixels with significant browning trend within NNRs to divide the NNRs' area. This process has been calculated in all 30 NNRs.
Where for each NNR calculation, T 1 and T 2 divided the whole NDVI time series into two subseries, i.e., the pre-and post-change point time series, respectively. ∆ A (T 1 ) and ∆ A (T 2 ) are the calculated net change pixel ratios for T 1 and T 2 , respectively.
We also calculated the CV NDVI before and after the change point aggregated at the NNR-base level. Here, we hypothesized that if the CV NDVI of T 2 is greater than that of T 1 , the grassland biomes are more influenced by disturbances, which may further hinder ecological restoration, and vice versa. The CV NDVI can be calculated as follows: Step 5: Obtain the effectiveness score metric. The ecological change within each NNR was assessed using the entropy weight method, and each of these three metrics (i.e., CV NDVI , A, and average t test outcome) was weighted, summed to obtain the effectiveness score.
The entropy weight method has been used in recent analysis of the ecological quality of the terrestrial system using remote sensing monitoring [61]. The advantage of this method is that it is an objective multi-index evaluation to attain the relative intensity of each index solely based on the information that each index contains. The rationale behind this weighting method is that the weight of each of the three sub-metrics mentioned above could be determined using the information provided from their observation values. Details regarding the entropy weight method are listed as follows [61]: 1. List the original matrix A using all observations of the indices X ij .
where n denotes the number of n indices, m is the number of observations of each index, and m denotes the evaluated object that we have interest in, i.e., 30 NNRs. The matrix A for the remaining steps of entropy weight method evaluation consists of n × m observations; here in our study, X ij denotes the i th NNR-based index observation value of the j th index (i.e., CV NDVI , A, and average t test outcome).
2. Standardize the sub-metrics. Afterwards, we need to define the standardized method of the positive index and reverse index; the direction of the standardized method is affected by the feature of the index. We defined the A and t test to be the positive indexes, and CV NDVI to be the reverse index. The reason comes from our assumptions of this study, as we expect the grassland biomes within NNRs' boundary to exert larger t test to show higher NDVI annual mean value than those outside, and we also expect that the grassland could have a better ecological restoration status after the change point. Meanwhile, for CV NDVI we expect less ecological variation within NNRs and as a result, we expect less of a change rate with this metric and it should be a reverse index.
Moving forward, we calculated the standardized value from the original A matrix. In this way, the standardized matrix r ij could be retained.
The positive indices should be calculated using the formula below: The reverse index should be standardized in the formula below: 3. Calculate the entropy of the sub-metrics. Further, according to the definition of Shannon information entropy, the calculation of the entropy value of the j th index is shown below: where p ij denotes the proportion of the i th observation of the index j.
4. Calculate the difference coefficient of the first index and weight. The rationale is that the more significant the difference for the index j, the greater the entropy value E j of the index j would hold.
The first index G j and the weight W j would be calculated as follows: Figure 2 shows the spatiotemporal distribution of the interannual NDVI change in three different regions of interest, i.e., the provincial areas of IMAR, the unprotected areas, and the NNRs. The categories listed in the sub-figures of Figure 2 have all been defined using the standard deviation scheme of ArcGIS 10.8. Figure 2a shows a spatial sketch of Sen's slope. Overall, the significant greening areas were mainly distributed on the southeastern Ordos Plateau and eastern Xing'an League (dashed region in Figure 2a), where the browning region was mainly distributed in the midwestern region of Inner Mongolia, such as ChiFeng, Ulanqab, and Xilingol. From Figure 2b, we can easily observe that the pixels were categorized into 6 types using the Hurst value and vegetation dynamic analysis (see Table 2 for the 6 types of pixels). We noticed that the Hurst index ranged from 0.04 to 0.92, and the mean value of the Hurst index of Inner Mongolia was 0.67, with a standard deviation of 0.09, indicating that the grassland pixels in the study area could have a tendency to show a similar trend to that of the past 21 years from 2000-2020. For this reason, we mainly focused on the trend with a Hurst value greater than 0.5.  Figure 2a,c show that at the provincial level, vegetation dynamics show high spatial variability, with two major trends of vegetation greening and a stable trend coexisting. Areas with sustainable greening trends were mainly distributed in the Korchin sandy land, Ordos Plateau, and forested areas, such as southwestern Tongliao and eastern Xing'an, accounting for 47.67% of the total area of the district. The areas with sustainable steady conditions accounted for 49.37%, mainly distributed in the central regions of the province, whereas the pixels with sustainable browning trends were scattered in regions such as Xilingol and Chi Feng, with a ratio of 0.69%. Table 2 illustrates the ratio of the grassland trend within the IMAR, the unprotected areas, and the NNRs. Although the area with a sustainable greening trend inside the NNRs accounted for 37.85%, with a much smaller percentage than the provincial administrative region and the unprotected areas, the ratio of the area with a sustainable steady trend increased to 58.88%, which was 19.26% and 20.55% higher than the provincial level and the unprotected areas, respectively. Among the three regions of interest, NNR management showed marginal positive effects on mitigating grassland degradation, as the percentage coverage of the sustainably browning pixels was relatively lower than that of the IMAR and the unprotected areas, with a ratio of 0.62 in NNRs, 0.69 in the IMAR, and 0.7 in the unprotected areas; however, this decrease was not large compared with the change in the steady pixels. The results revealed the interesting finding that the NNRs have exerted some effectiveness on grassland protection based on trend analysis but seem to have more effectiveness in stabilizing the vegetation dynamic trend, compared with their slightly positive effects on decreasing the browning pixel ratios.

The NDVI Variation in the Temperate Grasslands during 2000-2020
Figure 2d depicts the spatial distribution of the coefficient of variation derived from the interannual NDVI calculation (CV NDVI ). According to our categorization scheme in the Methods section, the CV NDVI was classified into 5 grades, including high fluctuation (CV NDVI ≥ 0.20), relatively high fluctuation (0.20 > CV NDVI ≥ 0.15), medium fluctuation (0.15 > CV NDVI ≥ 0.10), relatively low fluctuation (0.10> CV NDVI ≥ 0.05), and low fluctuation (CV NDVI <0.05). At the provincial level, the highly fluctuating and relatively highly fluctuating areas were mainly distributed in Ordos, Tongliao, and Hulun Lake. The areas with medium fluctuation were mainly distributed in Ulanqab, Baotou, and Xilingol. The areas with relatively low fluctuation and low fluctuation were mainly distributed in the transition area between Xilingol and Chifeng. Table 3 lists the areas with different fluctuation grades in the three regions of interest, IMAR, the unprotected areas, and the NNRs. As shown in Table 3, there were fewer differences between the IMAR and the unprotected areas when compared with the NNRs, although the pixel ratio with high fluctuation reached 18.24%, which was 8.34% larger than that of the IMAR and 8.22% larger than that of the unprotected areas. However, the ratio of relatively high fluctuating and medium fluctuating pixels decreased by 8.35% and 26.25% compared with that of the IMAR and decreased by 8.3% and 26.26% compared with that of the unprotected areas, respectively. Furthermore, the ratio of relatively low fluctuation pixels accounted for 22.61%, which was 7.48% higher than that of the IMAR and 7.56% higher than that of the unprotected areas. The pixels with low fluctuation inside both the IMAR and the unprotected areas were merely 1.06% times those in the NNRs. Therefore, it can be concluded that NNRs, as one kind of ecological restoration effort, have exerted positive effects on stabilizing temperate grassland coverage from 2000 to 2020.   The summary figure in the corner lists the nested one-way ANOVA test outcomes for each protection type of NNR (i.e., FEP, DEP, etc.); the 5 columns of each were set in the order of the NNRs, a 1-5 km buffer, a 6-10 km buffer, a 11-15 km buffer, and a 16-20 km buffer, respectively, in that each point denoted the NDVI value in each spatial zone for each NNR with a given protection type. The gray horizontal line depicts the mean value of all the scattered points in each spatial zone, whereas the gray vertical line depicts the 95% confidence interval. We can clearly analyze the distribution of each NNR's NDVI value. FEP had the shortest confidence interval among all NNRs, and the NDVI mean of all the FEP NNRs was predominantly higher than in those regions outside the FEP NNRs. The IWP NNRs and WAP NNRs also showed a pattern of 'inside NNRs exceeded outside', but the advantage was not that obvious with a larger confidence interval, which contained more uncertainty. For the DEP, WVP, PRP, and GMP NNRs, their NDVI values were lower or almost the same as their outside values. From the nested one-way ANOVA test, the mean NDVI of the FEP was significantly greater than that of the WAP by 0.170, significantly greater than that of the WVP by 0.259, and significantly higher than that of the PRP by 0.254. All three comparisons had a p value < 0.0001. Additionally, the IWP NNRs were 0.181 greater than the WAP, 0.271 greater than the WVP, and 0.265 greater than the PRP in terms of the mean NDVI, with p values all < 0.0001. Table 4 lists the number of NNRs with significant and nonsignificant mean differences after calculating the t test. As shown in Table 4, the number of NNRs with a significantly higher mean difference than their outside buffer regions increased from 12 to 16 as the buffer radius increased from 1-5 km to 16-20 km, showing that the NNRs had a relatively highest NDVI mean than the farthest NNR outside(i.e.,16-20km buffer) among the four buffer regions. We found that if we expanded the buffer to 16-20 km, approximately 16 NNRs (53.33% of the total NNRs) would meet the lower NDVI mean without NNR establishment from 2000 to 2020.   In terms of the NNRs with different types of categories, for NNRs with the desert ecology protection type(DEP NNRs), grasslands showed an increasing trend at 0.0044 ± 0.0016 per year before the change point and 0.0050 ± 0.0022 per year after the change point. For NNRs with the wildlife animal protection type(WAP NNRs), grasslands showed an increasing trend at 0.0031 ± 0.0014 per year before the change point and 0.0056 ± 0.0007 per year after the change point. In terms of NNRs with the forest ecology protection type(FEP NNRs), grasslands showed an increasing trend at 0.0046 ± 0.0044 before the change point and 0.0057 ± 0.0025 after the change point. For the other four types of NNRs without detectable change points, the NNRs with wildlife vegetation protection(WVP NNR) showed an increasing trend at 0.0017 per year, whereas the NNRs with paleontological remains protection(PRP NNR) showed an increasing trend at 0.0026 per year. In terms of NNRs with the inland wetland protection type(IWP NNRs) and NNRs with the grassland meadow protection type(GMP NNRs), the increasing trends were 0.0015 ± 0.0007 per year and 0.0024 ± 5 × 10 −5 per year, respectively. Figure 5 shows that 22 of 30 NNRs had effective reserves that protected the temperate grasslands in the IMAR. The gray dashed line shows the mean of the total NNRs. The red line in the violin box shows the median effectiveness for each type of NNR. The IWP and WVP NNRs were categorized as mild effective reserves, as the median IWP and WVP mean effectiveness values were lower than the gray dashed line, which was the average level of all NNRs. The GMP NNRs were categorized as the medium effective reserve, as the median was comparable to the gray dashed line. WAP, PRP, and DEP were categorized as effective NNRs, as their median reached higher than the mean. Interestingly, statistics showed that FEP was a mild effective reserve type. Although there were some FEP NNRs with high effectiveness, the difference between FEP NNRs was most extensive among all seven NNR types.

Effects of National Nature Reserves on Grasslands
In this study, we set contiguous buffers with a 5 km increase in radius. Noting that this method can control the natural and environmental conditions so that they are similar [58], we then hypothesized that the annual NDVI mean variation in unchanged grassland pixels from 2000 to 2020 was due to the settlement of NNRs. The forest ecology protection type (FEP) NNRs, such as Inner Mongolia Qingshan and Inner Mongolia Helan Mountain, had the highest effectiveness scores but some FEP NNRs also had the lowest effectiveness scores. The possible cause for this might be due to the well-known phenomenon of PA downsizing, degazettement, and downgrading (PADDD), where forest NRs were mostly downsized from 2003 to 2015 according to previous studies [62], which may have decreased management efforts. The desert ecology protection NNRs (DEP) had high effectiveness scores, possibly due to greening efforts, because our results showed that among all reserves, the DEP NNRs had statistically higher NDVI slopes than other types of NNRs (see Table 5). This is quite reasonable, as the local government is dedicated to programs to raise greenness, for example, the Tumuji NNR, one of the DEP NNRs, where ecological conservation has gathered attention from social media [63]. Within that NNR, more than 780 mu (1 mu ≈ 666.67 m 2 ) of grasslands have been sown [63]; at present, the grassland biodiversity of the Tumuji Reserve can reach 18 species per square meter, and the biomass per square meter can weigh up to 4000 fresh grams. In addition, effective grassland conservation consisting of a grazing ban in both the core and buffer areas has also been conducted in the DEP NNRs, and previous research showed that this would place less human pressure on grassland biomes [64,65]. Before further analysis of whether NNRs experienced ecological restoration or degradation after the change point, we mapped the CV NDVI and NDVI slope results for Inner Mongolia grasslands as they were essential to further study. Thus, we needed to ensure that the CV NDVI and NDVI slope results were comparable with previous literature. We found that the spatial pattern of the CV NDVI was lower when distributed in the eastern transitional zones with grasslands and forests (see Figures 1 and 2d). In addition, our results also showed that the greening pixels were located mainly in the northeast and southwest, whereas central and eastern Inner Mongolia were dominated by browning pixels. Both the CV NDVI results and NDVI slope results were consistent with previous studies derived from SPOT, MODIS, and GIMMS NDVI [44,45,66]. Furthermore, the rank of CV NDVI among different NNRs in descending order was FEP, WAP, IWP, GMP, DEP, WVP, and PRP (see Table 6). Similar to the NNR effectiveness assessment in Yunnan Province conducted by Qiu et al. (2020), our results also found that the WAP and FEP had less variance in the NDVI, indicating that the disturbance in the WAP and FEP NNRs for grasslands was lower than that in other NNRs [67]. The possible reason the CV NDVI fluctuation varies greatly among different regions, is because the number of compensatory components that enrich plant communities differs between the western and the eastern IMAR [68,69].  1 As we wish the grassland ecosystem to be more stable compared with the outside changes, we desire a lower CV NDVI value inside each NNR, in which the lower value the CV NDVI is, the greener the box shown, while the higher value the CV NDVI is, the redder the box shown. An asterisk denotes that the statistic is significant at the p = 0.10 level and two asterisks denote significance at the p = 0.05 level and two asterisks denote significance at the p = 0.05 level while using the Kruskal-Wallis test. IWP, PRP, FEP, GMP, DEP, WAP, and WVP denote inland wetland protection, paleontological remains protection, forest ecology protection, grassland meadow protection, desert ecology protection, wildlife animal protection, and wildlife vegetation protection, respectively.

Necessity Regarding the IMAR NNR Effectiveness Assessment
China has been investing heavily in ecological conservation through the settlement and management of the NNRs [70,71]; unfortunately, there is still a knowledge gap in the present assessment of the effectiveness of China's NNRs [72], especially for the NNRs in the IMAR [36]. In light of this knowledge gap, we formulated an effectiveness score to study the effectiveness of the NNRs among different types of reserves. At present, diverse indicators have been constructed to highlight the necessity of effectiveness assessment. Among these indicators, simple-computed indicators to solve data acquisition problems have increasingly mushroomed [73,74]. For example, Mu et al. proposed the Area-Pattern Quadrant Analysis (APQA) system specialized for wetland NNR effectiveness assessment [75], and Huang et al. devised three indices, i.e., the Landscape Transfer Index (LTI), the Protected Landscape Integrity Index (PLII), and the Interfered Landscape Sprawl Index (ILSI), based on the landscape transfer matrix [72]. According to Gillespie et al., the NDVI mean aggregated at the reserve-based level and the NDVI changes at the pixel level have both been conducted to assess the effectiveness of protected areas [34]. For this study, we also used the NDVI mean at the reserve-based level and the NDVI changes at the pixel level, together with a t test, to construct the effectiveness score metric, which presents a possible pattern for evaluating the effectiveness of NNRs on grassland protection.
The wildlife vegetation protection (WVP) NNR type has the lowest NDVI slope within the NNRs but still has a high NDVI cv value (see Tables 5 and 6), suggesting that the WVP NNR might be the most at-risk NNR with slow greening but high disturbance. As the "Green Shield" special action was carried out by the central government of China starting in 2017 [62], the high disturbance around the WVP NNR was found to possibly relate to the intensity of human activity, such as the large amount of development and construction present around the NNR [76]. Notably, Helianthemum songaricum, a living plant fossil, that is concentrated solely in the desert steppe in the West Ordos region of Inner Mongolia and the Yili Valley of Xinjiang Province [77,78], has been suffering habitat fragmentation due to disturbance from human activity [77]. Except for human activity, the plant diversity problem could also be a possible reason why the WVP NNR should raise more awareness of conservation, because conservationists have documented plant diversity declines in grassland ecosystems within the WVP NNR boundary [79]. We could come to the conclusion that the NNRs on sandy or desert steppes in the IMAR are facing the risk of slow greening but fierce disturbance. Considering this, more protection efforts are needed to solve this urgent problem [36].

Limitations and Future Directions
Overall, in this paper we analyzed the effectiveness of the NNRs in the IMAR using the NDVI-based coefficient of variation, NDVI slope, and NDVI mean. The method comparing the areas inside and outside the NNRs combines the NDVI slope with the coefficient of variation of the NDVI, and this approach has been proven to provide a better overall picture of the PA effectiveness [67]. At the same time, NDVI as the data source was the most common VI in studies that used a single VI to conduct the PA analysis [42], as it has been seen as a successful proxy depicting the vegetation greenness. However, NDVI changes inside and outside the NNRs could possibly be due to coexisting factors rather than solely NNR management. For example, NDVI changes can also be induced by grazing and fencing; however, this kind of data is limited to within the boundaries of the NNRs [80]. Furthermore, the assessment was derived from weighted sub-metrics; however, previous research has argued that self-selecting indices are inclined to cause diverse principles for reference, which might cause the less comparability among individual assessment case's result [75]. There is still a call for future studies to consider the potential influencing factors using more comprehensive indicators. Additionally, taking an interdisciplinary scope for modifying the assessment method would also be helpful; for example, incorporating ground-based field validation together with trajectory information derived from time-series satellite archives can provide more value to NNR management.
We highlight that there is a need for caution about the relationship between the buffer radius and the LULC datasets when conducting future studies. In this study, we have applied the usually adopted radius of 5 km as this radius has been widely used both for domestic as well as global research [81,82], also because it has been proven to control similar natural and environmental conditions when conducting a treatment and control comparison analysis [58]; the successive radius of 5 km, 10 km, 15 km, and 20 km were used to compare with the area inside the NNRs. However, setting a buffer radius in the preprocessing process has been blurred by a suite of previous studies and remains a point of contention, with some having acknowledged the selection standards (e.g., funding projects), whereas others just followed previous studies. For this reason, various radius have been adopted when it comes to assessing the effectiveness of protected areas i.e., 1 km, 5 km, 10 km, 25 km, 50 km, 100 km, and so on [83][84][85][86], which could undermine the comparability of the individual work and result in a bias in the effectiveness outcome if there remains a lack of consideration of the ratio between the LULC resolution and the buffer radius.
Considering there is great potential for promoting more advanced algorithms in the remote sensing application for monitoring PA, we also encourage the multi-source datasets collaboration to capture the details of the landscape change possibly allowing both high spatial resolution and very dense datasets. Especially for those PAs located in special habitats with frequent interaction with human pressure such as wetland ecosystems, or with frequent interaction with climate change such as dryland ecosystems. It would be thoughtful to incorporate synthetic datasets such as UAV imagery and SAR data with mainstream algorithms such as but not limited to STARFM and IFSDAF [87,88], to permit the more adaptative spatial and temporal resolution data for monitoring the subtle ecological changes across different vegetation types of these PAs under the complex context of both climate change and human pressure. Also, there is a need to promote the dataset availability both inside and outside the PA boundary, such as the state of the biodiversity and the patchy distributions of overexploitation activities such as livestock grazing [80], etc., because these potential factors influence the PA's effectiveness and interact with the vegetation types and human pressure. What's more, the problem of the diverse conservation efforts with different landscape patterns would likely intertwine with the PAs on a landscape scale and this problem would incline to become serious for megadiverse regions such as the Amazon [89], as there are more than one type of conservation land use expected for PAs. In the future, we need to consider the interactions across landscapes and explore the decoupling of the PA's effectiveness from other conservation treatments.

Conclusions
The effectiveness of national nature reserves (NNRs) in the Inner Mongolia Autonomous Region (IMAR) was a priority for assessment, especially for grassland biomes in the ecological barrier which now have a conservation gap. Using MOD13Q1 NDVI data and trend analysis, we developed a composite metric with the assistance of the en-tropy weight method using the NDVI mean and the pixel-based trend analysis change in CV NDVI and the NDVI slope. With this metric, this study quantified NNR effectiveness on grassland conservation from 2000 to 2020. First, the pixel-based NDVI results showed that the NNRs increased the sustainable steady pixel ratio to more than 19.26% higher than that of the unprotected areas and 20.55% higher than that of Inner Mongolia. The pixels of CV NDVI <0.1 increased more than 35.22% than those in the unprotected areas and Inner Mongolia. Second, a comparison of the NDVI mean with its surrounding buffers at 5 km, 10 km, 15 km, and 20 km showed that among all NNRs, 63.64% had higher NDVI slopes after the change points than before. Last, the NDVI time series with the Pettitt change-point analysis showed that 63.64% of NNRs have seen a more significant trend of greening than before the change point, which suggests a general greening trend within the NNR boundaries. More importantly, we concluded that 22 of the 30 NNRs were effective in sheltering grasslands based on the effectiveness score.
This study evaluated the effectiveness of the NNRs in conserving grassland in Inner Mongolia by establishing an effectiveness score as the indicator. This indicator considered the annual NDVI mean difference between the NNRs' inside and outside buffers and detected ecological changes (restoration or degradation) within the NNR boundaries. We found that the effectiveness score varied among the types of NNR settlement, and that the NNRs with the wildlife animal protection type and the desert ecology protection type were deemed more effective, whereas the NNRs with the wildlife vegetation protection type were deemed to have the least efficiency; thus, we call for more attention to be given to conserving wildlife vegetation on the sandy steppe. Further research should consider coexisting factors and promote more comprehensive metrics to assess the effectiveness of the NNRs. This study enriches our understanding of the future direction of NNR management in grassland biomes. Furthermore, it helps shed light on the scientific topic of environmental protection of the sustainable development agenda of the UN 2030 in order to understand the complex interactions between the protected areas with climate change in the Anthropocene context. Future works should focus on how to refine the framework of the effectiveness assessment that lies with different vegetation types, by considering synthetic datasets with the assistance of comprehensive indicators and a deeper understanding of the relationship between PA treatment and control units. Institutional Review Board Statement: Not applicable.