A Conversion Method to Determine the Regional Vegetation Cover Factor from Standard Plots Based on Large Sample Theory and TM Images: A Case Study in the Eastern Farming-Pasture Ecotone of Northern China

The key to simulating soil erosion is to calculate the vegetation cover (C) factor. Methods that apply remote sensing to calculate the C factor at a regional scale cannot directly use the C factor formula. That is because the C factor formula is obtained by experiments, and needs the coverage ratio data of croplands, woodlands, and grasslands at a standard plot scale. In this paper, we present a C factor conversion method from a standard plot to a km-sized grid based on large sample theory and multi-scale remote sensing. The results show that: (1) Compared with the existing C factor formula, our method is based on the coverage ratio of croplands, woodlands, and grasslands on a km-sized grid, and takes the C factor formula obtained from the standard plot experiment and applies it to a regional scale. This method improves the applicability of the C factor formula, and can satisfy the need to simulate soil erosion in large areas; (2) The vegetation coverage obtained by remote sensing interpretation is significantly consistent (paired samples t-test, t = −0.03, df = 0.12, 2-tail significance p < 0.05) and significantly correlated with the measured vegetation coverage; (3) The C factor of the study area is smaller in the middle, southern, and northern regions, and larger in the eastern and western regions. The main reason for that is the distribution of woodlands, the Hunshandake and Horqin sandy lands, and the valleys affected by human activities; (4) The method presented in this paper is more meticulous than the C factor method based on the vegetation index, improves the applicability of the C factor formula, and can be used to simulate soil erosion on a large scale and provide strong support for regional soil and water conservation planning.


Introduction
Soil erosion models have provided strong support for regional soil and water conservation planning. With the development of geographic information systems (GIS) and remote sensing (RS), water erosion equations have been widely used in regional soil erosion simulations [1][2][3][4][5][6]. Current water erosion models include the universal soil loss equation (USLE) [7], the revised universal soil loss equation (RUSLE) [8,9], the China soil loss equation [10], and so on. These models are widely used to estimate soil loss in agricultural and environmental management. In these models, soil loss A = R × K × L × S × C × P, where R is rainfall erosivity, K is soil erodibility, L is slope length, S is slope steepness, C is vegetation cover and management, and P is the support practice factor. The R, K, L, and S factors are controlled by the natural environment, and therefore will not be changed by short-term soil and water conservation measures and activities. However, the C factor has the greatest range of variation among the factors of water erosion models, and its values can differ by two to three orders of magnitude [11]. According to Benkobi et al. [12] and Biesemans et al. [13], the vegetation cover factor, together with the slope steepness and length factors, is the most sensitive for soil loss and has the most significant effect on the overall effectiveness of the USLE/RUSLE models [14]. Therefore, calculating and improving the accuracy of regional C factors has become the key to improving regional soil erosion simulations.
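Because the soil loss equation is a plain product of six factors, the sensitivity to C can be illustrated with a short sketch. All factor values below are hypothetical and chosen only to show that a hundredfold change in C produces a hundredfold change in A when the other factors are held fixed; they are not taken from the study.

```python
from math import prod

def soil_loss(R, K, L, S, C, P):
    """Return A = R * K * L * S * C * P (multiplicative USLE/RUSLE form)."""
    return prod((R, K, L, S, C, P))

# Hypothetical factor values, for illustration only.
base = dict(R=100.0, K=0.03, L=1.2, S=1.5, P=1.0)

dense_cover = soil_loss(C=0.005, **base)  # well-vegetated surface
bare_soil = soil_loss(C=0.5, **base)      # sparsely covered surface

# With everything else fixed, A scales linearly with C.
ratio = bare_soil / dense_cover
```

Since A is linear in C, any relative error in the regional C factor carries over directly into the simulated soil loss.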
Most regional C factors have been determined with remote sensing data through the following methods: (1) The direct assignment of land use/coverage [6]. This method is simple, but the accuracy of the computed C factor is poor [11]; (2) The vegetation index estimated C factor method [15][16][17]. This method can express the regional vegetation coverage more finely, but it is less comprehensive, and multiple layers and shallow roots are usually ignored; (3) The spectral mixing analysis (SMA) estimated C factor method [18,19]. The SMA method considers the contributions of litter, gravel, etc.; it can fully reflect the C factor information independent of the measurements, and the soil background does not affect it. However, the SMA method cannot be used when vegetation and/or litter completely covers the surface, or when the data are affected by multiple scattering [11]; (4) Experimental approaches combined with geostatistical methods [17,20]. With this method, the C factor can be interpolated using GIS and remote sensing images as auxiliary variables. Wang et al. [17] improved the mapping of the C factor for the USLE by geostatistical methods with TM images. Based on multiple primary variables (canopy cover, ground cover, and vegetation height), Gertner et al. [20] mapped the C factor in regions from a joint co-simulation.
All of these remote sensing methods share the following problems. Firstly, the methods above use the C factor formula obtained from standard plot experiments, but such a formula cannot be directly applied to an entire region. The C factor formula of croplands, woodlands, and grasslands is derived from experiments that compare the erosion rate of cropland, woodland, and grassland plots with that of bare land in the standard plots. However, the vegetation cover of a region is a complex combination of croplands, woodlands, and grasslands, differing from the vegetation cover modelled in the standard plot. Secondly, the RSI method requires a large number of field samples, and cannot be used in fragmented landscape areas, such as the farming-pasture ecotones of northern China. Not only is it difficult to ensure the accuracy of the spatial interpolation, but the method is also time-consuming, laborious, and difficult to promote. Thirdly, there are very few ways to properly consider the effects of surface coverage and canopy coverage on soil erosion in the modelled region.
Based on the results of previous regional C factor estimations, we analyzed the key factors of soil erosion and highlighted the key factors affecting the scale conversion [21]. In order to improve the accuracy of the regional C factor estimation and obtain a large-scale C factor map for a macro-scale soil erosion simulation, we built a C factor estimation method based on large sample theory and Landsat Thematic Mapper (TM) images. We show that our method solves the key problem of transitioning from a standard plot to km-sized grids, and hence accurately estimates the regional C factor.

Study Area
The eastern section of the farming-pasture ecotone (hereinafter referred to as the study area) (see Figure 1) includes 84 counties (banners) in the Inner Mongolia Autonomous Region, Liaoning Province, Heilongjiang Province, Jilin Province, and Hebei Province, and has an area of 4.402 × 10⁵ km². In 2015-2016, our research group carried out two field trips to determine the land use and soil erosion in the study area, and the length of the survey route was nearly 7000 km. According to the typical geographic unit and landform type, we set up 21 survey sample areas. From 19 July to 25 July 2015, the western part of the study area was inspected, where 10 inspection points were created over a route of 2930 km. From 7 August to 14 August 2016, the eastern part of the study area was inspected, where 11 inspection points were created over a route of more than 4000 km.

Basic Idea and Research Framework
More than 1000 coverage data-points were obtained from Landsat TM images of the Global Land Survey in 2010 (GLS2010) and from ground measurements, and according to large sample theory and information entropy theory, their distributions and proportions are similar. Large sample theory, also called asymptotic theory, is used to approximate the distribution of an estimator when the sample size n is large. This theory is extremely useful if the exact sampling distribution of the estimator is complicated or unknown [22]. In addition, based on information-entropy theory [23], when the sample size is sufficiently large, the samples can be assumed to approximately follow a normal distribution. According to these two theories, the subsets of remote sensing data-points and field measurement data-points have a similar distribution. That is, within the same km-sized grid, the 2000 land cover data-points obtained from field measurements have a similar distribution and a similar proportion of croplands, woodlands, and grasslands to the more than 1000 land cover data-points determined from the Landsat TM images. The accuracy of the field-measured vegetation cover is better than that of the remote sensing method, so the field measurements can be used as the reference for verifying the remote sensing measurements. The research framework is shown in Figure 2.
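The large-sample argument can be made concrete with the standard error of a sample proportion, sqrt(p(1 − p)/n), which shrinks as the number of data-points grows. The sketch below assumes independent sampling points and a hypothetical true cropland proportion of 0.4; neither assumption comes from the study, and spatially correlated pixels would give wider intervals in practice.

```python
from math import sqrt

def proportion_standard_error(p, n):
    """Standard error of a sample proportion under independent sampling."""
    return sqrt(p * (1.0 - p) / n)

p_crop = 0.4  # hypothetical true cropland proportion in one km-sized grid

se_field = proportion_standard_error(p_crop, 2000)  # field data-points
se_tm = proportion_standard_error(p_crop, 1000)     # TM-derived data-points

# Approximate 95% half-width from the normal approximation.
half_width_tm = 1.96 * se_tm
```

Under these assumptions, both sample sizes pin the proportion down to within a few percentage points, which is the sense in which the two sets of data-points can be expected to show similar proportions.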

The C factor conversion from a "standard plot" (single vegetation type) to a "kilometer grid" (multiple vegetation types) contains six steps: (1) The vegetation cover obtained by the sampling method was evaluated against the high-resolution images of the survey sample areas; (2) We interpreted the Landsat TM images of the study area and derived the land use data based on the CART decision tree classification method; (3) We then set up 21 survey sample areas, each with an area of 1 km², and established 2000 canopy cover and surface cover data points in each; (4) Based on the resolution of the land use data, 1125 canopy coverage data points were obtained in each km-sized grid of the study area, and the canopy coverage (Cc) of the croplands, woodlands, and grasslands was calculated; (5) The surface cover (Sc) factor was calculated based on the surface coverage surveyed in the survey sample areas, and was applied to the entire study area according to the landform type; (6) According to the Cc and Sc factors, we calculated the C factors of the study area. Finally, we verified the regional C factor against the survey sample areas.
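Step (4) amounts to counting, within each km-sized grid cell, how many land use data points fall into each class. The following minimal sketch shows that counting on a toy 4 × 4 block of labels standing in for the data points of one cell; the function name `class_fractions` and the label strings are hypothetical and are not part of the study's workflow.

```python
from collections import Counter

def class_fractions(block):
    """Fraction of data points per land use class in one grid cell.

    `block` is a 2-D list of class labels, one label per data point.
    """
    counts = Counter(label for row in block for label in row)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Toy block; a real cell would hold on the order of a thousand points.
block = [
    ["crop", "crop", "grass", "grass"],
    ["crop", "crop", "grass", "wood"],
    ["crop", "bare", "grass", "wood"],
    ["crop", "bare", "grass", "wood"],
]
fractions = class_fractions(block)
```

The resulting per-class fractions are the kind of coverage ratios that the regional C factor calculation takes as input.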

Materials
The study mainly used three types of data: measured data (survey sample data), remote sensing data (Landsat TM image data, MODIS data, and high-resolution images), and basic geographical data. We used the Landsat TM image data of the Global Land Survey in 2010 (GLS2010, https://glovis.usgs.gov/, 30 m × 30 m). The dates of the MODIS data (https://urs.earthdata.nasa.gov/profile, 1 km × 1 km) are 12 July 2015 and 27 July 2016, near the survey dates. A total of about 42,000 coverage data-points were surveyed throughout the study area. Basic information on the survey sample areas is shown in Table 1. We used Gaofen-2 (GF-2) satellite images that covered the investigation area to discuss the relationship between the survey-estimated vegetation coverage and the intercept vegetation coverage in the km-sized grid. The GF-2 satellite was designed and developed by the China Academy of Space Technology (CAST). It employs the CAST-CS-L3000A bus and two panchromatic/multispectral (PAN/MS) cameras: one produces MS images with four bands in the visible and near-infrared (VNIR) range with a spatial resolution of 3.2 m, and the other generates PAN images in the visible range with a spatial resolution of 0.8 m [24]. We chose four GF-2 images that were obtained within one month either side of the survey date in 2015. The GF-2 date of sample No. 4 is 4 August 2015, sample No. 5 is 9 August 2015, sample No. 9 is 19 August 2015, and sample No. 10 is 9 August 2015. With the eCognition software, we adopted the object-oriented high-precision remote sensing interpretation method to obtain high-resolution land use/cover data of the survey sample area.

Canopy Coverage Factor Upscaling
The canopy coverage factor can be calculated from vegetation cover, which can be estimated using point intercept, line-point intercept, grid-point intercept, and ocular estimates [25,26]. Scholars have compared these methods of calculating vegetation cover, and found that when there are fewer than 20 samples, the point-based methods are more precise than ocular estimates [25,27,28] and line-point intercepts [25]. When the sample size is large enough, the estimate of the line-point intercept is the same as that of the point intercept and grid-point intercept, and can correctly reflect vegetation cover [25,26]. In this paper, 21 survey sample areas were set up in the study area. In each survey sample area we collected a total of 2000 samples, which is far greater than the requirement of 20 samples [26].
The size of the sampling unit is also important for ground measurements. Duncan et al. [29] analyzed the influence of sampling unit size on the remote sensing regression model based on the vegetation index and vegetation cover, and discussed the most suitable sampling unit size. In this paper, we chose the km-sized grid as the statistical unit. Then, via TM image interpretation, we produced more than 1000 land use/land cover data-points in each km-sized grid. Next, the MODIS vegetation cover was directly derived from the vegetation index in the km-sized grid. The different methods used to estimate vegetation cover are shown in Figure 3.
The key to estimating the C factor is to calculate the vegetation cover. There is no standard method to monitor vegetation cover, and current methods can be divided into surface measurement methods and remote sensing methods. The surface measurement method is limited by the workload and the size of the measurement area, and is unsuitable as an independent measurement method at a large scale. The remote sensing methods rely on a surface test for regression calibration. They have a certain accuracy, but their promotion and application are restricted, especially in fragmented landscapes like farming-pasture ecotones. In terms of current technology, the accuracy of the surface measurement method is higher than that of the remote sensing method, and thus it can be used as the basis for remote sensing measurements and data verification [30].

TM Image Land Use Interpretation
• Land use classification system
The coverage and proportion of various land types on a km-sized grid can be calculated according to the high-resolution land use data of the study area, such as the global 30 m land use data produced by Chen et al. [31] or Gong et al. [32]. However, both of these classification datasets confuse grassland and bare land in the study area, and contain no secondary classification of grassland. Therefore, we adopted a secondary land use classification system for the study area based on the land use/cover classification system of Liu et al. [33] to classify the TM images. The interpretation results of some classification methods (such as neural network classification and object-oriented classification) can be very good for a single image, but when it comes to a large area, the workload and classification efficiency must be considered. The CART algorithm is based on the decision tree classification method [34], and combines DEM and NDVI data, as well as supervised and unsupervised classification methods, resulting in a much higher interpretation efficiency than other interpretation methods [34] for large areas. So, we adopted the decision tree classification method of the CART algorithm to carry out land use classification (we interpreted more than 40 Landsat TM scenes from GLS2010).

• Decision tree classification based on the CART algorithm
Prior to classification, the Landsat TM and Landsat Enhanced Thematic Mapper (ETM) data in the Global Land Survey of 2010 (GLS2010) underwent cloud interpretation and geometric correction, and were then combined into multi-band images composed of the blue, green, red, near infrared, short-wave infrared, medium-wave infrared, and long-wave radiation bands, together with NDVI, ISODATA, DEM, and other bands. The NDVI data were generated from the TM images and the DEM data from global ASTGTM data. The ISODATA layer comes from unsupervised classification of the TM images, with a minimum class number of 10 and a maximum of 25.
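The NDVI band mentioned above follows the standard definition NDVI = (NIR − Red) / (NIR + Red); for Landsat TM, the red and near infrared bands are bands 3 and 4. A minimal per-pixel sketch is given below. Returning 0.0 when both reflectances are zero is an assumption made here to avoid division by zero, not a rule taken from the study.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel.

    `nir` and `red` are reflectances (for Landsat TM, bands 4 and 3).
    """
    denom = nir + red
    if denom == 0:
        # Assumption: treat pixels with no signal as NDVI = 0.
        return 0.0
    return (nir - red) / denom

vegetated = ndvi(nir=0.5, red=0.1)  # dense vegetation: strongly positive
bare = ndvi(nir=0.2, red=0.2)       # equal reflectance: zero
```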
The main steps of decision tree classification based on the CART algorithm are: (1) Select the training areas. According to the secondary classification system, a certain number of training samples were selected in the multi-band image and used to obtain expert knowledge rules. The training area selection order was: (i) water (river canals); (ii) built-up lands, industrial and mining lands, residential lands (urban land, rural residential areas, other construction land); and (iii) croplands (irrigated land, dry land); (2) Establish a decision tree based on the training area. We used the extension tool RuleGen [34] to automatically generate decision tree rules, and used the ENVI Execute Existing Decision Tree tool to establish a decision tree for land use interpretation.
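For readers unfamiliar with CART, the core operation when growing the tree is to choose, for a given band, the threshold that minimizes the weighted Gini impurity of the two resulting groups of training samples. The sketch below illustrates only that split criterion on a single hypothetical NDVI band with made-up training samples; it is not the RuleGen tool or the ENVI workflow used in the study.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split value <= threshold."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between identical values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [lab for v, lab in pairs if v <= thr]
        right = [lab for v, lab in pairs if v > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (thr, score)
    return best

# Hypothetical training samples: NDVI value and land use class.
ndvi_values = [0.05, 0.08, 0.10, 0.45, 0.50, 0.62]
classes = ["water", "water", "water", "cropland", "cropland", "cropland"]
threshold, impurity = best_threshold(ndvi_values, classes)
```

A full CART tree repeats this search over every band and recursively on each resulting group, which is what a rule generator automates.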

• Classification accuracy evaluation
The study area is relatively large and it is difficult to carry out scientific random field verification, so we used the single-pixel verification point method with a large number of randomly distributed points, based on Google images [31,35]. We generated a total of 2000 points randomly throughout the study area (50 points in each image). To obtain the real surface cover data of the random points, and considering the high accuracy of Google images, we first used Google Earth to distinguish the real surface cover [31,35]. Second, we moved any random verification point lying on the edge of a land cover type to the center of that land cover type to avoid mixed pixels, ensuring the accuracy of the true surface data. Finally, we used the true surface data of the random points to verify the accuracy of the interpreted land use.
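Verification of this kind is commonly summarized by the overall accuracy (the share of verification points whose interpreted class matches the reference class) and by Cohen's kappa, which corrects for chance agreement. The text does not state which metrics were reported, so the choice of metrics and the eight sample points below are assumptions for illustration.

```python
from collections import Counter

def overall_accuracy_and_kappa(reference, predicted):
    """Overall accuracy and Cohen's kappa for paired class labels."""
    n = len(reference)
    agree = sum(r == p for r, p in zip(reference, predicted))
    po = agree / n  # observed agreement
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    # Agreement expected by chance from the marginal class frequencies.
    pe = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    kappa = (po - pe) / (1.0 - pe)
    return po, kappa

# Hypothetical verification points: reference class vs interpreted class.
reference = ["crop", "crop", "grass", "grass", "wood", "wood", "bare", "bare"]
predicted = ["crop", "crop", "grass", "bare", "wood", "wood", "bare", "grass"]
accuracy, kappa = overall_accuracy_and_kappa(reference, predicted)
```

In this toy case the only errors are grassland and bare land swapped for each other, the same kind of confusion noted earlier for the global 30 m products.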

Surface Coverage Factor Upscaling
At the regional scale, it is difficult to obtain the surface cover information, such as surface litter, crop stubble, and gravel, required by soil erosion models. To overcome this limitation, we first set up 21 survey sample areas in typical landform units and measured the ratio of surface litter, crop stubble, and gravel in each sample area. Secondly, the surface of the study area was classified according to the geomorphic map (1:4 million) of China and its adjacent areas. Thirdly, according to the natural geographic features of the study area, we found that the surface of the study area can be classified by geomorphic units that reveal the surface cover information needed to calculate the vegetation cover. Lastly, the surface coverage (Sc) of the study area was calculated based on the field coverage of the survey sample areas and the structural geomorphic units of the study area, by interpolating the measured values according to land surface type, as shown in Figure 4.

Regional Vegetation Cover Factor
• Standard plot C factor algorithm
Many scholars have experimentally determined the standard plot C factor for different regions [4,7,11,36]. Different formulas are given based on different vegetation covers, for example: Jin et al. [37] built the C factor formula of grassland that had a 1.9% coverage in the standard plot of the Huangfu River Basin; Jiang et al. [38] built the C factor formula of grasslands and woodlands in a standard plot with coverages of more than 5% in Ansai County of Shaanxi Province; Liu [39] built the C factor formula of croplands, woodlands, and grasslands in a standard plot in Beijing.
The surface cover is mainly composed of litter, crop stubble, and gravel, which are more effective in reducing soil erosion than plant canopy cover. Based on this, the C-factor algorithm of Liu [39] considers both the surface cover and canopy cover of croplands, woodlands, and grasslands, which makes the C factor result more objective. Therefore, we chose Liu's [39] standard plot vegetation cover factor algorithm to calculate the C factor, as shown in Equation (1).
The canopy cover factor of cropland and grassland is calculated by Equation (2), and the canopy cover factor of woodland is calculated by Equation (3). Liu [39] found that the Sc factors of croplands and grasslands can be combined into a single formula by Equation (4). The Sc factor of woodlands is calculated by Equation (5).
where Cc and Cs are the canopy cover factor and the surface cover factor, respectively; Vc and VR are the canopy cover (%) and surface cover (%), respectively; and h is the canopy height (cm). As it is difficult to obtain surface information such as litter, crop stubble, and gravel on the regional scale, in this paper, the Sc factor was calculated by using the surface coverage factor upscaling method.
• Regional C factor algorithm
The regional vegetation cover factor needs to be calculated based on the fractional cropland, woodland, and grassland coverage. Based on the C factor algorithm of the standard plot, the regional C factor algorithm is proposed, as shown in Equation (6).
where C is the C factor of the km-sized grid, Vcrop is the cropland coverage (%), Vgrass is the grassland coverage (%), and Vforest is the forest coverage (%). Ccrop is the cropland vegetation cover factor, Cgrass is the grassland vegetation cover factor, and Cforest is the woodland vegetation cover factor.
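The symbols defined above suggest that the regional C factor combines the per-class C factors using the cropland, grassland, and forest coverages as weights. Equation (6) is not reproduced in this text, so the sketch below assumes a coverage-weighted mean, which is only one plausible reading of those definitions and should not be taken as the paper's exact formula. All numeric values are hypothetical.

```python
def regional_c_factor(cover, c_factor):
    """Coverage-weighted mean of per-class C factors for one km-sized grid.

    ASSUMPTION: a weighted mean is used here for illustration; the
    paper's Equation (6) may differ in form.
    """
    total = sum(cover.values())
    if total == 0:
        raise ValueError("no cropland, grassland, or forest cover in this grid")
    return sum(cover[k] * c_factor[k] for k in cover) / total

# Hypothetical coverages (%) and per-class C factors for one grid cell.
cover = {"crop": 40.0, "grass": 35.0, "forest": 25.0}
c_factor = {"crop": 0.30, "grass": 0.10, "forest": 0.02}
c_grid = regional_c_factor(cover, c_factor)
```

Under this assumption, a cell dominated by woodland moves toward the small woodland C factor, while a cell dominated by cropland moves toward the larger cropland value.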

C Factor of the Survey Sample Area
Based on land use data and surface cover data, the C factor of each survey sample area was calculated according to the C factor algorithm (Table 2).
The C factor was smallest in the sample areas with a higher proportion of woodland, medium in the sample areas with a higher proportion of grassland and cropland cover, and highest in the sample areas with a higher proportion of unused land (sand and bare land).

Fractional Vegetation Cover (FVC) in the Study Area
Based on the 30 m resolution land use map in 2010 from GLS2010, we calculated the coverage proportion of cropland, woodland, and grassland, as well as the vegetation cover of the study area, as shown in Figure 5. From the 2010 vegetation cover of the study area based on GLS2010, the vegetation cover (including the ratio of cropland, woodland, and grassland) was extracted in each survey sample area and verified against the measured vegetation coverage, as shown in Figure 6. The correlation between the remote sensing and measured vegetation coverage is shown in Table 3. The paired-samples Student's t-test was used to test the statistical significance of the vegetation cover obtained by remote sensing interpretation. The results are shown in Table 4.
It can be seen from Figure 6, as well as Tables 3 and 4, that the vegetation coverage obtained by remote sensing interpretation is significantly consistent with the measured vegetation coverage (paired samples t-test, t = −0.03, df = 0.12, 2-tail significance p < 0.05), where R² is 0.58. There is also a statistically significant linear relationship between the two groups of data: they are significantly correlated (p < 0.05), with a correlation of 0.76. Furthermore, the two groups of data are generally distributed near the 1:1 line, and the regression coefficient is 0.93. To sum up, the results show that the method in this study can accurately obtain the coverage ratio of croplands, woodlands, and grasslands in the km-sized grid.
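The two statistics used in this comparison can be sketched with the standard library alone. The paired t statistic is the mean of the pairwise differences divided by its standard error, with n − 1 degrees of freedom for n pairs, and the Pearson coefficient measures the linear association. The six coverage pairs below are hypothetical and do not reproduce the values in Tables 3 and 4.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    t = mean(d) / (stdev(d) / sqrt(n))
    return t, n - 1

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical vegetation coverage (%) in six survey sample areas.
measured = [62.0, 48.0, 75.0, 30.0, 55.0, 81.0]
remote = [60.0, 50.0, 72.0, 33.0, 57.0, 78.0]

t_stat, dof = paired_t(remote, measured)
r = pearson_r(remote, measured)
```

In a paired comparison like this, a t statistic close to zero indicates that the mean difference between the two methods is small relative to its scatter, while a high correlation indicates that the two methods rank the sample areas similarly.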
where C is the C factor of the km-sized grid; Vcrop is the cropland coverage (%), Vgrass is the grassland coverage (%), and Vforest is the forest coverage (%); Ccrop is the cropland vegetation cover factor, Cgrass is the grassland vegetation cover factor, and Cforest is the woodland vegetation cover factor.
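As a hedged illustration of how these symbols could be combined, the sketch below assumes that the C factor of a km-sized grid cell is the coverage-weighted sum of the three per-type factors. The function name `grid_c_factor`, the sample values, and the absence of any term for unvegetated land are assumptions made for illustration only and are not taken from the equation of this paper.

```python
def grid_c_factor(v_crop, v_grass, v_forest, c_crop, c_grass, c_forest):
    """Coverage-weighted C factor for one km-sized grid cell (illustrative).

    Coverages are percentages (0-100); per-type C factors are dimensionless.
    Assumption: unvegetated land contributes no term to the sum.
    """
    coverages = (v_crop, v_grass, v_forest)
    if any(v < 0 for v in coverages) or sum(coverages) > 100 + 1e-9:
        raise ValueError("coverages must be non-negative and sum to at most 100%")
    weighted = v_crop * c_crop + v_grass * c_grass + v_forest * c_forest
    return weighted / 100.0  # convert percent weights to fractions


# Hypothetical grid cell: 40% cropland, 30% grassland, 20% woodland.
example_c = grid_c_factor(40.0, 30.0, 20.0, c_crop=0.30, c_grass=0.10, c_forest=0.02)
```

Dividing by 100 keeps the inputs in the same percentage units used for Vcrop, Vgrass, and Vforest above.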

C Factor of the Survey Sample Area
Based on land use data and surface cover data, the C factor of each survey sample area was calculated according to the C factor algorithm (Table 2).
The sample areas with a higher proportion of woodland had the smallest C factor. The sample areas with a higher proportion of grassland and cropland cover had intermediate C factors, and the sample areas with a higher proportion of unused land (sand and bare land) had the highest C factor.

Fractional Vegetation Cover (FVC) in the Study Area
Based on the 30 m resolution land use map for 2010 from GLS2010, we calculated the coverage proportion of cropland, woodland, and grassland, as well as the vegetation cover of the study area, as shown in Figure 5. From this 2010 vegetation cover, the vegetation cover (including the ratio of cropland, woodland, and grassland) was extracted for each survey sample area and verified against the measured vegetation coverage, as shown in Figure 6. The correlation between the remote sensing and measured vegetation coverage is shown in Table 3. The paired-samples Student's t-test was used to assess the statistical significance of the vegetation cover obtained by remote sensing interpretation. The results are shown in Table 4.
Figure 6 and Tables 3 and 4 show that the vegetation coverage obtained by remote sensing interpretation is significantly consistent with the measured vegetation coverage (paired samples t-test, t = −0.03, df = 0.12, 2-tail significance p < 0.05), where R² is 0.58. There is also a statistically significant linear relationship between the two groups of data (significance p < 0.05), with a correlation of 0.76. Furthermore, the two groups of data are generally distributed near the 1:1 line, and the regression coefficient is 0.93. In summary, the results show that the method in this study can accurately obtain the coverage ratio of croplands, woodlands, and grasslands in the km-sized grid.
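The two statistics used in this verification, the paired-samples t statistic and the Pearson correlation, can be sketched in a few lines. This is a minimal illustration only: the function names and the coverage values are hypothetical and are not the survey data of this study.

```python
import math


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for a paired-samples t-test."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
    if var_d == 0:
        raise ValueError("differences have zero variance")
    return mean_d / math.sqrt(var_d / n), n - 1


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(ss_a * ss_b)


# Made-up vegetation coverage values (%), for illustration only.
remote = [62.0, 48.0, 75.0, 30.0, 55.0]
measured = [60.0, 50.0, 70.0, 35.0, 58.0]
t_stat, dof = paired_t_statistic(remote, measured)
r = pearson_r(remote, measured)
```

The t statistic tests whether the mean paired difference departs from zero, while the correlation measures how closely the two series vary together, so the two checks answer different questions about agreement.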

C Factor in the Study Area
Based on the C factor conversion method, and using the proportional coverage of croplands, woodlands, and grasslands in the km-sized grid, the C factor in the study area (2010) was calculated, as shown in Figure 7. Figure 7 shows that the C factor is smaller in the middle, southern, and northern regions, and larger in the eastern and western sections of the study area. The large number of woodlands in the central and northern parts of the study area lowers the C factor there. The Hunshandake and Horqin sandy lands result in larger C factors in the eastern and western regions. In the south-central part of the study area, the C factor of woodlands is clearly smaller than that of valleys, mainly because the valleys are affected by human cultivation, which raises the C factor. The C factor of the survey sample areas based on remote sensing interpretation was verified against the C factor based on the measured coverage (Figure 8).
Figure 8 shows that the C factor of the survey sample areas calculated via remote sensing is consistent with the one based on measured coverage. The two data sets are distributed near the 1:1 line, with R² = 0.36 and a correlation coefficient of 0.7. Owing to the resolution and interval differences between the TM images and the field measurements, we obtained a low R², which is acceptable provided that it is not used to predict the C factor. This shows that the C factor obtained in this study can reflect the vegetation cover in the study area and can be used as the C factor in the soil erosion equation.

Relationship between Line-Point Estimated Vegetation Coverage and Intercept Vegetation Coverage in the Km-Sized Grid
With the eCognition software package, we used the object-oriented classification method to interpret the high-precision images (Gaofen-2 (GF-2) satellite images) of the sample area and obtain the coverage of various land cover types. The cropland, grassland, and woodland coverage obtained from the high-precision images was compared with the line-point intercept estimates, as shown in Figure 9. Figure 9 shows that the correlation between the FVC from the line-point intercept method and the FVC from the high-resolution image interpretation is very high. The two groups of data are mainly distributed in the vicinity of the 1:1 line. The Wilcoxon nonparametric test for related samples shows that the two groups of data are significantly consistent at a significance level of p < 0.05. The value of R² is 0.72 and the slope is 1.40, indicating that the cropland, grassland, and woodland coverage obtained from the line-point intercept method is larger than that from image interpretation, but can still be used to reflect the proportion of cropland, grassland, and woodland in the study area, with an interpretation rate of 72.29%. This result is basically consistent with the findings of previous studies [25,26], indicating that the line-point intercept method can effectively reflect the coverage of cropland, grassland, and woodland in the survey sample area.
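The slope and R² quoted for Figure 9 are the usual outputs of an ordinary least-squares fit, which can be sketched as follows. The choice of image-derived coverage as x and line-point coverage as y is an assumption inferred from the statement that a slope above 1 means larger line-point values; the function name and the sample numbers are hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("both samples must vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # share of variance in y explained by x
    return slope, intercept, r_squared


# Made-up coverage values (%), for illustration only.
image_cover = [10.0, 20.0, 30.0, 40.0]       # x: high-resolution image interpretation
line_point_cover = [12.0, 30.0, 41.0, 57.0]  # y: line-point intercept estimate
slope, intercept, r_squared = linear_fit(image_cover, line_point_cover)
```

Read this way, an R² of 0.72 means that about 72% of the variance in one coverage estimate is explained by the other, which matches the interpretation rate reported above.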

Comparison of Our Method and the C Factor Algorithm Based on Remote Sensing Vegetation Index
We compared the C factor derived from the vegetation index (see Appendix A) with the C factor determined by our method; the difference between the two C factors is shown in Figure 10. Figures 7-10 show that the C factors obtained from the two methods are basically consistent in 22% of the study area (considering the systematic errors of the vegetation index and the remote sensing data, we define the two C factors as consistent where their difference lies between −0.1 and 0.1). Compared with the C factor derived from image interpretation, the C factor based on the vegetation index is higher (by up to 0.3) for the western sandy lands and lower (by up to 0.3) for the eastern croplands of our study area. The main reason for the differences is that the C factor formula of the standard plot is different. The C factor algorithm based on the vegetation index assumes only one vegetation type in the km-sized grid, and calculates the coverage of that vegetation to obtain the C factor. Therefore, in sandy areas where a complete distribution of sand is assumed, the C factor is overestimated, while in cropland areas where a complete distribution of cropland is assumed, the C factor is underestimated. As such, the C factor based on the remote sensing data is more precise than that based on the vegetation index.

Uncertainties of the Calculated C Factors

The Empirical Parameters of the C Factor Need to Be Further Confirmed
The water erosion equation used in this paper is a small-scale method, and although many previous studies have applied it on a regional scale, many of its parameters are empirical, and their rationality has not been confirmed: (1) The measured parameters of the C factor are for selected field survey times and areas. The C factor in this paper was obtained from remote sensing and field survey data. The summer months (mainly July and August) have the strongest water erosion of the year in the study area, and there is almost no water erosion in other months. In addition, the vegetation coverage is higher in summer. The summer C factor therefore determines the accuracy of the water erosion simulation, and both the field investigations and the TM image acquisitions of this study took place from July to August; (2) Empirical parameters of the C factor obtained from remote sensing data. The coverage of cropland, woodland, and grassland was obtained by interpreting the remote sensing images and then calculating the proportion of land use/cover. The grasslands can be divided into high coverage, medium coverage, and low coverage grassland [33]. Among them, the high coverage grassland has a coverage of over 50%, assumed as 80%; the medium coverage grassland has a coverage of 20-50%, assumed as 50%; and the low coverage grassland has a coverage of 5-20%, assumed as 20%. The croplands are classified as irrigated and dry lands. The woodlands are divided into woodlands and shrubs, and the coverages of cropland and woodland vegetation are assumed to be 100%. Swamps are low-lying wetlands, which differ from other unused lands (bare land, quicksand, saline, and alkaline land) and are similar to high coverage grassland, with an assumed coverage of 80%. These vegetation coverage parameters are empirical to some degree and should be confirmed in further study.
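The assumed coverage values listed above amount to a lookup table, sketched below. The dictionary keys, the function names, and the handling of the class boundaries (for example, treating exactly 20% and exactly 50% as medium coverage) are assumptions made for illustration; only the percentages come from the text.

```python
# Assumed vegetation coverage (%) per interpreted land cover type, as stated
# in the text. The key names are hypothetical labels chosen for this sketch.
ASSUMED_COVERAGE = {
    "high_coverage_grassland": 80.0,
    "medium_coverage_grassland": 50.0,
    "low_coverage_grassland": 20.0,
    "irrigated_cropland": 100.0,
    "dry_cropland": 100.0,
    "woodland": 100.0,
    "shrub": 100.0,
    "swamp": 80.0,  # treated like high coverage grassland
}


def grassland_class(coverage_percent):
    """Map a grassland coverage (%) to its class, or None below 5%.

    Boundary handling is an assumption: 50% counts as medium, 20% as medium.
    """
    if coverage_percent > 50:
        return "high_coverage_grassland"
    if coverage_percent >= 20:
        return "medium_coverage_grassland"
    if coverage_percent >= 5:
        return "low_coverage_grassland"
    return None


def assumed_coverage(land_type):
    """Return the assumed coverage (%) for a land type, or None if not listed."""
    return ASSUMED_COVERAGE.get(land_type)
```

Keeping the values in one table makes it straightforward to revise them once the empirical parameters have been confirmed.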

Differences between the Remote Sensing Image Coverage and the Measured Coverage
Based on high-precision field survey and remote sensing image interpretation, this study constructed a km-sized grid C factor algorithm. Although the land cover data obtained from TM image interpretation already had the highest resolution on a regional scale, the coverage was still different from the measured coverage because TM images contained mixed pixels. In addition, based on TM images, we can only measure the canopy coverage, rather than the surface coverage that has impacts on soil erosion [39]. Furthermore, the regional surface coverage can currently only be determined based on the surface coverage of representative sites through upscaling.

The Accuracy of the TM Image Interpretation
In this study, we randomly generated points in the TM images to verify the accuracy of our land use interpretation. A total of 2000 verification points (50 in each image) were generated in the study area, where each verification point was a single pixel. Based on the high-precision Google Earth images, we determined the land use type of each verification point to obtain the verification data. To ensure the accuracy of the verification data and to reduce the mixed pixel problem caused by the resolution difference and time inconsistency of the verification data, we moved random points lying at the edge of a land use unit to the center of that unit. Using the 2000 randomly generated verification points, we tested the interpreted 2010 land use data. The results are shown in Table 5. Table 5 shows that the interpretation accuracy is 72.25% and the Kappa coefficient is 0.62. Landis and Koch [40] point out that a Kappa coefficient larger than 0.6 indicates a good accuracy, thus the interpretation accuracy of this paper is good. Many studies have shown that remote sensing interpretation over large regions has low accuracy [31,35]. For example, the accuracy of many automatic classification methods is generally less than 65% [31]. Because this paper studies the fragmented landscape of an ecological transition zone, an accuracy of 72.25% is high enough.
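Overall accuracy and the Kappa coefficient are both derived from the confusion matrix of the verification points. The sketch below shows the standard calculation; the function name and the 2 × 2 matrix are hypothetical and do not reproduce Table 5.

```python
def accuracy_and_kappa(matrix):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    matrix[i][j] is the number of verification points whose reference class
    is i and whose interpreted class is j.
    """
    k = len(matrix)
    if any(len(row) != k for row in matrix):
        raise ValueError("confusion matrix must be square")
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    observed = sum(matrix[i][i] for i in range(k)) / total
    # Agreement expected by chance, from the row and column totals.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(k)
    ) / total ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Made-up two-class example with 100 verification points.
confusion = [[40, 10],
             [5, 45]]
overall_accuracy, kappa = accuracy_and_kappa(confusion)
```

Kappa is lower than the overall accuracy because it discounts the agreement that would be expected by chance from the class totals alone.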
There are several problems in the interpretation of land use in the study area: (1) Mixed pixels and the different spatial resolutions of the Landsat TM and Google Earth images decrease the interpretation accuracy; (2) Different land types can have similar spectra. The spectra of unused lands and construction lands are relatively close, and on a large regional scale the proportion of construction lands is relatively low, so their influence on erosion is small. Unused lands such as bare land will cause erosion. However, this paper studied the vegetation cover (C) factor, which is related to vegetation (cropland, grassland, and woodland), so the misclassification of construction land and bare land, though it may have lowered the accuracy, did not affect the vegetation coverage calculation. This error can therefore be ignored, since this paper focuses on the proportion of vegetation (i.e., croplands, woodlands, and grasslands); (3) The mosaic problem. Due to the temporal phase differences among the TM images, there are edge-matching problems between different TM images. These problems reduced the accuracy of the TM image interpretation, so that the results had some uncertainty in the edge areas. However, the overall interpretation accuracy of cropland, woodland, and grassland can meet the requirements of the vegetation coverage and C factor calculation.

Conclusions
The vegetation cover (C) factor is one of the most influential factors in the soil erosion model. The C factor derived from a standard plot cannot be directly used on a regional scale. Based on remote sensing data and field investigations, we designed a C factor conversion method from the standard plot to a km-sized grid based on large sample theory. It can be concluded that: (1) Compared with existing C factor algorithms, our algorithm extends the applicable range of the C factor formula of the standard plot, and can be used to simulate soil erosion in large areas; (2) The vegetation coverage obtained by remote sensing interpretation is significantly consistent (paired samples t-test, t = −0.03, df = 0.12, 2-tail significance p < 0.05) and significantly correlated with the measured vegetation coverage. Meanwhile, the line-point intercept method can be used to effectively obtain the vegetation coverage of cropland, woodland, and grassland in the survey sample area (p < 0.05); (3) The C factor of the study area is smaller in the middle, southern, and northern regions, and larger in the eastern and western sections. The main reason for this is the distribution of woodlands, the Hunshandake and Horqin sandy lands, and human cultivation that affects the valleys; (4) The C factor conversion method based on large sample theory is better than the one based on vegetation indices.
In this paper, a method for estimating the regional C factor was proposed by combining the interpretation of remote sensing data with data obtained from field investigations. Our method is limited by the resolution of the remote sensing data and by the accuracy of the TM image interpretation. Thus, in the future, we aim to develop our method by: (1) further confirming the empirical parameters of the C factor; (2) establishing a database of C factors in different seasons; (3) studying the differences between remote sensing and high-precision measured vegetation coverage.
Table A2 gives the regression equation of the EVI and the measured vegetation coverage for each dryness classification, together with its significance; the dependent variable is the measured vegetation coverage. When the dryness is grade 1, the significance level is 0.003 (p < 0.05); when the dryness is grade 2, the significance level is 0.015 (p < 0.05); when the dryness is grade 3, the significance level is 0.019 (p < 0.05).

C Factor Based on the Vegetation Index
According to the fitting equation for different dryness classifications, the FVC in the study area was calculated (Figure A2a). Based on the MODIS land cover data, the C factor (Figure A2b) of the study area was calculated according to the C factor formula of the standard plot.

Figure 1. Study area and the two field trip routes.

Figure 2. The C factor conversion method from a "standard plot" (single vegetation type) to a "km-sized grid" (multiple vegetation types).

Figure 3. The different methods used to estimate vegetation cover in the survey sample area: (a) the line-point intercept method; (b) from high-resolution satellite images; (c) from TM images; (d) from the MODIS vegetation index.

Figure 4. The coverage of surface matter in the study area.

Figure 6. The verification of vegetation coverage in the survey sample area.

Figure 7. The C factor in the study area (2010).

Figure 8. Verifying the C factor based on remote sensing interpretation.

Figure 9. The correlation between the measured Fractional Vegetation Cover (FVC) from the line-point intercept method and the high-resolution image interpretation. Axis: high-resolution image coverage (%); legend: land cover types, Y = X line, linear fit.

Figure 10. The difference between the C factors based on the vegetation index and on remote sensing data.

5. 3 .
Uncertainties of the Calculated C Factors 5.3.1.The Empirical Parameters of the C Factor Need to Be Further Confirmed Remote Sens. 2017, 9, x FOR PEER REVIEW 18 of 20

Figure A2. Vegetation coverage and C factor based on the vegetation index: (a) FVC map; (b) C factor map.

Table 1. The basic information of the survey sample area.

Table 2. The C factor of the survey sample area. Note: The surface coverage ratio is the sum of the litter, gravel, and fecal cover.

Table 3. The correlation between the remote sensing interpretation and measured vegetation coverage.

Table 4. The paired-samples Student's t-test.

Table 5. Accuracy of land use interpretation in the study area.

Table A1. The Pearson relationship between different geographical elements and FVCD.

Table A2. The EVI coefficient and measured vegetation coverage under different dryness grades.