A Conversion Method to Determine Regional Vegetation Cover Factor from Standard Plots Based on Large Sample Theory and TM Images: A Case Study in the Eastern Farming-pasture Ecotone of Northern China

1 State Key Laboratory of Earth Surface Processes and Resource Ecology, Beijing Normal University, Beijing 100875, China; ldg@zjnu.edu.cn
2 College of Geography and Environmental Sciences, Zhejiang Normal University, Zhejiang 321004, China
3 Key Laboratory of Regional Geography, Beijing Normal University, Beijing 100875, China
4 Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China
5 College of Life Science & Technology, Hebei Normal University of Science & Technology, Qinhuangdao 066000, China
6 Key Laboratory of Environmental Change and Natural Disaster, MOE, Beijing Normal University, Beijing 100875, China
7 Academy of Disaster Reduction and Emergency Management, Beijing Normal University, Beijing 100875, China
8 Remote Sensing Application & Environmental Disaster Research Center, Zhejiang Normal University, Zhejiang 321004, China
* Correspondence: spj@bnu.edu.cn; Tel.: +86-010-58808179

Soil erosion models provide strong support for regional soil and water conservation planning. With the development of Geographic Information Systems (GIS) and remote sensing (RS), the water erosion equation has been widely used in regional soil erosion simulations [1][2][3][4][5][6]. Current water erosion models include the universal soil loss equation (USLE) [7], the revised universal soil loss equation (RUSLE) [8,9], the China soil loss equation [10] and so on. These models are widely used to estimate soil loss in agriculture and environmental management. In these models, soil loss A = R × K × L × S × C × P, where R, K, L, S, C and P are the rainfall erosivity, soil erodibility, slope length, slope steepness, vegetation cover and management, and support practice factors, respectively. The R, K, L and S factors are controlled by the natural environment and are therefore not changed by short-term soil and water conservation measures and activities. In contrast, the C factor has the greatest range of variation among the factors of water erosion models, and its values can differ by 2-3 orders of magnitude [11]. According to Benkobi, et al. [12] and Biesemans, et al. [13], the vegetation cover factor, together with the slope steepness and length factors, is the most sensitive for soil loss and has the most significant effect on the overall effectiveness of the USLE/RUSLE models [14]. Therefore, improving the accuracy of regional C factor calculations has become the key to improving regional soil erosion simulations.
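The multiplicative form of the equation can be illustrated with a minimal Python sketch. The function name and all factor values below are hypothetical and chosen only to show that, with the other factors fixed, soil loss scales directly with C, so a C range of two orders of magnitude produces the same range in A.

```python
def usle_soil_loss(R, K, L, S, C, P):
    """Return soil loss A = R * K * L * S * C * P.

    Units of A follow the units chosen for R and K.
    """
    for name, value in (("R", R), ("K", K), ("L", L),
                        ("S", S), ("C", C), ("P", P)):
        if value < 0:
            raise ValueError(f"factor {name} must be non-negative")
    return R * K * L * S * C * P


# Hypothetical factor values; only C differs between the two cases.
base = dict(R=100.0, K=0.3, L=1.2, S=1.5, P=1.0)
loss_sparse = usle_soil_loss(C=0.5, **base)    # sparse vegetation
loss_dense = usle_soil_loss(C=0.005, **base)   # dense vegetation
ratio = loss_sparse / loss_dense
```

Here `ratio` is about 100, matching the ratio of the two assumed C values.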
Most regional C factors have been determined from remote sensing data by the following methods: 1) Direct assignment by land use/cover [15]. This method is simple, but the accuracy of the computed C factor is poor [11]. 2) Estimation of the C factor from a vegetation index [16][17][18]. This method expresses the regional vegetation coverage more finely, but it is less comprehensive, and multiple layers and shallow roots are usually ignored. 3) Estimation of the C factor by spectral mixing analysis (SMA) [19,20]. The SMA method considers the contributions of litter, gravel, etc.; it can fully reflect the C factor information independent of the measurements, and the soil background does not affect it. However, the SMA method cannot be used when vegetation and/or litter completely covers the surface, or when the data are affected by multiple scattering [11]. 4) Experimental approaches combined with geostatistical methods [18,21]. With this method, the C factor can be interpolated using GIS and remote sensing images as auxiliary variables. Wang, et al. [18] improved the mapping of the C factor for the USLE by geostatistical methods with TM images. Based on multiple primary variables (canopy cover, ground cover and vegetation height), Gertner, et al. [21] mapped the regional C factor by joint co-simulation.
All of these remote sensing methods share the following problems. Firstly, they use C factor formulas obtained from standard plot experiments, but such formulas cannot be directly applied to an entire region. The C factor formulas for croplands, woodlands and grasslands are derived from experiments that compare the erosion rates of cropland, woodland and grassland plots with that of bare land in standard plots. However, the vegetation cover of a region is a complex combination of croplands, woodlands and grasslands, which differs from the vegetation cover modelled in a standard plot. Secondly, the RSI method requires a large number of field samples and cannot be used in fragmented landscapes, such as the farming-pasture ecotones of northern China. Not only is it difficult to ensure the accuracy of the spatial interpolation, but the method is also time-consuming, laborious and difficult to apply widely. Thirdly, very few methods properly consider the effects of both surface coverage and canopy coverage on soil erosion in the modelled region.
Based on the results of previous regional C factor estimations, we analyzed the key factors of soil erosion and highlighted those affecting the scale conversion [22]. To improve the accuracy of regional C factor estimation and obtain a large-scale C factor map for macro-scale soil erosion simulation, we built a C factor estimation method based on large sample theory and Landsat Thematic Mapper (TM) images. We show that our method solves the key problem of transitioning from a standard plot to km-sized grids, and hence accurately estimates regional C factors.

Study Area
The eastern section of the farming-pasture ecotone (hereinafter referred to as the study area) is shown in Figure 1.

Basic Idea and Research Framework
More than 1,000 coverage data-points were obtained from Landsat TM images of the Global Land Survey in 2010 (GLS2010) and from ground measurements; according to large sample theory and information entropy theory, their distributions and proportions are similar. Large sample theory, also called asymptotic theory, is used to approximate the distribution of an estimator when the sample size n is large. This theory is extremely useful if the exact sampling distribution of the estimator is complicated or unknown [23]. In addition, based on information-entropy theory [24], when the sample size is sufficiently large, the samples can be assumed to approximate a normal distribution.
According to the above two theories, the subsets of remote sensing data-points and field measurement data-points have a similar distribution. That is, within the same km-sized grid, the 2,000 land cover data-points obtained from field measurements have a similar distribution and similar proportions of croplands, woodlands and grasslands as the 1,000 land cover data-points determined from the Landsat TM images. The accuracy of the field-measured vegetation cover is better than that of the remote sensing method; therefore, the field measurements can be used as the reference for verifying the remote sensing measurements. The research framework is shown in Figure 2. The C factor conversion from a "standard plot (single vegetation type)" to a "kilometer grid" (multiple vegetation types) contains six steps:
1) The vegetation cover obtained by the sampling method was evaluated against the high-resolution images of the survey sample areas.
2) We interpreted the Landsat TM images of the study area and derived the land use data with the CART decision tree classification method.
3) We set up 21 survey sample areas, each with an area of 1 km², and collected 2,000 canopy cover and surface cover data in each.
4) Based on the resolution of the land use data, 1,125 canopy coverage data were obtained in each km-sized grid of the study area, and the canopy coverage (Cc) of the croplands, woodlands and grasslands was calculated.
5) The surface cover (Sc) factor was calculated from the surface coverage surveyed in the survey sample areas, and was applied to the entire study area according to the landform type.
6) From the Cc and Sc factors, we calculated the C factors of the study area.
Finally, we verified the regional C factor against the survey sample areas.
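The large-sample argument can be sketched in Python as follows. This is only an illustration: it assumes the data-points behave like a simple random sample (neighbouring TM pixels are spatially correlated, so the real uncertainty is likely larger), and the class labels and counts are hypothetical.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Return {class: (proportion, standard_error)} for point labels.

    The standard error uses the large-sample (normal) approximation
    sqrt(p * (1 - p) / n) for a sample proportion.
    """
    n = len(labels)
    if n == 0:
        raise ValueError("no data points")
    result = {}
    for cls, count in Counter(labels).items():
        p = count / n
        result[cls] = (p, math.sqrt(p * (1.0 - p) / n))
    return result


# Hypothetical km-sized grid with 1,125 classified TM pixels.
tm_points = ["cropland"] * 450 + ["grassland"] * 450 + ["woodland"] * 225
props = class_proportions(tm_points)
p_crop, se_crop = props["cropland"]
```

With more than 1,000 points per grid, the standard error of a class proportion under this assumption is below about 1.5 percentage points, which is why the proportions from two large samples of the same grid are expected to be close.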

Materials
The study mainly used three types of data: measured data (survey sample data), remote sensing data (Landsat TM image data, MODIS data and high-resolution images) and basic geographical data.
We used the Landsat TM image data of the Global Land Survey in 2010 (GLS2010, https://glovis.usgs.gov/, 30 m × 30 m). The MODIS data (https://urs.earthdata.nasa.gov/profile, 1 km × 1 km) are dated July 12, 2015 and July 27, 2016, close to the survey dates. A total of about 42,000 coverage data-points were surveyed throughout the study area. Basic information on the survey sample areas is shown in Table 1. With the eCognition software, we adopted the object-oriented high-precision remote sensing interpretation method to obtain high-resolution land use/cover data of the survey sample areas.

Canopy coverage factor upscaling
The canopy coverage factor can be calculated from vegetation cover, which can be estimated using point intercept, line-point intercept, grid-point intercept, and ocular estimates [26,27]. Scholars have compared these methods for calculating vegetation cover and found that when there are fewer than 20 samples, the point-based methods are more precise than ocular estimates [26,28,29] and line-point intercepts [26]. When the sample size is large enough, the line-point intercept estimate is the same as the point intercept and grid-point intercept estimates, and correctly reflects vegetation cover [26,27]. In this paper, 21 survey sample areas were set up in the study area. In each survey sample area we collected a total of 2,000 samples, far more than the required 20 samples [27].
The size of the sampling unit is also important for ground measurements. Duncan, et al. [30] analyzed the influence of sampling unit size on the remote sensing regression model based on the vegetation index and vegetation cover, and discussed the most suitable sampling unit size. In this paper, we chose the km-sized grid as the statistical unit. Then, via TM image interpretation, we produced more than 1,000 land use/land cover data-points in each km-sized grid. Next, the MODIS vegetation cover was derived directly from the vegetation index in the km-sized grid. The different methods used to estimate vegetation cover are shown in Figure 3.
The key to estimating the C factor is to calculate the vegetation cover. There is no standard method for monitoring vegetation cover, and current methods can be divided into surface measurement methods and remote sensing methods. The surface measurement method is limited by the workload and the size of the measurement area, and is unsuitable as an independent measurement method at large scales. Remote sensing methods for calculating vegetation cover rely on surface tests for regression calibration. They achieve a certain accuracy, but are difficult to generalize and apply widely, especially in fragmented landscapes such as farming-pasture ecotones. With current technology, the accuracy of the surface measurement method is higher than that of the remote sensing method, and it can therefore be used as the basis for remote sensing measurement and data verification [31].

TM Image Land Use Interpretation
• Land use classification system
The coverage and proportion of the various land types in a km-sized grid can be calculated from high-resolution land use data of the study area, such as the global 30 m land use data produced by Chen, et al. [32] or Gong, et al. [33]. However, both of these classification data sets confuse grassland and bare land in the study area, and contain no secondary classification of grassland. Therefore, we adopted a secondary land use classification system for the study area, based on the land use/cover classification system of Liu, et al. [34], to classify the TM images. Some classification methods (such as neural network classification and object-oriented classification) can interpret a single scene very well, but for a large area the workload and classification efficiency must be considered. The CART algorithm is based on the decision tree classification method [35] and combines DEM and NDVI data with supervised and unsupervised classification, resulting in a much higher interpretation efficiency over large areas than other interpretation methods [35]. We therefore adopted the decision tree classification method of the CART algorithm to carry out the land use classification (we interpreted more than 40 Landsat TM scenes from GLS2010). The main steps of decision tree classification based on the CART algorithm are:
1) Select the training areas. According to the secondary classification system, a certain number of training samples were selected in the multi-band image and used to obtain expert knowledge rules. The training areas were selected in the following order: i) water (river canals); ii) built-up lands, industrial and mining lands, and residential lands (urban land, rural residential areas, other construction land); and iii) croplands (irrigated land, dry land).
2) Establish a decision tree based on the training areas. We used the extension tool RuleGen [35] to automatically generate decision tree rules, and used the ENVI Execute Existing Decision Tree tool to establish a decision tree for land use interpretation.
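The core operation of a CART decision tree, choosing the threshold that minimizes the weighted Gini impurity of the two child nodes, can be sketched in plain Python. This is an illustrative sketch only and does not reproduce the RuleGen or ENVI implementations; the function names, NDVI values and labels are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best binary split
    on a single feature, or (None, parent_gini) if no split helps."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no threshold fits between identical values
        threshold = (low + high) / 2.0
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical training pixels: NDVI values with land cover labels.
ndvi = [0.05, 0.10, 0.12, 0.45, 0.50, 0.62]
cover = ["water", "water", "water", "cropland", "cropland", "cropland"]
threshold, impurity = best_split(ndvi, cover)
```

A full tree repeats this search over every band (spectral bands, NDVI, DEM and so on) and recurses on each child node until a stopping rule is met.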

• Classification accuracy evaluation
Because the study area is relatively large, it is difficult to carry out rigorous random field verification, so this paper uses a verification method based on a large number of randomly distributed single-pixel verification points checked against Google imagery [36,37]. We generated a total of 2,000 random points throughout the study area (50 points in each image). To obtain the true surface cover at the random points, and considering the high accuracy of Google imagery, we first used Google Earth to identify the true surface cover [36,37]. Second, we moved any random verification point located at the edge of a land cover type to the center of that land cover type, to avoid mixed pixels and ensure the accuracy of the true surface data. Finally, we used the true surface data at the random points to verify the accuracy of the land use interpretation.
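Point-based verification of this kind is usually summarized by the overall accuracy and the Kappa coefficient of a confusion matrix. The following Python sketch shows the standard calculation; the two-class matrix is hypothetical and is not the confusion matrix of this study.

```python
def accuracy_and_kappa(matrix):
    """Return (overall_accuracy, kappa) for a square confusion matrix.

    matrix[i][j] is the number of verification points whose reference
    class is i and whose interpreted class is j.
    """
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("empty confusion matrix")
    k = len(matrix)
    observed = sum(matrix[i][i] for i in range(k)) / total
    # Chance agreement from the row and column marginals.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(k)
    ) / (total * total)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return observed, (observed - expected) / (1.0 - expected)


# Hypothetical two-class matrix with 100 verification points.
confusion = [[40, 10],
             [5, 45]]
overall, kappa = accuracy_and_kappa(confusion)
```

For this hypothetical matrix the overall accuracy is 0.85 and Kappa is 0.7, because half of the agreement would be expected by chance.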

Surface coverage factor upscaling
At regional scales, it is difficult to obtain surface cover information such as surface litter, crop stubble and gravel.
Many scholars have experimentally determined the standard plot C factor for different regions [4,11,38,39]. Different formulas are given for different vegetation covers. For example, Jin, et al. [40] built the C factor formula of grassland that had a 1.9% coverage in the standard plot of the Huangfu River Basin. Jiang, et al. [41] built the C factor formula of grasslands and woodlands in a standard plot with coverages of more than 5% in Ansai County of Shaanxi Province. Liu [42] built the C factor formula of croplands, woodlands and grasslands in a standard plot in Beijing.
The surface cover is mainly composed of litter, crop stubble and gravel, which are more effective in reducing soil erosion than plant canopy cover.Based on this, the C-factor algorithm of Liu [42] considers both the surface cover and canopy cover of croplands, woodlands and grasslands, which makes the C factor result more objective.Therefore, we chose Liu's [42] standard plot vegetation cover factor algorithm to calculate the C factor, as shown in eqn. 1.
The canopy cover factor of cropland and grassland is calculated by eqn. 2, and the canopy cover factor of woodland is calculated by eqn. 3. Liu [42] found that the Sc factors of croplands and grasslands can be combined into a single formula, eqn. 4. The Sc factor of woodlands is calculated by eqn. 5.
(2) 1.029*exp( 0.0235
where Cc and Cs are the canopy cover factor and the surface cover factor, respectively, Vc and VR are the canopy cover (%) and surface cover (%), respectively, and h is the canopy height (cm). As it is difficult to obtain surface information such as litter, crop stubble and gravel on the regional scale, in this paper the Sc factor was calculated using the surface coverage factor upscaling method.
• Regional C factor algorithm
The regional vegetation cover factor needs to be calculated based on the fractional cropland, woodland and grassland coverage. Based on the C factor algorithm of the standard plot, the regional C factor algorithm is proposed, as shown in eqn. 6.

( ) / ( )
where C is the C factor of the km-sized grid, Vcrop is the cropland coverage (%), Vgrass is the grassland coverage (%), and Vforest is the forest coverage (%). Ccrop is the cropland vegetation cover factor, Cgrass is the grassland vegetation cover factor, and Cforest is the woodland vegetation cover factor. Based on land use data and surface cover data, the C factor of each survey sample area was calculated according to the C factor algorithm (Table 2). The C factor was smallest in the sample areas with a higher proportion of woodland, intermediate in those with a higher proportion of grassland and cropland, and highest in those with a higher proportion of unused land (sand and bare land).
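One way to read the variable definitions above is as a coverage-weighted mean of the three per-type C factors. The Python sketch below implements that reading. It is an assumption made for illustration, not a confirmed transcription of eqn. 6, and the function name and all numbers are hypothetical.

```python
def regional_c_factor(coverage, c_factor):
    """Coverage-weighted mean of per-type C factors for one grid.

    ASSUMPTION: eqn. 6 is read here as
    sum(V_i * C_i) / sum(V_i) over cropland, grassland and woodland.
    """
    total = sum(coverage.values())
    if total <= 0:
        raise ValueError("grid has no cropland, grassland or woodland")
    return sum(coverage[k] * c_factor[k] for k in coverage) / total


# Hypothetical coverage (%) and per-type C factors for one grid.
V = {"crop": 30.0, "grass": 50.0, "forest": 20.0}
C_type = {"crop": 0.30, "grass": 0.10, "forest": 0.01}
C_grid = regional_c_factor(V, C_type)
```

Under this assumed form the grid value always lies between the smallest and largest per-type C factor, and a larger woodland share pulls it down.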

Fractional Vegetation Cover (FVC) in the study area
Based on the 30 m resolution land use map in 2010 from GLS2010, we calculated the coverage proportions of cropland, woodland and grassland as well as the vegetation cover of the study area, as shown in Figure 5. From the 2010 vegetation cover of the study area based on GLS2010, the vegetation cover (including the ratios of cropland, woodland and grassland) was extracted for each survey sample area and verified against the measured vegetation coverage, as shown in Figure 6. The correlation between the remote sensing and measured vegetation coverage is shown in Table 3. The paired-samples Student's t test was used to assess the statistical significance of the remote sensing interpreted vegetation cover. The results are shown in Table 4.
Table 3 The correlation between the remote sensing interpretation and measured vegetation coverage
It can be seen from Figure 6, Table 3 and Table 4 that the vegetation coverage obtained by remote sensing interpretation is consistent with the measured vegetation coverage (paired-samples t-test, t = −0.03, df = 0.12, 2-tail significance p < 0.05), where R² is 0.58. In addition, there is a statistically significant linear relationship between the two groups of data. The two groups of data are significantly correlated (significance p < 0.05), with a correlation of 0.76. Furthermore, the two groups of data are generally distributed near the 1:1 line, and the regression coefficient is 0.93. To sum up, the results show that the method in this study can accurately obtain the coverage ratio of croplands, woodlands and grasslands in the km-sized grid.
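The two statistics used in this comparison, the paired-samples t statistic and the Pearson correlation, can be sketched in Python as follows. The coverage values are hypothetical and serve only to show the calculation; they are not the survey data.

```python
import math


def paired_t(x, y):
    """Return (t, df) of the paired-samples t-test."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((v - mean_d) ** 2 for v in d) / (n - 1)
    if var_d == 0:
        raise ValueError("all paired differences are identical")
    return mean_d / math.sqrt(var_d / n), n - 1


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical vegetation coverage (%) in five sample areas.
measured = [20.0, 35.0, 50.0, 65.0, 80.0]
interpreted = [22.0, 33.0, 52.0, 63.0, 80.0]
t_stat, df = paired_t(interpreted, measured)
r = pearson_r(interpreted, measured)
```

A t statistic close to zero means the mean difference between the paired values is small relative to its standard error, while r measures how closely the two series follow a straight line.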

C factor in the Study Area
Based on the C factor conversion method, and using the proportional coverage of croplands, woodlands and grasslands in the km-sized grid, the C factor in the study area (2010) was calculated, as shown in Figure 7. It can be observed in Figure 8 that the C factor of the survey sample areas calculated via remote sensing is consistent with the one based on measured coverage. The two data sets are distributed near the 1:1 line, with R² = 0.36 and a correlation coefficient of 0.7. A low R² is acceptable here because the regression is not used for prediction; the 1:1 line is sufficient to judge the agreement between the measured and interpreted results. The low R² arises because the resolution (sampling interval) of the TM images differs from that of the field measurements. This shows that the C factor estimated in this study can reflect the vegetation cover in the study area and can be used in the soil erosion equation.

Relationship between line-point estimated vegetation coverage and intercept vegetation coverage in the km-sized grid
With the eCognition software package, we used the object-oriented classification method to interpret the high-precision images (Gaofen-2 (GF-2) satellite images) of the sample areas and obtained the coverage of the various land cover types. The cropland, grassland and woodland coverage obtained from the high-precision images was compared with the line-point intercept estimates, as shown in Figure 9.
Figure 9 The correlation between the measured Fractional Vegetation Cover (FVC) from the line-point intercept method and the high-resolution image interpretation.
It can be seen from Figure 9 that the correlation between the FVC from the line-point intercept method and that from the high-resolution image interpretation is very high. The two groups of data are mainly distributed in the vicinity of the 1:1 line. The nonparametric Wilcoxon test for related samples shows that the two groups of data are significantly consistent at a significance level of p<0.05. The value of R² is 0.72 and the slope is 1.40, indicating that the cropland, grassland and woodland coverage obtained from the line-point intercept method is larger than that from image interpretation, but can still be used to reflect the proportions of cropland, grassland and woodland in the study area, with an interpretation rate of 72.29%. This result is basically consistent with the findings of previous studies [26,27], indicating that the line-point intercept method can effectively reflect the coverage of cropland, grassland and woodland in the survey sample areas.

Comparison of our Method and the C factor Algorithm Based on Remote Sensing Vegetation Index
We compared the C factor derived from the vegetation index (see Appendix A) with the C factor determined by our method; the difference between the two C factors is shown in Figure 10. It can be seen from Figure 7 and Figure 10 that the two C factors are basically consistent in 22% of the study area. (Because of systematic errors in the two data sets, we define a difference between −0.1 and 0.1, i.e., a difference of at most 10%, as basically consistent.) Compared with the C factor derived from image interpretation, the C factor based on the vegetation index is higher (by up to 0.3) for the western sandy lands and lower (by up to 0.3) for the eastern croplands of our study area. The main reason for the differences is that the standard plot C factor formulas are different. The C factor algorithm based on the vegetation index assumes only one vegetation type in the km-sized grid, and calculates the coverage of that vegetation to obtain the C factor. Therefore, in sandy areas where a complete distribution of sand is assumed, the C factor is overestimated, while in cropland areas where a complete distribution of cropland is assumed, the C factor is underestimated. As such, the C factor based on the remote sensing data is more precise than that based on the vegetation index.
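The comparison rule described above (a difference within ±0.1 counts as basically consistent) can be sketched in Python. The function name and the grid values are hypothetical; real maps would be flattened to lists of co-registered km-sized cells first.

```python
def consistency_shares(c_ours, c_index, tol=0.1):
    """Return the shares of grid cells where the index-based C factor
    is (consistent with, higher than, lower than) the reference."""
    if len(c_ours) != len(c_index) or not c_ours:
        raise ValueError("need two non-empty maps of equal size")
    diffs = [b - a for a, b in zip(c_ours, c_index)]
    n = len(diffs)
    consistent = sum(1 for d in diffs if abs(d) <= tol) / n
    higher = sum(1 for d in diffs if d > tol) / n
    lower = sum(1 for d in diffs if d < -tol) / n
    return consistent, higher, lower


# Hypothetical C factors for five km-sized grid cells.
ours = [0.20, 0.40, 0.10, 0.60, 0.30]
index_based = [0.25, 0.10, 0.40, 0.62, 0.05]
shares = consistency_shares(ours, index_based)
```

In this toy example two of the five cells fall within the tolerance, one is higher and two are lower.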

The empirical parameters of the C factor need to be further confirmed
The water erosion equation used in this paper is a small-scale method, and although many previous studies have applied it on regional scales, many of its parameters are empirical, and their rationality has not been confirmed:
1) The measured parameters of the C factor apply to the selected field survey time and area. The C factor in this paper was obtained from remote sensing and field survey data. In the study area, water erosion is strongest in the summer months (mainly July and August), and there is almost no water erosion in other months. In addition, the vegetation coverage is higher in summer. The summer C factor therefore determines the accuracy of the water erosion simulation, and the field investigations and TM image acquisitions of this study were made from July to August.
2) Empirical parameters of the C factor obtained from remote sensing data. The coverage of cropland, woodland and grassland was obtained by interpreting the remote sensing images and then calculating the proportions of land use/cover. The grasslands can be divided into high coverage, medium coverage and low coverage grassland [34]. High coverage grassland has a coverage of over 50%, assumed as 80%; medium coverage grassland has a coverage of 20%-50%, assumed as 50%; and low coverage grassland has a coverage of 5%-20%, assumed as 20%. The croplands are classified as irrigated and dry lands. The woodlands are divided into woodlands and shrubs, and the coverages of cropland and woodland vegetation are assumed to be 100%. Swamps are low-lying wetlands, which differ from other unused lands (bare land, quicksand, saline and alkaline land) and are similar to high coverage grassland, with an assumed coverage of 80%. These vegetation coverage parameters are empirical to some degree and should be confirmed in future studies.
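The assumed coverage values listed above amount to a lookup table from land use class to coverage. The Python sketch below encodes them. The class keys are hypothetical names, and treating every other class (for example bare land or built-up land) as 0% coverage is an assumption of this sketch rather than a statement from the text.

```python
# Assumed vegetation coverage (%) per land use class, as listed above.
ASSUMED_COVERAGE = {
    "high_coverage_grassland": 80.0,
    "medium_coverage_grassland": 50.0,
    "low_coverage_grassland": 20.0,
    "irrigated_cropland": 100.0,
    "dry_cropland": 100.0,
    "woodland": 100.0,
    "shrub": 100.0,
    "swamp": 80.0,
}


def grid_vegetation_cover(pixel_classes):
    """Mean assumed coverage (%) over the pixels of one km-sized grid.

    ASSUMPTION: classes missing from the table contribute 0% coverage.
    """
    if not pixel_classes:
        raise ValueError("grid has no pixels")
    total = sum(ASSUMED_COVERAGE.get(c, 0.0) for c in pixel_classes)
    return total / len(pixel_classes)


# Hypothetical grid of four pixels.
pixels = ["high_coverage_grassland", "low_coverage_grassland",
          "dry_cropland", "bare_land"]
cover = grid_vegetation_cover(pixels)
```

Keeping the values in one table makes it easy to test how sensitive the regional result is to each empirical assumption.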

Differences between the remote sensing image coverage and the measured coverage
Based on high-precision field surveys and remote sensing image interpretation, this study constructed a km-sized grid C factor algorithm. Although the land cover data obtained from TM image interpretation already had the highest resolution available on a regional scale, the coverage still differed from the measured coverage because the TM images contain mixed pixels. In addition, TM images can only measure the canopy coverage, not the surface coverage that also affects soil erosion [42]. Furthermore, the regional surface coverage can currently only be determined by upscaling the surface coverage of representative sites.
It can be seen from Table 5 that the interpretation accuracy is 72.25% and the Kappa coefficient is 0.62. Landis and Koch [43] point out that a Kappa coefficient larger than 0.6 indicates good accuracy, so the interpretation accuracy of this paper is good. Many papers have shown that remote sensing interpretation over large regions has low accuracy [36,37]. For example, the accuracy of many automatic classification methods is generally less than 65% [37]. Since this paper studies the fragmented landscape of an ecological transition zone, an accuracy of 72.25% is sufficiently high.
There are several problems in the interpretation of land use in the study area:
1) Mixed pixels. Mixed pixels and the different spatial resolutions of Landsat TM and Google Earth decrease the interpretation accuracy.
2) Different land types with similar spectra. The spectra of unused lands and construction lands are relatively close, but on a large regional scale the proportion of construction lands is relatively low, so erosion is rarely influenced. Unused lands such as bare land do cause erosion. However, this paper studies the vegetation cover (C) factor, which is related to vegetation (cropland, grassland and woodland), so the misclassification between construction land and bare land, although it lowers the accuracy, does not affect the vegetation coverage calculation. This error can therefore be ignored, since this paper focuses on the proportion of vegetation (i.e., croplands, woodlands and grasslands).
3) The mosaic problem. Because different TM images were acquired at different times, edge-matching problems exist between them. These problems reduced the accuracy of the TM image interpretation, so the results have some uncertainty in the edge areas. However, the overall interpretation accuracy for cropland, woodland and grassland meets the requirements of the vegetation coverage and C factor calculations.

Conclusions
The vegetation cover (C) factor is one of the most influential factors in the soil erosion model. The C factor derived from a standard plot cannot be directly used on a regional scale. Based on remote sensing data and field investigations, we designed a C factor conversion method from the standard plot to a km-sized grid based on large sample theory. It can be concluded that:
1) Compared with existing C factor algorithms, our algorithm extends the applicable range of the standard plot C factor formula, and can be used to simulate soil erosion in large areas.
2) The vegetation coverage obtained by remote sensing interpretation is significantly consistent (paired-samples t-test, t = −0.03, df = 0.12, 2-tail significance p < 0.05) and significantly correlated with the measured vegetation coverage. Meanwhile, the line-point intercept method can be used to effectively obtain the vegetation coverage of cropland, woodland and grassland in the survey sample areas (p < 0.05).
3) The C factor of the study area is smaller in the middle, southern and northern regions, and larger in the eastern and western sections. The main reasons are the distribution of woodlands, the Hunshandake and Horqin sandy lands, and human cultivation in the valleys.
4) The C factor conversion method based on large sample theory is better than the one based on vegetation indices.
In this paper, a method for estimating the regional C factor was proposed by combining the interpretation of remote sensing data with data obtained from field investigations. Our method is limited by the resolution of the remote sensing data and the accuracy of the TM image interpretation. Thus, in the future, we aim to develop our method by: 1) further confirming the empirical parameters of the C factor; 2) establishing a database of C factors in different seasons; 3) studying the differences between remote sensing and high-precision measured vegetation coverage.
The founding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the decision to publish the results.
Appendix A: The C Factor Algorithm Based on the Remote Sensing Vegetation Index

Calculate vegetation coverage based on the Vegetation index
To express the vegetation coverage of the study area, the MODIS EVI vegetation index needs to be partitioned according to the measured vegetation coverage before regression modeling. Such regression models are generally applicable only to the area where the relational expression was constructed. Therefore, the difference between the MODIS EVI vegetation index and the measured vegetation coverage (hereinafter referred to as FVCD) was used to partition the model relating the vegetation index to the measured coverage. Based on the natural geographical conditions and factors, we analyzed the Pearson correlation of FVCD with the dryness grade, landform type, vegetation type and sandy land, as shown in Table A1. It can be seen from Table A1 that the degree of dryness has the highest correlation with FVCD, with a Pearson's correlation coefficient of 0.386, which is significant at the p<0.05 level. Therefore, we established a regression relationship between the vegetation index and the measured vegetation coverage for each dryness grade. The study area is an arid and semi-arid region. The dryness can be subdivided into dryness 1 (0.2-0.4), dryness 2 (0.4-0.44) and dryness 3 (0.44-0.5).
According to the degree of dryness, the relationship between the vegetation index and the measured vegetation coverage was established, as shown in Table A2 and Figure A1. When the dryness is grade 1, the significance level is 0.003 (p<0.05); when the dryness is grade 2, the significance level is 0.015 (p<0.05); when the dryness is grade 3, the significance level is 0.019 (p<0.05).

The study area (Figure 1) includes 84 counties (banners) in the Inner Mongolia Autonomous Region, Liaoning Province, Heilongjiang Province, Jilin Province and Hebei Province, and has an area of 4.402 × 10⁵ km². In 2015-2016, our research group carried out two field trips to determine the land use and soil erosion in the study area, with a total survey route of nearly 7,000 km. According to the typical geographic units and landform types, we set up 21 survey sample areas. From July 19 to July 25, 2015, the western part of the study area was inspected, where 10 inspection points were created over a route of 2,930 km. From August 7 to August 14, 2016, the eastern part of the study area was inspected, where 11 inspection points were created over a route of more than 4,000 km.

Figure 1 Study area and the routes of the two field trips

Figure 2 The C factor conversion method from a "standard plot" (single vegetation type) to a "km-sized grid" (multiple vegetation types)

Figure 3 The different methods used to estimate vegetation cover in the survey sample area: a: The line-point intercept method; b: from high-resolution satellite images; c: from TM images; d: from the MODIS vegetation index.
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 21 September 2017 doi:10.20944/preprints201709.0098.v1

Decision tree classification based on the CART algorithm
Prior to classification, the Landsat TM and Landsat Enhanced Thematic Mapper (ETM) data in Global Land Survey 2010 (GLS2010) underwent cloud interpretation and geometric correction and were combined into multi-band images composed of the blue, green, red, near-infrared, short-wave infrared, medium-wave infrared and long-wave radiation bands, together with NDVI, ISODATA, DEM and other bands. The NDVI data were generated from the TM images and the DEM data from the global ASTGTM data. The ISODATA data come from unsupervised classification of the TM images, with a minimum class number of 10 and a maximum of 25.
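As an illustration of the kind of split a CART decision tree makes on one band of such a multi-band image, the sketch below searches for the threshold on a single feature that minimizes the weighted Gini impurity. Gini impurity is the usual CART criterion for classification, but its use here is an assumption; the function names and the toy values in the usage note are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Find the threshold on one feature that minimizes weighted Gini impurity.

    Returns (threshold, impurity). Samples with value <= threshold go to the
    left branch. The threshold is None if no split improves on the parent.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (threshold, impurity)
    return best
```

For example, `best_split([0.1, 0.2, 0.6, 0.7], ["bare", "bare", "grass", "grass"])` separates the two classes completely with a threshold midway between 0.2 and 0.6. A full tree repeats this search over every band and recurses on each branch.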
stubble and gravel required by soil erosion models. To overcome this limitation, we first set up 21 survey sample areas in the typical landform units, measured the ratio of surface litter, crop stubble and gravel in each sample area, and then interpolated according to land surface type to obtain the surface coverage factor in the study area. Secondly, the surface of the study area was classified according to the geomorphic map (1:4 million) of China and its adjacent areas. Thirdly, according to the natural geographic features of the study area, we found that the surface of the study area can be classified by geomorphic unit, which provides the surface cover information needed to calculate the vegetation cover. Lastly, the surface coverage (Sc) of the study area was calculated based on the field coverage of the survey sample areas and the structural geomorphic units of the study area, as shown in Figure 4.
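A minimal sketch of the last step is given below. It assumes that the surface coverage of each geomorphic unit is taken as the mean of the values measured in the survey sample areas that fall in that unit; the text does not specify the interpolation rule, so this averaging, the function names and the unit names in the usage note are hypothetical.

```python
def mean_cover_by_unit(sample_areas):
    """Average measured surface cover for each geomorphic unit.

    sample_areas: iterable of (geomorphic_unit, surface_cover) pairs, where
    surface_cover is the measured share of litter, stubble and gravel.
    """
    totals = {}
    for unit, cover in sample_areas:
        running_sum, count = totals.get(unit, (0.0, 0))
        totals[unit] = (running_sum + cover, count + 1)
    return {unit: s / c for unit, (s, c) in totals.items()}


def assign_surface_cover(unit_grid, unit_cover):
    """Map a grid of geomorphic unit names to surface cover values.

    Cells whose unit has no survey sample are set to None rather than guessed.
    """
    return [[unit_cover.get(unit) for unit in row] for row in unit_grid]
```

For example, `assign_surface_cover([["plain", "hill"]], mean_cover_by_unit([("plain", 10.0), ("plain", 20.0), ("hill", 40.0)]))` yields one row with the plain cell set to the mean of its two samples and the hill cell set to its single measured value.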

Figure 4 The coverage of surface matter in the study area

Figure 5 Vegetation coverage in the study area (2010)

Figure 6

that the vegetation coverage obtained by remote sensing interpretation is significantly consistent and significantly correlated with the measured vegetation coverage. The vegetation coverage obtained by remote sensing interpretation is significantly consistent with the measured vegetation coverage (paired-samples t-test, t = −0.03, df =

Figure 7 The C factor in study area (2010).

Figure 8 Verifying the C factor based on remote sensing interpretation

Figure 10 The difference between the C factors based on the vegetation index and on remote sensing data.

5.3.3 The accuracy of the TM image interpretation
In this study, we randomly generated points in the TM images to verify the accuracy of our land use interpretation. A total of 2,000 verification points (50 in each image) were generated in the study area, where each verification point was a single pixel. Based on high-resolution Google Earth images, we determined the land use type of each verification point to obtain the verification data. To ensure the accuracy of the verification data and to reduce the mixed-pixel problem caused by differences in resolution and acquisition time, we moved random points lying at the edge of a land use unit to the center of that unit. Using the 2,000 randomly generated verification points, we tested the interpreted 2010 land use data. The results are shown in Table 5.
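The comparison between the verification points and the interpreted land use can be sketched as follows. This is a minimal illustration that reports overall accuracy and the share of correctly interpreted points for each reference class; the function name and the class labels in the usage note are hypothetical, and the accuracy measures actually reported in the study may differ.

```python
from collections import Counter


def accuracy_report(reference, interpreted):
    """Overall and per-class accuracy from paired labels of verification points.

    reference: land use type read from the high-resolution images.
    interpreted: land use type from the TM interpretation at the same points.
    Returns (overall_accuracy, {reference_class: share_correctly_interpreted}).
    """
    if len(reference) != len(interpreted) or not reference:
        raise ValueError("need two non-empty label lists of equal length")
    total_by_class = Counter(reference)
    correct_by_class = Counter(
        ref for ref, itp in zip(reference, interpreted) if ref == itp
    )
    overall = sum(correct_by_class.values()) / len(reference)
    per_class = {c: correct_by_class[c] / n for c, n in total_by_class.items()}
    return overall, per_class
```

For example, with reference labels `["crop", "crop", "grass", "grass", "forest"]` and interpreted labels `["crop", "grass", "grass", "grass", "forest"]`, four of the five points agree, and only the crop class contains a misinterpreted point.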

Figure A1 Relationship between the EVI and measured vegetation coverage under different dryness

Figure A2 Vegetation coverage and C cover factor based on the vegetation index (a: FVC map; b: C factor map).

Table 1 The basic information of the survey sample area

Table 2 The C factor of the survey sample area

Table 4 The paired-samples Student's t-test
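The statistic behind a paired-samples t-test of measured against interpreted vegetation coverage can be sketched as below. The sign convention (interpreted minus measured), the function name and the values in the usage note are assumptions made for illustration and are not data from the study.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(measured, interpreted):
    """Paired-samples t statistic and degrees of freedom.

    Differences are taken as interpreted - measured (an assumed convention).
    """
    if len(measured) != len(interpreted) or len(measured) < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    diffs = [i - m for m, i in zip(measured, interpreted)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / sqrt(n))
    return t, n - 1
```

For example, `paired_t([30, 40, 50, 60], [32, 39, 53, 60])` returns a t value with 3 degrees of freedom. A t value close to zero, as reported in the text, indicates that the mean difference between the two sets of coverage values is small relative to its standard error.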

Table 5 Accuracy of land use interpretation in the study area

Table A1 The Pearson correlation between different geographical elements and FVCD

Table A2 The EVI coefficient and measured vegetation coverage under different dryness grades. The dependent variable is the measured vegetation coverage. Table A2 expresses the regression equation of the EVI and measured vegetation coverage relative to the dryness classification and its significance.