Mapping Irrigated Areas of Northeast China in Comparison to Natural Vegetation

Accurate information about the location and extent of irrigation is fundamental to many aspects of food security and water resource management. This study develops a new method for identifying irrigation in northeastern China by comparing canopy moisture between the cropland and adjacent natural ecosystems (i.e., forests). This method is based on two basic assumptions, which we validated using field survey data. First, the canopy moisture of irrigated cropland, indicated by a satellite-based land surface water index (LSWI), is higher than that of the adjacent forest. Second, the difference in LSWI between irrigation cropland and forest is larger in arid regions than in humid regions. Based on the field survey and statistical dataset, our method performed well in indicating spatial variations of irrigated areas. Results from this study suggest that our method is a promising tool for mapping irrigated areas, as it is a general and repeatable method that does not rely on training samples and can be applied to other regions.


Introduction
Irrigation is one of the most important cropland management measures and plays an important role in determining crop yield [1,2]. Numerous studies show that yields are generally higher under irrigated conditions as compared to rainfed fields [3,4]. For example, average crop yields of irrigated farms exceed the corresponding yields of dryland farms by 15% for soybeans, 30% for maize, 99% for barley, and 118% for wheat in the United States [5]. Irrigation is a prerequisite for crop production and serves to increase yields and attenuate the effects of droughts in arid regions [6]. Although only 18% of cultivated area is irrigated globally [7], 40% of global food production comes from irrigated agriculture [8]. Irrigation information is one of the most important forms of data for creating a crop growth model for simulating regional crop yield [9]. In addition, cropland irrigation consumes 70% of all water withdrawn worldwide from rivers and aquifers [10]; in developing countries, more than 90% of water withdrawal is for cropland irrigation [7]. Water resources management and allocation need to adequately consider irrigation demand. Therefore, spatial and temporal information about irrigation is very important for monitoring cropland yields and water resources [11].
Satellite-based methods offer strong potential for routine monitoring of irrigation, as satellite data can acquire continuous temporal and spatial information over vegetated surfaces [12]. A satellite-based method refers to a classification method that uses satellite data to automatically classify irrigated areas according to the spectral differences between irrigated and non-irrigated pixels. Several studies have attempted to show the extent and density of irrigated land at different scales [12][13][14][15][16][17][18]. Automated classification approaches are commonly used to identifying irrigated areas, including unsupervised classification [19], single-or multi-stage supervised classification [20], and decision tree [21,22] methods, as well as supervised learning models such as random forests or support vector machines [23,24]. All these methods exploit spectral differences-in terms of reflectance, spectral indices, or other derived features-between irrigated areas and other land cover types [25].
Satellite-based methods always require prior knowledge and ground observations; their success is highly dependent on the quantity and quality of training samples, which are costly and time-consuming to obtain through field surveys [11]. Classification results may be specific and lack repeatability, because differences in classifiers or training samples may significantly alter classification results. This drawback may limit the large-scale application of these methods [26], therefore more objective and convenient methods of irrigation information extraction must be developed [13,14].
A second class of methods for identifying irrigation is mostly based on the analysis of spectral index time series-these techniques aim to identify the specific temporal dynamics of irrigated crops. Liu et al. [27] indicated a strong regression relationship between the coefficient variation of the land surface water index (LSWI) and the planting fraction of paddy rice fields compared with other upland crops, due to the presence of flooding water during the growing season. An analysis of a SPOT-VEGETATION normalized difference vegetation index (SPOT-VGT NDVI) time series found multiple NDVI peaks in irrigated fields during the growing season, in contrast with the single peak of rainfed fields, which can be used for identifying the irrigation [28]. Based on fusing a 480 m time-series Moderate Resolution Imaging Spectrometer (MODIS) and Landsat imagery to a 30 m spatial resolution, Chen et al. [29] proposed a new approach that applied a Gaussian process and linear regression models to detect irrigation attributes, including irrigation extent, frequency, and timing. Dheeravath et al. [15] also introduced an approach in which they segmented a three year MODIS reflectance time series acquired at eight day intervals prior to a classification step performed using spectral matching techniques. Ozdogan and Gutman [13] used temporal analysis of the NDVI signal as a proxy for detecting crop type, and the data clearly revealed the differences between irrigated and non-irrigated crops. In addition, previous studies also indicated larger differences in canopy moisture between irrigated croplands and natural vegetation in arid regions than in humid regions. The contribution of irrigation to increased crop yield is more effective in arid regions than in humid regions [3,4].
Based on the above methods, several global and regional irrigation products have been generated. The first global dataset of irrigated areas was produced by the Food and Agriculture Organization of the United Nations and the University of Frankfurt (FAO/UF), published in 1999 [30]. It showed the areal fraction of 0.5 arc degree by 0.5 arc degree grid cells that was equipped for irrigation in the 1990s. Since then, the FAO/UF datasets have been updated several times and the map resolution has increased to 5 arc minutes by 5 arc minutes [31]. A dedicated product, the Global Irrigated Area Map (GIAM), was produced by the International Water Management Institute (IWMI) using multi-source data (AVHRR, SPOT-VGT, JERS-1 SAR, rainfall, temperature, and elevation) in an approach applying segmentation, unsupervised classification, spectral matching, decision tree, and spatial modeling steps [32]. The GlobCover product, derived from unsupervised clustering of 300 m Medium Resolution Imaging Spectrometer (MERIS) data and expert labeling rules, provided irrigation classes among other land cover types [33]. In addition, several regional irrigation maps have also been generated. For example, Zhu et al. [34] developed three irrigation potential indices and a spatial allocation model, then downscaled the census data on irrigation from administrative units to individual pixels and produced a new irrigation map of China around the year 2000.
In this study, we aim to develop a coupled method, combining the satellite-based method and spectral index time series analysis for identifying irrigated areas. This coupled method is developed based on analyzing spectral index time series from MODIS data with 500 m spatial resolution. The algorithm was to identify irrigated regions by comparing the satellite-based moisture index of cropland with that of the adjacent natural vegetation (i.e., forests). The coupled method will be validated using observation sites and statistical data at county, prefecture, and province levels. The area and spatial distribution of irrigated areas in northeastern China will be quantified.

Study Area'
The study area is a part of Northeast China, one of the most important commodity grain production bases in China, which comprises three provinces: Heilongjiang, Jilin, and Liaoning ( Figure 1). According to data from the National Bureau of Statistics of China, in 2017, the planting area of maize in the study area accounted for 64% of the total cultivated land in northeastern China, the planting area of rice accounted for 28%, the area of soybean, sorghum, potato, and wheat accounted for 6%, and the other accounted for 2% [35]. In this region, the period of crop growth is short, i.e., just 2-3 months. The study area contained 13.51% of the irrigated areas in China and produced 19.27% of the food production [35]. More than 31.77% of northeastern China's crop land is irrigated [36]. There are three main sources of irrigation water: rivers, lakes, and groundwater. One-third of this region is flood plains and the rest mainly comprises hills and mountains. The Sanjiang, Songnen, and Liaohe plains lie between the Lesser Khingan Mountains and the Changbai Mountains. The study area has a temperate continental monsoon climate. The average annual precipitation varies widely-from 480 mm in the western plains to 900 mm along the eastern mountains. The annual accumulated air temperature greater than 10 • C ranges from 1600 to 3600 • days and the number of frost-free days varies between 140 days and 170 days. There is only a single season for crops in this region, due to the limitations imposed by temperature. In this study, we conducted the field survey from 8 July to 21 July, 2016 in Northeast China. Figure 1 shows the location of our validation samples in Northeast China.

Methodology
The principle of our algorithm was to identify irrigated regions by comparing the satellite-based moisture index of cropland with adjacent natural vegetation (i.e., forests). The algorithm was based on two fundamental assumptions. First, we assumed that the moisture condition of irrigated croplands is larger than that of the adjacent forests [13,37]. If the distances between cropland and adjacent forests are small enough, the precipitation in those areas can be considered the same. Therefore, the moisture at irrigated croplands should be higher than forests and non-irrigated cropland. So, if we know the moisture difference between irrigated croplands and forests, the irrigated croplands can be identified. However, the moisture difference is not same on the regional scale. Therefore, we made the second assumption that the differences of moisture between irrigated croplands and forests are larger in arid regions.
In order to validate the first assumption, we used the land surface water index (LSWI) to indicate the land surface moisture conditions. LSWI was calculated as: where ρ nir and ρ swir represent red and short wave infrared reflectances, respectively. LSWI can comprehensively represent the moisture conditions of plant and soil [38]. We calculated LSWI values (LSWI C ) during the peak of growing season from the 201st to the 241st day of the year at all 123 investigated cropland sites (see Section 3.3 Method validation), including 47 irrigated and 76 non-irrigated sites, and compared LSWI C with those of adjacent forests. In general, LSWI strongly depends on the vegetation index (NDVI) and increases with NDVI [39,40]. Therefore, we compared the LSWI of cropland with that of forests where there were the same NDVI values, to exclude the impact of NDVI [38]. Around a given cropland site, we searched the nearest 30 forest pixels where their mean NDVI (NDVI F ) value is close to the NDVI of croplands (NDVI C ) (i.e., |NDVI C -NDVI F | < 0.05 × NDVI C ). Then, we calculated the mean LSWI value of adjacent forests (LSWI F ) and compared this with the LSWI C at all investigated croplands to examine the first assumption.
To examine the second assumption, at each prefecture, we collected and sorted all pixels by descending the differences between LSWI C and LSWI F (LSWI Diff =LSWI C -LSWI F ). The pixels at the front of the queue had larger LSWI Diff values, and therefore a greater probability of being irrigated. Statistical data on irrigation areas at prefecture-level were used to determine the LSWI Diff thresholds. Prefecture-level statistical data were used to determine the number (N) of pixels with the largest LSWI Diff as irrigation at a given prefecture. The LSWI Diff value of the Nth is the threshold (LSWI Diff0 ) for differentiating the irrigation and non-irrigation. We selected 18 prefectures (half of the area prefectures) to examine the relationship between LSWI Diff0 and mean annual precipitation (MAP) proposed by the second assumption. In addition, we used the remaining 17 prefectures to examine this relationship. The results from examining the two assumptions are included in the Results section. If this assumption that there is strong correlation between LSWI Diff0 and MAP is verified, we can calculate LSWI Diff0 for each prefecture based on this correlation. Once the LSWI Diff0 is determined, then we can identify the cropland pixels where LSWI Diff value is larger than LSWI Diff0 as irrigation.
Based on these two assumptions, we developed the following process to identify the irrigated croplands.
Step (1) The mean NDVI (NDVI C ) and LSWI (LSWI C ) values were calculated for each cropland pixel during the peak of the growing season, from the 201st to the 241st day.
Step (2) The nearest 30 forest pixels for each cropland pixel were selected by comparing NDVI C and mean NDVI values of forest (NDVI F ) as described above.
Step (3) The LSWI difference (LSWI DIff ) between the cropland pixel (LSWI C ) and the adjacent forest pixels (LSWI F ) was calculated for each cropland pixel.
Step (4) Based on the relationship between LSWI Diff0 and MAP at the prefectures as described above, LSWI Diff0 was calculated at all prefectures using MAP.
Step (5) At each prefecture, all pixels were sorted by descending LSWI Diff , and the pixels with LSWI Diff >LSWI Diff0 were identified as irrigation pixels.

MODIS Data and Precipitation Data Preprocessing
The eight day surface reflectance product (MOD09A1) of MODIS was used to calculate NDVI and LSWI. A method based on the Savitzky-Golay filter was used to smooth out noise in the time series of the vegetation indexes, especially noise related to cloud contamination and atmospheric variability [41]. The land cover type product (MCD12Q1) was used to search forest pixels. The land cover types were determined using the MCD12Q1 product reference to the International Geosphere-Biosphere Programme's global vegetation classification system.
We used the gridded daily precipitation dataset provided by Yuan et al. (2015) [42]. This dataset was based on the meteorological observations from 735 stations from the National Climate Center of China Meteorological Administration, and was interpolated to a gridded climate dataset with a spatial resolution of 10 × 10 km using the thin plate smoothing splines [42]. In addition, we re-sampled the precipitation data to the same spatial resolution of MODIS data.

Method Validation
In this study, the 123 validation samples were obtained from three sources. The first source was the crop growth and soil moisture dataset provided by the China Meteorological Data Sharing Service System. These sites were used in our study, for a total of 70 non-irrigated and 10 irrigated samples. The other source was our field surveys, which we conducted from 8 July 2016 to 21 July 2016 in Northeast China-6 non-irrigated and 10 irrigated samples in all. The third source was the irrigation information on large irrigated areas provided by the China Irrigation and Drainage Development Center and collected by Google Earth. At all sampling sites, a survey was carried out and recorded on farmland irrigation times, well depth, pesticide and fertilizer use, and yield. In addition, we compared the estimated irrigation areas derived by this method with statistical data at the county, prefecture, and province levels. These data were obtained from the statistical yearbooks for Heilongjiang, Jilin, and Liaoning Provinces [43][44][45]. We compared our method with three widely used irrigation datasets: IWMI, FAO/UF, and Zhu datasets. We resampled the IWMI datasets and our new irrigation map to the same spatial resolution as the FAO/UF datasets because of the resolution differences among the three irrigation datasets.

Results
First, we examined the assumption that the LSWI values of irrigated cropland were larger than those of adjacent forests. Hence, we compared the LSWI values of the investigated sites with those of adjacent forests that have the same NDVI values as the cropland. A significant LSWI difference was found between the irrigated and non-irrigated croplands compared to the adjacent forest (ANOVA test, P < 0.01) (Figure 2a).
We examined the second assumption proposed in the previous section based on irrigation statistics and mean annual precipitation (MAP). We found a significant negative linear relationship between MAP and the LSWI Diff0 thresholds (difference in LSWI between irrigated cropland and forests) (P < 0.01) (Figure 3). The LSWI Diff0 thresholds decreased with increasing MAP, which indicated that larger a LSWI difference occurred in dry regions, affirming our second assumption. Therefore, we used regression to estimate the appropriate thresholds (LSWI Diff0 ) at all prefectures using MAP.  The statistical data validated our method at the county, prefecture, and province levels, showing that the new method could accurately estimate irrigation areas compared to the statistical data over three scales (i.e., at the county, prefecture, and provincial levels) (Figure 4 and Table 1). The correlation coefficient (R 2 values) between the identified irrigation areas and the statistical areas was 0.77 ( Figure 4a) and 0.40 (Figure 4b) at the prefecture and county levels, respectively. Figure 4a provided the validation results of the second assumption. The identified irrigation areas were highly consistent with the statistical areas over the 18 prefecture prefectures used to develop the relationship between LSWI Diff0 and MAP, as shown Figure 3a. In addition, a good consistence between identified areas and statistical areas also was found over the remaining 17 prefecture prefectures (Figure 4a). At the province level, the relative errors of the identified irrigation areas were −26.41%, 26.34%, and 4.10% in Heilongjiang, Jilin, and Liaoning provinces, respectively (Table 1).  Based on the new method, we generated a map of irrigated fields in Northeast China in 2016 ( Figure 5). Irrigated fields were distributed mainly on three alluvial plains (Songnen, Liaohe, and Sanjiang plains) and valleys along three rivers (the Songhua, Wusuli, and Heilongjiang Rivers). Due to the northward expansion of irrigated cropland over the past decade, Heilongjiang Province accounted for nearly 52.11% of the overall irrigation area in the three provinces, which is consistent with 2016 statistical data (Table 1) [43].

Discussion
This study developed a new satellite-based method for identifying irrigation areas by comparing the satellite-based moisture index between croplands and adjacent forests. The results showed similar LSWI values for non-irrigated croplands and adjacent forests, but higher LSWI values for irrigated croplands (Figure 2b). Based on 123 surveyed sites, we compared the performance of this method with three other irrigation datasets: two global irrigation datasets (FAO/UF and IWMI) and an irrigation map of China derived by Zhu et al. [34] (see method). In general, the overall accuracy of our method was 77.24% and its kappa coefficient was 0.49; both measures were superior to the 47.15%, 62.60%, and 62.60% overall accuracies and 0.07, 0.23, and 0.11 kappa coefficients of the FAO/UF, IWMI, and Zhu datasets ( Figure 6; Table 3). In general, the method identified correctly 26 irrigated samples out of total 47 irrigated samples, and identified correctly 76.67% non-irrigation samples (Figure 7; Table 3).    Figure 3 showed the significant relationship between MAP and the LSWI Diff0 thresholds, which was used to identify the irrigation areas. Although there are some uncertainties, the relationship is statistically significant and indicates the reliable dependence of the threshold of LSWI Diff0 on MAP. Based on this relationship, we can identify the irrigation areas and avoid the systematic estimation errors compared to the constant threshold of LSWI Diff0 . It is true that the estimation of irrigated areas performs well at the prefecture level (R 2 = 0.77), rather than at the county level (R 2 =0.40) in Figure 4. Basically, there are large challenges in identifying the irrigation areas on a small scale (county) compared to on a large scale (prefecture). Many local environmental conditions will reduce the estimate accuracy [3,13,46]. For example, there are large estimation errors over the eastern regions of Jilin province because of the mountainous areas. Optical remotely sensed images in mountainous areas are subject to radiometric distortions induced by topographic effects, which need to be corrected before quantitative applications [47]. In addition, the moisture condition substantially changes with the topography and slope aspect, and precipitation is not the only factor which determines the moisture condition of croplands and the adjacent forests [48]. Therefore, further works need to improve the performance at mountainous areas.
The comparison suggests that the new irrigation map supports statistical areas at the prefecture-level and overall Northeast China. However, this study has underestimated Heilongjiang Province by 26.41% (Table 1). Major differences between irrigation patterns derived by the new map and the statistical data occurred in the Heilongjiang Farms & Land Reclamation Administration, which are mainly distributed in Heihe, Shuangyashan, and Jixi. Because the irrigation area data from the Heilongjiang Farms & Land Reclamation Administration are separately counted, they were not classified to prefectures, so statistics from this area were missing.
There are several limitations of this method which impacted its performance. First, this method identifies the irrigated croplands by comparing canopy moisture between cropland and adjacent forests. Therefore, it is important for the forests to be within a close enough distance for the forests to have the same precipitation as the croplands. Figure 8 shows that more than 50% of the croplands had forests within a distance of less than 25 km, which indicates that most croplands satisfy these conditions. However, in Songyuan and Baicheng of the Songnen Plain, the forests are relatively far away (>100km), and the precipitation conditions of croplands and its reference forests probably have large differences. Although this is only a small proportion (12%) [43][44][45], this will result in errors in the estimated irrigation areas. Second, the method did not consider the differences of water storage between croplands and forests. In general, compared to the croplands, the adjacent forests green-up earlier [49,50]. Therefore, the forests can store more precipitation in the soil [51,52], which may result in the differences of soil moisture between the croplands and forests. In addition, previous studies showed that the capacity of plant transpiration and canopies to intercept water differs among ecosystems of the same leaf area index [53,54]. For example, at the same leaf area index (~2 m 2 m −2 ), conifer forests typically store 20% of the precipitation, whereas the deciduous plants store 10% [54]. All crops in this study area are broadleaf, but the trees are broadleaf and needleleaf. The differences in the plant transpiration between croplands and forests may result in differences in soil moisture.
Third, this method relies on the regression relationship between LSWI Diff0 and MAP to identify irrigation areas (see method). In this study, we used regression coefficients, shown in Figure 3, over Northeast China, and the coefficients may change due to different climate backgrounds. Hence, if this method is applied to other regions, it is necessary to redefine the relationship between LSWI Diff0 and MAP.
Our method does not depend on field investigation and will be easy to popularize on a larger scale. The identification of irrigation pixels is consistent with actual irrigation samples in field surveys and statistical areas ( Table 3). It is possible to supervise the distribution and the area of irrigation automatically using satellite-based datasets, avoiding visual interpretation [55][56][57] of high-resolution images [58] and much fieldwork, as well as substantially improving the method's repeatability and applicability.

Conclusions
In this study, we developed a new method for identifying irrigation in cropland in northeastern China, based on the relationship between an area's MAP and the LSWI difference between cropland and adjacent natural landscape pixels. We validated that the new method can be used to accurately map irrigation areas. In addition, we compared our method with three existing methods using site-based survey and county-to province-level statistical data. The results suggest that our method agrees with the patterns of irrigation distribution compared with the three current methods. Our method is a promising and effective tool for mapping irrigated regions. Moreover, it will be beneficial for studying water resource management and food security.