IrrMapper: A Machine Learning Approach for High Resolution Mapping of Irrigated Agriculture Across the Western U.S.

: High frequency and spatially explicit irrigated land maps are important for understanding the patterns and impacts of consumptive water use by agriculture. We built annual, 30 m resolution irrigation maps using Google Earth Engine for the years 1986–2018 for 11 western states within the conterminous U.S. Our map classiﬁes lands into four classes: irrigated agriculture, dryland agriculture, uncultivated land, and wetlands. We built an extensive geospatial database of land cover from each class, including over 50,000 human-veriﬁed irrigated ﬁelds, 38,000 dryland ﬁelds, and over 500,000 km 2 of uncultivated lands. We used 60,000 point samples from 28 years to extract Landsat satellite imagery, as well as climate, meteorology, and terrain data to train a Random Forest classiﬁer. Using a spatially independent validation dataset of 40,000 points, we found our classiﬁer has an overall binary classiﬁcation (irrigated vs. unirrigated) accuracy of 97.8%, and a four-class overall accuracy of 90.8%. We compared our results to Census of Agriculture irrigation estimates over the seven years of available data and found good overall agreement between the 2832 county-level estimates (r 2 = 0.90), and high agreement when estimates are aggregated to the state level (r 2 = 0.94). We analyzed trends over the 33-year study period, ﬁnding an increase of 15% (15,000 km 2 ) in irrigated area in our study region. We found notable decreases in irrigated area in developing urban areas and in the southern Central Valley of California and increases in the plains of eastern Colorado, the Columbia River Basin, the Snake River Plain, and northern California.


Introduction
In the Western U.S., over 80% of extracted freshwater is used for irrigation (i.e., artificial application of water to crops by humans), 56% of which is consumed by crops (i.e., lost to the spatial extent and variability of individual agricultural fields and associated volumes of consumptive water use [39]. Previous SRS studies focused on mapping regional to global-scale irrigation often depend on census estimates of irrigated land area or existing land-use and land-cover datasets to parameterize irrigation models. Examples include the Global Irrigated Area Map (GIAM [40][41][42]), Landsat-based irrigation dataset (LANID [43]), and the Moderate Resolution Imaging Spectroradiometer Irrigated Agriculture Dataset for the U.S. (MIrAD-US [44,45]). These studies aim to reproduce the reported irrigated extent with added spatial detail, often using a greenness index threshold in which pixels are considered irrigated. While several studies produce annual irrigated lands data [46][47][48][49][50][51], to our knowledge, none are available for the Western U.S. A significant advance in annual, high resolution mapping of irrigated areas was recently achieved over the High Plains Aquifer (HPA) of the Central U.S. by Deines et al. [50,51]. This approach used an independently developed dataset to train a Random Forest (RF) model, a non-parametric ensemble decision tree classification and regression algorithm [52]. They mapped historical irrigated lands annually from 1984 to 2017 at 30-m resolution within a 625,000 km 2 study area with 91.4% overall accuracy. They used a novel approach to overcome imagery gaps and commission errors, and parameterized their model with neighborhood greenness indices and many ancillary datasets. Drivers of irrigated area [50] and projections of High Plains Aquifer decline [51] were also studied.
RF has been successfully implemented in many SRS-based land classification studies on mixed land types [53][54][55], and for classification of agricultural land uses [56][57][58][59][60]. RF has been shown to be a reliable and fast algorithm for remote sensing applications, suited to handling high-dimensional and colinear data, insensitive to overfitting, and explanatory of variable importance [61].
Here, we describe a Landsat-based irrigation detection RF model, IrrMapper, to map annual irrigation status at 30-m resolution. We use a similar approach and build on the previous work of Deines [50], by expanding the spatial scope and parameterizing the RF model with more extensive training, climate, land use, and other geospatial datasets. IrrMapper produces irrigation status wall-to-wall across the Western U.S., and is independent of USDA NASS irrigation statistics, allowing for an independent comparison to Census of Agriculture data as described in the following sections.

Methodological Overview
IrrMapper uses a RF modeling approach to predict four land classes of irrigated agriculture, dryland agriculture (i.e., crops receiving water only from precipitation), uncultivated lands, and wetlands at an annual time step, and at 30-m spatial resolution across the Western U.S. The RF model is parameterized using a large set of training data of both the target class (i.e., irrigation) and non-target classes (e.g., uncultivated), and numerous geospatial and climatic datasets. The training data consist of manually developed Geographic Information System (GIS) field boundary polygons and attributes of irrigation-equipped and unirrigated lands developed by numerous state and federal agencies, and research institutions. Input parameter data are geospatial and climate datasets including Landsat and aerial imagery, terrain and land use data, and precipitation, temperature, and evaporative demand. We sampled 132 parameter values from geospatial and climate datasets at 60,000 randomly distributed training points within our field polygon training dataset, and used them to train and apply the RF algorithm to predict and perform accuracy assessment of irrigation status classes across the Western U.S. We used Google Earth Engine (GEE [62]), a cloud-based geospatial analysis platform and multi-petabyte catalog of geospatial data and satellite imagery to access all imagery used in training data development, compile all model input data, to parameterize and train the RF model, to predict land class, and to extract results and validation data. All services from GEE were free.

Study Area
The study area consists of 11 Western U.S. states of Arizona, California, Colorado, Idaho, Montana, New Mexico, Nevada, Oregon, Utah, Washington, and Wyoming, an area of 3.1 million km 2 (Figure 1). This region is more arid than the eastern U.S. with exceptions in the Pacific Northwest and regions of northern California. Annual precipitation in the study area ranges from a minimum of approximately 60 mm year −1 in southeast California to over 3000 mm year −1 in the Cascade Mountains of Washington. Evaporative demand ranges from approximately 500 mm year −1 in the Cascade Mountains of Washington to over 2600 mm year −1 in southern Nevada. The Southwest U.S. is dominated by summer monsoonal precipitation, while the northern and Pacific zones receive the majority of precipitation in the winter, much of it in the form of snow. In general, the climate transitions from Pacific coastal and Mediterranean to continental, from west to east.

Landsat and Aerial Imagery
We extracted 132 parameters to use as input data to the model exclusively from datasets with continuous coverage of the entire study region and study period. The 30 m resolution Landsat data used in this work provides six optical bands collected from the Landsat TM, Landsat 7 ETM+, OLI sensors: red, green, blue, near infrared, and two shortwave infrared bands. We used the Landsat Collection 2 Surface Reflectance product, the highest level of processing currently available. Landsat 5 TM and Landsat 7 +ETM surface reflectance data have been corrected for atmospheric conditions and viewing angle geometry using the Landsat Ecosystem Disturbance Adaptive Processing System (LEDAPS [63]) algorithm. Landsat 8 OLI surface reflectance data were processed using the Land Surface Reflectance Code (LaSRC [64]). For each year, we calculated the mean surface reflectance for each of the six optical bands for four periods: March 1-May 1; May 1-July 1; July 1-September 1; and September 1-November 1. We also calculated the maximum, minimum, and mean per-pixel Normalized Difference Vegetation Index (NDVI) for each year. We did not attempt to perform a radiometric cross-calibration between Landsat instruments; differences between processed surface reflectance images exist but are small [65].
Our study area consists of 186 Landsat path-row scenes (Figure 1), each of which was revisited every 16 days by each Landsat mission during the study period. Simultaneous operation of Landsat 5 and 7 from 1999 to 2012 and Landsat 7 and 8 from 2013 yields an 8-day revisit time during 20 years of our 33-year study period, a total of 269,241 available scenes. In May 2003, Landsat 7 suffered a scan line corrector hardware failure (SLC-off) resulting in data gaps in image captures covering about 20% of the image area [66]. While the multiple concurrent Landsat operations during most of our study period allowed for data collection everywhere, during the 2012 collection period, only Landsat 7 SLC-off data were available.
We used images from the U.S. Farm Service Agency National Aerial Imaging Program (NAIP) to verify agricultural field boundary accuracy. NAIP provides 3-and 4-channel (i.e., Red-Blue-Green and Red-Blue-Green-Near Infrared) imagery at various resolutions (0.6, 1, and 2 m) from 2003 to present, offered on a state-by-state basis for multiple years. We used the latest available imagery for each state in our data development process, see [67].

Meteorology and Climate Data
The University of Idaho Climatology Lab produced daily surface Gridded Meteorology (gridMet) at 4-km resolution for the conterminous U.S. from 1979 to present [68]. We extracted mean temperature and total precipitation from gridMet for the duration of the four growing season periods and for the preceding water year (i.e., October 1-September 30) to the termination of each of the growing season periods, for each year covered in our training data (28 years). We also extracted the 10th, 50th, and 90th percentile annual minimum and maximum temperature, the annual total precipitation, and the annual total potential evapotranspiration from gridMET. We extracted minimum, maximum, and average monthly temperatures and monthly average precipitation for each month of the calendar year from WorldClim, a 1 km resolution worldwide gridded climate product providing 30-year climate normals based on the period 1970-2000 [69].

Terrain and Land Use Data
We extracted elevation, slope, and aspect from the USGS National Elevation Dataset 1 3 arc-second resolution digital elevation model (DEM). Using the DEM we calculated the Topographic Position Index at 150, 250, and 1250 m [70]. We used the USDA Crop Data Layer [71] and the the USGS National Landcover Dataset [72] to generate binary crop mask and land cover layers.

Training Data
The training and validation datasets for IrrMapper were derived from polygon vector data covering partial areas of each state, obtained from federal and state agencies, and research institutions ( Table 1). All data were stripped of attribution and joined into a database; only geometries were used. Four land classes were represented in the training data: irrigated agricultural fields (Figure 2), dryland agricultural fields, uncultivated lands, and wetlands ( Figure 3).   Table 1 shows the number of polygons and total irrigated training area from each class in each of the 11 Western States.
We assumed the dryland agriculture, wetlands, and uncultivated lands were constant throughout our study period of 1986-2018. We attributed irrigation to irrigation-equipped fields during specific years to account for the possibility that irrigation-equipped fields were fallowed during some years (Table 1). Figure 3. Training data from the unirrigated classes used to train IrrMapper (i.e., wetlands, dryland agriculture, and uncultivated lands). Table 1 shows the number of polygons and total training area from each class in each of the 11 Western States.
Irrigation-equipped polygon datasets that had been developed for specific time periods were verified for those years. Irrigation-equipped polygon datasets without temporal information were generally developed for 4-6 years. These years were chosen to represent a range of climatic variability within the study period found using the Climate at a Glance tool [88], with at least one year of below normal water year precipitation, at least one year of above normal water year precipitation, and at least one year of near-normal water year precipitation ( Figure 4).  Irrigation training data development consisted of two steps: (1) filtering the polygons by NDVI, a common satellite-detected proxy for vegetation density and vigor; and (2) visual inspection. Our filter kept the polygons containing pixels where the lower 15th percentile NDVI of pixels had maximum NDVI greater than 0.5 during either the early or the late summer, May to July and July to October, respectively. Polygons that did not meet the criteria of the filter were ignored. We inspected all polygons resulting from the filtering process using NAIP aerial imagery and Landsat 5, 7, or 8, early, late, and overall summer maximum NDVI. We compared the NDVI with the surrounding natural vegetation and removed any polygons with only partially irrigated extent or where the field boundaries were inaccurate. Our verified irrigation dataset consists of 101,875 features, each corresponding to the year for which it was filtered and then inspected, of which 53,367 are unique agricultural field boundaries covering 14,659 km 2 (1.9% of total training data area). The 48,508 duplicates are fields that were found to be irrigated for more than one year during the years we selected for data development in the state. To our knowledge, this represents an unprecedented collection of verified irrigated areas.
The dryland agriculture training data is almost entirely within the major wheat-growing regions of CO, MT, and WA, with a small amount in the Upper Colorado River Basin in WY, UT, AZ, and CO. The features represent cultivated lands lacking irrigation infrastructure. The dryland data consists of 38,259 fields covering 63,406 km 2 (10.4% of total training data area). These data were inspected for general accuracy using NAIP imagery at several locations but were not systematically verified on a field-by-field basis. The wetlands training data were collected from the U.S. Fish and Wildlife National Wetlands Inventory [75]. We chose 99,697 features at random from the 'Freshwater Emergent Wetland', 'Freshwater Forested/Shrub Wetland', and 'Riverine' classes, covering 2343 km 2 (0.4% of total training data area). The uncultivated class was composed of the USDA Forest Service Roadless Areas Inventory [73], the National Wilderness Preservation System wilderness inventory (comprised of wilderness areas managed by the Bureau of Land Management, Fish and Wildlife Service, Forest Service and National Park Service) [74], and sources of forestry and rangeland data gleaned from states. The uncultivated dataset consists of 39,409 features covering 534,442 km 2 (87.4% of total training data area).
As with the dryland data, the wetlands and uncultivated lands data were inspected to ensure general accuracy, but not systematically verified. We used all the appropriate training data we were able to obtain. The four classes of training data together cover 611,514 km 2 , about 20% of the study region.

Model Training and Classification
IrrMapper is trained using the Random Forest (RF) algorithm, a non-parametric ensemble decision tree classification and regression algorithm. RF chooses random subsets of training samples to train many decision trees and makes a classification based on the mode of the set of trees. In the IrrMapper RF, model hyperparameters were tested using the Scikit-Learn Python implementation of the RF algorithm on our training dataset [89]. We set the number of Rifle decision trees to 100, the number of variables per split to 11, the minimum size of the terminal node to 1, and deactivated the out-of-bag mode in favor of testing accuracy using cross-validation (see below). We then used our hyperparameters to run the GEE implementation of RF.
To extract training data for IrrMapper, pixel sampling locations for 30,000 points within the irrigated areas and 10,000 points within each unirrigated class were placed randomly within a 20-m interior buffered extent of the vector coverage for each land class over the study area ( Figure 5). The points within the irrigated coverage were attributed with the year for which that field polygon had been verified as irrigated, while the other classes were randomly assigned a year from the 28 years we had irrigation training data. We used GEE to then create a composite image of both static (i.e., land cover, terrain, and climate) and dynamic (i.e., Landsat, Landsat-derived indices, and meteorology) gridded data. Each pixel value was extracted for each sample point and returned in a table for use in training the RF algorithm. We trained the RF algorithm within GEE and predicted land class using the 132-layer stack of input rasters over the entire study area each year 1986-2018. While IrrMapper is trained and predicts using four land cover classes, in a final processing step, the three unirrigated land classes are grouped into a general 'unirrigated' class, to give a binary irrigated/unirrigated classification result over the study region. To assess variable importance, we ran the Scikit-Learn implementation of the RF model using our IrrMapper hyperparameters over ten iterations to extract the average feature importance of our model parameters.

Model Cross Validation
To validate our GEE-based IrrMapper RF model, we extracted a sample set of 60,000 points using the same procedure as described above for training points extraction. Points located within a 60-m buffer of the original training dataset were removed. A random subset of 10,000 points from each class was then used to extract results from GEE and calculate a confusion matrix. Additionally, a random subset of points, the number of which for each class was weighted according to the relative area of each of the training classes, was selected for use in further assessment as discussed below. This provided a dataset for a spatially independent cross-validation and allowed us to use the maximum quantity of data in GEE to train the RF without holdouts.

Comparison with National Agricultural Statistics Service Data
For comparison purposes, we compiled Census of Agriculture data for 1987-2017 to find semi-decadal, county-level irrigated area. We aggregated data for years 1987, 1992, and 1997 from [90] and years 2002, 2012, and 2017 from Quick Stats [2]. To remove outliers in our comparison of NASS data with IrrMapper, we masked any pixel location where irrigation was detected for less than five years over the 33-year study period.

Calculation of Irrigated Area Change
To capture change in irrigated area over the course of the study period, we processed 'early' and 'late' irrigation-equipped masks. These masks represent areas where irrigation was detected during at least two of the five-year periods at the beginning and the end of the study period. We resampled these rasters to a 4-km resolution grid and calculated the change in irrigated area per 16 km 2 pixel.

Results
IrrMapper consists of 33 annual, 30-m resolution maps of the binary classification of irrigation status of the western 11 states, 1986-2018. We used GEE to train the RF and predict over the entire study region annually, producing a GEE Image Collection of 33 maps at 30 m resolution. Computation time for training and prediction was about 60 h ( Figure 6).

Model Accuracy
Using 40,000 points for cross validation, we found an overall binary classification accuracy of 97.8% for classification of irrigated vs. unirrigated lands at the validation point locations. False positive prediction of unirrigated land as irrigated by IrrMapper dominated the model error, accounting for 88% of false classifications. IrrMapper has some limitations in discriminating between non-agricultural classes and shows a high level of confusion between the wetland and uncultivated lands classes in the validation data (Table 2). We found an overall accuracy of wetland vs. uncultivated classification by IrrMapper of 88.2%. Wetlands classification in terms of producer's accuracy was the lowest of the four classes at 77.1%. IrrMapper discriminates with a high level of accuracy between irrigated and dryland classes, however, and has an overall irrigated vs. dryland classification accuracy of 99.1%. IrrMapper had producer's accuracy of 98.9% and 96.6% for irrigated and dryland classes, respectively. Table 2. Confusion matrix of the four-class cross validation dataset, comparing the spatially independent, randomly sampled cross validation dataset of training data (i.e., 'Actual') and IrrMapper inference (i.e., 'Predicted'). The limitations of the IrrMapper training data caused by the limited geographic extent of irrigated areas in our training data become apparent when the cross validation data are grouped into binary classes (i.e., irrigated and unirrigated) and weighted for the relative area of each training dataset (Table 3). While the overall accuracy of the weighted cross validation dataset is 98.6%, a small number of false positive classifications of unirrigated lands led to a low producer's accuracy of 57% for the irrigated class. Table 3. Confusion matrix of the binary cross validation dataset weighted according to areal extent of the training data. The points are a spatially independent, randomly sampled cross validation dataset of training data (i.e., 'Actual') and IrrMapper inference (i.e., 'Predicted').

Variable Importance
Of the 132 parameters used in the study, the ten most important, in descending order, are CDL classification, NLCD classification, late summer near infrared, mid-summer near infrared, calendar year maximum NDVI, previous year maximum NDVI, latitude, terrain slope, two year's previous maximum NDVI, and mid-summer red (Figure 7

Comparison with NASS Data
IrrMapper shows good agreement with the NASS agricultural statistics (Quick Stats) at the state scale and for counties with high irrigated area (Figures 8 and 9). Counties with low NASS-reported irrigated area have large relative differences with IrrMapper. Statewide estimates of irrigation matched well with NASS reported statistics over the seven years of available data from NASS (r 2 = 0.94). The county NASS data and IrrMapper had a lower level of agreement (r 2 = 0.90). IrrMapper and NASS show general agreement on the study area trends over the study period; both show relatively low irrigated area at the beginning of the study, a peak in the late 1990s, and increasing irrigation toward the end of the study (Figure 9).
IrrMapper tends to make lower estimates of irrigated area along the Pacific coast and in semi-arid areas where irrigation density is low ( Figure 10). IrrMapper tends to make higher estimates of irrigated area in counties with urban centers and counties on the eastern plains. The best overall agreement between IrrMapper and NASS was found in the states of Idaho and Utah.   , 1987, 1992, 1997, 2002, 2007, 2012, and 2017). Positive values indicate where IrrMapper made larger estimates than NASS.

Trends in Irrigation
IrrMapper detected a general increase in total irrigated area over the course of the study period of 15.4%, from 97,100 km 2 in 1986 to 112,100 km 2 in 2018, with the maximum irrigated area reaching 116,100 km 2 in 1998, and the minimum irrigated area of 91,900 km 2 in 1992 ( Figure 9 IrrMapper shows a general increase in irrigated area in the major arid and semi-arid agricultural regions around the west, including the eastern Columbia River Basin, the Snake River Plain, eastern Colorado and New Mexico, and southern Arizona (Figure 12). Notable decreases in detected irrigation were found in the Treasure Valley of Idaho, the southern Central Valley in California, and the western slope of the Columbia River Basin.

Discussion
Results of this study show IrrMapper classifies irrigated areas with a high degree of accuracy when tested on a spatially independent validation dataset (Table 2). Overall accuracy of IrrMapper in terms of irrigated vs. unirrigated classification (97.8%) is higher than comparable maps (MIrAD-US, 92%; LANID, 94%; and HPA, 91.4%). The skill of IrrMapper classification suggests the selected input data has a strong correlation with each of the target classes, and demonstrates the suitability of most predictive variables, i.e., land cover, Landsat satellite data, geographic location, and terrain ( Figure 7). Further, IrrMapper validation results (Table 2) suggest the inclusion of training data from a vast representation of geographic locations, climate conditions, and meteorological scenarios enables high-accuracy classification over the extremely varied spatiotemporal domain of our study.
When weighted by relative area of training data, validation results suggest over-prediction of irrigation by IrrMapper ( Table 3). The relative contribution of each unirrigated class to over-prediction can be inferred from Table 2, where misclassification of unirrigated land as irrigated (i.e., false positive) is much more common than the misclassification of irrigated as unirrigated (i.e., false negative). This is likely a result of both the unbalanced area of training data from each class and the unclear differentiation between irrigated areas and wetlands in the wetland training data. Over 97% of the total training data area is composed of the uncultivated and dryland classes. As these land uses represent the majority of both our training data and the study area as a whole, a low rate of false positives likely leads to a small but significant contribution to total irrigated area from unirrigated lands. This is evident in results over known uncultivated and dryland areas, where false positive classification of single or small groups of pixels is noted. This problem may be mitigated by using a noise removal technique in post-processing, as done by Deines [51]. While wetlands data represent a small fraction of the training area, during model development we found the inclusion of those data to be critical to IrrMapper's discriminative power in riparian areas where adjacent wetlands and irrigation are common and share a similar appearance. However, inspection of our wetland data reveals areas where irrigation likely occurs as evidenced by simple diversions and ditch networks. It is often unclear in NAIP imagery where areas supplied with irrigation water end and wetlands begin. In our training data and in nature, the existence of wetlands and irrigation in the same place is possible, and therefore both semantic and physical distinction between irrigated areas and wetlands is blurred. This problem may be overcome by restricting the wetlands training data to areas where irrigation does not occur.
Comparison of county-level NASS irrigation survey data and IrrMapper results shows general agreement (r 2 = 0.90) with the best agreement in areas with more irrigation and less agreement in counties with low rates of irrigation ( Figure 8). Large relative differences are expected in counties where both estimates are a small fraction of total area (e.g., the northern counties of Arizona). In urban areas with limited irrigated area, IrrMapper generally estimates greater irrigated area relative to NASS. This can be explained in part by Census of Agriculture classification of farms, where only farms expected to produce and sell more than $1000 of agricultural products are surveyed. This approach omits irrigation by golf courses, hobby farms, and playing fields, areas which are detected by IrrMapper and may represent a large portion of total irrigation in urban and desert landscapes. The bias toward false positive classification of irrigation in IrrMapper likely also contributes to larger estimates by IrrMapper in counties with extensive dryland and uncultivated lands. In areas of extensive irrigation, results are in better agreement, likely due to higher contribution to irrigation from farms included in the Census of Agriculture survey, and less unirrigated area in which IrrMapper may misclassify land type.
IrrMapper tends to make county-level estimates of irrigated area lower than NASS estimates along the Pacific coast and in arid and semi-arid counties with low density of irrigation ( Figure 10). In the Pacific Northwest, the high relative contribution to crop water requirements from precipitation may allow low irrigation intensity and thus low contrast in satellite images between irrigated and unirrigated areas, and under-classification of irrigation by IrrMapper. Along the coast of Oregon and California, underestimates may be attributable to lower density of IrrMapper training data and under-classification as a result. The most notable region of generally higher IrrMapper estimates are the easternmost counties of the study area in Colorado and New Mexico. These areas likely have significant rates of false positive classification of dryland agriculture as irrigated. This may be caused by sub-annual cropping of dryland agriculture in areas where soil moisture is conserved through the use of herbicides during fallow periods and where subsequent croppings result in a high NDVI relative to adjacent, unirrigated land. Despite disagreement between the two methods, when aggregated over the study area, IrrMapper and NASS show rough agreement on trends in the extent of irrigation; both identify a peak in irrigated area in the mid-1990s, followed by a decline through the 2000s, and a rise toward the end of the study period ( Figure 9). This suggests that, in addition to its capacity to accurately map irrigation at the local scale, IrrMapper also has the capacity to detect regional trends in irrigation at higher temporal resolution relative to NASS.
Spatial trends in irrigation detected by IrrMapper are complex and are likely driven by many factors, including changes in land use, timing of crop planting, crop type, water resource limitations, and changes in irrigation efficiency, and also limitations in the IrrMapper approach ( Figure 12). While analysis of the drivers of changes in irrigation is outside the scope of this paper, we hypothesize several factors that deserve further investigation. We suspect the areas around Phoenix, AZ; Denver, CO; Portland, OR; Ellensburg and Yakima, WA; and Boise, ID have undergone suburban development that has replaced formerly irrigated areas. We suspect demand for fresh winter agricultural produce has driven a change in cropping time from summer to winter in southern California, a period for which IrrMapper is not designed to detect irrigation (see below). We suspect demand for orchard and vineyard crops has led to an increase in their extent. IrrMapper may not detect them due to bias in the training data development toward selection of irrigated fields with high maximum summer NDVI (see below), and that irrigation of vineyards and orchards may drive a weaker NDVI response due to crop spacing. We suspect formerly irrigated areas in Nevada, Colorado, and New Mexico have been retired due to legal and physical limits on water availability. Widespread increases in irrigated area may be due to irrigation development, and use of more efficient irrigation application equipment and thus expansion of irrigated area despite constant rates of water extraction. Deines et al. [50] studied changes in irrigation over the High Plains Aquifer; previous-year commodity price was found to be positively correlated to irrigated area, while irrigation volume and depth were negatively correlated with precipitation. Such studies of the drivers and patterns of irrigation and water use in the Western U.S. may be enabled in the future by IrrMapper.
IrrMapper limitations are likely due to its simple model parameterization and bias in the training data development process. A central assumption of IrrMapper is that the irrigation occurs during the March-November time period. The assumption that the growing season occurs between March 1 and November 31 may contribute to under-classification of irrigation in areas with a winter growing season. This is apparent in areas such as the southern Central Valley and Imperial Valley in California and Yuma, AZ, which have seen decreases in irrigated area according to IrrMapper (Figure 12). We ran a sub-model 'IrrMapper LCRB' for the Lower Colorado River Basin, and found that when the growing season is extended to the entire year, IrrMapper detects more irrigated fields. This suggests IrrMapper may benefit from customized parameterization within specific regions. Further, IrrMapper does not explicitly model the temporal dynamics of the Landsat spectral signal. IrrMapper uses the mean surface reflectance for each growing season period and thus information on the spectral dynamics of each location within that period is lost. Including temporal data associated with specific image captures may improve IrrMapper's ability to discriminate between land classes that experience distinct temporal dynamics in spectral response through the year, but have similar spectral means.
While the geometry of the fields was created by experts, the filtering process depended only on a set of NDVI statistics. This approach may systematically exclude areas that are sparsely irrigated and show a weak NDVI response, adding bias to the model. An effort was made to represent various land types, including those with weaker NDVI signal (e.g., vineyards and widely spaced orchards), but, in some cases, irrigated fields were removed from the data because the field included areas that were not reached by the irrigation equipment. The training data are thus biased toward intense irrigation, and likely fail to detect irrigation in areas with infrequent or low-intensity irrigation. The assumption of static land cover in the unirrigated classes (i.e., dryland, wetland, uncultivated) may also introduce error in the training data where land class has changed during the study period. The assumption is probably best for the uncultivated class (e.g., national forest, roadless areas), and weakest for the dryland class, where conversion to irrigation may occur. We suspect the locations where dryland was converted to irrigated are likely limited in our training data because the geospatial data development occurred recently.
IrrMapper is an improvement over previous mapping efforts in the Western U.S. given the large geographic and temporal extent of both training data and our predictions. Further, our predictions depend only on our independently verified training data, compared to many previous efforts where irrigation models have depended on agricultural census data to parameterize models using spectral thresholds (e.g., LANID, MIrAD-US). While these models effectively leverage the predictive power of irrigated areas' spectral signature, they rely on agricultural statistics and therefore incorporate both the error in the survey and irrigated areas excluded from the tabulation according to the criteria of the agricultural survey. Further, they may not be suited to generalization in time, as the conditions during census years may not be representative of regional climatic variability. Models that 'tune' to the agricultural statistics during only one or several growing seasons may mistake irrigation status when the model is applied to the same place under different climate or economic scenarios [4,91]. As the training data used in IrrMapper represent the wide range of climatic, spatial, and temporal variability we observe in the West, the model can be relied on to make good predictions for years without training data. Further, IrrMapper uses existing land use classification models (i.e., NLCD and CDL) as input parameters, rather than as training data or as a mask for areas not considered agricultural land by those model products (AIM-HPA, LANID, and MIrAD-US). This allows the model to determine the relative importance of these parameters, rather than using them as a mask and thus incorporating the error inherent in the land use data into the map. IrrMapper is created independently of the NASS agricultural statistics, and can thus be used as an independent comparison to examine both existing irrigation maps and historic agricultural census data.

Conclusions
Water resources management in the Western U.S. requires accurate, timely, and high resolution irrigation maps. These maps are a critical resource in assessing the impact of irrigation on human and ecological systems and quantifying irrigated water consumption. Despite the critical importance of irrigation, the high spatial and temporal resolution mapping of its occurrence is currently lacking. IrrMapper introduces the high resolution mapping of irrigation annually, 1986-2018, over the Western U.S. Using IrrMapper, we found that irrigated area in our study region has ranged from 91,900 km 2 in 1992 to 116,100 km 2 in 1998. Irrigation increased by about 15% over the study period, from 97,100 km 2 in 1986 to 112,100 km 2 in 2018. We found that IrrMapper compares favorably with NASS agricultural census data, especially in areas of high irrigation density. IrrMapper differs most from NASS census data along the Pacific Coast, the eastern margin of the study area in Colorado and New Mexico. IrrMapper demonstrates the ability of a RF-based method to accurately map irrigation at a sub-continental scale. Future work should use a temporal parameterization and investigate the underlying drivers of change in irrigated area in the Western U.S. Data for this project is available at https://code.earthengine.google.com/ c5a2ce562c867e6a31216128ad159d96, and the code at https://github.com/dgketchum/EEMapper/ tree/IrrMapper_RF.