Agricultural Crop Change in the Willamette Valley, Oregon, from 2004 to 2017

The Willamette Valley, bounded to the west by the Coast Range and to the east by the Cascade Mountains, is the largest river valley completely confined to Oregon. The fertile valley soils combined with a temperate, marine climate create ideal agronomic conditions for seed production. Historically, seed cropping systems in the Willamette Valley have focused on the production of grass and forage seeds. In addition to growing over two-thirds of the nation’s cool-season grass seed, cropping systems in the Willamette Valley include a diverse rotation of over 250 commodities for forage, seed, food, and cover cropping applications. Tracking the sequence of crop rotations that are grown in the Willamette Valley is paramount to answering a broad spectrum of agronomic, environmental, and economic questions. Landsat imagery covering approximately 25,303 km² was used to identify agricultural crops in production from 2004 to 2017. The agricultural crops were distinguished by classifying images primarily acquired by three platforms: Landsat 5 (2003–2013), Landsat 7 (2003–2017), and Landsat 8 (2013–2017). Before conducting maximum likelihood remote sensing classification, the images acquired by Landsat 7 were pre-processed to reduce the impact of the scan line corrector failure. The corrected images were subsequently used to classify 35 different land-use classes and 137 unique two-year-long sequences of 57 classes of non-urban and non-forested land-use categories from 2004 through 2014. Our final data product uses new and previously published results to classify the western Oregon landscape into 61 different land-use classes, including four majority-rule-over-time super-classes and 57 regular classes of annually disturbed agricultural crops (19 classes), perennial crops (20 classes), forests (13 classes), and urban developments (5 classes). These publicly available data can be used to inform and support environmental and agricultural land-use studies.


Introduction
Remote sensing of the Earth provides an unimaginable wealth of data about our planet. However, the images are rarely used in their raw form; rather, they are generally interpreted using various analytical techniques. Landsat is a satellite program that has earned a preeminent place in Earth surveying. The Landsat program (first launched in 1972) has provided uninterrupted coverage of the Earth; an area is observed approximately every 16 days with a current spatial resolution of 15 or 30 m. However, Landsat sensors have not always operated as expected, as in the case of the Landsat 7 scan line corrector (SLC) failure. Even when the Landsat missions exhibited difficulties, the images supplied by the satellite were so valuable that they continued to be acquired after the failures were noticed. To correct the recording failures, various procedures were developed to fit specific objectives, such as the procedure of Mueller-Warrant [1] for agricultural crops and other landscape features. Once the corrections were applied, the images were post-processed [2], then interpreted using parametric or non-parametric procedures.
The constant flow of Landsat images has supplied the data for accurately assessing land use and changes in land use across large swaths of land. Previous studies have used spectral analysis to develop algorithms that extract information relevant to land management from time series images [3][4][5]. Many of the investigations using Landsat imagery were confined to Landsat data only; others included additional imagery to enhance the accuracy of the results, such as images obtained with unmanned aerial systems [6] or other spaceborne platforms, such as Sentinel [7]. The vast majority of the methods used for image analysis are supervised procedures, which range from traditional maximum likelihood to neural networks and classification and regression trees [8,9]. In many instances, when multiple land-use classes were considered, the accuracy of the results was compromised, with values less than 0.5 being common [9]. Therefore, significant efforts were placed on developing algorithms that improve accuracy, such as [10], which obtained data for 16 classes with an accuracy of 88%, or [11], which obtained data for 10 classes with an accuracy of 98%. Increases in the accuracy of these methods are often not combined with increases in the number of classes, which would raise the utility of the results. Therefore, the objective of the present study is to present a series of images that depict in detail (i.e., with a large number of classes) the landscape of the Willamette Valley, Oregon, USA. The series of images was obtained by combining two datasets derived primarily from remote sensing classifications of Landsat images. The datasets, obtained from 2004 to 2014 in the first case, and through 2017 in the second case, have been used in several studies conducted by the USDA-Agricultural Research Service [1,12,13].
In the last two years, the classified images were further enhanced by additional sets of constraints, which eliminated the majority of situations where logically impossible crop sequences initially occurred. The data are suitable for use in landscape management, as well as in land-use change, forest cover dynamics, landscape ecology, and assessing urban developments.

Study Region and Data Description
The object of the present study is the entire Willamette River Basin and the neighboring areas, a region located in western Oregon (Figure 1) covering 25,303 km². The Willamette Valley is a complex alluvial region encompassing agriculture, urban developments, and forest. It is bounded to the west by the Coast Range and to the east by the Cascade Range. In the north, the Columbia River limits the Willamette Valley, while it ends near the Klamath Mountains to the south. The north and center of the valley are primarily urban and agricultural land, and host the largest cities in Oregon: Portland, Salem, and Eugene.
The valley has rich soils and relatively flat topography, which make it a highly productive agricultural area. A significant portion of the fertility of the Willamette Valley is the result of a series of ice-age floods that originated at Lake Missoula, Montana, and swept topsoil to the Columbia River Gorge [14]. The deep deposits of gravel, silt, and clay caused by the floods braided a river system meandering through the valley, with loose and unconsolidated sub-surface channels [14]. The climate of the Willamette Valley is influenced by the Pacific Ocean, which is located less than 100 km to the west. The winters are cool and wet, whereas the summers are warm and dry. The mean high temperature is 28 °C in the summer and 8 °C in the winter. The mean low temperatures are approximately 12 °C in the summer and 1 °C in the winter [15]. The climate is similar to the Mediterranean climate, with slightly wetter and cooler winters [15]. The moisture is generally adequate throughout growing seasons, which are long and productive compared to the surrounding regions.
To provide the readers with a comprehensive set of data, we have included two datasets with the classified images: one representing the digital terrain model (WillametteValley_DTM.zip) and one representing the stream network in the Willamette Valley (WillametteValleyHydrography.zip).
The data presented in the study are based on the imagery acquired by Landsat 5, 7, and 8, corrected with the Mueller-Warrant procedure [1] for the failure of the SLC in Landsat 7, which occurred on 31 May 2003. The SLC correction method segmented defect-free images based on localized multiband uniformity, assessed with principal component analysis (PCA). The mapping rules extracted using principal components are subsequently used to fill in the missing values. This method is attractive because it has no parameters to be set by the users and performs as expected across diverse terrestrial scenes.
The Python code implementing the Mueller-Warrant [1] SLC correction method and the corrected Landsat images are hosted by the Harvard Dataverse [16]. This method also fills data gaps caused by the presence of small, scattered clouds in any imagery source. Many of the alternate methods for filling SLC- or cloud-related data gaps are best suited to relatively infrequent occurrence of such problems in fairly homogeneous landscapes managed in relatively simple crop rotations. In contrast, the highly diverse landscape and cropping systems of western Oregon required a total of 57 land-use classes to capture a complete picture of the complex landscape, a number that far exceeds those reported in most other studies. For example, the USDA-National Agricultural Statistics Service Cropland Data Layer reports classify all grass seed crops into two categories (class 56 sod/grass seed or class 176 grassland/pasture), while our methodology identified 10 classifications. The change detection methods of Ghaderpour and Vujadinovic [4] and the spatiotemporal fusion methods of Guo et al. [3] focus on creating estimates for missing vegetation greenness indices (NDVI or EVI) in image series that are fairly well sampled over time. Because 82% of the Landsat imagery for western Oregon over the 14-year period we investigated was too cloudy to use in any manner, it was not feasible to gap fill missing data using NDVI- or EVI-based approaches. Likewise, while NDVI is useful, it is unable to distinguish highly similar crops such as annual ryegrass, perennial ryegrass, tall fescue, and orchardgrass grown for seed; rather, classification of such crops is greatly improved by inclusion of all available bands.
In our preliminary studies, we also found that the three to four most significant principal components were equivalent to the full set of all raw imagery bands in the performance of the maximum likelihood (ML) remote sensing classifier, while reducing running time and minimizing the occurrence of 0-variance errors, which arise when two ground-truth training classes have identical signatures over one or more image bands, whether in the raw data format or in PCA.
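The dimensionality reduction described above can be sketched in a few lines of NumPy. This is a minimal illustration only, not the authors' ENVI workflow: the function name `pca_reduce`, the `(bands, rows, cols)` array layout, and the use of NaN to mark masked pixels are all assumptions made for the example.

```python
import numpy as np

def pca_reduce(stack, n_components=3):
    """Project a (bands, rows, cols) stack onto its leading principal components.

    Pixels with NaN in any band (e.g. masked clouds; an assumption of this
    sketch) are excluded from the covariance estimate and returned as NaN.
    """
    bands, rows, cols = stack.shape
    pixels = stack.reshape(bands, -1).T.astype(float)   # (n_pixels, bands)
    valid = ~np.isnan(pixels).any(axis=1)

    # Center the valid pixels and diagonalize the band covariance matrix.
    centered = pixels[valid] - pixels[valid].mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)               # ascending eigenvalues

    # Keep the components with the largest eigenvalues.
    order = np.argsort(eigvals)[::-1][:n_components]
    scores = np.full((pixels.shape[0], n_components), np.nan)
    scores[valid] = centered @ eigvecs[:, order]

    explained = eigvals[order] / eigvals.sum()           # variance fractions
    return scores.T.reshape(n_components, rows, cols), explained
```

The returned variance fractions give a quick check on how much information the retained three or four components carry before the reduced stack is passed to a classifier.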
In brief, PCA is used to identify the most homologous regions of a defect-free image across all bands, and to quantify the uniformity on varying scales. Regions with greatest uniformity can supplement any missing data (in another image date) by filling the missing data with the average pixel value of the corresponding $2^N$-sized blocks in the defect-free image ($N \in \{2, 3, 4, 5, 6\}$, values which were selected such that the region was wide enough to encompass missing and non-missing pixels and small enough to minimize the estimation error). Once the rules for supplementing the missing data have been implemented, the gap-filled data can be used directly in ML remote sensing classifications. The gap-filled data can also be subjected to PCA to reduce the number of dimensions in the data being provided to the ML classifier to somewhere well under half of the original band count, or could be subjected to further processing steps, such as the calculation of EVI, NDVI, or absolute rather than relative reflectance. Our success in classifying a large number of diverse land-use classes in western Oregon using PCA reformulation of the gap-filled data rather than the alternative full set of data for all raw bands is not meant to answer any general questions of the suitability of running ML classifiers on raw data versus PCA reduced-dimension data. In our limited exploration of the topic, the use of PCA had no apparent drawbacks and at least one major benefit: reduced run-times of the ML classifier in ENVI.
The classified images and the ground-truth shapefiles are stored in the ScholarsArchive at Oregon State University. All the spatial files are archived into one file, named WillametteValleyClassified.tar. The tar dataset contains two zip file archives: one called GroundTruth.zip and one called WillametteValleyClassified.zip.
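The block-averaging idea from the gap-filling description earlier in this section can be illustrated with a deliberately simplified sketch. It is not the published Mueller-Warrant code (which is hosted separately [16]): it uses a single fixed block size instead of choosing among several values of N by uniformity, it assumes that the block size refers to the side length in pixels, and it assumes that gaps are marked with NaN. The function name `block_mean_fill` is hypothetical.

```python
import numpy as np

def block_mean_fill(target, reference, n=3):
    """Fill NaN gaps in `target` with block means taken from `reference`.

    `reference` stands in for the defect-free image; blocks are 2**n pixels
    on a side (an assumption of this sketch).
    """
    size = 2 ** n
    rows, cols = target.shape
    filled = target.astype(float).copy()
    for r0 in range(0, rows, size):
        for c0 in range(0, cols, size):
            window = (slice(r0, r0 + size), slice(c0, c0 + size))
            gaps = np.isnan(filled[window])
            if gaps.any():
                # Basic slicing returns a view, so this writes into `filled`.
                filled[window][gaps] = np.nanmean(reference[window])
    return filled
```

Pixels that are already valid in the target image are left untouched; only the gaps receive the reference block average.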
GroundTruth.zip contains the shapefiles (the ESRI format for spatial data) with the field data collected to verify the accuracy of the classification for all the years from 2004 through 2017, inclusive. For each year, the following set of seven files, which together define the shapefile, describes the ground truth for that year: a dbf, a prj, a sbn, a sbx, a shp, a shp.xml, and a shx. Each file stores a particular type of spatial information, whose details are provided by ESRI [17]. The files are responsible for describing the geometry itself (shp), storing the positional index of the geometry (shx), the attributes for each shape in dBase IV format (dbf), the projection of the shapefile and the coordinate systems (prj), the spatial index of the features found in the shapefile (sbn and sbx), and the metadata in XML format (shp.xml). The specific ground-truth data used for classifying 12 years of imagery (from 2003 through 2014) into 57 land-use categories for 11 years of crops harvested from 2004 through 2014 plus urban development and forests in our previously published reports [15,16] are not included in the archived data. Rather, a 14-year-long version (from 2004 through 2017 harvests) covering strictly the annual and perennial agricultural crops as 35 categories of land use, but excluding all forests and urban development, is presented in the archive. The name of each classified image starts with "Landsat," the spaceborne platform, followed by the year and the date on which the file was checked and prepared for upload. The ground-truth files are named using a similar structure, but start with "sygt," for "surveyed yearly ground truth," followed by the year when the survey was executed. Immediately after the surveying year, the name of the ground-truth files contains the details describing the development of the spatial information, namely "train_shapefile_dissolved."
WillametteValleyClassified.zip contains the classified land-use rasters from 2004 to 2017 and the associated files needed to display the rasters correctly and completely. This file also contains two multi-year summaries from the 11-year analysis: the four superclasses (supercl4ver6) and a synthetic average (bignormalver6) that shows where each crop was most likely grown over that period of time. A file structure, similar to that of the shapefiles, was created for each year. This file structure includes: a tfw file, a tif file, a tif.aux.xml file, a lyr file, a cpg file, a vat.dbf file, and a tif.xml file. A complete description of each file is provided by ESRI [17]. The tif file, which contains the bulk of the classification, is a tagged image file that stores the location and the class of each pixel. The tfw file is an ASCII file that stores the pixel size (15, 30, or 60 m), the rotational information, and the world coordinates of the tif file. The tif.aux.xml file accompanies the tif file and contains information that cannot be stored in the tif file itself, such as color maps, statistics, or pointers to the pyramid file. The lyr file is a persistent representation of the raster classification, which links to the location of the tif file and contains the information on how to render the data from the tif file. The cpg file is optional, and is used to specify the character set to be used. The vat.dbf file contains the information defining values, the color used to display each value, and the number of points in the grid that have that value. The tif.xml file stores the metadata of the tif file in XML format. Links between the lyr file and the other associated files for given rasters or shapefiles may break depending on details of the unpacking of the zip file into local file directories, but are easily reconnected in ArcMap using the "Set Data Source" option in "Layer Properties" for "Table of Contents" entries.
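As an illustration of how the tfw file ties pixels to map coordinates, the sketch below parses the six numbers of an ESRI world file and converts a pixel index into world coordinates. The six-line layout (x pixel size, two rotation terms, negative y pixel size, then the x and y of the center of the upper-left pixel) is the standard world-file convention; the sample values and the helper names `parse_world_file` and `pixel_to_world` are hypothetical and are not taken from the archive.

```python
from typing import NamedTuple

class WorldFile(NamedTuple):
    a: float  # line 1: pixel size in x (map units per pixel)
    d: float  # line 2: rotation term for y
    b: float  # line 3: rotation term for x
    e: float  # line 4: pixel size in y (negative for north-up rasters)
    c: float  # line 5: x of the center of the upper-left pixel
    f: float  # line 6: y of the center of the upper-left pixel

def parse_world_file(text):
    """Parse the six whitespace-separated numbers of a world file."""
    values = [float(token) for token in text.split()]
    if len(values) != 6:
        raise ValueError("a world file must contain exactly six numbers")
    return WorldFile(*values)

def pixel_to_world(wf, col, row):
    """Return map coordinates of the center of pixel (col, row)."""
    x = wf.a * col + wf.b * row + wf.c
    y = wf.d * col + wf.e * row + wf.f
    return x, y

# Hypothetical 30 m, north-up raster; the coordinates are made up.
sample = "30.0\n0.0\n0.0\n-30.0\n450015.0\n5060985.0\n"
wf = parse_world_file(sample)
```

With these made-up values, moving ten columns to the right adds 300 map units to x, and moving twenty rows down subtracts 600 from y.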
Layer files used equivalent symbology for both the shapefile ground-truth and the classified imagery rasters.

Land-Use Classification
The classification of the Landsat and other images contains 57 classification categories, which were grouped into four super classes on the basis of majority-rule of time from 2004 through 2014 [13]. The super classes, which were created to ensure the accuracy for general inquiries, were annually disturbed agriculture, perennial crops, forests, and urban development. To simplify the data manipulation, the four super-classes were coded using only numbers, which are:
The land-use classification with all 57 classes, the corresponding super-classes, and their description, as they were originally created by Mueller-Warrant [13], is presented in Tables 1 and 2. The abbreviations used to describe the classes are:
Table 1. Western Oregon 2014 forest and urban development area as previously published [3].

Ground Truth and Training Data
The data serving as ground truth were obtained by surveying agricultural fields in four western Oregon counties from April through July of every year from 2005 to 2017. The surveys were executed near the end of the August through July annual cropping cycle to develop a comprehensive list of crops and crop management, including stand establishment and post-harvest residue management [18]. More than 4000 fields were surveyed in each growing season in the first four years, approximately 3000 fields in the next two years, and 2000 fields in the final seven years. Field sizes ranged from 10 ha to 100 ha, providing a very large number of potential ground-truth pixels. The survey revealed 48 crop types, 15 stand establishment conditions, 7 post-harvest residue management practices, and 16 other management practices or field conditions. Simplification of the list revealed that 46 combinations of crop type/stand establishment condition/residual vegetation management accounted for more than 99% of agricultural land use in the Willamette Valley. To represent the forest, urban, and non-agricultural land use, 11 classes from the most recent NASS-CDL were added: NLCD classes 11, 21, 22, 23, 24, 41, 43, 44, 53, 90, and 95 (Table 1). To minimize the mixing of pixels that belong to different classes, the newly added 11 classes were constrained to include only pixels with values identical to those of their neighbors.
Training data, containing approximately 20,000 pixels per class, were created using simple random sampling. The sample size was selected such that almost half of the pixels from most classes were chosen. The validation data contain all the pixels not used in training, randomly subsampled down, when necessary, to match the size of the training data for each class. Several of the original 19 annually disturbed agricultural crops and 20 perennial crops were difficult to reliably separate from each other in published results from 2004 through 2014; as such, the three annual ryegrass land-use classes (2, 12, and 44) were consolidated into one revised class [1, 12,13,18]. In the present analysis, we have also consolidated land-use classes 3, 7, 10, and 41 for similar reasons, providing a total of 35 agricultural land-use categories, comprising 16 annually disturbed crops and 19 perennial crops.
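The sampling scheme described above can be sketched as follows. This is an illustrative reading of the text rather than the authors' code: the function name `split_training_validation`, the flat label array, and the rule that caps the training sample at half of a small class are assumptions introduced for the example.

```python
import numpy as np

def split_training_validation(labels, per_class=20000, seed=0):
    """Return sorted (training, validation) index arrays for flat `labels`.

    For each class, up to `per_class` pixels are drawn by simple random
    sampling (capped at half the class, an assumption of this sketch). The
    remaining pixels form the validation set, randomly subsampled when
    necessary to match the training size for that class.
    """
    rng = np.random.default_rng(seed)
    train_parts, valid_parts = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = min(per_class, len(idx) // 2)
        rest = idx[n_train:]
        # The permutation is already random, so slicing is a random subsample.
        valid = rest[:n_train] if len(rest) > n_train else rest
        train_parts.append(idx[:n_train])
        valid_parts.append(valid)
    return np.sort(np.concatenate(train_parts)), np.sort(np.concatenate(valid_parts))
```

Because the two index sets are drawn from one permutation per class, no pixel can appear in both training and validation data.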
The previously reported 11-year classifications [12,13] used as many of the full 57 land-use categories as were present in the ground-truth data for each August through July western Oregon cropping year for harvests from 2005 through 2014. Synthetic ground-truth data for 2004 were generated after initial ML remote sensing classifications were calculated for 2005 through 2011, primarily by employing four superclasses to define locations where cropping practices and other land-use categories were most stable and least likely to have changed between 2004 and 2005 [19]. Both the 8-year (ending in 2011) and the 11-year (ending in 2014) ML remote sensing classifications used 57-class ground-truth data to classify the entire 110 km east-west by 230 km north-south area of interest (AOI). After the AOI was subdivided into superclasses (urban development, forest, and a pool of annually disturbed plus perennial crop agriculture), we discovered that superclass land use across years was relatively stable. Therefore, we developed a more accurate approach to classify the categories of greatest interest: the agricultural land uses. In addition to the normal method for ML remote sensing classification of individual years, we also converted the 35 single-year categories into 137 corresponding year-to-year sequences that were logically consistent and sufficiently frequent in their occurrence that ground truth was present for a majority of the 137 different two-year-long sequences (Table 2).
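The conversion of single-year classes into year-to-year sequence classes can be pictured as a lookup between class pairs and sequence codes. The sketch below is a hypothetical illustration: it keeps pairs by frequency only (the `min_count` parameter), whereas the study also required sequences to be logically consistent, a rule set that is not reproduced here. The names `encode_sequences` and `decode_sequences` are assumptions.

```python
import numpy as np

def encode_sequences(first_year, second_year, min_count=1):
    """Map (first-year, second-year) class pairs to consecutive codes.

    Code 0 is reserved for pairs seen fewer than `min_count` times.
    Returns the code raster and a lookup table of shape (n_codes + 1, 2).
    """
    pairs = np.stack([first_year.ravel(), second_year.ravel()], axis=1)
    unique_pairs, inverse, counts = np.unique(
        pairs, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.ravel()  # guards against shape differences across NumPy versions
    keep = counts >= min_count
    code_of_pair = np.zeros(len(unique_pairs), dtype=int)
    code_of_pair[keep] = np.arange(1, keep.sum() + 1)
    codes = code_of_pair[inverse].reshape(first_year.shape)
    lookup = np.vstack([[0, 0], unique_pairs[keep]])
    return codes, lookup

def decode_sequences(codes, lookup):
    """Recover the first-year and second-year class rasters from codes."""
    return lookup[codes, 0], lookup[codes, 1]
```

Decoding a classified sequence raster in this way yields one candidate single-year raster for each of the two years in the pair.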

Imagery
The previously reported land-use classifications from 2004 to 2011 were executed using Landsat 5 TM and Landsat 7 ETM+ images having less than 40% cloud cover for path 46 rows 29 and 30 that were processed to Level 1G standard. For 2012 to 2014, the Landsat 5, 7, and 8 images for path 46 rows 28, 29, and 30, with cloud coverage and processing level similar to the 2003-2011 images, were used. The Landsat imagery was supplemented with available MODIS data resampled to 30-m resolution, the NASA nightlight composite dataset F152008 up-sampled to 30-m pixels and averaged over 25 by 25 rectangular neighborhoods, National Agriculture Imagery Program (NAIP) color composite photographs for years in which they had been collected, and the elevation provided by USGS. Besides spectral and elevation information, the slope and orientation were computed. The orientation was converted to paired orthogonal aspect components having values from 0° to 180°, from north-facing and from east-facing directions. Furthermore, for each Landsat acquisition date, the normalized difference vegetation index (NDVI) was calculated to help differentiate vegetation from other classes [20]:
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}}$$
where NIR and Red represent the spectral intensity in the near-infrared (0.76-0.90 µm) and red (0.63-0.69 µm) ranges, rescaled to 8-bit color depth, which for Landsat are Band 4 and Band 3, respectively. The clouds were masked either manually, by establishing limits within individual Landsat imagery bands, or directly from the image quality band of the MODIS imagery. Cloud masks were not routinely available for Landsat imagery in the early years of the project, and even those eventually included in the later years were often not as useful as the ones created manually. The masks excluded the pixels classified as cloudy from further analysis for all bands and from all individual image dates. To reduce the dimensionality of the datasets, principal component analysis was executed within cloud-free regions for each satellite image using NDVI and the reflectance of all bands (except band 6.2 of Landsat 7). The top three eigenvalues, corresponding to the highest three principal components for Landsat data, and the largest two eigenvalues, corresponding to the highest two components for MODIS data, were selected in our previous analyses. Analyses published before the invention of the Landsat 7 SLC data gap correction method [1] nearly always included varying numbers of usable images across the AOI, forcing the use of multistep schemes to assign classes to individual pixels based on the number of available imagery bands at each point in each of a series of ML remote sensing classifications [13,18].
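A short NumPy sketch of the NDVI calculation is given here for readers who wish to recompute the index from their own imagery. It assumes the standard definition, NDVI = (NIR - Red) / (NIR + Red), applied to near-infrared and red band arrays; the function name `ndvi` and the choice to return NaN where both bands are zero are assumptions of the example.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized difference vegetation index for NIR and red band arrays.

    Pixels where NIR + Red equals zero are returned as NaN instead of
    triggering a division-by-zero warning.
    """
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    out = np.full(total.shape, np.nan)
    np.divide(nir - red, total, out=out, where=total != 0)
    return out
```

Values near 1 indicate dense green vegetation, while values near or below 0 typically correspond to bare soil, water, or built surfaces.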
The ML remote sensing classifications reported here used USGS elevation, slope, EW orientation, and NS orientation, NASA nightlight, NAIP aerial photography, and Landsat 5, 7, and 8 imagery. Landsat 5, 7, and 8 images were rated as suitable for use whenever the data gap correction method had succeeded in repairing SLC failure gaps and in limiting any unrepairable cloudy areas and SLC failure gaps near the clouds to areas of strictly forest or urban development as defined by the 11-year superclass data. The resulting raster data stacks were cloud-free and SLC failure gap-free on nearly 100% of the area for which our 11-year superclass rasters had indicated the presence of annually disturbed agriculture or perennial crops. Any superclass pixels of value 1 or 2 that lacked full raster stack imagery were assigned the average land-use category calculated in the 11-year-long analyses [19]. The four orthogonal components of highest statistical significance in PCAs were recalculated for each Landsat imagery dataset after SLC- or cloud-related data gaps were filled. For the two-year-long cropping sequences, Landsat imagery of relevance was assumed to run from June, prior to the first crop harvest, to September of the second crop harvest.

Remote Sensing Pixel Classification and Object-Based Reclassification
The land-use classification was executed using the ML method [21], as implemented in ENVI-EX software version 5.0 [22]. ML classifications were run using a parameterization similar to [23] for the normal 35-class single-year approach and for a 137-class year-to-year sequence method. To properly function, ENVI-EX requires that ground-truth training data be converted from raster to vector format. Furthermore, all classes in the ground-truth data must align with a large enough number of pixels in the raster data stack to ensure that non-zero variation is present in each band. Hence, even in years where a specific class was present in the ground-truth data, it might be removed due to lack of variation in one or more bands in the raster data stack. Finally, individual classes sometimes disappear during the actual ML remote sensing classification in ENVI-EX when they are not sufficiently unique compared to other, more abundant classes. Because the ML output is automatically recoded into a consecutively ordered set of integers with no skips, care must be taken in transforming the ENVI-EX ML file into the same list of numbers that were used in the ground-truth training data.
After ML classification of the two-year-long sequences, the results were converted back to 35-class rasters for each pair of years in the sequence as the 'early' and 'late' versions of potential land-use classes for each individual year. The 'early,' 'late,' and normal single-year ML classifications were combined in several ways, including a 'strong4' classification raster in which all three versions agreed on the same class value. To improve the accuracy of the classification and reduce year-to-year inconsistency, several object-based procedures were applied to the ML classifications [24]. The first of these summarized the pixel class frequencies within individual physical fields, allowing majority-rule reclassification of all pixels. When these majority-rule values failed year-to-year consistency tests, the second most commonly present value was applied to all pixels within a given field, a change that typically resolved the inconsistency. The majority-rule and second-place reclassification procedures were applied to the 'strong4' data in which the 'early,' 'late,' and normal single-year ML classifications had all agreed. Lacking any obvious reason for choosing one year over another to use the second-place versus majority-rule value, iterative procedures were employed to test various combinations of majority-rule or second-place values over time from the 'strong4' classification in cases where the year-to-year transitions between crops were not logically consistent. Fields where some particular combination over time of the majority-rule versus second-place choices resolved the year-to-year inconsistencies were then marked safe from being changed to alternate values in the next cycle of the process. In addition to the majority-rule and second-place options, we included eight other choices for possible substitution when inconsistencies were detected.
These comprised the 'early,' 'late,' and single-year ML classifications, their three pair-wise agreements, the final optimization from the earlier research ending in 2014, and a normalized raster that matched the 11-year averages across the entire AOI while selecting whichever class occurred most frequently at each pixel's location. Following several hundred cycles of this automated process, the more common inconsistencies that remained within the year-to-year classifications were manually examined and corrected. The most common manual repairs involved recognizing that dense stands of weeds such as annual ryegrass might have masked the planting year of new crop stands, particularly for those that are slow to establish. Other manual repair methods relied on the four super-classes to override sporadic misclassifications, especially those only one or two years in duration. A similar optimization approach had been applied to the previously published analysis covering the period from 2004 through 2014, but with fewer alternatives for substitution to correct year-to-year inconsistencies in cropping sequences (11).
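The field-level majority-rule step with a second-place fallback, described above, can be sketched for a single year as follows. This is a simplified illustration, not the iterative multi-year optimization used in the study: the consistency rule is passed in as a caller-supplied function `is_consistent` (hypothetical, since the actual crop-sequence rules are not listed here), field id 0 is assumed to mean "outside any field," and `previous_year` is assumed to map each field id to its class in the preceding year.

```python
import numpy as np

def field_votes(class_raster, field_ids):
    """Return {field_id: (majority_class, second_class_or_None)}."""
    votes = {}
    for fid in np.unique(field_ids):
        if fid == 0:  # assumption: 0 marks pixels outside any field
            continue
        values, counts = np.unique(class_raster[field_ids == fid], return_counts=True)
        order = np.argsort(-counts, kind="stable")  # ties go to the lower class value
        majority = int(values[order[0]])
        second = int(values[order[1]]) if len(order) > 1 else None
        votes[int(fid)] = (majority, second)
    return votes

def reclassify_fields(class_raster, field_ids, previous_year, is_consistent):
    """Assign each field its majority class, or the runner-up when the
    majority class is inconsistent with the previous year's class."""
    out = class_raster.copy()
    for fid, (majority, second) in field_votes(class_raster, field_ids).items():
        prev = previous_year[fid]
        choice = majority
        if (not is_consistent(prev, majority)
                and second is not None
                and is_consistent(prev, second)):
            choice = second
        out[field_ids == fid] = choice
    return out
```

In the full procedure, this kind of substitution was repeated over many cycles and across all years, with additional candidate rasters available when neither the majority nor the second-place class gave a consistent sequence.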
The use of more than 91,000 physical fields based on USDA-Farm Service Agency common land unit (CLU) shapefile data from 2004 allowed simultaneous improvement in both classification accuracy and year-to-year cropping sequence consistency. Classification accuracy reported here was measured strictly from the newly calculated 35-class values corresponding to locations of superclass 1 and 2 pixels. Overall classification accuracy for the 35 agricultural land-use categories prior to the optimization process ranged from 40.2% (relative to both training and validation sets in 2010) to 96.3% (training sets for both 2016 and 2017, Table 3). Validation accuracies were only slightly smaller than training accuracies in all cases, including both initial values and those following 230 cycles of optimization. Kappa statistics (Table 4) provided nearly identical indications of accuracy as compared to the simple accuracy values (Table 3), suggesting that classes in the ground-truth data were well-balanced and effective in producing reliable classifications. The highest accuracy occurred in the final four years of the analysis, which had nearly double the amount of usable Landsat images available. After 230 cycles of optimization, the accuracy improved in years when it had been lowest. Likewise, accuracy was lowered in the years when it had been highest. Average accuracy over the 14 years increased by 3.9% following optimization. It was also possible to examine the accuracy with which the two-year-long cropping sequences were classified. Accuracy in the two-year sequences was an average of 16% lower after the 230 cycles of optimization, dropping in 12 of 13 sequences (Table 5). Validation set accuracy in the 137-class case, while still very close to training set accuracy, was not quite as close to training set accuracy as it had been in the normal single-year 35-class case.
Highest Kappa statistics for the 137 two-year-long sequences of the training and validation datasets were numerically similar to the simple accuracy data (Table 6), once again affirming the uniformity and reliability of our ground-truth data.
Table 5. Overall classification accuracy by year-to-year sequence of 137 two-year-long transitions for training and validation datasets prior to and following 230 cycles optimizing year-to-year consistency.
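For readers who wish to recompute the two summary measures discussed above from their own validation pixels, the following sketch derives overall accuracy and Cohen's kappa from a confusion matrix. It uses the standard textbook definitions and is not taken from the authors' ENVI workflow; the function names and the assumption that classes are coded as integers from 0 to `n_classes - 1` are illustrative.

```python
import numpy as np

def confusion_matrix(truth, predicted, n_classes):
    """Rows index the ground-truth class, columns the predicted class."""
    matrix = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(matrix, (np.asarray(truth), np.asarray(predicted)), 1)
    return matrix

def overall_accuracy(matrix):
    """Fraction of pixels on the diagonal of the confusion matrix."""
    return np.trace(matrix) / matrix.sum()

def cohen_kappa(matrix):
    """Agreement corrected for the agreement expected by chance."""
    total = matrix.sum()
    observed = np.trace(matrix) / total
    expected = (matrix.sum(axis=0) * matrix.sum(axis=1)).sum() / total**2
    return (observed - expected) / (1 - expected)
```

When many classes have broadly similar numbers of ground-truth pixels, the chance-agreement term is small and kappa stays close to the overall accuracy, which is the pattern reported in the accuracy tables.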

Initial Classification
After  (Table 7) was prepared as a simpler, more reader-friendly summary than the full set of 14 individual years by 35 classes. The two measures assessing classification accuracy were producer accuracy and user accuracy:
$$\text{Producer accuracy} = \frac{TP}{TP + FN}, \qquad \text{User accuracy} = \frac{TP}{TP + FP}$$
where TP is true positive, FP is false positive, and FN is false negative. Out of the 35 agricultural land-use classes, 11 of them occupied an average of more than 100 km²

Results and Discussion
The average overall classification accuracy of 67.6% for the normal single-year perspective was low enough to preclude direct use of these results for site-specific activities such as validating past crop history of land for certification of grass seed crops. The much higher accuracies for certain crops, such as clover, blueberries, wild rice, and mint, however, suggest that these data could be directly useful in identifying issues such as pollination or refuge sites for bees, or disease control or limiting of cross-pollination among crops through physical spacing of fields. Because the classification data are approximately correct when averaged over time or over moderately large spatial distances (~10 km), they should be directly useful in studies of landscape-scale issues such as water quality and the distribution and abundance of birds, fish, amphibians, and other wildlife. Our earlier analyses have already been used to characterize general crop rotation patterns, current ages of particular perennial crop stands, and the full durations of multi-year cropping cycles. Of course, the 14-year classification record could also be a useful primer for additional remote sensing classification of land use in the future.

Conclusions and Future Work
This data archive includes land classifications from 2003-2017. Previous research has shown that it is possible to use bootstrap procedures to generate reliable ground-truth data for years adjacent to those with traditional ground-truth data for most, but not all, crops in western Oregon [19]. Crops that are truly grown as strict annuals with no identifiably separate establishment-year phase require some version of traditional ground-truth data, obtained through drive-by field surveys or through publicly available or grower-provided crop histories.
The data files present in the archive include the final best version of land-use classification for each year from 2004 through 2017, along with the ground-truth data used in the ML classifier in ENVI. The archive also includes several other classification rasters of interest, including "supercl4ver6.tif" and "bignormalver6.tif," which were created from data ending in 2014 as means of summarizing land-use practices in western Oregon [12,13]. The "supercl4ver6.tif" file provides the four superclasses (annual crops, established perennial crops, urban development, and forestry) showing the most common land-use practices of the 11-year period. The "bignormalver6.tif" file provides the average location of all 57 land-use classes over the 11-year period, with total areas for each class matching their averages over that period, and the locations of pixels corresponding to the most frequently present land-use, subject to the restriction that total areas had to match the 11-year average totals. In effect, this land-use raster image displays where each particular crop or other land-use class was most likely to have been found from 2004 through 2014.
The USDA-NASS produces annual cropland data layers for the entire country; however, the classifications provided by NASS are limited, and span a much smaller number of crops in western Oregon than the current archive. For example, the NASS CDL scheme combines all the grass seed crops in western Oregon into two very general categories, along with some varying degrees of overlap with hay-crop and pasture. Many of the other agricultural land-use classes in our scheme do have corresponding categories in the NASS-CDLs; however, our data have the further advantage of being present every year, unlike the NASS CDLs, which were traditionally only produced for western Oregon once every two or three years.

Data Availability Statement:
The study produced an extensive dataset that can be accessed at https://doi.org/10.7267/jd473392m.