Accuracy, Bias, and Improvements in Mapping Crops and Cropland across the United States Using the USDA Cropland Data Layer

The U.S. Department of Agriculture’s (USDA) Cropland Data Layer (CDL) is a 30 m resolution crop-specific land cover map produced annually to assess crops and cropland area across the conterminous United States. Despite its prominent use and value for monitoring agricultural land use/land cover (LULC), there remains substantial uncertainty surrounding the CDLs’ performance, particularly in applications measuring LULC at national scales, within aggregated classes, or changes across years. To fill this gap, we used state- and land cover class-specific accuracy statistics from the USDA from 2008 to 2016 to comprehensively characterize the performance of the CDL across space and time. We estimated nationwide area-weighted accuracies for the CDL for specific crops as well as for the aggregated classes of cropland and non-cropland. We also derived and reported new metrics of superclass accuracy and within-domain error rates, which help to quantify and differentiate the efficacy of mapping aggregated land use classes (e.g., cropland) among constituent subclasses (i.e., specific crops). We show that aggregate classes embody drastically higher accuracies, such that the CDL correctly identifies cropland from the user’s perspective 97% of the time or greater for all years since nationwide coverage began in 2008. We also quantified the mapping biases of specific crops throughout time and used these data to generate independent bias-adjusted crop area estimates, which may complement other USDA survey- and census-based crop statistics. Our overall findings demonstrate that the CDLs provide highly accurate annual measures of crops and cropland areas, and when used appropriately, are an indispensable tool for monitoring changes to agricultural landscapes.


Introduction
Mapping and monitoring crops and croplands can generate powerful insights about our environment and agricultural production systems [1][2][3]. Because satellite-based remote sensing products are able to efficiently capture land use/land cover (LULC) and their variations across space and time, these data are increasingly chosen as the basis for agricultural and environmental decision making, including policy creation, evaluation, and enforcement [4][5][6][7][8]. With the increased availability and use of detailed remotely sensed land cover products, however, there is a growing need to understand their accuracy and reliability for different applications [9][10][11][12].
In the United States, the Department of Agriculture's (USDA) Cropland Data Layer (CDL) is frequently utilized to monitor agricultural land due to its nationwide coverage, agricultural focus, and annual frequency [13][14][15][16]. Produced by the National Agricultural Statistics Service (NASS), this satellite-derived map has provided complete coverage of the conterminous U.S. each year since 2008. Since it tracks specific crops at field-relevant resolutions, it is an ideal tool to detect geographic trends and changes in cultivation. Previous studies have used the CDL to track crop rotations and planting patterns [17][18][19][20], evaluate Farm Bill policies such as crop insurance and the Sodsaver program [8,21,22], and assess the environmental outcomes of various land management systems [20,23,24], among many other applications. Estimates of cropland area from the CDL are also used internally by NASS for a variety of reports and survey applications, and they are considered by other government organizations, such as the Environmental Protection Agency, to monitor compliance with land protections in renewable energy policies [4,25].
To characterize the CDL's performance, NASS calculates land cover class-specific accuracies at the state level and releases them with each annual state CDL product [26]. These estimates are based on a comparison with parcel-level data from the USDA Farm Service Agency (FSA) [27] and another land cover map, the National Land Cover Dataset (NLCD) [28,29]. While these comparisons provide insights into the accuracy of the CDL for a given state and year, applications of the CDL product typically extend well beyond this scope; many analyses utilize modifications of the original CDL datasets, compare across the state products, and/or estimate changes in LULC over time [30][31][32][33][34]. Despite the prevalence of these applications, the performance of the CDLs in many of these extensions has not been evaluated.
Given this lack of evaluation, several articles have questioned the reliability of analyses that use CDL data to identify recent agricultural trends, citing concerns about both the CDL's accuracy and its appropriateness for measuring changes to the landscape [35][36][37][38][39][40]. Such critiques often cite low reported accuracies for the CDLs when mapping certain crops in specific regions or when depicting nonagricultural land covers such as grasslands. Despite the potential validity of these concerns, all such critiques to date have lacked a systematic nationwide assessment of the CDL accuracy beyond comparisons with coarse data, thereby leaving substantial uncertainty surrounding the CDL's ultimate dependability. Furthermore, select approaches for measuring LULC change using the CDL and other land cover products may help overcome some of the CDL's limitations and improve analysis outcomes [41], though the efficacy of these techniques has not yet been fully quantified. For example, aggregating specific land cover classes into broader domains, such as cropland and non-cropland, can help address low classifier accuracies of specific cover classes by eliminating errors associated with distinguishing different crop types and among various non-cropland covers, such as the many grassland categories historically delineated in the CDL [30,40,42].
In this paper, we comprehensively quantified the accuracy of the CDL at the national scale and evaluated the outcomes relevant for applications of the CDL for mapping crops and cropland. First, we investigated the benefits of consolidating classes within remote sensing products and quantified the CDL's ability to distinguish between crop and noncropland covers at multiple spatial scales and thematic resolutions. Then, we calculated nationwide accuracies for both specific and aggregate classes of the CDL and mapped the spatial variation in accuracies across the U.S. based on congruence with FSA and NLCD data. We then explored the use of pixel-level classifier confidence information to provide additional higher-resolution understanding of thematic certainty. Finally, we estimated the annual bias in mapping specific crops within the CDL and derived new, bias-adjusted area estimates for the major crop types. We conclude with a discussion of the implications of these analyses with a particular focus on recommendations for improving LULC change analyses.

Overview of Assessed and Reference Datasets
The Cropland Data Layer is a crop-specific land cover map produced annually by the USDA National Agricultural Statistics Service (NASS). Complete coverage of the conterminous United States dates back to 2008, while some states and years predate the nationwide product. Primary satellite imagery inputs for the CDL vary according to availability and effectiveness but have included the Resourcesat-1 Advanced Wide Field Sensor (AWiFS), Resourcesat-2 Linear Imaging Self Scanning (LISS), Landsat-5 Thematic Mapper (TM), Landsat-7 Enhanced TM Plus (ETM+), Landsat-8 Operational Land Imager (OLI), Sentinel-2 A/B, and Deimos-1 and UK-2 from the Disaster Monitoring Constellation. Input images are collected and used internally by NASS throughout the growing season, and the final, publicly released CDL is intended to capture the area and geospatial distribution of crops in midsummer. Data processing and classification are generally performed independently at the state level by NASS analysts, and the resulting nationwide CDL mosaic contains up to 155 classes of cultivated crops and 23 classes of non-cropland covers. Most states, however, contain a smaller subset of applicable classes, typically fewer than 30 crops and a dozen non-crop covers [26].
In producing the CDL, NASS uses supplementary information from both the FSA and the USGS. Specifically, NASS leverages a selection of data from the FSA's Common Land Unit (CLU) administrative database to train all cultivated crop classes of the CDL and assess their accuracy. CLU data are collected and confirmed by USDA County Field Service Centers and constitute a comprehensive geospatially tagged database of all land owned by agricultural producers who participate in an FSA program [27]. This represents the most complete dataset on U.S. agricultural land use, but is not available to the public [16].
For training and assessing non-cropland cover categories, NASS uses the USGS-led NLCD as a reference [26,43]. The NLCD is a nationwide 30-meter resolution, 20-class land cover map that follows a modified Anderson level I/II classification system [44]. The product's mapping emphasizes non-cropped vegetative areas, and was historically produced for 5-year epochs, though the most recent product release has improved coverage to 2-3-year intervals. It should be noted that while the NLCD is used as an input in training the CDL classifier, the CDL does not simply revert to the NLCD in non-crop locations. Instead, the CDL incorporates the NLCD and other data to generate its own unique mapping of non-crop areas.
During the assessment of the CDL, NASS produces and publishes online the confusion matrices used to determine the reported accuracies. Referred to as the "error supermatrices," these datasets are generated each year at the state or multistate level and report the number of times specific CDL classes were mapped either consistently or inconsistently against CLU data from the FSA for all cultivated crops, or against the NLCD for noncultivated land covers [26,27]. While the FSA data and NLCD provide valuable references for comparison, each differs from traditional reference data used for land cover map evaluation. In particular, the FSA data are not selected via a probability sampling design. In addition, because the dataset is generated for other USDA programmatic purposes, its classes do not always align perfectly with the classes of the CDL, leading to potential mismatch between the target and reference data. Nevertheless, the FSA dataset represents an incredibly rich and extensive source of reference information that is of a quality rarely available for remote sensing accuracy assessments. The NLCD, as a satellite-based land cover map, is not fully independent nor necessarily more accurate than the CDL. The NLCD is also not produced annually, such that the closest NLCD product available at the time of CDL production must be utilized, leading to potential temporal mismatch between the target and reference data. Despite these limitations, these two datasets provide powerful points of comparison for understanding how CDL performance varies across space and time.

Investigating Effects of Aggregation: Superclass and Consolidated Class Accuracies
We used the data reported in the CDL error supermatrices to derive supplemental accuracy metrics useful for characterizing and understanding the CDL across scales and applications. A summary and example of each accuracy metric we assessed is presented in Table 1, with further details of their derivation described in the section below.

Table 1. Summary and example of each accuracy metric assessed.

Metric | Classes assessed | Correctness judged at | Producer's accuracy example | User's accuracy example
Class Accuracy | Specific classes | Specific classes | The likelihood that actual corn is mapped as corn | The likelihood an area mapped as corn is actually corn
Superclass Accuracy | Specific classes | An aggregated domain | The likelihood that actual corn is mapped as cropland | The likelihood an area mapped as corn is actually cropland
Consolidated Class Accuracy | An aggregated domain | An aggregated domain | The likelihood that actual cropland is mapped as cropland | The likelihood an area mapped as cropland is actually cropland
Average Class Accuracy | An aggregated domain | Specific classes | The likelihood that any crop is mapped as that specific crop | The likelihood that any mapped crop is actually that crop

Initially, NASS treats their reference data as a simple random sample and calculates the class accuracies for all specific land cover classes within each state according to the general formula:

$$\text{Class Accuracy}_x = \frac{\text{Pixels correct}_x}{\text{Pixels total}_x} \tag{1}$$

for each specific class x, where pixels correct is the number of mapped pixels that match the reference data in a given region, and pixels total is either the total number of reference data observations (for calculating producer's accuracy) or mapped pixels (for calculating user's accuracy) for each class. Producer's accuracies reflect errors of omission; they indicate how likely a feature is to be correctly captured by the remote sensing product. User's accuracies reflect errors of commission, and indicate how likely a mapped class correctly resembles features on the landscape [45].

Aggregating land cover classes to broader thematic classes increases accuracy by lowering thematic specificity [28,46]. To understand how well the CDL can distinguish general cropland from non-cropland areas, we assessed the accuracy of aggregated cropland and non-cropland domains as delineated in Lark et al. (2015), based on original NASS distinctions [16,26]. The aggregated cropland category includes all annually cultivated row, closely planted, and horticultural crops as well as tree crops and actively tilled fallow (Appendix A Table A1). The non-cropland domain includes all remaining CDL classes.
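To make Equation (1) concrete, the following minimal Python sketch computes producer's and user's accuracy for a single class from a confusion matrix. It assumes a hypothetical nested-dictionary layout, `matrix[mapped][reference]`, holding pixel counts; this is not the published format of the NASS error supermatrices, and the function name `class_accuracies` and all counts are invented for illustration.

```python
# Hypothetical confusion matrix: EXAMPLE[mapped_class][reference_class] = pixel count.
# Counts are invented for illustration, not taken from the NASS error supermatrices.
EXAMPLE = {
    "corn": {"corn": 90, "soybeans": 6, "grassland": 4},
    "soybeans": {"corn": 5, "soybeans": 80, "grassland": 5},
    "grassland": {"corn": 15, "soybeans": 4, "grassland": 50},
}


def class_accuracies(matrix: dict[str, dict[str, int]], x: str) -> tuple[float, float]:
    """Return (producer's, user's) accuracy for class x, following Equation (1)."""
    correct = matrix.get(x, {}).get(x, 0)
    # Producer's accuracy divides by every reference pixel of class x (column total).
    reference_total = sum(row.get(x, 0) for row in matrix.values())
    # User's accuracy divides by every pixel mapped as class x (row total).
    mapped_total = sum(matrix.get(x, {}).values())
    producers = correct / reference_total if reference_total else float("nan")
    users = correct / mapped_total if mapped_total else float("nan")
    return producers, users


producers, users = class_accuracies(EXAMPLE, "corn")
print(f"corn: producer's {producers:.3f}, user's {users:.3f}")
```

In this toy matrix, 20 reference corn pixels are mapped as something else (omission) while 10 pixels mapped as corn belong to other classes (commission), so corn's producer's accuracy (90/110) is lower than its user's accuracy (90/100).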
First, we calculated how frequently each specific class of the CDL is mapped as any class within the correct cropland or non-cropland domain. We refer to this as the superclass accuracy for each specific class, and derived it as

$$\text{Superclass Accuracy}_{C,x} = \frac{\text{Pixels in correct domain}_C}{\text{Pixels assessed}_x} \tag{2}$$

for each specific class x included in the domain C (e.g., cropland or non-cropland). For the cropland domain, the superclass producer's accuracy indicates how frequently a specific crop on the landscape (e.g., corn) was mapped by the CDL as any type of crop in the cropland domain. The corresponding superclass user's accuracy represents how likely a pixel mapped as a specific crop was actually any type of crop (i.e., cropland) on the landscape.

From the relationship between specific class accuracy and superclass accuracy, it is possible to quantify the relative number of mapping errors where confusion occurs with another class within the same broader domain. We define this metric, which we refer to as the within-domain error rate, as the difference between a class's error rate and its superclass error rate, normalized by the class error rate. Because each error rate is one minus the corresponding accuracy, this difference, $(1 - \text{Class Accuracy}_x) - (1 - \text{Superclass Accuracy}_{C,x})$, simplifies to the difference in accuracies, so the metric can be derived directly from the previously calculated accuracy metrics as

$$\text{Within-Domain Error Rate}_{C,x} = \frac{\text{Superclass Accuracy}_{C,x} - \text{Class Accuracy}_x}{1 - \text{Class Accuracy}_x} \tag{3}$$

for each specific class x included in the domain C.

Then, we calculated the overall consolidated class accuracy for the entire cropland domain according to the following equation:

$$\text{Consolidated Class Accuracy}_C = \frac{\sum_{x \in C} \text{Area}_x \times \text{Superclass Accuracy}_{C,x}}{\sum_{x \in C} \text{Area}_x} \tag{4}$$

where x is each specific class belonging to the set of all classes in domain C, area is the area of class x, and superclass accuracy is the value calculated in Equation (2) above. Because the superclass accuracies give the likelihood that a specific class will correctly identify the broader domain, taking the area-weighted mean of the superclass accuracies across all classes within a domain depicts the likelihood that any class in a domain will correctly identify the broader domain.
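The superclass and within-domain metrics can be sketched the same way. The Python below uses the same hypothetical `matrix[mapped][reference]` layout, plus a hypothetical `DOMAIN` dictionary assigning each class to cropland or non-cropland; the function names and counts are illustrative assumptions, not part of the NASS products.

```python
# Hypothetical layout: MATRIX[mapped_class][reference_class] = pixel count.
MATRIX = {
    "corn": {"corn": 90, "soybeans": 6, "grassland": 4},
    "soybeans": {"corn": 5, "soybeans": 80, "grassland": 5},
    "grassland": {"corn": 15, "soybeans": 4, "grassland": 50},
}
# Hypothetical domain assignment; the paper's class list is in Appendix A Table A1.
DOMAIN = {"corn": "cropland", "soybeans": "cropland", "grassland": "non-cropland"}


def accuracies(matrix, domain, x):
    """Return ((class PA, class UA), (superclass PA, superclass UA)) for class x."""
    c = domain[x]
    column = {mapped: row.get(x, 0) for mapped, row in matrix.items()}  # reference is x
    row = matrix.get(x, {})  # mapped as x
    ref_total, map_total = sum(column.values()), sum(row.values())
    class_pa = column.get(x, 0) / ref_total
    class_ua = row.get(x, 0) / map_total
    # Equation (2): a pixel counts as correct if it falls anywhere in domain c.
    super_pa = sum(n for mapped, n in column.items() if domain[mapped] == c) / ref_total
    super_ua = sum(n for ref, n in row.items() if domain[ref] == c) / map_total
    return (class_pa, class_ua), (super_pa, super_ua)


def within_domain_error_rate(superclass_acc, class_acc):
    """Equation (3): share of a class's errors that stay within its own domain."""
    if class_acc >= 1.0:
        return float("nan")  # a perfectly mapped class has no errors to apportion
    return (superclass_acc - class_acc) / (1.0 - class_acc)


(c_pa, c_ua), (s_pa, s_ua) = accuracies(MATRIX, DOMAIN, "corn")
print("within-domain omission:", round(within_domain_error_rate(s_pa, c_pa), 3))
print("within-domain commission:", round(within_domain_error_rate(s_ua, c_ua), 3))
```

For corn, 5 of its 20 omission errors were mapped as soybeans and 6 of its 10 commission errors were actually soybeans, so the within-domain omission and commission rates are 0.25 and 0.6. Every grassland error involves a crop, so its within-domain rates are 0.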
For the consolidated cropland domain, this calculation generates a single value that represents the accuracy with which the CDL can identify cropland in a given state and year. The user's accuracy for consolidated cropland represents the likelihood that any randomly selected pixel mapped as cropland in the CDL is actually cropland on the landscape. The producer's accuracy for consolidated cropland is the likelihood that cropland on the landscape is correctly mapped as cropland in the CDL. In a similar fashion, Equations (2) and (4) can be used to calculate superclass accuracies for each specific non-crop class and for the single consolidated non-cropland domain. For thoroughness and comparison, we also calculated the average specific class accuracy for each domain, according to the following equation:

$$\text{Average Class Accuracy}_C = \frac{\sum_{x \in C} \text{Area}_x \times \text{Class Accuracy}_x}{\sum_{x \in C} \text{Area}_x} \tag{5}$$

The average specific class accuracy indicates how accurately, on average across the full domain, a randomly selected class is mapped in a given year. Tracking the average specific class accuracy across several years can thus indicate how well the CDL historically performed and improved over time at delineating specific crops.
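Equations (4) and (5) are both area-weighted means over the classes of a domain and differ only in which accuracy is averaged. A minimal sketch, assuming accuracies and areas are supplied as dictionaries keyed by class name (the function name `area_weighted_mean` and all numbers are hypothetical):

```python
def area_weighted_mean(accuracy: dict[str, float], area: dict[str, float]) -> float:
    """Area-weighted mean over the classes x of one domain C.

    Passing superclass accuracies gives the consolidated class accuracy (Eq. 4);
    passing specific class accuracies gives the average class accuracy (Eq. 5).
    Only classes present in `accuracy` contribute to the numerator and denominator.
    """
    total_area = sum(area[x] for x in accuracy)
    return sum(area[x] * accuracy[x] for x in accuracy) / total_area


# Hypothetical cropland-domain inputs for one state and year (areas in arbitrary units).
area = {"corn": 6.0, "soybeans": 5.0, "alfalfa": 1.0}
superclass_users = {"corn": 0.98, "soybeans": 0.97, "alfalfa": 0.86}
class_users = {"corn": 0.95, "soybeans": 0.94, "alfalfa": 0.70}

print("consolidated cropland user's accuracy:", round(area_weighted_mean(superclass_users, area), 4))
print("average class user's accuracy:", round(area_weighted_mean(class_users, area), 4))
```

The small, less accurate alfalfa class pulls both averages down only in proportion to its area, which is the intent of the area weighting.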

Calculating Nationwide Accuracies
We next estimated nationwide accuracies for each original CDL class as well as for the newly derived aggregated metrics. To calculate nationwide accuracies, we weighted each state accuracy to account for disproportionate class areas and reference observations. For specific class accuracies of the original CDL, we normalized according to the following equation:

$$\text{Nationwide Accuracy}_x = \frac{\sum_{i \in S} \text{Area}_{x,i} \times \text{Accuracy}_{x,i}}{\sum_{i \in S} \text{Area}_{x,i}} \tag{6}$$

where S is the set of states or multistate regions for which data are produced in a given year, $\text{Area}_{x,i}$ is the total area of class x mapped within the state or region i, and $\text{Accuracy}_{x,i}$ is the user's or producer's accuracy (Equation (1)) of class x for region i. Similarly, Equation (6) was used to calculate the nationwide superclass accuracies for each crop by replacing the specific class accuracies with the appropriate superclass accuracies derived from Equation (2) above.
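Equation (6) is a weighted mean across regions rather than across classes. The sketch below assumes each region's inputs arrive as an `(area, accuracy)` pair for a single class; the function name, region names, and values are hypothetical.

```python
def nationwide_accuracy(by_region: dict[str, tuple[float, float]]) -> float:
    """Equation (6): area-weighted mean of one class's accuracy across regions.

    by_region maps each state or multistate region i to (Area_{x,i}, Accuracy_{x,i}).
    """
    total_area = sum(area for area, _ in by_region.values())
    return sum(area * acc for area, acc in by_region.values()) / total_area


# Hypothetical inputs: two regions with very different amounts of the class.
corn = {"Region A": (100.0, 0.97), "Region B": (10.0, 0.80)}
print(round(nationwide_accuracy(corn), 4))
```

Here the result (about 0.955) sits close to the large region's accuracy, whereas an unweighted mean of the two regions would give 0.885.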
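To derive the nationwide accuracies for consolidated land cover classes, we also area-weighted by each constituent class. This accounted for unequal areas of each class within the consolidated domain and ensured proportional contributions to the accuracy of the combined class. We considered only classes for which accuracy data existed when summing class accuracies and areas, since failing to exclude the area of classes without data would falsely skew the nationwide mean values. Using the available data, we calculated nationwide accuracies for consolidated classes using the following formula:

$$\text{Nationwide Consolidated Class Accuracy}_C = \frac{\sum_{i \in S} \sum_{x \in C} \text{Area}_{x,i} \times \text{Superclass Accuracy}_{C,x,i}}{\sum_{i \in S} \sum_{x \in C} \text{Area}_{x,i}} \tag{7}$$

where the sums over x include only classes with accuracy data in region i. Using the specific class accuracy in this formula gives the nationwide specific class accuracy averaged across all land covers in the broader domain. Specifically:

$$\text{Nationwide Average Class Accuracy}_C = \frac{\sum_{i \in S} \sum_{x \in C} \text{Area}_{x,i} \times \text{Class Accuracy}_{x,i}}{\sum_{i \in S} \sum_{x \in C} \text{Area}_{x,i}} \tag{8}$$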
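The exclusion rule in Equations (7) and (8) is easy to get wrong in code, because the area of a class without accuracy data must be dropped from the denominator as well as the numerator. A minimal sketch, assuming records of the form `(region, class, area, accuracy)` with `None` marking missing accuracy data (the function name and all values are hypothetical):

```python
from typing import Iterable, Optional, Tuple

Record = Tuple[str, str, float, Optional[float]]  # (region, class, area, accuracy or None)


def nationwide_domain_accuracy(records: Iterable[Record]) -> float:
    """Equations (7)/(8): area-weighted mean over all regions i and classes x in domain C.

    Feed superclass accuracies for Eq. (7) or specific class accuracies for Eq. (8).
    Records without accuracy data are skipped entirely, area included.
    """
    weighted_sum = 0.0
    total_area = 0.0
    for _region, _cls, area, accuracy in records:
        if accuracy is None:
            continue  # keeping this area in the denominator would skew the mean downward
        weighted_sum += area * accuracy
        total_area += area
    return weighted_sum / total_area


# Hypothetical cropland-domain records.
records = [
    ("Region 1", "corn", 50.0, 0.98),
    ("Region 1", "oats", 5.0, None),  # no accuracy data reported
    ("Region 2", "corn", 20.0, 0.99),
    ("Region 2", "alfalfa", 25.0, 0.86),
]
print(round(nationwide_domain_accuracy(records), 4))
```

With the oats record excluded, the result is 90.3/95, about 0.951; had the oats area been left in the denominator, the same inputs would yield 90.3/100 = 0.903.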

Mapping Spatial Patterns of CDL Accuracy and Confidence
We mapped a composite of all state- and class-level user's and producer's accuracies for each specific crop and non-cropland cover to better understand how the accuracy of CDL data varies spatially across the U.S. To generate these maps, each original CDL pixel was assigned the value of its specific class accuracy for that state and year, rounded to the nearest integer to facilitate storage as an eight-bit raster. We also mapped and delineated crop and non-crop components of the CDL confidence layer, which was provided courtesy of USDA NASS. The confidence layer is a coproduct of the remote sensing classification process and provides a measure of how well a specific pixel fits within the decision tree ruleset used to classify it [26,47]. A unique benefit of the confidence dataset is that it provides an independent value for each individual pixel, rather than a single value for all pixels of a given class within a state. It thus varies at the pixel level, enabling improved spatial understanding of expected errors within the CDL product [26].
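The pixel-level accuracy maps described above amount to a lookup from class code to class accuracy followed by rounding into an unsigned 8-bit range. The Python sketch below illustrates this for one state and year using the standard-library `array` module in place of a real raster library; the function name, class codes, accuracy values, and the `NODATA` fill value of 255 are hypothetical choices, not NASS conventions.

```python
from array import array

NODATA = 255  # hypothetical fill value; valid accuracies lie between 0 and 100


def accuracy_raster(class_codes, accuracy_pct):
    """Map each pixel's class code to that class's accuracy (%) as an 8-bit value.

    accuracy_pct holds class accuracies for a single state and year, so the
    function would be applied to one state-year at a time.
    """
    out = array("B")  # unsigned 8-bit values, like a uint8 raster band
    for code in class_codes:
        pct = accuracy_pct.get(code)
        # Round half up to the nearest integer percent; unassessed classes get NODATA.
        out.append(NODATA if pct is None else int(pct + 0.5))
    return out


# Hypothetical inputs: class codes and accuracies are illustrative only.
pixels = [1, 5, 1, 176]
accuracy = {1: 95.4, 5: 93.6}
print(accuracy_raster(pixels, accuracy).tolist())
```

A production workflow would apply the same lookup to raster tiles with a geospatial library; the pure-Python loop only shows the logic.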
We then combined the assessed accuracy and classifier confidence data into a single metric of CDL certainty to better understand the spatial variation in CDL performance. By integrating the pixel-resolution confidence layer into the state- and class-resolution accuracy estimates, a combined metric may offer additional insight or improved spatial representation of expected errors compared to standalone accuracy indicators. This is similar to the approach of using posterior probability spaces in change vector analysis [48].
We considered several ways to combine the accuracy and confidence data, including multiplying the two components (Equation (9)), averaging them (Equation (10)), and more elaborate combinations (e.g., Equation (11)):

$$\text{Certainty} = \text{Class Accuracy} \times \text{Pixel Confidence} \tag{9}$$

$$\text{Certainty} = \frac{\text{Class Accuracy} + \text{Pixel Confidence}}{2} \tag{10}$$

$$\text{Certainty} = \text{Class Accuracy} \times \frac{\text{Pixel Confidence}}{\text{Mean Class Confidence}} \tag{11}$$

where Mean Class Confidence is the average confidence of the pixels in the same class. The approaches of Equations (9) and (10) benefit from their simplicity and intuitiveness. In Equation (11), the confidence data are used as a scalar multiplier to modify the class-level accuracy: if a pixel is mapped more confidently than the average of the other pixels in its class, then its certainty value will be greater than its class accuracy; if a pixel is mapped less confidently than average, then its certainty value will be lower than its class accuracy. Ultimately, the selection of a formula should be based on the needs of the specific application [48]. Thus, we present results only from the simple product combination (Equation (9)) in order to illustrate the concept and potential value of combining accuracy and confidence data but leave further investigation to future work and specific applications.
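The three combination rules can be compared side by side. The minimal Python sketch below assumes class accuracy and pixel confidence are both expressed on a 0 to 1 scale, and it reads Equation (11) as the class accuracy multiplied by the pixel's confidence divided by the mean confidence of its class; that reading follows the description in the text, and the function names and inputs are hypothetical.

```python
from statistics import mean


def certainty_product(class_acc: float, pixel_conf: float) -> float:
    """Equation (9): product of class accuracy and pixel confidence."""
    return class_acc * pixel_conf


def certainty_mean(class_acc: float, pixel_conf: float) -> float:
    """Equation (10): simple average of class accuracy and pixel confidence."""
    return (class_acc + pixel_conf) / 2.0


def certainty_scaled(class_acc: float, pixel_conf: float, class_confidences: list[float]) -> float:
    """Equation (11) as interpreted here: scale class accuracy by the pixel's confidence
    relative to its class's mean confidence. The result is not bounded above by 1."""
    return class_acc * pixel_conf / mean(class_confidences)


# Hypothetical class with 90% accuracy and three pixels of differing confidence.
acc, confs = 0.90, [0.6, 0.8, 1.0]
for c in confs:
    print(c, round(certainty_product(acc, c), 3), round(certainty_mean(acc, c), 3),
          round(certainty_scaled(acc, c, confs), 3))
```

For the most confident pixel the scaled rule yields 1.125, above 1; an application that needs a bounded certainty score might cap it, which illustrates why the choice of formula depends on the application.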

Estimating Map Biases and Bias-Adjusted Crop Acreages
Due to misclassifications within remote sensing products, area estimates derived directly from pixel counts are likely to be incorrect and either over- or under-predict actual class area. Using data derived from confusion matrices, it is possible to quantify this bias relative to the reference data and subsequently make bias-adjusted area estimates accordingly [10,49]. While best practices in accuracy assessment stipulate the use of bias-adjusted estimators with a probability sampling design [12,50], a simplified estimate of map bias and adjusted area estimates may still be derived and useful for products such as the CDL, where a large and high-quality, though non-probabilistic, reference dataset is available. To illustrate this, we calculated the nationwide relative bias of each crop using the producer's and user's accuracy:

$$\text{Simple Bias}_x = \frac{\text{Producer's Accuracy}_x}{\text{User's Accuracy}_x} - 1 \tag{12}$$

for each class x, where the producer's and user's accuracies were those derived in Equation (6). Because both accuracies share the same numerator (the number of correctly mapped pixels), their ratio equals the number of assessed pixels mapped as class x divided by the number of assessed pixels classified as class x in the reference data; subtracting one makes the bias reflect the relative over- or under-mapping of a class compared to the reference data. We then calculated bias-adjusted area estimates for each class x by scaling the raw CDL acreage estimates by the amount of over- or under-prediction suggested by the bias:

$$\text{Bias-Adjusted Area}_x = \frac{\text{Class Area}_x}{1 + \text{Simple Bias}_x} \tag{13}$$

where Class Area is the area estimate for each class x derived from pixel counting and Simple Bias is that derived in Equation (12).
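Equations (12) and (13) can be checked on a single class with known counts. The sketch below assumes Equation (13) divides the pixel-count area by one plus the simple bias; the counts and function names are hypothetical.

```python
def simple_bias(producers_acc: float, users_acc: float) -> float:
    """Equation (12): positive values mean over-mapping, negative values under-mapping."""
    return producers_acc / users_acc - 1.0


def bias_adjusted_area(class_area: float, bias: float) -> float:
    """Equation (13) as assumed here: divide the pixel-count area by (1 + bias)."""
    return class_area / (1.0 + bias)


# Hypothetical counts for one class in one region.
correct, reference, mapped = 90, 100, 95
pa = correct / reference  # producer's accuracy
ua = correct / mapped     # user's accuracy
bias = simple_bias(pa, ua)  # equals mapped / reference - 1
print(round(bias, 4), round(bias_adjusted_area(mapped, bias), 4))
```

Because the class is mapped 95 times against 100 reference pixels, the simple bias is -0.05, and the adjusted area recovers the 100 reference pixels. When the accuracies are area-weighted across regions, as in Equation (6), such exact recovery should not be expected.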

Results
We first present results from our nationwide analysis of specific class accuracies, followed by nationwide results for the aggregated superclass and consolidated class metrics. Throughout the results section, we focus on data for the year 2012 as an example because it represents an intermediate year within the CDL's modern era of nationwide coverage, it has been used in multiple applications [22,35,51], and it aligns well with the Census of Agriculture, the Natural Resources Inventory, and other intermittent data sources often used for comparisons with the CDL. The year 2012 was also particularly challenging for mapping agricultural LULC (moderate-resolution imagery was limited, and a severe drought impacted crop development in many regions), such that our findings should be considered a conservative estimate of the performance of the CDL. For completeness, results were also generated for all years of nationwide CDL coverage (2008 to 2016) and have been deposited online as companion datasets at https://doi.org/10.5281/zenodo.4579863 (accessed on 1 January 2021).

Nationwide Accuracy of Specific CDL Classes
Nationwide area-weighted accuracies for the major crop classes of the CDL are generally very high. In 2012, corn, soybeans, and winter wheat (the three largest crops by area) were mapped correctly 95%, 94%, and 92% of the time, respectively, from both the producer's and user's perspectives. The top 20 CDL land cover classes by area and their associated producer and user accuracies for 2012 are presented in Table 2, with accuracies for all 130 assessed land cover classes for 2012 included in Appendix A Table A2.
Overall, 10 crops had nationwide producer's accuracies of 90% or greater in 2012. These included sugarcane (97%), rice (96%), corn (95%), soybeans (94%), sugarbeets (94%), canola (94%), winter wheat (92%), cotton (91%), almonds (91%), and cranberries (91%). Five additional crops had producer's accuracies higher than the average for all crops (88.7%), and the remaining 90 crops with computable accuracies fell below the average class accuracy. In the same year, 17 crops had nationwide user's accuracies of 90% or greater (Appendix A Table A2), and the remaining 88 crops had user's accuracies below the average of 90.3%. The disproportionate number of crops with below-average accuracy reinforces observations that the CDL performs best for major crops (defined by area) and less so for minor crops. Indeed, the 10 crops with the highest producer's accuracies made up 71.5% of the total mapped crop area.

Table 2. Nationwide class accuracies of major individual land covers in the 2012 Cropland Data Layer (CDL). Table shows area-weighted national average accuracies for the 20 most common classes by area in the 2012 CDL, calculated according to Equation (6), based on data from USDA National Agricultural Statistics Service (NASS). National accuracies of all crops and land covers for 2012 are listed in Appendix A Table A2.

Reported accuracies of specific non-crop classes of the CDL were generally lower than those of major crops (Table 2). However, it is important to acknowledge that the reported figures do not represent congruence with a verified ground-truth dataset of non-cropped areas, but rather are assessed against a reference dataset consisting of both FSA administrative crop data and the NLCD, itself a remotely sensed land cover map subject to misclassifications.
Nonetheless, the lower levels of reported accuracy in the CDL non-crop classes suggest higher levels of uncertainty and potential error in the product and/or reference data, particularly when compared to the high-performance crop classes. The specific categories of open-, low-, and medium-intensity developed land as well as deciduous and coniferous forest, shrubland, and open water were all mapped with nationwide accuracies greater than 80%, whereas specific classes of herbaceous and woody wetlands and grassland/pasture had lower nationwide performance, ranging from 47% to 79% (Table 2).

Consolidated Cropland and Non-Cropland Accuracies
Specific land cover classes of the CDLs are often combined into aggregated categories for applications such as measuring cropland area or conversion between major land cover types. As an example of aggregation, we assessed the accuracy of consolidated cropland and non-cropland domains across the U.S. from 2008-2016.
The area- and class-weighted nationwide accuracies for consolidated cropland in 2012 were 95.0% (producer's) and 97.4% (user's). Accuracies for the consolidated non-cropland domain were 97.8% and 88.8%, respectively. Consolidated classes also performed consistently well across time (Table 3). For example, in 2008, the earliest year for which nationwide data were produced, cropland producer's and user's accuracies were 95% and 98%, respectively.

Table 3. Average specific class and consolidated class accuracies for each year of the CDL. Data from USDA NASS (2016) based on the comparison of CDL with data from Farm Service Agency (FSA) and National Land Cover Dataset (NLCD) and processed according to Equations (7) and (8). Cropland and non-cropland domains based on class distinctions in Appendix A Table A1.

In 2012, 30 of the 40 state or multistate assessment regions of the CDL had consolidated cropland producer's accuracies of 90% or greater (Appendix A Table A3). On the user's side, all but two states (New York and Pennsylvania) mapped cropland correctly 90% of the time or greater. Oklahoma (OK) and Arizona (AZ), more arid states where cropland contrasts with the surrounding landscape and is often irrigated, had the highest cropland user's accuracies, with values over 99%. More broadly, states with greater amounts of cropland typically had higher consolidated cropland accuracies (Figure 1), though this effect appeared to saturate beyond a certain threshold of crop area (e.g., 5 million acres). Similar trends were also observed when assessed by proportion (rather than total area) of cropland within each state [16,35].

Superclass Accuracies of Specific Crops and Land Covers
Within the aggregated domains, certain classes are more (or less) likely to align with their broader domain. Among crops mapped in the CDL with more than one million acres, rice was the most accurate predictor of cropland on the landscape and the most likely to be correctly identified as cropland, having superclass user's and producer's accuracies both over 99% in 2012 (Table 4). Areas of corn, the most prevalent crop, were labeled as cropland by the CDL 98% of the time in 2012 (superclass producer's accuracy), and pixels mapped as corn in the CDL were actually cropland on the landscape 98.5% of the time (superclass user's accuracy). Fields of alfalfa, oats, and fallow/idle cropland, on the other hand, were correctly labeled as cropland by the CDL just over 80% of the time. On the user's side, alfalfa was the only low outlier, yet it still had an 86% superclass user's accuracy for the cropland domain.
Within the non-cropland domain, most superclass accuracies were high, with only a few exceptions (Table 5). Developed/Open Space was incorrectly mapped in locations that were actually cropland 25% of the time in 2012. Grassland/Pasture had an even lower superclass user's accuracy and was mapped in cropped locations 32% of the time that year. Furthermore, in each of these classes the high ratio of superclass producer's accuracy to superclass user's accuracy, which is indicative of bias, suggests that both are considerably overmapped in locations that are actually cropland.

Table 4. Superclass producer and user accuracies for the top 20 classes by area in the cropland domain in 2012 as well as the relative rate of within-domain errors. Superclass accuracy is the likelihood that a given crop is identified correctly as cropland. Percentage of errors within domain is the proportion of errors in the original CDL where the confusion occurs among two crops within the cropland domain, rather than between a crop and non-cropland cover.

Overall, the high superclass accuracies of non-crop classes compared to their low specific class accuracies reported in Table 2 suggest that a sizable portion of the mapping errors result from within-domain confusion among the various non-crop classes, rather than between non-cropland covers and crops. To quantify this, we calculated the relative within-domain error rate for each CDL class. This metric indicates what percentage of mapping errors were a result of confusion within the same domain. For example, corn had a relative within-domain omission error rate of 57% in 2012, which means that slightly more than half of the missed (i.e., omitted) corn fields were mapped as another crop in the CDL, rather than as a non-cropland cover (Table 4).
The within-domain proportion of commission errors for corn was 73%, which indicates that roughly three-quarters of all pixels that were incorrectly mapped as corn in the CDL were actually another crop on the landscape rather than a non-cropland cover.

Table 5. Superclass producer and user accuracies for all 16 classes in the non-cropland domain in 2012. Superclass accuracy is the likelihood a given class is correctly identified as non-cropland. Percentage of errors within domain is the proportion of errors in the original CDL where the confusion occurs among two land covers within the non-cropland domain, rather than between a crop and non-cropland cover.

Nationwide, most crops had within-domain error proportions greater than 50%, which signifies that they were most frequently confused with another crop when mapped incorrectly. Two notable exceptions were alfalfa and fallow/idle cropland, which had within-domain omission error rates of 27% and 35%, respectively. Thus, alfalfa and fallow fields that were incorrectly captured by the CDL were most frequently classified as a non-cropland cover. Alfalfa's within-domain commission error rate was also less than 50%, which suggests that pixels incorrectly mapped as alfalfa in the CDL were most likely to be non-cropland covers on the landscape.

The within-domain proportions of omission errors for all non-cropland covers were greater than 50%, indicating that misclassified non-cropland covers were most likely to be labeled as another non-crop cover by the CDL. However, aquaculture, developed/open space, and grassland/pasture all had low within-domain commission error rates, which indicates that, when incorrect, these land covers were frequently mapped in locations that were actually cropland.
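To make the superclass and within-domain metrics concrete, the sketch below computes them from a small confusion matrix. It assumes a `COUNTS[mapped][reference]` layout of pixel counts and a three-class domain lookup; the class names, counts, and function names are hypothetical and do not reproduce the study's code or the CDL supermatrices.

```python
from typing import Dict, Tuple

# Hypothetical confusion matrix: COUNTS[mapped_class][reference_class] = pixel count.
# The numbers are illustrative only; they are not CDL assessment values.
COUNTS: Dict[str, Dict[str, int]] = {
    "corn":      {"corn": 900, "soybeans": 40,  "grassland": 10},
    "soybeans":  {"corn": 30,  "soybeans": 800, "grassland": 20},
    "grassland": {"corn": 20,  "soybeans": 10,  "grassland": 500},
}
# Assumed domain membership of each class.
DOMAIN = {"corn": "cropland", "soybeans": "cropland", "grassland": "non-cropland"}


def superclass_accuracies(cls: str) -> Tuple[float, float]:
    """Return (producer's, user's) superclass accuracy for one class."""
    d = DOMAIN[cls]
    # Producer's: share of reference `cls` pixels mapped as any class in the same domain.
    ref_total = sum(row[cls] for row in COUNTS.values())
    ref_same = sum(row[cls] for mapped, row in COUNTS.items() if DOMAIN[mapped] == d)
    # User's: share of pixels mapped as `cls` whose reference class is in the same domain.
    map_total = sum(COUNTS[cls].values())
    map_same = sum(n for ref, n in COUNTS[cls].items() if DOMAIN[ref] == d)
    return ref_same / ref_total, map_same / map_total


def within_domain_error_shares(cls: str) -> Tuple[float, float]:
    """Return the share of omission and commission errors that stay within the domain."""
    d = DOMAIN[cls]
    # Omission errors: the reference is `cls`, but the map shows another class.
    omitted = {mapped: row[cls] for mapped, row in COUNTS.items() if mapped != cls}
    # Commission errors: the map shows `cls`, but the reference is another class.
    committed = {ref: n for ref, n in COUNTS[cls].items() if ref != cls}
    omit_same = sum(n for c, n in omitted.items() if DOMAIN[c] == d)
    commit_same = sum(n for c, n in committed.items() if DOMAIN[c] == d)
    return omit_same / sum(omitted.values()), commit_same / sum(committed.values())


print(superclass_accuracies("corn"))       # (0.978..., 0.989...)
print(within_domain_error_shares("corn"))  # (0.6, 0.8)
```

In this toy matrix, corn's superclass accuracies stay near 98 to 99% even though 50 of its 950 reference pixels are omitted, because most of its errors (60% of omissions, 80% of commissions) fall on soybeans, another crop.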

Spatial Patterns of CDL Accuracy, Confidence, and Certainty
CDL accuracy for specific crops varied greatly across the U.S. In general, most crop accuracies in 2012 were highest within major cropping regions such as the Corn Belt, Central Plains, and Mississippi Delta (Figure 2a; Appendix A Figure A1). Conversely, crop accuracies were lower along the periphery of these core production zones and in less dominant agricultural regions of the eastern, southern, and western U.S. These lower-accuracy locations have a higher prevalence of less common crops (e.g., crops other than corn and soybeans), which are typically mapped less accurately because FSA provides more limited reference and training data for them and because a USDA charter focuses mapping efforts on major program crops [15,41]. In addition, a greater mixture of crop and non-cropland covers in these areas creates more opportunities for misclassification. Non-crop classes had the highest levels of reported disagreement between mapped and reference sources in the northern and southern plains (Figure 2b). Most western states, on the other hand, had clearer identification of non-cropland cover types, particularly across the region's vast non-cultivated areas. Mid-Atlantic states and the eastern Corn Belt also had relatively high non-crop accuracies given their diverse composition of land cover classes.
Visual inspection of the confidence layers suggests that mixed pixels (map units that fall across two or more land covers) are often mapped with lower confidence than adjacent single-cover pixels. For example, in heavily cultivated regions of the country such as Iowa, mixed pixels commonly occur between adjacent fields and along roadways, where they are a frequent cause of misclassification in the CDL and other remote sensing products [52,53]. In forested regions of the U.S., confidence levels were also low, even across large uninterrupted swaths of forest. In such locations, the low confidence reflects the classification algorithm's difficulty in delineating the specific type of forest cover, i.e., deciduous, coniferous, mixed forest, or woody wetland.
Regionally, CDL confidence levels are highest across the Midwest and West and lowest in the Southeast, Northeast, and Great Lakes regions (Figure 2c,d). There is also variation within regions of similar land cover. For example, in the cultivated region of the Texas Panhandle, cotton and corn on the western edge are both mapped with lower confidence, perhaps due to a greater amount of land use change and intermittent cropping patterns in that area. Across North and South Dakota, crops tend to be mapped with progressively lower confidence the farther west they are located (Appendix A Figure A2).
To gain further insight into the within-class spatial variation of CDL performance, we combined the classifier confidence data with assessed class accuracy into a single measure of CDL certainty. Figure 2e,f shows an example of the combined accuracy × confidence product at the national scale. Integrating pixel-level spatial variation from the confidence layer into the existing state- and class-level accuracy estimates is particularly useful for nationwide and multistate analyses, since the confidence data are more continuous across state products.
In addition to helping normalize certainty across regions, the use of both accuracy and confidence information independently or in combination may provide improved insights into local uncertainty. Figure 3 shows an example of an agriculturally intensive region of southern Iowa. Here, accuracy data help demarcate field-sized tracts of land that have low class accuracies (Figure 3a), which are locations that data users may wish to withhold from analyses due to the large uncertainty associated with their classification. Alternatively, the confidence layer captures finer levels of uncertainty due to mixed pixels or other contributors to local uncertainty such as topography or ambiguity among land covers (Figure 3b) but fails to consider the likelihood of the mapped class being incorrect. Considering both accuracy and confidence data (Figure 3c) thus provides insights into multiple dimensions of uncertainty and may be valuable for improving the certitude of mapping and map applications.
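A minimal sketch of the accuracy × confidence product for a single-state window follows. It assumes that the assessed accuracy is a user's accuracy looked up by (state, CDL class code), that the confidence layer stores percentages from 0 to 100, and that classes without an accuracy entry receive zero certainty; the accuracy values and the `certainty` helper are hypothetical. Codes 1, 5, and 36 are the CDL codes for corn, soybeans, and alfalfa.

```python
from typing import Dict, List, Tuple

# Hypothetical assessed accuracies (0-1) keyed by (state, CDL class code).
ACCURACY: Dict[Tuple[str, int], float] = {("IA", 1): 0.97, ("IA", 5): 0.96, ("IA", 36): 0.70}


def certainty(state: str, classes: List[List[int]], confidence: List[List[int]]) -> List[List[float]]:
    """Per-pixel certainty = class accuracy x classifier confidence.

    `confidence` is assumed to hold percentages (0-100). Classes without an
    accuracy entry get a certainty of 0 so they never pass a threshold."""
    return [
        [ACCURACY.get((state, c), 0.0) * conf / 100.0 for c, conf in zip(crow, frow)]
        for crow, frow in zip(classes, confidence)
    ]


classes = [[1, 1, 36], [5, 5, 1]]
confidence = [[95, 60, 80], [90, 50, 100]]
grid = certainty("IA", classes, confidence)
for row in grid:
    print([round(v, 3) for v in row])
```

The product captures both dimensions discussed above: an alfalfa pixel classified with 80% confidence still scores lower than a corn pixel at 60% confidence would if alfalfa's assessed accuracy were low enough, while mixed-pixel effects show up through the confidence term.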

Measured Biases and Adjusted Crop Area Estimates
Adjusted estimates of crop area informed by map biases can improve upon raw pixel-count area estimates by calibrating them against the reference data used for assessment. Table 6 presents the simple map biases (Equation (12)) and associated adjusted acreage estimates (Equation (13)) for the 18 largest crop classes in the CDL for which relevant data from official USDA acreage estimates are also available. Because the CDL represents midsummer crop extent, we include NASS data for both planted and harvested areas, as well as the average of these two metrics. For ten of the 16 crops with comparable NASS planted and harvested data, the simple bias-adjusted acreage estimate was closer than the raw pixel-count estimate to the average of NASS planted and harvested areas. As such, the adjusted results provide refined measures of crop area that are independent of (but more consistent with) other acreage estimates such as the NASS surveys or Census of Agriculture, and they could be used to complement or replace raw CDL pixel-count area estimates in various applications.
Table 6. Simple bias and bias-adjusted acreage estimates for major crops for 2012. CDL area represents the summed area of all pixels in the CDL. CDL bias and bias-adjusted acreage were calculated for each crop according to Equations (12) and (13) using the producer's and user's accuracy data of Appendix A Table A2. NASS planted and harvested areas are from the annual NASS acreage report released on June 29, 2012; harvested cotton area is from the October 2012 crop production report. All area values are reported in acres.
Assessing changes in mapped biases over time may also aid in understanding the true, dynamic composition of crops on the landscape. Figure 4 charts the simple bias of four major crops over time. According to these estimates, the biases in the CDL's mapping of both corn and soybeans relative to their reference data have increased only slightly, and in tandem, over time.
In contrast, alfalfa went from being under-mapped by 12.6% relative to the reference data in 2008 to being under-mapped by only 1.6% in 2016, a considerable change over time. As a result, estimates of alfalfa area based on direct CDL pixel counts could contain a sizable artificial increase.
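Equations (12) and (13) are not reproduced in this section, so the sketch below rests on an assumption. For a class, user's accuracy (UA) is correct/mapped and producer's accuracy (PA) is correct/reference, so the ratio of mapped to reference pixels is PA/UA. We take the simple bias as PA/UA - 1 (positive values mean over-mapping, consistent with reading a high PA-to-UA ratio as over-mapping) and the adjusted area as mapped area × UA/PA; the study's exact equations may differ in form. The function names are hypothetical, and the last one shows how a shift in bias alone moves raw pixel counts.

```python
def simple_bias(producers_acc: float, users_acc: float) -> float:
    """Relative over-mapping (+) or under-mapping (-) of a class.

    Since UA = correct/mapped and PA = correct/reference, mapped/reference = PA/UA."""
    return producers_acc / users_acc - 1.0


def bias_adjusted_area(mapped_area: float, producers_acc: float, users_acc: float) -> float:
    """Rescale the raw pixel-count area to a reference-consistent area (mapped * UA / PA)."""
    return mapped_area * users_acc / producers_acc


def apparent_change(bias_start: float, bias_end: float, true_change: float = 0.0) -> float:
    """Relative change in raw pixel-count area when the bias shifts between two years.

    Assumes mapped area = reference area * (1 + bias)."""
    return (1.0 + true_change) * (1.0 + bias_end) / (1.0 + bias_start) - 1.0


# Hypothetical class: PA 0.90, UA 0.95, 1,000,000 mapped acres.
print(round(simple_bias(0.90, 0.95), 4))                 # -0.0526 (under-mapped)
print(round(bias_adjusted_area(1_000_000, 0.90, 0.95)))  # 1055556
# Alfalfa biases reported above (-12.6% in 2008, -1.6% in 2016), assuming no true change:
print(round(apparent_change(-0.126, -0.016), 3))         # 0.126
```

Under these assumptions, the reported shift in alfalfa bias would by itself raise raw alfalfa pixel counts by roughly 12.6% between 2008 and 2016, even if true alfalfa area had not changed.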

Discussion
The Cropland Data Layer currently provides the only annual information on agricultural land use/land cover across the United States that is geographically comprehensive, spatially explicit, and crop specific. Despite its prominent use and application, the accuracy of the CDL had not been well characterized at national scales or across common aggregated classes. To fill this gap, we derived and analyzed multiple metrics of certainty for the CDL across space and time to better understand its performance and associated implications for measuring LULC and its change.

CDL Performance
Based on nationwide assessment, it is evident that the CDL consistently identifies specific major crops like corn and soybeans with very high accuracy. On the other hand, select land cover classes such as alfalfa and grassland/pasture are captured correctly only about 75% of the time, which reflects the CDL's generally lower performance outside of the major crop classes, a point frequently discussed in state and regional evaluations [35,40].
To accommodate low accuracies, specific classes can be aggregated into broader land cover domains such as cropland or non-cropland. Our results spatially and numerically quantify the effectiveness of this approach and show that across the U.S., cropland areas are mapped correctly by the CDL at least 97% of the time for all years. These findings confirm the CDL's ability to identify cropland accurately and demonstrate its validity for monitoring cropland locations and associated shifts over time.
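Aggregation into domains can be expressed as collapsing the confusion matrix: each cell is added to the cell of its (mapped domain, reference domain) pair, and standard user's and producer's accuracies are then computed on the smaller matrix. The sketch below assumes a `counts[mapped][reference]` layout and uses hypothetical counts and helper names; confusion between corn and soybeans, which lowers corn's class-level accuracy, becomes a correct "cropland" label after consolidation.

```python
from collections import defaultdict
from typing import Dict

Matrix = Dict[str, Dict[str, int]]

# Hypothetical class-level confusion matrix: COUNTS[mapped][reference] = pixel count.
COUNTS: Matrix = {
    "corn":      {"corn": 900, "soybeans": 60,  "grassland": 10},
    "soybeans":  {"corn": 50,  "soybeans": 800, "grassland": 15},
    "grassland": {"corn": 20,  "soybeans": 10,  "grassland": 500},
}
DOMAIN = {"corn": "cropland", "soybeans": "cropland", "grassland": "non-cropland"}


def consolidate(counts: Matrix, domain: Dict[str, str]) -> Matrix:
    """Sum cells by (mapped domain, reference domain) pair."""
    collapsed: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for mapped, row in counts.items():
        for ref, n in row.items():
            collapsed[domain[mapped]][domain[ref]] += n
    # Convert to plain dicts so later lookups cannot silently create entries.
    return {m: dict(r) for m, r in collapsed.items()}


def users_accuracy(matrix: Matrix, cls: str) -> float:
    """Share of pixels mapped as `cls` whose reference label is also `cls`."""
    row = matrix[cls]
    return row.get(cls, 0) / sum(row.values())


def producers_accuracy(matrix: Matrix, cls: str) -> float:
    """Share of reference `cls` pixels that the map also labels `cls`."""
    reference_total = sum(row.get(cls, 0) for row in matrix.values())
    return matrix[cls].get(cls, 0) / reference_total


domains = consolidate(COUNTS, DOMAIN)
print(round(users_accuracy(COUNTS, "corn"), 3))       # 0.928 (class level)
print(round(users_accuracy(domains, "cropland"), 3))  # 0.986 (domain level)
```

As the limitations discussion notes, this gain comes from accommodating within-domain errors rather than correcting them.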
Mapping the spatial variation in class accuracies across the United States reveals clear geographic trends and patterns in the CDL's performance. In general, specific crop accuracies are highest within core agricultural areas and among major USDA program crops. Cropland superclass and consolidated cropland accuracies, however, are consistently high across the country, and further illustrate the value of aggregating to broader domains when attempting to measure land cover across large areas or across all CDL classes, particularly on the margins of major crop zones.
The use of map bias information to adjust area estimates provides a quantitative means to improve crop area calculations based on remote sensing products [10,50]. Similarly, the simplified bias-adjusted approach for estimating crop area reported here improves upon raw CDL pixel-based estimates by correcting for misclassifications, and it provides a more comprehensive accounting of cropland than the FSA reference data would on their own, since that data source only captures land with crops that participate in FSA programs. Our approach thus combines desirable features of both the CDL and FSA datasets, while remaining independent of other USDA data sources like the NASS surveys or Census of Agriculture that are occasionally used for calibration or comparison.

Improvements over Time
For most metrics, we reported on the performance of the 2012 CDL, although variability exists across years. Overall, CDL accuracy has improved over time, due in part to use of additional satellite input (more sources and more images per year), a more robust classification process (an ensemble decision tree instead of maximum likelihood methodology), and increasing amounts of training data from the FSA and elsewhere [41]. As a result, average class-specific accuracy for all crop classes has improved from 87% in 2008 to 92% in 2016. By 2016, a total of 17 crops were mapped with 90% or higher producer's accuracy, up from just 10 crops in 2008. Aggregate metrics, including consolidated and superclass accuracies for the cropland and non-cropland domains, have also improved. However, the magnitude of their increases is more limited due to their already high performance across time.
The annual changes in performance of the CDL can have important ramifications for CDL-based analyses. If the bias, or relative over- or under-mapping, of a class changes over time, it can induce false signals of LULC change or skew estimates of crop area change. Lark et al. (2017) explore the implications of the change in total cropland bias and suggest potential solutions [41]. Here, we show that there are also sizable changes in bias for specific crop types. These changes, if disregarded, may influence the results of analyses of those crops over time. For example, unadjusted estimates of the increase in corn acreage following the biofuels boom could be affected by artificial changes in corn mapping across time. However, the magnitude and direction of impact depend on the specific years of analysis and may be counterbalanced by parallel biases in soybeans and other crops. Thus, analyses that focus on the relationship among corn, soybeans, and cropland (or any classes that have experienced synchronized changes in bias) likely remain valid despite potential eccentricities in the underlying data. Nonetheless, it is important to consider the biases of mapped data in applied analyses, particularly when results may influence industry and policymaking.

Implications for Measuring LULC Change
The use of aggregated classes to measure LULC change benefits from the high acuity of the product to detect a broader domain while avoiding challenges of delineating spectrally similar land covers within the same domain. When measuring conversions between cropland and non-cropland, the consolidated classes can thus be used to initially detect change, followed by subsequent identification of the specific land cover or crop planted before and after the conversion [22]. The assessment of crop specificity after detecting change maintains the thematic richness of the original CDL dataset without adversely affecting detection of a conversion between the aggregated domains. In practice, this isolates the known uncertainty in specific class identification and removes it from the change detection process.
Using this two-stage approach, the likelihood that a conversion occurred becomes a function of the highly accurate aggregated classes, whereas the certainty of which specific land cover class preceded and followed a conversion (given that the conversion was correctly identified) is dependent upon the land cover's specific class accuracy. Thus, for cropland conversion estimates such as Lark et al. (2015) or Morefield et al. (2016), the class accuracies reported in our Table 2 most closely represent the likelihood that a given crop was planted on newly converted land, rather than directly indicate the likelihood that a conversion occurred [22,54].
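The two-stage approach can be sketched on flattened rasters of CDL codes: stage 1 compares only the aggregated domains of the two years, and stage 2 attaches the specific classes before and after to the pixels flagged in stage 1. The domain lookup below is a small, hypothetical subset chosen for illustration (the study's full delineation is listed in Table A1), and the function names are assumptions. Codes 1, 5, 24, and 61 are corn, soybeans, winter wheat, and fallow/idle cropland; 121, 141, and 176 are developed/open space, deciduous forest, and grassland/pasture.

```python
from typing import List, Sequence, Tuple

# Hypothetical, partial domain lookup keyed by CDL class code.
CROPLAND = {1, 5, 24, 61}
NON_CROPLAND = {121, 141, 176}


def domain(code: int) -> str:
    """Map a specific CDL class code to its aggregated domain."""
    if code in CROPLAND:
        return "cropland"
    if code in NON_CROPLAND:
        return "non-cropland"
    raise KeyError(f"CDL code {code} is not in this illustrative lookup")


def detect_conversions(year1: Sequence[int], year2: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Return (pixel index, class before, class after) for domain-level conversions."""
    conversions = []
    for idx, (before, after) in enumerate(zip(year1, year2)):
        if domain(before) != domain(after):           # stage 1: change decided by domains only
            conversions.append((idx, before, after))  # stage 2: specific classes attached
    return conversions


year1 = [176, 1, 5, 141, 176]
year2 = [1, 5, 1, 141, 5]
print(detect_conversions(year1, year2))  # [(0, 176, 1), (4, 176, 5)]
```

Corn-soybean swaps (pixels 1 and 2) never register as change, so confusion between those crops cannot create a false conversion; it can only affect which crop is reported on a converted pixel.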
The challenges of mapping less-common specific crops and the ease of mapping aggregate cropland have additional implications for CDL-based applications. For example, it might be argued that the CDL is more appropriate for detecting broad land use changes (e.g., conversion between cropland and non-cropland) than for identifying nuanced changes among specific crops (e.g., identifying crop rotations) unless the focus of rotations remains on major crop types [18,19,55]. Crop-specific applications should also consider each class's prevalence and accuracy and how such factors may influence results.
Our findings can also be used to guide how specific crops should be treated within analyses. Alfalfa, for example, is often cited as a problem crop due to its semi-perennial nature, spectral similarity to non-cropland covers, and occasional interplanting within mixed-species hay and pasture. The crop was incorrectly mapped in non-cultivated areas 14% of the time in 2012. By 2016, this superclass error rate dropped to just 8%. From a producer's perspective, alfalfa was mapped as a non-cultivated land cover 20% of the time in 2012, but this error rate dropped to 8% by 2016. Overall, the lower superclass accuracies for alfalfa relative to other crops reinforce the precautions of past analyses, such as the exclusion of alfalfa by Lark et al. (2015) and Morefield et al. (2016) [22,54]. Alfalfa's low within-domain error rates (Table 4) further highlight the challenge of including alfalfa in the cropland domain, since the crop is more frequently confused with non-cropland covers than with other crop classes. However, the latest improvements in alfalfa accuracy suggest that analyses of more recent CDL data may want to consider including the forage crop.
Visual mapping of specific and aggregate accuracies can help users identify hotspots and problem areas within the country and understand how they vary across space and time. Coupling accuracy data with their spatial location on the landscape thus offers opportunities not afforded by the nonspatial structure of the NASS metadata tables and confusion matrices for each state and year. For example, rather than excluding entire land cover classes from analyses, such as the exclusions of alfalfa by Morefield et al. (2016) and Lark et al. (2015), the spatial mapping of the accuracy of individual classes would allow the empirical removal of just those pixels with low mapped accuracy in certain state-year combinations, while retaining those with a higher likelihood of being correct. The value of this spatial approach is greatest in analyses that consider multiple years of CDL data, where the number of state, class, and year combinations is multiplicative. For example, for an assessment of change between two years, there are typically over a million unique combinations of state and class pairs, each with its own likelihood of being correct (e.g., 50 classes times 40 states in the first year multiplied by 50 classes times 40 states in the second year yields four million combinations). The manual selection of which specific LULC class combinations to include or exclude based on accuracy thus becomes intractable, whereas the spatial accuracy maps can be used to easily select only those combinations that meet a quantitative accuracy threshold.
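The threshold-based selection described above amounts to a per-pixel lookup rather than a manual decision for each of the millions of state-class combinations. The sketch assumes that the "likelihood of being correct" is a user's accuracy keyed by (state, year, CDL class code), that unlisted combinations are excluded, and that a 0.8 threshold is used; all values and names are hypothetical. Codes 1 and 36 are corn and alfalfa.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical user's accuracies keyed by (state, year, CDL class code).
ACCURACY: Dict[Tuple[str, int, int], float] = {
    ("IA", 2008, 1): 0.95, ("IA", 2008, 36): 0.62,
    ("IA", 2016, 1): 0.97, ("IA", 2016, 36): 0.81,
}


def keep_mask(
    states: Sequence[str],
    classes_y1: Sequence[int],
    classes_y2: Sequence[int],
    y1: int,
    y2: int,
    threshold: float = 0.8,
) -> List[bool]:
    """True for pixels whose mapped class meets the accuracy threshold in both years."""
    mask = []
    for s, c1, c2 in zip(states, classes_y1, classes_y2):
        # Combinations without an accuracy entry default to 0.0 and are excluded.
        a1 = ACCURACY.get((s, y1, c1), 0.0)
        a2 = ACCURACY.get((s, y2, c2), 0.0)
        mask.append(a1 >= threshold and a2 >= threshold)
    return mask


print(keep_mask(["IA", "IA", "IA"], [1, 36, 1], [1, 1, 36], 2008, 2016))  # [True, False, True]
print((50 * 40) * (50 * 40))  # 4000000 state-class pairs across two years
```

Only the 2008 alfalfa pixel is dropped, while alfalfa mapped in 2016 (above the threshold in this toy table) is retained, which is the pixel-level alternative to excluding alfalfa wholesale.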
The integration of confidence layer data with assessed accuracy data may also improve spatial insights. For example, in many CDL-based change detection analyses, postclassification processes such as spatial filters and minimum mapping units have been used to indiscriminately remove areas of apparent change that are likely falsely mapped due to mixed pixels or misclassifications. Alternatively, accuracy and confidence data could be used to set a threshold of certitude below which any identified potential change is flagged for removal. Probability information from the remote sensing process has previously been used to improve vector-based detection of land cover change using unclassified Landsat data [48]. Here, we suggest that confidence information from the remote sensing process could similarly help improve the post-classification detection of LULC change using land cover products. While we have not quantified the impact of such an approach, it has since been used in other studies to set a higher threshold of certainty for change detection [56].
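One way to flag, rather than silently filter, doubtful change is sketched below. It assumes per-pixel certainty values (accuracy × confidence, scaled 0 to 1) for each year, treats a change as only as certain as its less certain endpoint (the `min` rule is our assumption; a product of the two would be a stricter alternative), and uses a hypothetical threshold of 0.6.

```python
from typing import List, Sequence, Tuple


def flag_uncertain_changes(
    changed: Sequence[int],
    certainty_y1: Sequence[float],
    certainty_y2: Sequence[float],
    threshold: float = 0.6,
) -> Tuple[List[int], List[int]]:
    """Split candidate change pixels into (kept, flagged) index lists.

    certainty_* hold per-pixel accuracy x confidence values (0-1) for each year."""
    kept: List[int] = []
    flagged: List[int] = []
    for i in changed:
        # A change is treated as only as certain as its less certain endpoint.
        if min(certainty_y1[i], certainty_y2[i]) >= threshold:
            kept.append(i)
        else:
            flagged.append(i)
    return kept, flagged


kept, flagged = flag_uncertain_changes([0, 2, 3], [0.9, 0.2, 0.7, 0.5], [0.8, 0.9, 0.55, 0.9])
print(kept, flagged)  # [0] [2, 3]
```

Unlike a spatial filter or minimum mapping unit, the flagged pixels remain available for review, and the threshold can be tuned per application.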
Confidence layer data could also be used in concert with accuracy information to spatially allocate error adjustments. For example, here we modified area estimates for each crop using an accuracy-derived indicator of bias (Table 6). However, such area adjustments typically do not spatially correct pixels on the map, unless this issue of reconciliation is specifically addressed [57]. To help achieve this reconciliation in post-classification environments, confidence data could similarly be used to select the pixels with the lowest confidence as candidates for reclassification. For example, if the CDL overestimated corn area by 500 pixels in a given state, the 500 pixels of corn with the lowest confidence could be removed to make a spatially explicit, bias-adjusted map of corn that was consistent with the reference data estimates of area.
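The corn example above can be sketched as a rank-and-relabel step on a flattened raster: given an excess pixel count (for instance, mapped pixels minus a bias-adjusted estimate), the lowest-confidence pixels of the over-mapped class are marked for reclassification. The function name is hypothetical, and `fill_value` is an assumed placeholder marker; in practice it should be a value outside the CDL legend.

```python
from typing import List, Sequence


def remove_lowest_confidence(
    classes: Sequence[int],
    confidence: Sequence[float],
    target_class: int,
    excess: int,
    fill_value: int = 0,
) -> List[int]:
    """Relabel the `excess` lowest-confidence pixels of `target_class` as `fill_value`.

    classes and confidence are flattened rasters of equal length."""
    candidates = [i for i, c in enumerate(classes) if c == target_class]
    if not 0 <= excess <= len(candidates):
        raise ValueError("excess must be between 0 and the number of target_class pixels")
    # Rank candidate pixels by ascending confidence and take the first `excess`.
    to_remove = sorted(candidates, key=lambda i: confidence[i])[:excess]
    adjusted = list(classes)
    for i in to_remove:
        adjusted[i] = fill_value
    return adjusted


# Corn (CDL code 1) over-mapped by 2 pixels in this toy window.
print(remove_lowest_confidence([1, 1, 5, 1, 1], [90, 40, 70, 55, 99], target_class=1, excess=2))
# [1, 0, 5, 0, 1]
```

The removed pixels would still need a new label (another crop or a non-cropland cover), which is where the reconciliation problem noted above remains open.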

Limitations, Representativeness, and Uncertainty of Results
The class consolidation techniques described here do not modify the underlying performance of the remote sensing product, but rather improve the representativeness of the accuracy at which the product maps aggregate domains. Of note, aggregating classes improves accuracy by lowering the product's thematic resolution or specificity; thus, improvements are made by accommodating errors rather than by correcting them. The greatest benefits are therefore achieved when the thematic resolution of the product matches the desired application. When using aggregated remote sensing products in applications, it is important to quantify these associated changes in accuracy so that the reported metrics and critiques reflect the actual data used.
There may also be variation in the representativeness of the CDL's reported accuracy statistics. The FSA reference data used to assess the CDL are not based on a probabilistic sample, but rather on an availability approach, with the majority coming from 10 key USDA program crops. As a result, the reported consolidated class accuracies are most representative for those crops, and less characteristic for specialty crops and non-crop covers. Similarly, the distribution of crop sample data across geographic regions is in some places disproportionate to the amount of crop produced there. Therefore, the accuracies of certain regions are more reliable than others due to differing levels of reference data available for assessment.
To maintain the highest level of representativeness while calculating national average crop accuracies, we weighted the accuracy of each crop in each state by the total acreage of that crop in that state. For example, Iowa produced 14% of all corn in the nation in 2012; thus, its accuracy was weighted to contribute 14% of the national accuracy for corn. An alternative method for calculating nationwide accuracies is to sum all national reference observations without regard to the spatial distribution of the data, and such an approach has recently been implemented by NASS to report nationwide accuracies for select years in the online CDL metadata [26]. Here, we chose to area-weight by class prevalence so that the nationwide estimates reflect the accuracy for a pixel selected at random and are not skewed by nonrepresentatively sampled reference data.
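The difference between the two national summaries can be seen with two hypothetical states: one small in crop area but heavily sampled, the other large in area but sparsely sampled. The sketch contrasts area weighting (the approach described above) with pooling all reference observations; the function names and numbers are illustrative only.

```python
from typing import Iterable, Tuple


def area_weighted_accuracy(state_stats: Iterable[Tuple[float, float]]) -> float:
    """National accuracy for one class, weighting each state by the class's area there.

    state_stats: (class area in state, class accuracy in state) pairs."""
    stats = list(state_stats)
    total_area = sum(area for area, _ in stats)
    return sum(area * acc for area, acc in stats) / total_area


def pooled_accuracy(state_samples: Iterable[Tuple[int, int]]) -> float:
    """Alternative: pool all reference observations, ignoring where they were collected.

    state_samples: (correct reference observations, total reference observations) pairs."""
    samples = list(state_samples)
    return sum(c for c, _ in samples) / sum(n for _, n in samples)


# State A: 10% of the crop area, accuracy 0.95 from 4,000 reference observations.
# State B: 90% of the crop area, accuracy 0.80 from 1,000 reference observations.
print(area_weighted_accuracy([(10, 0.95), (90, 0.80)]))  # about 0.815
print(pooled_accuracy([(3800, 4000), (800, 1000)]))     # 0.92
```

Pooling lets the oversampled state dominate, whereas area weighting approximates the accuracy of a randomly selected pixel of the crop.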
Uncertainty can also stem from errors in the reference data or a mismatch between reference and evaluated data. For example, the FSA CLU classifications of grasslands are often inconsistent across states and time, and occasionally they do not align with CDL land cover designations. Thus, analysts at NASS make a judgement for each state and year on how to best utilize the FSA data for training and assessing accuracy. Discrepancies in how the FSA data are reported and incorporated can thus occasionally lead to apparent differences in error rates across states and years, when in reality the inconsistencies between the CDL and the landscape are much smaller. Similarly, errors exist in the NLCD data used for training and assessing non-crop areas of the CDLs, which in turn affect their production and assessment. It is possible that some CDL non-crop classes are more correct than the associated NLCD classes on which they are based and evaluated, given that the CDL is updated and improved annually, it includes exclusive confidential FSA training data, and it generates higher accuracies for cultivated areas. Thus, the reported non-crop accuracies of the CDL (based on comparison with the NLCD) may underestimate the true performance of those CDL classes.

Conclusions
The CDL is a powerful and unrivaled tool for the exploration of agricultural landscapes and is poised to remain the premier remotely sensed agricultural LULC map in the U.S. due to its annual availability, crop-specific detail, and exclusive access to expansive and robust ground-based reference datasets from the USDA. We show that the CDL identifies major crops and certain land covers with high accuracy across the U.S., and that this ability holds true for all years of nationwide data coverage. Our findings also confirm that the CDL exhibits extremely high acuity at discerning the aggregated classes of cropland and non-cropland across spatial and thematic scales. Explicitly considering the bias within specific classes and incorporating confidence layer data provide two additional opportunities to further improve CDL performance and its use in LULC change assessments and other applications.
While the original CDL dataset can indeed pose challenges for applications that are beyond its original intent of mapping annual crop locations, it is the responsibility of its users to apply the data in ways that do not compromise results. The CDL's consistent and reliable performance in mapping crops and cropland nationwide and across time clearly demonstrates that many of the critiques and concerns regarding the underlying accuracy of the product are unfounded or dissipate when thoroughly assessed at appropriate scales. Furthermore, the substantial uncertainty and resource costs of alternative methods for monitoring crops and croplands, such as ground surveys or air photo interpretation, underscore the need for approaches that can systematically identify continental-scale LULC change in an automated, reproducible, and verifiable manner. While many products based on remote sensing seek to fill this gap, the CDL is a dataset proven to be well suited for the task. When used appropriately, the CDL is a valid and indispensable tool for studying LULC and a crucial asset for monitoring contemporary cropland dynamics across the United States.
Acknowledgments:
Special thanks to Rick Mueller, Dave Johnson, and Patrick Willis at USDA NASS for their helpful discussions and for providing CDL confidence data. Thanks also to Meghan Salmon for her insights and initial help exploring consolidated crop accuracies using the CDL supermatrices and to Carol Barford and Volker Radeloff for their feedback and suggestions of analyses from early discussions of this manuscript. Thanks to George Allez for editing.

Conflicts of Interest:
The authors declare no conflict of interest.
Appendix A
Table A1. List of CDL codes and class names and whether they were included in the cropland or non-cropland domain in the analyses of superclass and consolidated class accuracies. Domain delineations follow that of Lark et al. (2015) based on original NASS distinctions [16,22].

An arbitrary grading scale of "A" to "F" was assigned to accuracy intervals to help users easily identify where the CDL crop map excels versus where additional caution may be warranted.
Figure A2. Confidence of pixels mapped as corn in the 2012 CDL. Within a specific state, there can be large spatial variation in the degree of certainty with which specific crops are mapped. In South Dakota and North Dakota, corn is mapped more confidently in the eastern parts of the states (dark blue), where the crop is more prevalent, and is mapped less confidently (green to yellow) as one moves westward and the crop becomes less prominent.