Methods of Rapid Quality Assessment for National-Scale Land Surface Change Monitoring

: Providing rapid access to land surface change data and information is a goal of the U.S. Geological Survey. Through the Land Change Monitoring, Assessment, and Projection (LCMAP) initiative, we have initiated a monitoring capability that involves generating a suite of 10 annual land cover and land surface change datasets across the United States at a 30-m spatial resolution. During the LCMAP automated production, on a tile-by-tile basis, erroneous data can occasionally be generated due to hardware or software failure. While crucial to assure the quality of the data, rapid evaluation of results at the pixel level during production is a substantial challenge because of the massive data volumes. Traditionally, product quality relies on the validation after production, which is ine ﬃ cient to reproduce the whole product when an error occurs. This paper presents a method for automatically evaluating LCMAP results during the production phase based on 14 indices to quickly ﬁnd and ﬂag erroneous tiles in the LCMAP products. The methods involved two types of comparisons: comparing LCMAP values across the temporal record to measure internal consistency and calculating the agreement with multiple intervals of the National Land Cover Database (NLCD) data to measure the consistency with existing products. We developed indices on a tile-by-tile basis in order to quickly ﬁnd and ﬂag potential erroneous tiles by comparing with surrounding tiles using local outlier factor analysis. The analysis integrates all indices into a local outlier score (LOS) to detect erroneous tiles that are distinct from neighboring tiles. Our analysis showed that the methods were sensitive to partially erroneous tiles in the simulated data with a LOS higher than 2. The rapid quality assessment methods also successfully identiﬁed erroneous tiles during the LCMAP production, in which land surface change results were not properly saved to the products. The LOS map and indices for rapid quality assessment also point to directions for further investigations. A map of all LOS values by tile for the published LCMAP shows all LOS values are below 2. We also investigated tiles with high LOS to ensure the distinction with neighboring tiles was reasonable. An index in this study shows the overall agreement between LCMAP and NLCD on a tile basis is above 71.5% and has an average at 89.1% across the 422 tiles in the conterminous United States. The workﬂow is suitable for other studies with a large volume of image products.


Introduction
Land change science helps understand the interactions between people and nature that lead to changes in the type, intensity, condition, and location of land use and cover. It seeks to provide scientific knowledge of land use and land cover patterns and dynamics affecting the structure and function of the Earth system [1]. Knowledge about Earth's land cover condition and how it changes over time is a other studies with a large volume of land surface image products. In the sections that follow, we provide introductions and preprocessing of input data in Section 2. In Section 3, we describe the evaluation indices derived from data and the method for error tile detection. We then present results that illustrate the method sensitivity via simulated data and the actual quality assessment during the production (Section 4). We conclude the paper with a discussion of the results and implications of the study.

The Land Change Monitoring, Assessment, and Projection (LCMAP) Products
Four products from the Collection 1 LCMAP [32] were evaluated: primary land cover, time of spectral change, annual land cover change, and model quality. The primary land cover product mapped eight land cover types (Table 1) at the annual scale, while the time of spectral change product mapped the day of year that spectral changes were detected. These two are major LCMAP products that are closely related to the other products. The annual land cover change product indicated thematic land cover changes that had occurred from the prior year to the current year. The model quality product characterized the time series model, such as how many harmonic frequencies are used in the model or whether a model failed to generate because of insufficient valid observations [32].

The USGS National Land Cover Database (NLCD)
We acquired the 2019 released NLCD land cover maps for 2001 [29], 2006 [42], and 2011 [28] as reference data for evaluation. NLCD products were selected because they provide spatially explicit and reliable information for a wide user group [28,30], making them a good resource for LCMAP in terms of consistency. Note that the initial analysis was conducted based on previous versions of NLCD 2011 maps and did not include the 2016 land cover map. Thus, in order to make a consistent comparison between analyses, we did not include the NLCD 2016 land cover map in this study. The NLCD products provided 16 classes, while the LCMAP land cover scheme had eight general classes. Thus, we translated the NLCD product to LCMAP classes (Table 1). Moreover, we removed the road network from the translated NLCD classes to reduce potential mismatches. This is because NLCD embedded the road features after the land cover classification from a non-remote sensing source and typically are not rendered as complete linear features in the LCMAP annual maps.

Method
The rapid quality assessment approach includes two major steps: (1) a list of indices is generated to characterize the spatial and temporal reasonableness of LCMAP products in each tile. The indices compare the LCMAP products with multiple years of NLCD products and evaluate the time series of the LCMAP products, and (2) the indices of each tile are compared with its nearest neighbor tiles using the local outlier analysis to evaluate the overall reasonableness, which assumes geospatially close tiles should have similar characteristics based on indices. We first tested the sensitivity of the approach using data with simulated errors, which also helped us to interpret the local outlier score (LOS) value. The approach was then applied to the production data. Table 2 shows the list of indices for this study. The indices are designed to evaluate the LCMAP product in three aspects: (1) the quality of products at the spatial domain by comparing the LCMAP and NLCD land cover maps; (2) the quality of the product in the temporal domain by calculating the variation of land cover and change products across 33 years; and (3) the prevalence of certain cases, such as insufficient valid observations, to initialize model or conversion of developed area. We extracted pixels that had a disagreement between LCMAP and the three translated NLCD land cover maps for 2001, 2006, and 2011. The Disagreement large patch indicated the largest area of cohesive disagreed pixels, while Disagreement salt pepper counted the number of single pixels that disagreed. As with the disagreement analysis, we analyzed the spatial pattern that did not have enough clear observations to build a CCDC model (No model large patch and No model salt pepper). The Urban decrease index depicted the maximum year-to-year loss of developed area according to the LCMAP land cover data, which assumed developed areas were less likely to be converted to other land cover types. The remaining indices estimated the temporal variation of land cover and change products. We calculated the rate of spectral changes for each year (Equation (1)) and extracted minimum, maximum, mean, and standard deviation of the rate time series as indices. Similarly, we evaluated the rate of land cover change from year to year (Equation (1)).

Index-Based Products Evaluation
where R and N total represent the change rate and the total number of pixels, respectively. N change represents the number of pixels with recorded change in the Timing of Spectral Change product (SCTIME), and the number of pixels with land cover changed from two consecutive years for the land cover product.

Comparing with Neighbor Tiles
To integrate the above indices and find erroneous tiles, the distance (reach-dist k (p, o)) from the tile p to its neighbor tile o is defined as Equation (2), while the density (lrd k (p)) at the tile p is the inverse of the average distance based on the N nearest neighbors (Equation (3)).
where d(p, o) represents the Euclidean distance between tile p and o, derived from indices. dist k (o) is the k th nearest distance to the tile p, which is used to ensure the distance is not too small, so the result is stable.
The local outlier score (LOS) at the tile p is then defined as the average ratio of the local density at the tile p and the local density of its neighbors (Equation (4)).
An LOS n (p) of approximately 1 or less suggests that tile p is comparable to its neighbors, while LOS n (p) that is significantly larger than 1 suggests the neighbor tiles are clustered and the tile p is an outlier. In this study, we estimated LOS for each tile in relation to the 8 nearest tiles based on all indices. Because those indices have diverse ranges of values, we normalized them to a mean equal to 0 and standard deviation equal to 1 based on its neighbors, so each index has the same contribution to the LOS.

Sensitivity Test Using Simulated Data
To investigate the sensitivities of this approach in responding to potential errors, we simulated two possible conditions ( Figure 1):

1.
Randomly erroneous pixels in one year 2.
Randomly erroneous pixels in all years These data simulated possible production problems at random pixels, in which actual value of primary land cover, time of spectral change was replaced by 0 to represent invalid outputs. We increased the portion of randomly erroneous pixels at every 5% step in a tile, and each simulation was evaluated by the LOS based on all indices.

Implementation to the Production
The rapid quality assessment approach was applied to the actual products repeatedly during the production phase to identify problematic tiles. The production was conducted by dividing CONUS into four mega ecoregions (East, East Central, West Central, and West) [10], and the production started from the eastern ecoregions of the United States each time. Thus, we applied the approach once the results were ready for the eastern ecoregions. The products were also intensively evaluated by members of the LCMAP team during production. Their evaluation was also used to assess the performance of this approach. Moreover, this approach provided auxiliary information for the evaluation procedure. production started from the eastern ecoregions of the United States each time. Thus, we applied the approach once the results were ready for the eastern ecoregions. The products were also intensively evaluated by members of the LCMAP team during production. Their evaluation was also used to assess the performance of this approach. Moreover, this approach provided auxiliary information for the evaluation procedure.

Sensitivity Test Result Using Simulated Data
The LOS was calculated for simulated data with randomly failed pixels ( Figure 2). As can be seen from Figure 2, 5% scattered errors raised the LOS from 1 to around 2. The LOS value of 2 suggested that some indices were twice as high as neighboring tiles in the simulated data, which suggested the approach was highly sensitive to the simulated mistakes whenever they occurred in a single year or through all years ( Figure 2). The LOS did not continue to increase dramatically with more failed pixels, though the value remained high. This is because the increased failed pixels could

Sensitivity Test Result Using Simulated Data
The LOS was calculated for simulated data with randomly failed pixels ( Figure 2). As can be seen from Figure 2, 5% scattered errors raised the LOS from 1 to around 2. The LOS value of 2 suggested that some indices were twice as high as neighboring tiles in the simulated data, which suggested the approach was highly sensitive to the simulated mistakes whenever they occurred in a single year or through all years ( Figure 2). The LOS did not continue to increase dramatically with more failed pixels, though the value remained high. This is because the increased failed pixels could turn to cohesive patches, which weakens the effectiveness of salt pepper related indices. Moreover, the failed pixels could also reduce the incidence of urban decrease or insufficient clear pixels.
Remote Sens. 2020, 12, x FOR PEER REVIEW 7 of 18 turn to cohesive patches, which weakens the effectiveness of salt pepper related indices. Moreover, the failed pixels could also reduce the incidence of urban decrease or insufficient clear pixels.

Quality Assessment for the Products
The rapid quality assessment approach was applied when the LCMAP products were first generated for the East mega ecoregion (Figure 3). Most of the tiles showed LOS less than 1.22, while two coastal tiles (H27V18 at West Palm Beach, Florida, and H26V20 at Key West, Florida) were above 2 ( Figure 3). The primary land cover confidence map showed that the two tiles did not have time series models saved at the land pixels, and LCMAP land cover classes were filled using the translated NLCD 2001 map ( Figure 4). The high LOS suggested the approach successfully identified these problematic tiles.

Quality Assessment for the Products
The rapid quality assessment approach was applied when the LCMAP products were first generated for the East mega ecoregion (Figure 3). Most of the tiles showed LOS less than 1.22, while two coastal tiles (H27V18 at West Palm Beach, Florida, and H26V20 at Key West, Florida) were above 2 ( Figure 3). The primary land cover confidence map showed that the two tiles did not have time series models saved at the land pixels, and LCMAP land cover classes were filled using the translated NLCD 2001 map ( Figure 4). The high LOS suggested the approach successfully identified these problematic tiles.
Remote Sens. 2020, 12, x FOR PEER REVIEW 7 of 18 turn to cohesive patches, which weakens the effectiveness of salt pepper related indices. Moreover, the failed pixels could also reduce the incidence of urban decrease or insufficient clear pixels.

Quality Assessment for the Products
The rapid quality assessment approach was applied when the LCMAP products were first generated for the East mega ecoregion (Figure 3). Most of the tiles showed LOS less than 1.22, while two coastal tiles (H27V18 at West Palm Beach, Florida, and H26V20 at Key West, Florida) were above 2 ( Figure 3). The primary land cover confidence map showed that the two tiles did not have time series models saved at the land pixels, and LCMAP land cover classes were filled using the translated NLCD 2001 map ( Figure 4). The high LOS suggested the approach successfully identified these problematic tiles.   The problematic tiles successfully identified in West Palm Beach and Key West were later reproduced. Figure 5 shows the LOS values of H27V18 and H26V20 were reduced to 1.06 and 1.09, respectively, while the maximum LOS (1.5) came from tile H25V15. The indices suggested tile H25V15 had the smallest cohesive disagreement pixels with NLCD, but the highest SCTIME maximum, mean, and standard deviation (Table 3). A comparison of the LCMAP and NLCD land cover products showed that the primary disagreement came from the fragmented tree or grass-shrub ( Figure 6). Both LCMAP and NLCD indicated this tile had intensive forest harvest activities, while trees in this region only took about 6 years to grow back. Thus, a possible reason for the disagreement is that NLCD often searches surrounding years of Landsat images for clear observations, which could cause the temporal shift of the tree regrowth estimation in NLCD. Moreover, the Okefenokee Swamp (across Georgia and Florida) in tile H25V15 had two large fires in 2007 and 2011 [43], which resulted in a distinctive annual change rate (Figure 7). The problematic tiles successfully identified in West Palm Beach and Key West were later reproduced. Figure 5 shows the LOS values of H27V18 and H26V20 were reduced to 1.06 and 1.09, respectively, while the maximum LOS (1.5) came from tile H25V15. The indices suggested tile H25V15 had the smallest cohesive disagreement pixels with NLCD, but the highest SCTIME maximum, mean, and standard deviation (Table 3). A comparison of the LCMAP and NLCD land cover products showed that the primary disagreement came from the fragmented tree or grass-shrub ( Figure 6). Both LCMAP and NLCD indicated this tile had intensive forest harvest activities, while trees in this region only took about 6 years to grow back. Thus, a possible reason for the disagreement is that NLCD often searches surrounding years of Landsat images for clear observations, which could cause the temporal shift of the tree regrowth estimation in NLCD. Moreover, the Okefenokee Swamp (across Georgia and Florida) in tile H25V15 had two large fires in 2007 and 2011 [43], which resulted in a distinctive annual change rate (Figure 7). The problematic tiles successfully identified in West Palm Beach and Key West were later reproduced. Figure 5 shows the LOS values of H27V18 and H26V20 were reduced to 1.06 and 1.09, respectively, while the maximum LOS (1.5) came from tile H25V15. The indices suggested tile H25V15 had the smallest cohesive disagreement pixels with NLCD, but the highest SCTIME maximum, mean, and standard deviation (Table 3). A comparison of the LCMAP and NLCD land cover products showed that the primary disagreement came from the fragmented tree or grass-shrub ( Figure 6). Both LCMAP and NLCD indicated this tile had intensive forest harvest activities, while trees in this region only took about 6 years to grow back. Thus, a possible reason for the disagreement is that NLCD often searches surrounding years of Landsat images for clear observations, which could cause the temporal shift of the tree regrowth estimation in NLCD. Moreover, the Okefenokee Swamp (across Georgia and Florida) in tile H25V15 had two large fires in 2007 and 2011 [43], which resulted in a distinctive annual change rate (Figure 7).     Figure 8a shows the final LOS map for the whole CONUS. The lowest agreement between the NLCD and LCMAP land cover products was at least 71.5% and averaged 89.1% (Figure 8b). The coasts and borders show high agreement between the two products because they are dominated by water bodies. The average and median LOS were both 1.0, while the maximum LOS was 1.72 at H18V05. According to Table 4, H18V05 had the lowest agreement with NLCD (79.3%), the highest single pixels with no model initialized, and the highest SCTIME maximum, mean, and standard deviation. However, compared to H25V15, tile H18V05 had much lower SCTIME and LC Change indices (Tables 3 and 4), which means land cover in H18V05 was much more stable. The primary disagreement between LCMAP and NLCD in tile H18V05 was woody wetland in NLCD, but tree cover in LCMAP (Figure 9). The disagreement is a result of the differences in classification methods. NLCD used hierarchical classification and integration, which assigned higher priority to wetland cover than forest [44]. In contrast, LCMAP relied mainly on the seasonal variation of surface reflectance, in which woody wetland and trees are similar. Changes were detected in many wetlands in this tile for 1990, but the cover type remained the same in 1991 for most of those pixels, which resulted in a high spike in the annual spectral change rate (Figure 10a) but moderate land cover change rate (Figure 10b). Another reason for the distinction of H18V05 is the metropolitan area (Minneapolis and St. Paul) that covers a large portion of tile H18V05 and is the only metropolitan area in the 3 by 3 tile grid. The highest land cover change rate occurred in 1988, and the most common change type was the conversion from cropland to developed, which mainly occurred in the southeastern part of Minneapolis and St. Paul, Minnesota, along the Mississippi River.  Figure 8a shows the final LOS map for the whole CONUS. The lowest agreement between the NLCD and LCMAP land cover products was at least 71.5% and averaged 89.1% (Figure 8b). The coasts and borders show high agreement between the two products because they are dominated by water bodies. The average and median LOS were both 1.0, while the maximum LOS was 1.72 at H18V05. According to Table 4, H18V05 had the lowest agreement with NLCD (79.3%), the highest single pixels with no model initialized, and the highest SCTIME maximum, mean, and standard deviation. However, compared to H25V15, tile H18V05 had much lower SCTIME and LC Change indices (Tables 3 and 4), which means land cover in H18V05 was much more stable. The primary disagreement between LCMAP and NLCD in tile H18V05 was woody wetland in NLCD, but tree cover in LCMAP (Figure 9). The disagreement is a result of the differences in classification methods. NLCD used hierarchical classification and integration, which assigned higher priority to wetland cover than forest [44]. In contrast, LCMAP relied mainly on the seasonal variation of surface reflectance, in which woody wetland and trees are similar. Changes were detected in many wetlands in this tile for 1990, but the cover type remained the same in 1991 for most of those pixels, which resulted in a high spike in the annual spectral change rate (Figure 10a) but moderate land cover change rate (Figure 10b). Another reason for the distinction of H18V05 is the metropolitan area (Minneapolis and St. Paul) that covers a large portion of tile H18V05 and is the only metropolitan area in the 3 by 3 tile grid. The highest land cover change rate occurred in 1988, and the most common change type was the conversion from cropland to developed, which mainly occurred in the southeastern part of Minneapolis and St. Paul, Minnesota, along the Mississippi River.       Traditional product quality control activities were based on export interpretation [35,36,45] or validation after the production was completed [37,38,[46][47][48]. Most validation methods involve elements of sampling design, land cover identification, and accuracy assessment [39,40]. However, it was impractical for interpreters to evaluate the massive amount of data during production, and inefficient to reproduce the whole product if we rely only on the final validation. This study presents a workflow to consistently, automatically, and rapidly assess the quality of LCMAP results during production. We assessed the stability of LCMAP products both spatially and temporally. While multiple years of NLCD land cover maps were used to ensure that the overall results were consistent with existing products, the temporal stability was estimated for both land cover and change products. We removed the road network from the translated NLCD classes to reduce potential mismatches. This is because NLCD embedded the road features after the land cover classification from a nonremote sensing source, while we focus on the agreement between the classification results of the two products. This study does not provide a statistical validation of the LCMAP land cover results using NLCD products. Any disagreement between the two land-cover datasets does not necessarily indicate an error in the LCMAP results. For the purpose of validation, LCMAP has developed an independent reference dataset at the national level to validate the final products [37]. The rapid quality assessments discussed here were conducted for each ARD tile and results were compared with the surrounding tiles to identify anomalous tiles. In the simulated data tests, we found that tiles with random noises have LOS higher than 2, given the 14 indices. Thus, we use the threshold 2 as a flag for erroneous tiles. However, when all tiles have a LOS lower than the threshold, we still evaluate tiles with relatively high LOS to ensure the quality of the product. The approach successfully detected erroneous tiles during production and provided auxiliary information for future investigation to remedy the problems.
The comparison procedure integrated multiple independent indices that can be easily adjusted to meet future concerns. The method developed for evaluating LCMAP products in a rapid manner could be useful for other time series land cover change-related research and large-scale applications. However, users should be aware of some limitations of the method. One limitation is that the LOS is Traditional product quality control activities were based on export interpretation [35,36,45] or validation after the production was completed [37,38]. Most validation methods involve elements of sampling design, land cover identification, and accuracy assessment [39,40]. However, it was impractical for interpreters to evaluate the massive amount of data during production, and inefficient to reproduce the whole product if we rely only on the final validation. This study presents a workflow to consistently, automatically, and rapidly assess the quality of LCMAP results during production. We assessed the stability of LCMAP products both spatially and temporally. While multiple years of NLCD land cover maps were used to ensure that the overall results were consistent with existing products, the temporal stability was estimated for both land cover and change products. We removed the road network from the translated NLCD classes to reduce potential mismatches. This is because NLCD embedded the road features after the land cover classification from a non-remote sensing source, while we focus on the agreement between the classification results of the two products. This study does not provide a statistical validation of the LCMAP land cover results using NLCD products. Any disagreement between the two land-cover datasets does not necessarily indicate an error in the LCMAP results. For the purpose of validation, LCMAP has developed an independent reference dataset at the national level to validate the final products [37]. The rapid quality assessments discussed here were conducted for each ARD tile and results were compared with the surrounding tiles to identify anomalous tiles. In the simulated data tests, we found that tiles with random noises have LOS higher than 2, given the 14 indices. Thus, we use the threshold 2 as a flag for erroneous tiles. However, when all tiles have a LOS lower than the threshold, we still evaluate tiles with relatively high LOS to ensure the quality of the product. The approach successfully detected erroneous tiles during production and provided auxiliary information for future investigation to remedy the problems.
The comparison procedure integrated multiple independent indices that can be easily adjusted to meet future concerns. The method developed for evaluating LCMAP products in a rapid manner could be useful for other time series land cover change-related research and large-scale applications. However, users should be aware of some limitations of the method. One limitation is that the LOS is hard to interpret in order to decide the threshold of the outlier. Thus, instead of relying on the threshold, we use the LOS as a guide in evaluating tiles with a relatively high score. Another solution is the extension of the local outlier factor analysis, which scaled the LOS to a value range of [0:1]. The result is directly interpretable as a probability of outlier. Another limitation is that this workflow only flags outlier tiles with potential hardware or software failure, rather than algorithm deficiency. For example, if the algorithm had lower accuracy in the 1980s for all tiles, the workflow would not derive high LOS related to the problem. Thus, the approach should not be used as an accuracy assessment. For the LCMAP project, an independent validation study was conducted to estimate the annual land cover and change accuracy [41].

Conclusions
In this paper, we described a procedure for automatically and rapidly monitoring the quality of extensive results during the production of land cover and land surface change products. We developed methods involving 14 indices to assess the stability of LCMAP products, both spatially and temporally. The 14 indices allow us to find and flag potential anomalous tiles in the suite of 10 LCMAP products by comparing them with neighboring tiles using local outlier analysis. The methods successfully identified erroneous results from the simulated data and during the LCMAP production, in which erroneous tiles had LOS greater than 2. The calculated indices also suggest directions for further investigation of anomalous tiles. The local outlier analysis is flexible to input indices, which allows us to extend the indices' metrics for distinct characteristics, such as land cover change patterns or consistency with existing products. Our results showed all LOS values are below 2 in the final products, and overall agreements between LCMAP and NLCD are at least 71.5% and average 89.1%. A large volume of land cover data available globally is a significant benefit for ecosystem study and environmental management. However, quickly assessing production quality is challenging. Our approach offers the potential to automatically evaluate product quality from the perspective of multiple characteristics.