A New Pseudoinvariant Near-Infrared Threshold Method for Relative Radiometric Correction of Aerial Imagery

: The utilization of high-resolution aerial imagery such as the National Agriculture Imagery Program (NAIP) data is often hampered by a lack of methods for retrieving surface reﬂectance from digital numbers. This study developed a new relative radiometric correction method to retrieve 1 m surface reﬂectance from NAIP imagery. The advantage of this method lies in the adaptive identiﬁcation of pseudoinvariant (PIV) pixels from a time series of Landsat images that can fully characterize the temporally spectral variations of land surface. The identiﬁed PIV pixels allow for an e ﬀ ective conversion of digital numbers to surface reﬂectance, as demonstrated through the validation at 150 sites across the contiguous United States. The results show substantial improvement in the agreement of NAIP-derived normalized di ﬀ erence vegetation index (NDVI) values with Landsat-derived NDVI reference. Across the sites, root mean square error and mean absolute error were reduced from 0.37 ± 0.14 to 0.08 ± 0.07 and from 0.91 ± 0.64 to 0.18 ± 0.52, respectively. Over 70% PIV pixels on average were derived from vegetated areas, while water and developed areas together contributed 27% of the PIV pixels. As the NAIP program is continuing to generate new images across the country, the advantages of its high spatial resolution, national coverage, long time series, and regular revisits will make it an increasingly crucial data source for a variety of research and management applications. The proposed method could beneﬁt many agricultural, hydrological, and urban studies that rely on NAIP imagery to quantify land surface patterns and dynamics. It could also be applied to improve the preprocessing of high-resolution aerial imagery in other countries.


Introduction
The National Agriculture Imagery Program (NAIP) orthoimagery is the only high-resolution aerial imagery with repeated nationwide coverage in the United States. It is available at the spatial resolution of 0.6 to 2 meters with very low cloud coverage and consists of repeat images during the growing season with two or three year cycles for more than 15 years [1]. It has been a unique choice for a variety of geospatial mapping applications, such as analysis of land cover and land use change [2][3][4], evaluation of ecosystem services [5][6][7], monitoring of forest health [8][9][10][11], and assessment of urban green infrastructure [12,13]. The NAIP imagery will likely continue to be the one of the best data sources for many research and operational efforts that need high-resolution multispectral imagery for feature extraction, change detection, or collection of ground truth for validate coarse-resolution satellite products. While available with low or no cost, the NAIP data have not been recognized and utilized as effectively as one might expect, since important challenges in data preprocessing have hindered the exploration of this often overlooked valuable data [1]. Compared to satellite imagery programs such as Landsat and MODIS, the NAIP data are acquired from aerial platforms using different sensors that have lower radiometric resolutions and the information on sensor characteristics is less available and consistent over time [1]. Because of the small swath of NAIP images (approximately 7 km × 8 km), the acquisition of imagery for one region usually involves multiple flights that may last weeks or potentially even months, and the mosaics of NAIP images have artifacts associated with atmospheric interference, viewing geometry, illumination, shadows, time of the day, and plant phenology [1,14]. More importantly, NAIP images are distributed in the format of digital numbers (DNs), which are integer values to facilitate computation and transmission and to scale brightness for convenient display [15]. The number of brightness values within a DN image is determined by the number of bits (e.g., 256 values for an 8-bit NAIP image) available. DNs only express relative brightness within the scene, but they lack the physical units that are necessary to understand the optical processes behind the observed brightness and the interference of atmosphere. Due to the differences in acquisition times and dates, DN values can vary between adjacent image tiles [1]. DNs cannot be directly used to examine brightness over time, to match one scene with another, to prepare mosaic of large regions, nor to serve models of physical processes such as those in agriculture, forestry or hydrology [15]. The conversion from DNs to surface reflectance is increasingly understood as a minimum standard for analysis-ready data in order to ensure consistency when comparing images over time and from different sensors [16]. This is important to the calculations of many multispectral indices, such as the normalized difference vegetation index (NDVI) and the normalized difference water index (NDWI), that are very sensitive to atmospheric effects. However, many studies directly applied NAIP without correction for spectral analysis such as NDVI calculation, land cover classification, and vegetation cover estimation [3,4,6,8,9,[11][12][13][14]17]. This is mainly due to a lack of easy access to radiometric response data for the sensors used and the lack of methods for retrieving surface reflectance from NAIP DN values [1].
To reduce influences from the atmosphere and retrieve surface reflectance, two categories of radiometric correction approaches are available: absolute correction and relative correction [18]. Absolute correction primarily uses physical-based radiative transfer models, such as the MODerate resolution atmospheric TRANsmission (MODTRAN) [19] and the Satellite Signal in the Solar Spectrum (6S) code [20]. Their implementations require supplementary field measurements of radiance and other atmospheric conditions at the date and time of the image acquisition [21] and involve many complicated steps and manual operations [18], which make them usually only available for the processing of satellite imagery. Absolute correction can also be used to calibrate the sensor if ground measurement includes a measure of surface reflectance. When the coincident measurements of atmospheric conditions and surface reflectance are not available, it is necessary to use relative correction approaches [16]. These approaches aim to find a conversion from DN to reflectance that would allow for calibration and comparison of images taken under different conditions. This often involves identifying pixels whose absolute reflectance is not changing over time and using them as references for all the images. Ideally, such pixels would be found for a range of absolute reflectance so that a linear regression can be performed. The relative correction approaches assume that at-sensor radiance measurements in DNs is proportional to surface reflectance within visible and shortwave infrared regions when solar angle variations are minimal and sky irradiance are common, leading to uniform downwelling irradiance across the scene. When information is available for two or more pseudoinvariant (PIV) features that have minimal spectral variations over time, a linear regression can be established to convert DNs or at-sensor radiance into to surface reflectance [21][22][23][24]. The linear transformation can be extended to the rest of the image or even multitemporal images [25]. This is the basis of the empirical line approach that has been widely used for relative radiometric corrections of a variety of satellite and aerial imagery.
A small number of studies have included the radiometric correction of NAIP images to retrieve surface reflectance. The empirical line approach has been applied using pseudoinvariant features based on calibration panels [26,27] or constructed field plots [28,29]. Those man-made objects have a specific reflectance value that is almost constant over the targeted wavelength range, which can be verified in field and/or laboratory conditions using a handheld spectrometer. Natural features were also used for the empirical line method, such as bare sand pixels [5]. Alternatively, Buffington et al. (2016) [30] applied the dark object subtraction approach using the histogram of each band, which was an empirical form of absolute correction. In addition, a color balancing approach was implemented in conjunction with object-oriented analysis to estimate fractional vegetation cover in tidal marshes [6]. Furthermore, a few studies have explored the integration of satellite and NAIP imagery to develop linear regression models, for example, through extracting randomly placed points in a Landsat surface reflectance image [31] or identifying the highest 10% and lowest 10% NDVI pixels [32]. The success of the relative correction depends strongly on the selection of PIV features, as the degree to which there are truly spectrally invariant determines the amount of error propagated through the correction process [21]. However, the establishment of the DN-reflectance relationship in previous studies relied on ground measurements or pixels derived from a single satellite reference image. The effect of the temporally spectral variations of the selected PIV pixels or features was not addressed in the assessment of the established linear transformation.
The objective of this study is to develop a new relative radiometric correction method that can convert NAIP digital numbers into surface reflectance using PIV pixels extracted from the near-infrared (NIR) band of satellite images. It takes advantage of the decadal archive of Landsat surface reflectance data to characterize the temporal stability of land surface spectral variations. We demonstrate the performance of this method by examining corrected NDVI values at 150 sites across the contiguous United States that address a variety of land cover and cloud conditions.

Study Area and Data
Four primary sites were selected for the development of the proposed method (Table 1 and Figure 1). Site 1 in southeastern New Hampshire (43 •  At each site, the NAIP data included an image that was selected from the 2011-2012 archive with three bands: green, red, and NIR. A Landsat-5 TM image acquired within 4 days of the NAIP image acquisition was used as the reference image for the DN-to-reflectance conversion. The identification of PIV pixels used all Landsat TM and ETM+ images that had acceptable cloud cover and were acquired within 10 years of the NAIP image acquisition. The Landsat images were Collection 1 Tier-1 surface reflectance data with four bands used in this study: green, red, NIR, and the 16-bit quality assessment band [33] that is generated based on the C Function of Mask (CFMask) algorithm [34]. The CFMask algorithm labels cloud and cloud shadow pixels with different confidence levels using a multi-pass decision tree approach and then refines those labels according to scene-wide statistics. All NAIP and Landsat data were accessed through Google Earth Engine [35], which hosts the full archive of NAIP imagery since 2004 and Landsat imagery since 1986.
The four primary sites were used for initial algorithm development that involved empirical adjustments and visual inspections. After the algorithm was able to provide an accurate depiction of the primary sites, it was extended to an additional set of 146 sites ( Figure 2) to fully test its performance under a broader range of land cover and cloud cover conditions. These sites were selected using a stratified sampling scheme. First, ten points were randomly generated within each of the 462 Landsat WRS-2 scenes that covered the contiguous United States. At each point, available NAIP images and Landsat TM images in 2011-2012 were screened to identify if there was a pair of coincident Landsat and NAIP images that were acquired no more than four days apart. If multiple pairs were available for one site, the pair with the least cloud cover was used. The National Land Cover Database for 2011 was used to examine the land cover conditions of the study sites. A complete list of NAIP and Landsat images used in this study is shown in Supplementary Information (Table  S1). At each site, the NAIP data included an image that was selected from the 2011-2012 archive with three bands: green, red, and NIR. A Landsat-5 TM image acquired within 4 days of the NAIP image acquisition was used as the reference image for the DN-to-reflectance conversion. The identification of PIV pixels used all Landsat TM and ETM+ images that had acceptable cloud cover and were acquired within 10 years of the NAIP image acquisition. The Landsat images were Collection 1 Tier-1 surface reflectance data with four bands used in this study: green, red, NIR, and the 16-bit quality assessment band [33] that is generated based on the C Function of Mask (CFMask) algorithm [34]. The CFMask algorithm labels cloud and cloud shadow pixels with different confidence levels using a multi-pass decision tree approach and then refines those labels according to scene-wide statistics. All NAIP and Landsat data were accessed through Google Earth Engine [35], which hosts the full archive of NAIP imagery since 2004 and Landsat imagery since 1986.
The four primary sites were used for initial algorithm development that involved empirical adjustments and visual inspections. After the algorithm was able to provide an accurate depiction of the primary sites, it was extended to an additional set of 146 sites ( Figure 2) to fully test its performance under a broader range of land cover and cloud cover conditions. These sites were selected using a stratified sampling scheme. First, ten points were randomly generated within each of the 462 Landsat WRS-2 scenes that covered the contiguous United States. At each point, available NAIP images and Landsat TM images in 2011-2012 were screened to identify if there was a pair of coincident Landsat and NAIP images that were acquired no more than four days apart. If multiple pairs were available for one site, the pair with the least cloud cover was used. The National Land Cover Database for 2011 was used to examine the land cover conditions of the study sites. A complete list of NAIP and Landsat images used in this study is shown in Supplementary Information (Table S1).

Methods
We propose a pseudoinvariant near-infrared threshold (PINT) method that can convert the digital numbers of NAIP imagery into surface reflectance based on PIV pixels identified from Landsat imagery ( Figure 2). It assumes that a linear relationship exists between digital numbers from a 1 m NAIP image and surface reflectance from a 30 m Landsat image for overlapping areas, if the two images are acquired simultaneously and the atmospheric condition is homogenous within the scene. The Landsat surface reflectance image is considered as a proxy of actual ground-level reflectance measurements for PIV pixels, which follows a strategy that has been used for the correction and validation of satellite images with different resolutions [36,37]. Landsat has been used as a reference radiometer against which many other satellite payloads have their performance gauged against. The overall uncertainty of Landsat Tier-1 surface reflectance product is on the order of 6%-10% around the world, depending on spectral channel, underlying atmospheric conditions and radiometric calibration characteristics [16,38]. The correlation between Landsat and MODIS surface reflectance products is high and the bias is mostly around 1% to 5% with the largest difference in the blue band [38].
The procedure of PINT is centered on the identification of PIV pixels that can best define the linear transformation between surface reflectance and digital numbers, consisting of several steps, as shown in Figure 3. First, when the targeted NAIP image and the coincident Landsat image are determined, a ten year collection of Landsat TM and ETM+ images at this location is rigorously filtered. Images with cloud cover >50% in the scene were discarded. For the rest of the collection, pixels detected as clouds or shadows were masked out on each image. In the pixel quality band, the value of a pixel indicates the levels of confidence that a cloud and/or shadow condition exists at this pixel. Pixels with high cloud or shadow confidence were identified and masked out. Then, areas within 90 m (i.e., three Landsat pixels) of the edge of the targeted NAIP image were excluded to avoid the influence of boundary pixels. The temporal standard deviation of the near-infrared band at the pixel level is calculated over the ten years of Landsat images.

Methods
We propose a pseudoinvariant near-infrared threshold (PINT) method that can convert the digital numbers of NAIP imagery into surface reflectance based on PIV pixels identified from Landsat imagery ( Figure 2). It assumes that a linear relationship exists between digital numbers from a 1 m NAIP image and surface reflectance from a 30 m Landsat image for overlapping areas, if the two images are acquired simultaneously and the atmospheric condition is homogenous within the scene. The Landsat surface reflectance image is considered as a proxy of actual ground-level reflectance measurements for PIV pixels, which follows a strategy that has been used for the correction and validation of satellite images with different resolutions [36,37]. Landsat has been used as a reference radiometer against which many other satellite payloads have their performance gauged against. The overall uncertainty of Landsat Tier-1 surface reflectance product is on the order of 6-10% around the world, depending on spectral channel, underlying atmospheric conditions and radiometric calibration characteristics [16,38]. The correlation between Landsat and MODIS surface reflectance products is high and the bias is mostly around 1% to 5% with the largest difference in the blue band [38].
The procedure of PINT is centered on the identification of PIV pixels that can best define the linear transformation between surface reflectance and digital numbers, consisting of several steps, as shown in Figure 3. First, when the targeted NAIP image and the coincident Landsat image are determined, a ten year collection of Landsat TM and ETM+ images at this location is rigorously filtered. Images with cloud cover >50% in the scene were discarded. For the rest of the collection, pixels detected as clouds or shadows were masked out on each image. In the pixel quality band, the value of a pixel indicates the levels of confidence that a cloud and/or shadow condition exists at this pixel. Pixels with high cloud or shadow confidence were identified and masked out. Then, areas within 90 m (i.e., three Landsat pixels) of the edge of the targeted NAIP image were excluded to avoid the influence of boundary pixels. The temporal standard deviation of the near-infrared band at the pixel level is calculated over the ten years of Landsat images.
By specifying a percentile of the temporal standard deviation image for thresholding, pixels that have sufficiently low variations are identified as potential PIV pixels. Because the NIR band tends to have greater variations than the visible bands ( Figure S1), it is easier and more robust to identify PIV pixels from the NIR band when all bands have the same radiometric resolution. The locations of tentative PIV pixels are aggregated into a mask that is applied to the coincident pair of NAIP and Landsat images. The NAIP image is upscaled to the 30 m spatial resolution of the Landsat reference image in advance. Homogeneity of the upscaled pixels is discussed in the results. For each of the three bands (i.e., red, green, and NIR), digital number and surface reflectance are extracted at locations of PIV pixels from the NAIP and Landsat images, respectively, and linear regression is performed after the removal of outliers. The mean value of the coefficient of determination (R 2 ) of three bands is taken as the measure of regression performance. Through iteratively testing thresholds starting from 0.01st percentile with an increment of 0.01% until the 5th percentile, the threshold with the highest mean R 2 is determined as the best threshold. Given that an NAIP image covers an area of approximately 53 km 2 on average or the equivalence of about 60,000 pixels at the 30 m resolution, the increment of 0.01% is equivalent to the addition of 6 pixels, which is considered to be an appropriate scale for common invariant features. The linear transformation models associated with the best threshold are applied to the bands of upscaled 30 m NAIP for validation and eventually the original 1 m NAIP image for high-resolution mapping applications. To validate the performance of PINT, NDVI is calculated using the 30 m NAIP surface reflectance and compared to NDVI generated from Landsat surface reflectance. Evaluation metrics consist of R 2 , Nash-Sutcliffe efficiency (NSE), mean absolute error (MAE), and root mean square error (RMSE). By specifying a percentile of the temporal standard deviation image for thresholding, pixels that have sufficiently low variations are identified as potential PIV pixels. Because the NIR band tends to have greater variations than the visible bands ( Figure S1), it is easier and more robust to identify PIV pixels from the NIR band when all bands have the same radiometric resolution. The locations of tentative PIV pixels are aggregated into a mask that is applied to the coincident pair of NAIP and Landsat images. The NAIP image is upscaled to the 30 m spatial resolution of the Landsat reference image in advance. Homogeneity of the upscaled pixels is discussed in the results. For each of the three bands (i.e., red, green, and NIR), digital number and surface reflectance are extracted at locations of PIV pixels from the NAIP and Landsat images, respectively, and linear regression is performed after the removal of outliers. The mean value of the coefficient of determination (R 2 ) of three bands is taken as the measure of regression performance. Through iteratively testing thresholds starting from 0.01st percentile with an increment of 0.01% until the 5th percentile, the threshold with the highest mean R 2 is determined as the best threshold. Given that an NAIP image covers an area of approximately 53 km 2 on average or the equivalence of about 60,000 pixels at the 30 m resolution, the increment of 0.01% is equivalent to the addition of 6 pixels, which is considered to be an appropriate scale for common invariant features. The linear transformation models associated with the best threshold are applied to the bands of upscaled 30 m NAIP for validation and eventually the original 1 m NAIP image for high-resolution mapping applications. To validate the performance of PINT, NDVI is calculated using the 30 m NAIP surface reflectance and compared to NDVI generated from Landsat surface reflectance. Evaluation metrics consist of R 2 , Nash-Sutcliffe efficiency (NSE), mean absolute error (MAE), and root mean square error (RMSE).
This method is different from previous studies in several aspects. First, the identification of PIV pixels only requires the temporally variation of the NIR band, eliminating the need to inspect the  This method is different from previous studies in several aspects. First, the identification of PIV pixels only requires the temporally variation of the NIR band, eliminating the need to inspect the pixels across all bands for the consistency of spectral brightness. The PIV pixels identified in the NIR region are assumed to be spectrally invariant in visible bands as well. The noise components of some multispectral bands may have larger amplitude than the signal components of other bands [21]. For Landsat TM and ETM+ sensors, the visible bands record less variation and are more affected by residual clouds and snow, in comparison to the infrared bands [39]. Second, the number of PIV pixels varies based on the optimal threshold of each image, in contrast to the use of a fixed empirical threshold for all images [37,40]. The adaptive scheme of PINT could alleviate the disturbance of nonlinearity of the spectral variance and enhance the spectral diversity of PIV features (i.e., pixels are well spread other than concentrated in a small range of brightness), all amendable to the robustness of the linear regression.

Results from Primary Sites
Site 1 (Figure 4) demonstrates a mixture of urban areas and forest. Highways, buildings, and waterbodies show low temporal variance in the near-infrared band. The five PIV pixels (0.01st percentile) consisted of three pixels located in a lake and two pixels in developed areas (Figure 4b).
The performance of linear regression decreased dramatically with more PIV pixels, before it was improved when more than 0.8% pixels were included (Figure 4d). NDVI based on the NAIP surface reflectance matched well with the Landsat reference data (Figure 4g), in contrast to the apparent underestimation for NDVI derived from NAIP digital numbers (Figure 4i). The difference is also visually evident as shown in Figure 4e,f,h. While capturing the pattern of NDVI from the Landsat reference image, the corrected NAIP image clearly provides more details about the heterogeneity of vegetation cover and impervious surfaces with the 1 m spatial resolution.    Site 3 ( Figure 6) has a dense forest cover. The best linear transformation was achieved using the 0.33rd percentile threshold with 168 PIV pixels. These pixels spread over the scene and indicated the spectral stability of small waterbodies and dense vegetation patches. In contrast to the low percentile threshold of 0.01st percentile in sites 1 and 2 (Figures 4d and 5d), more pixels were needed to construct the linear regression at site 3. This indicates that the pixels associated with very low NIR thresholds at this site may not be spectrally invariant as their values suggest. The quality of cloud and shadow detection in image preprocessing could be a reason for errors in the quantification of temporal NIR variance at this site. Nevertheless, the incorporation of a sufficient number of PIV pixels ensured the improved agreement with Landsat-derived NDVI in terms of RMSE, NSE, and MAE (Figure 5g,i).  At site 4 ( Figure 7) in a coastal environment, six PIV pixels were identified in marshes and a ship channel to construct the linear transformation. No PIV pixels were identified in the tidal flats which had high NIR variations due to the irregularly wind-driven inundation, while marshes were less flooded due to their higher elevation. Raising the percentile threshold resulted in gradually decreasing R 2 , but the performance tended to be stabilizing after more than 2% of pixels were used. The corrected NDVI yielded a slightly lower R 2 , but RMSE, NSE, and MAE were improved significantly.  Figure 8 summarizes the best NIR thresholds and the regression lines that defined these thresholds for 150 sites nationwide. The average threshold was the 0.69 th percentile, or 0.025 in terms of the actual value of the temporal standard deviation of the NIR band. This suggests that approximately 300 pixels on average are needed to define the linear relationship between surface reflectance and digital numbers. The 0.01st percentile (five pixels) was identified as the threshold for 35 sites and the 0.02 nd percentile (ten pixels) for 23 sites. On the other hand, the thresholds were above the 4.5 th percentile (i.e., >2250 pixels) for ten sites. The range of thresholds determined in this study Overall, using NAIP digital numbers appeared to generate underestimated NDVI values across the four primary sites. The PINT method was effective in retrieving surface reflectance and improving the quality of NAIP-derived NDVI data. Figure 8 summarizes the best NIR thresholds and the regression lines that defined these thresholds for 150 sites nationwide. The average threshold was the 0.69th percentile, or 0.025 in terms of the actual value of the temporal standard deviation of the NIR band. This suggests that approximately 300 pixels on average are needed to define the linear relationship between surface reflectance and digital numbers. The 0.01st percentile (five pixels) was identified as the threshold for 35 sites and the 0.02nd percentile (ten pixels) for 23 sites. On the other hand, the thresholds were above the 4.5th percentile (i.e., >2250 pixels) for ten sites. The range of thresholds determined in this study was comparable to the threshold of 0.020-0.027 recommended for MODIS time series [37,40]. Regarding the regression lines, the average slope and intercept were 0.0017 and 0.017 for the NIR band, respectively, and 0.00092 and −0.013 for the red band, respectively. For both bands, the slope was positively skewed, while the intercept was negatively skewed.  Validation results for 150 sites nationwide show substantial improvement in the NDVI agreement against the Landsat NDVI reference (Figure 9). The values of RMSE were reduced from 0.37 ± 0.14 to 0.08 ± 0.07. The values of MAE were reduced from 0.91 ± 0.64 to 0.18 ± 0.52. The values of R 2 were stable: 0.70 ± 0.19 before the correction and 0.71 ± 0.20 after the correction. The values of NSE was reduced from −41.71 ± 107.68 to −0.65 ± 10.39. The negative mean value of NSE reflects the sensitivity of this measure to extreme values. Detailed measures of all 150 sites are provided in Supplementary Information (Tables S1 and S2).

Land Cover of PIV Pixels
PIV pixels are taken as approximations of features with spectral properties that do not change significantly over time [15]. Land cover has been used as a key characteristic for the identification and description of PIV pixels. The results of this study provide a comprehensive viewpoint to examine the link between land cover and PIV pixels. On the basis of the 2011 National Land Cover Database (NCLD), Figure 10 shows the average land use conditions for the 150 NAIP images and the Validation results for 150 sites nationwide show substantial improvement in the NDVI agreement against the Landsat NDVI reference (Figure 9). The values of RMSE were reduced from 0.37 ± 0.14 to 0.08 ± 0.07. The values of MAE were reduced from 0.91 ± 0.64 to 0.18 ± 0.52. The values of R 2 were stable: 0.70 ± 0.19 before the correction and 0.71 ± 0.20 after the correction. The values of NSE was reduced from −41.71 ± 107.68 to −0.65 ± 10.39. The negative mean value of NSE reflects the sensitivity of this measure to extreme values. Detailed measures of all 150 sites are provided in Supplementary Information (Tables S1 and S2).  Validation results for 150 sites nationwide show substantial improvement in the NDVI agreement against the Landsat NDVI reference (Figure 9). The values of RMSE were reduced from 0.37 ± 0.14 to 0.08 ± 0.07. The values of MAE were reduced from 0.91 ± 0.64 to 0.18 ± 0.52. The values of R 2 were stable: 0.70 ± 0.19 before the correction and 0.71 ± 0.20 after the correction. The values of NSE was reduced from −41.71 ± 107.68 to −0.65 ± 10.39. The negative mean value of NSE reflects the sensitivity of this measure to extreme values. Detailed measures of all 150 sites are provided in Supplementary Information (Tables S1 and S2).

Land Cover of PIV Pixels
PIV pixels are taken as approximations of features with spectral properties that do not change significantly over time [15]. Land cover has been used as a key characteristic for the identification and description of PIV pixels. The results of this study provide a comprehensive viewpoint to examine the link between land cover and PIV pixels. On the basis of the 2011 National Land Cover Database (NCLD), Figure 10 shows the average land use conditions for the 150 NAIP images and the

Land Cover of PIV Pixels
PIV pixels are taken as approximations of features with spectral properties that do not change significantly over time [15]. Land cover has been used as a key characteristic for the identification and description of PIV pixels. The results of this study provide a comprehensive viewpoint to examine the link between land cover and PIV pixels. On the basis of the 2011 National Land Cover Database (NCLD), Figure 10 shows the average land use conditions for the 150 NAIP images and the PIV pixels identified in them, with a comparison to the land cover of the contiguous United States. The sixteen NCLD land cover classes were grouped into the following six general categories: water, developed, barren land, forest, grassland, and cropland. As the NAIP images were selected using a stratified sampling approach, the pattern of their land cover effectively represented the national land cover pattern in 2011. The grassland category consisted of various types of shrublands, grasslands, and herbaceous wetlands. It was the dominant land cover category across the NAIP images (43%) and the identified PIV pixels (44%). Forest accounted for 26% in the NAIP images, but only 20% of the PIV pixels. Similarly, cropland accounted for 23% in the images, but merely 8% of the PIV pixels. This indicates that the phenological variations of forest and crop species are more detectable than that of natural herbaceous plants in the Landsat NIR band. Water and developed areas together contributed to 27% of the PIV pixels, a more influential role as compared to the small fractions of scenes (7% for both) occupied by them. This is consistent with the common uses of various man-made structures and clear deep waterbodies as PIV features in previous studies [21,41,42]. Nevertheless, the results of this study show that on the average over 70% PIV pixels were derived from vegetated areas. PIV pixels might be identified in some vegetation areas where the fractional vegetation coverage is low and phenological changes are less significant as compared with fully-vegetated areas ( Figure S2 in Supplementary Materials). This suggests that PIV-based relative radiometric correction approaches are suitable for a variety of regions and are not limited to urban areas. Comparisons between the 1 m NDVI images and the 30 m reference Landsat images at the primary sites (Figures 4-7) also indicate that a PIV pixel may not be occupied by a homogenous land cover. This provides evidence to support the finding by Padró et al. (2017) [21,41,42]. Nevertheless, the results of this study show that on the average over 70% PIV pixels were derived from vegetated areas. PIV pixels might be identified in some vegetation areas where the fractional vegetation coverage is low and phenological changes are less significant as compared with fully-vegetated areas ( Figure S2 in Supplementary Materials). This suggests that PIV-based relative radiometric correction approaches are suitable for a variety of regions and are not limited to urban areas. Comparisons between the 1 m NDVI images and the 30 m reference Landsat images at the primary sites ( Figures  4-7) also indicate that a PIV pixel may not be occupied by a homogenous land cover. This provides evidence to support the finding by Padró et al. (2017) [37] that a PIV feature can be spectrally heterogeneous while it is radiometrically stable, i.e., its global reflectance does not change over time even if there are different land covers inside the area.

Discussion
The calculation of temporal variation of NIR reflectance is a critical component of the PINT method. A key parameter of this procedure is the number of satellite images to be included in the calculation. How many years of images are sufficient for identifying PIV pixels? Will using more images improve the identification of PIV pixels? Thanks to the multidecadal Landsat legacy, we were able to examine the effect of the quantity of Landsat images on the performance of PINT over different durations ranging from two years to 30 years before the acquisition of the NAIP image. The responses of the NIR threshold and the metrics of NDVI validation were examined at the four primary sites ( Figure 11).

Discussion
The calculation of temporal variation of NIR reflectance is a critical component of the PINT method. A key parameter of this procedure is the number of satellite images to be included in the calculation. How many years of images are sufficient for identifying PIV pixels? Will using more images improve the identification of PIV pixels? Thanks to the multidecadal Landsat legacy, we were able to examine the effect of the quantity of Landsat images on the performance of PINT over different durations ranging from two years to 30 years before the acquisition of the NAIP image. The responses of the NIR threshold and the metrics of NDVI validation were examined at the four primary sites ( Figure 11). At these sites, longer durations led to the inclusion of more Landsat images. The changes in the number of images appeared to follow a piecewise linear pattern with a reduced slope for durations longer than 12-15 years, suggesting a reduced availability of Landsat images prior to 1999-2001, as Landsat 7 was launched in 1999. Changes in the NIR threshold, however, were less consistent across the sites. Sites 2 and 4 had stable thresholds over different durations, while sites 1 and 3 had fluctuations in the ranges of 0.5th percentile and 1.5th percentile, respectively. The response of correction performance was reflected in the pattern of different measures. The stability of the NIR threshold was not linked to the stability of the measures. For example, site 2 had both the most stable threshold and the most variable measures among the four sites. The measure of NSE appeared to be more sensitive to the number of images than other measures. Overall, the use of ten years of Landsat images was sufficient for converting digital numbers into surface reflectance and incorporating a longer time series from the same sensor did not substantially improve the correction of NDVI in this study. This duration of Landsat time series was consistent with the value recommended for the identification of PIV features using MODIS time series [40] and could be considered as a reasonable configuration for the implementation of PINT for other regions.
The integration of data from multiple sensors on different satellites should enhance the robustness of the identification of NIR threshold and facilitate the comparison of PINT to existing relative correction approaches. However, the spectral differences among the sensors of interest should be taken into consideration. For example, incorporating additional images from the Landsat 8 Operational Land Imager (OLI) sensor could considerably increase the temporal resolution of time series, but the OLI bands are spectrally narrower than the corresponding TM and ETM+ bands, especially in the near-infrared region, which could result in NDVI differences of about 5% [43,44]. Incorporating images from the Sentinel-2A Multispectral Instrument (MSI) sensor is expected have the same problem, as MSI and OLI bands are very spectrally similar except in the thermal infrared region.
The development and validation of PINT in this study was particularly facilitated by the use of the Google Earth Engine. Traditionally, NAIP images are downloaded as county-scale mosaics through the Agriculture Farm Service of the United States Department of Agriculture (USDA). Only At these sites, longer durations led to the inclusion of more Landsat images. The changes in the number of images appeared to follow a piecewise linear pattern with a reduced slope for durations longer than 12-15 years, suggesting a reduced availability of Landsat images prior to 1999-2001, as Landsat 7 was launched in 1999. Changes in the NIR threshold, however, were less consistent across the sites. Sites 2 and 4 had stable thresholds over different durations, while sites 1 and 3 had fluctuations in the ranges of 0.5th percentile and 1.5th percentile, respectively. The response of correction performance was reflected in the pattern of different measures. The stability of the NIR threshold was not linked to the stability of the measures. For example, site 2 had both the most stable threshold and the most variable measures among the four sites. The measure of NSE appeared to be more sensitive to the number of images than other measures. Overall, the use of ten years of Landsat images was sufficient for converting digital numbers into surface reflectance and incorporating a longer time series from the same sensor did not substantially improve the correction of NDVI in this study. This duration of Landsat time series was consistent with the value recommended for the identification of PIV features using MODIS time series [40] and could be considered as a reasonable configuration for the implementation of PINT for other regions.
The integration of data from multiple sensors on different satellites should enhance the robustness of the identification of NIR threshold and facilitate the comparison of PINT to existing relative correction approaches. However, the spectral differences among the sensors of interest should be taken into consideration. For example, incorporating additional images from the Landsat 8 Operational Land Imager (OLI) sensor could considerably increase the temporal resolution of time series, but the OLI bands are spectrally narrower than the corresponding TM and ETM+ bands, especially in the near-infrared region, which could result in NDVI differences of about 5% [43,44]. Incorporating images from the Sentinel-2A Multispectral Instrument (MSI) sensor is expected have the same problem, as MSI and OLI bands are very spectrally similar except in the thermal infrared region.
The development and validation of PINT in this study was particularly facilitated by the use of the Google Earth Engine. Traditionally, NAIP images are downloaded as county-scale mosaics through the Agriculture Farm Service of the United States Department of Agriculture (USDA). Only the data of the most recent year are available on an ArcGIS server. The collection of NAIP data for a large region or a long period is often a very time-consuming procedure. Through hosting the entire NAIP legacy in cloud storage, the Google Earth Engine provides a highly-efficient means to improve not only the accessibility of this unique dataset, but also its processing and analysis. The free and open access of NAIP data through the Google Earth Engine could result in landmark changes in the utilization of NAIP data and its integration with other remotely sensed data, an information breakthrough comparable to the initiation of Landsat data release in 2008. In addition, through the Google Earth Engine, the proposed method takes advantage of the archives of Landsat 5 and 7 and identifies PIV features based on hundreds of images that fully address the temporally spectral variation of ground features. This eliminates the previous efforts to calibrate to specific laboratory or field spectra, because the 30 m PIV pixels are stable combinations of a variety of materials instead of the pure presence of a single material.
There are several potential sources of uncertainty that could be the directions for future work. First, the spectral mismatch between Landsat sensors and NAIP cameras will affect the conversion of digital numbers to surface reflection, as the visible and NIR bands of Landsat sensors are different from that of NAIP cameras in many aspects, e.g., bandwidth and radiometric resolution. Secondly, various cameras have been used for the NAIP program since 2004. The Farm Service Agency of USDA does not specify camera and sensor requirements for the image collections and different states might use different contractors to conduct their flights [1]. Although the entire NAIP archive is available on the Google Earth Engine, there is a lack of information about the usage of different cameras, their spectral responses, and particularly how linear they are. Third, the proposed method tends to interpolate and extrapolate the DN-reflectance relationships from PIV pixels to the rest of the scene, which could consist of areas that may have different atmospheric conditions and have different solar angles and perhaps viewing angles, although the swath of NAIP images is smaller than that of Landsat images. Each of these sources could result in a few percentages of uncertainty and their cumulative effect needs a systematic understanding

Conclusions
This study developed a new relative radiometric correction method to retrieve 1 m surface reflectance from NAIP imagery. The advantage of this method lies in the adaptive identification of pseudoinvariant pixels from a time series of Landsat images that can fully characterize the temporally spectral variations of land surface. The identified PIV pixels allow for an effective conversion of NAIP digital numbers to surface reflectance, as demonstrated through the results from 150 sites across the United States. The corrected NAIP imagery could benefit many agricultural, hydrological, and urban studies that rely on this data to quantify land surface patterns and dynamics. As the NAIP program is continuing to generate new images across the country, the advantages of its high spatial resolution, national coverage, long time series, and regular revisits will make it an increasingly crucial data source for a variety of research and management applications. The proposed PINT method could enhance the preprocessing of NAIP imagery and improve the utilization of often overlooked valuable data.
As this method is based on Landsat imagery that has global coverage, it could benefit the preprocessing of high-resolution aerial imagery programs in other countries. This potential is particularly supported by the easy implementation of PINT via the Google Earth Engine that enables the access and analysis of aerial and satellite imagery across large spatial and temporal scales. On the other hand, PINT could be adapted to handle the rapid growing data from unmanned aircraft systems (UAS). A typical UAS flight at 120 m altitude can generate a mosaic image that covers an area of 0.50 km 2 , which is equivalent to the total area of about 550 pixels on a Landsat image. A UAS image could contain PIV pixels that are identifiable from Landsat or other medium-resolution satellite images. This could help address the increasing awareness of and need for UAS-derived ultrahigh-resolution surface reflectance data for various applications.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-4292/11/16/1931/s1: Table S1, Regression results based on PIV pixels at 150 sites; Table S2, NDVI validation results at 150 sites; Figure  S1, Comparison of pixelwise 10-year standard deviations of surface reflectance in red, green and NIR bands at site 1; Figure S2, Comparison of ten-year NIR reflectance of PIV pixels and non-PIV pixels in vegetated areas at site 3.