Analysis Ready Data : Enabling Analysis of the Landsat Archive

Data that have been processed to allow analysis with a minimum of additional user effort are often referred to as Analysis Ready Data (ARD). The ability to perform large scale Landsat analysis relies on the ability to access observations that are geometrically and radiometrically consistent, and have had non-target features (clouds) and poor quality observations flagged so that they can be excluded. The United States Geological Survey (USGS) has processed all of the Landsat 4 and 5 Thematic Mapper (TM), Landsat 7 Enhanced Thematic Mapper Plus (ETM+), Landsat 8 Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS) archive over the conterminous United States (CONUS), Alaska, and Hawaii, into Landsat ARD. The ARD are available to significantly reduce the burden of pre-processing on users of Landsat data. Provision of pre-prepared ARD is intended to make it easier for users to produce Landsat-based maps of land cover and land-cover change and other derived geophysical and biophysical products. The ARD are provided as tiled, georegistered, top of atmosphere and atmospherically corrected products defined in a common equal area projection, accompanied by spatially explicit quality assessment information, and appropriate metadata to enable further processing while retaining traceability of data provenance.


Introduction
The release of the Landsat archive as free and open data via the internet in 2008 removed the monetary cost of accessing Landsat data [1] and led to a dramatic increase in Landsat data utilization [2].Pre-processing is still required to ensure that Landsat data are comparable in time and space, particularly for large area (e.g., national to global coverage) and long term (e.g., seasonal to multi-decadal) analyses [3][4][5][6].The degree of pre-processing required is application specific but common pre-processing steps include geolocation and spatial alignment (to allow comparison of observations of the same location on the Earth's surface over time), radiometric calibration (to provide consistent data that reflect surface changes and not changes due to sensor changes), atmospheric correction (to reduce the variable radiometric influence of the atmosphere), and generation of per pixel quality flags (to allow users to filter unsuitable observations, e.g., cloudy ones, from inclusion in their analysis).Together these steps can be complex for individual users to implement and are computationally expensive.There is an emerging recognition of the need for easy-to-use satellite data that have been processed in a community endorsed manner.For example, the Committee on Earth Observation Satellites (CEOS) has defined analysis ready data (ARD) as satellite data that have been processed to a minimum set of requirements and organized into a form that allows immediate analysis with a minimum of additional user effort, and, interoperability both through time and with other datasets (http://ceos.org/ard/).
In 2017, the United States Geological Survey (USGS) Earth Resources Observation and Science (EROS) Center (Sioux Falls, SD, USA) made available Landsat ARD for the conterminous United States (CONUS), Alaska, and Hawaii to provide access to Landsat data in formats that can be more directly and easily used for monitoring and assessing landscape change.The release of consistently processed and citable ARD from the USGS removes the need for users to gather Landsat images and perform their own bespoke geometric alignment and atmospheric correction algorithms and so enables large scale automated analyses.The ARD are provided as tiled, georegistered, top of atmosphere and atmospherically corrected products defined in a common equal area projection.This paper overviews the main characteristics of the Landsat ARD and future plans.The availability of the ARD will be conducted in phases.The initial phase has focused on Landsat 4 and 5 Thematic Mapper (TM), Landsat 7 Enhanced Thematic Mapper Plus (ETM+), and Landsat 8 Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS) data.The paper concludes with a discussion of future plans and a brief summary.

Input Landsat Data and Processing
The ARD are generated using the Landsat Collection 1 processing algorithms.Collection 1 Landsat data are now the only Landsat data available from the USGS and were made by reprocessing the entire global U.S. Landsat archive, composed of more than 5.6 million acquisitions sensed by Landsat 1 through Landsat 8, with reprocessing finished 10 May 2017.The Collection 1 Landsat data are defined in the heritage ~185 km × ~180 km Worldwide Reference System (WRS-2) of path and row scene coordinates in the Universal Transverse Mercator (UTM) projection referenced to the World Geodetic System 1984 (WGS84) datum.The Collection 1 products are similar to the heritage Landsat level 1 products but have more rational filenames, improved per-pixel cloud mask and per-pixel quality data, improved calibration information, and improved product metadata including metadata that enable per-pixel view and solar geometry calculation.More information on Collection 1 is at (https://landsat.usgs.gov/landsat-collections). The Collection 1 reprocessing was undertaken in recognition of the need for consistent Landsat 1-8 sensor data and in anticipation of future periodic reprocessing of the archive to reflect new sensor calibration and geolocation knowledge.For example, the archive is planned to be reprocessed as a Collection 2 to improve the Landsat absolute geolocation accuracy [7].
The Collection 1 Landsat products are divided into three categories, or tiers: The Tier 1 products are the highest quality.Their georegistration is consistent and within prescribed image-to-image tolerances of ≤12 m radial RMSE.This tolerance is required for reliable information extraction from time series, particularly for change detection [8], as it is less than half the 30 m pixel size of the multi-spectral Landsat TM, ETM+ and OLI bands.Landsat scenes not meeting Tier 1 criteria during processing are assigned to Tier 2. Typically, Tier 2 processed scenes have significant cloud cover and so insufficient available ground control needed for precise geolocation.More information on the Landsat geolocation approach is provided in [9,10].Newly-acquired Landsat 7 ETM+ and Landsat 8 OLI/TIRS data are also processed rapidly upon downlink, using predicted ephemeris, initial bumper mode parameters, and initial TIRS line-of-sight model parameters, as Near Real-Time tier Collection 1 data that are made available for immediate download.Once the newly-acquired data have been reprocessed using definitive ephemeris, updated bumper mode parameters and refined TIRS parameters, the data are transitioned (within 14 to 26 days) to either Tier 1 or Tier 2 and are removed from the Near Real-Time tier.
The ARD are generated using the same input raw Landsat data used to generate the Landsat Collection 1 Tier 1 data.This ensures that the ARD can be used reliably for time series applications.Currently, the ARD are defined for Landsat 4, 5, 7 and 8 and so are generated using the Collection 1 processing applied to the Landsat 4 and 5 TM, Landsat 7 ETM+, and Landsat 8 OLI and TIRS data.The ARD are processed using an evolution of the Landsat Product Generation System (LPGS) used to process the Collection 1 data and developed for future Landsat 9 product processing.

ARD Projection
The ARD are defined in the Albers Equal Area Conic map projection referenced to the WGS84 datum.The Albers projection is well suited for regions, such as the CONUS, that are mainly east-west oriented [11] and is also used by a number of U.S. products, including the USGS National Land Cover Database (NLCD) [12] and the United States Department of Agriculture (USDA) National Agricultural Statistics Service (NASS) cropland data layer (CDL) [13].The equal area projection is convenient for summarizing the areas of terrestrial attributes, for example, of a particular land cover class, as areas measured in the ARD are directly proportional to their areas on the ground.Only a single set of projection parameters are defined for each ARD region (CONUS, Alaska and Hawaii) (Table 1).This is in contrast to the Landsat UTM projection that has parameters defined for each 6 • of longitude UTM zone and so each UTM zone is effectively a different projection [11].The ARD are generated using the Collection 1 Tier data processing, which uses cubic convolution resampling, but instead of reprojecting the Collection 1 UTM data into the Albers projection, the raw Landsat data are calibrated and reprojected directly into the ARD Albers projection.This means that the ARD are not resampled twice.This is advantageous as resampling degrades the geometric fidelity, either by introducing pixel shifts (for nearest neighbor resampling), or by smoothing the data (e.g., bilinear resampling) and/or introducing artefacts at high contrast edges (e.g., cubic convolution resampling) [14,15].The utility of reprojecting the data directly into Albers is illustrated in Figure 1 that shows (top) ARD data (i.e., resampled once) and (middle) the equivalent data nearest neighbor resampled from Collection 1 UTM into the Albers projection (i.e., resampled twice).The bottom row of Figure 1 shows the absolute difference between the near-infrared reflectance for these two examples.Along high contrast edges, for example, between adjacent vegetated and bare fields, the differences are ≥0.035(maximum absolute difference 0.093)., resampled twice with reduced apparent geometric fidelity), (bottom) the absolute difference between the top and middle row near-infrared reflectance colored as grey (/difference/ < 0.005), blue (0.005 ≤ /difference/ < 0.02), green (0.02 ≤ /difference/ < 0.035), and red (/difference/ ≥ 0.035).Landsat-7 ETM+ reflectance for 400 × 200 30 m pixels acquired 7 October 2016 over agricultural fields in South Dakota.The black stripes are missing data due to the Landsat-7 ETM+ scan line corrector failure.

ARD Tiling
The ARD are defined in the Albers projection in fixed non-overlapping tiles so that spatial mosaics can be easily assembled from adjacent tiles.This is in contrast to the Sentinel-2 Level 1C tiling system that also uses fixed tiles but adjacent Sentinel-2 Level 1C tiles overlap and can be defined in different UTM zones i.e., in different projections [16,17].The Landsat ARD data are straightforward to use because the geographic coordinates of each ARD tile pixel are fixed.This is important because the sensed locations of each Landsat WRS path/row scene are not temporally constant.For example, considering three years of global Landsat data the mean global deviation in the location of the upper left pixel was several kilometers for Landsat 5 TM and Landsat 7 ETM+ in the east-west direction [18] due to changes in the Landsat satellite orbit and sensor attitude [19].Because the geographic coordinates of each ARD tile pixel are fixed, no additional processing and alignment steps are necessary prior to multi-temporal analysis using the ARD.
Figures 2-4 illustrate the ARD tile coordinates and Table 2 summarizes the ARD grid systems for CONUS, Alaska and Hawaii.Each ARD tile is 5000 × 5000 30 m pixels (i.e., 150 × 150 km) and the tiles are referenced in each region by horizontal and vertical tile coordinates based on a previous Landsat tiling scheme [3].The ARD are stored for each Landsat sensor independently and only tiles with Landsat data are produced.This means that tiles with no observations (due to the satellite orbit and sensor geometry) are not produced.To illustrate this, Figure 5 shows a 10 day period of Landsat 5 TM acquisitions over 18 ARD tiles covering 900 × 459 km in the southern United States.The Landsat 4, 5, 7, and 8 sensors have a 15 • field-of-view and acquire an approximately 185 km swath which is larger than the 150 km ARD tile side dimension.The Landsat 4, 5, 7, and 8 sensors are in 98.2 • inclined orbits with a 16-day repeat cycle and each sense a location from the same orbit 22 or 23 times per year.Thus, for the ten day period illustrated in Figure 5 there were only four days with Landsat 5 overpasses and on these days only the ARD tiles containing data were produced.Table 2. CONUS, Alaska and Hawaii ARD tile grid coordinates summary.Each tile is composed of 5000 × 5000 30 m pixels in the Albers projection (Table 1) where h is the tile east-west (horizontal) tile coordinate, v is the tile vertical (north-south) tile coordinate, ulX and ulY are the upper left tile corner Albers coordinates, lrX and lrY are the lower right tile corner Albers coordinates.

Region
Upper Left Tile (UL Corner) Lower Right Tile (LR Corner)    Adjacent Landsat orbit swaths overlap in the across-track direction, with progressively greater overlap further poleward, varying from about 0.1 • at the equator to more than 12.8 • (equivalent >90% of the swath width) at latitude 80 • for the same Landsat sensor [18].This overlap is not an issue because the ARD data are stored for each sensor and each sensor orbit independently.In the along-track direction however overlapping acquisitions occur along the same orbit because of the way the Collection 1 image data are generated and not because the same location is sensed twice on orbit [3].Care is taken to handle this along-track overlap that occurs at the northern and southern ends of successive Landsat images (by typically 10's of km).To handle this, the ARD processing selects the pixels from the northern image rather than store duplicated pixels in the along-track overlap region.Landsat 7 ARD may have pixels selected from the southern image if the northern image pixels in the along-track overlap were missing due to the scan line corrector failure that started on 31 May 2003 [20].

Overview
The ARD stores the Landsat sensor band values at each tiled pixel location.To provide ease of use the band values are stored converted to reflectance and brightness temperature for the reflective and thermal wavelength sensor bands respectively.In addition, the ARD store a number of per-pixel quality assessment (QA) bands.In recognition that users require atmospherically corrected data, and because certain users may wish to develop or apply their own atmospheric correction routines, the ARD includes atmospherically corrected and also top of atmosphere (TOA) reflectance as separate ARD tile bands.

ARD top of Atmosphere Reflectance and Brightness Temperature, and Viewing and Solar Geometry Data
Top of atmosphere (TOA) reflectance (unitless) for each Landsat reflective wavelength band and TOA brightness temperature (units Kelvin) for each thermal band are provided as ARD tile bands.The standard Collection 1 calibration, derived using various on-board and vicarious calibration techniques [21,22], are used to define the transformations between the Landsat instrument data (digital numbers) and TOA radiance and TOA reflectance.In Collection 1, a reflectance-based calibration is used as it has higher accuracy than previous radiance-based calibrations of the reflective wavelength bands [23].The calibrated ARD TOA reflectance for each reflective wavelength band pixel is normalized with respect to the cosine of the solar zenith angle.Landsat thermal calibration constants are derived in the conventional manner considering the thermal band spectral responses [24] and used to derive calibrated TOA brightness temperature, i.e., the temperature of the observed surface if it was a perfect black body.
The solar and viewing geometry (view and azimuth angles) for each 30 m pixel are provided as ARD tile bands in recognition of the need for angular information for advanced Landsat applications and post-processing.For example, they are needed to develop Landsat algorithms to minimize bi-directional reflectance effects [25,26], to estimate albedo [27], and for topographic correction [28,29].The solar angles are derived using the NOVAS version 3.1 software from the U.S. Naval Observatory, that is configured to access the JPL DE421 planetary ephemeris, and are parametrized with the location, date and time of each pixel acquisition.The view geometry is derived as part of the standard Landsat geometric processing.
Table 3 summarizes the TOA and angular bands.Each band is stored in two bytes (INT16 data type).Radiances emitted from very hot or reflective targets (e.g., wildfires, dry white sand) may exceed the dynamic range of the bands.The saturation status is flagged with a saturated pixel value of 20,000 that is outside the possible physical range of reflectance (i.e., 2 = 20,000 × 0.0001) or brightness temperature (i.e., 2000 K = 20,000 × 0.1) of terrestrial surfaces.The saturation status is also stored in an ARD QA band (Section 3.4).

ARD Surface Reflectance Data
Landsat surface reflectance, i.e., the TOA reflectance corrected for atmospheric effects, is needed to derive consistent geophysical and biophysical products because the impact of atmospheric aerosols, gases and water constituents on Landsat reflectance can be significant.For example, the Landsat normalized difference vegetation index (NDVI) is on average 0.1 greater when derived from surface reflectance than from TOA reflectance over CONUS vegetated surfaces [30].Atmospheric gaseous and molecular effects are relatively straightforward to correct but aerosols are difficult to characterize or correct reliably and can be highly variable with significant impacts on the shorter wavelength visible bands [31][32][33].
The Landsat 4 and 5 TM and Landsat 7 ETM+ ARD reflectance bands are atmospherically corrected using the Landsat Ecosystem Disturbance Adaptive Processing System (LEDAPS) Surface Reflectance Algorithm [34].The LEDAPS uses the 6SV radiative transfer code [35] with ancillary sea level atmospheric pressure and water vapor characterization and ozone data derived from other sources.The aerosol optical thickness (AOT) is retrieved independently from each Landsat image using an improved version of the dense dark vegetation (DDV) approach [36].The currently available ARD surface reflectance for Landsat 4 and 5 TM and Landsat 7 ETM+ were generated by version 3.1.2(June 2017) of the LEDAPS code.Landsat 7 ETM+ LEDAPS atmospherically corrected data have been validated by comparison with 6SV atmospherically corrected ETM+ data using Aerosol Robotic Network (AERONET) atmospheric characterizations over the CONUS [32].The reported mean residual (and mean reflectance normalized residuals) for the atmospherically corrected data was 0.0053 (11.8%, blue band), 0.0039 (5.7%, green band), 0.0042 (5.9%, red band), 0.0100 (4.8%, NIR band), and 0.0056 (1.0%) and 0.0051 (2.0%) for the two short wave infrared bands.
The Landsat 8 OLI ARD reflectance bands are atmospherically corrected using the Landsat Surface Reflectance Code (LaSRC).The LaSRC has higher OLI atmospheric correction accuracy than the LEDAPS algorithm [37].It uses a modified version of the 6SV radiative transfer code with, in particular, an improved aerosol determination based on red to blue and red to ultra-blue reflectance ratios and a linear function of a spectral index with slope and intercept terms derived from a spatially explicit climatology of MODIS and MISR data [37].The currently available ARD surface reflectance for Landsat 8 OLI were generated by version 1.2.0 (March 2017) of the LaSRC code.Similar to the LEDAPS validation, the accuracy of the LaSRC Landsat 8 OLI reflectance was validated using 6SV atmospherically corrected OLI data with AERONET atmospheric characterizations [37].The mean residuals were 0.0048 (ultra-blue band), 0.0038 (blue band), 0.0025 (green band), 0.0017 (red band), 0.0014 (NIR band), and 0.0004 and 0.0015 for the two short wave infrared bands.
The surface reflectance ARD bands are stored in the same way as the TOA reflectance ARD bands i.e., in two bytes (INT16 data type) as separate GeoTiff files, and with the same (Table 3) band specific scaling factors, and fill values.Currently there is no atmospherically corrected temperature ARD band (see Section 4).

ARD Quality Assessment Bands
Per-pixel quality assessment (QA) information are standard in most systematically generated satellite products and are used to support the needs of the science and applications user community and for product generation and quality assessment processes [38,39].Users typically inspect per-pixel QA information and discard unsuitable and low quality pixel observations from their analyses.There are a large number of QA bands in the ARD that are different for the Landsat 8 OLI ARD compared to the Landsat 4 and 5 TM and Landsat 7 ETM+ ARD due to differences in the surface reflectance product generation algorithms for these sensors.4 summarizes the five QA bands, of which three are stored in a bit-packed format.They contain information that are passed through from the Collection 1 processing and also include information pertaining to the LEDAPS atmospheric correction that should be consulted if the surface reflectance Landsat 4 and 5 TM or Landsat 7 ETM+ ARD are used.The Pixel QA band is a bit-packed band with bits set to 1 to denote the surface state, cloud information, or fill value (Table 5).These information are based on the CFmask cloud algorithm described in [40,41].Some modifications have been implemented most notably a cloud dilation is performed in order to label pixels adjacent to clouds in which the signal could be cloud contaminated.The standard bit-packing convention is used for this and the other bit-packed QA bands.For example, if an ARD tile pixel is flagged as only water then at that pixel location all bits of the Pixel QA band are set to zero except bit 2 that is set to 1 (providing a decimal Pixel QA band value of 2 2 = 4).Similarly, for example, if an ARD tile pixel is flagged as cloud with median cloud confidence then bit 5 is set as 1 and bits 6 and 7 are set to 0 and 1 (providing a decimal Pixel QA band value of 2 5 + 2 7 = 160).4) bit-packing detail.Bits are numbered from right to left where bit 0 = the Least Significant Bit (LSB), bit 15 = the Most Significant Bit (MSB), and bits 8-15 are reserved for future use.The Radiometric Saturation QA band is a bit-packed representation (Table 6) of which sensor bands are saturated.Highly reflective surfaces, including snow and clouds, and sun-glint over water bodies, may saturate the reflective wavelengths Landsat 4 and 5 TM and Landsat 7 ETM+ bands, with saturation varying spectrally and with the illumination geometry.For example, in a study examining approximately 60 million Landsat 7 ETM+ observations extracted from three winter and three summer months over the CONUS more than 10% of the blue and red band Landsat 7 ETM+ pixel values were saturated [42].In general, saturated ARD tile pixel data are unreliable.The Internal SR QA and Internal SR Atmospheric Opacity bands are generated by the LEDAPS atmospheric correction code (Section 3.3).The Internal SR QA band (Table 7) documents the cloud and surface state flags generated by the LEDAPS and are provided for advanced users and for product quality assessment.Most users are recommended to use the Pixel QA band cloud and shadow information (Table 5) as only these cloud and cloud shadow bits have been validated [41].The Internal SR Atmospheric Opacity is stored as single scaled float value and is proportional to the aerosol optical thickness (i.e., the greater the atmospheric opacity, the greater the aerosol optical thickness).ARD surface reflectance pixels with high Atmospheric Opacity values will be less reliable, particularly under high solar zenith angle conditions.The Lineage QA band is provided to support downstream processing and uses the same concept as described in Roy et al. [30] (see Figure 1).Specifically, a per-pixel index value (1, 2 or 3) is stored at each ARD tile pixel location that refers to the Landsat input image filename.The Landsat input image filename and index values are stored as ARD tile metadata.In this way the Landsat input image filename can be determined at each tile pixel.This is important as although the ARD tiles store the data from each Landsat sensor orbit independently (Figure 5) up to three Landsat input images may encompass an ARD tile in the along-track direction.

Landsat 8 OLI ARD Quality Assessment Bands
Four summary QA bands are stored (Table 8).They are similar to the Landsat 4 and 5 TM, and Landsat 7 ETM+ ARD summary quality assessment bands (Table 4) but some of the bit-packed bands contain more information due to the greater number of Landsat 8 OLI bands.The Pixel QA band (Table 9) is a bit-packed band.The cloud, cloud confidence, cloud shadow, and snow/ice QA values are derived using the CFMask algorithm [40,41].The cirrus and cirrus confidence QA values are derived from the OLI band 9 (1.375 µm) reflectance which is a new band in the Landsat sensor series [43].Cirrus clouds scatter and absorb radiation, and can be quite prevalent, for example, 6.9% of a year of Landsat 8 OLI CONUS observations were flagged as high confidence cirrus but low confidence cloud [44].8) bit packing detail.Bits are numbered from right to left (bit 0 = LSB, bit 15 = MSB); bits 11-15 are reserved for future use.The Radiometric Saturation QA band (Table 10) is a bit-packed representation of which sensor bands were saturated.Saturation in Landsat 8 is uncommon due to the higher 12-bit radiometric resolution and dynamic range compared to heritage Landsat sensors [22].For example, in a study examining approximately 60 million Landsat 8 OLI observations extracted from three winter and three summer months over the CONUS less than 0.0025% were saturated [42].When saturation does occur in Landsat 8 OLI, it is typically over volcanoes and wildland fires [45,46].Saturation may manifest as very high or very low (including negative) reflectance values [22].As noted above, in general, the values stored in saturated ARD tile pixel bands are unreliable.

Bit Interpretation
The Aerosol QA band (Table 11) is generated by the LaSRC atmospheric correction code (Section 3.3) and is provided for advanced users and for ARD product quality assessment.These attributes provide insight into the quality of the atmospheric correction surface reflectance retrieval.Bit 1 indicates whether the aerosol inversion model ran successfully and the aerosol retrieval is valid, whereas bit 2 denotes whether the aerosol estimation for a pixel was interpolated from surrounding pixels.Bits 3, 4, and 5 indicate whether a given pixel was identified as water, whether the aerosol retrieval failed thus requiring the value to be interpolated from surrounding pixels, or whether the given pixel is adjacent to a pixel for where the aerosol retrieval failed.Bits 6 and 7 indicate whether the aerosol values were either derived from an external source (i.e., MODIS atmosphere climate modeling grid), or the relative aerosol concentration estimated from the aerosol inversion model.The Lineage QA band has the same format and rationale as for Landsat TM and ETM+ ARD tiles, and also accompanies the Landsat 8 ARD tiles.

ARD Filename Convention, Format, Metadata and Documentation
The ARD are stored for each Landsat sensor independently and, as illustrated in Figure 5, only tiles with Landsat data are produced.The Landsat bands for each tile are stored as separate GeoTiff files with band specific scaling factors and fill values to denote no data was observed at an ARD tile pixel location (for example, the black pixels in Figure 5 are fill values).The GeoTiff files are generated with internal Deflate compression to reduce storage volume.Each ARD band is stored with a human-readable filename structure that can be parsed by programming scripts as: LXSS_US_HHHVVV_YYYYMMDD_yyyymmdd_CCC_VVV_BAND.tifwhere BAND refers to the band name and is set as TAB<nρ> = top of atmosphere reflectance for band <nρ>, SRB<nρ> = surface reflectance for band <nρ>, BTB<nT> = brightness temperature for band <nT>; or is set as SEA4 = SEnor Azimuth, SEZ4 = SEnsor Zenith, SOA4 = SOlar azimuth, SOZ4 = SOlar zenith (all the angles are defined for band 4 as it is the closest band to the center of the focal plane for all Landsat sensors); or set as PIXELQA = Pixel QA, RADSATQA = Radiometric Saturation QA, SRCLOUDQA = Internal SR QA, SRATMOSOPACITYQA = Internal SR Atmospheric Opacity, LINEAGEQA = Lineage QA, or SRAEROSOLQA = Aerosol QA (see Tables 4 and 8).
The GeoTiff files for an ARD tile are bundled together into a "compressed tarball" format with a filename structure: LXSS_US_HHHVVV_YYYYMMDD_yyyymmdd_CCC_VVV_PRODUCT.tar where L refers to the mission (Landsat); X refers to the sensor ("C" = OLI/TIRS Combined, "O" = OLI-only, "T" = TIRS-only, "E" = ETM+, "T" = TM); SS is the Landsat satellite number (04-08); US designates which region of the United States ("CU" = CONUS, "AK" = Alaska, "HI" = Hawaii); HHHVVV corresponds to the horizontal and vertical tile location (Figures 2-4); YYYYMMDD is the sensor acquisition year, month and day; yyyymmdd is the production year, month and day; CCC is the input data Collection number (currently set as C01 to reflect Collection 1); VVV is the ARD processing version number (e.g., V01), and PRODUCT refers to the product type (SR for surface reflectance, BT for brightness temperature, TA for top of atmosphere reflectance, and QA for quality assurance).Each ARD tile is accompanied by a standards-compliant metadata file to ensure processing traceability, data provenance documentation, and enable tile-level data selection based on metadata queries.The metadata is human-readable and includes tile metadata that summarize the non-fill tile pixel values, tile spatial and temporal attributes, and provenance and Landsat processing information.The metadata filename structure is based on the "compressed tarball" filename structure but with a different suffix:

LXSS_US_HHHVVV_YYYYMMDD_yyyymmdd_CCC_VVV.xml
The metadata and detailed ARD product specifications, packaging, and format characteristics, are provided in LSDS-1873 U.S. Landsat Analysis Ready Data (ARD) Data Format Control Book (DFCB).

ARD Future Revision Schedule
The ARD are defined for Landsat 4, 5, 7 and 8 and so are generated from Collection 1 Landsat 4 and 5 Thematic Mapper (TM), Landsat 7 Enhanced Thematic Mapper Plus (ETM+), and Landsat 8 Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS) data.The ARD is not generated for the Landsat Multispectral Scanner (MSS) data.The ARD processing will be expanded to include the MSS data that make up a significant portion of the early Landsat data record acquired from 1972 to 2012 systematically by Landsat 1-3 and intermittently by Landsat 4 and 5 [2].ARD MSS processing is challenging because the spatial and spectral resolution of the MSS is significantly coarser than later Landsat sensors.In particular, MSS had reduced spatial resolution (57 m × 79 m pixels) and no shortwave infrared or thermal wavelength bands and so reliable MSS atmospheric correction and cloud masking is difficult to maintain on a systematic basis [47,48].
The ARD will be reprocessed when the Collection 1 Landsat data are reprocessed to Collection 2. In particular, the Collection 1 Landsat data will be reprocessed to Collection 2 to improve its absolute geolocation accuracy.The Collection 1 Tier 1 data are well aligned through time but at certain WRS path/rows the data can be misaligned relative to absolute true location.This is because the Collection 1 Landsat ground control framework over the CONUS, Alaska and Hawaii, and may other parts of the globe, is based upon Global Land Survey Landsat images that contain residual geolocation errors that vary geographically [7].The Landsat framework will be readjusted for consistency with the Sentinel-2 Global Reference Image [49] with completion expected in 2019 and then the Collection 1 data will be reprocessed as Collection 2. The resulting Collection 2 processing will be used to reprocess the ARD archive.
A future version of the ARD will include land and inland and coastal water surface temperature data [50,51].Conventional split window surface temperature algorithms cannot be used because Landsat 4 and 5 TM and Landsat 7 ETM+ acquire thermal radiance in only a single wavelength band and because the two Landsat 8 TIR bands have stray light contamination that preclude reliable split window retrieval [52].Instead, North America Regional Reanalysis (NARR) data from the National Center for Environmental Prediction (NCEP) will be used to characterize the atmosphere (air pressure, temperature, and relative humidity) and used with the MODTRAN radiative transfer code to calculate atmospheric transmission, upwelling and downwelling radiances for multiple levels in the atmosphere.The brightness temperature will then be adjusted for emissivity based on ASTER based emissivity maps to derive surface temperature.Uncertainty metrics will be based on factors including the pixel's proximity to clouds [53].The approach provides Landsat 5 and 7 land surface temperature estimates with a root mean square error of approximately 2.2 K over land [50].
Each band of ARD data are provided in GeoTIFF format with internal Deflate compression that precludes direct read-access of ARD tile sub-regions.This enforces a spatial tile-level processing granularity that is not well suited for time-series applications that work on small areas.In particular, this is sub-optimal for large data volume processing in high performance and cloud computing environments.A future version of the ARD will use a Cloud-Optimized GeoTIFF format that enables more efficient random access and that also enables HTTP range get operations.Furthermore, the ARD will be available as separate, directly-readable bands instead of the current "compressed tarball" format, which must be uncompressed and untarred before any of the bands are readable.
In the future, the ARD will be applied to the Landsat 9 data.Landsat 9 will be placed into the current Landsat 7 orbit and will carry a refined version of the Landsat 8 sensor payload [54] and so Landsat 9 ARD is expected to be straightforward to generate.

Discussion
The Landsat ARD were only recently made available in 2017 but their use has already been demonstrated.For example, the utility of ARD for 30 m percent tree cover mapping was demonstrated using five years of Landsat 5 TM and 7 ETM+ ARD over 12 tiles encompassing Washington State and the impact of different ARD processing levels on mapping accuracy examined [55].For example, a year of Landsat 5 TM and Landsat 7 ETM+ ARD were used to compare the capability of different harmonic time series models to monitor crop evolution at six CONUS ARD tiles [56].Notably, the USGS EROS is developing the Land Change Monitoring, Assessment, and Projection (LCMAP) program that will systematically use all the ARD to generate annual land cover maps for 1985 through 2018 and generate near real-time change maps [57].The monitoring element of LCMAP will be executed through the continuous change detection and classification (CCDC) algorithm [58] that is straightforward to implement using the ARD as an input.
The Landsat ARD provide a medium resolution analogue of some of the MODIS land products that were developed with the goal of providing analysis ready and higher-level products in support of global change science and applications [59].Like the MODIS land products, that are subject to ongoing systematic quality assessment and reprocessing, the Landsat ARD will be reprocessed as needed to reflect improved sensor calibration and geolocation knowledge.This is only possible due to major improvements in computer processing capabilities and reducing costs.As of this writing there are more than 5.6 million Landsat acquisitions sensed globally by Landsat 1 to 8 in the U.S. Landsat archive.The ARD are available for the CONUS, Alaska and Hawaii.Subject to the availability of sufficient resources, global ARD production is planned.The driving factor towards defining specifications for a global ARD is the programmatic desire to make this available in time for planned Landsat Collection 2 reprocessing and the availability of Landsat 9 data.The Landsat Science Team have strongly endorsed the concept of global coverage ARD and systematically derived higher-level Landsat products [4].However, currently, through an informal survey of Landsat stakeholders, including the Landsat Science Team and Landsat International Cooperator ground station managers, there is no unanimous preference for a single cartographic projection and tiling scheme suitable for global ARD production.
The CEOS ARD for Land (CARD4L) working group is seeking to encourage the delivery of ARD from space agencies to improve the interoperability of the rapidly increasing volumes of moderate resolution optical imagery (http://ceos.org/ard/).The Landsat ARD provide an important step in this direction.However, ensuring data interoperability and seamless sensor combination or harmonization with non-Landsat data is non-trivial, and, for example, research on this for Landsat and the Landsat-like Sentinel 2 data is underway [7,[60][61][62].

Conclusions
The USGS release of the Landsat archive as free and open data in 2008 meant that users no longer needed to carefully select and use only a small number of cloud-free Landsat images.The systematic production and release by the USGS of the Landsat ARD in 2017 further reduces barriers to the use of Landsat data.The ARD provides a consistent set of geophysical parameters derived with per pixel quality attributes and standardized metadata.Users who are not expert in pre-processing Landsat data no longer need to be.Users who wish to undertake large area and long time series analyses no longer need to invest in computationally expensive geometric alignment and atmospheric correction processing chains to pre-process data.Users who wish to share science and applications results developed from Landsat data can use the ARD to reduce the potential for discrepancies in results due to differences in pre-processing.The ARD is a major step forward in the long history of Landsat [63] and enables the automated processing of the Landsat archive needed for large area and long-time period analyses.The ARD are available for search and download from https://earthexplorer.usgs.gov/.

Figure 1 .
Figure 1.Example of the difference between (top) true color ARD reflectance defined in the Albers projection, (middle) the equivalent data nearest neighbor resampled from Collection 1 UTM into the Albers projection (i.e., resampled twice with reduced apparent geometric fidelity), (bottom) the absolute difference between the top and middle row near-infrared reflectance colored as grey (/difference/ < 0.005), blue (0.005 ≤ /difference/ < 0.02), green (0.02 ≤ /difference/ < 0.035), and red (/difference/ ≥ 0.035).Landsat-7 ETM+ reflectance for 400 × 200 30 m pixels acquired 7 October 2016 over agricultural fields in South Dakota.The black stripes are missing data due to the Landsat-7 ETM+ scan line corrector failure.

Figure 1 .
Figure 1.Example of the difference between (top) true color ARD reflectance defined in the Albers projection, (middle) the equivalent data nearest neighbor resampled from Collection 1 UTM into the Albers projection (i.e., resampled twice with reduced apparent geometric fidelity), (bottom) the absolute difference between the top and middle row near-infrared reflectance colored as grey (/difference/ < 0.005), blue (0.005 ≤ /difference/ < 0.02), green (0.02 ≤ /difference/ < 0.035), and red (/difference/ ≥ 0.035).Landsat-7 ETM+ reflectance for 400 × 200 30 m pixels acquired 7 October 2016 over agricultural fields in South Dakota.The black stripes are missing data due to the Landsat-7 ETM+ scan line corrector failure.

Figure 2 .
Figure 2. CONUS horizontal and vertical ARD tile coordinate maps, each tile covers 150 × 150 km and is composed of 5000 × 5000 30 m pixels.

Figure 3 .
Figure 3. Alaska horizontal and vertical ARD tile coordinate maps, each tile covers 150 × 150 km and is composed of 5000 × 5000 30 m pixels.

Figure 4 .
Figure 4. Hawaii horizontal and vertical ARD tile coordinate maps, each tile covers 150 × 150 km and is composed of 5000 × 5000 30 m pixels.

Figure 5 .
Figure 5. Example 10 day period (19 July to 28 July 2011) of Landsat 5 TM ARD for 18 ARD tiles (h18 to h23, v12 to v14) over Arkansas, northern Mississippi and Alabama, south west Tennessee and north west Georgia, United States.True color surface reflectance ARD are shown for the four days (20, 22, 24 and 27 July 2011) that had Landsat 5 TM observations.White ARD tiles with black boundaries indicate no Landsat 5 TM ARD was produced on that day.No Landsat 5 TM overpasses occurred, and so no ARD data was produced, on 19, 21, 23, 25, 26, and 28 July.

Figure 5 .
Figure 5. Example 10 day period (19 July to 28 July 2011) of Landsat 5 TM ARD for 18 ARD tiles (h18 to h23, v12 to v14) over Arkansas, northern Mississippi and Alabama, south west Tennessee and north west Georgia, United States.True color surface reflectance ARD are shown for the four days (20, 22, 24 and 27 July 2011) that had Landsat 5 TM observations.White ARD tiles with black boundaries indicate no Landsat 5 TM ARD was produced on that day.No Landsat 5 TM overpasses occurred, and so no ARD data was produced, on 19, 21, 23, 25, 26, and 28 July.

Table 1 .
Albers Equal Area Conic map projection parameters for the CONUS, Alaska and Hawaii ARD (parameters defined with respect to the WGS84 datum).

Table 2 .
CONUS , Alaska and Hawaii ARD tile grid coordinates summary.Each tile is composed of 5000 × 5000 30 m pixels in the Albers projection (Table1) where h is the tile east-west (horizontal) tile coordinate, v is the tile vertical (north-south) tile coordinate, ulX and ulY are the upper left tile corner Albers coordinates, lrX and lrY are the lower right tile corner Albers coordinates.

Table 3 .
ARD Top of Atmosphere bands for Landsat 4, 5, 7 and 8, where nρ* and nT* correspond to reflective and thermal wavelength band numbers respectively (band numbers vary among the Landsat sensors); and Landsat view and solar angle bands, INT16 signifies 16-bit signed integer.