www.mdpi.com/journal/remotesensing Article Phenological Classification of the United States: A Geographic Framework for Extending Multi-Sensor Time-Series Data

This study introduces a new geographic framework, phenological classification, for the conterminous United States based on Moderate Resolution Imaging Spectroradiometer (MODIS) Normalized Difference Vegetation Index (NDVI) time-series data and a digital elevation model. The resulting pheno-class map is comprised of 40 pheno-classes, each having unique phenological and topographic characteristics. Cross-comparison of the pheno-classes with the 2001 National Land Cover Database indicates that the new map contains additional phenological and climate information. The pheno-class framework may be a suitable basis for the development of an Advanced Very High Resolution Radiometer (AVHRR)-MODIS NDVI translation algorithm and for various biogeographic studies.


Introduction
Phenology is the study of the timing of recurring biological events [1] and examines the causes and consequences of biotic and environmental interactions.Phenology is usually influenced by photoperiod, precipitation, soil and air temperature, solar illumination, and other life-controlling factors [2][3][4].The phenology of ecosystems and its connection to climate is a key to understanding ongoing global climate and land surface changes [5].Phenology has historically been studied using ground-based observations of the timing of vegetation and animal pheno-phases such as germination, flowering, hibernation, and bird migration.Satellite observations provide continuous spatial and temporal coverage enabling scientists to assess and model seasonal dynamics and phenological variability of landscapes across large areas [6][7][8][9][10][11][12][13][14][15][16].Time series of NDVI data derived from visible red and near-infrared bands [17] have been used to calculate phenological metrics [12,13,15,18].
Regionalization is a geographic or spatial type of classification that identifies, generalizes, and maps landscape patterns [19,20].The outcome of regionalization is a geographic framework that reduces the complexity of the domain to something that is more manageable and understandable.For example, the United States has been subdivided into ecoregions [21,22].The United States was also classified into land cover classes by the National Land Cover Database (NLCD) [23] using Landsat Thematic Mapper data.Both approaches produced very different geospatial characterizations of the same landscape, and both have proven to be valuable to the user community.Yet another way to subdivide, generalize, and characterize landscapes is by using phenology.Recently, White et al. [24] developed a global pheno-region database as a geographic framework for studying global climate change.White et al. [24] used 8-km AVHRR time series NDVI data (1982)(1983)(1984)(1985)(1986)(1987)(1988)(1989)(1990)(1991)(1992)(1993)(1994)(1995)(1996)(1997)(1998)(1999) in conjunction with an eight-element monthly global climatology to generate global pheno-regions representing regions with a minimized probability of non-climatic forcing.Hargrove et al. [25] recently derived 15 phenological ecoregions based on clustering the similarities in five years (2002)(2003)(2004)(2005)(2006) of cumulative MODIS NDVI data.Each year consists of 22 cumulative NDVI images based on the 23 composite periods per year.
AVHRR NDVI data have been proven to be valuable inputs for operational monitoring (e.g., fire danger monitoring, phenology, and drought) [12,26,27].Although the MODIS mission (aboard Aqua and Terra platforms) is primarily research oriented, a similar opportunity exists to use these data streams for monitoring [28,29].MODIS data have high spectral and spatial resolutions and provide an important bridge to the upcoming Visible/Infrared Imager Radiometer Suite (VIIRS) mission [30,31] because of the very similar radiometric characteristics between these two sensors.Characterizing multi-sensor long-term time-series vegetation index data and cross-sensor continuity is important [32] for monitoring climate impacts on vegetation response (e.g., vegetation drought monitoring).Satellite vegetation monitoring often relies on establishing baselines from NDVI time series to measure seasonal and interannual variability as deviations (or anomalies) from the established baselines [7,27,33].
Phenological classification, which characterizes and stratifies the land surface based on similar phenological patterns, may be a logical choice and suitable basis for a multi-sensor (e.g., AVHRR-MODIS) NDVI data translation algorithm to seamlessly extend the U.S. NDVI data record.This translation algorithm considers interannual variability and seasonality of biotic responses to adjust the cross-sensor continuity translation equations and requires information about the spatial heterogeneity of a representative range of land surfaces.The translation algorithm will be based on each pheno-class and will be used to extend the MODIS data record back to 1989 by tying it to the AVHRR NDVI data record.The resulting extended time series can be implemented within an operational vegetation drought monitoring system [33].Here, we use the term "pheno-classes" to describe the regions identified by similar phenological patterns.These pheno-classes have unique spectral and temporal signatures.
The main goal of this study is to introduce a new geographical framework that identifies a set of regions with similar phenological patterns (or pheno-classes) based on land surface phenological metrics (timing and magnitude of NDVI) and elevation gradients.A similar approach by Hargrove [25] created fewer pheno-classes and is based on statistical clustering of five years of seasonal cumulative NDVI values without using explicit timing metrics.We first describe the methodology for generating the pheno-class map for the conterminous United States.Then we evaluate the characteristics of the new pheno-class map by comparing it to the 2001 NLCD land cover map [23].

Study Area and Datasets
This study focused on the conterminous United States and used MODIS 16-day 1-km surface reflectance data (MOD43B4, Collection 4) for generating phenological variables and developing of the pheno-class map.The MODIS surface reflectance data were obtained from the Land Processes Distributed Active Archive Center [34].NDVI values were calculated according to the following equation: where ρ Red and ρ NIR are the reflectance values for MODIS bands 1 (620-670 nm) and 2 (841-876 nm), respectively.The 16-day NDVI composites were sequentially stacked to generate a five-year (2000-2004) time series.NDVI time-series data were filtered by the band quality flags and then smoothed (i.e., filtered) using a weighted least-squares approach to reduce any residual atmospheric noise [35].
Because elevation and corresponding vegetation gradients strongly influence phenological cycles [36,41], we incorporated a USGS 1-km DEM data for the conterminous United States to characterize topographic effects in our phenological classification.Land cover type data were obtained from the 2001 NLCD [23], which is based on multiple Landsat data transforms, elevation data, and ancillary data at a nominal resolution of 30 m.A 1-km ecoregion map generated from the Omernik level III ecoregion data [21] was compared with the newly developed pheno-class map.The ecoregion framework divides the landscape into a series of geographic regions with similar ecosystems and environmental resources that were identified using both biotic (e.g., vegetation and wildlife) and abiotic (e.g., climate, geology, hydrology, land use, and physiography) criteria.

Methodology
Our phenological classification methodology for the conterminous United States consisted of three steps.First, nine phenological metrics were derived from the five years (2000-2004) of 1-km MODIS NDVI data.We used software developed at the USGS Earth Resources Observation and Science (EROS) Center to calculate these metrics.Second, principal component analysis (PCA) was applied to the derived metrics and elevation values using ENVI software (ITT visual Information Solutions).Finally, iterative self-organizing data analysis (ISODATA) clustering was performed on the six principal component (PC) bands to generate the pheno-class map.

Derivation of Phenological Metrics
Land surface phenological variables were calculated based on the seasonality of the NDVI time-series data using a delayed moving average (DMA) method [12,13].The nine phenological metrics used in the analysis were start-of-season time (SOST), start-of-season NDVI (SOSN), end-of-season time (EOST), end-of-season NDVI (EOSN), maximum NDVI (MAXN), maximum NDVI time (MAXT), duration of season (DUR), amplitude of NDVI (AMP), and seasonal time integrated NDVI (TIN).To reduce noise and missing values (e.g., where no start of season was identified in the data) and to characterize the "normal" phenology of the land surface, we calculated median values from the five years of data for each of the phenological metrics.

Data Normalization and Principal Component Analysis
Because of the large variability of the units and ranges of the nine phenological metrics and DEM data (e.g., the data for SOST ranges from 0 to 365 days and the data for the DEM ranges from 0 to 14,018 ft), data normalization for the nine phenological metrics and the DEM data was necessary to allow these datasets to be comparable (i.e., in a similar scale).This also resulted in a more even contribution of each input data layer into the principal component analysis.All input variables were transformed to comparable units using a z-score: where x, x , and σ represent each data value, dataset mean, and dataset standard deviation, respectively.
In order to reduce noise and data dimensionality, we used PCA to generate new uncorrelated PC variables for clustering [37].The PC bands with very low eigenvalues usually represent less data variance and more noise associated with the original data.In our analysis, the first six PC bands that contained more than 99% of the total variance were subsequently used in our classification.Regions of water and areas outside of the conterminous United States were masked out in the six PC bands.

Pheno-Class Derivation Using Unsupervised Isodata Technique
The ISODATA unsupervised classification is a widely used clustering technique that classifies all pixels based on iteratively recalculating the cluster means [38].In this study, we applied the ISODATA classification method to generate the pheno-class map for the conterminous United States.Since the number of classes created by ISODATA is a user specified input, criteria to determine that number are helpful.We tested different numbers of pheno-classes (e.g., 25, 30, 40, and 50) to determine the optimum number of pheno-classes.

Evaluation of Derived Pheno-Class Map
To evaluate the final U.S. pheno-class map, we compared it to the 2001 NLCD map [23] and the Omernik ecoregions map [21] and computed Minnick's coefficients to assess the degree of overlap between the pheno-classes and vegetation cover classes.Minnick's coefficient (C m ), which represents the fraction of how much two classes overlap to overall union of the two classes, is also calculated to identify individual class associations [39,40]: where AB is the intersection of classes A and B (common area), and AB is the union of A and B (areas in A or B as well as areas in both A and B).

Examples of the Five-Year Median Phenological Metric Maps
Figure 1 shows some examples of key phenological metrics (SOST, EOST, SOSN, and EOSN) and the DEM map used to generate the pheno-classes.Specific phenological and DEM patterns within the conterminous United States are apparent in Figure 1.For example, the median start-of-season time for the winter wheat region in Oklahoma and Texas is commonly in December, and the end-of-season time for this region ranges from May to June.

Pheno-Class Map for the Conterminous United States
Iterative testing of the number of pheno-classes indicated that many isolated points existed in the western United States in the ISODATA classification when the number of pheno-classes approached 50.Conversely, only 3-4 pheno-classes represented the eastern United States if the target number of pheno-classes was less than 30.Based on the existing land cover patterns (2001 NLCD map) and above testing criteria (visually and qualitatively by balancing the number of phenoclasses in the United States), we decided to use 40 pheno-classes within the conterminous United States for this study.Figure 2 shows the final 40 pheno-classes for the conterminous United States (one pheno-class is the background) compared with the 2001 NLCD map and U.S. ecoregions map.A noticeable feature is the comparatively high number of classes located in the western half of the United States compared to the east.A likely explanation for these two contrasting spatial patterns is the relatively large variability in elevation and climate in the western United States.Elevation and corresponding vegetation gradients strongly affect phenological cycles [36,41].
Visual comparisons between the phenological framework derived classes and the 2001 NLCD land cover classes show a number of similarities and differences.For example, the Midwestern corn belt, a region that is distinguished by both phenological and agricultural homogeneity, is similar in both datasets.Conversely, differences can be seen for the conifer forest class.As depicted by NLCD, conifer forest is found across large geographic space and thrives in different environmental and climate niches (e.g., southeastern loblolly pine forest and southwestern pinyon-juniper forest).Our results show that this one NLCD land cover class (i.e., conifer forest) contains multiple pheno-classes.The results from our intercomparison and Minnick's coefficients in the next sections provide some insights into the correspondence between pheno-classes and land cover classes (NLCD).Some examples of the unique phenological and DEM features for several random selected pheno-classes (i.e., pheno-classes 7, 17, 27, and 37) are shown in Figure 3.The mean values for nine phenological metrics and DEM show both the characteristics and the basis for separating these classes in this geographic framework.More in-depth analysis results for certain pheno-classes will be presented in section 4.4.
Visual comparisons between the pheno-class map and the U.S. ecoregions were performed in this study.The ecoregions within the conterminous United States are delineated in black on the pheno-class map in Figure 2c.Results show both the consistency and the differences between these two maps.For example, pheno-class 32, which represents the Midwestern corn belt in the pheno-class map, is related to ecoregions 47 (Western Corn Belt Plains), 54 (Central Corn Belt Plains), and 55 (Eastern Corn Belt Plains), demonstrating the consistency of these two maps.On the other hand, many ecoregions contain multiple pheno-classes (Figure 2c), revealing the differences between these two frameworks.

Intercomparison of Pheno-Class and Land Cover
To provide further explanation of the phenological classes, we performed an intercomparison between the pheno-class map and the 2001 NLCD.The concurrent geographic overlaps for the two systems (i.e., the percentage of land cover types in each pheno-class) are listed in Table 1.Percentage ranges 26-50%, 51-75%, and 76-100% are highlighted in yellow, aqua, and pink, respectively.Results from Table 1 demonstrate again that one land cover type will be represented by many different pheno-classes.For example, the main land cover type for pheno-classes 8 and 17 is shrub/scrub cover (~80% for each pheno-class).However, these two pheno-classes have distinctly different phenological and elevational characteristics (Figure 4).The geographic locations of these two pheno-classes are also shown in Figure 4.This illustrates how a single land cover class can exhibit multiple phenological responses to environmental conditions, climate variability, plant communities, and topography.As a result of these differences, pheno-classes 8 and 17 (scrub/shrub cover) are separated from each other by this methodology.We deduce that the pheno-class map provides unique information that can augment that provided by the 2001 NLCD map.Table 2 provides the C m values calculated for each combination of pheno-class and land cover class.Examination of the C m values of 0.05 or higher (indicating a moderate to high spatial association) show that about half of the NLCD classes correlate with one to up to seven pheno-classes (Table 2).Many combinations showed lesser overlap between land cover and pheno-classes (0.02 < C m < 0.05).Strong associations (C m > 0.10) included values as high as 0.41 between cultivated crops and pheno-class 32.C m values of 0.10 or higher occurred for pheno-classes correlating with deciduous forest, evergreen forest, shrub/scrub, grassland/herbaceous, pasture/hay, cultivated crops, and woody wetlands.

In-depth Analysis of Selected Pheno-Classes
Four representative pheno-classes (4, 9, 32, and 37) were chosen to further illustrate the basis of this geographic framework.The dominant land cover types (from the NLCD) for pheno-classes 4, 9, 32, and 37 are 76% evergreen forest, 83% shrub, 80% cultivated crops, and 46% deciduous forest and 20% pasture/hay, respectively.The mean non-normalized values of SOST, EOST, and TIN for these pheno-classes are plotted in Figure 5 as well as their geographic distributions.We evaluated the non-normalized SOST, EOST, and TIN to help describe the phenological and elevation characteristics of these representative pheno-classes.We found that pheno-class 9 has much lower TIN (<7) than pheno-classes 32 and 37 (>38).This suggests that less seasonal greenness magnitudes and variation of shrub cover resulted in a very low time integrated NDVI.On the other hand, pheno-classes 32 and 37 have much higher TIN due to the significant seasonal greenness variations typically found for crops and forest.The NDVI time-series (2000-2004) data for pheno-classes 4, 9, 32, and 37 (mean values) are presented in Figure 6 to further illustrate the phenological trajectories for these four pheno-classes.The mean start-of-season and end-of-season times for pheno-class 9 are earlier than for pheno-classes 32 and 37 (Figure 5).The duration of the season for pheno-class 37 is nearly one month longer than for pheno-class 32, indicating that forest (and grass) for this region has a longer growing season than agriculture.Pheno-class 4 (evergreen forest) showed the longest season (~10 months) (Figure 5).

Conclusions
This study introduces a new geographic framework that is based on phenology and elevation characteristics.The new pheno-class map for the conterminous United States was generated based on nine phenological metrics derived from satellite MODIS NDVI and USGS DEM data.The pheno-class regionalization provided a geographic framework for multi-sensor data translation research because it provides information about the magnitude, timing, and spatial variability of biological phenomena.Compared to the 2001 NLCD map, the pheno-class map contained additional information about regional phenology and phenological spatial patterns.
The original proposed use for this pheno-class framework is to support the development of a pheno-class based AVHRR-MODIS NDVI translation algorithm to seamlessly extend our multi-sensor NDVI data record.Additional research to accomplish this is underway.However, we believe that the new pheno-class map described in this study will be useful for multiple purposes.These new pheno-classes can be used to further explore biogeographic patterns that may emerge as a result of diversification, migration, and extinction of species and communities in response to environmental changes and variability.The new pheno-class map may also have additional value as a regional framework for other research and applications related to land surface monitoring and assessment [9], including agricultural monitoring, real-time drought monitoring, ecological modeling of the impact of global changes [42] on biodiversity and invasive species, and biogeochemical modeling to assess carbon sequestration and impacts of climate on our dynamic environment.

Figure 1 .
Figure 1.Examples of phenological metrics maps derived from MODIS NDVI time-series data between 2000 and 2004.The DEM was used as a proxy for climate.(a) SOST and EOST; (b) SOSN and EOSN; (c) TIN and DEM maps.

Figure 4 .
Figure 4. Mean values of the nine phenological metrics and the associated elevation data for pheno-classes 8 and 17.The geographic distributions of locations for these two phenoclasses are also shown in the figure.(a) NDVI-based phenological metrics; (b) timingbased phenological metrics and DEM.

Table 1 .
Each pheno-class represents a range of different land cover types, which are expressed as a percentage of each coinciding land cover area over the total pheno-class area.Percentage ranges 26-50%, 51-75%, and 76-100% are highlighted in yellow, aqua, and pink, respectively.

Table 2 .
C m values show the degree of overlap for each combination of pheno-class and NLCD-based land cover class.C m values ranges 0.02-0.05,0.05-0.1,and 0.1-0.5 are highlighted in yellow, aqua, and pink, respectively.