Sub-Pixel Classification of MODIS EVI for Annual Mappings of Impervious Surface Areas

Regular monitoring of expanding impervious surfaces areas (ISAs) in urban areas is highly desirable. MODIS data can meet this demand in terms of frequent observations but are lacking in spatial detail, leading to the mixed land cover problem when per-pixel classifications are applied. To overcome this issue, this research develops and applies a spatio-temporal sub-pixel model to estimate ISAs on an annual basis during 2001–2013 in the Jakarta Metropolitan Area, Indonesia. A Random Forest (RF) regression inferred the ISA proportion from annual 23 values of MODIS MOD13Q1 EVI and reference data in which such proportion was visually allocated from very high-resolution images in Google Earth over time at randomly selected locations. Annual maps of ISA proportion were generated and showed an average increase of 30.65 km2/year over 13 years. For comparison, a series of RF per-pixel classifications were also developed from the same reference data using a Boolean class constructed from different thresholds of ISA proportion. Results from per-pixel models varied when such thresholds change, suggesting difficulty of estimation of actual ISAs. This research demonstrated the advantages of spatio-temporal sub-pixel analysis for annual ISAs mapping and addresses the problem associated with definitions of thresholds in per-pixel approaches.


Introduction
Rates of urbanization have been increasing and are associated with increasing shifts of populations to cities.The extent of the urban area has increased at twice the rate of population growth in developing countries [1].Urban expansion can be seen with large increase in areas covered with artificial impervious surfaces such as paved roads, parking lots, buildings and rooftops [2,3].Increases in impervious surface area (ISA) are associated with transitions from natural and agricultural land and have been found to have significant impacts on local environmental systems.For example, ISA prevents surface water penetration which affects local atmospheric, water and carbon cycles [4], and its expansion contributes to the degradation of biodiversity across local, regional, and global scales [5][6][7].Understanding the spatial and temporal characteristics of ISA expansion can inform strategies for sustainable urban development [8,9] and regular updates on the spatial distribution of ISAs in urban areas supports urban planning and land use policy [8].
Remote sensing data offer the opportunity to provide synoptic monitoring of urban ISAs but data selection frequently involves trade-offs between spatial (and thematic) detail, coverage and repeat frequencies.MODerate resolution Imaging Spectroradiometer (MODIS) data have been used to infer ISA extent and distribution [10][11][12] as they support frequent repeat surveys and land surface monitoring.Despite moderate spatial resolution, the high temporal frequency of MODIS has been used to monitor land cover [11][12][13][14][15][16].However, one of the critical issues using data at medium and coarse spatial resolutions is the mixed pixel problem [17,18].Pixels will typically contain a mix of land covers and the propensity for this is greater with medium and coarse spatial resolution data [2].Therefore, the way that any land cover class is defined and how classes are allocated are important considerations in traditional per-pixel analyses.In Boolean analyses, a threshold of land cover proportion is applied to generate crisp binary classifications: a pixel with a membership greater than the threshold level is defined as, for example, Class 1, and remainder as Class 2 [19].As the nature of such data changes according to the threshold, it is important to determine optimal thresholds for classifications but this is rarely done.Instead thresholds are typically pre-defined without any optimization strategy.To avoid with this issue, sub-pixel classification has been proposed [20][21][22][23].This approach seeks to identify the areal proportion of classes in each pixel [23,24].This has not yet been fully applied to a multi-temporal analysis of land cover dynamics.This research applies a spatio-temporal sub-pixel classification approach for the estimation of the ISAs in an annual basis and addresses its advances to compare results from per-pixel classification using a series of different thresholds of crisp classification.
Sub-pixel classification have been applied using approaches including fuzzy classification [25,26], spectral mixture analysis [27][28][29][30] and machine learning algorithms [8,20,24,31,32].Fuzzy classification explicitly acknowledges uncertainties regarding class membership and allows a degree of membership to each possible thematic class [22,33].Consequently each pixel can have partial memberships to multiple classes [26].However, ambiguous membership definitions may result in a failure to estimate class area proportions appropriately [19,34].Spectral mixture analysis assumes that the mixed spectrum is representative of land covers in a pixel and that the proportions of spectral component reflect the proportions of land covers in the pixel [28,35,36].Despite being a popular image processing technique, spectral mixture analysis is often not suitable for applying to coarse-resolution imagery within heterogeneous urban environments due to the difficulty of deriving pure endmembers in the training data [2,31].Machine learning algorithms have been shown to be reliable and efficient techniques for processing large data volumes and both per-pixel and sub-pixel analyses can be successfully implemented via regression or classification trees and associated ensemble techniques such as Random Forest (RF) [8,13,14,24,37] and support vector machines [37,38].Some studies have shown machine learning algorithms to yield better results than spectral mixture analysis for estimating proportional ISA [31,39]; Yuan et al. (2008) found that regression trees produced better results than spectral mixture analysis and Deng and Wu (2013) showed RF to better than spectral mixture analysis when sufficient random sampled reference data are available.
A number of studies have used RF approaches to classify land cover in per-pixel [40][41][42][43][44], sub-pixel [31,37,45], and spatio-temporal per-pixel approaches [13][14][15].In particular, Clark et al. (2010) applied a per-pixel RF classifier to MODIS time series to produce annual land cover mappings.They determined land covers from annual phenological metrics derived from Enhanced Vegetation Index (EVI) data and topographical metrics.Clark et al. (2012) extended this analysis using mean, standard deviation, minimum, maximum and range for EVI, and red, near infrared and mid-infrared reflectance values.These approaches focused combinations of variables so that the outputs yielded the maximum accuracy measures, however it is difficult to interpret the meaning of the classification in relation to the combination of different types of input variables.In this context, the selection of a simple and tractable set of variables would support wider multi-temporal analyses of land cover.
In summary, much research has suggested methods for quantifying annual land cover from multi-temporal data.However, sub-pixel approaches have not yet been applied in multi-temporal analyses to examine ISA despite their frequent use.In addition, pursuit of the highest accuracy of outputs may result in model opaquely parameterised by input variables and whose selection is complex.Therefore, this research produces annual ISA maps along with highlighting: (1) the application of a spatio-temporal sub-pixel classification; (2) the use of tractable input variables into the model; and (3) inconsistent results from per-pixel analyses when using different thresholds of land cover definitions.Using the Jakarta Metropolitan Area (JMA) as a case study, this research develops a sub-pixel classification of ISAs using MODIS EVI data using a RF regression during the period 2001-2013.It is assumed that it would be more insightful to use all of temporal values rather than aggregating them, especially when interpreting the classification of annual patterns of EVI.Thus, all 23 yearly values of the 16-days composite MODIS MOD09Q1 EVI are explored as input variables.
These time-series values are regarded as a proxy of annual phenological pattern of EVI and were used to estimate ISAs with spatially and temporally distributed reference data collected from available very high resolution (VHR) images in Google Earth (GE).Reference data were randomly selected at 1000 locations in the study area and the proportions of ISA were recorded for each over 13 years.This sub-pixel RF model produced 13 annual maps and inferred the proportion of ISA.Standard per-pixel analyses using a series of crisp reference data produced from different thresholds of ISA proportion were also generated to provide a comparison to the sub-pixel approach.These models demonstrated how the selection of such thresholds influences the estimated ISA.Along with these comparisons with per-pixel models, the sub-pixel model was proposed as an alternative approach for a spatio-temporal model.This approach overcomes the mixed pixel problem when estimating annual proportional ISA and can handle with a coarse spatial resolution of MODIS compared to higher resolution remotely sensed data such as Landsat and SPOT.

Study Area
In the JMA, ISA has rapidly increased over several decades [46][47][48].The JMA includes Jakarta, the capital city of Indonesia, and its suburban area and consists of the capital special province (DKI Jakarta), five municipalities (Tangerang, Tangerang Selatan, Depok, Bekasi, and Bogor) and three districts (Tangerang, Bekasi, and Bogor) (Figure 1).Its population agglomeration is one of the highest in the world [49] and it has experienced rapid urban expansion [46].This region had a population of 27.2 million people in 2011 [47] and covers a total area of 6659 km 2 .Although the metropolitan core is located in DKI Jakarta, which is the political and economic center of the country, the population in the suburbs has risen dramatically since the 1980s [47].While the finance, service and commercial sectors are still concentrated in DKI Jakarta, the manufacturing sector has expanded into suburban areas, resulting in land use transformations away from agricultural land into ISA [47].The population growth in the suburban areas has been considerable and the urban expansion has been less controlled compared to the DKI Jakarta [50].As this area is located in a tropical region, high cloud cover has prevented continuous observations of land cover dynamics using widely available optical data.

Data
This research used 16-days composite EVI data provided by the MOD13Q1 product (Version 5) observed by the MODIS instruments on board the TERRA satellite.This product is derived from observed reflectance data corrected for molecular scattering, ozone absorption and aerosols and has a 231.7 m pixel resolution over the JMA.EVI is considered to have high biomass sensitivity and vegetation monitoring capacity compared to the Normalized Difference Vegetation Index (NDVI) [51].As EVI was developed to measure of vegetation activity it can be used to estimate land covers to investigate annual characteristics of EVI change, because its temporal profile associated with vegetation activity can be regarded as useful land cover information [13][14][15]52].The temporal profile of EVI in ISA may confuse with water body due to low albedo from ground surface in non-vegetated areas, and such misidentification will lead to lower accuracy of the results.Thus apparent water surface shown in MODIS water mask product (MOD44W) in the same spatial scale of MOD13Q1 layers were masked from the study area.These masked areas are located on the northwest part and northeast edge of the study area, mainly representing as aquaculture ponds.Barelands that can be seen, including non-paved roads, are limited in the study area, thus they were not considered in this research.Pixels indicated with a good or marginal flag on the MOD13Q1 pixel reliability layer were extracted from original EVI time series resulting in a 36.3% total data loss due to cloud cover.Despite removing unreliable pixels, other studies have shown that noise may remain due to the atmospherically bias, surface anisotropic, and sensor problems [53].To deal with the interpolation of missing valued pixels and the reduction of such noise, the double logistic function provided by TIMESAT was applied to the data [53,54].This fits the data (in this case the EVI time series data) to smooth continuous asymmetric curves and has been found to be well suited for enhancing time-series patterns from data using existing reliable pixel values [54].As the aim was to produce annual ISA maps, the post-filtered EVI time series at each pixel was split into 13 annual patterns consisting of 16-days composited 23 values.
Reference data were collected at 1000 randomly chosen areas in the same size as MOD13Q1 EVI pixels (1.58% of the study area in total).The proportion of seven different land cover classes (ISA, Paddyfield, Cultivated, Trees, Grass, Bare, and Water) was recorded by a visual inspection of historical VHR images such as Quickbird provided by DigitalGlobe in GE using 5% intervals from 0% to 100%.With the selected 1000 squared lines surrounding the chosen areas overlayed in GE, seven land covers were visually interpreted from available historical VHR images in each year.Duplicated data in a same year at the same area were excluded.Land covers were not recorded when the VHR images were not found within the chosen areas.Reference data were double checked to minimize human errors.A number of other studies have built reference data from GE and have successfully overcame difficulties of collection of reference data [13][14][15][55][56][57][58].To focus on ISA, the other six land cover classes were summarized into Other class.While 13,000 samples (1000 grid ˆ13 years) were expected in total, in some places, in some years, historical VHR images were not available in GE.The final reference dataset included 4561 data points over 1000 locations and 13 years (Table 1).The reference data were then randomly divided into training and validation samples (50% each).

Analysis
This research used a sub-pixel RF classification.This model was compared with per-pixel RF classifiers.RF is an ensemble learning approach that aggregates regression or classification tree predictors such that each tree is individually trained on a bootstrapped sample of the training data with a random feature selection [40,44,[59][60][61].The random feature selection is conducted so as to decrease correlations between the trees.The individual trees constructed from selected training data are only used for analyses of regression/classification and remainders, which are called "out-of-bag" (OOB) data, are used for accuracy assessment [59].Results are determined so as to minimize the OOB error rate constructed by the predicted values and OOB data.The error rates are calculated from the mean of squared errors (MSE) for all trees for regression, or misclassification rates for classification.The results are produced by the average of the predicted values of all selected trees for regression or by the most categorized class (a majority vote) for classification [59].Using OOB bootstrap samples, outputs of RF produce the percentage of variance explained for regression and the error rate for classification.Variable importance which shows how each variable contributes to the predicted model are also estimated, measured by the total decrease in node impurities from splitting on the variable based on the averaged residual sum of squares for regression, or the Gini index for classification over all trees [60].
In this research, a RF regression was constructed using proportional reference data, which is: ‚ Model ISAsub: RF regression using ISA proportions as the dependent variable (sub-pixel classification approach) In order to classify the annual patterns of EVI, input variables were selected from the 23 annual post-filtered EVI values.As 299 EVI time series during 2001-2013 was split into 13 annual patterns consisting of 23 values, such variables can be regarded as EVI values in every 16 days (Day of year (DOY) 1, 17, . . ., 353) in a year.Variable selection from 23 variables were determined by the rf.modelSel function in rfUtilities package in R [62].The rf.modelSel function selects input variables in terms of a model improvement ratio (MIR) calculated from ranked importance of variables divided by the maximum value of the importance among all variables.Variables are only retained when its MIR is larger than a threshold, which is in the range 0-1, and the selected variables are finally determined when the percentage of variance explained is maximized (the MSE is minimized) among all tests [63].
To confirm the influence of each variable importance, relative importance measures of variables were calculated from each importance of variable divided by the maximum measure of importance among all 23 variables for Model ISAsub.Such measures tell the degree of contribution of variables to the model and it helps understanding of the model.
RF classifiers using crisp reference data created by different thresholds of ISA proportion were also assembled for the comparison with the sub-pixel model.To investigate the differences caused by different thresholds in per-pixel analyses, three thresholds were applied: 25%, 50%, and 75%.Thus the models examined by per-pixel approach were: ‚ Model ISAp25: RF classifier using the Boolean data defined as greater than 25% ISA in the reference pixel (per-pixel classification approach) ‚ Model ISAp50: RF classifier using the Boolean data defined as greater than 50% ISA in the reference pixel (per-pixel classification approach) ‚ Model ISAp75: RF classifier using the Boolean data defined as greater than 75% ISA in the reference pixel (per-pixel classification approach) Annual ISA maps for each year from 2001 to 2013 were predicted from the results of each model.The RF analyses were undertaken in R using the randomForest package [60].The RF models require some initial parameters to be set: the number of trees (ntree) and the number of variables to be randomly sampled as candidates at each tree (mtry).The larger the number of ntree parameter, the more stable the outcome becomes, although the calculation time increases.In this case, it was set to 5000.The randomForest package includes a function (tuneRF) to determine the optimal number of mtry which suggested mtry = 4 in all models.Other parameters are used in the default setting in the randomForest function.
The RF outputs contain assessments of the model performance by OOB error rate for classification.Although the OOB error rate seems to be unbiased [59], several previous studies found such rate often tends to overestimate compared with a holdout validation assessment in some conditions [64,65] and did so in this analysis.It is important that users evaluate the utility of the mapped outputs for their intended applications [66] and thus traditional accuracy assessments were implemented using holdout samples of the validation data.R-squared and the percentage of root mean square error (%RMSE) of the outputs from Model ISAsub were calculated from the validation data as the accuracy measures.For Models ISAp25, ISAp50, and ISAp75, a confusion matrix was constructed and user's, producer's, and overall accuracies were calculated.Those assessments are reported as a model assessment and map-based assessment for each.In order to estimate the total area of ISA from Model ISAsub, the area size of ISA was calculated by multiplying the area of impervious pixel by the proportion of the ISA in each pixel: where Area is the total area of ISA, n is the number of pixels estimated as ISA, I was the area of the pixel estimated as ISA, and p is the ISA proportion predicted by Model ISAsub.The simple approach of counting the pixels classified as ISAs [13] was utilized for Models ISAp25, ISAp50, and ISAp75.The total amount of ISA in the annual maps produced by each model should not show oscillatory or irregular change [14].Therefore the temporal trends in total ISA during the period 2001 to 2013 were evaluated using a simple regression approach (e.g., [13]).A flowchart of this overall approach is shown in Figure 2.

Model Performance with Different Input Variables
The selection of input variables was investigated by rf.modelSel function for sub-pixel analysis, resulting the highest percentage of variance explained (58.5%) when using all 23 variables (Table 2).Thus, all 23 variables were used for latter procedures.Multi-collinearity among the 23 variables was not found when investigated using the multi.collinearfunction in rfUtilities package as well [62].
For further comparison, two types of input variables were considered as: (i) all 23 variables; and (ii) four phenological indices summarized as minimum, maximum, mean, and standard deviation of 23 values for Model ISAsub, ISAp25, ISAp50, and ISAp75.Table 3 showed that models using 23 variables yielded higher variance explained for Model ISAsub and lower OOB error rate for Model ISAp25, ISAp50, and ISAp75, than those using four phenological indices.It suggests all models using 23 variables showed the best performance.

Sub-Pixel Classification
The annual maps of ISA proportion from Model ISAsub are shown in Figure 3.The distribution of areas with a high proportion of ISA can be found around DKI Jakarta, as well as the center of Tangerang, Bekasi, and Bogor municipalities.Gradual expansion of such areas can be seen in the peripheral areas of those urban cores over 13 years.Relative importance measures of input variables were shown that 6th-17th variables (DOY 80-256), indicating over 0.5 of relative importance measures, highly contribute to the model, while others do not (Figure 4).The 15th variable (DOY 224) is the highest importance among all variables and the 10th variable (DOY 144) is the second, while 22nd variable (DOY 336) is the lowest and 21st (DOY 320) is the second lowest.For the model assessment, the R-squared is 0.60 and %RMSE is 20.31.The hexagon plot shades each cell by the number of data points in those value ranges (Figure 5) and indicates variations between validation and predicted data.Many predicted values tend to underestimate at larger ISA proportions, while overestimate at smaller ISA proportions compared to ground values.The annual accuracy assessments of produced maps shown in Table 4 indicate that the R-squared ranged from 0.36 to 0.69, and %RMSE was from 16.39 to 23.88.Significant statistical relationships between variations of R-squared and %RMSE and between such variations and numbers of validation data were not found.

Per-Pixel Classification
Annual per-pixel ISA maps with over 25%, 50%, and 75% coverage proportion were produced from Models ISAp25, ISAp50, and ISAp75, respectively.These data are overlayed and shown together in Figure 6.It is observed that ISA from Model ISAp25 is widely distributed more than ISA from Model ISAp50, and then ISAp75.As with the sub-pixel classification model in Figure 3, most of the ISA from Model ISAp75 is distributed around the DKI Jakarta and the core of municipalities, while ISA from Model ISAp25 are widely distributed in the peripheral areas.The gradual expansion of ISA over time can be also observed in all models.
The results of the model assessment during 2001-2013 are shown in Table 5. Overall accuracies increase as levels of threshold increase; accuracies for ISAp25, ISAp50, and ISAp75 were estimated to be 0.80, 0.86, and 0.89, respectively.Regarding category-level accuracy measures, the user's and producer's accuracies of ISA class decrease as the thresholds increase.Accuracy of the Other class have the opposite trend compared to ISA: namely that the user's and producer's accuracies of Other increase as such thresholds increase.All of the accuracy measures of Other are higher than for ISA.
The results of the annual accuracy assessments are shown in Table 6.The user's and producer's accuracy of ISA for three models in 2007 were calculated but from insufficient numbers of categorized data as ISA in both validated and predicted data.Model ISAp75 yields the highest overall accuracy among three models, except in 2011, while yields the lowest user's and producer's accuracy of ISA except user's accuracy in 2002, 2003, and 2008 and producer's accuracy in 2001 and 2007.Model ISAp25 often yields the highest user's and producer's accuracy of ISA class among three models although such trend is not straightforward.No significant statistical relationships were found between the validation data and accuracy measures.

Estimation of Total ISA
The total area size of ISA was estimated for each year in the study area for the period 2001-2013.Figure 7 shows the results of the sub-pixel classification for Model ISAsub along with those of the per-pixel classification for Model ISAp25, ISAp50, and ISAp75.All of the estimations from Models ISAsub, ISAp25, ISAp50, and ISAp75 indicate a gradual increasing trend of ISA.The ISAs determined from sub-pixel classification in Model ISAsub showed a gradual increase from 1442 km 2 in 2001 to 1720 km 2 in 2013, while those from the per-pixel classification varies according to the threshold in Models ISAp25, ISAp50, and ISAp75.The largest ISAs was found in Model ISAp25 (1736 to 2389 km 2 ), while, the smallest was found in Model ISAp75 (436 to 739 km 2 ).The ISA from Model ISAp25 was 1446 km 2 larger than that of Model ISAp75 on average during the study period as a result of the different threshold.Rates of increase from Models ISAsub, ISAp25, ISAp50, and ISAp75 were 30.65,61.61, 48.14, and 32.77 (km 2 /year), respectively.

Sub-Pixel Classification for Annual Proportional ISA Mappings
Previous studies have not identified ISA extent and its distribution annually in this study area.This is partly because of the lack of availability of sufficient high-resolution image data due to high levels of cloud cover, which commonly prevents the observation of ISAs, especially in tropical regions [67].This research sought to overcome this issue by classifying annual pattern of post-filtered EVI.Although MODIS has a coarse spatial resolution (231.7 meters in this case), annual land cover proportion was estimated at each location by adopting a sub-pixel classification.The Model ISAsub results show the spatial and temporal trends of ISA distribution (Figure 3).In 2001, its extent was concentrated around the urban core.Since then ISA has increased gradually at the pace of 30.65 km 2 per year (Figure 7).This is the first time that the anecdotal knowledge of the spatio-temporal trends associated with ISA expansion in the JMA has been confirmed.
Model ISAsub describes distributions of ISAs continuously and its extents correspond to those of Landsat-based ISA maps analyzed by Rustiadi et al. (2012) (Figure 8a,b)) [68].Figure 8c [68].Our results inferred 1409 km 2 and 1674 km 2 of total ISAs in 2005 and 2010, respectively, from the sub-pixel Model ISAsub using MODIS images, being in good agreement with the previous study, while per-pixel models indicated larger disagreements to previous studies (Figure 7).The Model ISAsub showed reasonable results compared to per-pixel approaches, although some improvements for minimizing the error should be considered.Figure 5 shows predicted values tend to overestimate to some extent where the ISA proportion is low, while underestimation where the ISA proportion is high.As areas of relatively low ISA proportion are dominated surrounding the city core, such areas may overestimate the degree of ISAs.It is, however, actual ISA such as settlements and roads in rural areas in this region that are often seen surrounded by other land covers (e.g., agricultural land or tree covers).Such ISAs can be represented by the Model ISAsub, while not by other per-pixel models because per-pixel models neglect small ISAs according to their thresholds of ISA proportion.Thus, although there remain some errors of results causing over-/under-estimation of ISAs, sub-pixel approach can be emphasized as a useful way of estimating ISAs.Quantifying land cover changes using produced ISA maps relies on high accuracy to avoid error-to-error inferences [69].Table 4 shows R-squared and %RMSE of each map are not high significantly and it would make difficult to focus on annual change in ISA pixel-by-pixel.Figure 7 shows abrupt decreases in 2001-2002, 2003-2005, 2006-2007, 2008-2009, and 2012-2013 under Model ISAsub.Such trends have been reported in previous studies [3,13,14].As ISA is rarely converted into non-ISA, this unexpected trend may come from modeling errors.Minimizing such errors is always a challenge.However, it is possible to conclude in this research that ISA has been gradually increasing during 2001-2013.Developing highly accurate maps is a challenge that needs to be addressed to support future land cover change analyses.

Classification of Annual Patterns of EVI Time Series
This research analyzed the annual patterns of post-filtered EVI time series under the assumption that such patterns would adequately describe ISA.The classification of temporally aggregated phenological data and several other input variables can make the models difficult to interpret.We recognize that the annual patterns of EVI may be influenced by the local impacts of external factors and anomalies such as extreme weather events, haze, floods and El Nino and such impacts may change annual patterns of EVI.As a result, these factors may have impacts on the results and thereby provide limitations to this research.For example, phenological indices were extracted from annual patterns of EVI which may lack meaningful information for the classification and accuracy measures of a model incorporating four phenological indices (Table 3).However, as the reliability of the variables is indicated in Figure 4, variables during DOY 96-272 highly contribute to the model and this period corresponds to dry season (approximately from April to October, while the rainy season from November to March) in this region.Considering such climate condition by using all 23 variables would improve the result.Thus, this approach of classifying annual patterns of EVI time series data (in this case 23 variables) is simple, tractable and understandable in the context of land cover dynamics.

Comparison with Per-Pixel Analyses
Per-pixel classifications result in different area estimations because of the different way that they "define" ISA: Model ISAp25 uses an threshold of 25% of ISA proportion in the reference pixel and overestimates the ISA, and Models ISAp50 and ISAp75, using thresholds of 50% and 75% of ISA in the reference pixel, respectively, underestimate the cover, when compared to the sub-pixel classification of Model ISAsub (Figure 7).This variation suggests that it is important to consider this threshold problem when applying per-pixel approaches to monitor land cover dynamics, but there is no particular way to overcome this issue.
When the per-pixel classification results are considered, reasonably high overall accuracies for the three models were found (Table 5).While overall accuracies increased with the threshold of ISA (Model ISAp25 < ISAp50 < ISAp75), the user's and producer's accuracies of ISA class show the opposite trend: Model ISAp25 > ISAp50 > ISAp75.This indicates the contribution that Other makes to the overall accuracies.These tendencies may be due to the nature of the random sampling procedures which produce high accuracies with widely distributed classes in the study area.Accuracy measures of user, producer, and overall were not shown to have a significant statistical relationship with the number of reference data points, with total area size of estimation, or with their intra-year variations.This implies that changes of accuracy are not associated with temporal changes and thus the analytical focus should be on the assessment of each thematic map.As this research aimed to produce annual ISA maps, the outputs of model ISAp25 should be selected as the best among three per-pixel models in terms of high user's and producer's accuracy of ISA class.However, it is inevitable that estimated ISA pixels contain anticipated other land covers (75% at maximum) and the observation of ISA from per-pixel analyses might not be straightforward.Although per-pixel analyses are commonly employed due to their simplicity, and the ease with which they can be extended to a multi-class analysis, per-pixel analyses fail to take into account of the effects of mixed land cover [28] and such effects increase when using coarse spatial resolution data.

Reference Data
The classification results and their accuracy assessments depend on the reference data.In this instance, ground-based reference data collection by field survey is not feasible because of the practical difficulty of investigating large amounts of randomly sampled areas at different time steps [22].Annual land cover can be recorded using historical VHR images such as Quickbird in GE [14], although lower volumes of VHR images were found in unpopulated areas and remote areas as well as sufficient historical VHR data were not available in GE for some years (e.g., data were relatively sparse in 2001, 2002, 2007, and 2008).Several recent studies have reported the geo-referencing errors of VHR images in GE to be in the range of 1.8-5.0m RMSE [70][71][72].While the geo-referencing accuracy should be taken into account for all VHR images, this is impractical and any errors would not affect our results severely because the coarse (231.7 m) resolution data was used.
Although GE is a useful historical VHR database for building reference data, a potential limitation still remains in the way that the class proportions were allocated in each location as this was by human interpretation.It is inevitable because users are not permitted to extract or to digitize VHR images in GE [73].In order to demonstrate a novel method in this research, sub-pixel land cover proportions in the reference data were determined using intervals of 5%, manually interpreted from VHR images in GE.Finer proportions may be preferable (e.g., at 1% intervals) for machine learning approaches to sub-pixel mapping using reference data classified from higher resolution image data.While the use of better reference data would support a stronger understanding of ISAs and more reliable ISA information, no reference data are perfect matches with the information that they seek to calibrate (model) and validate and accordingly, other researchers have used aggregated reference data e.g., at 5% [56] or 10% intervals [74] for sub-pixel mappings.
Reference data can never perfectly match the epistemology and ontology of the predicted or modeled data.The implications are that remote sensing analyses never use perfect reference data to train or validate remote sensing products [75,76].The end result is that such analyses have to assume that the reference and classified image data have semantic meanings that match, and seek to minimize the mismatches.The manual interpretation of historical GE imagery in this work has sought to overcome these semantic differences and to accommodate the fact that availability of historical VHR images in GE varied spatially and temporally.
In summary, despite the potential benefits of building reference data from visual allocation of land cover proportion from VHR images in GE over space and time, there are some inevitable limitations of spatially and temporally unbalanced availability of VHR images, and misallocation by human interpretation.The results of this research would be further improved when those limitations are overcome.

Future Work
Future work will be in a number of areas based on the results of this research.First, recent developments in accuracy assessments have yet to be applied to temporal analysis of land cover change.R-squared and %RMSE are convenient indices representing the goodness of regression models, but they indicate only global measures of accuracy in contrast to spatially explicit accuracy assessment techniques (e.g., [55,[77][78][79]).Future work will extend such approaches into spatio-temporal accuracy assessments.Second, stratified sampling scheme would be utilized in a spatio-temporal model, while our pre-analysis did not show a better result using this scheme (R-squared is 0.53).A larger number of reference data would be required for its sufficient implementation because the stratification should be considered in terms of both ISA proportion and obtained years.Our pre-analysis employed by stratified sampling may result in failure due to the lack of sufficient numbers of reference data.Evaluation of a spatio-temporal model using random and stratified sampling would be helpful for further improvement of ISA mapping.Thirdly, annual ISA maps of finer spatial resolution by a spatio-temporal sub-pixel classification model using such as multi-temporal Landsat images are expected.Producing annual proportional ISA mappings at a high spatial resolution is still a challenge.Sexton et al. (2013) inferred annual proportional ISAs in an east coast region of US from 1984 to 2010 using planimetric map as a referenced data [3].They applied individual models for each year, and the correlation coefficient was 0.64 (R-squared was 0.41).Although their approach seems applicable to other regions, it is still uncertain whether it is a feasible way in our study area, because of higher cloud covers in images compared to US [67], and unavailability of detailed referenced map.Moreover, the accuracy of the model was less tolerable than that of our model.It is expected to overcome these difficulties.Lastly, this research considered ISA because it is a fundamental component of urban land cover in this region.However, other vegetated land covers such as trees, home gardens, and parks are also important urban components, especially in the context of sustainable urban developments and their identification and analysis would support wider spatial planning initiatives: in the JMA urban green space development and consideration of local environmental issues have been prominent [80].A detailed understanding of urban land cover dynamics would support such initiatives.

Conclusions
This research produced annual maps showing the proportional impervious surface areas.A sub-pixel, proportional classification was successfully applied to annual patterns of MODIS EVI data using a RF regression that was trained on proportionally sampled reference data over space and time.The sub-pixel classification approach was found to be a more robust method compared to per-pixel classification because of the variability in the mapped outputs associated with the application of different thresholds of ISA proportion in the training data.This approach demonstrates the utility of analyzing proportional land cover mappings over space and time under conditions of rapid urban expansion.Although the spatial resolution of MODIS is coarse compared to higher resolution remotely sensed data (Landsat or SPOT), the proposed approach of estimating multi-temporal proportional ISA by sub-pixel classification provides an effective way to manage the mixed pixel problem and to provide reliable measurements of ISA.This approach is generic and can be applied in other regions and for other types of land covers.

Figure 2 .
Figure 2. A flowchart of this research.

Figure 4 .
Figure 4. Relative importance measures of variables for Model ISAsub.

Figure 5 .
Figure 5.The relationship between observed and predicted proportions of ISA under Model ISAsub.

Figure 7 .
Figure 7. Estimation of total ISA by sub-pixel classification (Model ISAsub) and per-pixel classification (Model ISAp25, ISAp50, and ISAp75) and their linear trends during the period 2001-2013.
showed change in ISAs from 2005 to 2010 by Model ISAsub and by Landsat-based classification.It indicates considerable developments of ISAs in surrounding areas of DKI Jakarta.Losses of ISAs are shown mainly in northern costal areas from results of Model ISAsub, while any losses are not found in Landsat-based ISA maps.Aquaculture ponds are widely seen in such areas and these losses may be caused by misclassification of ISAs due to the similarity of temporal EVI profile with water body.Rustiadi et al. (2012) estimated 1501 km 2 and 1731 km 2 of total ISAs in this region in 2005 and 2010, respectively by Landsat-based land cover classification

Table 1 .
Number of reference data in each year.

Table 2 .
Variable selection by rf.modelSel function for sub-pixel analysis.

Table 3 .
Comparison of results using annual 23 values of enhanced vegetation index (EVI) and extracted four phenological indices for Model ISAsub, ISAp25, ISAp50, and ISAp75.

Table 4 .
Annual R-squared and the percentage of root mean square error (%RMSE) of Model ISAsub during 2001-2013.

Table 6 .
The overall, user's, and producer's accuracies of ISA class in annual maps from Model ISAp25, ISAp50, and ISAp75, with the highest values in bold among each accuracy in each year.