Long-Term Annual Mapping of Four Cities on Different Continents by Applying a Deep Information Learning Method to Landsat Data

Urbanization is a substantial contributor to anthropogenic environmental change, and often occurs at a rapid pace that demands frequent and accurate monitoring. Time series of satellite imagery collected at fine spatial resolution using stable spectral bands over decades are most desirable for this purpose. In practice, however, temporal spectral variance arising from variations in atmospheric conditions, sensor calibration, cloud cover, and other factors complicates extraction of consistent information on changes in urban land cover. Moreover, the construction and application of effective training samples is time-consuming, especially at continental and global scales. Here, we propose a new framework for satellite-based mapping of urban areas based on transfer learning and deep learning techniques. We apply this method to Landsat observations collected during 1984–2016 and extract annual records of urban areas in four cities in the temperate zone (Beijing, New York, Melbourne, and Munich). The method is trained using observations of Beijing collected in 1999, and then used to map urban areas in all target cities for the entire 1984–2016 period. The method addresses two central challenges in long term detection of urban change: temporal spectral variance and a scarcity of training samples. First, we use a recurrent neural network to minimize seasonal urban spectral variance. Second, we introduce an automated transfer strategy to maximize information gain from limited training samples when applied to new target cities in similar climate zones. Compared with other state-of-the-art methods, our method achieved comparable or even better accuracy: the average change detection accuracy during 1984–2016 is 89% for Beijing, 94% for New York, 93% for Melbourne, and 89% for Munich, and the overall accuracy of single-year urban maps is approximately 96± 3% among the four target cities. The results demonstrate the practical potential and suitability of the proposed framework. The method is a promising tool for detecting urban change in massive remote sensing data sets with limited training data.


Introduction
Urban expansion creates global environmental challenges that affect both developed and developing countries [1].Although urban areas cover only a small fraction of the Earth's surface (less than one percent [2]), rapid urbanization exerts disproportionate influences on biodiversity, hydrological systems and watersheds, biogeochemical cycles, and climate [3][4][5][6].Owing to its important contributions to local and global anthropogenic change, urbanization attracts considerable attention among researchers and policymaker concerned with sustainable management of the built and natural environments.It is therefore imperative to monitor long-term changes in the spatial extent of urban areas, and particularly the impervious surfaces associated with these areas.Previous studies have exploited a variety of remotely sensed data to detect urban extent at a range of spatial and temporal scales [7][8][9][10].
Optical remote sensing data are available for urban (impervious surface) mapping.For example, Landsat imagery collected by Thematic Mapper (TM), Enhanced Thematic Mapper Plus (ETM+), and Operational Land Imager (OLI) instruments are provided at global scale with fine spatial resolutions and much useful spectral information.Long time series constructed using Landsat images have already been demonstrated to be an effective data source for mapping urban areas worldwide [11][12][13][14][15].However, the spatiotemporal complexities of urbanization present immense challenges for timely urban mapping across seasons.Temporal nonlinearity and spatial heterogeneity in land use dynamics arise from a variety of complex interactions with the socioeconomic and ecological environment, many of which relate to the seasonal cycle [16].Multiple studies have used spectral information in the Landsat time series to monitor urban change and evaluate its causes and consequences.For example, Yang et al. [17] produced a percent impervious surface cover dataset at 30-m resolution for the United States for inclusion in the 2001 National Land Cover Database (NLCD) Land Cover Collection [18].Schneider [19] used Landsat TM and ETM+ data to study the expansion of 143 Chinese cities covering seven periods from 1978 to 2010.Sexton et al. [20] and Song et al. [14] used Landsat records to explore changes in the coverage of impervious surfaces in the Washington, D.C.-Baltimore, MD metropolitan region from 1984 to 2010.Li et al. [21] mapped urban areas at annual intervals over the period 1984-2013 in Beijing, China, and introduced a temporal consistency-check strategy that improved classification accuracy.Feyisa et al. [22] reported landscape patterns and associated changes between 1985 and 2010 in Addis Ababa, Ethiopia.
Despite the extensive research around urban mapping, fundamental issues remain.Uncertainties associated with the scarcity of training samples and the amplitude of spectral variance in long term datasets like Landsat continue to limit applications of these datasets [23].It is therefore necessary to develop efficient, generalizable frameworks for extracting urban data for urban mapping.Noise due to radiation variance and the labor-intensive collection of training samples for each new target area inhibit the application and further development of urban mapping, especially when dealing with long term changes.Uncertainties associated with single-scene retrievals reduce the accuracy and temporal resolution of long time series [24,25].Although annual maps of urban areas merged from multiple-scenes collected over the course of a year are more stable than single-scene, controlling for spectral variances remains an issue for urban mapping at all timescales.Overcoming this issue is important for expanding and improving the applicability of satellite imagery for mapping changes in urban areas over long periods.Moreover, traditional approaches to monitoring spatiotemporal patterns of urban development require large numbers of training samples in all temporal target periods [21,26].These training-intensive approaches are extremely costly.The transferability of information derived from training data is a crucial step toward more effective, general approaches to urban mapping, and particularly for achieving the high time resolutions needed to monitor rapid urban changes at large scales.
To date, many studies have reported the potential of deep learning for providing this type of transferability [27][28][29], including in the application of large remote sensing datasets to track change in the human environment [30].A key breakthrough in machine learning, deep learning can be used as an implicit general approach to dealing with large-scale problems [31].Deep features have been demonstrated to be high-level, and can be transferred to accelerate learning in different but related target domains.Jean et al. [30] used a deep learning method to extract large-scale socioeconomic indicators from high-resolution satellite imagery and nighttime light intensities.Their results illustrate the informative nature and transferability of deep features.Lyu et al. [32] demonstrated that deep features can be exploited to detect changes more effectively and efficiently, with application of a previously-trained deep learning method to detect land cover changes in new target datasets.Several studies have examined the advantages and limitations of deep-feature transferability.Yosinski et al. [28] showed that transferability in deep networks can decrease with increasing distance between the training and target task, and this transferability arises from the tendency to learn generalized deep features with wide applicability.Among deep learning methods, recurrent neural networks (RNNs) have gained attention for providing solutions to challenging problems involving sequential time series data.
In this paper, we develop a new urban extraction framework based on a recurrent neural network and demonstrate the potential for deep learning to improve generalization in urban mapping applications.We select four cities (Beijing, New York, Melbourne, and Munich) as study areas and classify all Landsat scenes in these four cities during 1984-2016 at pixel level.These four target cities share similar climate zones, broadly defined by temperate continental climate conditions.Leveraging this underlying similarity, we proceed to define and discuss the general transferability of information derived from limited training data in urban areas with temperate continental climates.been demonstrated to be high-level, and can be transferred to accelerate learning in different but related target domains.Jean et al. [30] used a deep learning method to extract large-scale socioeconomic indicators from high-resolution satellite imagery and nighttime light intensities.Their results illustrate the informative nature and transferability of deep features.Lyu et al. [32] demonstrated that deep features can be exploited to detect changes more effectively and efficiently, with application of a previously-trained deep learning method to detect land cover changes in new target datasets.Several studies have examined the advantages and limitations of deep-feature transferability.Yosinski et al. [28] showed that transferability in deep networks can decrease with increasing distance between the training and target task, and this transferability arises from the tendency to learn generalized deep features with wide applicability.Among deep learning methods, recurrent neural networks (RNNs) have gained attention for providing solutions to challenging problems involving sequential time series data.

Study Areas
In this paper, we develop a new urban extraction framework based on a recurrent neural network and demonstrate the potential for deep learning to improve generalization in urban mapping applications.We select four cities (Beijing, New York, Melbourne, and Munich) as study areas and classify all Landsat scenes in these four cities during 1984-2016 at pixel level.These four target cities share similar climate zones, broadly defined by temperate continental climate conditions.Leveraging this underlying similarity, we proceed to define and discuss the general transferability of information derived from limited training data in urban areas with temperate continental climates.

Preprocessing
All available Landsat data with cloud cover less than 10% are chosen for inclusion in this study.These data, which have been obtained from the U.S. Geological Survey (USGS: http://earthexplorer. usgs.gov/),include imagery collected by Landsat 4/5 TM, Landsat 7 ETM+, and Landsat 8 OLI data.The target data are seasonal composites (for boreal the spring, summer, autumn, and winter) based on Landsat images from 1984-2016.All locations and years are included in the analysis except for Melbourne in 1984-1985, and Munich in 1993-1996, which are excluded due to lack of data.The Landsat 4/5 TM and Landsat 7 ETM+ datasets have been processed in a uniform way using the Landsat Ecosystem Disturbance Adaptive Processing System provided by USGS [33].Images from the Landsat 8 OLI datasets have been processed using the Level-1 T Product Generation System (LPGS) in the USGS Earth Explorer interface.Only the six spectral bands from Landsat 8 OLI that correspond to spectral bands from Landsat 4/5 TM and Landsat 7 ETM+ are chosen to test transferability.Bands 2-7 of Landsat 8 OLI are chosen to correspond to bands 1-5 and band 7 in Landsat 4/5 TM.The F-mask (function of mask) algorithm developed by Zhu et al. [34] is used to reduce the influences of cloud cover and cloud shadow.

Training Data Collection
Temporal distributions of the Landsat images used for the four regions are shown in Figure 2. The total number of scenes is 304 for Beijing, 231 for New York, 168 for Melbourne and 79 for Munich.A large set of training samples is needed to construct a supervised method for extracting urban land cover from satellite imagery, including the deep learning we use in this work.It is also desirable for this training data set to contain samples from all seasons.To construct a suitable test of transferability within this framework, the training samples should be drawn exclusively from one location and limited to a relatively short time period.Here, we use Landsat imagery for Beijing during 1999.This set of images comprises 14 scenes without cloud contamination distributed relatively homogeneously with the year.The training data are selected from within these scenes, with the remaining data designed for classification.The training samples have been labeled through visual inspection of Landsat images aided by higher-resolution imagery in the Google Earth archive [12,26].According to the work of Schneider et al. [26], we define urban land sites for which the surface is predominantly impervious, including all non-vegetative, human-constructed elements (e.g., roads and buildings).Here, the term "predominantly" means that built environments with impervious surfaces cover more than 50% of the pixel.In addition to urban land, land cover categories (for vegetation, bare land and water) are considered.Bare land is defined as a natural environment dominated by exposed soil, sand, gravel, or rock, with little or no vegetation [12,21], which is more difficult to distinguish from urban pixels than water or vegetation.In total, 777,090 training samples (comprising 243,712 urban samples, 140,343 vegetation samples, 229,497 bare land samples, and 163,538 water samples) are manually identified from among the 14 scenes collected for Beijing during 1999.

Validation Data Collection
We perform two experiments (that evaluate temporal transferability and spatial transferability respectively, using the same training samples (manually labeled training data from Beijing during 1999).For the temporal transfer experiments, the target data are images of Beijing from 1984-2016.For the spatial transfer experiments, the target data are images of New York, Melbourne or Munich from 1984-2016.We provide quantitative assessments of the overall accuracies of the classification and detected changes to validate the performance of the proposed framework within these two experiments.
To validate temporal transferability, two separate test datasets are constructed to assess single year classifications and change detection.Single-year classifications are independently evaluated for images of Beijing collected during 1984, 1994, 2004, and 2014.All scenes within these four years are included in the evaluation.Within each selected year, 20,000 test samples are randomly selected from among known urban (10,000 units) and non-urban (10,000 units) regions.These test samples are used to assess the accuracy of urban mapping both within each individual year and merged set based on the four selected years.All validation samples are evaluated by visual inspection of Landsat images aided by examination of higher resolution images from Google Earth [35].The Google Earth images provide high resolution land-cover information for tracking urban expansion at pixel level in Landsat images.Test samples for assessing the detection of changes are acquired from the work of Li et al. [21], which are exactly the same with the work of Li et al. [21] (see this publication for additional details).A total of 100 test samples are randomly selected from among rapidly developing regions between 1984 and 2013.We extend these data with information from 2014-2016 following the same strategy of the work of Li et al. [21].
To validate spatial transferability, both single year classification and change detection are evaluated for the other three cities (New York, Melbourne, and Munich).For the single year classification assessment, we conduct independent evaluations of all scenes from four selected years (1986, 1997, 2004, and 2014).A total of 20,000 stable test samples are randomly selected in each city, equally distributed among urban (10,000 units) and non-urban (10,000 units) sites.As above, the latter distinction is achieved by visual inspection of Landsat images supported by Google Earth imagery at a spatial resolution of 0.75 m [36].For the change detection assessments, separate sets of test samples are selected for each of the three cities following the strategy employed by Li et al. [21].For New York, 100 sample locations are selected by randomly sampling inform among known

Validation Data Collection
We perform two experiments (that evaluate temporal transferability and spatial transferability respectively, using the same training samples (manually labeled training data from Beijing during 1999).For the temporal transfer experiments, the target data are images of Beijing from 1984-2016.For the spatial transfer experiments, the target data are images of New York, Melbourne or Munich from 1984-2016.We provide quantitative assessments of the overall accuracies of the classification and detected changes to validate the performance of the proposed framework within these two experiments.
To validate temporal transferability, two separate test datasets are constructed to assess single year classifications and change detection.Single-year classifications are independently evaluated for images of Beijing collected during 1984, 1994, 2004, and 2014.All scenes within these four years are included in the evaluation.Within each selected year, 20,000 test samples are randomly selected from among known urban (10,000 units) and non-urban (10,000 units) regions.These test samples are used to assess the accuracy of urban mapping both within each individual year and merged set based on the four selected years.All validation samples are evaluated by visual inspection of Landsat images aided by examination of higher resolution images from Google Earth [35].The Google Earth images provide high resolution land-cover information for tracking urban expansion at pixel level in Landsat images.Test samples for assessing the detection of changes are acquired from the work of Li et al. [21], which are exactly the same with the work of Li et al. [21] (see this publication for additional details).A total of 100 test samples are randomly selected from among rapidly developing regions between 1984 and 2013.We extend these data with information from 2014-2016 following the same strategy of the work of Li et al. [21].
To validate spatial transferability, both single year classification and change detection are evaluated for the other three cities (New York, Melbourne, and Munich).For the single year classification assessment, we conduct independent evaluations of all scenes from four selected years (1986, 1997, 2004, and 2014).A total of 20,000 stable test samples are randomly selected in each city, equally distributed among urban (10,000 units) and non-urban (10,000 units) sites.As above, the latter distinction is achieved by visual inspection of Landsat images supported by Google Earth imagery at a spatial resolution of 0.75 m [36].For the change detection assessments, separate sets of test samples are selected for each of the three cities following the strategy employed by Li et al. [21].For New York, 100 sample locations are selected by randomly sampling inform among known change regions, previously identified using the NLCD [37] and verified by visual inspection of Google Earth images.For Melbourne and Munich, 100 sample locations are selected via a similar random sampling drawn from change regions identified using visual inspection of Google Earth images and continuous analog Landsat sequences.3).The second is an online optimization phase, in which the target samples are used to fine-tune the initial network and increase its specialization be for the target images (as shown by the dotted black arrow).The fine-tuned network is then used to construct the final classification map for the target data.Through these steps, annual records of the distribution of impervious surface can be extracted and merged for further analysis.

Method
Remote Sens. 2018, 10, x FOR PEER REVIEW 6 of 24 change regions, previously identified using the NLCD [37] and verified by visual inspection of Google Earth images.For Melbourne and Munich, 100 sample locations are selected via a similar random sampling drawn from change regions identified using visual inspection of Google Earth images and continuous analog Landsat sequences.3).The second is an online optimization phase, in which the target samples are used to fine-tune the initial network and increase its specialization be for the target images (as shown by the dotted black arrow).The fine-tuned network is then used to construct the final classification map for the target data.Through these steps, annual records of the distribution of impervious surface can be extracted and merged for further analysis.This urban extraction framework exploits deep feature representations that distinguish urbanized areas from the non-urban background.Knowledge transfer can help to generalize the application of this framework to urban mapping by avoiding issues related to the scarcity of training samples.According to [30,38], transfer learning can be defined as follows.Consider a domain = { , Ρ( )} that contains a feature space and a marginal probability distribution Ρ( ).For example, in this work, our learning task is land cover classification.Each term is a spectral feature, with the space containing all term vectors, the ith term vector corresponding to a specific land cover distribution, and the full set of learning samples.Within a specific domain, a task = { , (•)} contains a label space and a predictive function (•).The latter is not observed but can be learned from analysis of the training samples.The training samples consist of pairs { , } , where ∈ and ∈ .The function (•) then relates each term to its corresponding label [30,39].Given the source domain (training data) and learning task along with a target domain (test data) and learning task , transfer learning aims to streamline This urban extraction framework exploits deep feature representations that distinguish urbanized areas from the non-urban background.Knowledge transfer can help to generalize the application of this framework to urban mapping by avoiding issues related to the scarcity of training samples.According to [30,38], transfer learning can be defined as follows.Consider a domain D = {χ, P(X)} that contains a feature space χ and a marginal probability distribution P(X).For example, in this work, our learning task is land cover classification.Each term is a spectral feature, with χ the space containing all term vectors, χ i the ith term vector corresponding to a specific land cover distribution, and X the full set of learning samples.Within a specific domain, a task T = {Y, f (•)} contains a label space Y and a predictive function f T (•).The latter is not observed but can be learned from analysis of the training samples.The training samples consist of pairs {χ i , y i }, where χ i ∈ X and y i ∈ Y.The function f T (•) then relates each term to its corresponding label [30,39].Given the source domain (training data) D s and learning task T s along with a target domain (test data) D T and learning task T T , transfer learning aims to streamline the process for learning the target predictive function f T (•) relating D T and T T by utilizing prior knowledge of D s and T s (with D s = D T ).In our framework, the source domain comprises Landsat images of Beijing collected in 1999, and the target domain comprises all Landsat images of the four selected cities during 1984-2016.

Offline Training
We first undertake to define a suitable predictive function f T (•) through offline training.Multi-spectral data can be conceptualized as a set of orderly and continuous sequences in the spectral space [40].RNN is known to perform well when applied to sequential data types.Our adoption of RNN in this work is thus intended to leverage the sequential properties of multispectral data, such as spectral correlations and band-to-band variability.For our urban extraction task, the likelihood of predicting correct labels within the training data set D s can be maximized by solving where θ is the parameter set for the network, x si denotes a spectral pixel vector, and y si is the corresponding land-cover label.Since the spectral vector x si can be considered sequential, RNN is a natural choice for dealing with p(y si |x si ; θ).Urban pixels can be addressed in the RNN by the hidden units h t .We use an advanced recurrent unit/gated recurrent unit (GRU) approach to construct the RNN.This approach mitigates problems with vanishing or exploding gradients, which can hamper convergence.

Online Optimization
Our objective is an efficient and generalizable method for mapping urban areas.We therefore introduce an online learning step to specialize the representation of deep feature within the network to each new target space rather than applying the initial network directly.The collection of training data for real remote sensing applications covering multiple domains is expensive and time consuming.As the amount of unlabeled data is large relative to the amount of labelled training data, the unlabeled data provide information that can be used to explore and ultimately enhance the generalization capacity of the classification [41].
In our framework, we introduce information from D T into the initial RNN to fine-tune the network to each new study area.We first identify the top N data samples from a new target scene with high likelihood scores (higher than 0.99) using the trained GRU recurrent network and then adopt these samples as additional training data for the target domain.Likelihood scores are thus treated as the final output of the softmax layer, collated and applied prior to the final classification [42].These additional training data allow us to efficiently optimize the method to each target scene.The online optimization consists of three steps.First, D T is fed into the RNN trained on the source domain.This generates a provisional predicted label y Ti and corresponding likelihood p y Ti x Ti ; θ for each sample according to Equation (1), where p y Ti x Ti ; θ represents the likelihood that the sample belongs to a specific class.Second, samples with high likelihood scores (≥0.99) are selected as additional training samples on the target domain.Third, a supplementary training set D T = x T1 , y Ti , • • • , x Tn , y Tn that is specialized to the target scene (domain) is generated.The combined training data (i.e., the source data D s and the newly produced additional training set D T ) are then used to fine-tune the initial RNN method, producing an optimized method for conducting the T T task.The weights of the initial GRU recurrent network thus contribute to classifying the features of multispectral images covering the new target domain.The framework is constructed to strike a balance between generalized land cover classification (application to different land cover types, seasons, and cities) and specialized land cover classification (transfer of prior knowledge from D s and T s ).
We train the test case presented in this paper using the RMSprop algorithm [43].The recommended default parameters are used for all experiments.We use a single-layer GRU of size 32 with sigmoid activations and tanh activation for hidden representations.Sigmoid gate activations and tanh activation are both default settings under the RNN framework.All weight matrices in our method are initialized from a uniform distribution covering the range [−0.1, 0.1].All weight matrices are updated as necessary during the learning process and fine-tuned during the online optimization process.
Once each individual scene is classified using the proposed framework, the initial classification maps are merged into annual maps using the 'dominated strategy'.Under this strategy, a sample is classified as urban in the annual map if it was classified as urban in more than 50% of the scenes available for that year [10,44].Our use of this strategy is motivated by the relative stability of urban areas (or impervious surfaces) over long periods.

Performance of the Initial Classification
We evaluate temporal transfer within the proposed framework in the hope that long-term urban mapping can be achieved from spectral information alone with robust transferability in time.Landsat images covering more than 30 years are classified using training data drawn from the labeled training samples observed in and around Beijing during 1999.Figure 4 shows distributions of impervious land cover extracted using the original classification for urban and suburban regions around Beijing.The dates are randomly selected from early, middle, and late years within the analysis period and cover multiple seasons.Urban and non-urban regions can be clearly distinguished in all scenes, indicating a promising transferability of deep features learned by our method across different seasons.Once each individual scene is classified using the proposed framework, the initial classification maps are merged into annual maps using the 'dominated strategy'.Under this strategy, a sample is classified as urban in the annual map if it was classified as urban in more than 50% of the scenes available for that year [10,44].Our use of this strategy is motivated by the relative stability of urban areas (or impervious surfaces) over long periods.

Performance of the Initial Classification
We evaluate temporal transfer within the proposed framework in the hope that long-term urban mapping can be achieved from spectral information alone with robust transferability in time.Landsat images covering more than 30 years are classified using training data drawn from the labeled training samples observed in and around Beijing during 1999.Figure 4 shows distributions of impervious land cover extracted using the original classification for urban and suburban regions around Beijing.The dates are randomly selected from early, middle, and late years within the analysis period and cover multiple seasons.Urban and non-urban regions can be clearly distinguished in all scenes, indicating a promising transferability of deep features learned by our method across different seasons.Figure 4    After acquisition of the initial classification maps, all images from each year are merged into annual urban maps using the 'dominated strategy' (see Section 2.3.2 for details).Here, we use confusion matrices, overall accuracies (OA), and user accuracies (UA) to assess the results of the temporal transfer experiments.Table 1 summarizes single scene and merged annual classification accuracies for four selected years.For completeness, counts of true negatives (TN), true positives (TP), false positives (FP) and false negatives (FN) are also listed.The abbreviation UA-N is used to denote the user accuracy of non-urban pixels, while UA-U demotes the user accuracy of urban pixels.The overall accuracies (OA) for all four merged annual assessments exceed 90%, with the user accuracy (UA) for urban areas greater than 98%.Among the single scene assessments, the average OA was 97 ± 0.2% in 1984, 93 ± 5% in 1994, 96 ± 2% in 2004, and 93 ± 4% in 2014.The user accuracy is typically higher for urban areas than for non-urban areas, but consistently exceeds 80% for both.These results indicate that the framework is sensitive to urban land cover in all seasons, and that information learned by training the method on images of Beijing in 1999 is transferable in time.The merged annual classification maps (99 ± 0.3%: average OA) outperform the single scene classification maps (95 ± 3%: average OA) slightly over these four years.The merged annual classifications are more robust and more stable than the single scene results given reduced uncertainties due to, e.g., poor image quality and spectral confusion.Based on these temporal transfer experiments, we conclude that the proposed framework is a promising tool for extracting urban areas from satellite imagery over long time spans.
Figure 5 shows the overall accuracy of changes detected using the merged annual classification maps.This accuracy fluctuates from year to year, with indications of a slight decline and a slight increase before and after 2004 respectively.This general evolution of change detection accuracy has also reported by Li et al. [21], where the random forest method was exploited for classification, and a temporal consistency-check strategy was introduced to improve classification accuracy.The slight decrease before 2004 may emerge for two reasons.First, the quality of Landsat images acquired during different seasons may vary from year to year, affecting the final mapping results.Second, rapid land-cover changes are difficult to extract with precision on annual time scales.Most clear-sky Landsat images around 2004 were collected during winter and spring (Figure 2).This seasonal sampling bias may influence the final detection accuracy.Urban land cover in Beijing also expanded rapidly after 2000, with an average growth rate of 99.48 ± 1.3 km 2 year −1 between 2000 and 2013 [21].The evolution of new construction within a given year may complicate the classification, particularly in times of rapid urbanization, as urban expansion is spread throughout the year.Overall, the accuracy for change detection in long term data sets ranged from 79% to 96%, with an average accuracy of 89%.
Remote Sens. 2018, 10, x FOR PEER REVIEW 9 of 24 After acquisition of the initial classification maps, all images from each year are merged into annual urban maps using the 'dominated strategy' (see Section 2.3.2 for details).Here, we use confusion matrices, overall accuracies (OA), and user accuracies (UA) to assess the results of the temporal transfer experiments.Table 1 summarizes single scene and merged annual classification accuracies for four selected years.For completeness, counts of true negatives (TN), true positives (TP), false positives (FP) and false negatives (FN) are also listed.The abbreviation UA-N is used to denote the user accuracy of non-urban pixels, while UA-U demotes the user accuracy of urban pixels.The overall accuracies (OA) for all four merged annual assessments exceed 90%, with the user accuracy (UA) for urban areas greater than 98%.Among the single scene assessments, the average OA was 97 ± 0.2% in 1984, 93 ± 5% in 1994, 96 ± 2% in 2004, and 93 ± 4% in 2014.The user accuracy is typically higher for urban areas than for non-urban areas, but consistently exceeds 80% for both.These results indicate that the framework is sensitive to urban land cover in all seasons, and that information learned by training the method on images of Beijing in 1999 is transferable in time.The merged annual classification maps (99 ± 0.3%: average OA) outperform the single scene classification maps (95 ± 3%: average OA) slightly over these four years.The merged annual classifications are more robust and more stable than the single scene results given reduced uncertainties due to, e.g., poor image quality and spectral confusion.Based on these temporal transfer experiments, we conclude that the proposed framework is a promising tool for extracting urban areas from satellite imagery over long time spans.
Figure 5 shows the overall accuracy of changes detected using the merged annual classification maps.This accuracy fluctuates from year to year, with indications of a slight decline and a slight increase before and after 2004 respectively.This general evolution of change detection accuracy has also reported by Li et al. [21], where the random forest method was exploited for classification, and a temporal consistency-check strategy was introduced to improve classification accuracy.The slight decrease before 2004 may emerge for two reasons.First, the quality of Landsat images acquired during different seasons may vary from year to year, affecting the final mapping results.Second, rapid land-cover changes are difficult to extract with precision on annual time scales.Most clear-sky Landsat images around 2004 were collected during winter and spring (Figure 2).This seasonal sampling bias may influence the final detection accuracy.Urban land cover in Beijing also expanded rapidly after 2000, with an average growth rate of 99.48 ± 1.3 km 2 year −1 between 2000 and 2013 [21].The evolution of new construction within a given year may complicate the classification, particularly in times of rapid urbanization, as urban expansion is spread throughout the year.Overall, the accuracy for change detection in long term data sets ranged from 79% to 96%, with an average accuracy of 89%.

Urban Expansion
Long term urban land cover datasets are essential for tracking and understanding the spatiotemporal patterns of urban development.Figure 6 shows the spatial distribution of urban expansion during 1984-2016.Here, sites that were built earlier have lower values (shown in blue), while sites that were built more recently have higher values (shown in bright red).Built environments have expanded in both the urban and suburban areas of Beijing in recent decades.Two inset maps (Figure 6A,B) show urbanization patterns for the Langfang and Beijing Capital Airport regions respectively.The corresponding Landsat images are also included for context.

Performance of the Initial Classification
The spatial transferability of information gained from the training samples collected over Beijing in 1999 are tested for three different cities (New York, Melbourne, and Munich) in similar (temperate continental) climate zones during 1984-2016.A total of 478 Landsat images of these three cities are classified without field data from these three cities (spatial transfer experiments).Figure 7 shows examples of initial maps of impervious cover extracted for urban and suburban regions in all three cities.Corresponding Landsat images are shown for each randomly-selected classification map.Impervious surfaces can be accurately detected in both urban and suburban regions, and the locations of newly built roads and airports can be identified under different seasonal conditions.In urban detection, pure impervious surface pixels and mixed pixels in which impervious surfaces are paired with vegetation or water are easy to distinguish from non-urban classes.By contrast, mixed pixels with impervious surfaces and bare-land are difficult to classify, as illustrated for New York Suburban-1 (Levittown) in Figure 7.
As in the temporal transfer experiment (Section 3.1), the accuracies of both single-year classifications and detected changes are evaluated for the spatial transfer experiments.Table 2 lists the results for both single scene and merged annual classifications for all three urban areas in four selected years (1986, 1997, 2004, and 2014).As above, confusion matrices (TP, TN, FP, and FN), overall accuracies (OA) and user accuracies (UA) are calculated to assess the classification results.Our framework accurately extracts urban surfaces in both individual scenes (across seasons) and merged annual classification with UA-U consistently exceeding 92% in all cases).Average values of single-scene OA are 97 ± 3% in New York, 97 ± 2% in Melbourne, and 94 ± 4% in Munich.This level

Performance of the Initial Classification
The spatial transferability of information gained from the training samples collected over Beijing in 1999 are tested for three different cities (New York, Melbourne, and Munich) in similar (temperate continental) climate zones during 1984-2016.A total of 478 Landsat images of these three cities are classified without field data from these three cities (spatial transfer experiments).Figure 7 shows examples of initial maps of impervious cover extracted for urban and suburban regions in all three cities.Corresponding Landsat images are shown for each randomly-selected classification map.Impervious surfaces can be accurately detected in both urban and suburban regions, and the locations of newly built roads and airports can be identified under different seasonal conditions.In urban detection, pure impervious surface pixels and mixed pixels in which impervious surfaces are paired with vegetation or water are easy to distinguish from non-urban classes.By contrast, mixed pixels with impervious surfaces and bare-land are difficult to classify, as illustrated for New York Suburban-1 (Levittown) in Figure 7.
As in the temporal transfer experiment (Section 3.1), the accuracies of both single-year classifications and detected changes are evaluated for the spatial transfer experiments.Table 2 lists the results for both single scene and merged annual classifications for all three urban areas in four selected years (1986, 1997, 2004, and 2014).As above, confusion matrices (TP, TN, FP, and FN), overall accuracies (OA) and user accuracies (UA) are calculated to assess the classification results.Our framework accurately extracts urban surfaces in both individual scenes (across seasons) and merged annual classification with UA-U consistently exceeding 92% in all cases).Average values of single-scene OA are 97 ± 3% in New York, 97 ± 2% in Melbourne, and 94 ± 4% in Munich.This level performance and its extension across different seasons demonstrate that the information gained from and the Beijing 1999 training data is transferable to other locations in similar climate zones.The extraction of urban area in any single scene is subject to uncertainties, as shown in Table 2; however, these uncertainties can be reduced substantially by merging the single scene results for each year.The merged annual classifications are more robust and stable than the single scene results, as also found in the temporal transfer experiment (Table 1).The results are qualitatively consistent among the three cities, with large values of OA and UA-U and slightly smaller values of UA-N.The latter implies larger errors in detecting non-urban areas, although UA-N still exceeds 90% for most scenes.
Remote Sens. 2018, 10, x FOR PEER REVIEW 12 of 24 performance and its extension across different seasons demonstrate that the information gained from and the Beijing 1999 training data is transferable to other locations in similar climate zones.The extraction of urban area in any single scene is subject to uncertainties, as shown in Table 2; however, these uncertainties can be reduced substantially by merging the single scene results for each year.The merged annual classifications are more robust and stable than the single scene results, as also found in the temporal transfer experiment (Table 1).The results are qualitatively consistent among the three cities, with large values of OA and UA-U and slightly smaller values of UA-N.The latter implies larger errors in detecting non-urban areas, although UA-N still exceeds 90% for most scenes.Figure 8 illustrates the change detection accuracy for each of the three cities.The approach to selecting validation data for Figure 8 is described in Section 2.2.3.The accuracies vary among the three cities, but are consistently higher than 75%.Generally, changes are detected more accurately for New York (average OA of 94%) than for the other two cities.Changes are detected least accurately for Munich but remain relatively large on average (average OA of 89.4%).Change detection accuracy is relatively stable in time for Melbourne, with annual accuracies from 87% to 95%, and an average OA equal of 93%.The results indicate that our method can accurately detect changes related to urbanization in metropolitan areas with temperate continental climates.Figure 8 illustrates the change detection accuracy for each of the three cities.The approach to selecting validation data for Figure 8 is described in Section 2.2.3.The accuracies vary among the three cities, but are consistently higher than 75%.Generally, changes are detected more accurately for New York (average OA of 94%) than for the other two cities.Changes are detected least accurately for Munich but remain relatively large on average (average OA of 89.4%).Change detection accuracy is relatively stable in time for Melbourne, with annual accuracies from 87% to 95%, and an average OA equal of 93%.The results indicate that our method can accurately detect changes related to urbanization in metropolitan areas with temperate continental climates.

Urban Expansion
The different urban development patterns are provided for each of the three cities, with the expansion of urban land during 1984-2016 illustrated in map form (Figure 9 for New York, Figure 10 for Melbourne, and Figure 11 for Munich).Sites that were built earlier have lower values (blue), while sites that were built recently have higher values (bright red).Two focus regions are selected for each city to better visualize spatiotemporal development patterns in the three cities, supplemented by TM images corresponding to the beginning and end of the Landsat record considered in this work.Central Park (Figure 9A,B) Levittown are selected for New York.Willerby (Figure 10A) and Hume (Figure 10B) are selected for Melbourne, and areas corresponding to the city center (Figure 11A) and airport (Figure 11B) are selected for Munich.Based on the results discussed above, our framework can effectively distinguish urban and non-urban (water, vegetation and bare-land) regions in most cases, but pixels containing both impervious surfaces and bare-land remain a challenge.Although Levittown (Figure 9B) experienced few changes during 1984-2016 (Section 4.2), our method indicates widely-scattered recent urbanization throughout the region (red pixels).We use data from Google Earth to analyze the source of this phenomenon.The availability of Google Earth data at 0.75 m spatial resolution enables detailed inspection of land-cover changes within Landsat pixels over long periods.The Levittown region contains many independent and sparse houses.Although impervious surfaces occupied less than 50% of many pixels in earlier years, recent re-paving of roads results in the detection of newly-built impervious land cover.Temporal changes are more easily extracted for pure impervious surfaces, such as Munich airport (Figure 11B).
Our results highlight important differences in spatiotemporal development patterns among these four cities.We examine these patterns further using higher resolution images from Google Earth in Section 4.2.Here, we note for context that New York added only a small amount of new

Urban Expansion
The different urban development patterns are provided for each of the three cities, with the expansion of urban land during 1984-2016 illustrated in map form (Figure 9 for New York, Figure 10 for Melbourne, and Figure 11 for Munich).Sites that were built earlier have lower values (blue), while sites that were built recently have higher values (bright red).Two focus regions are selected for each city to better visualize spatiotemporal development patterns in the three cities, supplemented by TM images corresponding to the beginning and end of the Landsat record considered in this work.Central Park (Figure 9A,B) Levittown are selected for New York.Willerby (Figure 10A) and Hume (Figure 10B) are selected for Melbourne, and areas corresponding to the city center (Figure 11A) and airport (Figure 11B) are selected for Munich.Based on the results discussed above, our framework can effectively distinguish urban and non-urban (water, vegetation and bare-land) regions in most cases, but pixels containing both impervious surfaces and bare-land remain a challenge.Although Levittown (Figure 9B) experienced few changes during 1984-2016 (Section 4.2), our method indicates widely-scattered recent urbanization throughout the region (red pixels).We use data from Google Earth to analyze the source of this phenomenon.The availability of Google Earth data at 0.75 m spatial resolution enables detailed inspection of land-cover changes within Landsat pixels over long periods.
The Levittown region contains many independent and sparse houses.Although impervious surfaces occupied less than 50% of many pixels in earlier years, recent re-paving of roads results in the detection of newly-built impervious land cover.Temporal changes are more easily extracted for pure impervious surfaces, such as Munich airport (Figure 11B).features over Beijing in 1999.All validations are conducted using the labeled validation data for the temporal transfer and spatial transfer experiments (see Section 2.2.3 for details).Table 3 compares classification accuracies for these methods according to overall accuracy (OA).Our proposed-framework consistently provides comparable or better results than the other three methods in all temporal transfer and spatial transfer experiments.The basic RNN with LSTM activation also performed well, with OA values approaching those for the deep-learning approach.Classification accuracies based on SVM and RF are similar to results reported by [51,52].The results of this comparison indicate the promising potential of our proposed framework for detecting urban regions in remote sensing datasets with limited training data.Together, the information listed in Tables 1-3, also establish the potential of this method for improving transferability to new time periods and locations in similar climate zones.For data applications such as this, the computational of training and prediction is also an important consideration.Table 3 also show the run times (min) on temporal transfer experiments.Among the evaluated methods SVM requires the most processing time, while RF requires the least.Although the proposed method requires slightly more than twice the processing time required by RF, this increased computational cost is balanced by substantial improvements in classification accuracy.The enhanced accuracy achieved using this framework could be particularly valuable for multi-season mapping applications and general purpose classifications at national and global level.

Applicability of Spectral Information from Landsat
In this paper, we discuss a method for applying Landsat spectral information to map urban areas in different cities and across different seasons.Here, we exploit all 777,090 training samples collected in Beijing 1999 (see Section 2.2.2 for details) and all selected 320,000 validation samples for the temporal transfer and spatial transfer experiments (in Section 2.2.3 for details) to analyze the spectral feasibility of different cities in different seasons.The labeled training images of Beijing in 1999 were selected from 14 scenes.Defining seasons following the criteria used by [21], these 14 scenes include one acquired in spring, six acquired in summer, three acquired in autumn, and four acquired in winter.Then, we can assess the typical distinctions between spectral signatures for urban and non-urban areas.We also use all 320,000 validation samples analyzed cities at similar times of year.The analyzed images are from 14 September 2000 for Beijing, 24 August 1999 for New York, 4 September 2003 for Melbourne, and 18 August 1998 for Munich.Based on these labeled samples, we plot spectral curves with average and standard deviation (Std) value of spectral vectors of urban and non-urban across different seasons and cities for further analysis.
Figure 12 shows original Landsat spectral curves for urban and non-urban classes in the four analyzed urban areas.The spectral curves for urban areas in Beijing (Figure 12a) undergo evident variations by season, with a particularly large bias between winter and spring.Despite these differences, the qualitative shapes of the urban spectral curves for Beijing are consistent across seasons, suggesting a potential for temporal transferability in urban extraction methods that rely on these spectra.Simultaneous examination of non-urban and urban spectral curves for Beijing (Figure 12b) reveals substantial overlap between urban and non-urban spectral signatures.This overlap threatens the generalizability of urban mapping frameworks based on these spectra.Similar, illustrations are shown for spectral curves over different urban areas at similar times of year (Figure 12c,d).Although spectral variations for non-urban areas still overlap greatly with those for urban areas (Figure 12d), the similarity of the urban spectral curves for urban areas in different cities (Figure 12c) highlights the potential for the spatial transfer of information gained from training data.The experiments described in Section 3 further demonstrate the promise of our framework for detecting urban regions from massive remote sensing datasets with limited training data, with a more robust performance and improved transferability relative to other common methods (Section 3.3).Deep feature representations help to increase the information gain from the original signatures for application to mapping urban areas.We use inter-class and intra-class distances to evaluate the effectiveness of the deep features.These two metrics quantify the degree of similarity within a given class and the ability to distinguish between different classes respectively.Here, all labeled training samples (777,090) of Beijing in 1999 and validation samples (320,000) in four cities are exploited for calculating these distances.The deep features were the output before soft layer of our framework with 64 dimensions.The spectral features are the reflectivity values from Landsat directly.Both intra-class distance and inter-class distance are calculated as J-M distance, defined following [53] as: where is the average vector, is the covariance matrix, and B is the Bhattacharyya between of two classes.Larger inter-class distances and smaller intra-class distances indicate an improved capacity for accurate classification.
Table 4 lists calculated inter-class and intra-class distances calculated for the original spectral features and the deep features obtained in our framework.Relative to the original spectral features, the deep features produce a smaller intra-class distance for both urban and non-urban areas, as well as larger inter-class distances between urban and non-urban classes.We infer from these results We use inter-class and intra-class distances to evaluate the effectiveness of the deep features.These two metrics quantify the degree of similarity within a given class and the ability to distinguish between different classes respectively.Here, all labeled training samples (777,090) of Beijing in 1999 and validation samples (320,000) in four cities are exploited for calculating these distances.The deep features were the output before soft layer of our framework with 64 dimensions.The spectral features are the reflectivity values from Landsat directly.Both intra-class distance and inter-class distance are calculated as J-M distance, defined following [53] as: where m i is the average c i is the covariance matrix, and B ij is the Bhattacharyya between of two classes.Larger inter-class distances and smaller intra-class distances indicate an improved capacity for accurate classification.Table 4 lists calculated inter-class and intra-class distances calculated for the original spectral features and the deep features obtained in our framework.Relative to the original spectral features, the deep features produce a smaller intra-class distance for both urban and non-urban areas, as well as larger inter-class distances between urban and non-urban classes.We infer from these results that the deep features are a valid and efficient means of increasing the information content of Landsat spectra for urban mapping.The results of our experiments and transfer analyses consistently indicate that the adverse effects of spectral variations can be greatly reduced by using the learned deep features in place of the original spectra.Our proposed urban detection framework relies on the larger inter-class distances and smaller intra-class summarized in Table 4 to generate an accurate classification with spatiotemporal transferability.

Variations in Urban Expansion Patterns among Cities
The primary motivation behind using deep learning to map urban areas across different cities and seasons is the need to understand how spatiotemporal patterns of urbanization differ among cities from the perspective of a single uniform classification framework.Our results highlight important differences in the characteristic patterns of urban change among different cities.Here, we summarize these differences as revealed by our analysis supplemented by higher-resolution imagery from Google Earth.Beijing has developed rapidly over the past 30 years, with urbanization often accomplished within a short time (e.g., 0.5-2 years).Along with economic growth, much of the urban expansion in Beijing consists of newly built human settlements (buildings) and public transport service (roads and airports).Timely observations are necessary to support planning design that can better cope with the rapid expansion of urban land covers in Beijing.In contrast to Beijing, little urban expansion has occurred in New York during the last 30 years.However, despite this lack of expansion, reconstruction, conservation and renovation have all been actively pursued, with fragmentary and independent impervious surfaces often appearing in suburban regions.As in Beijing, Melbourne underwent considerable urban expansion during 1984-2016 primarily due to new human settlements and roads.Although the expansion rate was lower than that for Beijing, the spatiotemporal urban expansion pattern for Melbourne demonstrates the extent to which new impervious surfaces were constructed to satisfy accommodate rapid population growth between 1984 (2.9 million) and 2015 (4 million) [54].Munich's urban expansion followed a similar pattern to New York's, with few new human settlements constructed in the past 30 years.Urbanization in Munich has mainly centered on the expansion of public transit facilities, including the Munich airport and improvements in roads and rail facilities.These four different cities have distinct spatiotemporal patterns of development that accentuate the need for urban maps at annual scales, as the use of coarser time intervals would obscure many of the details of this development (Li et al., [21].Timely observations that make optimal use of the available information are necessary to study urban change in detail.

of Transfer Learning for Urban Mapping
Although our proposed method performs well, several issues remain to be resolved in future work.First, mixed pixels can be difficult to classify accurately.For large-scale urban extraction, pixels that contain both impervious surfaces and bare land are difficult to classify.The challenges associated with distinguishing bare land from impervious surfaces are well known [21], and are a major source of uncertainty in remote sensing-based urban mapping frameworks.Particularly in rural areas, the bare-land cover occupy more than 50% in the mixed pixel, then the impervious surface in this pixel is hard to detect.Pixel unmixing algorithms and higher-resolution images may be necessary to overcome this issue.Second, urban detection methods based on transfer learning should only be applied in controlled scenarios.These methods are most effective when the target and source task contain potential similarity.In this paper, although four focus areas are located on four different continents, all four cities are large scale intensive urban environments with temperate continental climates.This choice is motivated by the selection of Beijing as the sole source of training.The required degree of similarity remains unclear, and particularly whether the transferability demonstrated in this work extends to cities in different climate zones.These topics will be investigated in future work.Finally, we reiterate that the merged annual urban map is typically more accurate than the single scene classification map.Combining multiple images acquired within a year (or indeed within any extended period), helps to overcome key uncertainties that may adversely affect the single-scene results.Google Earth Engine [55] is in use across a wide variety of disciplines, which is helpful for urban mapping [56].

Conclusions
In this paper, we have proposed a framework for classifying urban and non-urban areas based on deep learning with transferability.A manually-labeled training dataset based on observations of Beijing in 1999 was sampled to train the framework.Knowledge gained from this training step was then transferred to further extractions of urban land cover in all available Landsat images collected over Beijing, New York, Melbourne, and Munich during 1984-2016.Our results demonstrate the viability of this approach for detecting urban areas in different cities and at different times with a single classification strategy.The classification accurately captures historical spatiotemporal development patterns of impervious surfaces in each city.The temporal transfer and spatial transfer experiments demonstrate the generalizability of the proposed framework for urban mapping.For cities with similar climate conditions, the transferability of learned deep features can reduce the impacts of variations in radiometric values in constructing single-scene maps, and enhance inter-class distance between urban and non-urban pixels to better facilitate extraction of urban areas from Landsat imagery.Our experiments show a promising level of accuracy in both single-scene and merged annual classifications over all four cities, with an average OA exceeding 89%.The final experimental results also show that our proposed classification strategy can be extended to different sensors.Our results based on images collected by Landsat 4/5, Landsat 7, and Landsat 8 indicate that information from six common spectral bands is sufficient for detecting urban changes despite differences in sensors.Similar combinations of deep learning and transfer learning methods should be considered for other big data challenges in remote sensing and related fields.Maps of urban areas and their spatiotemporal evolution produced using this method will be useful for ecologists, hydrologists, and social scientists studying the built environment, its ecological impacts, and public policy.

Figure 1
Figure 1 shows the four cities: Beijing-123/32 (path/row), New York-130/322, Melbourne-93/86, and Munich-193/26.Although all four cities are characterized by temperate climate condition, a variety of external factors may reduce the mutual transferability of deep features.These external factors include atmospheric conditions, illumination in different seasons, sensor calibration, and the presence and distribution of complex terrain.

Figure 1
Figure 1 shows the four cities: Beijing-123/32 (path/row), New York-130/322, Melbourne-93/86, and Munich-193/26.Although all four cities are characterized by temperate climate condition, a variety of external factors may reduce the mutual transferability of deep features.These external factors include atmospheric conditions, illumination in different seasons, sensor calibration, and the presence and distribution of complex terrain.

Figure 3
Figure 3 summarizes the proposed framework for extracting annual urban information from Landsat imagery.Inputs to the network include pixel vectors from training data and target data.Outputs are the final predicted land cover labels.Application of this framework to all pixels in all images yields a classification map.Two complementary strategies are employed to construct this framework.The first is an offline training phase, in which all labeled training data are fed into the network to construct an initial RNN-based deep learning network (as shown by the solid black arrows in Figure3).The second is an online optimization phase, in which the target samples are used to fine-tune the initial network and increase its specialization be for the target images (as shown by the dotted black arrow).The fine-tuned network is then used to construct the final classification map for the target data.Through these steps, annual records of the distribution of impervious surface can be extracted and merged for further analysis.

Figure 3
Figure 3 summarizes the proposed framework for extracting annual urban information from Landsat imagery.Inputs to the network include pixel vectors from training data and target data.Outputs are the final predicted land cover labels.Application of this framework to all pixels in all images yields a classification map.Two complementary strategies are employed to construct this framework.The first is an offline training phase, in which all labeled training data are fed into the network to construct an initial RNN-based deep learning network (as shown by the solid black arrows in Figure3).The second is an online optimization phase, in which the target samples are used to fine-tune the initial network and increase its specialization be for the target images (as shown by the dotted black arrow).The fine-tuned network is then used to construct the final classification map for the target data.Through these steps, annual records of the distribution of impervious surface can be extracted and merged for further analysis.

Figure 3 .
Figure 3. Flowchart of the proposed urban extraction framework.Solid black arrows represent training step, in which the model is initialized by feeding in labeled source data during offline training.Dotted black arrows indicate online optimization steps in which the initial model is fine-tuned, rendering it more specialized for new target data.Blue arrows represent input and output processes.

Figure 3 .
Figure 3. Flowchart of the proposed urban extraction framework.Solid black arrows represent training step, in which the model is initialized by feeding in labeled source data during offline training.Dotted black arrows indicate online optimization steps in which the initial model is fine-tuned, rendering it more specialized for new target data.Blue arrows represent input and output processes.
Figure 4 also shows evident urbanization in both urban and suburban regions of Beijing between the mid-1980s and recent years.Impervious surfaces with distinguishable shapes, such as airport runways and roads are detected at pixel level by the method.Remote Sens. 2018, 10, x FOR PEER REVIEW 8 of 24 activations and tanh activation are both default settings under the RNN framework.All weight matrices in our method are initialized from a uniform distribution covering the range [−0.1, 0.1].All weight matrices are updated as necessary during the learning process and fine-tuned during the online optimization process.
also shows evident urbanization in both urban and suburban regions of Beijing between the mid-1980s and recent years.Impervious surfaces with distinguishable shapes, such as airport runways and roads are detected at pixel level by the method.

Figure 4 .
Figure 4. Original classification maps for an urban region (Beijing city center) and several regions (Shunyi: suburban-1, Changping: suburban-2, and Langfang: suburban-3) on different dates.Note that the selected dates differ by region.In each panel, the color map shows the original Landsat data with RGB-5, 4, 3, and the binary map shows the extracted urban areas (white).In the latter set of maps, gray indicates non-urban areas and black indicates that data are unavailable due to cloud contamination or other reasons.

Figure 4 .
Figure 4. Original classification maps for an urban region (Beijing city center) and several regions (Shunyi: suburban-1, Changping: suburban-2, and Langfang: suburban-3) on different dates.Note that the selected dates differ by region.In each panel, the color map shows the original Landsat data with RGB-5, 4, 3, and the binary map shows the extracted urban areas (white).In the latter set of maps, gray indicates non-urban areas and black indicates that data are unavailable due to cloud contamination or other reasons.

Figure 5 .
Figure 5.Time series of overall accuracy for changes detected in and around Beijing during 1984-2016.The method proposed in this paper (dark dashed line) is compared against that employed by Li et al. [21] (grey dotted line).

Figure 5 .
Figure 5.Time series of overall accuracy for changes detected in and around Beijing during 1984-2016.The method proposed in this paper (dark dashed line) is compared against that employed by Li et al. [21] (grey dotted line).

Figure 6 .
Figure 6.Urban expansion in Beijing during 1984-2016.The map shows the current distribution of urban area (color shading) with sites that were built earlier taking lower values (blue) and sits that were built more recently taking higher values (red).Two magnified insets show the changes: (A) zoomed-in view of Langfang region, and (B) zoomed-in view of Beijing Capital airport region.The corresponding Thematic Mapper (TM) images of Langfang and Beijing Capital airport are also shown using RGB-5, 4, 3 representations of images collected on 19 July 1985 and 20 August 2014, respectively.

Figure 6 .
Figure 6.Urban expansion in Beijing during 1984-2016.The map shows the current distribution of urban area (color shading) with sites that were built earlier taking lower values (blue) and sits that were built more recently taking higher values (red).Two magnified insets show the changes: (A) zoomed-in view of Langfang region, and (B) zoomed-in view of Beijing Capital airport region.The corresponding Thematic Mapper (TM) images of Langfang and Beijing Capital airport are also shown using RGB-5, 4, 3 representations of images collected on 19 July 1985 and 20 August 2014, respectively.

Figure 7 .
Figure 7. Original classification maps for the urban and suburban regions of New York (left), Melbourne (center), and Munich (right) on different dates.Color maps show original Landsat data with RGB-5, 4, 3. Binary maps show extracted urban areas (white) and non-urban areas (gray), with black indicating cloud contamination or missing data.New York Suburban-1 and Suburban-2 correspond to Levittown and Stanford, respectively; and Melbourne Suburban-1 and Suburban-2 correspond to Geelong and Ballarat; and Munich-suburban-S1 and Suburban-2 correspond to Munich airport and West-Munich.

Figure 7 .
Figure 7. Original classification maps for the urban and suburban regions of New York (left), Melbourne (center), and Munich (right) on different dates.Color maps show original Landsat data with RGB-5, 4, 3. Binary maps show extracted urban areas (white) and non-urban areas (gray), with black indicating cloud contamination or missing data.New York Suburban-1 and Suburban-2 correspond to Levittown and Stanford, respectively; and Melbourne Suburban-1 and Suburban-2 correspond to Geelong and Ballarat; and Munich-suburban-S1 and Suburban-2 correspond to Munich airport and West-Munich.

Figure 8 .
Figure 8.Time series of change detection accuracy for New York, Melbourne, and Munich during 1984-2016.

Figure 8 .
Figure 8.Time series of change detection accuracy for New York, Melbourne, and Munich during 1984-2016.

24 Figure 9 .
Figure 9. Urban expansion during 1984-2016 in New York, with insets providing magnified views of (A) Central Park and (B) Levittown.TM images corresponding to the beginning and end of the analysis period are also shown.

Figure 10 .
Figure 10.Urban expansion during 1984-2016 in Melbourne, with insets providing magnified views of (A) Willerby and (B) Hume.TM images corresponding to the beginning and end of the analysis period are also shown.

Figure 9 . 24 Figure 9 .
Figure 9. Urban expansion during 1984-2016 in New York, with insets providing magnified views of (A) Central Park and (B) Levittown.TM images corresponding to the beginning and end of the analysis period are also shown.

Figure 10 .
Figure 10.Urban expansion during 1984-2016 in Melbourne, with insets providing magnified views of (A) Willerby and (B) Hume.TM images corresponding to the beginning and end of the analysis period are also shown.

Figure 10 .
Figure 10.Urban expansion during 1984-2016 in Melbourne, with insets providing magnified views of (A) Willerby and (B) Hume.TM images corresponding to the beginning and end of the analysis period are also shown.
Remote Sens. 2018, 10, x FOR PEER REVIEW 19 of 24 3.3).Deep feature representations help to increase the information gain from the original signatures for application to mapping urban areas.

Figure 12 .
Figure 12.Spectral curves for urban and non-urban areas in different seasons and study areas: (a) spectral curves for urban areas during different seasons in Beijing; (b) annual mean spectral curves for urban and non-urban areas in Beijing; (c) spectral curves for four urban areas in the four analyzed cities acquired at similar day of year; and (d) spectral curves for urban areas averaged over the four analyzed cities.

Figure 12 .
Figure 12.Spectral curves for urban and non-urban areas in different seasons and study areas: (a) spectral curves for urban areas during different seasons in Beijing; (b) annual mean spectral curves for urban and non-urban areas in Beijing; (c) spectral curves for four urban areas in the four analyzed cities acquired at similar day of year; and (d) spectral curves for urban areas averaged over the four analyzed cities.

Table 1 .
Classification performance in specific years.

Table 2 .
Classification performance for each study areas in specific years (see text for abbreviations).

Table 3 .
Classification results from alternative methods with overall accuracy and run-time over temporal transfer experiments.

Table 4 .
Inter-class and intra-class distances based on original spectral features and deep features.