Large-Scale, Multiple Level-of-Detail Change Detection from Remote Sensing Imagery Using Deep Visual Feature Clustering

: In the era of big data, where massive amounts of remotely sensed imagery can be obtained from various satellites accompanied by the rapid change in the surface of the Earth, new techniques for large-scale change detection are necessary to facilitate timely and effective human understanding of natural and human-made phenomena. In this research, we propose a chip-based change detection method that is enabled by using deep neural networks to extract visual features. These features are transformed into deep orthogonal visual features that are then clustered based on land cover characteristics. The resulting chip cluster memberships allow arbitrary level-of-detail change analysis that can also support irregular geospatial extent based agglomerations. The proposed methods naturally support cross-resolution temporal scenes without requiring normalization of the pixel resolution across scenes and without requiring pixel-level coregistration processes. This is achieved with conﬁgurable spatial locality comparisons between years, where the aperture of a unit of measure can be a single chip, a small neighborhood of chips, or a large irregular geospatial region. The performance of our proposed method has been validated using various quantitative and statistical metrics in addition to presenting the visual geo-maps and the percentage of the change. The results show that our proposed method efﬁciently detected the change from a large scale area.


Introduction
The remote sensing field has seen rapid development in recent years, which has resulted in a deluge of overhead high-resolution imagery from airborne and spaceborne remote sensing platforms. Although this huge data provides plentiful new sources, it far exceeds the capacity of humans to visually analyze and thus demands highly automatic methods. This data volume has also seen continual enhancements in temporal and spatial resolution of remote sensing information [1][2][3][4][5][6]. These circumstances necessitate the development of new human-machine teaming approaches for visual imagery analysis workflows, specifically techniques that leverage state-of-the-art computer vision and machine learning. This increasing amount of geospatial big data are utilized in a key task in the earth observation, change detection (CD), which is one of the most active topics in the remote sensing area due to the persistent occurrence of catastrophes such as hurricanes, earthquakes and floods [6][7][8][9], in addition to the interactivity of human with natural phenomena [10]. The objective of CD is finding the change in the land cover utilizing co-located multitemporal remote sensed images [11]. It is a key technology that has been investigated in many fields such as disaster management, deforestation, and urbanization [7,12].
More abundant details of geographical information can be acquired from the high spatial resolution (HSR) images that can be obtained from various sources for example IKONOS, QuickBird, and WorldView [13]. Thus, high resolution images are in demand or imperative for various remote sensing fields [14]. However, the improvements of spatial resolution conveys challenges to automated change detection processes [15]. Generally, change detection methods fall under three categories, traditional pixel-based change detection (PBCD), object-based change detection (OBCD), and hybrid change detection (HCD) [15].
Pixel based methods [16][17][18] compare the multitemporal images pixel-by-pixel. There are many traditional pixel-based algorithms, such as image differencing, image rationing, principal component analysis (PCA), and change vector analysis (CVA). Using isolated pixels as the basic units in the pixel-based methods results in a over-sensitive change detection [13]. Additionally, they can be affected by noise, in particular, when handling the rich information content of high resolution data [19]. For the object-based methods [20][21][22][23][24], first a segmentation procedure is applied to extract the features of ground objects. Next, using an object-based comparison process the change map is obtained. The final result of OBCD is usually sensitive to the segmentation process [13]. Existing HCD techniques have proven suboptimal as well [25,26], in so much as the classical HCD methods belong to intuitive decision-level fusion schemes of PBCD and OBCD [13].
In addition to the above three categories the change detection can be gathered into two categories from the view of the ultimate aim, binary CD methods and multitype CD methods. While binary CD emphases highlighting the change, multitype CD objectives is to differentiates various types of changes [27]. In this paper, we propose a new approach for change detection, which is a chip-based approach facilitating multitype CD. This chipbased approach enables analysis at a variety of geospatail levels-of-detail; from chip-level change analysis up through larger spatial region analysis.
Feature learning is an efficient approach to represent imagery data [28]. The literature shows that the deep learning based feature extraction demonstrates superior generalization properties that allow us to take advantage of the abilities of the transfer learning. In this scenario, the knowledge that is gained from a deep network that is pretrained on one domain can be transferred to other domains. Those features that are extracted from the deep networks, a.k.a. deep visual features, are proven to have more robust domain invariance ability in the remote sensing field than the features extracted from shallow networks [29][30][31][32][33]. Additionally, deep features have the advantage of capturing the spatial context information efficiently [29][30][31][32][33].
In the architecture of CNNs, convolutional layers and pooling layers are stacked interchangeably for feature extraction. The CNN-based deep model structure makes it significantly appropriate for feature learning of images, which is capable of overcoming the deformation of spatial features caused by rotation and translation [34]. Several researchers have utilized convolutional neural networks (CNN) to extract features from satellite data for a variety of use cases. Li et al. [35] proposed a deep CNN-based method for enhancing 30-m resolution land cover mapping utilizing multi-source remote sensing image. The CNN-based classifier is employed to extract high resolution features from Google Earth imagery. Zhong et al. [36] built deep neural networks with numerous architectures to classify summer crops utilizing EVI time series.
Zhang et al. [37] combined the local information extracted by histogram of oriented gradients (HOG) along with the CNN features representing contextual information. They utilized a CNN model trained on ImageNet (ILSVRC2012) as the contextual feature extractor. In [38], the authors trained a CNN to label topographic features, such as roads, buildings, etc. In [39], features and feature combinations extracted from many deep convolutional neural network (DCNN) architectures were evaluated in an unsupervised maner, then leveraged for classification of the UC Merced benchmark dataset. The authors found that both ResNet and VGG have design parameters that work well with spatial data.
In real-world application, data usually contain complex many-to-many relationships [40]. To detect the multi-type change, the data should be clustered first. Various image clustering algorithms have been utilized as an unsupervised pattern recognition technique for different tasks involving HR-RSI. Fuzzy c-means (FCM) is one of the soft computing techniques in which each data point has a degree of membership in each cluster [41]. That makes it a well suited approach for HR-RSI because image chips may have complex composition, such as trees, grass, roadways, buildings, water bodies, and more.
In the age of Big Data, it is no longer necessary to constrain broad area analysis to coarse and low-resolution geospatial data. One of our key contributions is to highlight how the details and features in the visual present within high-resolution imagery can be utilized to tease out the differences between various clusters. In lower resolution imagery, small individualized details are barely discernible as visual features. Importantly, automated methods that can detect these subtle changes can capture importing aspects related to change, such as subtle tree removal operations that are not clear-cut lumber harvests.
The main contributions of this paper can be summarized as the following: • A chip-based approach has been proposed for multi-temporal change detection. The beauty of the proposed approach is that, unlike state of the art methods, it supports detecting the change in a large scale area with massive amount of HR remote sensed imagery data, while also facilitating higher resolution level-of-detail analysis. In other words, our proposed method gives an insight into the distribution of each cluster within the study area as a single unit, as sub-areas represented by the census blocks, and at the local chip level. Additionally, it overcomes the challenges of unaligned and multi-resolution corresponding imagery. • Generation of orthogonal deep visual features utilizing a DCNN along with the transfer learning techniques to extract the deep features for change detection analysis, considering the deep features represent the remote sensing data better than the traditional features extraction methods. Additionally, we project the DCNN extracted features into an orthogonal space preserving most of the information for enhanced unsupervised machine learning utilization. This solves a key challenge in the real world applications, as sufficient training data is not always available. • Soft clustering based on fuzzy c-means algorithm has been utilized to aggregate small image patches of similar visual composition. This is because the surface of the Earth is not composed of crisp distinctions of land cover in the high-resolution domain. Instead, any local image patch could be a mixture of various land cover types. Fuzzy c-means can capture this characteristic efficiently by the means of the soft partition. • Three analysis methodologies are introduced: cluster analysis, geospatial analysis, and metric change analysis. To successfully analyze the varied resolution images, that may be misaligned, change in the sub datasets is computed at the chip level using a neighborhood approach.
The remainder of this paper is organized as follows. Section 2 describes the datasets and the preprocessing stages. Section 3 presents our approach for extracting visual features from deep neural models that feed into a dimensionality reduction and soft partitioning. Section 4 details our analysis methodologies of the proposed approach. Section 5 outlines the experimental setup and presents the results. Section 6 interprets and describes our findings. Finally, Section 7 concludes the research and highlights future directions.

Datasets and Preprocessing
In this section, we describe two datasets and our preprocessing steps. The general workflow of this research is shown in Figure 1, and Table 1 lists the symbols used in this research.

Datasets Description
In this research, two main datasets were utilized. First, a publicly available benchmark dataset is used to train the deep neural feature extractor. The second dataset is our study area encompassing Columbia, MI, USA and the surrounding areas, including three different years.

RSI-CB256 Benchmark Dataset
The Large Scale Remote Sensing Image Classification Benchmark via Crowdsource Data (RSI-CB) [42] is used in this research to train the deep neural feature extractor. Specifically, the 256 × 256 pixel sub dataset (RSI-CB256). According to the authors, it is 0.22-3 m GSD, which makes it higher spatial resolution than many other existing datasets. Higher spatial resolution means higher information details.
The authors collected points of interest (POI) from Open Street Map (OSM), which directed the imagery scraping from Google Earth and Bing Maps. In total, there are 24,000 images across 6 categories with 35 subclasses including town, stream, storage_room, sparse_forest, snow_mountain, shrubwood, sea, sapling, sandbeach, river_protection_forest, river, residents, pipeline, parkinglot, mountain, marina, mangrove, lakeshore, hirst, highway,green_farmland, forest, dry_farm, desert, dam, crossroadsbridge, container, coastline, city_building, bridge, bare_land, city_avenue, artificial_grassland, airport_runway, airplane. The major countries where the POI were chosen from include USA, China, and UK [42]. The RSI-CB dataset is an imbalanced dataset; some classes consists of more than 1300 image while other less than 200 images. Thus, data augmentation was a vital for our research to get a better performance.

CoMo Dataset
The second dataset consists of three subsets of raster data. Each subset corresponding to a different year. Table 2 summarizes the properties of this dataset. The datasets have been collected over the following dates 5 April 2011, 10 October 2015, and 4 May 2017.
Of note, the tiles at the edges of the scanned area have different spatial extents than the rest of the tiles. This is true for the three sub-datasets. Additionally, the 2011 dataset is neither aligned nor has the same resolution as the other two datasets.
For the sake of this research, each tile is divided into 256 × 256 pixel chips to be consistent with the images of RSI-CB256 dataset. Each regular tile yields 4096 chips, which is a significant amount of data; more than 108 billion pixels in 2011 dataset and more than 24 billion pixels in each of 2015 and 2017 datasets. Handling such data at pixel-level is extremely computationally expensive, without a commensurate value in the resulting pixel-wise output. In this way, the chip-level approach, in the combination with DCNN features extraction, represents a suitably efficient data analysis technique that balances resolution of analysis (level-of-detail) with computational costs.

Region Imagery Preprocessing
To detect the change in the area of the interest, the three temporal subsets of the CoMo data should cover the same area. However, as can be perceived from the range of the geographic coordinates, the year 2011 covers different area than 2015 and 2017. This is a common problem with large-scale remote sensing datasets. Specifically, datasets that are collected over various years. The change analysis is limited to only the overlapping sub-area among the analyzed datasets. Using the accompanying geospatial polygons from the data provider, we are able to intersect the overlapping area and trim each year of the CoMo dataset to an appropriate set of scene tiles. This intersected geometry represents the final study area.
For the purpose of this research, each tile is partitioned into several chips of equal size, chosen to be 256 × 256 pixels for consistency with the RSI-CB256. Thus, each regular tile yields about 4096 chips. It is worth noting that the tiles at the edge of the scanned area are not always the same size of the regular tiles based on the source imagery capture; and therefore may yield a lower number of chips.
In this research we are interested in the center location of each chip. Thus, we defined Equation (1a,b) to obtain the center of each chip from its upper left corner. This equation assumes the scene, tiles, and chips are north-up.
As mentioned, the size of the tiles at the edges are not the same size of the regular tiles, therefore, some chips at the very edges have partial information. These chips are not useful and could negatively affect the subsequent processing and are therefore filtered out.
Algorithm 1 shows the pseudo code that checks if the chip is NoData chip or not. If it is NoData chip we will exclude it from further processing. First, the size of the chip is compared against a predefined threshold. If it is less than the threshold, it could be a useless chip that needs to be excluded. To make sure this chip does not contain useful information, it is compared at the pixel level to check whether most of the pixels in the chips are NoData or not. Comparing the size of the file first helps with expediting the process as comparing all the chips in such big data is computationally expensive. Table 3 lists the details of the intersected area after the preprocessing. The 2011 datasets is slightly different than the other two datasets even after the processing because we are targeting the chip level and these data are not aligned with other two datasets. However, this difference is a result of spatial intersection of the sub-scene tiles with respect to the geospatial extent of the study area, where the tiles of the 2011 scene extend from the polygon different from the 2015 and 2017 scenes. In the case of all three subsets of the CoMo data, the chips are further refined to both remove chips out of the extent as well as remove chips with significant NoData values. Furthermore, notice that the number of the chips in the intersected 2011 dataset is higher than the other two datasets due to the higher pixel resolution of this dataset. These characteristics highlight two particular challenges of the CoMo datasets-and many other large-scale multi-temporal datasets-that our methods detailed in Section 4 sufficiently handle. First, the data is of varied resolution and it is not necessary to resample the data to uniform spatial pixel resolution. Secondly, our methods will accommodate data that is not perfectly aligned, which is often a necessary condition of PBCD and even many OBCD. FDs ← Ds 20 end

Deep Visual Feature Extraction and Clustering
This section describes the deep feature extraction, transformation to orthogonal visual features for reduced dimension, and clustering. Algorithm 2 details the process, along with the subsequent Sections 3.2 and 3.3.

Deep Neural Feature Extraction Training
As mentioned previously, deep learning has proven its superiority in extracting the features over hand-crafted methods. In this research, the ResNet50 [43] DCNN model is trained with the RSI-CB256 dataset after transfer learning from the ImageNet dataset [44]. The most remarkable characteristic of ResNet is depth [45].
The RSI-CB256 dataset is an imbalanced dataset; some classes consist of more than 1300 images while others less than 200 images. The data are divided into three subsets: training, validation, and testing leads to low performance. For this reason, preprocessing is a mandatory requirement. First, the data have been shuffled to avert any bias. Next, to avoid overfitting, the oversampling has been done after dividing the data into 60% training, 20% validation, and 20% testing. This will guarantee there is no duplication in the data across the three subsets. The image rotation is used in oversampling the data. The images have been rotated in various angles including the following ranges [0-7, 83-97, 173-187, 263-277]. These ranges have been chosen to avoid losing the information from the image in case it rotated with large angles. In each subset, the classes have been oversampled to be equal to the largest class in the subset. Furthermore, the training dataset has been horizontally flipped, zoomed, cropped to avoid overfitting.
During training with RSI-CB256, the last convolutional layer is fine-tuned along with the training of the new classification phase to ensure the resulting features are adapted from the ground photo to the overhead imagery domain. The resulting DCNN generates 2048 deep visual features that are then assembled into 800 higher-level features (i.e., hidden layer) within the multilayer perceptron (MLP) classification phase before the final softmax node classifier output (see [46] for further details).
To ensure the proper training of the DCNN, the RSI-CB256 was organized into a 60/20/20 percent training, validation, and blind testing split. After training, the validation and blind test sets had classification performance of 98.72% and 99.31%, respectively.

Deep Orthogonal Visual Features
With 800 deep visual features, there is the potential for much redundant information, and therefore we utilize PCA to reduce the feature space (and the associated memory and computation requirements). We leverage PCA to minimize information loss during feature reduction (Algorithm 2, lines 6-7 and 24-25). PCA reduces numerical redundancy by projecting the original feature vector to a lower subspace, minimizing information loss [47]. Additionally, in the context of clustering driven by high dimensional dissimilarity metrics, reducing the effects related to the curse of dimensionality can improve the derived clusters. In this research, we reduce the deep visual features to 108, 110, and 102 dimensions for years 2011, 2015, and 2017, respectively; in each case preserving 99% of the data variances. These deep orthogonal visual features are then suitable for cluster analysis and other unsupervised machine learning methods that leverage metric spaces. We randomly sample the study area (30%) to derive the principle components and the reprojection eigenvectors, which is then applied to the entire study area.

Fuzzy Land Cover Clustering
As with any large swath of remote sensing imagery, our study area is inherently non-discrete. This is especially true of high-resolution imagery, where small objects (visual features) are discernible and the composition of a collection of visual features within a local spatial extent drive the assessment of that localized area. These visual features may include trees, areas of grass, buildings (or other impervious surfaces), water, and more. In this research, the computational locality is at the chip-level, as defined in Section 2.2. To facilitate large-scale analysis when the potential variability of the chip content is so diverse, we leverage soft-computing techniques. Specifically, we employ the fuzzy c-means (FCM) method, Ref. [48], to generate a soft partition of data into groups of similar visual content.
Within the FCM model, each data point (chip) is measured for similarity to cluster centroids to generate a cluster membership in [0.0, 1.0] [49,50]. Each chip's deep orthogonal visual features are measured against the center of each cluster using Euclidean distance to form a partition matrix, u. The FCM algorithm oscillates between computing weighted mean positions (center of mass) for each cluster and updating cluster memberships (the fuzzy partition matrix). Similar to the development of the deep orthogonal visual features, we leverage a 30% sample to learn the cluster centers in the deep orthogonal visual feature space. The final cluster centers are then used to compute a final u. The FCM algorithm is Algorithm 2, line 8 that trains the FuzzModel. Algorithm 2 line 26 uses this model to label all chips into the learned clusters.
The fuzzy partition matrix is utilized to create soft labels of the study area chips. Specifically, each chip's strongest cluster membership is extracted as both a label and a membership degree; thereby making the information better aligned for geospatial visualization and human interpretation.
The fuzzy partition matrix u is reduced to firm labels with membership via Equation (2).
(c, u c,j ), c = argmax where u ij is the individual membership of chip j in cluster i, | C | is the number of clusters, and N is the number of data points. The resulting tuple vector encodes the cluster assignment for each chip while preserving its membership information.
The unsupervised machine learning methods, such as fuzzy clustering, are usually very data dependent. Due to variability encountered in seasons, atmosphere, and sun/sensor geometry, we cannot expect unsupervised methods to generate data labels (i.e., cluster labels) that are coherent across scenes. For this reason, we aggregate the clusters into key semantic classes that dominate our study area: dense tree, grass field, mixed grass and tree, and mostly grass with tree or feature. This semantic grouping allows the mapping of any scene into numerically comparable chips. The grouping can be determined from any clustering with | C | greater than the number of semantic categories. After mapping each group of clusters to its target semantic category, the partition probabilities of each group members have been added together to produce the final partition probability of the target semantic category.

Analysis Methodologies
In this section, we present our methodology for cluster, geospatial, and change metric analysis.

Cluster Analysis
To explore the study area and detect the changes over the time period from one year to another, the clustering results are analyzed in this section. The percentage of each cluster in the study area has been calculated using Equation (3).
while the percentage of the change per cluster in CoMo and the surrounding areas has been calculated according to the Equation (4) Since the study area covers a broad region, we can examine it at a compositional level-of-detail, such as census blocks. Thus, the cluster C i of the whole study area is a union of all the clusters of type i in each block, which can be defined as follows: In essence, we can agglomerate any arbitrary sub-region at a broad range of spatial extents (number of chips) to facilitate analysis at numerous levels-of-detail. In some situations, it is desirable to investigate a small, localized area such as a census block for monitoring scenarios. Therefore, each block can be analyzed individually to understand the change per block, potentially combining this with any number of other geospatial layers that are often aggregated to the same level-of-detail. The following formula (Equation (6)) is used to calculate the percentage of each cluster per block.
The percentage of the change in each cluster per block in a specified period of time has been calculated utilizing Equation (7). The output of this equation could be positive or negative. The positive output indicates that the cluster has been increased within that block while the negative output indicates that the cluster has been decreased within that block.

Geospatial Analysis
In addition to the cluster analysis, the geospatial analysis is yet another vital analysis, especially for Geographic information system (GIS) applications. It is very important to present the clusters on the geographic map, which helps to explore the change by linking the clusters and the census blocks on the maps of various years.
Each census block is compromised of numerous chips, where each chip could belong to a different cluster. As the aim is to represent each block in the geographic map using one color that represents a dominant cluster, we must compute the dominant cluster within the block. For this purpose, we have developed a method based on the fuzzy partition partition (Equation (2)). Generally, by representing the block using one cluster there will be some loss in the information.
However, the purpose here of the geographic maps is to provide an abstract view of the clusters and link them to their geospatial extents on the surface of the Earth. In fact, representing a summary of clusters facilitates capturing the nature of the Earth since we are dealing with large-scale data and hundred of thousands of datapoints (localized chip spatial extents). In contrast, is it very difficult for humans to consume the information or recognize the clusters and the nature of the area of interest when all the datapoints are rendered on the map without summarization.
The proposed method, Equation (8), utilizes the membership degree of the fuzzy clustering (Equation (2)). For each cluster within a block, the average of the membership degree for all chips is calculated, then the cluster with the maximum average of membership degrees will be chosen to represent that block.

Change Metric Analysis
Another important analysis is measuring the magnitude of the change in a time period based on the membership degree of a chip. For this purpose, we have utilized two methods. The first method uses norm2 and the second uses Jensen-Shannon divergence (JSD).
The data of 2011 is not aligned geospatially with the data of 2015 and 2017. Therefore, measuring the change between the data of the years 2011 and 2015-2017 is not a straightforward process. Algorithm 3 provides an efficient geospatial neighborhood search technique, such that analysis can be performed using very small, localized sets of chips.
The neighborhood algorithm is utilized to find the neighbors of each chip in the year we want to compare with. First, the radius is determined, which is the spatial aperture of analysis, such that it determines the area to search for the neighbors of each chip. Second, the approximate nearest neighbors are found from the spatially indexed chips. Third, the Harversine formula (see Equation (9a,b)) is used to find the final neighbors within the specified radius on a sphere (surface of Earth) for each chip based on geospatial coordinates for a given maximum distance. Algorithm 3 shows the pseudocode for the neighborhood method utilized in this research. The Haversine formula (Equatiosn (9a,b)) is a simplified version of the arcsine form of the cosine law for a sphere utilized in the calculation of great circle distance. Great circle distance is used to measure the shortest distance between two points on the surface of the Earth. Though, as a result of the arcosine form of the cosine law on the sphere, the great circle distance calculation formula has a large error at a small distance. The substituted form is Haversine formulas, which produces a more accurate solution on the calculation of the smaller distances [51].
After finding the neighbors of each chip, norm2 (L 2 ) is calculated to define the amount of the change during a specified period of time at the chip-level (Equation (10)). The average of the neighbors' membership have been calculated first. Then, norm2 of the difference between the two vectors of the average fuzzy partition have been measured according to the Equation (10).
Another method that has been used to calculate the change metric is Jensen-Shannon divergence (JSD). The JSD (a.k.a. information radius (IRad) or total divergence to the average) is a well-known method for calculating the dissimilarity between two probability distributions. Note, that by definition of the FCM algorithm, the cluster memberships for a chip will sum to 1.0; and can be treated analogous to a probability. JSD can be considered as a symmetric and bounded version of Kullback-Leibler divergence [52,53]. Equation (11a-c) shows the formula to calculate JSD.
D KL stands for the Kullback-Leibler divergence and can be calculated utilizing Equation (12).

Experimental Results
In this section, the analysis results that are obtained based on the methodologies described in Section 4 are presented.

Clusters Analysis Results
The cluster analysis provides an insight into the distribution of each cluster within the study area either as a single unit or as sub-areas represented by the census blocks. The cluster analysis could serve as the exploratory step to the change detection process since it is important to know the division of the study area based on the type of the cluster to be able to compare and conclude the multi-type change. Figure 3 shows the percentage of each cluster for the study area. As mentioned previously, since this research targets a large scale area, it could be vital in some circumstance to investigate subareas. Therefore, the region has been explored based on the census blocks level as well. Each block is compromised of more than one cluster with the exception of some blocks that consist of only one cluster. This due to the fact that most of the blocks cover significant geographic areas. Figure 4 shows a sample of the percentage of each cluster per block in the three years 2011, 2015, and 2017. In Figure 4a, the top six blocks represent a diverse cluster with a strong Dense Tree and then transitional areas of grass and trees; and the lower six blocks are a cluster that is dominated by Dense Tree. In Figure 4b, the top six blocks represent a diverse cluster with mostly Dense Tree and Mix Tree/Grass; and the lower six blocks are a cluster that visually exhibits mostly areas of Mix Grass/Tree and little to no open grass fields. In Figure 4c, the top six blocks represent a diverse cluster with mostly Mix Tree/Grass and little of Dense Tree and Grass/Field; and the lower six blocks are a cluster that visually exhibits mostly areas of Mix Grass/Tree and little of Mostly Grass w/feature or Tree.   Figure 5. Some clusters increased while some other decreased. For instance, during the period 2011-2015, three clusters, dense tree, grass/field, and mostly grass w/features decreased by about 9.34%, 0.71%, and 9.95%, respectively; while mix grass/tree increased with the percentage of 20%. Additionally, it can be seen there is a small change during the time period 2015-2017 compared to the other period since it is a smaller temporal window. The figure clearly shows that the dense tree cluster, unlike the other clusters, constantly decreases in both periods; which can be expected in a study area such as this with expanding suburban characteristics encroaching on undeveloped rural areas. The stacked change of each cluster in each block is illustrated in Figure 6. In general, the trend in the change per block agrees with the change in the CoMo data as a whole ( Figure 5). However, the block level provides more details about the change. For example, in the period 2011-2015, the grass field increases especially in the blocks B17-B62 while it decreases in most of the blocks on the right side of the figure. Additionally, it can be noticed that some blocks did not change at all, such as B4 in Figure 6b.

Geospatial Analysis Results
Visualizing the clustered data on the geographic map is yet another method for exploring the change, using the geographic location and a predesignated subdivision such as census blocks. In fact, rendering clusters on the map facilitates understanding the actual spatial extent of the change on the surface of the Earth. The proposed method introduced in Section 4.2 is utilized to visualize the clustering results on a map. Recall the purpose of these maps is to provide a summarized view of the change, thus each census block is represented by the predominant cluster only, although the majority of the blocks consist of more than one cluster. This process sacrifices some of the detailed information, unlike the other detailed analyses performed in this research. Indeed, our aim in rendering the clusters on the geographic map is to summarize the information for a clearer synoptic view. Figure 7 shows the geographic map of the three studied years. In these maps, where the map is based on the block level, the whole block is labeled based on the predominant cluster even if just partial data of the block is available. The actual study area is surrounded with black dashed rectangle in Figure 7b and this is the same study area for the years 2011 and 2017. The blocks that are partially inside the study area are based on the partial information; while the blocks that fall completely inside the study area have complete information for said blocks.

Change Metric Analysis Results
This subsection details how to measure the magnitude of the change based at the chip-level utilizing the methods demonstrated in Section 4.3. As the volume of data data increases, we need an intelligent metrics to measure the change efficiently. A promising solution is to measuring the divergence between two distributions. Researches [54][55][56] have used many metrics to measure the similarity such as the Kullback-Leibler (KL). However, to measure the change we need symmetric metrics. Thus, to measure the magnitude of the change, we have utilized the symmetric metric Jensen-Shannon divergence (JSD) in addition to norm2 (L2). For both metrics we utilized a neighborhood method to get the corresponding area in the referenced year. Two approaches for the neighborhood method implementation have been investigated:  (2015-2017). This supports the results in Section 5.1, which details the percentage of the change. The second change metric utilized in this analysis is JSD. The similarity of the results obtained from JSD (see Figure 9) and the ones obtained from norm2 is as expected. For the second approach, M-N, the results of norm2 and JSD are shown in Figures 10 and 11, respectively. It is worth mentioning here the distance value has been chosen to include the neighbors that are defined based on the Moore neighborhood (potential chip plus its direct vertical, horizontal, and diagonal neighbors).

Discussion
In the real world, the study area are not always have predefined classes. Specially when the area has a verity of groups that change over time. Thus, in this research, we utilized clustering methods unlike many researches [57][58][59][60][61], who used classification methods to detect the change. Clustering method groups the images according to similarities between the features of these images and at the same time it differentiates them from other groups. In addition to that we have extracted deep features of the images using deep CNN with transfer learning technique to generate a rich feature maps instead of the traditional methods. The first convolution layers of the deep CNN capture the low detail features while the last layers generate high level features. The deep features has proven its superiority in many fields such as infrared images [62], thyroid histopathology images [63], and fluoroscopic images [64].
It can be noticed from Figure 4 that the dominated cluster in 2011 is the dense tree. While in 2015 and 2017 the dominated cluster became mix grass/tree. This is true for all other blocks as well. This indicates that some trees have been replaced with grass. This kind of transformation could happen either naturally, as a result of human activities, or a combination of them. Additionally, if we examine some blocks closely such as blocks B1-B7 in the year 2011, it can be noticed that not all the clusters are represented in those blocks (represented as zero value in the respective figure cells). In some other cases, the whole block consists of only a single cluster, such as blocks B1 from the year 2015 and B85 from the year 2017. This can happen when the blocks are very small or in remote areas.
It can be concluded from comparing the geomaps (Figure 7) that the general trend is decreasing in the dense tree cluster (red color) while increasing the Mix Grass/Tree cluster (green color). Additionally, the Grass/Field cluster (orange color) started to replace the Mix Grass/Tree cluster in the middle of the study area in the year 2017.
In Figure 8, it can be noticed that the area of longitude range (−92.4, −92.1) and latitude of range (38.75, 39.0) has the largest size of change during the first time period. This is can be observed from the high red peek in the figure. Note that Columbia city lies within this range of coordinates, which explains the highest change compared to the other parts of study area. The terrain of the land cover is more clear in the 2D versions of the norm2. For instance, the Missouri River can be seen in the diagonal of the area. The river is visually discernible because the magnitude of the change for the river is consistent and yet different from the surrounding area. Note that the yellow area corresponds to the peek red area in the 3D version.
Comparing the results obtained using norm2 with those obtained from JSD metric, JSD appears to present more dynamic values/scale for analysis. Additionally, the dark color in the 2D version of JSD (Figure 9d) goes back to the low change that happened during the time period 2015-2017, which can be confirmed from the 3D version as well. Therefore, we can see that the choice of change metric here (norm2 or JSD) can be made based on the degree of sensitivity desired in the analysis.
Comparing the results of the M-N approach, with the results of 1-N approach, the general views of the changes look similar. However, the range of the values are smaller for the M-N approach, which is likely due to the neighborhood averaging as a normalizing effect.
Recall the chosen distance comprises the neighbors that are defined based on the Moore neighborhood. This is true for 2015 and 2017 datasets. However, for the 2011 dataset the same value for the distance is used to ensure the consistency. As noted previously, the 2011 dataset is neither aligned with nor has the same resolutions as the other two datasets (2015 and 2017). However, obtaining the neighbors within the same distance as 2015 and 2017 is satisfactory to get the averaged values of the corresponding area and detect the change. This is one of the key techniques that facilitates the analysis without normalizing the imagery pixel resolutions across time periods and also negates the need to achieve pixel level alignment compared with pixel-based method such as [65][66][67].
Our proposed method can detect the change in a large-scale area represented by a massive amount of HR remote sensed imagery data while also facilitating higher resolution level-of detail analysis. While classical HCD techniques have shown to be suboptimal, Refs. [25,26], as they belong to intuitive decision-level fusion schemes PBCD and OBCD. For example, the final result of OBCD is usually sensitive to the segmentation process, while the pixel-based methods result in an over-sensitive change detection as they use isolated pixels as the basic units [13]. Additionally, they can be affected by noise when handling the rich information content of high-resolution data [19].
It can be seen from Figure 12 that new roads and buildings (residential area) are constructed and many trees are removed. The measurements of norm2 and JSD metrics reflect this change well. It can be noticed that the area where the new buildings are constructed has the highest values of measurements in both metrics compared to the other chips of the sampled area. This demonstrates the efficiency of our proposed method in capturing the changes in the study area; even down to the chip level-of-detail. It is worth mentioning that the two metrics norm2 and JSD coincide on the degree of the change but utilizing different scales. This demonstrates that relatively small, localized change areas can be measured and prioritized for human analytical review irrespective of the size of the large-scale area that is processed.

Conclusions
To summarize this research, a novel approach for detecting the change in a large-scale HR remote sensed imagery dataset based on chip-level measurements has been proposed. The orthogonal deep visual features technique have been utilized to extract the features from the datasets as a primarily step before clustering the data. Next, due to the inherent hybrid natural of remote sensing data, fuzzy c-means clustering has been utilized to cluster the RS data to better accommodate the non-discrete nature of large RS scenes.
After clustering the data, several analysis methodologies, (cluster, geospatial, and change metric analysis), have been applied to explore the change during the specified time periods. Additionally, a neighborhood method has been proposed to compare unaligned, multi-resolution datasets. The results obtained are based on various metrics such as norm2 and Jensen-Shannon divergence, in addition to the visual geo-maps and the percentage of the change. The area is examined at multiple levels-of-detail, including one whole unit, subunits represented by the census blocks, and at the local chip level. In each level-of-detail, it has been shown that the proposed method was able to detect the change efficiently.
There are numerous opportunities for future work, ranging from alternative feature extraction techniques to further development of change analysis. Further research will also explore reducing the size of the image chips to facilitate even higher resolution levels-ofdetail, approaching the object level. Additional areas of interest for change analysis can be explored, such as from different world regions that have significantly different composition, i.e., not primarily trees and grass lands. Data Availability Statement: Publicly available datasets were analyzed in this study. This data can be found here: https://github.com/lehaifeng/RSI-CB accessed on 23 April 2021.