Change Detection from Landsat-8 Images Using a Multi-Scale Convolutional Neural Network (Case Study: Sahand City)

Identifying changes in the Earth's surface phenomena is vital for understanding and mitigating the impacts of environmental issues, and these phenomena can be monitored effectively using satellite images acquired at different times. In addition to spectral features, spatial features play a significant role in detecting changes precisely. However, classical change detection (CD) methods rarely consider spatial information and fail to account for scale variations within images. To address these issues, the present study introduces a novel deep learning-based CD method that hierarchically extracts spatial–spectral features at various scales. The proposed deep neural network generates a binary change map using a multi-scale approach that integrates, at the decision level, information from patches of different sizes. Experiments were conducted on Landsat-8 images of Sahand City, East Azerbaijan, Iran, chosen for their capacity to represent details of the Earth's surface. The population growth of Tabriz has driven rapid development in Sahand City to accommodate citizens, and studying these changes can offer valuable insights for urban planning. The performance of the proposed deep model is compared with that of two classical methods, the change vector analysis (CVA) method and a random forest (RF) algorithm. The proposed network improves the kappa coefficient (KC) by approximately 11.86% and 29.36% compared with the RF and CVA methods, respectively, and the overall accuracy (O.A.) by approximately 17.08% and 29.16%, respectively. The proposed multi-scale deep network thus performs better across all metrics, whereas the CVA method fails to identify changes with sufficient accuracy.


Introduction
With the expanding human population, human interventions in nature have intensified to meet diverse needs. Consequently, monitoring environmental changes has become crucial for preserving wildlife and managing human activities effectively [1]. Field-based surveys are a primary method of detecting changes, but they have several drawbacks: they are time-consuming, require substantial human resources for fieldwork, and cover only limited geographic areas. These limitations make it difficult to monitor changes with field-based techniques alone [2]. In contrast, multi-temporal remote sensing images provide a cost-effective and efficient means of monitoring changes on the Earth's surface [3].
Change detection (CD) methods can be categorized as supervised or unsupervised. Unsupervised methods detect changes directly without training samples, whereas supervised methods use training samples to identify changes [4].
Mishra et al. [5] studied land use and land cover changes in a Himalayan watershed by applying the maximum likelihood algorithm to Landsat-5 and Sentinel-2 images. Christaki et al. [6] applied artificial neural networks to detect changes in UAV images after a catastrophic earthquake, focusing explicitly on textural features. Classical algorithms rely primarily on spectral information, which may yield less accurate results; incorporating spatial features can improve identification accuracy. However, manually extracted spatial features are scene-specific, and appropriate features must be carefully selected from a wide range of options [7].
In contrast to classical feature extraction methods, deep learning networks can automatically extract high-level spectral-spatial information. This reduces the user's involvement in identifying and selecting suitable features, and the extracted features depend less on the particular image scene. Roy et al. [8] introduced a convolutional neural network (CNN) framework in which deep spatial features extracted by a 2-D CNN were used as inputs to a 3-D CNN. Aghdami-nia et al. [9] developed a modified version of the standard U-Net model, called the automatic coastline extraction framework, to enhance sea-land segmentation. Such methods have mainly relied on single-scale CNNs, which limits their ability to capture the complex multiscale spatial patterns inherent in images. In addition, the input patch size around each pixel must be chosen carefully by the user.
The primary motivation of this research is to improve the accuracy and comprehensiveness of change detection by automating the extraction of high-level information and overcoming the limitations of traditional CD methods. This study introduces a novel CNN-based CD method that exploits multiscale spectral-spatial features. The proposed model is evaluated against conventional techniques, namely the change vector analysis (CVA) method and a random forest (RF) algorithm. The comparative analyses demonstrate the superiority of the CNN-based method and provide insights into its reliability and accuracy. The findings have the potential to support wildlife conservation, facilitate the effective management of human activities, and broaden the use of remote sensing in environmental monitoring.
The paper is organized as follows: Section 2 outlines the research methodology, Section 3 discusses the experimental results, and Section 4 summarizes the conclusions.

Methodology
As noted above, this study compares the proposed multiscale CNN (MSCNN) CD method with two classical CD methods, CVA and RF. The workflow of the study is depicted in Figure 1. First, the images undergo geometric and radiometric preprocessing, a step required by all three methods. Training and testing samples are then collected from a manually generated ground truth image. Next, change maps are generated using the three CD methods. Finally, the change maps are evaluated and compared. The following sections briefly describe each CD method.
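A minimal NumPy sketch of the sample-collection step is shown below. It assumes the ground truth image is coded as 1 (change), 0 (no change), and -1 (unlabeled) and that a stratified random split is used; the 70/30 ratio and the function name `split_ground_truth` are hypothetical, since the paper does not report how the samples were split.

```python
import numpy as np


def split_ground_truth(gt: np.ndarray, train_fraction: float = 0.7, seed: int = 0):
    """Stratified random split of labeled ground-truth pixels (hypothetical helper).

    gt: 2-D integer array; 1 = change, 0 = no change, -1 = unlabeled (assumed coding).
    Returns two (k, 2) arrays of (row, col) coordinates: training and testing samples.
    """
    rng = np.random.default_rng(seed)
    train, test = [], []
    for label in np.unique(gt[gt >= 0]):
        coords = np.argwhere(gt == label)   # every pixel of this class
        rng.shuffle(coords)                 # shuffles the rows (pixels) in place
        n_train = int(round(train_fraction * len(coords)))
        train.append(coords[:n_train])
        test.append(coords[n_train:])
    return np.concatenate(train), np.concatenate(test)
```

Splitting each class separately keeps the change/no-change proportions similar in both sets, which matters when changed pixels are a small minority of the scene.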

CVA
CVA is an unsupervised method that defines a change vector as the difference between two n-dimensional vectors in a feature space. These vectors represent two observations of the same pixel at different times, and the length of the change vector corresponds to the magnitude of the change in the spectral feature space. The change magnitude (CM) can be quantified as follows:

$$CM = \sqrt{\sum_{j=1}^{n} \left(DN_{2j} - DN_{1j}\right)^{2}}$$

where $DN_{ij}$ represents the digital number of band $j$ for data $i$, and $n$ is the number of bands [10]. The Otsu thresholding technique [11] is then applied to CM to obtain a binary change map.
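A minimal NumPy sketch of this step follows. It assumes the two co-registered images are stored as arrays of shape (rows, cols, bands). The function names are hypothetical, and Otsu's method is implemented directly from its between-class-variance definition rather than taken from a library.

```python
import numpy as np


def change_magnitude(img_t1: np.ndarray, img_t2: np.ndarray) -> np.ndarray:
    """Per-pixel CVA magnitude: Euclidean norm of the band-wise DN difference.

    img_t1, img_t2: co-registered arrays of shape (rows, cols, bands).
    """
    diff = img_t2.astype(np.float64) - img_t1.astype(np.float64)
    return np.sqrt(np.sum(diff ** 2, axis=-1))


def otsu_threshold(values: np.ndarray, nbins: int = 256) -> float:
    """Return the histogram bin edge that maximizes Otsu's between-class variance."""
    hist, edges = np.histogram(values.ravel(), bins=nbins)
    hist = hist.astype(np.float64)
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(hist)                 # pixel count in bins 0..k ("no change" class)
    w1 = w0[-1] - w0                     # pixel count in bins k+1..end ("change" class)
    m0 = np.cumsum(hist * centers)       # cumulative (unnormalized) first moment
    valid = (w0 > 0) & (w1 > 0)          # both classes must be non-empty
    mu0 = np.zeros_like(w0)
    mu1 = np.zeros_like(w0)
    mu0[valid] = m0[valid] / w0[valid]
    mu1[valid] = (m0[-1] - m0[valid]) / w1[valid]
    between = np.where(valid, w0 * w1 * (mu0 - mu1) ** 2, -1.0)
    k = int(np.argmax(between))          # last bin of the "no change" class
    return float(edges[k + 1])


def cva_change_map(img_t1: np.ndarray, img_t2: np.ndarray) -> np.ndarray:
    """Binary change map: True where the change magnitude reaches the Otsu threshold."""
    cm = change_magnitude(img_t1, img_t2)
    return cm >= otsu_threshold(cm)
```

Because the threshold is derived only from the histogram of CM, no training samples are involved, which is what makes CVA unsupervised.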

RF
RF is an ensemble classifier that makes predictions with a collection of Classification and Regression Trees (CART). Each tree is built using bagging: its training samples are drawn randomly with replacement, so some samples may be drawn multiple times while others are not drawn at all [12]. The RF model is trained on the collected samples and then applied to the stacked bi-temporal images to generate a change map.
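As an illustration only, the RF step can be sketched with scikit-learn's `RandomForestClassifier`, whose `bootstrap=True` setting draws each tree's training set with replacement as described above. The band stacking, the 100-tree setting, and the helper names are assumptions, not the authors' reported configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def stack_bitemporal(img_t1: np.ndarray, img_t2: np.ndarray) -> np.ndarray:
    """Concatenate the two dates along the band axis: (rows, cols, 2 * bands)."""
    return np.concatenate([img_t1, img_t2], axis=-1)


def rf_change_map(img_t1, img_t2, train_rc, train_labels, n_trees=100, seed=0):
    """Train an RF on labeled pixels and predict a label for every pixel.

    train_rc: (k, 2) array of (row, col) coordinates of training pixels.
    train_labels: (k,) array of 0 (no change) / 1 (change).
    n_trees: assumed value; the paper does not report the forest size.
    """
    stacked = stack_bitemporal(img_t1, img_t2)
    rows, cols, n_features = stacked.shape
    x_train = stacked[train_rc[:, 0], train_rc[:, 1]]  # (k, n_features)
    rf = RandomForestClassifier(n_estimators=n_trees, bootstrap=True, random_state=seed)
    rf.fit(x_train, train_labels)
    # Classify every pixel of the stacked image, then restore the 2-D layout.
    return rf.predict(stacked.reshape(-1, n_features)).reshape(rows, cols)
```

Unlike CVA, each pixel is represented by its full bi-temporal spectral vector, so the classifier can learn which band combinations indicate change.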

Multiscale CNN
CNNs have been used extensively in remote sensing applications. They use weight-sharing convolutional kernels to extract high-level spatial features and typically comprise convolutional layers, activation functions, batch normalization, pooling, and fully connected layers [13]. As noted earlier, single-scale CNNs struggle to capture multiscale information in remote sensing images, and determining the optimal patch size requires a time-consuming trial-and-error process [14]. This study introduces the multiscale CNN (MSCNN) to address this problem: its multiscale framework eliminates the search for an optimal patch size and reduces reliance on a single value. The network architecture is depicted in Figure 2.
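The Figure 2 caption names two multiscale ingredients. The first is a set of patches of several sizes centered on each pixel. The second is decision-level fusion by majority voting (MV). Both can be sketched in NumPy as follows. Reflect padding at the image border and the function names are assumptions, and the patch sizes default to the 3 × 3, 7 × 7, and 9 × 9 values given in the caption.

```python
import numpy as np


def multiscale_patches(stacked: np.ndarray, row: int, col: int, sizes=(3, 7, 9)) -> dict:
    """Return {size: patch} for odd patch sizes centered on pixel (row, col).

    stacked: (rows, cols, channels) array of the stacked bi-temporal image.
    Reflect padding (an assumption) gives border pixels full-size patches.
    For whole-image inference, pad once outside this function instead.
    """
    pad = max(sizes) // 2
    padded = np.pad(stacked, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    r, c = row + pad, col + pad  # center pixel in padded coordinates
    return {
        s: padded[r - s // 2 : r + s // 2 + 1, c - s // 2 : c + s // 2 + 1, :]
        for s in sizes
    }


def majority_vote(label_maps) -> np.ndarray:
    """Fuse per-scale label maps pixel-wise; ties resolve to the lower label."""
    stack = np.stack(label_maps, axis=0)  # (n_scales, rows, cols)
    n_classes = int(stack.max()) + 1
    votes = np.stack([(stack == k).sum(axis=0) for k in range(n_classes)])
    return votes.argmax(axis=0)
```

With three binary voters a tie cannot occur, so the tie rule only matters if an even number of scales is used.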

Study Area and Dataset
This study uses Landsat-8 satellite images to evaluate the effectiveness of the proposed network in monitoring changes in the Sahand City area. Sahand City is located in East Azerbaijan Province, Iran, at 46°7′19.16″ E and 37°56′18.41″ N. It was established in 2007 in response to the population growth of Tabriz, as a measure for population control and city management. Located 20 km southwest of Tabriz, Sahand City has developed rapidly in recent years, particularly after the construction of the Tabriz-Sahand highway, which improved accessibility for residents of both cities. The Landsat images, acquired on 10 July 2013 and 1 August 2021, were obtained from Google Earth Engine. The geographic location of the study area is shown in Figure 3.

Because the CVA method relies solely on pixel-based information, it often performs poorly in change detection. Supplying sample data to the change detection algorithm, as supervised methods do, can significantly improve detection accuracy. For a closer analysis, Figure 5 presents the confusion matrices of the binary change maps compared with the ground truth data. According to these matrices, CVA misclassified 190 pixels, whereas RF and MSCNN misclassified 132 and 50 pixels, respectively.

For a comprehensive quantitative assessment, additional criteria are used: precision, recall, F1 score, overall accuracy (O.A.), and the kappa coefficient (K.C.). According to Table 1, the proposed MSCNN approach achieves an O.A. of 89.58% in detecting changes, the highest among all methods, whereas CVA performs worst, with an O.A. of 60.42%. The other criteria likewise show that the proposed network outperforms both the RF and CVA algorithms.

Sahand City has undergone notable transformations, particularly the conversion of barren land into urban areas. The region was previously predominantly barren, which made it well suited to urban development. Most of the observed changes involve the construction of buildings and transportation routes.
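The criteria listed above can all be computed from a binary confusion matrix. The following sketch is a hypothetical helper, not the authors' evaluation code. Kappa uses the standard Cohen formulation, with chance agreement derived from the confusion-matrix marginals, and the misclassified count is FP + FN, the quantity read from the confusion matrices.

```python
import numpy as np


def binary_cd_metrics(y_true, y_pred) -> dict:
    """Accuracy measures for a binary change map (1 = change, 0 = no change)."""
    y_true = np.asarray(y_true).ravel().astype(bool)
    y_pred = np.asarray(y_pred).ravel().astype(bool)
    tp = int(np.sum(y_true & y_pred))    # changes correctly detected
    tn = int(np.sum(~y_true & ~y_pred))  # no-change correctly detected
    fp = int(np.sum(~y_true & y_pred))   # false alarms
    fn = int(np.sum(y_true & ~y_pred))   # missed changes
    n = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    oa = (tp + tn) / n
    # Chance agreement: products of predicted and actual class totals.
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (oa - pe) / (1 - pe) if pe != 1 else 0.0
    return {"misclassified": fp + fn, "precision": precision, "recall": recall,
            "f1": f1, "oa": oa, "kappa": kappa}
```

For example, with three changed and five unchanged reference pixels, two detected changes, one missed change, and one false alarm, the helper gives O.A. = 0.75 and kappa = 7/15 ≈ 0.467.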

Conclusions
Progress in remote sensing methodologies has greatly improved the monitoring of environmental changes, including in urban areas. This advancement has considerably enhanced our understanding of, and ability to address, ecosystem modifications, which also have notable economic implications. However, classical methods rarely incorporate spatial information into their analyses, which limits their ability to account for the spatial context of detected changes. Deep learning-based techniques that extract spatial features, on the other hand, offer high change detection accuracy. Because of the rapid population growth of Tabriz, Sahand City has developed significantly in a short period to accommodate citizens, so examining changes in this city can provide valuable insights for better urban planning. To this end, this study compared a new deep learning-based CD approach with two classical CD methods, RF and CVA, to detect changes in Sahand City. The evaluation shows that the unsupervised CVA method performed worst. The supervised RF algorithm improved change detection accuracy, but the MSCNN network increased the overall accuracy of the binary change map by approximately 17% compared with RF. Most changes in the area result from the construction of buildings and new transportation infrastructure. These results call into question the assumption that simple algorithms are sufficient for effective change detection; instead, they highlight the effectiveness of advanced deep learning techniques in substantially improving change detection accuracy.

Figure 1. The workflow of the study.

Figure 1 illustrates the initial steps of the study, in which the images undergo geometric and radiometric preprocessing; this step is essential for all subsequent analyses.

Figure 2. The proposed MSCNN architecture. Changes are identified by classifying the stacked bi-temporal Landsat images with separate 2-D CNNs that take inputs of different sizes, and the majority voting (MV) algorithm integrates their results. Patch sizes of 3 × 3, 7 × 7, and 9 × 9 are used as inputs. The convolutional layers use 64, 128, and 256 filters, in that order, each with a 3 × 3 kernel. Batch normalization layers are used in the convolutional layers to mitigate overfitting. The learning rate is set to 0.0001, and the Adam optimizer is used.
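One branch of this architecture can be sketched in PyTorch under several assumptions that the caption does not specify: ReLU activations, padding that preserves the spatial size (so even the 3 × 3 patch survives three convolutions), global average pooling followed by a single linear layer with two output classes, and cross-entropy loss. Only the filter counts, kernel size, batch normalization, optimizer, and learning rate come from the caption. `in_channels` is left as a parameter because the number of stacked Landsat-8 bands is not stated, and all names are hypothetical.

```python
import torch
from torch import nn


class BranchCNN(nn.Module):
    """Single-scale 2-D CNN branch: three 3x3 conv blocks with 64, 128, 256 filters."""

    def __init__(self, in_channels: int, n_classes: int = 2):
        super().__init__()
        blocks, channels = [], in_channels
        for filters in (64, 128, 256):
            blocks += [
                nn.Conv2d(channels, filters, kernel_size=3, padding=1),  # keeps H x W
                nn.BatchNorm2d(filters),
                nn.ReLU(inplace=True),  # assumed activation
            ]
            channels = filters
        self.features = nn.Sequential(*blocks)
        self.pool = nn.AdaptiveAvgPool2d(1)  # assumed: global average pooling
        self.head = nn.Linear(256, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, size, size) patches of the stacked bi-temporal image
        return self.head(self.pool(self.features(x)).flatten(1))


def build_branches(in_channels: int, patch_sizes=(3, 7, 9)):
    """One independent branch and Adam optimizer (lr = 0.0001) per patch size."""
    models = {s: BranchCNN(in_channels) for s in patch_sizes}
    optims = {s: torch.optim.Adam(m.parameters(), lr=1e-4) for s, m in models.items()}
    return models, optims


def train_step(model, optimizer, patches, labels) -> float:
    """One supervised update with cross-entropy loss (assumed loss function)."""
    model.train()
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(patches), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Each branch is trained on patches of its own size. At inference, the per-branch class predictions (for example `model(x).argmax(1)` in evaluation mode) are fused pixel-wise by majority voting. Patches stored as (size, size, channels) arrays must first be transposed to PyTorch's channels-first layout.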

Figure 3. The location of Sahand City within Iran's administrative divisions (image from Google Earth).

Figure 4 presents the binary change maps generated by the CVA, RF, and MSCNN techniques. CVA incorrectly identified most areas as changed, whereas the RF and MSCNN techniques detected changes considerably more reliably.

Table 1. Accuracy assessment of the three CD methods.