Change Detection Based on Artificial Intelligence: State-of-the-Art and Challenges

Abstract: Change detection based on remote sensing (RS) data is an important method of detecting changes on the Earth's surface and has a wide range of applications in urban planning, environmental monitoring, agriculture investigation, disaster assessment, and map revision. In recent years, integrated artificial intelligence (AI) technology has become a research focus in developing new change detection methods. Although some researchers claim that AI-based change detection approaches outperform traditional change detection approaches, it is not immediately obvious how and to what extent AI can improve the performance of change detection. This review focuses on the state-of-the-art methods, applications, and challenges of AI for change detection. Specifically, the implementation process of AI-based change detection is first introduced. Then, the data from different sensors used for change detection, including optical RS data, synthetic aperture radar (SAR) data, street view images, and combined heterogeneous data, are presented, and the available open datasets are also listed. The general frameworks of AI-based change detection methods are reviewed and analyzed systematically, and the unsupervised schemes used in AI-based change detection are further analyzed. Subsequently, the commonly used networks in AI for change detection are described. From a practical point of view, the application domains of AI-based change detection methods are classified based on their applicability. Finally, the major challenges and prospects of AI for change detection are discussed and delineated, including (a) heterogeneous big data processing, (b) unsupervised AI, and (c) the reliability of AI. This review will be beneficial for researchers in understanding this field.


Introduction
Change detection is the process of identifying differences in the state of an object or phenomenon by observing it at different times [1]. It is one of the major problems in Earth observation and has been extensively researched in recent decades. Multi-temporal RS data, such as satellite imagery and aerial imagery, can provide abundant information to identify land use and land cover (LULC) differences in a specific area across a period of time. This is crucial in various applications, such as urban planning, environmental monitoring, agriculture investigation, disaster assessment, and map revision.
With the ongoing development of Earth observation techniques, huge amounts of RS data with high spectral, spatial, and temporal resolution are now available, which places new requirements on change detection techniques and greatly promotes their development. To address the problems brought about by images with finer spatial and spectral resolution during the change detection process, many change detection approaches have been proposed. Here, they are broadly divided into two categories: traditional and AI-based. Figure 1 presents the general flow of traditional change detection and AI-based change detection.
Existing change detection reviews have focused mainly on the design of change detection techniques in multi-temporal hyperspectral images (HSIs) and high-spatial-resolution images [1][2][3][4]. The techniques they reviewed are mainly traditional change detection approaches, which can be generally summarized into the following groups:
• Visual analysis: the change map is obtained by manual interpretation, which can provide highly reliable results based on expert knowledge but is time-consuming and labor-intensive;
• Algebra-based methods: the change map is obtained by performing an algebraic operation or transformation on multi-temporal data, such as image differencing, image regression, image ratioing, and change vector analysis (CVA);
• Transformation: data reduction methods, such as principal component analysis (PCA), Tasseled Cap (KT), multivariate alteration detection (MAD), Gram-Schmidt (GS), and Chi-Square, are used to suppress correlated information and highlight variance in multi-temporal data;
• Classification-based methods: changes are identified by comparing multiple classification maps (i.e., post-classification comparison), or by using a trained classifier to directly classify data from multiple periods (i.e., multidate classification or direct classification);
• Advanced models: advanced models, such as the Li-Strahler reflectance model, the spectral mixture model, and the biophysical parameter method, are used to convert the spectral reflectance values of multi-period data into physically based parameters or fractions to perform change analysis. This is more intuitive and has physical meaning, but it is complicated and time-consuming;
• Others: hybrid approaches and others, such as knowledge-based, spatial-statistics-based, and integrated GIS and RS methods, are used.
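As a rough illustration of the algebra-based group above, the following sketch applies image differencing and change vector analysis (CVA) to two co-registered images. It is a minimal NumPy example under stated assumptions: the function names, the toy arrays, and the fixed threshold are hypothetical and are not taken from the reviewed literature.

```python
import numpy as np


def change_vector_analysis(img_t1, img_t2):
    """Return the per-band difference and the change magnitude.

    Both inputs are co-registered arrays of shape (rows, cols, bands).
    """
    # Image differencing: subtract the earlier date from the later one, band by band.
    diff = img_t2.astype(np.float64) - img_t1.astype(np.float64)
    # CVA magnitude: Euclidean length of the per-pixel change vector.
    magnitude = np.sqrt(np.sum(diff ** 2, axis=-1))
    return diff, magnitude


def threshold_change(magnitude, threshold):
    """Mark pixels whose change magnitude exceeds a user-chosen threshold."""
    return magnitude > threshold


# Toy data (hypothetical): a 2x2 block changes by (3, 4) in a two-band image.
t1 = np.zeros((4, 4, 2))
t2 = t1.copy()
t2[1:3, 1:3] = [3.0, 4.0]

diff, magnitude = change_vector_analysis(t1, t2)
change_map = threshold_change(magnitude, threshold=2.5)
```

In practice the threshold is the delicate part of such methods; here it is simply fixed by hand for the toy data.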
According to the detection unit, these methods can also be classified as pixel-level, feature-level, object-level, and three-dimensional (3D) object-level methods, and they have been systematically reviewed in the literature [5][6][7]. Due to the rapid development of computer technology, research on traditional change detection approaches has turned to integrating AI techniques. In both the traditional change detection flow and the AI-based flow, the first step is data acquisition, and the aim of change detection is to obtain the change detection map for various applications; after preparing the data, traditional approaches typically consist of two steps, a homogenization process and a change detection process, while the AI-based approaches generally require an extra training set.
1. The implementation process of AI-based change detection is introduced, and we summarize common implementation strategies that can help beginners understand this research field;
2. We present the data from different sensors used for AI-based change detection in detail, mainly including optical RS data, SAR data, street-view images, and combined heterogeneous data. More practically, we list the available open datasets with annotations, which can be used as benchmarks for training and evaluating AI models in future change detection studies;
3. By systematically reviewing and analyzing the process of AI-based change detection methods, we summarize their general frameworks in a practical way, which can help to design change detection approaches in the future. Furthermore, the unsupervised schemes used in AI-based change detection are analyzed to help address the lack of training samples in practical applications;
4. We describe the commonly used networks in AI for change detection. Analyzing their applicability is helpful for the selection of AI models in practical applications;
5. We provide the applications of AI-based change detection in various fields and subdivide them by data type, which helps those interested in these areas to find relevant AI-based change detection approaches;
6. We delineate and discuss the challenges and prospects of AI for change detection from three major directions, i.e., heterogeneous big data processing, unsupervised AI, and the reliability of AI, providing a useful reference for future research.
The rest of the paper is organized as follows. We introduce the implementation process of AI-based change detection in Section 2 and list the data sources used for change detection in Section 3. The general frameworks and commonly used networks in AI for change detection are reviewed in Sections 4 and 5, respectively. In Section 6, we summarize the various applications of the methods. After discussing the challenges and opportunities of AI-based change detection in Section 7, we draw the conclusions of this review in Section 8.
Remote Sens. 2020, 12, 1688

Implementation Process of AI-Based Change Detection
Figure 1 illustrates the general flow of AI-based change detection. The key is to obtain a high-performance trained AI model. In detail, as presented in Figure 2, the implementation process of AI-based change detection includes the following four main steps:

1. Homogenization: Due to differences in illumination and atmospheric conditions, seasons, and sensor attitudes at the time of acquisition, multi-period data usually need to be homogenized before change detection. Geometric and radiometric correction are two commonly used methods [14,15]. The former aims to geometrically align two or more given pieces of data, which can be achieved through registration or co-registration. Only when data from two periods are overlaid can the comparison between corresponding positions be meaningful [16]. The latter aims to eliminate radiance or reflectance differences caused by the digitalization process of sensors and by atmospheric attenuation distortion due to absorption and scattering in the atmosphere [4], which helps to reduce false alarms caused by these radiation errors in change detection. For heterogeneous data, a special AI model structure can be designed for feature space transformation to achieve change detection (see Section 4.1.2);
2. Training set generation: To develop the AI model, a large, high-quality training set is required, from which algorithms can learn which patterns or outcomes are associated with a given question. Multi-period data are labeled or annotated using certain techniques (e.g., manual annotation [17], pre-classification [18], use of thematic data [19]) to make it easy for the AI model to learn the characteristics of the changed objects. Figure 2 presents an annotated example for building change detection, which is composed of two-period RS images and a corresponding ground truth labeled with building changes at the pixel level. Based on the ground truth, i.e., prior knowledge, the AI model can be trained in a supervised manner. To alleviate the lack of training data, data augmentation is a widely used strategy: operations such as horizontal or vertical flipping, rotation, rescaling, cropping, translation, or adding noise can significantly increase the diversity of data available for training models without actually collecting new data;
3. Model training: After the training set is generated, it can usually be divided into two datasets according to the number of samples or the geographic area: a training set for AI model training and a test set for accuracy evaluation during the training process [20]. The training and testing processes are performed alternately and iteratively. During the training process, the model is optimized according to a learning criterion, which can be a loss function in deep learning (e.g., softmax loss [21], contrastive loss [22], Euclidean loss [23], or cross-entropy loss [24]). By monitoring the training process and test accuracy, the convergence state of the AI model can be obtained, which can help in adjusting its hyperparameters (such as the learning rate) and in judging whether the model performance has reached the best (i.e., termination) condition;
4. Model serving: By deploying a trained AI model, change maps can be generated more intelligently and automatically for practical applications. Moreover, this can help validate the generalization ability and robustness of the model, which is an important aspect of evaluating the practicality of the AI-based change detection technique.
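To make the radiometric part of the homogenization step more concrete, the sketch below performs a simple relative radiometric normalization: each band of one image is linearly rescaled to the mean and standard deviation of the same band in a reference image. This is only one basic option and is not claimed to be the correction used in the cited works; the function name `match_mean_std` and the synthetic arrays are assumptions made for illustration, and the two images are assumed to be already co-registered.

```python
import numpy as np


def match_mean_std(subject, reference, eps=1e-12):
    """Linearly rescale each band of `subject` so that its mean and standard
    deviation match those of the same band in `reference`.

    Both arrays have shape (rows, cols, bands).
    """
    subject = subject.astype(np.float64)
    reference = reference.astype(np.float64)
    s_mean = subject.mean(axis=(0, 1))
    s_std = subject.std(axis=(0, 1))
    r_mean = reference.mean(axis=(0, 1))
    r_std = reference.std(axis=(0, 1))
    # Per-band gain; eps guards against division by zero for constant bands.
    gain = r_std / np.maximum(s_std, eps)
    return (subject - s_mean) * gain + r_mean


# Synthetic check (hypothetical data): the second acquisition is the first one
# seen through a purely linear radiometric distortion.
rng = np.random.default_rng(0)
reference = rng.uniform(0.0, 1.0, size=(32, 32, 3))
subject = 0.5 * reference + 0.2
normalized = match_mean_std(subject, reference)
```

Because the simulated distortion is exactly linear, the normalization recovers the reference image here; real scenes also contain genuine changes, so the match would only be approximate.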
Figure 2. Implementation process of AI-based change detection (black arrows indicate workflow and red arrow indicates an example).
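The data augmentation mentioned in the training set generation step has one detail that is specific to change detection: the same geometric transform must be applied to both dates and to the label, otherwise the pixel-level ground truth no longer lines up. The following sketch shows this for random flips and 90-degree rotations. The function name `augment_pair` and the toy arrays are hypothetical, and square image tiles are assumed.

```python
import numpy as np


def augment_pair(img_t1, img_t2, label, rng):
    """Apply one random flip/rotation identically to both dates and the label.

    img_t1, img_t2: (rows, cols, bands); label: (rows, cols).
    """
    arrays = [img_t1, img_t2, label]
    if rng.random() < 0.5:
        # Horizontal flip (mirror the columns).
        arrays = [np.flip(a, axis=1) for a in arrays]
    if rng.random() < 0.5:
        # Vertical flip (mirror the rows).
        arrays = [np.flip(a, axis=0) for a in arrays]
    # Rotate by 0, 90, 180, or 270 degrees in the image plane.
    k = int(rng.integers(0, 4))
    arrays = [np.rot90(a, k, axes=(0, 1)) for a in arrays]
    return arrays[0], arrays[1], arrays[2]


# Toy bi-temporal sample (hypothetical): a rectangular block changes at date 2.
rng = np.random.default_rng(42)
t1 = np.zeros((8, 8, 3))
t2 = t1.copy()
t2[1:3, 2:6] = 1.0
label = np.any(t1 != t2, axis=-1)

t1_aug, t2_aug, label_aug = augment_pair(t1, t2, label, rng)
```

After augmentation the label still marks exactly the pixels that differ between the two augmented images, which is the property that matters for supervised training.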
The above steps provide a general implementation process of AI-based change detection, but the structure of the AI model is diverse and needs to be well designed according to different application situations and the training data, which will be introduced in Sections 4 and 5. It is worth mentioning that existing mature frameworks such as TensorFlow [25], Keras [26], PyTorch [27], and Caffe [28] help researchers more easily realize the design, training, and deployment of AI models, and their documentation provides detailed introductions.
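The model training step can be sketched without any deep learning framework. In the example below, a logistic regression stands in for the AI model (a deliberate simplification, not a network from the reviewed literature), the samples are split into a training set and a test set, the cross-entropy loss is minimized by gradient descent, and the test accuracy is recorded after every epoch. All data are synthetic, and every name and hyperparameter is an assumption chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cross_entropy(p, y, eps=1e-12):
    """Binary cross-entropy between predicted probabilities p and labels y."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


# Synthetic per-pixel features (hypothetical): two band differences per pixel,
# close to 0 for unchanged pixels and close to 1 for changed pixels.
rng = np.random.default_rng(0)
n = 400
y = (rng.random(n) < 0.5).astype(np.float64)
x = rng.normal(0.0, 0.1, size=(n, 2)) + y[:, None]

# Split by number of samples: 300 for training, 100 for testing.
idx = rng.permutation(n)
train_idx, test_idx = idx[:300], idx[300:]

w = np.zeros(2)
b = 0.0
learning_rate = 0.5
history = []  # (training loss, test accuracy) per epoch

for epoch in range(200):
    # Training pass: forward, gradient of the cross-entropy loss, update.
    p = sigmoid(x[train_idx] @ w + b)
    grad = p - y[train_idx]
    w -= learning_rate * (x[train_idx].T @ grad) / len(train_idx)
    b -= learning_rate * grad.mean()

    # Testing pass: monitor accuracy on held-out samples.
    test_p = sigmoid(x[test_idx] @ w + b)
    test_acc = float(np.mean((test_p > 0.5) == (y[test_idx] > 0.5)))
    history.append((cross_entropy(p, y[train_idx]), test_acc))
```

Inspecting `history` is the monitoring described in the text: a flattening loss and a stable test accuracy suggest convergence, while a diverging loss would point to a learning rate that is too high.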

Data Sources for Change Detection
With the development of data acquisition platforms such as satellites, drones, and ground survey vehicles, the massive multi-source RS data that they produce bring forth new application requirements for land change monitoring. In particular, multi-sensor data with high spatial and high temporal resolution require more automated and robust change detection methods to reduce the cost of manual interpretation. By summarizing the data types used for change detection, we can analyze in depth the applicability of existing change detection methods to the data. In this paper, the types of data used for change detection are divided into optical RS images, SAR images, and street view images. It should be noted that street view images are usually not used as RS data but as auxiliary data [29][30][31], so they are not common in the RS community. Still, there are overlapping ideas for change detection. In this paper, street view images are treated as a kind of RS data in a broad sense and reviewed, because they can provide street-level observation data. Moreover, the methods combining heterogeneous data for change detection are summarized and analyzed. Examples of different data sources are shown in Figure 3. Optical RS and SAR images are gathered with passive and active sensors, respectively, covering different electromagnetic spectral ranges. Other data sources, such as digital elevation models (DEMs), geographic information system (GIS) data, and point cloud data, can provide valuable supplementary attributes. Overhead remote sensing collects information over large spatial areas, but its temporal resolution is relatively low. Street view images can provide nearly real-time information at street level. For detailed descriptions of optical RS images, SAR images, street view images, and combined heterogeneous data, please refer to Section 3.1. In addition, Section 3.2 lists existing open datasets for change detection tasks that can be employed as benchmarks for future research.
Figure 3. Examples of different data sources: … [32]); (e) point cloud data (from International Society for Photogrammetry and Remote Sensing (ISPRS) benchmarks [33]); (f) street view image (from Cityscapes datasets [34]).

Optical RS Images
Optical RS images can be divided into hyperspectral, multispectral, and panchromatic images according to the number of bands. HSIs are volumetric image cubes that consist of hundreds of spectral bands. They have narrow bands over a wide portion of the electromagnetic spectrum; the band range is generally less than 10 nm. Multispectral images typically contain multiple bands but fewer than 15 bands. The spectral resolution of multispectral images is in the range of 0.1 times the wavelengths. Panchromatic images have a single band that is formed by using the total light energy in the visible spectrum (instead of partitioning it into different spectra). A side-by-side example of hyperspectral, multispectral, and panchromatic images is shown in Figure 4. The HSI is gathered by the AVIRIS sensor [35]; the multispectral and panchromatic images are taken from the Quickbird satellite. The second row of Figure 4 shows the spectral intensity of a selected image pixel and the spectral resolution of the three types of images. Images with different numbers of bands, which reflect the spectral resolution, therefore require different methods for change detection.
HSIs have hundreds or even thousands of continuous and narrow bands, which can provide abundant spectral and spatial information. Multi-temporal HSIs are of great significance in distinguishing the subtle changes in ground objects through their high-dimensional feature information. The detailed information on spectral changes presents promising change detection performance. However, it increases the redundancy of the data and makes it difficult to interpret. Moreover, due to the generally low spatial resolution of HSIs, the textures around the pixels are vague, and mixed pixels occupy a large proportion. Change detection methods for HSIs must address the problems of high dimensionality, mixed pixels, high computational cost, and a limited training dataset. Effective AI algorithms can be employed to solve these problems and have been shown to achieve satisfactory performance [36][37][38][39].
Multispectral images can be acquired economically and stably, with spatial resolution ranging from low to high. They can provide rich colors, textures, and other properties. Images with high spatial resolution or very high spatial resolution (10 to 100 cm/pixel) can also reflect the structure information of the ground objects [40]. Consequently, they are widely used for change detection. Specifically, the most commonly used types of multispectral images for AI-based change detection methods are derived from the Landsat series of satellites and the Sentinel series of satellites [66,67], due to their low acquisition cost and high time and space coverage. In addition, other satellites, such as Quickbird [68][69][70][71][72][73][74], the SPOT series [75][76][77][78], the Gaofen series [14,79,80], and the Worldview series [81][82][83][84][85], provide high and very high spatial resolution images, and various aircraft provide very high spatial …
A panchromatic image has only one band (i.e., a black and white band), usually with a bandwidth of a few hundred nanometers. This wide bandwidth gives it a high signal-to-noise ratio, making panchromatic data available at high and very high spatial resolution. Therefore, panchromatic images are usually fused with multispectral images to obtain richer spectral and spatial information for change detection with high and very high spatial resolution. In addition, they can be used directly for change detection [95].
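One simple way to picture the fusion of panchromatic and multispectral images is a Brovey-style ratio: every multispectral band is scaled by the ratio between the panchromatic band and the mean of the multispectral bands. The sketch below is only an illustration of this idea and is not claimed to be the fusion method used in the cited studies; it assumes the multispectral image has already been resampled to the panchromatic grid, and the function name and arrays are hypothetical.

```python
import numpy as np


def brovey_pansharpen(ms, pan, eps=1e-12):
    """Brovey-style fusion of a multispectral image with a panchromatic band.

    ms:  (rows, cols, bands), already resampled to the panchromatic grid.
    pan: (rows, cols).
    """
    ms = ms.astype(np.float64)
    # Pseudo-panchromatic intensity: unweighted mean of the multispectral bands.
    intensity = ms.mean(axis=-1)
    # Per-pixel ratio injects the spatial detail of the panchromatic band.
    ratio = pan.astype(np.float64) / np.maximum(intensity, eps)
    return ms * ratio[..., None]


# Toy data (hypothetical): the panchromatic band is twice the band mean.
rng = np.random.default_rng(1)
ms = rng.uniform(0.1, 1.0, size=(4, 4, 3))
pan = 2.0 * ms.mean(axis=-1)
sharpened = brovey_pansharpen(ms, pan)
```

By construction the band mean of the fused image equals the panchromatic band, while the ratios between bands, and therefore the colors, are preserved.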
Optical RS images are widely utilized for change detection as they provide abundant spectral and spatial information. However, optical sensors rely upon the sun's illumination, and the wavelengths used are close to visible light, around 1 µm. Therefore, they are often affected by solar radiation and clouds. SAR, on the other hand, uses wavelengths of 1 cm to 1 m and has its own illumination source. Thus, it can image both day and night, in almost all weather conditions.

SAR Images
SAR is a technique that uses signal processing to improve the resolution beyond the limitation of the physical antenna aperture [96]. The sensor is mounted on an aircraft or a satellite and is used to make a high-resolution image of the Earth's surface. SAR is independent of atmospheric and sunlight conditions, so it has become a valuable source of information in change detection. With the development of SAR imaging technology, multi-platform, multi-band, multi-polarization SAR images provide more abundant data sources for change detection tasks. However, SAR images always suffer from speckle noise, which makes change detection more difficult than with optical RS images. The three key problems are: (1) suppressing speckle noise; (2) designing a change metric or a change indicator; and (3) using a threshold or a classifier based on a change metric to generate a final change map. Change detection methods that use AI techniques, especially an autoencoder (AE) [97][98][99][100][101][102][103][104][105][106][107] and a convolutional neural network (CNN) [108][109][110][111][112][113][114], to suppress speckle noise and extract features have been shown to be the state of the art. In their overall process and framework, these methods are similar to the methods based on optical RS images, and the detailed frameworks and AI models are analyzed in Sections 4 and 5.
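The three problems listed above map onto a classical, non-AI baseline that is useful for orientation: a mean filter for speckle suppression, the log-ratio operator as the change indicator, and a fixed threshold to produce the change map. The sketch below is such a baseline under stated assumptions and is not one of the AE- or CNN-based methods cited in the text. The function names, the window size, and the threshold are hypothetical, and the toy images are noise-free so that the result is easy to follow.

```python
import numpy as np


def box_filter(img, size=3):
    """Mean filter, used here as a very simple stand-in for speckle suppression."""
    pad = size // 2
    padded = np.pad(img.astype(np.float64), pad, mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    return windows.mean(axis=(-2, -1))


def log_ratio(img_t1, img_t2, eps=1e-6):
    """Absolute log-ratio change indicator for two intensity images."""
    return np.abs(np.log((img_t2 + eps) / (img_t1 + eps)))


def sar_change_map(img_t1, img_t2, threshold):
    """Filter both dates, compute the log-ratio, and threshold it."""
    indicator = log_ratio(box_filter(img_t1), box_filter(img_t2))
    return indicator > threshold


# Toy intensity images (hypothetical): an 8x8 block becomes 8 times brighter.
t1 = np.ones((16, 16))
t2 = np.ones((16, 16))
t2[4:12, 4:12] = 8.0

change_map = sar_change_map(t1, t2, threshold=1.0)
```

The ratio form suits the multiplicative nature of speckle, and the logarithm turns it into a difference. Note that the mean filter blurs the block boundary, so a few pixels just outside the changed block can also exceed the threshold.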

Street View Images
Unlike optical RS and SAR images, street view images are captured at eye-level instead of overhead. They provide more detailed information in relatively small areas and at more observation angles, which can be used for dynamic or real-time change detection. Change detection based on street view images focuses on changes in the dynamic urban visual landscape, such as the addition or removal of specific landmarks, pedestrians, vehicles, and other roadside buildings.
A critical challenge in detecting changes with street view images is how to identify noisy changes caused by varying illumination, camera viewpoints, occlusions, and shadows. These noisy changes are interwoven with semantic changes, making it difficult to define and measure the wanted semantic changes in street view images. Thus, using AI algorithms, mainly CNN [115][116][117][118][119][120], to learn deep features for change detection requires street view images that have been spatially registered.

Combining Heterogeneous Data
According to whether the multi-period data come from the same source, change detection methods can be divided into homogeneous data change detection and heterogeneous data change detection. Homogeneous data come from the same type of sensor and have the same properties, spectral distribution, and feature space. Heterogeneous data come from different types of sensors and have different properties and feature spaces, so they cannot be compared directly to produce a difference image. Although change detection using heterogeneous data is more challenging, it has fewer restrictions on the type of input data and can be used in more situations. Different sensors can complement each other to provide richer information on ground objects. For example, when optical RS images and SAR images are used together for change detection, the former can provide rich texture information, while the latter can be acquired without atmospheric restrictions. Such combinations can be used for emergency change detection in areas where data are insufficient or where disasters occur. Many studies have used AI methods to detect changes in SAR and optical RS images [16,66,[121][122][123][124][125][126][127]. In addition, GIS maps [128], point cloud data [91], DEMs [129,130], and digital surface models (DSMs) [131] are used in combination with optical RS images or SAR images for change detection. These different data types can satisfy different application requirements and are selected according to the actual situation.

Open Data Sets
Currently, there are some freely available data sets for change detection, which can be used as benchmark datasets for AI training and accuracy evaluation in future research. Detailed information is presented in Table 1.
It can be seen that the amount of open datasets that can be used for change detection tasks is small, and some of them have small data sizes. At present, there is still a lack of large SAR datasets that can be used for AI training. Most AI-based change detection methods are based on several SAR data sets that contain limited types of changes, e.g., the Bern dataset, the Ottawa dataset, the Yellow River dataset, and the Mexico dataset [24,103], which cannot meet the needs of change detection in areas with complex land cover and various change types. Moreover, their labels are not freely available. Street-view datasets are generally used for research of AI-based change detection methods in computer vision (CV). In CV, change detection based on pictures or video is also a hot research field, and the basic idea is consistent with that based on RS data. Therefore, in addition to street view image datasets, several video datasets in CV can also be used for research on AI-based change detection methods, such as CDNet 2012 [132] and CDNet 2014 [133]. Since they belong to the research field of video analysis, this paper will not review them in more detail. Those interested can refer to [132][133][134]. HRSCD [136] 291 co-registered pairs of RGB aerial images, with pixel-level change and land cover annotations, providing hierarchical level change labels, for example, level 1 labels include five classes: no information, artificial surfaces, agricultural areas, forests, wetlands, and water. WHU building dataset [88] 2-period aerial images containing 12,796 buildings, provided along with building vector and raster maps.
SZTAKI Air change benchmark [137,138] 13 aerial image pairs with 1.5 m spatial resolution, labeled as changed and unchanged at pixel level.
OSCD [139] 24 pairs of multispectral images acquired by Sentinel-2, labeled as changed and unchanged at pixel level.
Change detection dataset [140] 4 pairs of multispectral images with different spatial resolutions, labeled as changed and unchanged at pixel level.

MtS-WH [141]
2 large-size VHR images acquired by IKONOS sensors, with 4 bands and 1 m spatial resolution, labeled 5 types of changes (i.e., parking, sparse houses, residential region, and vegetation region) at scene level.
ABCD [92] 16,950 pairs of RGB aerial images for detecting washed buildings by tsunami, labeled damaged buildings at scene level.
xBD [142] Pre-and post-disaster satellite imageries for building damage assessment, with over 850,000 building polygons from 6 disaster types, labeled at pixel level with 4 damage scales.
AICD [143] 1000 pairs of synthetic aerial images with artificial changes generated with a rendering engine, labeled as changed and unchanged at pixel level.
Database of synthetic and real images [144] 24,000 synthetic images and 16,000 fragments of real season-varying RS images obtained by Google Earth, labeled as changed and unchanged at pixel level.

Type Data Set Description
Street view VL-CMU-CD [145] 1362 co-registered pairs of RGB and depth images, labeled ground truth change (e.g., bin, sign, vehicle, refuse, construction, traffic cone, person/cycle, barrier) and sky masks at pixel level.
PCD 2015 [119] 200 panoramic image pairs in the "TSUNAMI" and "GSV" subsets, with a size of 224 × 1024 pixels, labeled as changed and unchanged at pixel level.
Change detection dataset [146] Image sequences of city streets captured by a vehicle-mounted camera at two different time points, with the size of 5000 × 2500 pixels, labeled 3D scene structure changes at pixel level.

General AI-Based Change Detection Frameworks
The input of the change detection task is multi-temporal data, which are homogeneous or heterogeneous data in two or more periods. According to the deep feature extraction or latent feature representation learning process of the bi-temporal data, the AI-based change detection frameworks can be summarized into three types: single-stream, double-stream, and multi-model integrated.
In addition, we further analyze the unsupervised schemes used in these frameworks, which remain an important and challenging research issue in AI.

Single-Stream Framework
There are two main types of single-stream framework structures for AI-based change detection, as shown in Figure 5, namely a direct classification structure and a mapping transformation-based structure.
They usually only need a core AI model to achieve change detection, so they can be regarded as a single-stream structure. It is worth noting that, in practice, some studies have made improvements based on these structures to meet specific change detection purposes, and a detailed analysis is given below.

Direct Classification Structure
The direct classification methods use various data processing approaches to fuse the two or more periods of data into intermediate data, and a single AI-based classifier is then used to perform feature learning and carry out binary or multi-class classification of the fused data. That is, as shown in Figure 5a, this structure converts the change detection task into a classification task; it is also known as a two-channel structure in some of the literature [108,147]. Its two key research issues are the choice of the data fusion approach and of the AI-based classifier.
To obtain the fused data from multi-period data, the two most common approaches are change analysis methods and direct concatenation. Change analysis methods, such as CVA [47], differencing by the log-ratio operator [18,148], or change measures [103,149], directly provide change intensity information (i.e., the difference data) from multi-temporal data, which highlights change information and facilitates change detection. The direct concatenation method retains all the information of the multi-period data, so the change information is extracted by the subsequent classifier. In general, one-dimensional input data are directly concatenated [24,42,101,150–152], while two-dimensional data are concatenated by channel [111,112,153,154]. Moreover, fusing the original data with the difference data [21,99] is another good strategy, which keeps all the information while highlighting the difference information.
The classifier uses AI techniques to classify the fused data into two types (i.e., changed or unchanged) or multiple types (different types of changes) [77]. Its performance and related training data are the key to finally obtaining satisfactory change maps. More details are reviewed in Section 5.
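The fusion options described above can be illustrated with a minimal sketch. It assumes two co-registered single-band images stored as nested lists; the function names (`difference_image`, `log_ratio_image`, `concat_by_channel`, `concat_with_difference`) and the toy values are hypothetical and do not come from any of the cited works.

```python
import math

def difference_image(t1, t2):
    """Absolute difference: provides change intensity directly."""
    return [[abs(b - a) for a, b in zip(r1, r2)] for r1, r2 in zip(t1, t2)]

def log_ratio_image(t1, t2, eps=1e-6):
    """Log-ratio operator; eps is an assumed guard against log(0)."""
    return [[abs(math.log((b + eps) / (a + eps))) for a, b in zip(r1, r2)]
            for r1, r2 in zip(t1, t2)]

def concat_by_channel(t1, t2):
    """Direct concatenation: each pixel becomes a (period 1, period 2) tuple."""
    return [[(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(t1, t2)]

def concat_with_difference(t1, t2):
    """Original data plus difference data as a third channel."""
    return [[(a, b, abs(b - a)) for a, b in zip(r1, r2)]
            for r1, r2 in zip(t1, t2)]

# Toy 2 x 2 images: only the top-right pixel changes.
t1 = [[10.0, 10.0], [10.0, 10.0]]
t2 = [[10.0, 40.0], [10.0, 10.0]]
fused = concat_with_difference(t1, t2)
```

Any of these fused outputs could then be passed to a single classifier; which fusion works best is data-dependent and is not settled by this sketch.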

Mapping Transformation-Based Structure
The mapping transformation-based framework structure is usually used to detect changes in different domains or heterogeneous data. Its main idea is to use the AI method to learn the feature mapping transformation, and use it to perform feature transformation on one kind of data, as shown in Figure 5b. The transformed features correspond to the features of another kind of data. In short, it transforms data from one feature space to another feature space. Finally, the change map can be obtained by performing decision analysis on the corresponding features of the two kinds of data. In [16], a mapping neural network (MNN) is designed to learn the mapping function between the multi-spatial-resolution data, and the feature similarity analysis is then implemented to build a change map. This method also achieves change detection between SAR and optical images. In [60], the authors use an ANN to achieve relative radiometric normalization, and then detect changes of the two-period data under the same radiation condition. Moreover, using this idea of mapping transformation, several improved structures have been proposed for detecting changes in heterogeneous data [121][122][123][124] or different domain data [10].
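As a rough illustration of the mapping idea, the sketch below replaces the learned neural mapping with a one-dimensional least-squares linear fit, which is a deliberate simplification and not the method of [16] or [60]. It assumes a small set of sample pairs believed to be unchanged is available for fitting; `fit_linear_map`, `change_map`, and the threshold value are hypothetical.

```python
def fit_linear_map(xs, ys):
    """Least-squares fit of y ~ a * x + b (a stand-in for a learned mapping)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

def change_map(x_img, y_img, a, b, threshold):
    """Map x into the feature space of y, then flag large residuals as change."""
    return [1 if abs(y - (a * x + b)) > threshold else 0
            for x, y in zip(x_img, y_img)]

# Assumed unchanged training pairs from the two data sources.
a, b = fit_linear_map([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])

# Flattened toy images: the third pixel does not follow the mapping.
x_img = [1.0, 2.0, 3.0, 4.0]
y_img = [3.0, 5.0, 20.0, 9.0]
changes = change_map(x_img, y_img, a, b, threshold=1.0)
```

In practice the mapping between heterogeneous data is rarely linear, which is why the reviewed methods learn it with neural networks.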

Double-Stream Framework
Since the change detection task is usually based on two periods of data, that is, two inputs, the double-stream structure is very common for change detection and can be summarized into three types, as shown in Figure 6: a Siamese structure, a transfer learning-based structure, and a post-classification structure.

Siamese Structure
As shown in Figure 6a, the Siamese structure generally consists of two sub-networks with the same structure, i.e., feature extractors, which convert the input two-period data into feature maps. Finally, the change map is obtained by using change analysis (i.e., decision maker). The main advantage of this structure is that its two sub-networks are directly trained at the same time to learn the deep features of the input two-period data.
According to whether the weights of the sub-networks are shared, this structure can be divided into the pure-Siamese structure [22,68,94,117,155,156] and the pseudo-Siamese structure [79,109,157,158]. The main difference is that the sub-networks of the former extract the common features of the two-period data by sharing weights, whereas the sub-networks of the latter each extract features from their own input data, which increases the number of trainable parameters and the complexity, but also the flexibility. Similarly, the authors of [159] designed a triple network consisting of three sub-networks with shared weights for change detection.
Although this structure enables the feature extractor to directly learn deep features by supervised training with labeled samples, unsupervised training is more challenging. A common solution is to train the feature extractors individually in an unsupervised manner [105–107,160]. These pre-trained feature extractors provide the latent representation of the original data (i.e., feature maps) for further change detection. To generate change maps, the output feature maps of the two periods can be directly classified after channel concatenation [91,155,161], or can be used to produce difference maps using a certain distance metric [9], which are then used for further change analysis [162,163]. To retain multi-scale change information, feature maps at different depths can be concatenated for change detection [164–167], which has been reported to work well.
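The distance-metric route described above can be sketched as follows. This is a toy example under stated assumptions: each pixel is a short spectral vector, the feature extractor is a single linear layer with ReLU, the decision maker is a fixed threshold on the Euclidean distance, and the weights are given rather than trained. The names `extract`, `feature_distance`, and `siamese_change_map` are hypothetical.

```python
import math

def extract(pixel, weights):
    """One linear layer followed by ReLU, standing in for a sub-network."""
    return [max(0.0, sum(w * v for w, v in zip(row, pixel))) for row in weights]

def feature_distance(f1, f2):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))

def siamese_change_map(img1, img2, weights1, weights2, threshold):
    """Pure-Siamese when weights1 is weights2; pseudo-Siamese otherwise."""
    result = []
    for p1, p2 in zip(img1, img2):
        d = feature_distance(extract(p1, weights1), extract(p2, weights2))
        result.append(1 if d > threshold else 0)
    return result

# Shared (identity) weights, i.e., the pure-Siamese case.
shared = [[1.0, 0.0], [0.0, 1.0]]
img1 = [[1.0, 2.0], [3.0, 4.0]]   # two pixels, two bands each
img2 = [[1.0, 2.0], [0.0, 8.0]]   # second pixel differs
cmap = siamese_change_map(img1, img2, shared, shared, threshold=1.0)
```

Passing two different weight matrices gives the pseudo-Siamese variant, at the cost of twice as many parameters.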

Transfer Learning-Based Structure
The transfer learning-based structure is proposed to alleviate the lack of training samples and optimize the training process. Transfer learning uses training in one domain to enable better results in another domain and, specifically, the low- to mid-level features learned in the original domain can be transferred as useful features in the new domain [13]. The pre-trained AI model, as a feature extractor, is used to generate feature maps for two periods, and the feature extractors of the two periods can be the same, as shown in Figure 6b. Whether the pre-trained model can correctly extract the deep feature map or latent feature representation of the input data determines the performance of the change detection task.

The transfer learning-based structure usually has two training phases, namely the deep feature learning phase and the fine-tuning phase. In the deep feature learning phase, the AI model is usually pre-trained in a supervised manner with sufficient labeled samples from another data domain [67,110,168]. The fine-tuning phase is optional; in this phase, only a small number of labeled samples are required for fine-tuning [125,169–172] or for training an additional classifier [90,140], after which the change map can be obtained directly from the trained classifier. Without fine-tuning, final change maps can be obtained from the two-period feature maps using change analysis, such as low rank analysis [173], CVA [72], clustering [73,82], and thresholding [119,174]. This means that no more labeled samples are needed for further training. Moreover, based on the idea of transfer learning, the pre-trained AI model can also be used to generate training samples or masks to achieve an unsupervised scheme [78], which is a very practical strategy.

Post-Classification Structure
As shown in Figure 6c, the post-classification structure consists of two classifiers, so that change detection is converted into classification tasks; the classifiers can be trained jointly or independently. It provides a classification map for each period of data, and the change map with change directions can be obtained by comparing the classification maps. Nevertheless, the accuracy of the change detection results depends on the performance of the classifiers.
A number of studies [43,44,52,56,61,70,75,76,131,175,176] have shown that AI techniques increase the accuracy of land-cover classification to a notable level and that the results can be further used for change detection. By converting the direct geometric or spectral comparison into a comparison of labels, the post-classification structure can be regarded as a very general and practical structure, and it provides a type change matrix. Advantageously, it works robustly for data acquired under different acquisition conditions (illumination condition, sensor attitude, season, etc.) or even by different sensors [45,50,62,130]. Supervised training of the AI-based classifier requires a large number of training samples, which can be generated from existing GIS data representing land cover [128] or from thematic maps [177].
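The comparison step of the post-classification structure can be sketched in a few lines. The example assumes that the two classification maps have already been produced by some classifier and are stored as nested lists of class labels; the function names and the toy labels are hypothetical.

```python
from collections import Counter

def change_matrix(map_t1, map_t2):
    """Count every (class at period 1, class at period 2) transition."""
    pairs = [(a, b)
             for r1, r2 in zip(map_t1, map_t2)
             for a, b in zip(r1, r2)]
    return Counter(pairs)

def binary_change_map(map_t1, map_t2):
    """A pixel is marked as changed when its label differs between periods."""
    return [[int(a != b) for a, b in zip(r1, r2)]
            for r1, r2 in zip(map_t1, map_t2)]

map_t1 = [["forest", "forest"], ["water", "urban"]]
map_t2 = [["forest", "urban"], ["water", "urban"]]

matrix = change_matrix(map_t1, map_t2)
cmap = binary_change_map(map_t1, map_t2)
```

The off-diagonal entries of `matrix` give the change directions (here, one forest-to-urban pixel), which also makes clear why any classification error in either map propagates directly into the change map.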

Multi-Model Integrated Structure
Many works have integrated multiple AI models to improve the performance of change detection methods. Given their large number and complex structures, only a representative structure is summarized, as shown in Figure 7. The multi-model integrated framework is a hybrid structure that is similar to the double-stream structure, but it contains more types of AI models and can also be trained in multiple stages. Change detection is a spatiotemporal analysis and can be achieved by acquiring the spatial-spectral features through an AI-based feature extractor as a spectral-spatial module, and then modeling temporal dependency through an AI-based classifier as a temporal module [14,38,53,74]. Moreover, this hybrid structure has been used for unsupervised change detection [100] and object-level change detection [87]. Such integration improves performance but makes the whole change detection process more complicated.

Unsupervised Schemes in Change Detection Frameworks
AI-based change detection frameworks usually include feature extractors or classifiers, which require supervised or unsupervised training. Since obtaining a large number of labeled samples for supervised training is usually time-consuming and labor-intensive, many efforts have been made to achieve AI-based change detection in an unsupervised or semi-supervised manner. As introduced in Section 4.2.2, transfer learning can reduce or even eliminate the need for training samples, but it is not a purely unsupervised scheme, as samples from other domains are required. The most commonly used unsupervised scheme is to use a change analysis method and a sample selection strategy to select samples that are definitely changed and/or unchanged as training samples for AI models. Its flow chart is shown in Figure 8a. It can be seen that there are two change detection stages in this scheme. The first stage, i.e., preclassification, is usually simple but worth studying; most preclassification methods are unsupervised and can be implemented with difference analysis and clustering [101], such as K-means [162], fuzzy c-means (FCM) [90,99,100,111,151,160,165,178–181], spatial FCM [102,154], or hierarchical FCM [21,113]. In some works, this stage is implemented by threshold analysis [18,39], saliency analysis [78], or well-designed rules [38,83,84,124,148,182,183]. After obtaining high-confidence changed and/or unchanged samples, the AI model can be trained in a supervised manner for change detection in the second stage. Moreover, another commonly used unsupervised scheme is based on the latent change map, as shown in Figure 8b. In addition to the pre-trained model obtained by transfer learning, it can be generated by an unsupervised AI model (e.g., AEs), and the final change map is then generated by using a clustering algorithm [23,79,98,107,157,163,184].
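The preclassification stage of the two-stage scheme can be sketched as follows. The example assumes a flattened difference image, uses a plain two-cluster K-means in one dimension, and keeps only samples close to a cluster centre as pseudo-labels. The `margin` rule for "high confidence" is an assumption made for illustration and is not taken from the cited works.

```python
def kmeans_1d(values, iters=20):
    """Two-cluster K-means on scalar change intensities."""
    c0, c1 = min(values), max(values)
    for _ in range(iters):
        g0 = [v for v in values if abs(v - c0) <= abs(v - c1)]
        g1 = [v for v in values if abs(v - c0) > abs(v - c1)]
        if g0:
            c0 = sum(g0) / len(g0)
        if g1:
            c1 = sum(g1) / len(g1)
    return c0, c1

def select_samples(values, c0, c1, margin=0.25):
    """Pseudo-labels: 0 = unchanged, 1 = changed, None = uncertain."""
    gap = c1 - c0
    labels = []
    for v in values:
        if v <= c0 + margin * gap:
            labels.append(0)
        elif v >= c1 - margin * gap:
            labels.append(1)
        else:
            labels.append(None)   # left out of the training set
    return labels

# Flattened toy difference image.
diff = [0.1, 0.2, 0.15, 0.9, 1.0, 0.5]
c0, c1 = kmeans_1d(diff)
pseudo_labels = select_samples(diff, c0, c1)
```

In the second stage, the samples with a label of 0 or 1 would serve as training data for the AI model, which then classifies all pixels, including the uncertain ones.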
Although unsupervised change detection does not require labeled training samples, sometimes the lack of prior knowledge makes it unsuitable for change detection involving semantic information. Weakly and semi-supervised schemes use inaccurate or insufficient labeled samples as a priori knowledge to solve this problem, which can be implemented with label aggregation [97], iterative learning [58,185], deep generative models [186] (see Section 5.6 for a more detailed review), sample generation strategies [156], or novel cost functions [36,187].

Mainstream Networks in AI
Section 4 summarizes general AI-based change detection frameworks, but does not describe their feature extractors and classifiers in detail. In this section, the various network structures in AI used for change detection are specifically analyzed. At present, they mainly include autoencoders (AEs), deep belief networks (DBNs), CNNs, recurrent neural networks (RNNs), pulse-coupled neural networks (PCNNs), and generative adversarial networks (GANs), as can be seen in Figure 9. In addition, other networks or methods in AI used for change detection are also summarized briefly.

Autoencoder
The basic structure of the AE is shown in Figure 9a. It mainly includes two parts: an encoder and a decoder. The encoder encodes the input vector $x$ to obtain the latent features $h(x)$, which can be formulated as $h(x) = f(Wx + b)$. The decoder, which can be formulated as $\tilde{x} = f(W'h(x) + b')$, reconstructs the learned latent features to output a vector $\tilde{x}$ that should be as close as possible to the original $x$. The weights $W$, $W'$ and biases $b$, $b'$ are trainable parameters and can be obtained through unsupervised schemes. Intuitively, an AE can be used for feature dimensionality reduction, similar to PCA, but its performance is better due to the strong feature learning capabilities of the neural network. Thus, it is widely used in change detection tasks as the feature extractor. The commonly used AE models are stacked AEs [97,98,104], stacked denoising AEs [16,101,106,121–123,151,160,188], stacked Fisher AEs [189], sparse AEs [80], denoising AEs [102], fuzzy AEs [105], and contractive AEs [99,103]. These AEs preserve spatial information by expanding pixel neighborhoods into vectors, while convolutional AEs are implemented directly through convolution kernels [170,190]. Owing to these characteristics, AEs can be used to implement change detection in an unsupervised manner and perform well.
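The encoder and decoder mappings described above can be written out as a small forward pass. This is a sketch only: the sigmoid is an assumed choice for the activation, the parameter names (`W_enc`, `b_enc`, `W_dec`, `b_dec`) are hypothetical, and the weights are set to zero instead of being learned.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Compute W v + b for a weight matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias
            for row, bias in zip(W, b)]

def encode(x, W_enc, b_enc):
    """Latent features: activation applied to W_enc x + b_enc."""
    return [sigmoid(z) for z in affine(W_enc, x, b_enc)]

def decode(h, W_dec, b_dec):
    """Reconstruction: activation applied to W_dec h + b_dec."""
    return [sigmoid(z) for z in affine(W_dec, h, b_dec)]

def reconstruction_error(x, x_rec):
    """Mean squared error, the quantity minimized during training."""
    return sum((a - b) ** 2 for a, b in zip(x, x_rec)) / len(x)

# Three input values compressed to two latent features.
x = [0.5, 0.5, 0.5]
W_enc, b_enc = [[0.0] * 3 for _ in range(2)], [0.0] * 2
W_dec, b_dec = [[0.0] * 2 for _ in range(3)], [0.0] * 3

h = encode(x, W_enc, b_enc)
x_rec = decode(h, W_dec, b_dec)
err = reconstruction_error(x, x_rec)
```

Training adjusts the four parameter sets to reduce `err` over many inputs, which requires no labels; after training, `encode` alone serves as the feature extractor.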

Deep Belief Network
A DBN is a generative graphical model and learns to probabilistically reconstruct its inputs. It can be formed by stacking multiple simple and unsupervised networks such as restricted Boltzmann machines (RBMs) or AEs. As shown in Figure 9b, a DBN consists of multiple layers of hidden units, with connections between the layers. However, its units within the same layer are not connected to each other and each hidden layer serves as the visible layer for the next. As a feature extractor, it can be trained greedily, i.e., one layer at a time, and appears in many unsupervised change detection methods [23,37,157,183]. On the other hand, the deep Boltzmann machine (DBM), as a graph similar to DBN but undirected, can also achieve such a function [182].

Convolutional Neural Network
Deep features extracted by deep neural networks can be divided into two categories according to whether spatial relationships are considered. One type uses one-dimensional data as input (e.g., DBNs and AEs). Their input is a column vector converted from the input image patch, which loses the intact spatial information. The other type uses two-dimensional data as input and thus considers the spatial relationship; its typical representative is the CNN. As shown in Figure 9c, it detects features hierarchically through local connections and shared weights and is closer to the human visual perception mechanism. Readers unfamiliar with CNNs can refer to the book by Goodfellow et al. [191].
Due to its strong ability to automatically learn deep features, the CNN has achieved a satisfactory performance in various image-processing tasks. Many classic CNNs and their improvements are used as classifiers or feature extractors for change detection, such as VGGNet [78,86,119,140,164,168], CaffeNet [174], SegNet [192], UNet [169,193], InceptionNet [67], and ResNet [194,195]. Nevertheless, most of the network structures in the literature are newly designed; a CNN generally consists of an input layer and a series of convolutional layers, pooling layers, activation functions, and fully connected layers. To achieve special functions, some CNNs integrate special structures for change detection. For instance, the region CNN (R-CNN), originally designed for object detection in CV, contains a region-proposal structure to predict the regions of the changed objects [87,196,197]. The PCANet, with its convolution filter banks chosen from PCA filters, is able to reduce the influence of speckle noise and has been used in SAR image change detection [178,180]. More recent work proposes a kernel PCA convolution to extract representative spatial-spectral features from RS images in an unsupervised manner [163].
In conclusion, the use of CNNs enables the change detection method to reach the state of the art, but there is no systematic way yet to design and/or train the network, which is a long-standing problem in the RS community.

Recurrent Neural Network
The input for change detection includes two or more periods of data, so the change detection task can be converted into a process of obtaining change information directly from the multi-period sequence data. An RNN, as a memory network whose input is sequence data, is very suitable for this situation. As shown in Figure 9d, its core part is a directed graph that can be unfolded into a chain, with units (i.e., RNN cells) linked in sequence. An RNN cell has two inputs: one is the current input $x_t$, which is used to update the state in real time, and the other is the hidden state $h_{t-1}$ from the previous time step, which is used to remember the state. The network shares the same set of parameters $F$ across all time steps.
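The recurrence described above can be sketched with a basic cell. The tanh update used here is the common textbook (Elman-style) form and is an assumption, since the text does not specify the cell equations; `W_x`, `W_h`, and `b` are hypothetical names for the shared parameters.

```python
import math

def rnn_cell(x_t, h_prev, W_x, W_h, b):
    """One step: combine the current input with the previous hidden state."""
    h_t = []
    for row_x, row_h, bias in zip(W_x, W_h, b):
        z = (sum(w * x for w, x in zip(row_x, x_t))
             + sum(w * h for w, h in zip(row_h, h_prev))
             + bias)
        h_t.append(math.tanh(z))
    return h_t

def run_rnn(sequence, W_x, W_h, b):
    """Unfold the cell along the sequence, reusing the same parameters."""
    h = [0.0] * len(b)
    states = []
    for x_t in sequence:
        h = rnn_cell(x_t, h, W_x, W_h, b)
        states.append(h)
    return states

# One pixel observed at three dates (one band), one hidden unit.
sequence = [[0.5], [0.5], [0.9]]
W_x, W_h, b = [[1.0]], [[0.5]], [0.0]
states = run_rnn(sequence, W_x, W_h, b)
```

The final hidden state summarizes the whole multi-temporal sequence and could be passed to a classifier to decide whether, and how, the pixel changed.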
A long short-term memory (LSTM) network, a special RNN that alleviates the problem of vanishing and exploding gradients during long-sequence training, has been employed as a temporal module (see Section 4.3) in change detection tasks [14,38,53,64,74]. In [51], the authors used an improved LSTM network to acquire and record the change information of multi-temporal RS data. Advantageously, the trained model could be transferred to other data domains; that is, it has a good generalization ability. Further, in [52], based on the same idea of knowledge transfer, the authors proposed an RNN-based framework to detect the annual urban dynamics of four cities. These methods can help to address the problems of temporal spectral variance and insufficient samples in the long-term detection of urban change.

Pulse-Coupled Neural Network
The PCNN is a bionic neural network inspired by the visual cortex of mammals. Unlike traditional neural networks, it does not require a learning and training stage to extract effective information from very complex backgrounds, which means that it is unsupervised and can easily be used as a feature extractor. As shown in Figure 9e, it mainly consists of three parts: a receptive field, a modulation field, and a pulse generator. A PCNN receives a two-dimensional input image, and each neuron corresponds to one pixel in the image. The value of each pixel serves as an external stimulus for its neuron, and the connected neighboring neurons provide local stimuli. The external and local stimuli are combined in the modulation field and passed to the pulse generator to produce a pulsed output. As the number of iterations increases, the PCNN generates a pulse sequence that can be used for image segmentation and feature extraction [198] and, similarly, for change detection [54,66,71,85,124,199–201].

Generative Adversarial Networks
GANs are algorithmic architectures in which two neural networks, a generator and a discriminator, compete with each other in a game to obtain the best generation and discrimination models [202], as shown in Figure 9f. The generator learns to generate plausible data that serve as negative training examples for the discriminator, while the discriminator learns to distinguish these fake data from real data. Therefore, with a small amount of labeled data (i.e., real data), a well-performing discrimination model can be trained for change detection [17]. Rather than using random Gaussian noise to generate fake data [83], the authors of [203] used a CNN-based generator to produce fake difference maps based on the joint distribution of two-period data, which can help to weaken the impact of bad pixels on the network. In [204], the authors designed a W-Net as a generator to reduce the number of network parameters and make the network easier to train, and used many manually annotated samples to improve the reliability of the results. More recent work [127] proposed a conditional GAN for change detection in heterogeneous data, where a translation network and an approximation network were used to transform the heterogeneous data into the same feature space, so that the change map could then be obtained by direct comparison. Similarly, a coupling translation network was employed in [126]. For mapping landslides, the authors of [93] used a mapping generator to transform the pre- and post-landslide images into the same domain, and a Siamese network was then employed to detect landslide changes.
To obtain a well-functioning GAN, the methods need a well-designed loss function and a good training strategy; otherwise, the model results may be unsatisfactory due to the freedom of the neural network. In addition, real data are needed to ensure the reliability of the network. These are challenges that exist in many practical applications.
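To make the role of the loss function concrete, the following sketch computes the standard adversarial losses from discriminator outputs. It assumes the discriminator returns probabilities in (0, 1) and uses the common non-saturating generator loss; the cited change detection methods may use different or additional loss terms.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy: real samples labelled 1, generated samples 0.

    d_real : discriminator probabilities for real samples
    d_fake : discriminator probabilities for generated samples
    """
    return -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))

def generator_loss(d_fake, eps=1e-12):
    """Non-saturating generator loss: reward fakes the discriminator accepts."""
    return -np.mean(np.log(d_fake + eps))

# A discriminator that cannot tell real from fake outputs 0.5 everywhere
undecided = np.full(8, 0.5)
print(discriminator_loss(undecided, undecided))   # about 2 * ln(2)

# A confident, correct discriminator has a lower loss,
# while the generator's loss rises because its fakes are rejected
d_real, d_fake = np.full(8, 0.99), np.full(8, 0.01)
print(discriminator_loss(d_real, d_fake), generator_loss(d_fake))
```

Training alternates between minimizing the two losses, which is why an unbalanced schedule or a poorly chosen loss can leave one network far ahead of the other.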

Other Artificial Neural Networks and AI Methods
Many types of artificial neural networks exist in AI, and the mainstream network structures used for change detection are described above. Unlike deep neural networks, which require a large number of training samples to learn high-level abstract features, other networks such as Hopfield networks [47,48,65,205–207], back propagation networks [42,149,208,209], multilayer perceptrons (MLPs) [70,210–214], extreme learning machines [215], and self-organizing map (SOM) networks [55,216–221] have a shallow structure, low sample size requirements, and an easy training process. They are therefore also widely used in change detection tasks and can achieve satisfactory results. Since they can be regarded as traditional machine learning techniques and are covered by existing reviews [7,222,223], we do not discuss them in more detail here because of space limitations.
In addition to neural networks, other AI techniques are used to implement change detection. Recently, dictionary learning has been employed; like AEs, it focuses on learning internal feature representations from datasets [141,176,224,225]. Cellular automata (CA), spatially and temporally discrete models inspired by cellular behavior, can help to model future changes in LULC [226] and predict urban spatial expansion [227]. The development of these AI techniques has significantly promoted research on change detection and helps to develop more automatic, intelligent, and accurate methods to meet the needs of various applications.

Applications
In practical applications, the final change maps can be grouped, according to the change information they carry, into four types: binary maps, one-class maps, from-to maps, and instance maps, as can be seen in Figure 10. A binary map uses 1 and 0 to indicate change and no change. It records any change but provides no information on the type of the changed ground object. A one-class map provides single-type change information, which indicates the appearance and disappearance of a specific type of ground object. For example, the result of building change detection is a one-class map indicating the addition and removal of buildings, which can be used for urban management [86]. A from-to map provides change transfer information, indicating that the ground object has changed from one type to another, and these change types are determined by the classifier [128]. A change instance map provides boundaries for each change instance, which can be the result of object-based change detection [89]. These change maps can be generated by trained AI models and used in various applications. To demonstrate the potential future application demands and development possibilities of AI-based change detection techniques, the current attempts and work in various application fields are summarized in this section. The development of AI-based change detection techniques has greatly facilitated many applications and has improved their automation and intelligence. Most AI-based change detection methods generate binary maps, and these studies focus on the algorithm itself rather than on a specific application field. They can therefore be considered generally suitable for LULC change detection. In this section, we focus on the techniques that are associated with specific applications, which can be broadly divided into four categories:
• Urban contexts: urban expansion, public space management, and building change detection;
• Resources and environment: human-driven environmental changes, hydro-environmental changes, sea ice, surface water, and forest monitoring;
• Natural disasters: landslide mapping and damage assessment;
• Astronomy: planetary surfaces.
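The difference between the map types can be illustrated with a small sketch that derives a binary map, a one-class map, and a from-to map from two per-date classification maps. The class codes, the encoding of transitions as `from * n_classes + to`, and the function name are assumptions for illustration. Instance maps are omitted because they additionally require delineating individual objects.

```python
import numpy as np

def derive_change_maps(cls_t1, cls_t2, target_class, n_classes):
    """Derive three change map types from two classification maps.

    cls_t1, cls_t2 : integer class maps of the same shape (dates 1 and 2)
    target_class   : class of interest for the one-class map
    n_classes      : number of classes, used to encode from-to transitions
    """
    cls_t1 = np.asarray(cls_t1, dtype=int)
    cls_t2 = np.asarray(cls_t2, dtype=int)

    # Binary map: 1 = change, 0 = no change (no information on the type)
    binary = (cls_t1 != cls_t2).astype(np.uint8)

    # One-class map: +1 = target appeared, -1 = target disappeared
    one_class = np.zeros(cls_t1.shape, dtype=np.int8)
    one_class[(cls_t1 != target_class) & (cls_t2 == target_class)] = 1
    one_class[(cls_t1 == target_class) & (cls_t2 != target_class)] = -1

    # From-to map: transition code for changed pixels, -1 for unchanged ones
    from_to = np.where(binary == 1, cls_t1 * n_classes + cls_t2, -1)
    return binary, one_class, from_to

# Hypothetical class codes: 0 = vegetation, 1 = building, 2 = water
t1 = np.array([[0, 0],
               [1, 2]])
t2 = np.array([[0, 1],
               [1, 0]])
binary, one_class, from_to = derive_change_maps(t1, t2, target_class=1, n_classes=3)
print(binary)     # [[0 1] [0 1]]
print(one_class)  # [[0 1] [0 0]]
print(from_to)    # [[-1  1] [-1  6]]
```

Here the from-to code 1 means vegetation to building (0 * 3 + 1) and 6 means water to vegetation (2 * 3 + 0). The binary map marks both pixels as changed, whereas the one-class map keeps only the new building.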
We provide an overview of the various change detection techniques in the literature for the different application categories. The works and data types associated with these applications are listed in Table 2.

Table 2. Summary of main applications of AI-based change detection techniques.

| Category | Application | Data Types |
|---|---|---|
| Urban contexts | Urban expansion | a. Satellite images [52,228]; b. SAR images [229] |
| | Public space management | Street view images [117] |
| | Building change detection | a. Aerial images [86,89,90]; b. Satellite images [85,192]; c. Satellite/Aerial images [88,94]; d. Airborne laser scanning data and aerial images [91]; e. SAR images [112]; f. Satellite images and GIS map [177] |
| Resources & environment | Human-driven environmental changes | Satellite images [69] |
| | Hydro-environmental changes | Satellite images [56] |
| | Sea ice | SAR images [171] |
| | Surface water | Satellite images [230,231] |
| | Forest monitoring | Satellite images [45,63,150,196,232] |
| Natural disasters | Landslide mapping | a. Aerial images [20,93]; b. Satellite images [129,214,233] |
| | Damage assessment | a. Satellite images (caused by tsunami [190,234], particular incident [156], flood [235], or earthquake [19]) |
| Astronomy | Planetary surfaces | [170] |

Urbanization is a significant factor causing land surface change. As a result of population growth, urban expansion plays an important role in transforming natural land cover into urban facilities. In [52], the authors proposed a new framework based on transfer learning and an RNN for urban area extraction and change detection. The overall accuracy of its single-year urban maps is approximately 96% across the four target cities (Beijing, New York, Melbourne, and Munich). Using a genetic-algorithm-evolved ANN [228] or a CNN [229], urban changes were obtained by differencing the predicted urban distribution maps. For public space management, change detection based on street view images is a good way to identify the encroachment of public spaces. In [117], a CNN using a Siamese structure and transfer learning achieved 98.3% pixel accuracy on the VL-CMU-CD dataset. In addition, many studies focus on building changes.
Due to the small scale of buildings, change detection is usually carried out on high- or very-high-spatial-resolution RS data, such as aerial photos [86,90] and satellite images from QuickBird [192] or WorldView-2 [85]. However, because the experimental data differ, it is difficult to evaluate which method performs best. Although some methods are based on the WHU building dataset [88], they unfortunately use different areas of it, which makes a direct comparison difficult. In [89], the authors proposed two CNN models, a mask R-CNN for object-based instance segmentation and a multi-scale CNN for pixel-based semantic segmentation, and the intersection over union (IoU) for buildings exceeded 0.83. In [94], the authors proposed a pyramid feature-based attention-guided Siamese network to detect building changes, and the IoU of the change map exceeded 0.97. Sometimes, constant cloud coverage prevents optical RS images from being fully utilized, and SAR data represent a good alternative [112]. In addition, to obtain more building information for change detection, airborne laser scanning data [91] and building thematic data [177] can provide 3-D information and a priori knowledge of buildings, respectively, and have proved to be effective.
Land cover changes typically reflect changes in climatology and hydrology. Thus, AI-based change detection techniques can provide effective methods to monitor changes in resources and the environment. For example, forest monitoring is usually achieved by detecting changes in multi-period Landsat satellite images [45,63,150,196,232]. In addition to the Landsat data, the authors of [56] used an ANN to investigate LULC changes and their effect on outlet runoff by detecting LULC change locations. Surface water is essential for humans; using an ANN [230] or a CNN [231], its changes can be detected effectively. Obtaining sea ice changes is crucial for navigation safety and climate research in the polar regions. The authors of [171] employed a transfer learning-based framework and a CNN model to detect sea ice changes from two-period SAR images, and obtained results with a kappa coefficient exceeding 94%.
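The accuracy figures quoted in this section (overall or pixel accuracy, IoU, and the kappa coefficient) can all be computed from the confusion matrix of a predicted binary change map against a reference map. The following is a minimal sketch using the standard definitions; the function name and the toy maps are illustrative.

```python
import numpy as np

def change_metrics(pred, ref):
    """Overall accuracy, IoU of the change class, and Cohen's kappa."""
    pred = np.asarray(pred).astype(bool).ravel()
    ref = np.asarray(ref).astype(bool).ravel()

    tp = int(np.sum(pred & ref))     # change correctly detected
    tn = int(np.sum(~pred & ~ref))   # no change correctly detected
    fp = int(np.sum(pred & ~ref))    # false alarm
    fn = int(np.sum(~pred & ref))    # missed change
    n = tp + tn + fp + fn

    oa = (tp + tn) / n
    union = tp + fp + fn
    iou = tp / union if union else float("nan")

    # Expected agreement by chance, from the row and column totals
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (oa - pe) / (1.0 - pe) if pe != 1.0 else float("nan")
    return {"overall_accuracy": oa, "iou": iou, "kappa": kappa}

pred = np.array([[1, 1],
                 [0, 0]])
ref = np.array([[1, 0],
                [0, 0]])
print(change_metrics(pred, ref))
# overall_accuracy = 0.75, iou = 0.5, kappa = 0.5
```

Because unchanged pixels usually dominate, overall accuracy can be high even when the change class is poorly detected. IoU and kappa are more informative in that case.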
Natural disasters are powerful agents that change the appearance of the landscape. Therefore, change detection techniques based on RS data are important tools for monitoring natural disasters. For example, more automatic and accurate landslide mapping can be achieved through CNN-based change detection methods [20,93,129,233]. Damage assessment is another important application field of change detection. After a natural disaster, AI-based change detection techniques can help to identify damaged areas using pre-event and post-event data. Existing research mainly covers tsunamis [92,119,190,234,236], floods [235], fires [104], and earthquakes [19,110].
Many change detection techniques focus on applications that are closely related to people's daily lives, but in [170], a new AI-based approach is proposed for planetary surface change detection, employing a transfer learning-based framework and convolutional AE models. The experiments showed that the proposed method was superior to a difference image-based method.

Challenges and Opportunities for AI-Based Change Detection
By combining the general AI-based change detection frameworks and the network structures summarized in Sections 4 and 5, change detection for various applications can be implemented. The design of the AI model usually needs to consider the multi-period input data type, the training set size, and the desired change map. Specifically, for heterogeneous data, a mapping transformation-based structure or a pseudo-Siamese structure should be considered first. To obtain from-to change maps, a post-classification structure is the best choice. If training samples are insufficient, a transfer learning-based structure can help alleviate the problem, and the use of AEs and GANs can also reduce the dependence on ground truth. Change detection based on long-term sequence data is usually implemented using a multi-model integrated structure and an RNN model. A CNN has strong feature extraction capability and is the best choice when there are sufficient training samples.
The applications reviewed above demonstrate that AI techniques have achieved great success in change detection within the RS community. However, many challenges remain, and they relate to the following:
• The development of various platforms and sensors brings significant challenges, such as high-dimensional datasets (high spatial resolution and hyperspectral features), complex data structures (nonlinear and overlapping distributions), and nonlinear optimization problems (high computational complexity). The complexity of multi-source data makes it much harder to learn robust and discriminative representations from training data with AI techniques. This can be considered the challenge of heterogeneous big data processing;
• Supervised AI methods require massive numbers of training samples, which are usually obtained through time-consuming and labor-intensive processes such as human interpretation of RS products and field surveys. Achieving a robust AI-based model with insufficient training samples is a major challenge. Unsupervised AI techniques need to be developed;
• Many efficient and accurate AI models and frameworks exist, as reviewed in Sections 4 and 5, and researchers continue to propose novel AI-based change detection approaches. It remains a great challenge, however, to choose an efficient approach and ensure its accuracy for different applications. The reliability of AI needs to be considered in practical applications.
Some researchers have explored solutions to these problems and proposed useful strategies. We will discuss them separately and give our views.

Heterogeneous Big Data Processing
Heterogeneity is one of the main characteristics of big data, and heterogeneous data cause problems in the generation and analysis of change detection results [237]. From the data source perspective, RS technology can provide various data types for change detection, such as SAR, GIS data, high-resolution satellite images, and various time and space measured data. These data, with their highly variable types and formats, are difficult to use because of missing values, high data redundancy, and untruthfulness. Moreover, the generalization ability of existing AI methods needs to be improved in RS data processing, especially in heterogeneous big data processing [88]. Therefore, in our opinion, the following aspects need further study:
• Although some AI-based change detection methods based on heterogeneous data have achieved satisfactory results, as summarized in Section 3.1.4, the sensor types and data sizes in these studies are relatively limited. Moreover, they mainly consider change detection between data from different sources rather than the fusion of data from the same period. The full use of multi-source data from the same period (e.g., optical RS images and DEM) and of data fusion theories (i.e., mutual compensation of various types of data), combined with AI techniques, would help to improve the accuracy of change detection;
• Current change detection methods mainly depend on 2D information. With the development of 3D reconstruction techniques, using 3D data to detect changes in buildings and other objects is also a direction for future development [6]. Among such techniques, 3D reconstruction based on oblique images or laser point cloud data, and 3D information integration based on aerial imagery and ground-level street view imagery (i.e., air-ground integration), are hot research topics. There are still no effective AI techniques that implement 3D change detection;
• The processing of RS big data requires a large amount of computing resources, which limits the implementation of AI models. For example, large-format data usually need to be processed in blocks, which easily leads to edge problems. A large amount of data also calls for a large number of trainable parameters in the AI model, which makes training difficult and consumes substantial computing resources. It is therefore necessary to balance the amount of data against the number of trainable parameters. These issues pose challenges to the design of AI-based change detection approaches.
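One common way to mitigate the edge problems of block-wise processing is to cut overlapping tiles and average the predictions where tiles overlap. The sketch below illustrates this idea; the tile size, the overlap, and the `predict_fn` callable (standing in for a trained model that returns one value per pixel) are assumptions and are not taken from the reviewed methods.

```python
import numpy as np

def tile_starts(length, tile, stride):
    """Start offsets so that tiles of size `tile` cover [0, length)."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] != length - tile:
        starts.append(length - tile)  # last tile sits flush with the border
    return starts

def predict_in_blocks(image, predict_fn, tile=256, overlap=32):
    """Apply predict_fn tile by tile and average overlapping predictions.

    image      : (H, W) or (H, W, C) array
    predict_fn : hypothetical model call, maps a block to an (h, w) score map
    """
    assert 0 <= overlap < tile, "overlap must be smaller than the tile size"
    H, W = image.shape[:2]
    stride = tile - overlap
    acc = np.zeros((H, W), dtype=float)   # sum of predictions per pixel
    cnt = np.zeros((H, W), dtype=float)   # number of tiles covering each pixel
    for y in tile_starts(H, tile, stride):
        for x in tile_starts(W, tile, stride):
            block = image[y:y + tile, x:x + tile]
            acc[y:y + tile, x:x + tile] += predict_fn(block)
            cnt[y:y + tile, x:x + tile] += 1.0
    return acc / cnt

# Toy check with an identity "model": stitching must reproduce the input
image = np.arange(100 * 120, dtype=float).reshape(100, 120)
stitched = predict_in_blocks(image, lambda block: block, tile=32, overlap=8)
print(np.allclose(stitched, image))  # True
```

A larger overlap gives each border pixel more context at the cost of redundant computation, which is one instance of the trade-off between data volume and computing resources mentioned above.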
In short, heterogeneous big data should be considered when designing AI models for change detection so that it can be practically used for RS big data processing, which is worth pursuing.

Unsupervised AI
Although domain knowledge can be used to help design representations in traditional machine learning methods, the quest for AI is motivating the design of more powerful unsupervised representation-learning algorithms [238]. This is because unsupervised AI can learn hierarchical features directly from the data itself and can be used to make data-driven decisions. Research on unsupervised AI can be considered from the following aspects:

• Because labeled samples for training efficient AI models are scarce, many researchers have devoted great effort to this problem in the past few years and have consistently produced impressive results. New unsupervised AI techniques are constantly emerging, including GANs, transfer learning, and AEs, as summarized in Section 4.4. Although these techniques alleviate the lack of samples to a certain extent, there is still room for improvement;
• Change detection is generally regarded as a low-likelihood problem (i.e., the unchanged area in the change map is much larger than the changed area), with uncertainty in the location and direction of change. Current unsupervised AI techniques do not easily solve this problem because they lack prior knowledge. Short of fully supervised AI, weakly and semi-supervised AI techniques are feasible solutions, but further research is needed to improve their performance. Nevertheless, purely unsupervised AI techniques for change detection should be the ultimate goal;
• One of the reasons for studying unsupervised AI techniques is the lack of training samples, i.e., prior knowledge. Considering that the Internet has entered the Web 2.0 era (emphasizing user-generated content, ease of use, participatory culture, and interoperability for end users), using crowd-sourced data as a priori knowledge is a good alternative solution. For example, OpenStreetMap [32], a free wiki world map, can provide a large amount of annotation data labeled by volunteers for the training of AI models. Although the label precision of some crowd-sourced data is not high, the AI model can still be trained in a weakly supervised manner to achieve change detection.
Moreover, given the current trends in CV, unsupervised AI techniques will remain a hot research field and will become even more popular in change detection.
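The imbalance between unchanged and changed pixels noted above can be made concrete with a small sketch. When some labels are available (e.g., in weakly or semi-supervised settings), a widely used mitigation is to weight the loss by inverse class frequency. This is a generic technique that the reviewed works do not necessarily use, and the weighting scheme below is an assumption for illustration.

```python
import numpy as np

def weighted_bce(prob, label, eps=1e-7):
    """Binary cross-entropy with inverse-frequency class weights.

    prob  : predicted change probabilities
    label : reference labels (1 = changed, 0 = unchanged)
    """
    prob = np.clip(np.asarray(prob, dtype=float), eps, 1.0 - eps)
    label = np.asarray(label, dtype=float)
    n_pos = label.sum()
    n_neg = label.size - n_pos
    # Each class contributes equally overall, regardless of its pixel count
    w_pos = label.size / (2.0 * max(n_pos, 1.0))
    w_neg = label.size / (2.0 * max(n_neg, 1.0))
    loss = -(w_pos * label * np.log(prob)
             + w_neg * (1.0 - label) * np.log(1.0 - prob))
    return loss.mean()

def plain_bce(prob, label, eps=1e-7):
    """Unweighted binary cross-entropy, for comparison."""
    prob = np.clip(np.asarray(prob, dtype=float), eps, 1.0 - eps)
    label = np.asarray(label, dtype=float)
    return -(label * np.log(prob) + (1.0 - label) * np.log(1.0 - prob)).mean()

# 1 changed pixel out of 10; the model predicts "unchanged" everywhere
label = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0])
lazy_prob = np.full(10, 0.1)
print(plain_bce(lazy_prob, label), weighted_bce(lazy_prob, label))
```

The unweighted loss barely penalizes a model that ignores the rare change class, whereas the weighted loss makes the missed change pixel dominate.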

Reliability of AI
Although many change detection frameworks using AI present the model structure, their trainable parameters are opaque, like black boxes, which makes it difficult to determine why they do what they do or how they work [239]. Research on the reliability of AI aims to develop techniques that improve the reliability and interpretability of change detection methods. Therefore, it is necessary to develop robust AI and interpretable AI for change detection. The relevant theoretical literature can be found in [240,241]. Here, we only discuss strategies that can be used to improve the reliability of change detection results, from the following aspects:
• Strategy 1: Reduce errors caused by data sources, for example by using preprocessing (e.g., spectral and radiometric correction) to reduce the uncertainty of data caused by geometric errors and spectral differences, or by fusing multiple data to improve the reliability of the original data, thereby improving the reliability of the change detection results. To date, some studies have considered the impact of registration [242] and algorithm fusion [243];
• Strategy 2: Improve the interpretability of AI models through a sub-modular model structure, which helps to explain the operating principle of the entire AI model through the function of each sub-module. For example, the region-proposals component in R-CNN can be clearly understood as a generator that predicts the regions of the objects;
• Strategy 3: Improve the robustness of AI models by integrating multiple algorithms and results. Ensemble learning is a good solution [15,244], as it can improve the accuracy of the final result by using the results of multiple models;
• Strategy 4: Improve the practicality of the AI model results by integrating post-processing algorithms, such as the Markov random field [245], the conditional random field [246], and level set evolution [247], which can help remove noise points and provide accurate boundaries. This is critical for some cartographic applications;
• Strategy 5: Improve the fineness of change maps through refined detection units. From coarse to fine, the detection unit can be at the scene level, the patch or super-pixel level, the pixel level, or the sub-pixel level. From the aspect of reliability, the sub-pixel level is the best choice because it alleviates the problem of mixed pixels in RS images. However, this easily leads to high computational complexity. Therefore, using different detection units for different land cover types is the best solution, which requires a well-designed AI model;
• Strategy 6: Improve the representation of change maps by detecting changes in each instance. As introduced in Section 6, change maps can be grouped into binary maps, one-class maps, from-to maps, and instance maps. The instance change map is more practical but has received little research attention. It can provide change information for each instance and better reflects real-world changes. Moreover, it avoids both the lack of semantic information in the binary map and the restriction of the from-to map to a classification system, thereby improving the reliability of the final result.
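As a minimal illustration of Strategy 3, the sketch below fuses the binary change maps of several models by pixel-wise majority voting. Majority voting is only one simple form of ensemble learning and is not necessarily the scheme used in [15,244]; the tie-breaking rule (ties count as unchanged) is an assumption.

```python
import numpy as np

def majority_vote(change_maps):
    """Fuse binary change maps: a pixel is 'changed' if most models agree.

    change_maps : sequence of arrays with identical shape (1 = change)
    Ties (possible with an even number of maps) are resolved as unchanged.
    """
    stack = np.stack([np.asarray(m).astype(bool) for m in change_maps])
    votes = stack.sum(axis=0)
    return (2 * votes > stack.shape[0]).astype(np.uint8)

# Outputs of three hypothetical models for the same 2 x 2 scene
m1 = np.array([[1, 0], [1, 0]])
m2 = np.array([[1, 1], [0, 0]])
m3 = np.array([[0, 1], [1, 0]])
print(majority_vote([m1, m2, m3]))  # [[1 1] [1 0]]
```

Each model makes one error relative to the fused map, and the vote removes all three, which is the basic mechanism by which an ensemble can be more robust than its members.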
When using AI techniques for change detection, the factors that affect the reliability of data preprocessing, model training, change feature extraction, and accuracy assessment should all be considered. The aim is to arrive at the most reasonable AI framework for improving the reliability of change detection results.
In this section, we have summarized the challenges and opportunities for AI-based change detection techniques and put forward our prospects. The future development of these techniques depends on overcoming these challenges, and the efforts and innovations of researchers will drive their further success.

Conclusions
This review presents the latest methods, applications, and challenges of AI-based change detection techniques. For beginners, the implementation process of AI-based change detection is introduced. Considering that the validity of training data is one of the major challenges, the commonly used data sources and existing datasets for change detection were surveyed in full. Although the number of public datasets has increased significantly, open labeled datasets for change detection are still scarce, and addressing this requires the joint efforts of the RS community. The systematic analysis of the general network frameworks and commonly used networks adopted for change detection shows that great progress has been made in applying AI to change detection, but many challenges remain in heterogeneous big data processing, unsupervised AI, and the reliability of AI. Further research is therefore needed. This review offers a clearer organization of the field and will help researchers understand it.
Author Contributions: All authors contributed in a substantial way to the manuscript. W.S. and M.Z. conceived the review. M.Z. wrote the manuscript. M.Z. and R.Z. contributed to the discussion of the review. M.Z., R.Z., S.C. and Z.Z. made contribution to the review of related literature. All authors discussed the basic structure of the manuscript. W.S. designed the overall structure of the review, reviewed the manuscript and supervised the study for all the stages. All authors have read and agreed to the published version of the manuscript.