Fusion of SAR and Multispectral Images Using Random Forest Regression for Change Detection

To overcome the insufficiency of a single remote sensing data source in change detection, synthetic aperture radar (SAR) and optical image data can be used together to complement each other. However, conventional image fusion methods fail to address the differences in imaging mechanisms and cannot overcome some practical limitations, such as their unsuitability for change detection or the temporal requirements of the optical image. This study proposes a new method to fuse SAR and optical images, which is expected to be visually helpful and to minimize the differences between the two imaging mechanisms. The algorithm performs the fusion by establishing relationships between SAR and multispectral (MS) images using random forest (RF) regression, which creates a fused SAR image containing the surface roughness characteristics of the SAR image and the spectral characteristics of the MS image. The fused SAR image is evaluated by comparing it to those obtained using conventional image fusion methods, and the comparison shows that the proposed method significantly improves the spectral and spatial qualities. Furthermore, for verification, other ensemble approaches, namely stochastic gradient boosting regression and adaptive boosting regression, are compared, and overall the performance of RF regression is confirmed to be superior. Then, change detection between the fused SAR and MS images is performed and compared with the results of change detection between MS images and between SAR images. The result using fused SAR images is similar to the result with MS images and is improved compared to the result between SAR images. Lastly, the proposed method is confirmed to be applicable to change detection.


Introduction
Change detection is the process of analyzing images of an area acquired at different times to monitor changes that may have occurred naturally or been caused by human activities [1]. Specifically, when conducting change detection, the images must be both temporally close to one another and rich in detail. However, such conditions are rarely satisfied when using only synthetic aperture radar (SAR) or only optical data [2]. Therefore, to overcome the insufficiency of using a single image source for change detection, SAR and optical images can be used together to provide complementary information and contribute to better change detection performance. Recently, satellites equipped with observation sensors offering various resolutions and levels of performance have been launched and are currently in operation, which offers an opportunity for the sensors to complement each other and overcome the limitations in change detection [3].
Optical images are known to contain information regarding reflective and emissive characteristics and also present rich details, which makes optical imagery relatively easy to interpret. As such, it has become the main data source for change detection through remote sensing [3,4]. However, optical images are affected by clouds and atmospheric conditions at the time of capture, which makes it difficult to meet the temporal requirements of the images to be used for change detection [5]. In contrast, radar sensors provide their own source of illumination with longer wavelengths that can penetrate the atmosphere. Thus, SAR images are less influenced by weather conditions and can be acquired both day and night [6]. Moreover, they are sensitive to the backscatter of terrain and object characteristics and offer coherent imaging capability (both amplitude and phase signals) [7]. However, due to the limited number of bands as well as the effects of speckle noise, slant-range imaging, foreshortening, layover, and shadows, SAR images are not intuitive and are relatively difficult to interpret when compared to optical images [4].
Accordingly, to use the complementary information from optical and SAR images, conventional image fusion techniques are utilized [8]. Image fusion is a technique that combines the spatial information of a high-resolution panchromatic (PAN) image with the spectral information of a lower-resolution multispectral (MS) image of the same area to obtain more information than either image provides alone [9]. In particular, in the case of fusion between SAR and MS images, the SAR image is treated as the PAN image and fused with the MS image, which is a good alternative for enhancing the presentation of SAR images by integrating the color information of MS images [10,11]. However, due to significant differences between the imaging mechanisms of SAR and optical images, when SAR and MS images are fused by using conventional image fusion methods, the gray value differences between the intensity image and the SAR image become obvious [11,12]. After the image fusion, this difference causes substantial color distortion in the fused images [11]. In other words, because of the two different physical imaging mechanisms, it is difficult for such fusion to exploit the information available in MS and SAR images when dealing directly with change detection [2]. Moreover, these methods work with images that are acquired at the same time, which does not overcome the limitations of the MS images [13].
To overcome these limitations, namely that conventional fusion does not relax the temporal requirements of MS images and cannot be utilized for change detection, this study attempts to conduct direct fusion by establishing relationships between SAR and MS images, which is expected to be visually helpful and to minimize the differences in the imaging mechanisms. In other words, the purpose of this study is to establish relationships between the SAR and MS images to increase the interpretability of the SAR image. Since the fusion is based on establishing relationships between the two images, which differs from conventional image fusion, it is possible to utilize images acquired at different times. The relationships between the SAR image and the corresponding pixel values of the MS image are established by using a random forest (RF) regression. RF regression is a data-mining technique that offers some advantages over most statistical modeling methods, including the ability to model nonlinear relationships, resistance to overfitting, and relative robustness to noise in the data [14,15]. Then, the proposed method is evaluated by comparing the results to conventional image fusion methods. Furthermore, for verification, other ensemble approaches, namely stochastic gradient boosting (SGB) regression and adaptive boosting (AdaBoost) regression, are compared. Lastly, change detection between the fused SAR image and the MS image is performed and compared with the results of change detection between MS images and between SAR images, which determines the applicability of this method to change detection when it is difficult to obtain MS images that meet the temporal requirements.

Dataset
The experimental areas are located in Seoul and the suburbs of Seoul, which include Seongnam and Hanam, in the central-western part of South Korea. The cities have grown steadily, which has led to significant changes in land use. The major events that occurred in the area were construction, deforestation, and land leveling, which changed barren areas to built-up areas, forest areas to barren or built-up areas, and built-up areas to barren areas. For the datasets used in the experiments, the SAR image is from Korea Multi-Purpose Satellite-5 (KOMPSAT-5) and the MS image used for reference is from the Landsat-8 Operational Land Imager (OLI). KOMPSAT-5 operates in the X-band (9.66 GHz) and provides three imaging modes: standard mode (ST-mode, 3 m resolution, 30 km swath), high-resolution mode (HR-mode, 1 m resolution, 5 km swath), and wide-swath mode (WS-mode, 20 m resolution, 100 km swath), with an incidence angle ranging from 20° to 55° [16]. The basic products are subdivided into four types: Level 1A (L1A), Level 1B (L1B), Level 1C (L1C), and Level 1D (L1D). A polarization channel can be selected among HH, HV, VH, and VV. In this study, the KOMPSAT-5 image is obtained in ST-mode with a spatial resolution of 3 m, descending orbit, and HH polarization. The processing level is L1A, which is a single-look complex slant (SCS) product focused in the slant range-azimuth projection. Pre-processing was then performed on the KOMPSAT-5 image, which consists of multi-looking, speckle filtering, and terrain correction. First, to reduce the speckle noise, multi-looking was applied with three range looks and six azimuth looks. In addition, speckle filtering was performed with a gamma MAP filter with a 3 × 3 kernel, which was chosen as the most efficient since it reduces speckle while preserving object edges [17]. Lastly, terrain correction using the Shuttle Radar Topography Mission (SRTM) 3 arc-second data and bilinear interpolation was applied, which results in an image with a nominal pixel size of 15 × 15 m projected to the World Geodetic System (WGS) 84 Universal Transverse Mercator (UTM) 52S coordinate system. The Landsat-8 OLI MS image has a spatial resolution of 30 m and consists of seven bands (Band 1: 0.433-0.453 µm, Band 2: 0.450-0.515 µm, Band 3: 0.525-0.600 µm, Band 4: 0.630-0.680 µm, Band 5: 0.845-0.885 µm, Band 6: 1.560-1.660 µm, and Band 7: 2.100-2.300 µm). Furthermore, the Landsat-8 OLI image used in this study is a Level-1 data product, which is not corrected for atmospheric conditions, so the Dark-Object Subtraction algorithm was applied for atmospheric correction. For the fusion schemes, the Landsat-8 OLI image was resampled to a resolution of 15 m, which is the same resolution as the KOMPSAT-5 image. The dates and purposes of the experimental images used in this study are listed in detail in Table 1.

Random Forest Regression
RF is an ensemble approach that constructs a number of independent trees for classification or regression and obtains results by aggregating their outputs. It can model complex and nonlinear relationships between predictive and response variables [18,19]. In the case of classification, the result is obtained by a majority vote over the results of the individual trees while, in the case of regression, the result is obtained by averaging the outputs of the trees [20]. Regression trees are used as base learners in RF regression and each regression tree is grown on a separate bootstrap sample derived from the original dataset, which is called bootstrap aggregating (bagging) [21]. In other words, each result is obtained from an independent tree and the results are then aggregated, so the stability is high, the influence of noise in the data is not significant, and the error due to overfitting can be reduced. Moreover, the performance of the RF is evaluated by using the samples that are not selected in the bootstrap sample of each tree, which eliminates the need to split the data into training and test sets. These samples are called out-of-bag (OOB) data [22]. The mean squared error (MSE) on the OOB data is calculated for the predictor of each tree and the performance is determined by averaging these values [19]. Furthermore, RF provides the relative importance of the variables, which can be used to understand the influence of each variable and the interactions between them [22]. In summary, RF can model with high accuracy when the relationship between two datasets is complex or nonlinear and, due to the method's high predictive power and stability, it has been suggested for use in the field of remote sensing [14,20].
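The bagging, OOB evaluation, and variable importance described above can be illustrated with a minimal sketch. It assumes scikit-learn's `RandomForestRegressor` as a stand-in implementation (the study does not name its software), and the features and response are synthetic placeholders rather than SAR or MS data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for SAR-derived features (hypothetical data).
X = rng.uniform(0.0, 1.0, size=(600, 3))
# Nonlinear response with mild noise, standing in for one MS band.
# The third feature is pure noise and does not influence y.
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0.0, 0.05, size=600)

# Each tree is grown on its own bootstrap sample (bagging); oob_score=True
# scores every sample using only the trees that did not see it in training.
model = RandomForestRegressor(n_estimators=32, oob_score=True, random_state=0)
model.fit(X, y)

oob_mse = float(np.mean((y - model.oob_prediction_) ** 2))
print("OOB R^2:", model.oob_score_)
print("OOB MSE:", oob_mse)
print("Relative importances:", model.feature_importances_)
```

Because the OOB samples act as a built-in validation set, no separate train/test split is needed in this sketch.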

Other Ensemble Regression
SGB, which is an ensemble method related to both boosting and bagging, builds many small classification or regression trees sequentially, each from the residuals of the preceding tree [23]. At each iteration, a tree is built from a random sub-sample of the dataset (selected without replacement) and the accuracy of the tree is computed. The successive samples are adjusted to accommodate previously computed inaccuracies, which improves the performance of the model. With each successive tree, resistance to outliers is increased and error is reduced, and the method is not sensitive to incorrect training data [24]. AdaBoost is a sequential ensemble method that constructs a succession of weak learners by using bootstrap data [25]. The key to AdaBoost is to improve the weak learner by weighting the individual samples when drawing the bootstrap sample, and it works by changing the weights of the samples at each iteration. The probabilities kept by the algorithm are modified based on the magnitude of the error, so that instances with a large error on the previous learners have a higher probability of being chosen to train the following base learner [26]. Then, the median or weighted average is applied to combine the predictions of the base learners.
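As a rough illustration of the two boosting approaches, the following sketch fits both to the same synthetic data. It assumes scikit-learn's `GradientBoostingRegressor` (with `subsample < 1` to make it stochastic) and `AdaBoostRegressor` as stand-ins; the hyperparameter values are illustrative assumptions, not the settings used in this study.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(600, 3))          # hypothetical features
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0.0, 0.05, size=600)
X_train, y_train = X[:400], y[:400]
X_test, y_test = X[400:], y[400:]

# subsample < 1 makes gradient boosting "stochastic": each tree is fit to the
# current residuals on a random sub-sample drawn without replacement.
sgb = GradientBoostingRegressor(
    n_estimators=200, subsample=0.5, max_depth=3, random_state=0
)
# AdaBoost re-weights samples so that poorly predicted ones are more likely to
# be drawn for the next learner; scikit-learn combines the learners by a
# weighted median.
ada = AdaBoostRegressor(n_estimators=100, loss="linear", random_state=0)

scores = {}
for name, reg in (("SGB", sgb), ("AdaBoost", ada)):
    reg.fit(X_train, y_train)
    scores[name] = reg.score(X_test, y_test)      # R^2 on held-out data
print(scores)
```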

Fusion Method
A new fusion method is proposed to address the problem of fusing a single SAR image and a single MS image for each scene. This method extracts the intensity and texture features from the SAR image for establishing relationships while taking the intensity of the corresponding MS image as the reference values. The RF regression is adopted to establish the relationships between the intensity and texture features of the SAR image and the pixel values of the MS image. The basic structure of the proposed fusion method is represented in Figure 1. The whole fusion process can be divided into five steps: (1) MS image processing, (2) SAR image processing, (3) selection of the training pixels, (4) RF regression modeling, and (5) fusion of the SAR and MS images.
The first step is to process the MS image used as the reference for fusion, which is called the reference MS image. First, only the red (R), green (G), and blue (B) bands representing the visible bands are extracted from the reference MS image. Afterward, to reduce the complexity of the algorithm and achieve higher prediction accuracy, classification is performed on the reference MS image, which allows the RF regression to be trained independently for each acquired class. The classification is carried out by using K-means, which is an unsupervised classification algorithm. K-means is a centroid-based method that utilizes the cluster centers to construct the model for classifying data. To minimize the sum of the distances from the points to the centroid vectors, an iterative algorithm is used, which modifies the model until the desired result is achieved [27]. In other words, the parameters used in this method are the number of classes (K) and the number of iterations. The optimal number of classes is selected based on performance and training time, and the minimum number of classes is set based on knowledge of the land cover distribution characteristics, which gives six classes [28]. The number of iterations is set to 20 since the result converges by 20 iterations [29].
In the second step, the intensity and texture features to be used for establishing relationships are extracted from the SAR image. Intensity alone is not informative enough for prediction. Therefore, additional texture information is necessary. In order to extract the maximum information for prediction, features that describe the textural information of the local neighborhood of pixels are considered. This study utilizes a gray-level co-occurrence matrix (GLCM) that represents the spatial characteristics of the pixels by using statistical values between a given pixel and the neighboring pixels [30]. Based on the co-occurrence matrix, the following GLCM descriptors are computed: contrast, dissimilarity, homogeneity, correlation, angular second moment (ASM), and energy. Contrast reflects the depth and smoothness of the image texture structure [31]. The dissimilarity measure is similar to the contrast measure. However, whereas contrast weights increase quadratically (0, 1, 4, 9, etc.) as one moves away from the diagonal, dissimilarity weights increase linearly (0, 1, 2, 3, etc.) [32]. Homogeneity measures the smoothness of the image texture: large changes in the spectral values result in very small homogeneity while small changes result in larger homogeneity [33]. Correlation reflects the similarity of the image texture in the horizontal or vertical direction. ASM reflects the regularity and uniformity of the image distribution while energy is the square root of ASM [31,34]. Additionally, in order to calculate the GLCM values, the window size should be set and, in this study, the window size was selected as 5 × 5, which better reflects both coarse and fine textures [31]. Furthermore, the mean and standard deviation of intensity within a 5 × 5 neighborhood are included as supplementary features. In order to measure all texture information clearly, the features are extracted before performing the terrain correction and then the image is co-registered.
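The GLCM descriptors listed above can be sketched for a single window as follows. This is a minimal NumPy illustration under stated assumptions: the function name `glcm_features`, the quantization to a small number of gray levels, the single horizontal offset, symmetric counting, and the homogeneity form 1 / (1 + (i - j)^2) are all choices made here for illustration and may differ from the implementation used in this study.

```python
import numpy as np

def glcm_features(window, levels, dx=1, dy=0):
    """Return GLCM descriptors for one quantized window and one pixel offset."""
    glcm = np.zeros((levels, levels), dtype=float)
    rows, cols = window.shape
    for r in range(max(0, -dy), min(rows, rows - dy)):
        for c in range(max(0, -dx), min(cols, cols - dx)):
            i, j = int(window[r, c]), int(window[r + dy, c + dx])
            glcm[i, j] += 1.0
            glcm[j, i] += 1.0  # count each pair in both directions (symmetric)
    p = glcm / glcm.sum()      # normalize counts to joint probabilities
    i_idx, j_idx = np.indices(p.shape)
    diff = np.abs(i_idx - j_idx)
    mu_i, mu_j = (i_idx * p).sum(), (j_idx * p).sum()
    sd_i = np.sqrt(((i_idx - mu_i) ** 2 * p).sum())
    sd_j = np.sqrt(((j_idx - mu_j) ** 2 * p).sum())
    asm = (p ** 2).sum()
    if sd_i > 0 and sd_j > 0:
        correlation = ((i_idx - mu_i) * (j_idx - mu_j) * p).sum() / (sd_i * sd_j)
    else:
        correlation = 1.0      # constant window: convention assumed here
    return {
        "contrast": float((diff ** 2 * p).sum()),       # weights 0, 1, 4, 9, ...
        "dissimilarity": float((diff * p).sum()),       # weights 0, 1, 2, 3, ...
        "homogeneity": float((p / (1.0 + diff ** 2)).sum()),
        "correlation": float(correlation),
        "asm": float(asm),
        "energy": float(np.sqrt(asm)),                  # square root of ASM
    }

# Usage on a hypothetical 5 x 5 window quantized to 8 gray levels.
rng = np.random.default_rng(0)
window = rng.integers(0, 8, size=(5, 5))
print(glcm_features(window, levels=8))
```

In practice the function would be applied in a sliding 5 × 5 window over the SAR intensity image, producing one feature value per pixel and descriptor.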
In the third step, meaningful training pixels are selected for use in establishing relationships. Training pixels are selected as invariant pixels, which are regions where the spectral reflectance varies little over time. In other words, relationships are learned from the features corresponding to invariant pixels, from which those of the changed regions can be predicted [35]. In this study, the invariant pixels are acquired through image differencing, which subtracts pixel values between the SAR image and the reference MS image. The unchanged regions are selected as invariant pixels. Because the SAR image and the reference MS image differ in the number of bands, image differencing is performed after matching their dimensions. The reference MS image is matched to the dimension of the SAR image by using the luminosity method, as given in Equation (1) [36].
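A possible sketch of this step is given below. The luminosity weights (0.21, 0.72, 0.07) are a commonly quoted form of the luminosity method and are an assumption here, as are the z-score normalization, the tolerance `k`, and the function names; the study's Equation (1) and its thresholding rule may differ.

```python
import numpy as np

def luminosity(rgb, weights=(0.21, 0.72, 0.07)):
    """Collapse an (H, W, 3) RGB array to one band using assumed weights."""
    return np.tensordot(rgb, np.asarray(weights, dtype=float), axes=([-1], [0]))

def invariant_mask(sar, ms_gray, k=1.0):
    """Mark pixels whose normalized difference stays within k std of the mean."""
    def zscore(a):
        a = a.astype(float)
        return (a - a.mean()) / a.std()
    # Normalize both images so that the difference is comparable (assumption).
    diff = zscore(sar) - zscore(ms_gray)
    return np.abs(diff - diff.mean()) <= k * diff.std()

# Usage with hypothetical data: the SAR image equals the gray MS image
# except for a 5 x 5 block that has changed strongly.
rng = np.random.default_rng(0)
base = rng.normal(0.0, 1.0, size=(50, 50))
rgb = np.stack([base, base, base], axis=-1)
ms_gray = luminosity(rgb)
sar = base.copy()
sar[:5, :5] += 10.0
mask = invariant_mask(sar, ms_gray)
print("invariant pixels:", int(mask.sum()), "of", mask.size)
```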
The fourth step is to model the RF regression, which is the major step in the proposed method. As mentioned above, the RF regression is modeled for each class to reduce the complexity of the algorithm and to retrieve more information from the reference MS image. For each class, the RF regression is trained by using the features of the SAR image and the pixel values of the R, G, and B bands of the reference MS image at the locations of the acquired invariant pixels. Moreover, the number of trees used to model the RF regression is selected through trial and error by considering the performance and training time, and 32 trees were selected [26,35].
Lastly, the fifth step is to fuse the SAR and MS images. The features are acquired at all pixel locations of the SAR image, and the R, G, and B bands are predicted for each class by applying the RF regression obtained in the previous step.
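Steps (4) and (5) can be sketched together as follows. This is an illustration under assumptions, not the study's implementation: it uses scikit-learn's `RandomForestRegressor` with multi-output targets, flattens the images to one row per pixel, and the function name `fuse_per_class` and the fallback for classes without training pixels are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fuse_per_class(features, class_map, invariant, rgb_ref, n_trees=32, seed=0):
    """Train one RF per class on invariant pixels and predict RGB everywhere.

    features : (N, F) SAR intensity/texture features, one row per pixel
    class_map: (N,) class label per pixel (e.g. from K-means on the MS image)
    invariant: (N,) boolean mask of training (invariant) pixels
    rgb_ref  : (N, 3) R, G, B values of the reference MS image
    """
    fused = np.zeros_like(rgb_ref, dtype=float)
    for label in np.unique(class_map):
        in_class = class_map == label
        train = in_class & invariant
        if not train.any():
            # No training pixels in this class: keep the reference values
            # (a fallback assumed here for illustration only).
            fused[in_class] = rgb_ref[in_class]
            continue
        model = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
        model.fit(features[train], rgb_ref[train])   # multi-output regression
        fused[in_class] = model.predict(features[in_class])
    return fused

# Usage with hypothetical data: two classes, 80 % of pixels invariant.
rng = np.random.default_rng(0)
n = 400
features = rng.uniform(size=(n, 2))
class_map = rng.integers(0, 2, size=n)
rgb_ref = 255.0 * np.column_stack(
    [features[:, 0], features[:, 1], features.mean(axis=1)]
)
invariant = rng.uniform(size=n) < 0.8
fused = fuse_per_class(features, class_map, invariant, rgb_ref)
print(fused.shape)
```

Training per class keeps each model small and lets it specialize on one land cover type, which is the motivation given above for the preceding classification step.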

Performance Evaluation
The criteria to evaluate the fusion method can be divided into two categories [12]. First, visual inspection can be used to evaluate the quality of the fusion result intuitively. Second, statistical evaluation methods are used to measure the spectral and/or spatial characteristics of the fusion. These methods have to be objective, reproducible, and quantitative. In this study, the following indexes are used to assess spectral characteristics: universal image quality index (UIQI) and correlation coefficient (CC). UIQI is calculated by using a combination of luminance distortion, contrast distortion, and loss of correlation between the fused SAR image and the criterion MS image, which is an MS image acquired at a time similar to that of the SAR image and with seasonal features similar to those of the reference MS image. UIQI is expressed by Equations (2) to (5) [37].
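For reference, the usual form of UIQI as a product of a correlation term, a luminance term, and a contrast term is reproduced below. The split into four numbered equations is an assumption made to match the numbering cited above.

```latex
\begin{align}
Q   &= Q_1 \cdot Q_2 \cdot Q_3 \tag{2}\\
Q_1 &= \frac{\sigma_{xy}}{\sigma_x \sigma_y} \tag{3}\\
Q_2 &= \frac{2\mu_x \mu_y}{\mu_x^2 + \mu_y^2} \tag{4}\\
Q_3 &= \frac{2\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2} \tag{5}
\end{align}
```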
where µ_x and µ_y refer to the means of the respective images, σ_x and σ_y refer to the standard deviations of the respective images, and σ_xy refers to the covariance of the two images. CC indicates the correlation between two images and is expressed by Equation (6) [38].
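The standard Pearson form of the correlation coefficient, written with the same symbols and with x_i and y_i denoting the pixel values of the two images (notation assumed here), is:

```latex
\begin{equation}
CC = \frac{\sum_{i}(x_i - \mu_x)(y_i - \mu_y)}
          {\sqrt{\sum_{i}(x_i - \mu_x)^2 \, \sum_{i}(y_i - \mu_y)^2}} \tag{6}
\end{equation}
```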
UIQI and CC have a range of [−1, 1] where a value close to 1 represents better image quality [37,38].
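Both spectral indexes can be computed with a few lines of NumPy. The sketch below evaluates them globally over a whole band, which is a simplification assumed here; UIQI is often computed in sliding windows and averaged, and the function names are illustrative.

```python
import numpy as np

def uiqi(x, y):
    """Universal image quality index computed globally over two bands."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    # Product of correlation, luminance, and contrast terms in one fraction.
    return 4.0 * cov * mx * my / ((vx + vy) * (mx ** 2 + my ** 2))

def cc(x, y):
    """Pearson correlation coefficient between two bands."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Usage: a band compared with itself and with a rescaled copy.
band = np.arange(1.0, 10.0)
print(uiqi(band, band), cc(band, 2.0 * band + 3.0), uiqi(band, 2.0 * band + 3.0))
```

The rescaled copy keeps a CC of 1 but yields a UIQI below 1, which shows that UIQI also penalizes luminance and contrast differences.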
Then the entropy and cumulative probability of blur detection (CPBD) values are obtained to evaluate the spatial quality.Entropy can show the average information included in the image and reflect detailed information of the fused image [39].Generally, the greater the entropy of the fused image, the more abundant the information included in the image and the greater the quality of the fusion.The entropy of the image is expressed by Equation (7).
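The usual Shannon form of image entropy, assuming L gray levels and a base-2 logarithm (both assumptions about the notation), is:

```latex
\begin{equation}
E = -\sum_{i=0}^{L-1} P_i \log_2 P_i \tag{7}
\end{equation}
```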
where P_i is the probability of gray level i in the image. CPBD is based on the sensitivity of human blur perception at different contrasts. Utilizing this framework, the probability of detecting a blur at each edge in an image is estimated and then pooled over the entire image to obtain a final quality score [40]. For a given contrast C, the probability of detecting blur can be calculated by Equation (8).
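In the form commonly given for the CPBD metric, and using the symbols defined in the following sentence, the blur detection probability at an edge is:

```latex
\begin{equation}
P_{BLUR}(e_i) = 1 - \exp\!\left(-\left|\frac{w(e_i)}{w_{JNB}(e_i)}\right|^{\beta}\right) \tag{8}
\end{equation}
```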
where w(e_i) is the measured width of edge e_i, w_JNB(e_i) is the just noticeable blur (JNB) width, which depends on the local contrast C, and β is a parameter whose value is acquired by least squares fitting.
Then CPBD is calculated by using Equation (9).
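In the commonly cited definition, CPBD is the cumulative probability that the blur detection probability does not exceed its value at the JNB width. The threshold symbol P_JNB is notation assumed here (about 63 % when w(e_i) = w_JNB(e_i)):

```latex
\begin{equation}
CPBD = P(P_{BLUR} \le P_{JNB}) = \sum_{P_{BLUR}=0}^{P_{JNB}} P(P_{BLUR}) \tag{9}
\end{equation}
```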
where P(P_BLUR) denotes the value of the probability distribution function at a given P_BLUR. CPBD has a range of [0, 1] where a value close to 1 represents better image quality.

Change Detection
To assess the applicability of the fused SAR image to change detection, change detection is performed between the fused SAR image and the reference MS image, and the result is compared with that obtained between the reference MS image and the criterion MS image. In addition, the result is compared with that of change detection between the SAR image before fusion and a SAR image acquired at a time similar to that of the reference MS image. Before the change detection, histogram matching is performed, which is commonly used for radiometric normalization [41].
The change detection method used in this study comprises two steps: (1) pixel-based detection and (2) object-based recognition. The first step aims to distinguish between changed and unchanged pixels. Image differencing is used, which subtracts the pixel values of two images with a common coordinate system to observe changes between two points in time. Afterward, the threshold between the changed and unchanged pixels is determined by Otsu's method, which is a practical and widely used method for determining the threshold value based on the brightness distribution of the input image [42].
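The pixel-based step can be sketched as image differencing followed by an Otsu threshold on the difference image. The NumPy implementation below is an illustration; the histogram bin count, the use of the absolute difference, and the function names are assumptions.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Return the threshold that maximizes the between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(hist).astype(float)        # pixels in bins 0..k
    w1 = w0[-1] - w0                          # pixels in bins k+1..end
    s0 = np.cumsum(hist * centers)
    mu0 = s0 / np.maximum(w0, 1.0)            # class means (guard empty class)
    mu1 = (s0[-1] - s0) / np.maximum(w1, 1.0)
    between = w0 * w1 * (mu0 - mu1) ** 2      # proportional to sigma_B^2
    k = int(np.argmax(between))
    return edges[k + 1]                       # upper edge of the last "low" bin

def change_mask(img_t1, img_t2):
    """Difference two co-registered images and threshold with Otsu's method."""
    diff = np.abs(img_t2.astype(float) - img_t1.astype(float))
    return diff > otsu_threshold(diff)

# Usage with hypothetical data: a 10 x 10 block changes between the dates.
rng = np.random.default_rng(0)
img_t1 = np.zeros((40, 40))
img_t2 = rng.normal(0.0, 0.1, size=(40, 40))
img_t2[10:20, 10:20] += 5.0
mask = change_mask(img_t1, img_t2)
print("changed pixels:", int(mask.sum()))
```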
However, due to the characteristics of pixel-based change detection, this process results in fragmentation and incomplete expression of the change [43]. Therefore, the second step groups changed pixels into changed objects by a set of morphological operations [44]. This step consists of three sub-steps: (1) morphological closing, (2) hole and gap filling, and (3) morphological opening.
Morphological closing is a dilation operation followed by an erosion operation [45]. The aim is to fill the holes in the change regions. Then hole and gap filling is additionally used to fill the portions that are not filled by the closing operation, which makes the change information more complete [44]. Lastly, the morphological opening operation is applied, in which erosion is conducted on the image and followed by a dilation operation [45]. The aim of this operation is to break narrow connections. The key parameter of the closing and opening operations is the size of the square structuring element, which is set to 3 × 3 and 5 × 5, respectively [44].
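The three sub-steps can be sketched with SciPy's binary morphology routines. The use of `scipy.ndimage` is an assumption made for illustration (the study does not name its software), and the function name `refine_changes` is hypothetical; the structuring element sizes follow the text.

```python
import numpy as np
from scipy import ndimage

def refine_changes(mask, close_size=3, open_size=5):
    """Closing, hole filling, and opening with square structuring elements."""
    closed = ndimage.binary_closing(
        mask, structure=np.ones((close_size, close_size), dtype=bool)
    )
    filled = ndimage.binary_fill_holes(closed)   # fill what closing left open
    return ndimage.binary_opening(
        filled, structure=np.ones((open_size, open_size), dtype=bool)
    )

# Usage with a hypothetical change mask: a 10 x 10 object with a 4 x 4 hole
# plus one isolated false-alarm pixel.
mask = np.zeros((20, 20), dtype=bool)
mask[5:15, 5:15] = True
mask[7:11, 7:11] = False
mask[1, 1] = True
refined = refine_changes(mask)
print("pixels before:", int(mask.sum()), "after:", int(refined.sum()))
```

In this example the hole is too large for the 3 × 3 closing and is filled by the hole-filling step, while the isolated pixel is removed by the 5 × 5 opening.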
Afterward, to evaluate the change detection results, ground-truth data that distinguish the areas with actual change from those with no change are produced via manual digitizing. Manual digitizing generates ground-truth data by directly interpreting two images through a combination of spatial and spectral properties. In this study, ground-truth data are obtained by using the reference MS image and the criterion MS image. Based on the ground-truth data, precision, recall, and the F-measure are obtained. Precision represents the proportion of the regions detected as changed (or unchanged) that are actually changed (or unchanged), and recall represents the proportion of the actually changed (or unchanged) regions that are detected as such. The F-measure is an index of accuracy that incorporates precision and recall and is obtained as their harmonic mean [46]. Precision, recall, and the F-measure are shown in Equations (10) to (12).
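The standard definitions of the three measures, using the symbols defined in the following sentence, are:

```latex
\begin{align}
\text{Precision} &= \frac{TP}{TP + FP} \tag{10}\\
\text{Recall} &= \frac{TP}{TP + FN} \tag{11}\\
F\text{-measure} &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{12}
\end{align}
```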
where TP, FP, and FN represent true positive, false positive, and false negative, respectively. TP refers to cases where the outcome from a prediction is p and the actual value is also p. FP refers to cases where the outcome from a prediction is p and the actual value is n, and FN refers to cases where the predicted outcome is n and the actual value is p.
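As a small worked example, the three measures can be computed from a detected change mask and a ground-truth mask. The function name `change_scores` and the zero fallback for empty denominators are assumptions made for this sketch.

```python
import numpy as np

def change_scores(detected, truth):
    """Return (precision, recall, F-measure) for the 'changed' class."""
    detected = np.asarray(detected, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(detected & truth))     # detected as changed, truly changed
    fp = int(np.sum(detected & ~truth))    # detected as changed, truly unchanged
    fn = int(np.sum(~detected & truth))    # missed changes
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2.0 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

# Usage: five pixels with TP = 2, FP = 1, FN = 1.
precision, recall, f_measure = change_scores([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
print(precision, recall, f_measure)
```

Passing the inverted masks gives the corresponding scores for the unchanged class.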

Comparison of Fusion Results
This section presents the fusion results for the proposed algorithm and compares them to those of conventional image fusion algorithms. First, the proposed fusion process is performed step by step by using the experimental images, which are shown in Figures 2 and 3. To achieve reasonable computation time, only a portion of each image was extracted. Site 1 was selected as 700 × 700 pixels and Site 2 was selected as 600 × 600 pixels. As mentioned above, only the RGB bands of the reference Landsat-8 OLI image are extracted and classification is then performed. In this experiment, the number of classes is set to the minimum of six. For the KOMPSAT-5 image, the intensity and the additional texture features are extracted. Afterward, co-registration is performed, invariant pixels are extracted to be used as training data, and RF regression is modeled for each class to obtain a fused KOMPSAT-5 image.
Then the proposed method is compared with conventional image fusion methods. To ensure a fair comparison, pixel-based image fusion algorithms are compared and the standard techniques of modified intensity-hue-saturation (IHS), principal component analysis (PCA), Gram-Schmidt (GS), and Ehlers fusion are selected. The modified IHS method proposed in Reference [47] is a vast improvement over traditional IHS, which converts a color image from RGB to IHS color space and replaces the intensity band with the PAN image. The technique works by assessing the spectral overlap between each MS band and the PAN image and weighting the merge based on these relative wavelengths [48]. The PCA method transforms a multivariate dataset of correlated variables into a dataset of new uncorrelated linear combinations of the original variables [49]. It is assumed that the first PC band contains the most information of the original image, and this band is replaced with the PAN image. Then an inverse PCA transform is performed to obtain the fused image. The GS method simulates a PAN image from the MS image, which is achieved by averaging the MS bands. As the next step, the GS transformation is performed on the simulated PAN image and the MS image, with the simulated PAN image employed as the first band. Then, the original PAN image replaces the first GS band. Lastly, an inverse GS transform is applied to create the fused image [9]. The Ehlers method is based on the IHS transformation coupled with Fourier domain filtering [38]. In other words, the first three bands of the MS image are transformed to an IHS image and then a two-dimensional fast Fourier transform (FFT) is used to transform the intensity component and the PAN image into the frequency domain. The intensity spectrum is filtered with a low-pass filter while the spectrum of the PAN image is filtered with an inverse high-pass filter. After filtering, an inverse FFT is applied to each and the results are added together to form a fused intensity component with the low-frequency information from the MS image and the high-frequency information from the PAN image. As the last step, an inverse IHS transformation produces the fused image. Since the conventional image fusion methods work with images acquired at similar times, image fusion is performed between the KOMPSAT-5 image and the Landsat-8 OLI image acquired on 19 September 2014 (Figures 4a and 5a), which is called the criterion Landsat-8 OLI image and is also used for the performance evaluation. As mentioned above, the results of the proposed method and the image fusion methods are compared by visual inspection and statistical evaluation, and the results for Sites 1 and 2 are shown in Figures 4 and 5, respectively.
Visual inspection shows that the results of the proposed method and the image fusion methods contain more information than the original single image. All of them include not only the characteristics of the original KOMPSAT-5 image but also the color information of the Landsat-8 OLI image. However, at all sites, modified IHS, PCA, and GS show massive color distortion. The difference between the imaging mechanisms of the KOMPSAT-5 and Landsat-8 OLI images is not considered at all. In addition, spectral preservation is poor when compared to the criterion Landsat-8 OLI image. With the Ehlers method, spectral preservation is better than with the other image fusion methods, but blurriness is present, which results in some loss of detail, especially on linear features. On the other hand, the proposed method shows less color distortion and the spectral characteristics of the Landsat-8 OLI image are well preserved. It also provides the surface roughness characteristics of the KOMPSAT-5 image, and thus combines the complementary and supplementary information of the original images. In other words, the visual inspection shows that the proposed method produces better results than the other image fusion methods.
Although visual inspection is easy and direct, it is highly subjective and cannot be used to accurately evaluate the practical effects of the algorithms. Therefore, the performance of each method is further analyzed quantitatively based on UIQI, CC, entropy, and CPBD, as described above. The calculated values of UIQI, CC, entropy, and CPBD are presented in Tables 2-5. The UIQI in Table 2 indicates that the proposed method is significantly better than the image fusion methods. Compared to modified IHS, PCA, GS, and Ehlers, the proposed method improves UIQI, on average, by 0.4843, 0.3688, 0.2704, and 0.0907 in Site 1 and by 0.4194, 0.3964, 0.2882, and 0.1440 in Site 2, respectively. The higher UIQI of the proposed algorithm indicates that its result is more similar to the criterion Landsat-8 OLI image. Table 3 presents the CC, which is also higher than with the image fusion methods. The improvements are similar to those of UIQI: on average, 0.3916, 0.3086, 0.2996, and 0.1524 in Site 1 and 0.4086, 0.3852, 0.2902, and 0.1457 in Site 2, respectively. The higher CC indicates that the proposed method provides better spectral preservation. In other words, spectral qualities are improved significantly when compared with the image fusion methods. In addition, the spatial quality indexes, entropy and CPBD, improved significantly at all sites, which is shown in Tables 4 and 5. This means that the fused image obtained by the proposed method contains more information on average and less perceptually blurry distortion. In other words, it can be confirmed that the proposed method is markedly superior to the conventional image fusion methods in terms of both spectral and spatial quality.
Furthermore, comparing the performance between Site 1 and Site 2, it is confirmed that the performance for Site 1 is higher than that for Site 2 in terms of both spectral and spatial qualities. Based on the reference Landsat-8 OLI image, Site 1 is mostly made up of built-up area and includes more forest area than Site 2. On the other hand, Site 2 contains a lot of water and barren areas and somewhat less built-up and forest area than Site 1. Generally, in SAR images, built-up and forest areas have relatively high backscattered intensities while the water area has the lowest backscattered intensities, and the barren area also has low backscattered intensities, although slightly higher than the water area due to soil roughness. In other words, it is confirmed that the backscattered intensities of the SAR image affect the fusion results locally: built-up and forest areas with relatively high backscattered intensities are able to retrieve more information from the reference MS image than barren and water areas with low backscattered intensities.
In addition, for further verification, a comparison was made with the results of fusion performed through other ensemble approaches, stochastic gradient boosting (SGB) regression and adaptive boosting (AdaBoost) regression, which are shown in Figures 4g,h and 5g,h, respectively. Compared with the results obtained by RF regression, the results of SGB regression do not retrieve the color of the ground well, but the colors of the other areas are well captured. On the other hand, the results of AdaBoost regression show somewhat reddish tones at both sites. In addition, the portions containing the SAR characteristics include noise of a green color. In other words, when visually analyzed, the results of SGB regression are somewhat worse than those obtained by RF regression but fairly similar, while the results of AdaBoost regression contain significantly different color information.
Both results are also evaluated statistically and the results are shown in Tables 6-9. The spectral quality indices, UIQI and CC, of the results obtained through RF regression are the highest regardless of site and band. In other words, the results obtained through RF regression contain the most similar spectral information and preserve the spectral characteristics well. On the other hand, the spatial quality indices, entropy and CPBD, showed somewhat different results depending on the site and the band. In the case of entropy, the results obtained through RF regression are the highest for Site 1 regardless of the band, while Site 2 shows different results by band. For CPBD, except for the G band of Site 1, the results obtained through AdaBoost are the highest. However, although the spatial qualities vary according to site and band, it can be confirmed that the color information of the results obtained by RF regression is abundant and more stable as a whole.
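As an illustration of the spectral and spatial indices named above, the following minimal sketch computes CC, UIQI, and entropy for two bands flattened into lists of pixel values. It assumes the standard definitions (Pearson correlation, the universal image quality index evaluated globally over the whole band rather than over sliding windows, and Shannon entropy of the gray-level histogram); the function names are hypothetical and the exact windowing or normalization used in this study may differ. CPBD is omitted because it requires edge detection and a blur perception model.

```python
import math
from collections import Counter


def _mean(values):
    return sum(values) / len(values)


def correlation_coefficient(x, y):
    """Pearson correlation coefficient (CC) between two bands."""
    mx, my = _mean(x), _mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def uiqi(x, y):
    """Universal image quality index, computed globally (assumption:
    no sliding window). Q = 4*cov*mx*my / ((vx + vy) * (mx^2 + my^2))."""
    n = len(x)
    mx, my = _mean(x), _mean(y)
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return 4 * cov * mx * my / ((vx + vy) * (mx ** 2 + my ** 2))


def entropy(values):
    """Shannon entropy (bits) of the gray-level histogram."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Toy bands: the second is the first scaled by two, so CC stays at 1
# while UIQI is penalized for the luminance and contrast mismatch.
reference_band = [10, 20, 30, 40]
scaled_band = [20, 40, 60, 80]
cc_value = correlation_coefficient(reference_band, scaled_band)
uiqi_value = uiqi(reference_band, scaled_band)
entropy_value = entropy([0, 0, 1, 1])
```

The toy example shows why UIQI is the stricter spectral index: a band that is perfectly correlated with the reference but differs in mean and contrast still receives a UIQI below one.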

Influence of the Number of Classes
To evaluate the influence of the number of classes, which is a key parameter of the proposed method, a performance evaluation according to the number of classes is performed. As mentioned above, the minimum number of classes is selected as six based on the knowledge of land cover distribution and the maximum number is selected as ten, which gives a total of five experiments for each site. The performance of each experiment is evaluated with averages and is considered together with the training time, as shown in Figures 6 and 7. In all sites, it is confirmed that the performance is similar regardless of the number of classes while the training time increases proportionally. In other words, a large number of classes may lead to overtraining for fusion, which requires more complex computation but does not improve the performance significantly. Thus, in this study, six is selected as the optimal number of classes.

Comparison of Change Detection Results
To investigate the applicability of the fused KOMPSAT-5 image in change detection, change detection between the fused KOMPSAT-5 image and the reference Landsat-8 OLI image is performed and compared with the results between the reference and criterion Landsat-8 OLI images and between the KOMPSAT-5 images. The KOMPSAT-5 images with similar time periods as the reference Landsat-8 OLI image are shown in Figure 8.
Precision, recall, and the F-measure are then obtained to quantitatively evaluate the results of each change detection procedure based on the ground-truth data (Figures 9a and 10a), and these are shown in Table 10. Precision and recall tend to trade off against each other, so evaluating through the two accuracy indices separately has limitations. Therefore, the accuracies are assessed using the F-measure, which combines precision and recall. In Sites 1 and 2, the F-measure using the fused KOMPSAT-5 image is 76.97% and 81.14%, respectively. The F-measure between Landsat-8 OLI images is 78.68% and 81.45%, and the F-measure between the KOMPSAT-5 images is 59.29% and 57.02%. In all sites, the result using the fused KOMPSAT-5 image is similar to the change detection results of Landsat-8 OLI and is significantly improved compared to the change detection results of KOMPSAT-5 images. Lastly, the applicability of the fused KOMPSAT-5 image in change detection is confirmed.
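The three accuracy indices can be sketched as follows for a binary change map compared against ground truth. This is a minimal illustration that assumes the F-measure is the balanced harmonic mean of precision and recall (F1) and that both maps are flattened to sequences in which 1 marks a changed pixel and 0 an unchanged one; the function name and the toy maps are hypothetical and are not data from this study.

```python
def change_detection_scores(predicted, truth):
    """Return (precision, recall, F-measure) for binary change maps.

    Assumes 1 = changed, 0 = unchanged, and the balanced F-measure (F1).
    """
    tp = fp = fn = 0
    for p, t in zip(predicted, truth):
        if p and t:
            tp += 1          # changed pixel correctly detected
        elif p and not t:
            fp += 1          # false alarm
        elif t and not p:
            fn += 1          # missed change
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy example: 3 true positives, 1 false alarm, 1 missed change.
truth_map = [1, 1, 1, 0, 0, 0, 0, 1]
detected_map = [1, 1, 0, 1, 0, 0, 0, 1]
precision, recall, f_measure = change_detection_scores(detected_map, truth_map)
```

Because the harmonic mean is dominated by the smaller of its two inputs, a detector cannot obtain a high F-measure by maximizing only precision or only recall.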

Discussion
The fusion method proposed in this paper establishes a relationship between the MS image and the SAR image to perform fusion. The proposed method performs the training separately for each class through RF regression and selects the optimal number of classes by considering both the performance evaluation and the training time for each number of classes. Based on the knowledge of land cover distribution, the minimum number of classes was selected as six and the maximum number of classes was selected as ten. A total of five experiments were performed. The training time increased proportionally to the number of classes, while the statistical values were not significantly affected by the number of classes. For this reason, it is judged that the number of classes used for fusion was sufficient for the land cover distribution.
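The per-class training described above can be sketched with scikit-learn as follows. This is only an illustrative outline under stated assumptions: the SAR features and the MS band are synthetic, the class labels are random integers standing in for land cover classes, and the helper names (`fit_per_class`, `predict_per_class`) and parameter values are hypothetical rather than those used in this study.

```python
import time

import numpy as np
from sklearn.ensemble import RandomForestRegressor


def fit_per_class(features, target, labels, n_trees=20, seed=0):
    """Train one RF regressor per class on that class's pixels only."""
    models = {}
    for c in np.unique(labels):
        mask = labels == c
        model = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
        model.fit(features[mask], target[mask])
        models[int(c)] = model
    return models


def predict_per_class(models, features, labels):
    """Predict each pixel with the model of its own class."""
    fused = np.zeros(len(features), dtype=float)
    for c, model in models.items():
        mask = labels == c
        if mask.any():
            fused[mask] = model.predict(features[mask])
    return fused


# Synthetic stand-ins (assumptions): 4 SAR-derived features per pixel
# and one MS band that depends nonlinearly on the first feature.
rng = np.random.default_rng(0)
n_pixels, n_features = 600, 4
sar_features = rng.normal(size=(n_pixels, n_features))
ms_band = np.tanh(sar_features[:, 0]) + 0.1 * rng.normal(size=n_pixels)

# Record the wall-clock training time for several class counts.
timings = {}
for n_classes in (6, 8, 10):
    labels = rng.integers(0, n_classes, size=n_pixels)
    start = time.perf_counter()
    models = fit_per_class(sar_features, ms_band, labels)
    timings[n_classes] = time.perf_counter() - start

# Fused band for the last configuration (ten classes).
fused_band = predict_per_class(models, sar_features, labels)
```

The `timings` dictionary shows how the training time could be recorded alongside the quality indices when choosing the number of classes; with real imagery, the time will also depend on the number of pixels per class and on the RF parameters.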
Then we compared our method with the existing image fusion methods (modified IHS, PCA, GS, and Ehlers), and it shows superior results in both visual inspection and statistical evaluation. From the visual aspect, this finding suggests that the concepts mentioned above are suited to minimizing the differences in imaging mechanisms between MS and SAR images when the SAR image is fused with the MS image. From the quantitative aspect, the UIQI and CC for spectral quality improved by 9.07-48.43% and 15.24-39.16% on average for Site 1 and by 14.40-41.94% and 14.57-40.86% on average for Site 2, respectively, while entropy and CPBD for spatial quality improved by 0.07-5.03% and 4.75-8.99% on average for Site 1 and by 0.15-1.96% and 4.79-7.89% on average for Site 2, respectively. Furthermore, comparisons with the fused SAR images obtained through SGB and AdaBoost regression were performed. Compared with SGB regression, UIQI, CC, entropy, and CPBD improved by 5.41%, 9.81%, 0.21%, and 1.94% on average, respectively, for Site 1; for Site 2, UIQI, CC, and CPBD improved by 1.99%, 2.14%, and 1.31%, while entropy decreased by 0.06%. Compared with AdaBoost regression, UIQI, CC, and entropy improved by 8.62%, 12.31%, and 1.09% on average, respectively, for Site 1; for Site 2, UIQI and CC improved by 4.27% and 4.30%, while entropy decreased by 0.08%. CPBD, however, decreased by 2.09% and 3.90% for Sites 1 and 2, respectively. Even though CPBD, which indicates spatial quality, is slightly lower than that of AdaBoost regression, the spectral quality is remarkably higher, and it is judged that RF regression is superior in terms of both visual and quantitative considerations.
In the experiments on change detection, using the image produced by the proposed method also shows satisfactory results. In Sites 1 and 2, the F-measures are 1.71 and 0.31 percentage points lower than the results between MS images but 17.68 and 24.12 percentage points higher than the results between SAR images, respectively; that is, they are similar to the change detection results between MS images and superior to the results between SAR images. In other words, these results indicate the advantage of the proposed method.
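The differences quoted above follow directly from the F-measures reported for Table 10, as the short check below shows. The dictionary keys and variable names are hypothetical labels chosen for this sketch; the numbers are the F-measures (in percent) stated in the text for Sites 1 and 2.

```python
# F-measures (%) for (Site 1, Site 2), as reported in the text.
f_measure = {
    "fused_sar_vs_ms": (76.97, 81.14),  # fused KOMPSAT-5 vs. reference OLI
    "ms_vs_ms": (78.68, 81.45),         # between Landsat-8 OLI images
    "sar_vs_sar": (59.29, 57.02),       # between KOMPSAT-5 images
}

differences = []
for site in range(2):
    fused = f_measure["fused_sar_vs_ms"][site]
    # Gap to the MS-only result and gain over the SAR-only result,
    # both as differences in percentage points.
    gap_to_ms = round(f_measure["ms_vs_ms"][site] - fused, 2)
    gain_over_sar = round(fused - f_measure["sar_vs_sar"][site], 2)
    differences.append((gap_to_ms, gain_over_sar))
```

The resulting pairs are absolute differences between F-measures, not relative changes.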
Although the proposed method can realize satisfactory results, there remain several limitations. First, many types of SAR systems have appeared in recent years, which leads to a variety of sensor characteristics. However, only KOMPSAT-5 and Landsat-8 OLI images were used in this study, and the algorithm was not applied to images obtained from other sensors. It is necessary to apply the algorithm to images acquired from other sensors by considering their various characteristics. Second, in this study, SAR and MS images with a two-year difference are used. However, further verification is needed to determine the extent of the period over which the fusion can properly be performed. Lastly, although images with different time periods can be used, the seasonal characteristics of the SAR and MS images should be matched.

Conclusions
This study proposes a new method to fuse SAR and MS images to facilitate the interpretation of SAR images for use in change detection. The new fusion method, unlike conventional image fusion methods, can be applied with MS images of different time periods, and it is based on establishing relationships between SAR and MS images. In particular, RF regression is used to model the nonlinear relationships between the two images, which have different imaging mechanisms. In addition, the proposed method fuses SAR and MS images by extracting various features of the SAR image rather than using only intensity. Based on the results of the study, the following conclusions are drawn. First, visual inspection shows that our proposed method has less color distortion and better spectral preservation than the conventional image fusion methods. Furthermore, the quantitative performance of our method shows significant improvements in spectral and spatial qualities. In addition, in the comparison with other ensemble approaches for further verification, SGB regression and AdaBoost regression, it was confirmed that RF regression is better overall. Lastly, when applied to change detection, the accuracy is similar to the change detection accuracy between MS images and significantly better than the change detection accuracy between SAR images, which shows the applicability of the method in change detection when it is difficult to acquire MS images that satisfy the temporal requirements.
In future studies, various combinations of the parameters should be studied to improve the performance of RF regression, and the additional usefulness of the method should be verified by securing enough images for each season and period. Furthermore, additional verification will be performed by applying the method to SAR and MS images acquired from various sensors. In addition, in order to ultimately overcome the limitations of the MS images, the model trained on one area will be extended to other areas with similar land cover classes. In other words, the need for the MS images would be eliminated, and the method could be applied to a recently acquired SAR image alone. Lastly, applying the method to high-resolution images should be considered to investigate the suitability of fusion on images with complex structures.

Figure 6. Statistical evaluation according to the number of classes for Site 1. UIQI, universal image quality index; CC, correlation coefficient; CPBD, cumulative probability of blur detection.

Figure 7. Statistical evaluation according to the number of classes for Site 2. UIQI, universal image quality index; CC, correlation coefficient; CPBD, cumulative probability of blur detection.
As mentioned above, the change detection of this study consists of two steps: pixel-based detection and object-based recognition. The final change detection results are shown in Figures 9b-d and 10b-d, respectively. The black areas indicate the unchanged areas and the white areas indicate the changed areas. In order to observe the differences more clearly, several regions are selected and marked with red rectangles. Three rectangles are selected in Site 1 and two rectangles are selected in Site 2.

Figure 10. Final change detection results for Site 1: (a) ground-truth data, (b) result between Landsat-8 OLI images, (c) result between fused KOMPSAT-5 image and reference Landsat-8 OLI image, (d) result between KOMPSAT-5 images.
For rectangle 1 of Site 1, the change detection result of Landsat-8 OLI captures the exact change shape, and the change detection result using the fused KOMPSAT-5 image shows a somewhat overestimated change shape but extracts fairly accurate changes. On the other hand, in the change detection result of KOMPSAT-5, the change appears as salt-and-pepper noise even though object recognition is performed. In rectangle 2, only the result using the fused KOMPSAT-5 image is correct. The change detection result between Landsat-8 OLI images is underestimated and the change detection result between KOMPSAT-5 images does not capture the change at all. Rectangle 3 is a relatively small change area, which only the change detection result of Landsat-8 OLI detects correctly. In Site 2, rectangle 1 is the region where only the result using the fused KOMPSAT-5 image correctly detects the change, whereas the change detection results between Landsat-8 OLI images and between KOMPSAT-5 images are considerably underestimated. For rectangle 2, which is a relatively small change area compared with rectangle 1, the correct change region is extracted only in the change detection result of Landsat-8 OLI. In other words, when using the fused KOMPSAT-5 image, it is judged that there is a limitation in detecting changes in a narrow area, while the detection of changes in a relatively large area is accurate.

Table 1. The date and purpose of the experimental images used in this study.

Table 4. Entropy between the fused KOMPSAT-5 images and the criterion Landsat-8 OLI image.

Table 5. Cumulative probability of blur detection (CPBD) between the fused KOMPSAT-5 images and the criterion Landsat-8 OLI image.

Table 7. Comparison of CC of fused KOMPSAT-5 image through RF, SGB, and AdaBoost.

Table 10. Quantitative change detection results based on precision, recall, and the F-measure.