Comparative Analysis of Edge Information and Polarization on SAR-to-Optical Translation Based on Conditional Generative Adversarial Networks

To accurately describe dynamic vegetation changes, data with high temporal and spectral resolution are urgently required. Optical images contain rich spectral information but are limited by poor weather conditions and cloud contamination. Conversely, synthetic-aperture radar (SAR) is effective under all weather conditions but contains insufficient spectral information to recognize certain vegetation changes. Conditional generative adversarial networks (cGANs) can be adopted to transform SAR images (Sentinel-1) into optical images (Landsat8), thereby exploiting the advantages of both optical and SAR images. As the features of SAR and optical remote sensing data play a decisive role in the translation process, this study explores the quantitative impact of edge information and polarization mode (VV, VH, VV&VH) on the peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), correlation coefficient (r), and root mean squared error (RMSE). The addition of edge information improves the structural similarity between generated and real images. Moreover, using the VH and VV&VH polarization modes as the input provides the cGANs with more effective information and results in better image quality. The optimal polarization mode with edge information added is VV&VH, whereas the optimal mode without edge information is VV. Near-infrared and short-wave infrared bands in the generated image exhibit higher accuracy (r > 0.8) than visible bands. The conclusions of this study could serve as an important reference for selecting cGANs input features, and for applying cGANs to the SAR-to-optical translation of other multi-source remote sensing data.


Introduction
Following advances in satellite technology in recent years, remote sensing data are now widely used to monitor land-cover changes [1][2][3]. Among land-cover types, vegetation changes are frequent, complex, and closely related to the surrounding environment [4,5]. To better observe and describe vegetation changes, a dataset with both high temporal resolution and high spectral resolution is needed. Optical images provide rich spectral information; however, cloud and rainy weather may mean that images spanning several months must be processed to obtain a single high-quality image [6][7][8][9]. Conversely, synthetic-aperture radar (SAR) is not limited by lighting conditions, climate, or other environmental factors; thus, it can acquire images continuously under all weather conditions, generating time series with high temporal resolution [10,11]. However, an important limitation of SAR images is that their spectral information is insufficient to recognize certain vegetation changes [12,13]. Determining the relationship between optical and SAR images allows SAR data to be used as the input, in the absence of optical data, to generate images similar to optical images. The generated and existing optical images can then form a complete dataset with rich spectral information and high temporal resolution, which can be used for accurate and comprehensive analysis of vegetation coverage and its changes. The process of generating optical images from SAR images is called SAR-to-optical image translation [14][15][16]. SAR-to-optical image translation benefits image interpretation, spatial information transfer, and cloud removal [17,18]. However, it is difficult to accomplish using a simple physical model [19,20].
In contrast, deep learning can effectively model complicated relationships by performing image-to-image translation tasks [36]. Generative adversarial networks (GANs) [37] have recently been regarded as a breakthrough in deep learning. They consist of two adversarial models: a generative model, which captures the data distribution, and a discriminative model, which estimates the probability that a sample comes from the real data rather than from the generator. GANs generate data in an unsupervised manner but offer no control over the data generation process; for large or complex images, unconditioned GANs therefore become very difficult to control [38,39]. Conditional GANs (cGANs) were developed to address this problem [40]: additional information is used to condition the models and direct the data generation process. cGANs have attracted considerable interest in the remote sensing community [41], as they can generate desired artificial data for a specified target output, and they have achieved promising results in many fields, such as image inpainting [42][43][44], image manipulation [45][46][47], and image translation [48][49][50][51][52]. More specifically, cGANs can efficiently translate SAR images into optical images and have proven suitable for the SAR-to-optical translation process [6,16,17,20,[53][54][55][56][57][58]. Several cGANs-based SAR-to-optical image translation methods exist. However, these methods do not identify which features of SAR and optical remote sensing data have the greatest influence on the translation process, nor do they consider the influence of the different polarization modes of SAR data.
Generating an image is inseparable from analyzing the original image and the target image [55]. The goal of image analysis is to extract descriptive parameters that accurately express the key information in an image and quantitatively describe its content, namely, feature extraction. Specifically, SAR images contain very rich structural information, whereas optical images contain very rich spectral information. As such, the most abundant and typical information should be extracted from both. However, previous studies [53,59] have only considered the textural part of the structural information and neglected edge information, which also contains abundant useful information as well as the basic features of the target structure. Therefore, it is important to evaluate the effect of introducing edge information into the cGANs on the SAR-to-optical translation process.
In general, SAR systems can be classified into four categories in terms of their polarimetric capability: single-polarization, dual-polarization, compact-polarization, and fully polarimetric [60]. A dual-polarization acquisition combines a co-polarized channel (VV or HH) with a cross-polarized channel (VH or HV). Polarization describes the vibration state of the electric field vector, which is one of the inherent properties of an electromagnetic wave [61].
However, the image information returned under different polarization modes can differ, because the polarization mode has a significant influence on the radar beam response [62]. Therefore, it is also important to evaluate the difference in echo intensity between cross-polarization and co-polarization and to determine how this difference affects the SAR-to-optical translation process. Furthermore, the extent to which dual-polarization improves recognition relative to single-polarization remains unknown [11,[63][64][65].
Therefore, this study employs cGANs to transform SAR images into optical images and explores the impact of edge information on the image generation process. Additionally, three polarization modes are compared: co-polarization (VV), cross-polarization (VH), and dual-polarization (VV&VH). The main contributions of this study are as follows. First, we extend cGANs to the field of optical image reconstruction and demonstrate their effectiveness. Second, we discuss the importance of edge information in the SAR-to-optical process. Third, we compare the reconstruction capability of different polarization modes for different land-cover types, which can guide the selection of polarization modes for SAR data.

Methods
SAR data and cGANs were used to infer the spectral band information of optical images. Because SAR images are virtually immune to lighting conditions, weather conditions, and other environmental factors [20], they were used to reconstruct cloud-free optical images. Paired SAR and optical images were used for training, and the cGANs model was trained to learn the nonlinear mapping from the SAR inputs to the corresponding optical outputs. The method comprised four steps (Figure 1).
(1) Preprocessing: optical remote sensing images and SAR images were preprocessed and split into small patches. (2) Feature extraction: the rich spectral information of the optical images and the rich structural information of the SAR images were extracted as feature vectors.
(3) cGANs model training: SAR-optical patches were input to train the cGANs until convergence. In this step, we input paired co-polarization SAR-optical patches, cross-polarization SAR-optical patches, and dual-polarization SAR-optical patches. (4) Accuracy assessment: the generated optical images and the original optical images were classified with a neural network classifier, and the classification results were compared.

Paired Features for Model Training from Remote Sensing Images
In this experiment, Landsat8 multi-spectral data were used for the optical images, and Sentinel-1 data were used for the SAR images. Landsat8 was launched by the National Aeronautics and Space Administration (NASA) and the United States Geological Survey (USGS) and carries the Operational Land Imager (OLI) and the Thermal Infrared Sensor (TIRS). Sentinel-1, launched by the European Space Agency (ESA), is a constellation of two satellites, Sentinel-1A and Sentinel-1B, sharing the same orbital plane. Both satellites carry a C-band SAR sensor and provide dual-polarization SAR images in all weather conditions, day or night. We used Landsat imagery of Hengyang City, Hunan Province, China, acquired from July to September in 2018 and 2019, together with corresponding Sentinel-1 images acquired within 14 days of each Landsat image.
For the optical images, the Fmask (function of mask) algorithm [66,67] was used to detect cloud regions in the Landsat8 data and generate a cloud mask. Fmask first uses Landsat top-of-atmosphere (TOA) reflectance and brightness temperature (BT) as inputs and applies rules based on cloud physical properties to separate potential cloud pixels (PCPs) from clear-sky pixels. Next, a normalized temperature probability, a spectral variability probability, and a brightness probability are combined to produce a cloud probability mask, separately for land and water. Then, the PCPs and the cloud probability mask are used together to derive the potential cloud layer. Using the cloud mask, invalid cloud regions were removed and cloud-free regions were retained. The SAR images were subjected to the following processes: multilooking, despeckling, geocoding, radiometric calibration, and resampling. After preprocessing, we obtained large paired SAR-optical images with the same geographic coordinate system and spatial resolution (30 m). Then, a sliding window was used to split the large paired SAR-optical images into small patches of 256 × 256 pixels.
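The sliding-window split described above can be sketched in NumPy as follows. This is a minimal sketch, not the authors' code: the stride (non-overlapping 256-pixel steps by default), the rule of discarding any patch that touches a cloud or no-data pixel, and the function name `split_into_patches` are all assumptions made for illustration; the boolean mask stands in for a cloud mask such as the one produced by Fmask.

```python
import numpy as np

def split_into_patches(sar, optical, valid_mask, size=256, stride=256):
    """Cut co-registered SAR/optical rasters into paired size x size patches.

    sar:        (H, W, C_sar) array on the common 30 m grid
    optical:    (H, W, C_opt) array on the same grid
    valid_mask: (H, W) boolean array, True where the optical pixel is cloud-free
    Patches containing any invalid pixel are skipped (an assumed rule).
    """
    assert sar.shape[:2] == optical.shape[:2] == valid_mask.shape
    H, W = valid_mask.shape
    sar_patches, opt_patches = [], []
    for r in range(0, H - size + 1, stride):
        for c in range(0, W - size + 1, stride):
            if not valid_mask[r:r + size, c:c + size].all():
                continue  # skip windows touched by clouds or no-data pixels
            sar_patches.append(sar[r:r + size, c:c + size])
            opt_patches.append(optical[r:r + size, c:c + size])
    return np.array(sar_patches), np.array(opt_patches)
```

A smaller `stride` produces overlapping patches and therefore more training pairs from the same scene, at the cost of correlated samples.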
To fully exploit the rich spectral information of optical remote sensing images, we used a feature vector containing the pixel spectral information from bands 1 to 7 and the normalized difference vegetation index (NDVI); therefore, the optical feature vector contained eight channels (coastal aerosol, blue, green, red, near infrared [NIR], short-wave infrared [SWIR 1, SWIR 2], and NDVI).
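As an illustration of how the eight-channel optical feature vector can be assembled, the sketch below computes NDVI = (NIR − Red)/(NIR + Red) from Landsat8 OLI band 5 (NIR) and band 4 (red) and appends it to bands 1 to 7. The input layout (a `(7, H, W)` reflectance array) and the choice to set NDVI to 0 where the denominator is zero are assumptions, not details stated in the text.

```python
import numpy as np

def optical_feature_stack(bands):
    """bands: (7, H, W) array with Landsat8 OLI bands 1-7 as reflectance.

    Returns an (H, W, 8) array: the seven bands followed by NDVI.
    """
    bands = np.asarray(bands, dtype=np.float64)
    red, nir = bands[3], bands[4]  # OLI band 4 = red, band 5 = NIR
    denom = nir + red
    # NDVI = (NIR - red) / (NIR + red); 0 where the denominator vanishes (assumption)
    ndvi = np.divide(nir - red, denom, out=np.zeros_like(denom), where=denom != 0)
    return np.concatenate([np.moveaxis(bands, 0, -1), ndvi[..., None]], axis=-1)
```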
In SAR data, the radar backscatter coefficient of a ground object is strongly correlated with its gray value in the image; therefore, gray-level differences alone cannot effectively reflect changes in ground object targets. As such, it is necessary to introduce textural features to represent the rich structural information contained in SAR data. The gray-level co-occurrence matrix (GLCM) was first proposed by Haralick [68] in the early 1970s. It describes the spatial distribution of gray values in terms of the direction and distance between pixel pairs, and can be interpreted as an estimate of the second-order joint probability

$$P(i, j \mid d, \theta) = \frac{N(i, j \mid d, \theta)}{\sum_{i}\sum_{j} N(i, j \mid d, \theta)}$$

where $d$ and $\theta$ are the distance and direction, respectively, and $N(i, j \mid d, \theta)$ is the number of pixel pairs in which a pixel with gray value $i$ is followed, at distance $d$ in direction $\theta$, by a pixel with gray value $j$. $P(i, j \mid d, \theta)$ is thus the probability that, starting from a pixel with gray value $i$, a pixel with gray value $j$ appears at the specified distance $d$ and direction $\theta$. For simplicity, the textural features of SAR images are generally summarized by correlation, contrast, homogeneity, and energy, which effectively reflect the texture of remote sensing images [69].
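The GLCM definition and the four summary features can be made concrete with a small NumPy sketch. It assumes distance d = 1, a non-symmetric GLCM, linear quantization to a fixed number of gray levels, and the same feature formulas used by scikit-image's `graycoprops` (energy as the square root of the angular second moment); the study does not state these settings, and the function names here are hypothetical.

```python
import numpy as np

# (row, col) offsets for distance 1 at 0°, 45°, 90°, 135° (rows grow downward)
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def quantize(img, levels=16):
    """Linearly rescale an image to integer gray levels 0..levels-1."""
    img = np.asarray(img, dtype=np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros(img.shape, dtype=np.int64)
    return np.minimum((img - lo) / (hi - lo) * levels, levels - 1).astype(np.int64)

def glcm(q, dr, dc, levels):
    """Normalized co-occurrence matrix P(i, j | d, theta) for one offset."""
    rows, cols = q.shape
    r0, r1 = max(0, -dr), rows - max(0, dr)
    c0, c1 = max(0, -dc), cols - max(0, dc)
    ref = q[r0:r1, c0:c1]                      # pixels with gray value i
    nbr = q[r0 + dr:r1 + dr, c0 + dc:c1 + dc]  # neighbours with gray value j
    counts = np.zeros((levels, levels))
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1.0)  # N(i, j | d, theta)
    return counts / counts.sum()

def glcm_features(P):
    """Contrast, correlation, homogeneity and energy of a normalized GLCM."""
    i, j = np.indices(P.shape)
    contrast = np.sum(P * (i - j) ** 2)
    homogeneity = np.sum(P / (1.0 + (i - j) ** 2))
    energy = np.sqrt(np.sum(P ** 2))  # square root of the angular second moment
    mu_i, mu_j = np.sum(i * P), np.sum(j * P)
    sd_i = np.sqrt(np.sum(P * (i - mu_i) ** 2))
    sd_j = np.sqrt(np.sum(P * (j - mu_j) ** 2))
    if sd_i * sd_j == 0:
        corr = 1.0  # constant window: correlation is undefined, use 1 by convention
    else:
        corr = np.sum(P * (i - mu_i) * (j - mu_j)) / (sd_i * sd_j)
    return {"contrast": contrast, "correlation": corr,
            "homogeneity": homogeneity, "energy": energy}

def window_texture(q_window, levels=16):
    """16 values (4 features x 4 directions) for one quantized window, e.g. 7 x 7."""
    return {(name, angle): value
            for angle, (dr, dc) in OFFSETS.items()
            for name, value in glcm_features(glcm(q_window, dr, dc, levels)).items()}
```

In use, one would quantize a whole VV or VH band once and then call `window_texture` on the 7 × 7 neighborhood of each pixel to obtain per-pixel texture maps.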
In addition to textural information, edge information also offers good discrimination. An image edge is the boundary between adjacent homogeneous regions; it is the region richest in useful information and contains the basic features of the target structure. In this study, we used the Canny edge detection algorithm [70] to extract the edge information of the SAR images. The Canny algorithm applies a Gaussian filter for image smoothing and noise suppression, then filters out low-gradient edge pixels (caused by noise) using hysteresis thresholding [71]. The specific implementation steps were as follows: (1) A Gaussian filter was used to smooth the image and filter out noise.
(2) The gradient magnitude and direction of the filtered image were calculated. The gradient of each pixel was decomposed into components in the x and y directions, obtained by convolving a derivative operator with the smoothed image in the horizontal and vertical directions. (3) All values along the gradient direction, except for the local maxima, were suppressed (non-maximum suppression) to sharpen the edge features. (4) By selecting high and low thresholds, edge pixels with high gradient values were retained, and edge pixels with weak gradient values were filtered out unless they were connected to retained edge pixels.
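The four Canny steps can be sketched from scratch in NumPy to show what each one does. This is an illustrative sketch, not the implementation used in the study: the Sobel operator as the derivative operator, σ = 1, and thresholds expressed as fractions of the maximum gradient magnitude are all assumptions. Tuned library implementations exist, for example `skimage.feature.canny` in scikit-image and `cv2.Canny` in OpenCV.

```python
import numpy as np

def gaussian_smooth(img, sigma=1.0):
    """Step 1: separable Gaussian smoothing with edge padding."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    p = np.pad(img, radius, mode="edge")
    p = np.apply_along_axis(np.convolve, 1, p, k, mode="valid")     # rows
    return np.apply_along_axis(np.convolve, 0, p, k, mode="valid")  # columns

def sobel(img):
    """Step 2: horizontal (x) and vertical (y) gradient components."""
    p = np.pad(img, 1, mode="edge")
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return gx, gy

def non_max_suppression(mag, gx, gy):
    """Step 3: keep only local maxima along the quantized gradient direction."""
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180
    p = np.pad(mag, 1)
    out = np.zeros_like(mag)
    for r in range(mag.shape[0]):
        for c in range(mag.shape[1]):
            a = angle[r, c]
            if a < 22.5 or a >= 157.5:
                dr, dc = 0, 1
            elif a < 67.5:
                dr, dc = 1, 1
            elif a < 112.5:
                dr, dc = 1, 0
            else:
                dr, dc = 1, -1
            m = mag[r, c]
            if m >= p[r + 1 + dr, c + 1 + dc] and m >= p[r + 1 - dr, c + 1 - dc]:
                out[r, c] = m
    return out

def hysteresis(nms, low, high):
    """Step 4: keep strong edges plus weak edges 8-connected to them."""
    weak = nms >= low
    edges = nms >= high
    stack = list(zip(*np.nonzero(edges)))
    H, W = nms.shape
    while stack:
        r, c = stack.pop()
        for rr in range(max(r - 1, 0), min(r + 2, H)):
            for cc in range(max(c - 1, 0), min(c + 2, W)):
                if weak[rr, cc] and not edges[rr, cc]:
                    edges[rr, cc] = True
                    stack.append((rr, cc))
    return edges

def canny(img, sigma=1.0, low=0.1, high=0.2):
    """Boolean edge map; thresholds are fractions of the max gradient (assumption)."""
    smooth = gaussian_smooth(np.asarray(img, dtype=np.float64), sigma)
    gx, gy = sobel(smooth)
    mag = np.hypot(gx, gy)
    if mag.max() > 0:
        mag = mag / mag.max()
    return hysteresis(non_max_suppression(mag, gx, gy), low, high)
```

On a vertical step image, only the one or two columns at the step survive non-maximum suppression, which illustrates why step (3) yields thin edges.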
For the SAR images, we computed the features using the GLCM and Canny edge detection algorithm. As shown in Figure 2, we computed four features (correlation, contrast, homogeneity, and energy) in four directions (0°, 45°, 90°, and 135°) using 7 × 7 windows. Then, each SAR pixel in co-polarization mode (VV) and cross-polarization mode (VH) was represented by a 17-dimensional feature vector, whereas those in dual-polarization mode (VV&VH) were represented by a 34-dimensional feature vector.
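One reading of the 17- and 34-dimensional vectors, consistent with the arithmetic but not stated explicitly in the text, is 4 GLCM features × 4 directions = 16 texture channels plus 1 Canny edge channel per polarization, with the VV and VH stacks concatenated for dual-polarization. The sketch below assembles the stacks under that assumption; the channel ordering and the dictionary layout of the texture maps are hypothetical.

```python
import numpy as np

FEATURES = ("correlation", "contrast", "homogeneity", "energy")
ANGLES = (0, 45, 90, 135)

def sar_feature_stack(texture_maps, edge_map):
    """texture_maps[(feature, angle)] -> (H, W) array; edge_map -> (H, W) array.

    Returns (H, W, 17): 16 GLCM channels (assumed order) followed by the edge channel.
    """
    channels = [texture_maps[(f, a)] for f in FEATURES for a in ANGLES]
    channels.append(np.asarray(edge_map, dtype=np.float64))
    return np.stack(channels, axis=-1)

def dual_pol_stack(vv_stack, vh_stack):
    """Concatenate VV and VH stacks channel-wise: 17 + 17 -> 34 channels."""
    return np.concatenate([vv_stack, vh_stack], axis=-1)
```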

Conditional Generative Adversarial Networks (cGANs)
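cGANs are an extension of the GAN concept in which both the generator and the discriminator are conditioned on additional information y. The loss function for conditional GANs is expressed by

$$\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}\left[\log D(x \mid y)\right] + \mathbb{E}_{z,y}\left[\log\left(1 - D(G(z \mid y) \mid y)\right)\right]$$

where x is a real sample (here, a real optical image) and z is random noise. Previous studies [42] have found it favorable to combine the cGANs objective function with a loss function that measures the difference between pixels, such as the L1 distance:

$$\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\left[\lVert x - G(z \mid y)\rVert_1\right]$$

The whole objective function can then be defined as

$$G^{*} = \arg\min_{G}\max_{D}\ \mathcal{L}_{cGAN}(G, D) + \lambda\,\mathcal{L}_{L1}(G)$$

where λ is a parameter that controls the weight of the L1 distance in the overall objective function.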
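To show how the two terms of the objective combine numerically, the NumPy sketch below evaluates the discriminator and generator losses from discriminator output probabilities. Two assumptions should be flagged: the generator uses the common non-saturating adversarial term −log D(G(z|y)|y) in place of minimizing log(1 − D(·)), and λ defaults to 100 as in the original pix2pix paper, since the value used in this study is not stated.

```python
import numpy as np

EPS = 1e-12  # avoids log(0)

def discriminator_loss(d_real, d_fake):
    """D maximizes E[log D(x|y)] + E[log(1 - D(G(z|y)|y))]; we minimize the negative.

    d_real, d_fake: discriminator probabilities for real and generated tuples.
    """
    return -np.mean(np.log(d_real + EPS)) - np.mean(np.log(1.0 - d_fake + EPS))

def generator_loss(d_fake, fake, real, lam=100.0):
    """Adversarial term plus lam times the L1 distance between generated and real images."""
    adversarial = -np.mean(np.log(d_fake + EPS))  # non-saturating form (assumption)
    l1 = np.mean(np.abs(real - fake))
    return adversarial + lam * l1
```

Because λ multiplies a mean absolute reflectance error, a uniform error of 0.1 already contributes 10 to the generator loss, which is why the L1 term tends to dominate early training.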
The additional information can be any type of auxiliary information, such as class labels or data from other modalities [40]. If x and y represent two different image domains, then cGANs can achieve the corresponding image-to-image translation [48]. Many applications have exploited this characteristic for image translation [72,73]; in this study, we apply it to SAR-to-optical translation.

Network Architecture
We adapted the pix2pix network architecture [48] to be compatible with multi-spectral optical images, as well as with either one-channel (VV or VH) or two-channel (VV and VH) SAR images. Pix2pix is a widely used image-to-image translation architecture that has proven stable and powerful [72,74]. It makes it possible to create various images without specialized knowledge of the images to be generated [38]. In particular, the generator adopted the U-Net [75] architecture, consisting of seven convolution layers for encoding and seven deconvolution layers for decoding, whereas the discriminator adopted the PatchGAN architecture, consisting of five convolutional layers followed by a sigmoid output layer for classification. Detailed information is shown in Figure 3. The input of the generator was the SAR feature stack and its output was the generated optical image; the discriminator received a SAR patch paired with either a generated or a real optical patch. For the VV and VH polarization modes, the size of the generator input was set to 256 × 256 × 17, whereas for the VV&VH polarization mode, it was set to 256 × 256 × 34. The output of the generator was set to 256 × 256 × 8, and the size of the discriminator input was set to 256 × 256 × 16. In Figure 3, the three numbers in round brackets in each encoding and decoding layer indicate the number of filters, filter size, and stride, respectively, and the numbers in square brackets indicate the size of the feature maps. The discriminator learns to distinguish fake tuples (generated image, SAR patch) from real tuples (Landsat patch, SAR patch).
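The spatial bookkeeping of a seven-layer U-Net on 256 × 256 inputs can be checked with simple size arithmetic. The sketch assumes pix2pix-style layers with kernel size 4, stride 2, and padding 1 (the actual filter sizes are given in Figure 3, which is not reproduced here), so each encoder layer halves the map and each decoder layer doubles it; the generator input/output channel counts are taken from the text.

```python
def conv_out(size, kernel=4, stride=2, pad=1):
    """Spatial size after a strided convolution (encoder layer)."""
    return (size + 2 * pad - kernel) // stride + 1

def deconv_out(size, kernel=4, stride=2, pad=1):
    """Spatial size after a transposed convolution (decoder layer)."""
    return (size - 1) * stride - 2 * pad + kernel

def unet_spatial_sizes(input_size=256, depth=7):
    """Feature-map sizes through `depth` encoder and `depth` decoder layers."""
    enc = [input_size]
    for _ in range(depth):
        enc.append(conv_out(enc[-1]))
    dec = [enc[-1]]
    for _ in range(depth):
        dec.append(deconv_out(dec[-1]))
    return enc, dec

# (input channels, output channels) of the generator, as stated in the text
GENERATOR_IO = {"VV": (17, 8), "VH": (17, 8), "VV&VH": (34, 8)}

enc, dec = unet_spatial_sizes()
# Skip connections join intermediate encoder and decoder maps of equal size.
skips = list(zip(enc[1:-1][::-1], dec[1:-1]))
```

With these assumptions the bottleneck is 2 × 2, and the decoder returns to 256 × 256 so that the 8-channel output aligns pixel for pixel with the Landsat patch.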

Establishing the SAR-to-Optical Translation Relationship by Model Training
The small SAR-optical patches were used to train the cGANs until convergence (Figure 4). The generator takes SAR patches as input and generates optical images. The generated images paired with their SAR patches are labeled as fake examples, whereas the Landsat patches paired with their SAR patches are labeled as real examples for the discriminator to learn from. Each time the discriminator judges an optical image produced by the generator, its judgment is fed back to the generator. When the discriminator can no longer determine whether the input patches are real or fake, the images produced by the generator are good enough, and training ends. The trained parameters at this point encode the SAR-to-optical relationship. In this step, we input paired co-polarization SAR(VV)-optical images, cross-polarization SAR(VH)-optical images, and dual-polarization SAR(VV&VH)-optical images. For each polarization mode, we extracted 2300 pairs of patches, of which 1700 pairs were used for training and the remaining 600 pairs for testing. The networks were trained for 200 epochs with a batch size of 16, using minibatch stochastic gradient descent with the Adam optimizer [76]; the learning rate was set to 0.0002 and β1 was set to 0.5. All code was developed with the TensorFlow deep learning framework on the Ubuntu operating system, and training was conducted on a single graphics processing unit (GPU), an NVIDIA Tesla P100.
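The training configuration can be illustrated with a random 1700/600 split of the 2300 patch pairs and a minimal Adam update using the stated learning rate (0.0002) and β1 = 0.5. This NumPy sketch is not the TensorFlow code used in the study: the random seed, β2 = 0.999, and ε = 1e-8 (the usual Adam defaults) are assumptions, and `split_pairs` and `batches_per_epoch` are hypothetical helpers.

```python
import math
import numpy as np

def split_pairs(n_pairs=2300, n_train=1700, seed=0):
    """Randomly split patch-pair indices into training and testing sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_pairs)
    return idx[:n_train], idx[n_train:]

def batches_per_epoch(n_train, batch_size=16):
    """Number of minibatches per epoch, counting a final partial batch."""
    return math.ceil(n_train / batch_size)

class Adam:
    """Minimal Adam optimizer for a single NumPy parameter array."""

    def __init__(self, lr=2e-4, beta1=0.5, beta2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m = self.v = None
        self.t = 0

    def step(self, param, grad):
        if self.m is None:
            self.m, self.v = np.zeros_like(param), np.zeros_like(param)
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * grad       # first-moment estimate
        self.v = self.b2 * self.v + (1 - self.b2) * grad ** 2  # second-moment estimate
        m_hat = self.m / (1 - self.b1 ** self.t)               # bias corrections
        v_hat = self.v / (1 - self.b2 ** self.t)
        return param - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```

The low β1 of 0.5 makes the momentum term react quickly to the changing gradients of the adversarial game, which is a common choice for GAN training.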

Optical Image Generation
Once the SAR-to-optical translation relationship had been established, i.e., once the trained parameters at model convergence had been obtained, we used this relationship to generate optical images. In practice, the original optical image at time T1 (the real image) was available, but it was withheld from the model so that the generated image could be verified against it. We used the SAR image at time T1 as the input and generated the optical image at time T1 with the established SAR-to-optical relationship.

Evaluation of Reconstruction Image Data Quality
A neural network classifier was used to classify the generated optical images and the original optical images, and the classification results were compared. To evaluate accuracy quantitatively, the generated and real images were compared using the following indicators: the peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), correlation coefficient (r), and root mean squared error (RMSE). PSNR is based on pixel-wise error statistics, with a higher PSNR generally indicating higher image quality. SSIM represents structural information by calculating the structural similarity between the generated images and the real optical images; its value typically ranges from zero to one and reaches one when the two images are identical. PSNR and SSIM are commonly used in the field of image quality evaluation. The correlation coefficient r is a statistical index of the degree of linear correlation between variables; a larger r indicates a stronger correspondence between pixel values in the generated images and in the real images. RMSE measures the deviation of the generated values from the true values; the smaller the RMSE, the closer the generated values are to the true values and the higher the quality of the generated image.
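The four indicators can be computed per band as in the NumPy sketch below. Assumptions to note: reflectance is taken to lie in [0, 1] (so the PSNR and SSIM data range is 1.0), and SSIM is computed as a single global window over the band, whereas common implementations (for example `skimage.metrics.structural_similarity`) average SSIM over local sliding windows, so values will differ somewhat from those in Table 1.

```python
import numpy as np

def psnr(real, gen, data_range=1.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((real - gen) ** 2)
    return np.inf if mse == 0 else 10 * np.log10(data_range ** 2 / mse)

def ssim_global(real, gen, data_range=1.0):
    """Single-window SSIM over the whole band (no sliding Gaussian window)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = real.mean(), gen.mean()
    vx, vy = real.var(), gen.var()
    cov = np.mean((real - mx) * (gen - my))
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def pearson_r(real, gen):
    """Pearson correlation coefficient between pixel values."""
    return np.corrcoef(real.ravel(), gen.ravel())[0, 1]

def rmse(real, gen):
    """Root mean squared error."""
    return np.sqrt(np.mean((real - gen) ** 2))
```

Note that r is insensitive to a constant bias: a generated band offset by +0.1 everywhere has r = 1 but an RMSE of 0.1, which is why r and RMSE are reported together.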

Influence of Edge Information on SAR-to-Optical Translation
In this experiment, we compare the influence of the addition of edge information on the SAR-to-optical translation process without comparing different polarization modes. We perform control experiments using only the textural information as the input; the textural information is provided by the GLCM and the edge information is provided by the Canny edge detection algorithm.

Qualitative Evaluation of Generated Images
The results generated from several different inputs are compared with the original images in Figure 5. Visually, after the addition of edge information, the boundaries of water bodies are clearer and more continuous. Specifically, in VV mode (compare vertically within column 2 of Figure 5), the addition of edge information results in blurrier ground objects with less textural information in the generated image; in VH mode (column 3 of Figure 5), it results in a clearer boundary between water bodies and vegetation, as well as more detailed textural information; and in VV&VH mode (column 4 of Figure 5), it also results in a clearer boundary between water bodies and vegetation, as well as a more uniform water body that does not erroneously contain vegetation.
Table 1 shows the image quality assessment (IQA) results for the different combinations of inputs, with the best value for each quality index shown in bold. The IQA consists of PSNR and SSIM. After edge information is added, images generated from the VV mode input exhibit lower quality according to both indexes, indicating that in this case the edge information confuses the cGANs and weakens their learning ability. Conversely, images generated from the VH mode input exhibit higher quality, although for PSNR the improvement is observed only in bands 1, 2, and 7, indicating that the edge information in this case may provide the cGANs with more effective information. Furthermore, images generated from the VV&VH mode input improve in all bands for both indexes, indicating that the edge information in this case provides the cGANs with more effective information and results in better image quality.
It is worth noting that the SSIM is improved for images generated using the VH and VV&VH polarization modes, which indicates that the generated images are more similar to the real image in terms of luminance, contrast, and structure.

Comparison of Different Polarization Modes
In this experiment, we identify the optimal polarization mode among the three modes in two cases: (1) using both textural information and edge information as the input and (2) using only textural information as the input. To quantitatively assess the accuracy of the images generated with the different polarization modes, we compute r and RMSE between the predicted reflectance and the real reflectance. To display the results clearly, we use density scatter plots produced in MATLAB. Each scatter plot shows the relationship between the generated reflectance and the actual reflectance of the Landsat image, and the line in each plot is the 1:1 line. Points close to the line indicate that the algorithm captures the reflectance variation of surface objects and predicts pixel reflectance accurately. Yellow indicates a high density of points.
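The color of each point in a density scatter plot can be derived from a 2D histogram: each point takes the count of the histogram cell it falls in. The study produced its plots in MATLAB and does not state the density estimator, so this Python/NumPy sketch with 100 bins per axis is an assumed, simple alternative (kernel density estimates are another common choice). The resulting values could be passed as the `c` argument of Matplotlib's `scatter` together with a 1:1 reference line.

```python
import numpy as np

def point_density(gen, real, bins=100):
    """Per-point density (2D histogram cell count) for a generated-vs-real scatter plot."""
    gen, real = np.ravel(gen), np.ravel(real)
    counts, xedges, yedges = np.histogram2d(gen, real, bins=bins)
    # np.digitize returns 1-based bin indices; clip so the maximum falls in the last bin
    ix = np.clip(np.digitize(gen, xedges) - 1, 0, bins - 1)
    iy = np.clip(np.digitize(real, yedges) - 1, 0, bins - 1)
    return counts[ix, iy]
```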

Optimal Polarization Mode Using Textural Information and Edge Information
According to the statistical data, images generated using the VV&VH polarization mode achieve the best r and RMSE values, followed by those generated using VH polarization, whereas those generated using VV polarization achieve the worst. Regardless of the polarization mode used as the input, r is higher for bands 5-7 than for bands 1-4, with band 6 exhibiting the highest r value (VV: r = 0.832, VH: r = 0.842, VV&VH: r = 0.864) and band 2 exhibiting the lowest r value for images generated using VV and VH polarization (VV: r = 0.346, VH: r = 0.607). For images generated using VV&VH polarization, r is lowest in band 1 (r = 0.652). The relatively low r values for bands 1 and 2 under VV polarization (0.48 and 0.346, respectively) indicate that a large number of pixels have low accuracy; therefore, the VV polarization mode is not recommended as the input when generating bands 1 and 2 of Landsat8 images.
According to the scatter plots (Figure 6), the reflectance distribution for images generated using VV&VH polarization as the input is relatively concentrated, with most data falling in the yellow and light blue areas and relatively few points in the low-density dark blue area. In comparison, although the high-density yellow area is larger for images generated using VH polarization as the input, more low-density points appear in the dark blue area. Finally, the reflectance values of images generated using VV polarization as the input are relatively dispersed, with a large number of low-density dark blue points falling far from the 1:1 line.

Optimal Polarization Mode Using Only Textural Information
According to the statistical data, the images generated using the VV polarization mode as the input exhibit the best r and RMSE values, followed by those generated using VV&VH polarization, with images generated using VH polarization exhibiting the worst r and RMSE values. Again, regardless of the polarization mode, bands 5-7 exhibit higher r values than bands 1-4, with the highest r value in band 6 (VV: r = 0.874, VH: r = 0.831, VV&VH: r = 0.834) and the lowest r value in band 1 (VV: r = 0.623, VH: r = 0.596, VV&VH: r = 0.567).
According to the scatter plots (Figure 7), the reflectance distribution for images generated using the VV polarization mode is relatively narrow, but in bands 1-4 some points are scattered parallel to the x-axis. Taking band 1 as an example, some points lie near the y = 0.15 line, which indicates that pixels with a reflectance of 0.15 in the real image were erroneously generated with values ranging from 0.10 to 0.18. This problem is alleviated when the VH polarization mode is used as the input and almost disappears when the VV&VH polarization mode is used.
Figure 7. Scatter plots of the real reflectance and generated reflectance produced by co-polarization (VV), cross-polarization (VH), and dual-polarization (VV&VH) inputs when only textural information is input. (a) co-polarization (VV); (b) cross-polarization (VH); (c) dual-polarization (VV&VH). The horizontal axis represents the generated values, the vertical axis represents the actual values, r represents the correlation between generated and actual values, and RMSE represents the root mean squared error.

Classification and Area Ratio Comparison
To assess the accuracy of surface object information in the generated images, we classify the generated images into three categories (water bodies, building land, and vegetation) and determine the areal proportion of each surface object type (Table 2). The proportions of the three types in the real images serve as the baseline for the other sets, and the best value for each quality index is shown in bold. Figure 8 shows the classification results visually. When only the GLCM features are used as the input, the classification results of VV polarization are closest to those of the real images. When both the GLCM and edge information are used as the input, the classification results of VV&VH polarization are closest to those of the real images.
In VV polarization mode, adding edge information results in finer classification of surface object patches, meaning that the generated images contain more detailed information. Water bodies surrounded by vegetation and building land are also effectively distinguished, so the water area is closer to that in the real image. However, many pixels that should be classified as building land are misclassified as vegetation, resulting in a smaller proportion of building land and a larger proportion of vegetation than in the real image. The VH polarization results are similar to those of VV polarization, except that adding edge information results in better separation of water bodies and more detailed water area information in the generated image. In VV&VH polarization mode, adding edge information again makes the water boundaries more continuous and clear, reduces the misclassification of building land and vegetation, and yields areal proportions closer to those of the real images for all three surface objects.
Figure 8. Classification results. VV: co-polarization as the input; VH: cross-polarization as the input; VV&VH: dual-polarization as the input; GLCM: only textural information as the input; GLCM&Canny: both textural information and edge information as the input. Blue represents water bodies, red represents building land, and green represents vegetation.
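The areal-proportion comparison reduces to counting labels in each classification map. In the sketch below, the integer encoding of the three classes (0 = water, 1 = building land, 2 = vegetation) is a hypothetical choice for illustration, and the classification maps themselves are assumed to come from the neural network classifier described earlier.

```python
import numpy as np

CLASSES = {0: "water", 1: "building land", 2: "vegetation"}  # hypothetical encoding

def area_proportions(class_map):
    """Fraction of pixels assigned to each class."""
    counts = np.bincount(np.ravel(class_map), minlength=len(CLASSES))
    return {name: counts[k] / counts.sum() for k, name in CLASSES.items()}

def proportion_differences(generated_map, real_map):
    """Generated minus real proportion per class (real image as baseline)."""
    gen, real = area_proportions(generated_map), area_proportions(real_map)
    return {name: gen[name] - real[name] for name in CLASSES.values()}
```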

Correlation Comparison
The scattering mechanisms of SAR data include surface scattering, volume scattering, and double-bounce scattering, among others, which give different surface objects different backscattering intensities in SAR images. Therefore, we determine the optimal input for each type of surface object by calculating r between the generated image and the original image for that surface object type (Figure 9). For vegetation, the VV&VH polarization mode with edge information is the optimal input. Although the overall correlation for the VV polarization mode without edge information is relatively high, the correlation varies considerably between bands. Adding edge information improves the correlation for vegetation areas generated with the VH and VV&VH polarization modes as the input. For water bodies, the VV polarization mode with edge information is the optimal input, possibly because backscattering from water bodies is typically surface scattering, which benefits from a polarization mode with a stronger echo, such as VV. Adding edge information significantly improves the correlation for water body areas generated using each polarization mode. For building land, the VV polarization mode without edge information is the optimal input, and adding edge information improves the correlation for building land generated using VV&VH polarization. Therefore, different input features should be selected for different types of surface objects to ensure optimal image accuracy.
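A per-class, per-band correlation such as the one summarized in Figure 9 can be computed by masking pixels by class before calling the Pearson correlation. This is a sketch under assumptions: the class mask is taken from a classification map (here assumed to be that of the real image), the label dictionary is hypothetical, and classes with fewer than two pixels return NaN.

```python
import numpy as np

def per_class_band_r(real, gen, class_map, labels):
    """real, gen: (H, W, B) reflectance; class_map: (H, W) integer labels.

    labels: {class name: integer label}. Returns {name: array of B Pearson r values}.
    """
    out = {}
    for name, k in labels.items():
        mask = class_map == k
        rs = []
        for b in range(real.shape[-1]):
            x, y = real[..., b][mask], gen[..., b][mask]
            rs.append(np.corrcoef(x, y)[0, 1] if x.size > 1 else np.nan)
        out[name] = np.array(rs)
    return out
```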

Effects of Different Reconstruction Methods on Different Optical Bands
There are many methods for reconstructing optical remote sensing images [77][78][79][80][81][82]. The cGANs method adopted in this study shows great potential for reconstructing the NIR and SWIR bands (for Landsat8 images, band 5 is the NIR band, and bands 6 and 7 are the SWIR bands). For example, the spatial and temporal adaptive reflectance fusion model (STARFM) obtained r values of 0.693, 0.598, and 0.638 for bands 5, 6, and 7, respectively [82], whereas the r values obtained in our study with edge information, textural information, and the VV&VH polarization mode as the input were 0.848, 0.864, and 0.836, respectively. Moreover, sparse representation was used to reconstruct an optical image with a PSNR of 22.35 for band 5 [8], whereas our study obtained a PSNR of 27.33. Finally, an improved spatial-temporal fusion method predicted band 5 reflectance with r = 0.8403 [83], whereas our study obtained r = 0.848. NIR and SWIR bands are typically used for urban monitoring and for the detection and identification of roads, exposed soil, and water. Therefore, cGANs-based SAR-to-optical image translation methods may be particularly well suited to image generation in these applications.

Superior Reconstruction With an Adequate Textural Extraction Scale
In this study, we used the gray-level co-occurrence matrix (GLCM) to extract textural features, with the running window set to 7 × 7. However, different ground objects differ in texture size, periodicity, and direction, so a 7 × 7 window may not be suitable for all of them. This raises the problem of textural scale, that is, the size of the window used in the texture extraction process. Many previous studies have demonstrated that textural scale significantly influences image reconstruction [84,85]. A larger textural scale can blur the boundaries and interiors of textural features and even produce mosaic artifacts, which conceal relatively small changes in the image. Conversely, a smaller textural scale yields more detailed textural information but a more fragmented and speckled image, which hinders subsequent surface object extraction. Therefore, both the spectral and spatial properties of the phenomena or processes being studied should be considered when selecting the textural scale. However, this topic is beyond the scope of this study.
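To make the window-size trade-off concrete, the sketch below computes one GLCM statistic (contrast) in a sliding window in pure Python. This section does not state which GLCM statistics, grey-level quantization, or pixel offsets were used, so the choice of contrast, 8 grey levels, a one-pixel horizontal offset, and skipping border pixels are all assumptions. The function names `quantize`, `glcm_contrast`, and `texture_map` are hypothetical.

```python
def quantize(image, levels, vmin, vmax):
    """Map values in [vmin, vmax] to integer grey levels 0..levels-1."""
    span = vmax - vmin
    if span <= 0:
        return [[0 for _ in row] for row in image]
    return [
        [max(0, min(levels - 1, int((v - vmin) / span * levels))) for v in row]
        for row in image
    ]


def glcm_contrast(window, levels, dx=1, dy=0):
    """GLCM contrast for one window and one pixel offset (dx, dy).

    The co-occurrence matrix is made symmetric and normalized, then
    contrast = sum over (i, j) of P(i, j) * (i - j)**2.
    """
    counts = [[0] * levels for _ in range(levels)]
    h, w = len(window), len(window[0])
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                i, j = window[y][x], window[y2][x2]
                counts[i][j] += 1
                counts[j][i] += 1  # count both directions: symmetric GLCM
    total = sum(map(sum, counts))
    if total == 0:
        return 0.0
    weighted = sum(
        counts[i][j] * (i - j) ** 2 for i in range(levels) for j in range(levels)
    )
    return weighted / total  # dividing by total normalizes P(i, j)


def texture_map(image, window_size=7, levels=8, dx=1, dy=0):
    """Slide a window_size x window_size window over a quantized image.

    Returns a contrast map for interior pixels only; borders are skipped
    rather than padded (an assumption of this sketch).
    """
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd")
    half = window_size // 2
    h, w = len(image), len(image[0])
    out = []
    for cy in range(half, h - half):
        row = []
        for cx in range(half, w - half):
            win = [r[cx - half:cx + half + 1] for r in image[cy - half:cy + half + 1]]
            row.append(glcm_contrast(win, levels, dx, dy))
        out.append(row)
    return out
```

Calling `texture_map` with `window_size=7` mirrors the 7 × 7 window used here. Increasing `window_size` averages co-occurrences over a larger neighborhood and gives smoother but blurrier texture maps, while decreasing it gives noisier, more fragmented maps, as discussed above.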

Conclusions
In this study, we translated SAR images into optical images using cGANs, then investigated the effect of adding edge information and using three different polarization modes in the model input on the translation process. The major findings are as follows.
The addition of edge information improves the structural similarity between the generated image and the real image, makes the boundaries between surface objects clearer in the generated image, and provides the cGANs with more effective information, resulting in better image quality when VH and VV&VH polarization modes are used as the input. The optimal polarization mode with edge information added in the input is VV&VH, whereas the optimal polarization mode without edge information is VV. Moreover, different surface object types have different optimal input features. For example, VV&VH polarization with edge information is the optimal input for vegetation, VV polarization with edge information is the optimal input for water bodies, and VV polarization without edge information is the optimal input for building land. Overall, the accuracy of NIR and SWIR bands in the generated image is higher than that of visible bands (for Landsat8 images, bands 5-7 are more accurate than bands 2-4).
These findings provide an important reference for the selection of cGANs input features and have potential applications in cloud removal, vegetation index reconstruction, and related tasks. Although we only translated Sentinel-1 images into Landsat8 images, translation between other SAR and optical images is also theoretically feasible. Our results indicate that SAR-to-optical image translation can generate high-quality optical images for constructing time series with high temporal and spectral resolution. Future research should consider using images from multiple satellites and introducing time-series data to further improve the translation results.

Institutional Review Board Statement: "Not applicable" for studies not involving humans or animals.
Informed Consent Statement: "Not applicable" for studies not involving humans.
Data Availability Statement: Data sharing not applicable.