Article

AWM-GAN: SAR-to-Optical Image Translation with Adaptive Weight Maps

1 School of Computer Science and Engineering, Kyungpook National University, 80, Daehak-ro, Buk-gu, Daegu 41566, Republic of Korea
2 Korea Aerospace Research Institute, 169-84, Gwahak-ro, Yuseong-gu, Daejeon 34133, Republic of Korea
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(23), 3878; https://doi.org/10.3390/rs17233878
Submission received: 2 October 2025 / Revised: 24 November 2025 / Accepted: 25 November 2025 / Published: 29 November 2025

Highlights

What are the main findings?
  • The proposed AWM-GAN effectively combines registration correction and adaptive weight mapping, ensuring geometric alignment and transformation consistency between SAR and optical domains.
  • The adaptive weight map integrates attribution and uncertainty information to emphasize reliable regions and reduce the influence of uncertain areas during training, thereby enhancing structural preservation and spectral realism.
What is the implication of the main finding?
  • The experimental results show that AWM-GAN consistently outperforms existing comparison models across all evaluation metrics, demonstrating superior performance in both structural accuracy and color restoration quality.
  • By achieving explainability in cross-modal image translation, AWM-GAN introduces the potential for integrating explainable artificial intelligence (XAI) into the field of remote sensing image generation.

Abstract

Synthetic Aperture Radar (SAR) imagery enables high-resolution observation regardless of weather conditions. However, it is difficult to interpret intuitively due to issues such as speckle noise. In contrast, optical imagery provides realistic visual information but is vulnerable to weather and illumination changes. Leveraging this complementarity, SAR-to-Optical image translation has attracted considerable attention. Nevertheless, existing paired learning approaches are limited by the high cost of large-scale data collection and residual misregistration errors, while unpaired learning approaches are prone to global color shifts and structural distortions. To address these limitations, this study proposes a SAR-to-Optical translation framework that introduces a weight map throughout the training process. The proposed weight map combines Attribution maps and Uncertainty maps to amplify losses in important regions while guiding conservative learning in uncertain areas. Moreover, the weight map is incorporated into the registration stage to refine pixel-wise displacement estimation, preserve boundary and structural consistency, and enhance overall training stability. The experimental results demonstrate that the proposed method outperforms existing approaches on the SAR2Opt and SEN1-2 datasets in all metrics, including PSNR and SSIM.

1. Introduction

Remote sensing imagery serves as a cornerstone for understanding the Earth system and addressing global social and environmental challenges. Its applications span from climate monitoring and land surface dynamics [1,2] to agricultural productivity enhancement [3,4], disaster risk management [5,6], maritime surveillance and port logistics [7,8], as well as urban expansion and infrastructure planning [9,10].
Among the various sensing technologies, Synthetic Aperture Radar (SAR) and optical imagery have emerged as two of the most representative and complementary modalities. SAR employs active microwave sensing, transmitting radar pulses and receiving the backscattered signals. As a result, SAR can observe independently of solar illumination, both day and night, and acquire data stably regardless of weather conditions such as clouds, fog, and precipitation. In addition, SAR is highly sensitive to the shape and structural characteristics of terrain and man-made objects. Nevertheless, these strengths are counterbalanced by inherent drawbacks, detailed below. Speckle noise, a byproduct of coherent interference, introduces grainy textures that degrade visual quality and hinder subsequent processing [11,12]. Geometric distortions arising from the side-looking geometry [13,14] lead to layover and foreshortening effects, complicating alignment with ground truth. Furthermore, the absence of intuitive spectral content renders SAR interpretation highly nontrivial for non-experts.
In contrast, optical imagery captures solar reflectance across visible and near-infrared bands, thereby producing photorealistic representations closely aligned with human visual perception. Optical data convey rich chromatic and textural cues that are essential for tasks such as vegetation monitoring, land cover classification, and damage assessment. Its intuitive interpretability explains the broad adoption of optical datasets in both research and practice. However, optical sensing is inherently constrained by atmospheric and illumination conditions. For example, environmental factors including clouds, fog, and haze, along with varying solar geometry, significantly impact acquisition quality [15,16,17]. Consequently, optical imagery can be unreliable for continuous monitoring in regions with persistent cloud cover or in time-sensitive emergency response scenarios.
Taken together, SAR and optical imagery exhibit highly complementary characteristics. SAR excels at structural reliability under challenging conditions, whereas optical imagery provides spectral fidelity and human interpretability. Their integration or mutual translation has thus become a vibrant research direction in the remote sensing community [18,19,20]. In particular, SAR-to-Optical (S2O) image translation seeks to render SAR images into optical-like forms, alleviating the steep learning curve required for SAR interpretation. This not only enhances usability for non-expert end users but also expands the applicability of SAR data to domains where intuitive interpretation is critical, such as disaster recovery, environmental surveillance, and urban planning. However, to fully realize this potential, it is essential to develop translation models capable of effectively handling the complex distributional differences between SAR and optical data. Consequently, image-to-image (I2I) translation techniques have been applied to the S2O problem, and the approaches can be broadly categorized into paired-data-based supervised methods and unpaired-data-based unsupervised methods.
Early I2I translation research was largely dominated by paired learning, which leverages pixel-wise correspondences between SAR and optical images acquired under near-identical conditions. Representative examples include Pix2Pix [21] and BicycleGAN [22], both of which demonstrated the feasibility of generating visually coherent results. However, paired approaches face three fundamental limitations. First, there is the scarcity and high cost of acquiring well-aligned SAR-Optical pairs at scale. Second, residual misalignments often remain in practice due to sensor differences and acquisition timing. Third, these approaches show limited generalizability when applied beyond the training domain. Diffusion-based generative models offer higher-quality synthesis but impose prohibitive computational burdens, constraining their scalability for operational remote sensing.
These challenges motivated the development of unpaired learning strategies. The seminal CycleGAN [23] introduced cycle consistency to learn bidirectional mappings without paired data. Building on this foundation, several extensions have emerged. CUT [24] enhances local structural consistency through contrastive patch learning. NICE-GAN [25] improves efficiency by reusing discriminator layers as encoder components. ASGIT [26] introduces spatial attention for focused translation. StegoGAN [27] leverages embedding-based representations to improve adaptability. More recently, domain-tailored designs such as FG-GAN [28] and the two-stage method of Qing et al. [20] have demonstrated the potential of adapting these ideas specifically to S2O translation.
Despite this progress, unpaired approaches still fall short of delivering reliable S2O translations. Four issues are particularly salient. First, global radiometric deviations arise due to irreconcilable sensing differences, resulting in biased tones and inconsistent brightness that undermine spectral fidelity. Second, fine-grained textures such as vegetation patterns, building surfaces, or road networks are frequently lost, as the models prioritize global structure over local detail. Third, man-made boundaries often suffer from blurring and deformation, exacerbating geometric inconsistencies rather than correcting SAR distortions. Finally, domain shift remains unresolved: models trained on specific datasets generalize poorly when confronted with unseen SAR distributions, severely limiting robustness in real-world deployments.
To address these challenges, this study introduces a novel S2O image translation framework that integrates the strengths of both paired and unpaired paradigms. The framework is built on Cycle-Consistent Adversarial Networks [23], augmented with two critical components. The first is a fine-grained registration module, explicitly designed to correct residual geometric misalignments between SAR and optical data, thereby complementing the weaknesses of paired learning. The second is an adaptive loss re-weighting mechanism, constructed from attribution and uncertainty maps. The attribution map highlights structurally important regions such as edges and boundaries, while the uncertainty map identifies areas of low confidence or potential distortion. Their combination yields a spatial weight map that adaptively modulates cycle and registration losses, strengthening reliable regions while attenuating unreliable ones. Through this synergy, the framework simultaneously enhances optical fidelity, structural consistency, and domain robustness. The main contributions of this study are summarized as follows:
  • We propose a registration enhanced CycleGAN framework that explicitly corrects residual geometric errors in SAR-Optical pairs, improving boundary preservation and geometric reliability.
  • We design an attribution and uncertainty guided weight map for adaptive loss re-weighting, which reflects spatially varying importance and confidence, thereby boosting both performance and interpretability.
  • We conduct extensive experiments on the SAR2Opt [29] and SEN1-2 [30] datasets, demonstrating consistent improvements across all major quantitative metrics, validating the robustness and generalizability of the proposed method.

2. Related Work

2.1. Image-to-Image Translation and SAR-to-Optical

I2I translation aims to learn the distributional discrepancy between different domains and to translate input images into corresponding target-domain representations using generative models. Such techniques have been widely applied in diverse areas, including medical imaging, artistic style transfer, autonomous driving perception, and remote sensing, offering the potential to leverage heterogeneous data from sensors with distinct characteristics.

2.1.1. Paired I2I Approaches

Paired data-based approaches rely on aligned image pairs acquired under identical spatiotemporal conditions to learn direct mappings between distinct domains. Leveraging the strengths of supervised learning, this paradigm produces high-quality translations. Figure 1 illustrates representative examples of such pair-based translation frameworks.
A representative work, Pix2Pix [21], established one-to-one mappings between inputs and outputs, where pixel-level reconstruction losses enhanced color fidelity and preserved fine-grained details. This approach laid the foundation for subsequent supervised I2I research. Building upon this, BicycleGAN [22] introduced a multimodal latent space to generate diverse outputs from a single input, thereby enhancing flexibility. While such models deliver stable and realistic results when sufficient paired data are available, constructing large-scale high-quality paired datasets in satellite and aerial remote sensing remains challenging. Sensor heterogeneity, acquisition conditions, and viewpoint discrepancies introduce inevitable residual misalignments, which degrade translation performance. Consequently, despite their high potential, paired approaches are constrained by prohibitive data acquisition costs and limited applicability.

2.1.2. Unpaired I2I Approaches

To address the scarcity of paired datasets, unpaired I2I approaches have been proposed. These methods enable learning translations even in the absence of aligned data, which can be particularly beneficial in domains like remote sensing, where data construction is difficult. Figure 2 illustrates the core framework of CycleGAN, highlighting the bidirectional translation enabled by cycle-consistency. CycleGAN [23] introduced the cycle-consistency loss, enforcing input images to be recoverable back to their original domain, thereby enabling bidirectional learning without pairs. Although this framework is groundbreaking, it still suffers from global radiometric shifts, texture degradation, and structural distortions. Several extensions have sought to overcome these limitations.
Additionally, Figure 3 presents examples of various unpaired I2I methods. CUT [24] applied contrastive learning to intermediate generator patches, enhancing structural consistency. NICE-GAN [25] improved efficiency and performance by reusing early discriminator layers as the generator encoder. ASGIT [26] employed spatial attention mechanisms to focus on semantically critical regions. StegoGAN [27] constrained latent embeddings to enforce cross-domain alignment. Compared to paired approaches, these methods offer higher flexibility and achieve moderate improvements in translation quality. Nonetheless, fundamental issues remain, including structural distortions, global color discrepancies, and generalization degradation due to domain mismatch.

2.1.3. S2O-Specific I2I Approaches

S2O image translation entails challenges beyond general I2I tasks. Owing to microwave backscattering properties, SAR imagery inevitably exhibits speckle noise, geometric distortions, and radar shadows, making direct alignment with optical imagery difficult. To address these characteristics, specialized S2O image translation frameworks have been proposed. FG-GAN [28], for instance, aimed to preserve structural information from SAR while reconstructing the spectral fidelity of optical images, incorporating domain adaptation to mitigate cross-domain representation gaps. Similarly, Qing et al. [20] proposed a two-stage framework integrating a registration module into a CycleGAN backbone. This approach simultaneously corrected residual geometric misalignments within paired data and leveraged the strengths of unpaired learning to alleviate dataset scarcity. Nonetheless, these specialized approaches still suffer from limited capability in capturing fine-grained details and tend to exhibit overfitting toward vegetation regions, leading to biased representations. Figure 4 illustrates representative examples of these specialized S2O translation frameworks.

2.2. Attribution and Uncertainty in Vision Translation

I2I translation has demonstrated remarkable success in cross-domain mapping tasks and has emerged as a core technique in computer vision. Nevertheless, persistent concerns remain regarding the reliability and interpretability of the generated outputs. Users often lack transparency regarding why specific results were produced and to what extent they can be trusted. This issue is particularly critical in application domains such as remote sensing and medical imaging, where decision-making requires clear and reliable justification.
To address this limitation, explainable artificial intelligence (XAI) methods have been introduced. XAI aims to convert the complex internal processes of deep models into human-understandable explanations, primarily realized through attribution-based techniques. Attribution methods quantify or visualize the contribution of each input feature to the output, thereby revealing which parts of the input the model relied on and to what degree. Representative techniques include Layer-wise Relevance Propagation (LRP) [31], Grad-CAM [32], and Integrated Gradients [33]. LRP redistributes the prediction score backward through the network to assign a relevance value to each input pixel, thereby indicating which evidence most strongly supports the model’s decision. Grad-CAM visualizes class-discriminative regions by exploiting the final convolutional feature maps, offering heatmaps of network attention. Integrated Gradients accumulate gradients along the path from a baseline to the input, providing a more stable and linear attribution measure. Collectively, these attribution-based methods enhance interpretability by offering explanatory evidence for I2I translation outputs.
In parallel, uncertainty estimation constitutes an equally critical dimension. Uncertainty quantifies the reliability of model predictions and is typically divided into aleatoric and epistemic uncertainty [34]. Aleatoric uncertainty arises from inherent data variability, such as sensor noise, resolution limits, or speckle artifacts in SAR imagery, and is often modeled through variance estimation or pixel-wise probability maps. Epistemic uncertainty, by contrast, reflects model ignorance due to insufficient training data or architectural limitations, highlighting what the model does not know. Estimation techniques include Monte Carlo Dropout [35], which repeatedly applies dropout during inference to obtain output variance, and Bayesian neural networks, which treat weights as probabilistic variables to provide posterior distributions and confidence intervals.
Recent works have increasingly integrated these concepts into I2I translation. For instance, Uncertainty-Guided Progressive GAN [36] employed aleatoric uncertainty as an attention map to iteratively refine uncertain regions. MultiResunc [37] jointly estimated aleatoric and epistemic uncertainty to improve MR-to-CT translation. Karthik et al. [38] incorporated aleatoric-based cycle-consistency and gradient-consistency losses into CycleGAN to enhance boundary preservation and interpretability. Uncertainty-Aware Regularization [39] combined generalized Gaussian aleatoric modeling with total variation regularization to suppress noise-sensitive uncertainty.
In the context of S2O image translation, attribution and uncertainty are particularly relevant. SAR imagery often suffers from information loss and structural deformation due to speckle noise, geometric distortions, and radar shadows. Attribution maps can serve to verify whether models attend to semantically or structurally important regions, such as terrain boundaries, water bodies, or building contours. Meanwhile, uncertainty maps explicitly reveal low-confidence areas, offering practical cues for post-processing and refinement. Ultimately, attribution and uncertainty are likely to serve as key mechanisms not only for enhancing perceptual quality but also for ensuring the reliability and interpretability of S2O image translation outcomes.

3. Data Description

Research on S2O image translation has been actively conducted along with the construction of various paired datasets. However, most existing remote sensing datasets are biased toward optical imagery, while high-quality SAR datasets, especially those organized as SAR-Optical pairs, remain limited. For instance, Sentinel-1/2-based datasets have the advantage of global coverage, but their relatively low spatial resolution makes them less suitable for local-scale analysis. On the other hand, high-resolution SAR datasets collected by satellites such as TerraSAR-X or Capella Space can capture fine-grained urban structures, but their coverage is narrow and the number of available pairs is limited. Therefore, S2O research requires selecting datasets according to the task, with low-resolution datasets enabling large-scale training and high-resolution datasets allowing for precise object-level analysis. In this study, we used two representative paired SAR-Optical datasets, SEN1-2 and SAR2Opt. The former is a large-scale low-resolution dataset covering the global domain, while the latter is a high-resolution dataset that preserves fine urban structures. Below, we describe the characteristics and differences of each dataset in detail.

3.1. SEN1-2

SEN1-2 [30] is a large-scale paired dataset released in 2018, consisting of a total of 282,384 co-registered image patches of size 256 × 256 . Figure 5 presents representative examples of the SEN1-2 dataset.
The dataset covers global regions across all four seasons. SAR images were collected from the Sentinel-1 satellite of the European Space Agency, while the corresponding optical images were obtained from the Sentinel-2 satellite. Sentinel-1 employs a C-band SAR sensor with an approximate spatial resolution of 5 m, and all images were acquired in Interferometric Wide Swath mode with VV polarization. SEN1-2 has become widely used in large-scale applications such as vegetation monitoring, disaster response, and land cover classification due to its openness and large volume. However, its relatively coarse resolution of around 5 m limits its applicability in high-precision tasks that require detailed urban structure analysis.

3.2. SAR2Opt

SAR2Opt [29] is a high-resolution SAR-Optical paired dataset released in 2022, comprising 2076 image pairs of size 600 × 600. The dataset covers approximately 70 km² across multiple urban regions worldwide. The SAR images were acquired using the X-band sensor of the TerraSAR-X satellite in high-resolution spotlight mode, achieving 1 m spatial resolution. The corresponding optical images were collected from Google Earth Engine. Unlike SEN1-2, SAR2Opt can accurately represent object-level structures within cities, such as individual buildings and ships near harbors, making it particularly useful for high-resolution image translation and object-based analysis. However, its limited dataset size remains a drawback. Figure 6 shows example pairs of SAR–optical images from the SAR2Opt dataset.

4. Preliminaries: Cycle-Consistent Adversarial Networks

Cycle-consistent adversarial networks comprise two generators and two discriminators. The generator $G_{XY}$ learns a mapping from domain $X$ to domain $Y$, while $G_{YX}$ learns the reverse mapping. The discriminators $D_X$ and $D_Y$ are trained to distinguish real samples from generated ones in their respective domains. This dual structure enables learning cross-domain correspondences without requiring paired data. The training objective is defined by three complementary loss functions.
The first component is the adversarial loss, which enforces distributional alignment between generated and real samples. By adopting the hinge formulation, the discriminator is optimized as follows:
$$\mathcal{L}_D = \mathbb{E}_{y \sim p_Y}\big[\max(0,\, 1 - D(y))\big] + \mathbb{E}_{x \sim p_X}\big[\max(0,\, 1 + D(G(x)))\big],$$
where $p_X$ and $p_Y$ denote the data distributions of domains $X$ and $Y$. Here, $x \sim p_X$ and $y \sim p_Y$ represent real samples from each domain. The generator is optimized to maximize discriminator responses:
$$\mathcal{L}_G = -\,\mathbb{E}_{x \sim p_X}\big[D(G(x))\big],$$
which drives the generated distribution toward the target domain and ensures perceptual realism. The second component is the cycle-consistency loss, which regularizes the mapping to be approximately invertible. Formally,
$$\mathcal{L}_{cyc} = \mathbb{E}_{x \sim p_X}\big[\lVert G_{YX}(G_{XY}(x)) - x \rVert_1\big] + \mathbb{E}_{y \sim p_Y}\big[\lVert G_{XY}(G_{YX}(y)) - y \rVert_1\big],$$
where $\lVert \cdot \rVert_1$ is the L1 distance. The first term constrains the round-trip mapping $X \rightarrow Y \rightarrow X$, and the second enforces $Y \rightarrow X \rightarrow Y$. This bidirectional constraint reduces mode collapse and preserves semantic consistency in unpaired translation. The third component is the identity loss, which prevents degenerate mappings that alter samples unnecessarily. It is defined as follows:
$$\mathcal{L}_{id} = \mathbb{E}_{y \sim p_Y}\big[\lVert G_{XY}(y) - y \rVert_1\big] + \mathbb{E}_{x \sim p_X}\big[\lVert G_{YX}(x) - x \rVert_1\big].$$
This regularization discourages color distortions and structural shifts, thereby preserving domain-specific attributes and stabilizing training.
The overall learning objective is a weighted combination of adversarial, cycle-consistency, and identity losses. This formulation underpins the CycleGAN framework [23], which we adopt as the backbone for our study. On top of this foundation, we design additional modules to address domain-specific challenges in SAR-to-Optical image translation.
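For concreteness, the three loss terms above can be written compactly in code. The following is a minimal PyTorch sketch of the hinge adversarial, cycle-consistency, and identity objectives; the callables `G_xy`, `G_yx`, and `D` and the lambda coefficients are illustrative assumptions rather than the exact configuration used in this work.

```python
import torch.nn.functional as F

def discriminator_hinge_loss(D, real, fake):
    # L_D = E[max(0, 1 - D(y))] + E[max(0, 1 + D(G(x)))], with the fake detached.
    return F.relu(1.0 - D(real)).mean() + F.relu(1.0 + D(fake.detach())).mean()

def generator_hinge_loss(D, fake):
    # L_G = -E[D(G(x))]: the generator is rewarded for high discriminator responses.
    return -D(fake).mean()

def cycle_and_identity_losses(G_xy, G_yx, x, y, lam_cyc=10.0, lam_id=5.0):
    fake_y, fake_x = G_xy(x), G_yx(y)
    # Cycle consistency: round trips X -> Y -> X and Y -> X -> Y under the L1 norm.
    l_cyc = F.l1_loss(G_yx(fake_y), x) + F.l1_loss(G_xy(fake_x), y)
    # Identity: a target-domain input should pass through nearly unchanged.
    l_id = F.l1_loss(G_xy(y), y) + F.l1_loss(G_yx(x), x)
    return lam_cyc * l_cyc + lam_id * l_id, fake_y, fake_x
```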

5. Methods

Our framework addresses two key challenges in cycle-consistency based SAR-to-Optical (S2O) translation: residual geometric misalignments between paired images and the spatial heterogeneity of confidence across different regions. To tackle these issues, we integrate a registration module that estimates pixel-wise displacement fields and employ attribution- and uncertainty-guided weight maps to adaptively reweight spatial losses. The overall architecture is shown in Figure 7, with details of each component described below.
Specifically, the generator adopts an encoder–decoder–based residual learning structure that captures the domain differences between SAR and optical images while preserving the essential scene structure. The discriminator employs a patch-based architecture that evaluates local-region consistency, thereby ensuring fine-grained texture quality and structural realism. In addition, to mitigate the inherent geometric discrepancies between SAR–Optical image pairs, the model incorporates a multi-resolution encoder–decoder registration network. This module estimates pixel-wise dense displacement fields between the generated image and the target-domain image, and uses them to warp the generated output to match the target geometry.
Furthermore, the model utilizes attribution-based and uncertainty-based information produced during the generation process to construct a weight map, which is then used to spatially re-weight the loss function. By assigning higher weights to structurally meaningful or highly reliable regions and reducing the influence of noisy or uncertain areas, the proposed framework balances structural preservation and spectral fidelity while preventing overfitting to noise. Through the combination of this weight-map–guided learning mechanism and the registration module, the proposed method simultaneously improves geometric alignment and visual expressiveness in S2O translation. The following subsections provide a detailed explanation of the weight-map generation process, the displacement-field estimation process, and the way these weights are finally applied to the overall loss.

5.1. Attribution-Uncertainty Guided Weight Map Generation

The framework relies on generating weight maps that guide the optimization process. At the end of each generator forward pass, a single-channel map $W$ is constructed and refined with attribution and uncertainty information. Separate maps are maintained for each translation direction ($X \rightarrow Y$ and $Y \rightarrow X$) to reflect domain asymmetry. Figure 8 illustrates example weight maps for each translation direction.
Let the last convolutional feature map of the generator be $F \in \mathbb{R}^{B \times C \times h \times w}$, where $B$ is the batch size, $C$ the number of channels, and $h \times w$ the spatial resolution. The average gradient of channel $c$ with respect to a scalar generation loss $s$ is computed as follows:
$$\bar{w}_c = \frac{1}{hw} \sum_{i,j} \frac{\partial s}{\partial F_c(i,j)}.$$
Here, $\bar{w}_c$ measures the sensitivity of channel $c$ to $s$ across spatial positions $(i, j)$. Using these weights, a channel-weighted sum is calculated, followed by ReLU activation and normalization to $[0, 1]$, yielding the attribution map:
$$A = \mathrm{norm}\Big(\mathrm{ReLU}\Big(\sum_{c} \bar{w}_c F_c\Big)\Big).$$
The uncertainty map represents the model’s confidence. The generator outputs pixel-wise log variances, from which normalized standard deviations are obtained:
$$U = \mathrm{norm}\big(\exp\big(0.5 \cdot \log \sigma^2\big)\big).$$
Regions with higher uncertainty values correspond to lower-confidence predictions. The final weight map combines both attribution and uncertainty. Attribution is down-weighted in uncertain regions, using $\beta$ as a decay factor and $\epsilon$ as a small constant:
$$\tilde{W} = \mathrm{norm}\left(\frac{A}{\beta U + \epsilon}\right).$$
A curriculum schedule stabilizes training by gradually increasing a ramp-up coefficient $\alpha$. The final map is:
$$W = (1 - \alpha) + \alpha \tilde{W}.$$
Thus, $W$ remains within $[1 - \alpha, 1]$, preventing complete suppression of gradients. The map is broadcast along the channel dimension and applied consistently across all spectral bands.
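A condensed sketch of this construction is shown below. It assumes access to the generator's last convolutional feature map `feat` (with gradients enabled), a scalar generation loss `gen_loss`, and a pixel-wise log-variance output `log_var`; the helper names and the bilinear upsampling to image resolution are assumptions made for illustration, not the exact implementation.

```python
import torch
import torch.nn.functional as F

def normalize01(t, eps=1e-8):
    # Per-sample min-max normalization to [0, 1] over the spatial dimensions.
    t_min = t.amin(dim=(-2, -1), keepdim=True)
    t_max = t.amax(dim=(-2, -1), keepdim=True)
    return (t - t_min) / (t_max - t_min + eps)

def build_weight_map(feat, gen_loss, log_var, out_hw, alpha, beta=1.2, eps=1e-8):
    # Channel sensitivities: spatial average of d(gen_loss)/d(feat_c).
    grads = torch.autograd.grad(gen_loss, feat, retain_graph=True)[0]  # B x C x h x w
    w_bar = grads.mean(dim=(-2, -1), keepdim=True)                     # B x C x 1 x 1
    # Attribution map A: ReLU of the channel-weighted sum, normalized to [0, 1].
    A = normalize01(F.relu((w_bar * feat).sum(dim=1, keepdim=True)))
    # Uncertainty map U: normalized standard deviation from the log-variance head.
    U = normalize01(torch.exp(0.5 * log_var))
    # Down-weight attribution where uncertainty is high, then renormalize.
    W_tilde = normalize01(A / (beta * U + eps))
    # Curriculum blending keeps W within [1 - alpha, 1].
    W = (1.0 - alpha) + alpha * W_tilde
    # Upsample to image resolution and detach: W only reweights losses.
    return F.interpolate(W, size=out_hw, mode="bilinear", align_corners=False).detach()
```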

5.2. Displacement-Field Flow Estimator

Geometric misalignments often occur between SAR and Optical images due to differences in sensing mechanisms. To mitigate this, we employ a registration module that predicts dense pixel-wise displacement fields in the form:
$$\phi = (\Delta y, \Delta x) \in \mathbb{R}^{2 \times H \times W},$$
where $\Delta x$ and $\Delta y$ represent horizontal and vertical displacements. The final convolution layer is initialized to zero to start from an identity mapping, avoiding abrupt distortions. Given $\phi$, the source image $I_a$ is warped with bilinear interpolation:
$$\tilde{I}_a = \mathcal{B}\big(I_a,\; G_{id} + S(\phi)\big),$$
where $\mathcal{B}$ is the bilinear sampling operator, $G_{id}$ the normalized identity grid, and $S(\phi)$ a scaling function. This process is fully differentiable, enabling end-to-end training. The registration module is optimized with two objectives. The photometric loss $\mathcal{L}_{\mathrm{photo}}^{\mathrm{reg}}$ enforces similarity between the warped source $\tilde{I}_a$ and target $I_b$:
$$\mathcal{L}_{\mathrm{photo}}^{\mathrm{reg}} = \lVert \tilde{I}_a - I_b \rVert_1.$$
Here, $I_a$ and $I_b$ form a same-modality pair. In the A→B direction the photometric loss is computed between the generated optical image and the real optical image, and in the B→A direction it is computed between the generated SAR image and the real SAR image. The smoothness loss $\mathcal{L}_{\mathrm{smooth}}^{\mathrm{reg}}$ encourages spatial regularity in $\phi$:
$$\mathcal{L}_{\mathrm{smooth}}^{\mathrm{reg}} = \mathbb{E}\big[\lVert \nabla_x \phi \rVert^2\big] + \mathbb{E}\big[\lVert \nabla_y \phi \rVert^2\big],$$
where $\nabla_x$ and $\nabla_y$ denote horizontal and vertical spatial gradients. This regularization suppresses discontinuities, yielding coherent and physically plausible flows.
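The warping operator and the two registration losses can be sketched with `torch.nn.functional.grid_sample` acting as the differentiable bilinear sampler $\mathcal{B}(\cdot)$. The flow layout (B × 2 × H × W in (Δy, Δx) order) and the scaling of pixel displacements into normalized grid coordinates are assumptions chosen for illustration.

```python
import torch
import torch.nn.functional as F

def identity_grid(b, h, w, device):
    # Normalized identity grid G_id in [-1, 1], shaped B x H x W x 2 in (x, y) order.
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h, device=device),
                            torch.linspace(-1, 1, w, device=device), indexing="ij")
    return torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)

def warp(img, phi):
    b, _, h, w = img.shape
    # Scale (dy, dx) pixel displacements into the [-1, 1] grid coordinate system.
    flow = torch.stack((phi[:, 1] * 2.0 / max(w - 1, 1),
                        phi[:, 0] * 2.0 / max(h - 1, 1)), dim=-1)
    grid = identity_grid(b, h, w, img.device) + flow
    return F.grid_sample(img, grid, mode="bilinear", align_corners=True)

def registration_losses(fake, real, phi):
    warped = warp(fake, phi)
    photo = F.l1_loss(warped, real)                                   # photometric term
    smooth = ((phi[:, :, :, 1:] - phi[:, :, :, :-1]) ** 2).mean() \
           + ((phi[:, :, 1:, :] - phi[:, :, :-1, :]) ** 2).mean()     # smoothness term
    return warped, photo, smooth
```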
The registration module in this work is jointly trained with the generators in an end-to-end manner, and all loss terms are integrated into a single total loss optimized through a single backward pass. In addition, a weight map is employed to spatially reweight the major loss components by reflecting the local reliability of each region.

5.3. Weight-Map–Guided Global Loss Reweighting

The weight map $W$ reweights the major losses, emphasizing reliable regions while attenuating uncertain ones. This stabilizes optimization and improves the fidelity of the learned displacement fields. To ensure training stability, $W$ is detached from the computation graph and used solely as a multiplicative weighting factor, so no gradient flows into the map itself.
For the cycle-consistency loss $\mathcal{L}_{\mathrm{cyc}}$, given a reconstruction $\mathrm{rec}$ and a reference $\mathrm{real}$ with loss weighting coefficient $\lambda_{\mathrm{cyc}}$:
$$\mathcal{L}_{\mathrm{cyc}} = \lambda_{\mathrm{cyc}} \, \frac{\sum_{i,j} W_{i,j}\,\lvert \mathrm{rec}(i,j) - \mathrm{real}(i,j) \rvert}{\sum_{i,j} W_{i,j} + \epsilon}.$$
For the photometric consistency loss $\mathcal{L}_{\mathrm{photo}}^{\mathrm{global}}$ between warped predictions and the reference with loss weighting coefficient $\lambda_{\mathrm{photo}}$:
$$\mathcal{L}_{\mathrm{photo}}^{\mathrm{global}} = \lambda_{\mathrm{photo}} \, \frac{\sum_{i,j} W_{i,j}\,\lvert T(\mathrm{fake}, \phi)(i,j) - \mathrm{real}(i,j) \rvert}{\sum_{i,j} W_{i,j} + \epsilon},$$
where $T(\mathrm{fake}, \phi)$ denotes the warped output. Finally, the smoothness loss $\mathcal{L}_{\mathrm{smooth}}^{\mathrm{global}}$ is weighted in a boundary-aware manner, applied separately to vertical and horizontal neighbors with loss weighting coefficient $\lambda_{\mathrm{smooth}}$:
$$\mathcal{L}_{\mathrm{smooth}}^{\mathrm{global}} = \lambda_{\mathrm{smooth}} \left( \frac{\sum_{i,j} W^{v}_{i,j}\,\lvert \phi(i,j{+}1) - \phi(i,j) \rvert}{\sum_{i,j} W^{v}_{i,j} + \epsilon} + \frac{\sum_{i,j} W^{h}_{i,j}\,\lvert \phi(i{+}1,j) - \phi(i,j) \rvert}{\sum_{i,j} W^{h}_{i,j} + \epsilon} \right).$$
Here, $W^{v}$ and $W^{h}$ denote weights for vertical and horizontal neighbors. This formulation enhances boundary preservation while maintaining global smoothness.
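Each of these terms reduces to a weighted, normalized L1 sum, as sketched below. `W` is the detached weight map from Section 5.1; since the text does not specify how the neighbor weights $W^v$ and $W^h$ are derived from $W$, the element-wise minimum of adjacent weights used here is an illustrative assumption.

```python
import torch

def weighted_l1(pred, target, W, lam, eps=1e-8):
    # Weight-map-modulated, normalized L1 (covers both the cycle and photometric terms).
    err = (pred - target).abs().sum(dim=1, keepdim=True)   # sum over channels
    return lam * (W * err).sum() / (W.sum() + eps)

def weighted_smoothness(phi, W, lam, eps=1e-8):
    # Boundary-aware smoothness: vertical and horizontal neighbors weighted separately.
    W_v = torch.minimum(W[:, :, 1:, :], W[:, :, :-1, :])    # assumed neighbor pairing
    W_h = torch.minimum(W[:, :, :, 1:], W[:, :, :, :-1])
    d_v = (phi[:, :, 1:, :] - phi[:, :, :-1, :]).abs().sum(dim=1, keepdim=True)
    d_h = (phi[:, :, :, 1:] - phi[:, :, :, :-1]).abs().sum(dim=1, keepdim=True)
    return lam * ((W_v * d_v).sum() / (W_v.sum() + eps)
                  + (W_h * d_h).sum() / (W_h.sum() + eps))
```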
In summary, W systematically modulates spatial losses, amplifying errors in semantically meaningful and reliable regions while reducing the influence of noisy or uncertain areas. This design directs the network to focus on structural fidelity and improves stability in cross-domain translation.

6. Experiments

We compared our model with several representative baselines, including CycleGAN [23], CUT [24], ASGIT [26], Qing et al. [20], and StegoGAN [27]. All models were trained under identical training conditions for 100 epochs to ensure a fair comparison.

6.1. Implementation Details

6.1.1. Dataset Configuration

For the SAR2Opt [29] dataset, we followed the default structure provided in the official GitHub (https://github.com/MarsZhaoYT/SAR2Opt-Heterogeneous-Dataset, accessed on 15 October 2025) repository, which offers an approximate 7:3 split between the training set and the inference set. To ensure consistency, we applied the same 7:3 ratio when partitioning the SEN1-2 [30] dataset, since this dataset does not provide an official split. Additionally, both datasets were preprocessed to take 256 × 256 inputs and to produce outputs of the same 256 × 256 resolution. For fair comparison, all baseline models were also trained and evaluated under identical 256 × 256 resolution conditions.

6.1.2. Training Settings

In our method, the adaptive weighting mechanism is determined by three key hyperparameters, each serving a specific purpose. First, the curriculum ramp-up coefficient $\alpha$ is scheduled to increase gradually from 0.15 to 0.70, enabling the model to shift its focus from global structural alignment in the early stage to uncertainty-guided fine-grained refinement in later epochs. Second, the uncertainty attenuation factor $\beta$ is fixed at 1.2 to prevent excessive amplification of attribution responses in regions with high uncertainty values. Finally, the numerical stabilization term $\epsilon$ is set to $10^{-8}$ to avoid division-by-zero issues and to ensure stable computation during the construction of the adaptive weight map.
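As a concrete illustration, a linear ramp-up of $\alpha$ from 0.15 to 0.70 over the training run could be implemented as follows; the linear shape is an assumption, since the exact schedule form is not specified here.

```python
def alpha_schedule(epoch, total_epochs, alpha_start=0.15, alpha_end=0.70):
    # Linearly interpolate the curriculum coefficient alpha over training epochs.
    t = min(max(epoch / max(total_epochs - 1, 1), 0.0), 1.0)
    return alpha_start + t * (alpha_end - alpha_start)
```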

6.1.3. Evaluation Metrics

To comprehensively evaluate translation and reconstruction quality, we adopt six quantitative metrics covering pixel fidelity, structural similarity, perceptual similarity, spectral consistency, radiometric accuracy, and perceptual color difference:
  • Peak Signal-to-Noise Ratio (PSNR): PSNR measures the fidelity of the reconstructed image with respect to the reference image, based on the ratio between signal power and noise power. Higher values indicate closer resemblance to the ground truth. Since PSNR is based on mean squared error (MSE), it emphasizes overall intensity differences but may not fully reflect perceptual quality.
  • Structural Similarity Index Measure (SSIM): SSIM evaluates structural similarity by incorporating luminance, contrast, and structural information. Unlike PSNR, SSIM captures perceptual structural fidelity, making it more consistent with human visual perception. Values closer to 1 indicate higher structural similarity.
  • Learned Perceptual Image Patch Similarity (LPIPS): LPIPS quantifies perceptual similarity by comparing deep features from pretrained neural networks. It measures the distance between local image patches in feature space, correlating well with human perception. Lower values imply closer perceptual similarity.
  • Spectral Angle Mapper (SAM): Widely used in remote sensing, SAM measures the spectral similarity between reconstructed and reference images by computing the angle between spectral vectors. Smaller values indicate better spectral preservation, which is crucial in applications such as land cover and vegetation analysis.
  • Relative Dimensionless Global Error in Synthesis (ERGAS): ERGAS quantifies the overall radiometric distortion between reconstructed and reference images. It is calculated from the normalized root mean square error (RMSE) across spectral bands. Lower values represent higher radiometric consistency.
  • CIEDE2000 Color Difference ($\Delta E_{2000}$): Following the International Commission on Illumination (CIE) standard, $\Delta E_{2000}$ measures the perceptual difference in color based on hue, chroma, and lightness. It provides a reliable indication of how close the reconstructed image colors are to the ground truth optical image. Lower values denote better color consistency.
In summary, this evaluation protocol enables a rigorous and multidimensional assessment of the proposed method, considering not only pixel-level accuracy but also structural realism, spectral fidelity, and perceptual quality.
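PSNR, SSIM, and LPIPS are available in standard libraries (e.g., scikit-image and the lpips package), whereas SAM and ERGAS are usually implemented directly. The NumPy sketch below shows one common formulation of these two remote-sensing metrics, assuming H × W × C float arrays; the resolution ratio in ERGAS is set to 1, an assumption appropriate for same-resolution comparison.

```python
import numpy as np

def sam_degrees(pred, ref, eps=1e-8):
    # Mean spectral angle (in degrees) between per-pixel spectral vectors.
    dot = (pred * ref).sum(axis=-1)
    denom = np.linalg.norm(pred, axis=-1) * np.linalg.norm(ref, axis=-1) + eps
    angles = np.arccos(np.clip(dot / denom, -1.0, 1.0))
    return float(np.degrees(angles).mean())

def ergas(pred, ref, ratio=1.0, eps=1e-8):
    # Band-wise RMSE normalized by band means, aggregated into a single score.
    rmse = np.sqrt(((pred - ref) ** 2).mean(axis=(0, 1)))
    means = ref.mean(axis=(0, 1)) + eps
    return float(100.0 * ratio * np.sqrt(((rmse / means) ** 2).mean()))
```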

6.2. Qualitative Results

Figure 9 compares the translation results of different models on agricultural regions from the SEN1-2 [30] dataset.
CUT [24] fails in all test cases, producing severely degraded outputs. CycleGAN [23] does not correctly capture colors in (b) and (f), and the remaining examples exhibit substantial loss of fine details. StegoGAN [27] also fails to render realistic colors in (b) and (c), while the other cases suffer from degraded details. ASGIT [26] appears to preserve coarse structural layouts, but struggles to recover colors and local textures. The method of Qing et al. [20] reproduces the overall structure and color distribution to some extent, yet still fails to retain fine details. In contrast, the proposed method accurately restores both structure and spectral information while carefully preserving local details. Representative improvements can be observed in the brown area below the village in (b) and the white patch in the center of (f).
Figure 10 presents visual examples of the results obtained from different models on the SAR2Opt [29] dataset. As shown in (a), (b), and (c), when the images are dominated by vegetation (green areas), all models generate relatively stable results. However, in cases such as (d), (e), and (f), where the scene is either non-vegetated or contains colors other than green within the vegetation regions, existing models fail to properly reproduce those colors. In particular, in (d), although the ground-truth optical image exhibits desert-like textures, all comparison models generate results with greenish tones, whereas our model successfully captures the distinctive desert color. Similarly, (e) and (f) are scenes that are largely covered by vegetation but contain partial blue regions, which the comparison models fail to replicate. In contrast, our method clearly reproduces these blue regions, as can be visually observed.

6.3. Quantitative Results

6.3.1. Accuracy-Based Metrics

Table 1 presents the quantitative comparison of S2O translation methods on the SEN1-2 dataset [30]. The conventional unpaired approaches show generally low performance across most metrics. Specifically, CycleGAN achieved a PSNR of 9.88, SSIM of 0.09, LPIPS of 0.54, SAM of 13.15, ERGAS of 125.82, and Δ E 2000 of 29.43, indicating poor fidelity and high spectral distortion. CUT exhibited similar limitations with PSNR 9.39, SSIM 0.05, LPIPS 0.61, SAM 12.42, ERGAS 131.67, and Δ E 2000 31.27. StegoGAN also remained at a low level, recording PSNR 9.07, SSIM 0.05, LPIPS 0.57, SAM 12.28, ERGAS 142.76, and Δ E 2000 32.39. Compared with these, ASGIT yielded slightly better values, with PSNR 10.62, SSIM 0.12, LPIPS 0.54, SAM 12.16, ERGAS 113.69, and Δ E 2000 27.39, but the overall accuracy still remained insufficient. A more recent method, Qing et al. [20], demonstrated a clear performance jump, achieving PSNR 17.26, SSIM 0.35, LPIPS 0.36, SAM 8.25, ERGAS 45.84, and Δ E 2000 13.38. This indicates significant improvements in both fidelity and spectral preservation compared to earlier unpaired approaches. Nevertheless, the proposed method surpassed all competing models across every metric. It attained the highest PSNR of 19.02, the best SSIM of 0.45, and the lowest LPIPS of 0.29, confirming superior fidelity, structural consistency, and perceptual quality. In terms of spectral accuracy, it recorded the lowest SAM of 6.78 and the best ERGAS of 39.00, highlighting excellent preservation of radiometric information. Finally, with a Δ E 2000 of 10.96, our model achieved the most accurate color reproduction. These results clearly demonstrate that the proposed approach consistently outperforms all baseline methods, providing the most balanced and reliable performance across fidelity, structure, spectrum, and color dimensions.
Table 2 summarizes the quantitative results on the SAR2Opt dataset using the same evaluation metrics. The early unpaired baselines show clear limitations. CycleGAN produced PSNR 13.06, SSIM 0.19, LPIPS 0.75, SAM 7.04, ERGAS 60.99, and Δ E 2000 20.82, revealing low fidelity and noticeable spectral distortion. CUT reported very similar outcomes with PSNR 13.05, SSIM 0.17, LPIPS 0.74, SAM 6.50, ERGAS 60.06, and Δ E 2000 19.81. ASGIT reached PSNR 12.91, SSIM 0.18, LPIPS 0.74, SAM 6.20, ERGAS 62.91, and Δ E 2000 20.10, showing only marginal variation. Likewise, StegoGAN remained weak with PSNR 12.68, SSIM 0.18, LPIPS 0.74, SAM 6.53, ERGAS 64.42, and Δ E 2000 20.63. In contrast, the more advanced method of Qing et al. [20] delivered a substantial leap forward, obtaining PSNR 15.73, SSIM 0.27, LPIPS 0.71, SAM 5.18, ERGAS 45.82, and Δ E 2000 14.81. These results marked a notable step up in both reconstruction quality and spectral consistency relative to the earlier baselines. Building on this, the proposed model achieved the best results across all indicators: PSNR 16.13, SSIM 0.28, LPIPS 0.72, SAM 4.81, ERGAS 43.49, and Δ E 2000 13.92. The higher PSNR and SSIM values show improved fidelity and structural coherence, while the reduced LPIPS indicates closer perceptual similarity. Importantly, the lowest SAM and ERGAS highlight superior spectral preservation, and the smallest Δ E 2000 confirms the most faithful color reproduction. Overall, these findings demonstrate that the proposed method consistently surpasses existing approaches on the SAR2Opt dataset, offering a well-rounded improvement in fidelity, structure, perception, spectrum, and color accuracy.

6.3.2. Computational Cost Analysis

To evaluate the computational efficiency of the proposed method, we conducted a quantitative comparison with existing methods in terms of training time, inference time, and the number of trainable parameters. For a fair comparison, 4000 image pairs were randomly sampled from the SEN1-2 dataset and split into training and inference sets at a 7:3 ratio, and all models were tested under identical hardware and software conditions.
As summarized in Table 3, the proposed method exhibits a slight increase in the number of parameters compared to CycleGAN and ASGIT due to the addition of the registration module and adaptive weighting components. However, the overall model size remains comparable to that of [20] and StegoGAN, indicating that the additional modules do not introduce excessive structural complexity. In terms of inference time, the proposed method achieves the fastest performance among all compared approaches. Lastly, regarding training time, the proposed method incurs somewhat higher computational cost than ASGIT and [20] but maintains a similar level to CycleGAN and CUT. This is attributed to the additional training of the registration module and the continuous computation of attribution and uncertainty maps throughout the training process; nonetheless, the overall training burden remains within a practical range.
In summary, despite the inclusion of additional modules, the proposed method maintains competitive computational efficiency while delivering higher translation quality. These results demonstrate that the proposed framework offers both effectiveness and practicality for SAR-Optical translation tasks.

6.4. Ablation Study

In this section, we conduct ablation experiments to quantitatively analyze the contribution of each component of the proposed AWM-GAN to the final performance. All experiments were performed under the same conditions for 20 epochs, and the comparison includes the following settings: removing the Registration module, removing the weight map (W), using only the Attribution map (A), using only the Uncertainty map (U), and the full proposed method with all components enabled. The quantitative results obtained for each dataset are summarized in Table 4.
First, removing the weight map leads to the largest degradation in both PSNR and SSIM across the two datasets. This indicates that, without reflecting region-wise reliability and importance, the model tends to learn uniformly across the entire image, resulting in reduced structural preservation and weakened detail representation. Next, removing the Registration module also results in noticeable performance degradation. Due to intrinsic sensor differences between SAR and optical images, residual geometric discrepancies inevitably exist, and the registration module mitigates such distortions by correcting these misalignments, thereby enabling stable domain translation. Using only the Attribution map achieves improved performance compared to the W setting, as it assigns higher weights to structurally important regions and encourages focused learning on meaningful areas. However, since it does not suppress highly uncertain regions, its stability remains limited. In contrast, using only the Uncertainty map yields larger PSNR improvements than using only A. This demonstrates that attenuating low-reliability regions, which frequently arise due to speckle characteristics in SAR images, plays a crucial role in enhancing translation stability. The full proposed method, which incorporates all modules, achieves the best performance across all metrics.
Overall, the results show that the proposed method with all components enabled consistently delivers the highest performance. This confirms that each module not only provides meaningful independent contributions but also works synergistically to produce optimal SAR–Optical translation quality.

7. Discussion

This study proposed a novel framework that integrates a registration module with adaptive weight maps to address the limitations of existing approaches in SAR-to-Optical (S2O) image translation. Both qualitative and quantitative results demonstrated that the proposed method consistently outperforms previous approaches across all evaluation metrics. On the SEN1-2 and SAR2Opt datasets, the framework achieved improvements not only in conventional image quality metrics such as PSNR and SSIM, but also in remote sensing-specific metrics including SAM, ERGAS, and Δ E 2000 . This indicates that the proposed framework ensures not only visual similarity but also spectral and structural reliability.
First, the introduction of the registration module effectively minimized distortions of man-made structures such as building boundaries and road networks. Second, the weight map generated by combining attribution and uncertainty maps enhanced the loss in high-confidence regions while alleviating learning in highly uncertain areas. This design secured both training stability and interpretability, extending the explainability of deep learning-based image translation beyond mere performance improvements. In the context of remote sensing, where interpretability and reliability of results are crucial, this contribution significantly enhances the practical applicability of the method. Third, the proposed framework successfully addressed a key limitation of existing methods, which often perform well on vegetation-dominated scenes but degrade in non-vegetation or complex color environments. The experimental results showed that the proposed approach more accurately restored chromatic information in challenging cases such as desert terrain and regions containing blue spectral components, demonstrating the effectiveness of weight-based loss rebalancing in capturing local characteristics and uncertainty.
Nevertheless, several limitations remain. First, while validation on the representative public datasets SEN1-2 and SAR2Opt is meaningful, the method has not yet been tested under diverse sensor characteristics or extreme weather conditions encountered in real-world satellite operations. Second, although performance improvements were confirmed on high-resolution datasets, the limited data size constrains the assessment of generalization ability. Third, as the framework still relies on a GAN-based structure, further comparative studies are required to assess its computational complexity and long-term scalability against emerging diffusion models.
Future work will focus on enhancing robustness through the collection and validation of more diverse datasets, as well as exploring hybrid frameworks that combine GANs with diffusion models. In addition, to address the current limitation of relying on two datasets, future work will further examine the model’s generalization ability by exploring geographically diverse regions or alternative SAR modalities when suitable paired datasets become available. Moreover, given that the proposed framework is built upon a bidirectional image-to-image translation architecture, it can be naturally extended toward Optical-to-SAR generation. In particular, we plan to investigate the reconstruction of SAR imagery from historical optical observations to improve applicability in scenarios where SAR acquisition is limited or unavailable.

Author Contributions

Conceptualization, S.-J.P. and W.-J.N.; Methodology, S.-J.P.; Software, S.-J.P. and S.-H.K.; Validation, S.-J.P. and S.-H.K.; Formal analysis, S.-J.P.; Investigation, S.-J.P.; Resources, W.-J.N. and T.K.; Data curation, S.-J.P. and T.K.; Writing—original draft preparation, S.-J.P. and T.K.; Writing—review and editing, S.-J.P., S.-H.K., H.-K.S. and W.-J.N.; Visualization, S.-J.P.; Supervision, W.-J.N.; Project administration, S.-J.P. and W.-J.N.; Funding acquisition, W.-J.N. and T.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by funding from the Korea government (KASA, Korea AeroSpace Administration) (grant number RS-2020-NR055936).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Tsokas, A.; Rysz, M.; Pardalos, P.M.; Dipple, K. SAR data applications in earth observation: An overview. Expert Syst. Appl. 2022, 205, 117342. [Google Scholar] [CrossRef]
  2. Jiang, D.; Marino, A.; Ionescu, M.; Gvilava, M.; Savaneli, Z.; Loureiro, C.; Spyrakos, E.; Tyler, A.; Stanica, A. Combining optical and SAR satellite data to monitor coastline changes in the Black Sea. ISPRS J. Photogramm. Remote Sens. 2025, 226, 102–115. [Google Scholar] [CrossRef]
  3. Hashemi, M.G.; Jalilvand, E.; Alemohammad, H.; Tan, P.N.; Das, N.N. Review of synthetic aperture radar with deep learning in agricultural applications. ISPRS J. Photogramm. Remote Sens. 2024, 218, 20–49. [Google Scholar] [CrossRef]
  4. Gao, H.; Wang, C.; Zhu, J.; Song, D.; Xiang, D.; Fu, H.; Hu, J.; Xie, Q.; Wang, B.; Ren, P.; et al. TVPol-Edge: An Edge Detection Method With Time-Varying Polarimetric Characteristics for Crop Field Edge Delineation. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4408917. [Google Scholar] [CrossRef]
  5. Qin, Y.; Yin, X.; Li, Y.; Xu, Q.; Zhang, L.; Mao, P.; Jiang, X. High-precision flood mapping from Sentinel-1 dualpolarization SAR data. IEEE Trans. Geosci. Remote Sens. 2025, 63, 4204315. [Google Scholar] [CrossRef]
  6. Saleh, T.; Weng, X.; Holail, S.; Hao, C.; Xia, G.S. DAM-Net: Flood detection from SAR imagery using differential attention metric-based vision transformers. ISPRS J. Photogramm. Remote Sens. 2024, 212, 440–453. [Google Scholar] [CrossRef]
  7. Liu, S.; Li, D.; Song, H.; Fan, C.; Li, K.; Wan, J.; Liu, R. SAR ship detection across different spaceborne platforms with confusion-corrected self-training and region-aware alignment framework. ISPRS J. Photogramm. Remote Sens. 2025, 228, 305–322. [Google Scholar] [CrossRef]
  8. Guan, T.; Chang, S.; Deng, Y.; Xue, F.; Wang, C.; Jia, X. Oriented SAR Ship Detection Based on Edge Deformable Convolution and Point Set Representation. Remote Sens. 2025, 17, 1612. [Google Scholar] [CrossRef]
  9. Koukiou, G. SAR Features and Techniques for Urban Planning—A Review. Remote Sens. 2024, 16, 1923. [Google Scholar] [CrossRef]
  10. Liu, R.; Ling, J.; Zhang, H. SoftFormer: SAR-optical fusion transformer for urban land use and land cover classification. ISPRS J. Photogramm. Remote Sens. 2024, 218, 277–293. [Google Scholar] [CrossRef]
  11. Liu, Z.; Wang, S.; Li, Y.; Gu, Y.; Yu, Q. Dsrkd: Joint despecking and super-resolution of sar images via knowledge distillation. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5218613. [Google Scholar] [CrossRef]
  12. Fang, Y.; Liu, R.; Peng, Y.; Guan, J.; Li, D.; Tian, X. Contrastive learning for real SAR image despeckling. ISPRS J. Photogramm. Remote Sens. 2024, 218, 376–391. [Google Scholar] [CrossRef]
  13. Liu, S.; Li, D.; Wan, J.; Zheng, C.; Su, J.; Liu, H.; Zhu, H. Source-assisted hierarchical semantic calibration method for ship detection across different satellite SAR images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5215221. [Google Scholar] [CrossRef]
  14. Yin, L.; Deng, M.; Yang, Y.; Huang, Y.; Tang, Q. A sensitive geometric self-calibration method and stability analysis for multiview spaceborne SAR images based on the range-Doppler model. ISPRS J. Photogramm. Remote Sens. 2025, 220, 550–562. [Google Scholar] [CrossRef]
  15. Xu, F.; Shi, Y.; Ebel, P.; Yu, L.; Xia, G.S.; Yang, W.; Zhu, X.X. GLF-CR: SAR-enhanced cloud removal with global–local fusion. ISPRS J. Photogramm. Remote Sens. 2022, 192, 268–278. [Google Scholar] [CrossRef]
  16. Wang, P.; Chen, Y.; Huang, B.; Zhu, D.; Lu, T.; Dalla Mura, M.; Chanussot, J. MT_GAN: A SAR-to-optical image translation method for cloud removal. ISPRS J. Photogramm. Remote Sens. 2025, 225, 180–195. [Google Scholar] [CrossRef]
  17. Li, M.; Xu, Q.; Li, K.; Li, W. DecloudFormer: Quest the key to consistent thin cloud removal of wide-swath multi-spectral images. Pattern Recognit. 2025, 166, 111664. [Google Scholar] [CrossRef]
  18. Yang, X.; Zhao, J.; Wei, Z.; Wang, N.; Gao, X. SAR-to-optical image translation based on improved CGAN. Pattern Recognit. 2022, 121, 108208. [Google Scholar] [CrossRef]
  19. Bai, X.; Pu, X.; Xu, F. Conditional diffusion for SAR to optical image translation. IEEE Geosci. Remote Sens. Lett. 2023, 21, 4000605. [Google Scholar] [CrossRef]
  20. Qing, Y.; Zhu, J.; Feng, H.; Liu, W.; Wen, B. Two-way generation of high-resolution eo and sar images via dual distortion-adaptive gans. Remote Sens. 2023, 15, 1878. [Google Scholar] [CrossRef]
  21. Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1125–1134. [Google Scholar]
  22. Zhu, J.Y.; Zhang, R.; Pathak, D.; Darrell, T.; Efros, A.A.; Wang, O.; Shechtman, E. Toward Multimodal Image-to-Image Translation. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30. [Google Scholar]
  23. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar]
  24. Park, T.; Efros, A.A.; Zhang, R.; Zhu, J.Y. Contrastive learning for unpaired image-to-image translation. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 319–345. [Google Scholar]
  25. Chen, R.; Huang, W.; Huang, B.; Sun, F.; Fang, B. Reusing discriminators for encoding: Towards unsupervised image-to-image translation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 8168–8177. [Google Scholar]
  26. Lin, Y.; Wang, Y.; Li, Y.; Gao, Y.; Wang, Z.; Khan, L. Attention-based spatial guidance for image-to-image translation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Virtual, 5–9 January 2021; pp. 816–825. [Google Scholar]
  27. Wu, S.; Chen, Y.; Mermet, S.; Hurni, L.; Schindler, K.; Gonthier, N.; Landrieu, L. Stegogan: Leveraging steganography for non-bijective image-to-image translation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 7922–7931. [Google Scholar]
  28. Yang, X.; Wang, Z.; Zhao, J.; Yang, D. FG-GAN: A fine-grained generative adversarial network for unsupervised SAR-to-optical image translation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5621211. [Google Scholar] [CrossRef]
  29. Zhao, Y.; Celik, T.; Liu, N.; Li, H.C. A comparative analysis of GAN-based methods for SAR-to-optical image translation. IEEE Geosci. Remote Sens. Lett. 2022, 19, 3512605. [Google Scholar] [CrossRef]
  30. Schmitt, M.; Hughes, L.H.; Zhu, X.X. The SEN1-2 dataset for deep learning in SAR-optical data fusion. arXiv 2018, arXiv:1807.01569. [Google Scholar] [CrossRef]
  31. Binder, A.; Montavon, G.; Lapuschkin, S.; Müller, K.R.; Samek, W. Layer-wise relevance propagation for neural networks with local renormalization layers. In Proceedings of the International Conference on Artificial Neural Networks, Barcelona, Spain, 6–9 September 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 63–71. [Google Scholar]
32. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
  33. Sundararajan, M.; Taly, A.; Yan, Q. Axiomatic attribution for deep networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; PMLR: New York, NY, USA, 2017; pp. 3319–3328. [Google Scholar]
34. Kendall, A.; Gal, Y. What uncertainties do we need in Bayesian deep learning for computer vision? Adv. Neural Inf. Process. Syst. 2017, 30, 5580–5590. [Google Scholar]
35. Gal, Y.; Ghahramani, Z. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; PMLR: New York, NY, USA, 2016; pp. 1050–1059. [Google Scholar]
  36. Upadhyay, U.; Chen, Y.; Hepp, T.; Gatidis, S.; Akata, Z. Uncertainty-guided progressive GANs for medical image translation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Strasbourg, France, 27 September–1 October 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 614–624. [Google Scholar]
  37. Klaser, K.; Borges, P.; Shaw, R.; Ranzini, M.; Modat, M.; Atkinson, D.; Thielemans, K.; Hutton, B.; Goh, V.; Cook, G.; et al. A multi-channel uncertainty-aware multi-resolution network for MR to CT synthesis. Appl. Sci. 2021, 11, 1667. [Google Scholar] [CrossRef] [PubMed]
  38. Karthik, E.N.; Cheriet, F.; Laporte, C. Uncertainty estimation in unsupervised MR-CT synthesis of scoliotic spines. IEEE Open J. Eng. Med. Biol. 2023, 5, 421–427. [Google Scholar] [CrossRef]
  39. Vats, A.; Farup, I.; Pedersen, M.; Raja, K. Uncertainty-Aware Regularization for Image-to-Image Translation. In Proceedings of the 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Tucson, AZ, USA, 26 February–6 March 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 3965–3974. [Google Scholar]
Figure 1. Architectures of representative paired I2I approaches: (a) Pix2Pix [21]; (b) BicycleGAN [22], where the upper part corresponds to cVAE-GAN and the lower part to cLR-GAN.
Figure 2. The basic training process of CycleGAN [23], a representative model that leverages cycle consistency loss.
Figure 3. Advanced models built upon CycleGAN [23]: (a) CUT [24], (b) ASGIT [26], (c) NICE-GAN [25], and (d) StegoGAN [27].
Figure 4. (a) The FG-GAN [28] framework; (b) the SAR-to-Optical translation framework proposed by Qing et al. [20].
Figure 5. Sample visualization from the SEN1-2 [30] dataset.
Figure 6. Sample visualization from the SAR2Opt [29] dataset.
Figure 7. Overall architecture of the proposed framework, illustrated with the pair-based learning flow. As in a conventional cycle-consistency model, the network is trained by repeating the transformation X (SAR) → Y (Optical) → X (SAR). During this process, the generated and reconstructed outputs are passed to the discriminator, the registration module, and the weight map generator, and the resulting outputs are used to compute the loss functions and guide optimization.
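To make the training flow summarized in the Figure 7 caption concrete, the following is a minimal PyTorch-style sketch of a single X (SAR) → Y (Optical) → X (SAR) iteration with a registration-aligned, weight-map-modulated reconstruction loss. The module names (G_xy, G_yx, D_y, reg_net, weight_map_fn), the warp helper, and the loss weighting are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def warp(img, flow):
    """Warp `img` (B, C, H, W) with a dense pixel displacement field
    `flow` (B, 2, H, W) using bilinear sampling."""
    b, _, h, w = img.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=img.device, dtype=img.dtype),
        torch.arange(w, device=img.device, dtype=img.dtype),
        indexing="ij")
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0) + flow   # (B, 2, H, W)
    # Normalize coordinates to [-1, 1] for grid_sample (x first, then y).
    gx = 2.0 * grid[:, 0] / (w - 1) - 1.0
    gy = 2.0 * grid[:, 1] / (h - 1) - 1.0
    return F.grid_sample(img, torch.stack((gx, gy), dim=-1), align_corners=True)

def training_step(sar, opt, G_xy, G_yx, D_y, reg_net, weight_map_fn,
                  lambda_cyc=10.0):
    """One illustrative X -> Y -> X iteration; names and weights are
    assumptions for this sketch, not the paper's code."""
    fake_opt = G_xy(sar)              # X (SAR)  -> Y (Optical)
    rec_sar = G_yx(fake_opt)          # Y        -> X (cycle reconstruction)

    # Registration: estimate a pixel-wise displacement field and align the
    # reference optical image to the generated one before comparing them.
    flow = reg_net(fake_opt, opt)
    opt_aligned = warp(opt, flow)

    # Adaptive weight map combining attribution and uncertainty (see Figure 8).
    w = weight_map_fn(sar, fake_opt)  # (B, 1, H, W); higher = more reliable

    loss_fid = (w * (fake_opt - opt_aligned).abs()).mean()   # weighted fidelity
    loss_cyc = (w * (rec_sar - sar).abs()).mean()            # weighted cycle
    logits = D_y(fake_opt)
    loss_adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))

    return loss_adv + loss_fid + lambda_cyc * loss_cyc
```

Because the weight map multiplies the pixel-wise losses, well-attributed regions dominate the gradient while highly uncertain regions contribute less, which is the behavior the caption attributes to the framework.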
Figure 8. SAR–Optical input pairs together with the generated Attribution map (A), Uncertainty map (U), and Weight map (W). The weight map W enhances the regions highlighted in A while suppressing those with high uncertainty in U, resulting in attenuated or hole-like regions within W. This indicates that the model deliberately reduces the contribution of highly uncertain areas during optimization.
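The Figure 8 caption states that W amplifies attribution-highlighted regions while suppressing highly uncertain ones; the snippet below sketches one plausible combination rule under that description. The multiplicative form, the min–max normalization, and the parameter alpha are assumptions made for illustration and are not the paper's exact formulation.

```python
import torch

def adaptive_weight_map(attribution, uncertainty, alpha=1.0, eps=1e-6):
    """Illustrative combination of an attribution map A and an uncertainty
    map U into a weight map W (all tensors of shape (B, 1, H, W))."""
    def minmax(x):
        # Per-sample min-max normalization to [0, 1].
        x_min = x.amin(dim=(2, 3), keepdim=True)
        x_max = x.amax(dim=(2, 3), keepdim=True)
        return (x - x_min) / (x_max - x_min + eps)

    a = minmax(attribution)
    u = minmax(uncertainty)

    # High attribution raises the weight; high uncertainty pulls it toward
    # zero, producing the attenuated / hole-like regions visible in W.
    w = (1.0 + a) * torch.exp(-alpha * u)
    # Rescale so the average weight stays near 1 and the overall loss
    # magnitude is preserved.
    return w / (w.mean(dim=(2, 3), keepdim=True) + eps)
```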
Figure 9. Qualitative comparison of S2O image translation results on the SEN1-2 [30] dataset. (a) shows a semi-urban area with a small settlement and surrounding farmland, while (b) presents large green agricultural fields. (c) illustrates block-shaped croplands with strong color contrast, and (d) depicts a village–farmland region similar to (a). (e) shows large block-structured croplands similar to (c), and (f) represents a rural area characterized by an irregular bright patch. The red boxes highlight regions where the proposed method restores structural and spectral details more accurately than the baselines.
Figure 10. Qualitative comparison of S2O image translation results on the SAR2Opt [29] dataset. (a) shows a grid-structured urban area where residential blocks and trees are closely mixed, and (b) presents a similarly structured housing region with dense vegetation. (c) resembles these two scenes but features curved roads and a more irregular layout. (d) contrasts with these urban examples by depicting a desert region with very sparse man-made structures. (e) shows a mixed landscape composed of a park-like green area and residential buildings located in the lower-right portion of the scene, while (f) represents a forested area in which large houses are partially hidden beneath dense tree canopies. The red boxes highlight regions where the proposed method restores structural and spectral details more accurately than the baseline methods.
Table 1. Quantitative comparison of S2O image translation results on the SEN1-2 [30] dataset.
Method | PSNR (↑) | SSIM (↑) | LPIPS (↓) | SAM (↓) | ERGAS (↓) | ΔE2000 (↓)
CycleGAN [23] | 9.88 | 0.09 | 0.54 | 13.15 | 125.82 | 29.43
CUT [24] | 9.39 | 0.05 | 0.61 | 12.42 | 131.67 | 31.27
ASGIT [26] | 10.62 | 0.12 | 0.54 | 12.16 | 113.69 | 27.39
Qing et al. [20] | 17.26 | 0.35 | 0.36 | 8.25 | 45.84 | 13.38
StegoGAN [27] | 9.07 | 0.05 | 0.57 | 12.28 | 142.76 | 32.39
Ours | 19.02 | 0.45 | 0.29 | 6.78 | 39.00 | 10.96
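For readers less familiar with the spectral metrics reported in Tables 1 and 2, the sketch below gives standard NumPy formulations of SAM (in degrees) and ERGAS. The same-resolution scale ratio of 1 and per-band mean normalization are assumptions of this generic version and may differ from the exact evaluation code used for the tables.

```python
import numpy as np

def sam_degrees(pred, ref, eps=1e-8):
    """Mean Spectral Angle Mapper in degrees between two (H, W, C) images."""
    p = pred.reshape(-1, pred.shape[-1]).astype(np.float64)
    r = ref.reshape(-1, ref.shape[-1]).astype(np.float64)
    cos = (p * r).sum(1) / (np.linalg.norm(p, axis=1) * np.linalg.norm(r, axis=1) + eps)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))).mean()

def ergas(pred, ref, ratio=1.0, eps=1e-8):
    """ERGAS for same-resolution (H, W, C) images; `ratio` is the
    resolution ratio h/l, assumed 1 for image translation."""
    pred = pred.astype(np.float64)
    ref = ref.astype(np.float64)
    band_terms = []
    for b in range(ref.shape[-1]):
        rmse = np.sqrt(np.mean((pred[..., b] - ref[..., b]) ** 2))
        band_terms.append((rmse / (ref[..., b].mean() + eps)) ** 2)
    return 100.0 * ratio * np.sqrt(np.mean(band_terms))
```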
Table 2. Quantitative comparison of S2O image translation results on the SAR2Opt [29] dataset.
Method | PSNR (↑) | SSIM (↑) | LPIPS (↓) | SAM (↓) | ERGAS (↓) | ΔE2000 (↓)
CycleGAN [23] | 13.06 | 0.19 | 0.75 | 7.04 | 60.99 | 20.82
CUT [24] | 13.05 | 0.17 | 0.74 | 6.50 | 60.06 | 19.81
ASGIT [26] | 12.91 | 0.18 | 0.74 | 6.20 | 62.91 | 20.10
Qing et al. [20] | 15.73 | 0.27 | 0.71 | 5.18 | 45.82 | 14.81
StegoGAN [27] | 12.68 | 0.18 | 0.74 | 6.53 | 64.42 | 20.63
Ours | 16.13 | 0.28 | 0.72 | 4.81 | 43.49 | 13.92
Table 3. Computational cost on the SEN1-2 [30] dataset.
Method | Training Time (s) | Inference Time (s) | Trainable Parameters (M)
CycleGAN [23] | 22,421.8 | 186.65 | 28.29
CUT [24] | 23,108.8 | 80.89 | 14.70
ASGIT [26] | 15,930.5 | 42.17 | 28.29
Qing et al. [20] | 20,429.8 | 74.40 | 32.40
StegoGAN [27] | 31,983.7 | 179.28 | 30.06
Ours | 27,848.3 | 32.27 | 32.39
Table 4. Quantitative evaluation of different module combinations on the SEN1-2 [30] and SAR2Opt [29] datasets.
Method | SEN1-2 PSNR (↑) | SEN1-2 SSIM (↑) | SAR2Opt PSNR (↑) | SAR2Opt SSIM (↑)
w/o W Map | 12.62 | 0.20 | 14.16 | 0.19
w/o Registration | 12.98 | 0.23 | 15.24 | 0.24
w/ A Map | 14.19 | 0.26 | 15.41 | 0.25
w/ U Map | 14.41 | 0.25 | 16.01 | 0.25
Ours | 14.87 | 0.27 | 16.10 | 0.26