Abstract
Pansharpening is a fundamental task in remote sensing, and numerous methods based on deep neural networks (DNNs) have recently been proposed for it. However, existing approaches are often sensitive to spatial translation errors between high-resolution panchromatic (HRPan) and low-resolution multispectral (LRMS) images, which lead to noticeable artifacts in the fused results. To address this issue, we propose an unsupervised pansharpening method that is robust to translation misalignment between the HRPan and LRMS inputs. The proposed framework integrates a shift-invariant module that estimates subpixel spatial offsets with a diffusion-based generative model that progressively enhances spatial and spectral details. Moreover, a multi-scale detail injection module is designed to guide the diffusion process with fine-grained structural information. In addition, a tailored loss function is formulated to preserve the fidelity of the fusion results and to facilitate the estimation of translation errors. Experiments on the GaoFen-2, GaoFen-1, and WorldView-2 datasets demonstrate that the proposed method achieves superior fusion quality compared with state-of-the-art approaches and effectively suppresses artifacts caused by translation errors.