HiT_DS: A Modular and Physics-Informed Hierarchical Transformer Framework for Spatial Downscaling of Sea Surface Temperature and Height
Highlights
- HiT_DS achieves high-resolution reconstruction of SST and SSH fields while preserving fine-scale structures and high-gradient ocean features.
- Selective combination of E-DFE, GA, and physics-informed losses enhances reconstruction accuracy across regions with distinct ocean dynamics.
- HiT_DS provides a flexible and modular framework for oceanographic data downscaling that can be tailored to different dynamical regimes.
- The approach bridges the gap between generic super-resolution methods and physically consistent geophysical data reconstruction, supporting improved ocean monitoring and research.
Abstract
1. Introduction
- Preservation of Scientific Data Fidelity: Oceanographic variables such as SSH and SST are high-precision, single-channel measurements. Direct application of image-based super-resolution architectures may introduce errors or fail to capture intrinsic physical structures. Therefore, a key challenge is designing models that respect the inherent accuracy and spatial patterns of scientific data.
- Limited Capacity of Existing Models for Long-Range Spatial Dependencies: Currently, most methods for downscaling SSH and SST rely on Convolutional Neural Networks (CNNs). However, due to their fixed and local receptive fields, CNNs have inherent limitations in capturing long-range dependencies. This constraint hinders the ability of such models to represent large-scale oceanic structures and low-frequency variability. While Transformer-based models have shown remarkable success in both computer vision and Earth system modeling, their application to geophysical downscaling remains underexplored.
- Lack of Dynamic Feature Awareness in Spatially Heterogeneous Regions: Oceanic fields exhibit strong spatial heterogeneity, with dynamic features such as eddies, fronts, and filaments regulating energy and mass transport. Existing deep learning methods typically apply uniform enhancement across spatial domains, potentially diluting attention in highly dynamic regions. An open challenge is guiding models to focus on spatially complex and physically important structures during reconstruction.
2. Materials and Methods
2.1. Study Areas
- Study Area 1: South China Sea and Adjacent Waters (107°E–123°E, 6°N–22°N)
- Study Area 2: Kuroshio Extension Region (134°E–150°E, 28°N–44°N)
2.2. Datasets
2.2.1. SST Data
2.2.2. SSH Data
2.3. Hierarchical Transformer
2.3.1. Block-Level Design: Hierarchical Windows
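As a rough illustration of what hierarchical windowing can look like, the sketch below splits a 2-D field into non-overlapping windows of increasing size, so that later blocks would see progressively larger spatial context. The helper `partition_windows`, the 8 × 8 grid, and the window schedule `[2, 4, 8]` are hypothetical choices made for this example; they are not taken from HiT_DS or HiT-SR.

```python
def partition_windows(field, win):
    """Split a 2-D grid (list of rows) into non-overlapping win x win windows."""
    rows, cols = len(field), len(field[0])
    if rows % win or cols % win:
        raise ValueError("grid size must be divisible by the window size")
    windows = []
    for r0 in range(0, rows, win):
        for c0 in range(0, cols, win):
            # Each window keeps `win` rows and `win` columns of the grid.
            windows.append([row[c0:c0 + win] for row in field[r0:r0 + win]])
    return windows


# Hypothetical 8 x 8 grid and window schedule (assumed values, for illustration).
grid = [[r * 8 + c for c in range(8)] for r in range(8)]
window_sizes = [2, 4, 8]

# Number of windows per level: larger windows mean fewer, wider-context groups.
counts = {w: len(partition_windows(grid, w)) for w in window_sizes}
print(counts)
```

Attention computed inside each window is cheap for small windows and covers more of the basin for large ones, which is the trade-off a hierarchical schedule is meant to balance.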
2.3.2. Layer-Level Design: Spatial-Channel Correlation
2.4. HiT_DS: A Scientific Data-Optimized Architecture
2.4.1. Overall Architecture
2.4.2. Enhanced Dual Feature Extraction
1. Depth-wise Convolution:
2. Pointwise Convolution:
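The two steps listed above correspond to the generic depthwise separable convolution. The following is a minimal pure-Python sketch of that generic operation, assuming stride 1, zero "same" padding, and cross-correlation as used in deep learning frameworks. The function names, kernels, and channel weights are illustrative assumptions, and the block is not the E-DFE module itself.

```python
def depthwise_conv(x, kernels):
    """Filter each channel with its own k x k kernel (zero 'same' padding).

    x: list of channels, each a list of rows.
    kernels: one k x k kernel per channel (k odd).
    """
    out = []
    for chan, ker in zip(x, kernels):
        k = len(ker)
        pad = k // 2
        h, w = len(chan), len(chan[0])
        res = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < h and 0 <= jj < w:  # outside = zero padding
                            acc += ker[di][dj] * chan[ii][jj]
                res[i][j] = acc
        out.append(res)
    return out


def pointwise_conv(x, weights):
    """1 x 1 convolution: mix channels at every pixel; weights[o][c]."""
    h, w = len(x[0]), len(x[0][0])
    return [
        [[sum(wt[c] * x[c][i][j] for c in range(len(x))) for j in range(w)]
         for i in range(h)]
        for wt in weights
    ]


# Hypothetical two-channel 3 x 3 input (values chosen for easy checking).
x = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # leaves channel 0 unchanged
box_sum = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]   # sums the 3 x 3 neighbourhood

y = pointwise_conv(depthwise_conv(x, [identity, box_sum]), [[1.0, 0.5]])
print(y[0][1][1])
```

The depth-wise step captures spatial structure per channel, and the pointwise step combines channels, which keeps the parameter count well below that of a full convolution.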
2.4.3. Gradient-Aware Attention (GA)
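One simple way to make a model aware of high-gradient structures such as fronts and eddy edges is to derive a spatial weight map from the gradient magnitude of the field. The sketch below shows that idea only; the functions `gradient_magnitude` and `gradient_weights`, the finite-difference scheme, and the normalization to [0, 1] are assumptions for illustration and should not be read as the GA module's actual formulation.

```python
import math


def gradient_magnitude(field):
    """Gradient magnitude with central differences inside the grid and
    one-sided differences at the edges (grid must be at least 2 x 2)."""
    h, w = len(field), len(field[0])
    mag = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            i0, i1 = max(i - 1, 0), min(i + 1, h - 1)
            j0, j1 = max(j - 1, 0), min(j + 1, w - 1)
            d_row = (field[i1][j] - field[i0][j]) / (i1 - i0)
            d_col = (field[i][j1] - field[i][j0]) / (j1 - j0)
            mag[i][j] = math.hypot(d_row, d_col)
    return mag


def gradient_weights(field, eps=1e-12):
    """Scale gradient magnitude to [0, 1]; sharp fronts get weights near 1."""
    mag = gradient_magnitude(field)
    peak = max(max(row) for row in mag)
    return [[m / (peak + eps) for m in row] for row in mag]


# Hypothetical SST patch (deg C) with a meridional front between columns 1 and 2.
sst = [[10.0, 10.0, 12.0, 12.0] for _ in range(3)]
weights = gradient_weights(sst)
print(weights[1])
```

Such a map could multiply attention scores or per-pixel loss terms so that dynamically active regions are not diluted by large quiescent areas.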
2.4.4. Redesign of the Loss Functions
- (a) L1 Loss with Laplacian-Based Physical Constraint for SST
- (b) L1 Loss with Geostrophic Constraint for SSH
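The two losses named above can be sketched as an L1 data term plus a weighted physical penalty. The version below is a minimal illustration under stated assumptions: a five-point Laplacian with unit grid spacing for SST, the standard geostrophic relations u = -(g/f) ∂η/∂y and v = (g/f) ∂η/∂x for SSH, rows increasing northward, a constant Coriolis parameter `f`, and a hypothetical weight `lam`. The exact penalty form and weighting used in HiT_DS are not given here, so none of these choices should be taken as the authors' definition.

```python
def l1_loss(pred, target):
    """Mean absolute difference between two equally sized 2-D grids."""
    n = len(pred) * len(pred[0])
    return sum(abs(p - t) for pr, tr in zip(pred, target)
               for p, t in zip(pr, tr)) / n


def laplacian(field):
    """Five-point Laplacian on interior points (unit grid spacing assumed)."""
    h, w = len(field), len(field[0])
    return [[field[i - 1][j] + field[i + 1][j] + field[i][j - 1]
             + field[i][j + 1] - 4 * field[i][j]
             for j in range(1, w - 1)] for i in range(1, h - 1)]


def sst_loss(pred, target, lam=0.1):
    """L1 term plus lam times the L1 mismatch of the Laplacians (lam assumed)."""
    return l1_loss(pred, target) + lam * l1_loss(laplacian(pred), laplacian(target))


def geostrophic_velocity(ssh, dx, dy, f, g=9.81):
    """Interior-point geostrophic velocities from SSH via central differences.

    Rows are assumed to increase northward (y), columns eastward (x).
    """
    h, w = len(ssh), len(ssh[0])
    u = [[-(g / f) * (ssh[i + 1][j] - ssh[i - 1][j]) / (2 * dy)
          for j in range(1, w - 1)] for i in range(1, h - 1)]
    v = [[(g / f) * (ssh[i][j + 1] - ssh[i][j - 1]) / (2 * dx)
          for j in range(1, w - 1)] for i in range(1, h - 1)]
    return u, v


def ssh_loss(pred, target, dx, dy, f, lam=0.1):
    """L1 term plus lam times the L1 mismatch of geostrophic velocities."""
    up, vp = geostrophic_velocity(pred, dx, dy, f)
    ut, vt = geostrophic_velocity(target, dx, dy, f)
    return l1_loss(pred, target) + lam * (l1_loss(up, ut) + l1_loss(vp, vt))


# Hypothetical 3 x 3 examples (values chosen for easy checking).
target = [[0.0] * 3 for _ in range(3)]
pred = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
print(sst_loss(pred, target))

ssh = [[0.01 * j for j in range(3)] for _ in range(3)]  # eastward SSH slope
u, v = geostrophic_velocity(ssh, dx=1000.0, dy=1000.0, f=1e-4)
print(u[0][0], v[0][0])
```

In the SST example a single-pixel spike is penalized far more by the Laplacian term than by the L1 term, which is the kind of behaviour a smoothness-aware constraint is meant to provide.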
3. Results
3.1. Ablation Strategy and Model Variants
1. Baseline Model
2. E-DFE
3. GA
4. Physics-informed Loss Functions
5. Selective Combination of Effective Modules
3.2. Analysis of SST Downscaling Results
3.2.1. SST Downscaling Performance in Study Area 1
3.2.2. SST Downscaling Performance in Study Area 2
3.3. Analysis of SSH Downscaling Results
3.3.1. SSH Downscaling Performance in Study Area 1
3.3.2. SSH Downscaling Performance in Study Area 2
4. Discussion
4.1. SST Downscaling: Module Effectiveness and Regime Dependence
4.2. SSH Downscaling: Module Effectiveness and Regime Dependence
5. Conclusions
- Module effectiveness is regime-dependent. In low-variability regions characterized by smooth SST or weak SSH gradients, E-DFE effectively reinforces local spatial structures and improves structural fidelity. In contrast, GA and physics-informed losses demonstrate their strongest impact in high-variability regions with sharp gradients, eddies, and mesoscale processes.
- Complementarity of modules. Integrating E-DFE, GA, and physics-informed losses into a unified architecture yields synergistic improvements, suppressing large reconstruction errors, enhancing temporal consistency, and preserving high-gradient features. This confirms the importance of modular and adaptive deployment for complex oceanographic fields.
- SST vs. SSH downscaling. For SST, improvements in local feature extraction and gradient-sensitive attention lead to more accurate and temporally stable reconstructions, especially in moderate-to-high variability regimes. For SSH, physically informed constraints play a dominant role in stabilizing errors, while GA enhances reconstruction in dynamically complex regions. E-DFE has limited effect for SSH, reflecting the need to account for large-scale and nonlocal dynamics.
- Practical implications. Even modest reductions in RMSE, MAE, or relative error translate into practical value for operational ocean monitoring and forecasting, underscoring the utility of HiT_DS as a robust and flexible downscaling framework.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest

SST downscaling performance in Study Area 1

| Model | RMSE (°C) | MAE (°C) | RE | TCC |
|---|---|---|---|---|
| HiT_SR | 0.04636 | 0.02696 | 0.001641 | 0.999999756 |
| HiT_SR + E-DFE | 0.04405 | 0.02655 | 0.001559 | 0.999999906 |
| HiT_SR + Loss | 0.04705 | 0.02739 | 0.001665 | 0.999997563 |
| HiT_SR + GA | 0.04397 | 0.02658 | 0.001557 | 0.999999934 |
| HiT_SR + E-DFE + GA | 0.04364 | 0.02647 | 0.001545 | 0.999999929 |

SST downscaling performance in Study Area 2

| Model | RMSE (°C) | MAE (°C) | RE | TCC |
|---|---|---|---|---|
| HiT_SR | 0.17185 | 0.09215 | 0.009753 | 0.999999698 |
| HiT_SR + E-DFE | 0.17224 | 0.09393 | 0.009949 | 0.999999662 |
| HiT_SR + Loss | 0.16904 | 0.09171 | 0.009711 | 0.999999766 |
| HiT_SR + GA | 0.16887 | 0.09225 | 0.008938 | 0.999999756 |
| HiT_SR + GA + Loss | 0.16441 | 0.08980 | 0.008607 | 0.999999913 |

SSH downscaling performance in Study Area 1

| Model | RMSE (m) | MAE (m) | RE | TCC |
|---|---|---|---|---|
| HiT_SR | 0.00207 | 0.00110 | 0.014070 | 0.999999716 |
| HiT_SR + E-DFE | 0.00219 | 0.00113 | 0.014939 | 0.999997704 |
| HiT_SR + Loss | 0.00202 | 0.00104 | 0.013681 | 0.999999799 |
| HiT_SR + GA | 0.00208 | 0.00108 | 0.014121 | 0.999999727 |
| HiT_SR + GA + Loss | 0.00210 | 0.00109 | 0.014096 | 0.999999812 |

SSH downscaling performance in Study Area 2

| Model | RMSE (m) | MAE (m) | RE | TCC |
|---|---|---|---|---|
| HiT_SR | 0.00443 | 0.00282 | 0.007315 | 0.999999358 |
| HiT_SR + E-DFE | 0.00451 | 0.00295 | 0.007860 | 0.999997704 |
| HiT_SR + Loss | 0.00423 | 0.00269 | 0.006506 | 0.999999766 |
| HiT_SR + GA | 0.00421 | 0.00263 | 0.006191 | 0.999999556 |
| HiT_SR + GA + Loss | 0.00414 | 0.00260 | 0.005940 | 0.999999666 |
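
The metrics reported in the tables can be computed along the following lines. This is a sketch only: RMSE and MAE follow their standard definitions, but the relative error (RE) and temporal correlation coefficient (TCC) formulas shown here are assumed forms (total absolute error over total absolute reference value, and a Pearson correlation between two series), since their exact definitions are not stated here. All function names and sample values are illustrative.

```python
import math


def rmse(pred, obs):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def mae(pred, obs):
    """Mean absolute error."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def relative_error(pred, obs):
    """Assumed definition: total absolute error / total absolute reference."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / sum(abs(o) for o in obs)


def correlation(a, b):
    """Pearson correlation; used here as a stand-in for TCC (assumption)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def percent_reduction(baseline, improved):
    """Relative improvement of an error metric, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Hypothetical sample values, not taken from the study.
pred = [1.0, 2.0, 3.0, 5.0]
obs = [1.0, 2.0, 4.0, 4.0]
print(rmse(pred, obs), mae(pred, obs), relative_error(pred, obs))
print(correlation(pred, obs))
```

Applied to the reported SST RMSE values of 0.17185 °C (HiT_SR) and 0.16441 °C (HiT_SR + GA + Loss), `percent_reduction` gives a reduction of roughly 4.3%.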
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Wang, M.; Liu, W.; Chu, R.; Wang, X.; Zhu, S.; Liao, G. HiT_DS: A Modular and Physics-Informed Hierarchical Transformer Framework for Spatial Downscaling of Sea Surface Temperature and Height. Remote Sens. 2026, 18, 292. https://doi.org/10.3390/rs18020292
