Satellite Image Multi-Frame Super Resolution Using 3D Wide-Activation Neural Networks
Abstract
1. Introduction
2. Materials and Methods
2.1. Image Dataset
2.1.1. Dataset Characteristics
- A total of 1160 images from 74 hand-selected regions were collected at different points in time.
- The dataset is divided into two spectral bands: RED with 594 images and NIR with 566. For both bands, radiometrically and geometrically corrected top-of-atmosphere reflectance in plate carrée projection was used.
- The LR size is 128 × 128 and the HR size is 384 × 384; both have a bit depth of 14 bits but are saved in 16-bit PNG format.
- Each scene has a range of LR images (from a minimum of 9 to a maximum of 30) and one HR image.
- The mean geolocation accuracy of PROBA-V is about 61 m, and (sub-)pixel shifts can occur because the images are delivered as recorded by PROBA-V (not aligned with each other). This can induce a shift of up to 1 pixel between the LR frames of a scene.
- For each LR and HR image, there is a mask that indicates which pixels can be reliably used for reconstruction. The masks were provided by Märtens et al. [20] and are based on a status map containing information on the presence of artifacts (clouds, cloud shadows, and ice/snow) generated from ESA’s Land Cover Climate Change Initiative and ESA’s GlobAlbedo surface reflectance data. The exact procedure for generating this map can be found in Section 2.2.3 of the PROBA-V product manual [26].
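For illustration, the dirty-pixel percentage used by the filtering steps in Section 2.3 can be computed directly from such a mask. The following is a minimal sketch, assuming the mask PNG stores nonzero values for reliable pixels (this sketch's convention; the function name is ours, not part of the original code):

```python
import numpy as np
from skimage import io


def dirty_fraction(mask_path):
    """Fraction of pixels flagged as unreliable in a quality-mask PNG.

    Assumes nonzero mask values mark pixels that can be reliably used
    for reconstruction (a convention of this sketch).
    """
    mask = io.imread(mask_path)  # masks are shipped as PNG files
    return 1.0 - np.count_nonzero(mask) / mask.size
```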
2.2. Network Architecture
- Convolutional layers are replaced by 3D WDSR blocks.
- Bicubic upsampling is replaced by 2D WDSR blocks.
2.2.1. WDSR Blocks
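As a rough illustration of these building blocks, the Keras/TensorFlow sketch below implements a wide-activation residual block [23] with 3D convolutions and weight normalization, plus a sub-pixel upsampling head [27] for the 3× scale factor (128 × 128 → 384 × 384). The expansion factor, layer widths, and the simpler WDSR-A-style layout are assumptions of this sketch rather than the exact 3DWDSRnet configuration; the full implementation is available at [33].

```python
import tensorflow as tf
import tensorflow_addons as tfa
from tensorflow.keras import layers


def conv3d_wn(filters, kernel_size):
    """Weight-normalized 3D convolution; WDSR replaces batch
    normalization with weight normalization."""
    return tfa.layers.WeightNormalization(
        layers.Conv3D(filters, kernel_size, padding='same'),
        data_init=False)


def wdsr_block_3d(x, filters=32, expansion=6):
    """Wide-activation residual block with 3D (spatio-temporal) convolutions.

    Channels are expanded *before* the ReLU so the non-linearity acts on
    a wide feature space, then narrowed again before the residual add.
    """
    skip = x
    x = conv3d_wn(filters * expansion, 3)(x)  # widen
    x = layers.ReLU()(x)
    x = conv3d_wn(filters, 3)(x)              # narrow back
    return layers.Add()([skip, x])


def upsample_x3(x, filters=32):
    """Sub-pixel convolution head [27]: a 2D convolution emits 9x the
    channels, which depth_to_space rearranges into a 3x larger grid."""
    x = layers.Conv2D(filters * 9, 3, padding='same')(x)
    return layers.Lambda(lambda t: tf.nn.depth_to_space(t, 3))(x)
```

Widening the features before the ReLU lets more low-level information pass through the non-linearity without increasing the cost of the identity path, which is the core idea behind WDSR [23].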
2.3. Preprocessing and Data Augmentation
- Register all frames of each scene to the corresponding first frame by phase cross-correlation to compensate for the possible pixel shifts (see the dataset characteristics in Section 2.1.1); a sketch follows this list. We used the scikit-image implementation phase_cross_correlation [29].
- Remove images in which all frames have more than 15% dirty pixels. A dirty pixel is a pixel whose mask indicates the presence of an artifact, as explained in Section 2.1.1.
- Select the k best frames (k = 5 or k = 7 in our experiments; see the results table). To do this, we sort all frames from cleanest to dirtiest (by percentage of dirty pixels) based on the provided masks and select the first k frames.
- Extract 16 patches per image.
- Remove instances where the HR target patch had more than 15% dirty pixels.
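As an illustration of the registration and frame-selection steps (the sketch referenced in the first item above), the code below aligns all frames of a scene to the first one with scikit-image’s phase_cross_correlation [29] and then keeps the k cleanest frames. The helper names and the mask convention (nonzero = reliable pixel) are assumptions of this sketch; the official preprocessing is in [33].

```python
import numpy as np
from scipy.ndimage import shift as nd_shift
from skimage.registration import phase_cross_correlation


def register_frames(frames):
    """Align every LR frame to the first frame of the scene.

    phase_cross_correlation returns the (row, col) translation that
    registers the moving frame against the reference; scipy's shift
    then applies it. `frames` is a numpy array of shape (n, H, W).
    """
    reference = frames[0]
    registered = [reference]
    for frame in frames[1:]:
        offset, _, _ = phase_cross_correlation(reference, frame)
        registered.append(nd_shift(frame, offset))
    return np.stack(registered)


def select_k_best(frames, masks, k):
    """Keep the k frames with the fewest dirty pixels.

    Assumes nonzero mask values mark reliable pixels, so dirtiness is
    the fraction of zeros in each mask (a convention of this sketch).
    """
    dirtiness = [1.0 - np.count_nonzero(m) / m.size for m in masks]
    order = np.argsort(dirtiness)[:k]
    return frames[order], masks[order]
```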
2.4. Training
Quality Metric and Loss Function
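The challenge scores submissions with a “clean” PSNR (cPSNR) that considers only reliable pixels and corrects a per-image brightness bias b between prediction and target; training used the analogous clean MAE (cMAE, see the results table). The numpy sketch below illustrates both quantities under two simplifications that are this sketch’s assumptions: reflectances normalized to [0, 1], and omission of the ±3 pixel shift search performed by the official scoring.

```python
import numpy as np


def clean_bias(hr, sr, mask):
    """Per-image brightness bias between target and prediction,
    computed over reliable pixels only (mask value 1 = reliable)."""
    return ((hr - sr) * mask).sum() / mask.sum()


def cmae(hr, sr, mask):
    """Bias-corrected mean absolute error over reliable pixels
    (the training loss)."""
    b = clean_bias(hr, sr, mask)
    return (np.abs(hr - sr - b) * mask).sum() / mask.sum()


def cpsnr(hr, sr, mask):
    """Bias-corrected PSNR over reliable pixels (the evaluation metric).

    The official scoring additionally maximizes this value over all
    +/- 3 pixel shifts of the prediction; that search is omitted here.
    """
    b = clean_bias(hr, sr, mask)
    cmse = (((hr - sr - b) ** 2) * mask).sum() / mask.sum()
    return -10.0 * np.log10(cmse)
```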
3. Results
3.1. Comparisons
- DeepSUM: It performs a bicubic upsampling of the images before feeding them to the network. It uses convolutional layers with 64 filters, a 96 × 96 patch size, and 9 frames. This raises the memory cost considerably, making it impractical to train on low-specification equipment; it was trained on an Nvidia 1080Ti GPU.
- HighRes-net: This method upscales after fusion, which reduces memory usage. However, it still needs 64 convolutional filters, 16 frames, and 64 × 64 patches to reach its best score, which is further improved by averaging the outputs of two pretrained networks.
- 3DWDSRnet: Our method follows the HighRes-net approach of upscaling after fusion but achieves similar scores with fewer than half the frames (7), half the patch size (34 × 34), and 32 convolutional filters. Moreover, there is no need to average two models to obtain these results. Figure 5 shows a prediction made by our best model.
4. Conclusions
5. Discussion
- Further investigate data augmentation methods that benefit from multiple frames, such as richer frame permutations, inserting pixel variations that simulate clouds, and changes in image color, brightness, contrast, etc.
- Ensemble results from multiple models as done in Highres-net [22].
- Try different patch sizes and see how this affects performance.
Funding
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Meaning
---|---
SR | Super Resolution
MISR | Multi-Image Super Resolution
MFSR | Multi-Frame Super Resolution
CNN | Convolutional Neural Network
Conv | Convolutional
AI | Artificial Intelligence
MAE | Mean Absolute Error
MSE | Mean Squared Error
cPSNR | clean Peak Signal-to-Noise Ratio
References
- Mordor Intelligence. Small Satellite Market—Growth, Trends, and Forecast (2020–2025). Available online: https://www.mordorintelligence.com/industry-reports/small-satellite-market (accessed on 23 September 2020).
- Xu, H.J.; Wang, X.P.; Yang, T.B. Trend shifts in satellite-derived vegetation growth in Central Eurasia, 1982–2013. Sci. Total Environ. 2017, 579, 1658–1674. [Google Scholar] [CrossRef] [PubMed]
- Martínez-Fernández, J.; Almendra-Martín, L.; de Luis, M.; González-Zamora, A.; Herrero-Jiménez, C. Tracking tree growth through satellite soil moisture monitoring: A case study of Pinus halepensis in Spain. Remote Sens. Environ. 2019, 235, 111422. [Google Scholar] [CrossRef]
- Ricker, R.; Hendricks, S.; Girard-Ardhuin, F.; Kaleschke, L.; Lique, C.; Tian-Kunze, X.; Nicolaus, M.; Krumpen, T. Satellite-observed drop of Arctic sea ice growth in winter 2015–2016. Geophys. Res. Lett. 2017, 44, 3236–3245. [Google Scholar] [CrossRef] [Green Version]
- Liu, Q.; Hang, R.; Song, H.; Li, Z. Learning multiscale deep features for high-resolution satellite image scene classification. IEEE Trans. Geosci. Remote Sens. 2017, 56, 117–126. [Google Scholar] [CrossRef]
- Sweeting, M. Modern small satellites-changing the economics of space. Proc. IEEE 2018, 106, 343–361. [Google Scholar] [CrossRef]
- Luo, Y.; Zhou, L.; Wang, S.; Wang, Z. Video satellite imagery super resolution via convolutional neural networks. IEEE Geosci. Remote Sens. Lett. 2017, 14, 2398–2402. [Google Scholar] [CrossRef]
- CCSDS. Flexible Advanced Coding and Modulation Scheme for High Rate Telemetry Applications. Available online: https://public.ccsds.org/Pubs/131x2b1e1.pdf (accessed on 18 November 2020).
- Wertz, P.; Kiessling, M.; Hagmanns, F.J. Maximizing Data Throughput in Earth Observation Satellite to Ground Transmission by Employing a Flexible High Data Rate Transmitter Operating in X-Band and Ka-Band. In Proceedings of the 36th International Communications Satellite Systems Conference (ICSSC 2018), Niagara Falls, ON, Canada, 15–18 October 2018. [Google Scholar]
- Bertolucci, M.; Falaschi, F.; Cassettari, R.; Davalle, D.; Fanucci, L. A Comprehensive Trade-off Analysis on the CCSDS 131.2-B-1 Extended ModCod (SCCC-X) Implementation. In Proceedings of the 2020 23rd IEEE Euromicro Conference on Digital System Design (DSD), Kranj, Slovenia, 26–28 August 2020; pp. 126–132. [Google Scholar]
- Demirel, H.; Anbarjafari, G. Satellite image resolution enhancement using complex wavelet transform. IEEE Geosci. Remote Sens. Lett. 2009, 7, 123–126. [Google Scholar] [CrossRef]
- Demirel, H.; Anbarjafari, G. Discrete wavelet transform-based satellite image resolution enhancement. IEEE Trans. Geosci. Remote Sens. 2011, 49, 1997–2004. [Google Scholar] [CrossRef]
- Iqbal, M.Z.; Ghafoor, A.; Siddiqui, A.M. Satellite image resolution enhancement using dual-tree complex wavelet transform and nonlocal means. IEEE Geosci. Remote Sens. Lett. 2012, 10, 451–455. [Google Scholar] [CrossRef]
- Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; Fu, Y. Residual dense network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2472–2481. [Google Scholar]
- Sun, W.; Chen, Z. Learned image downscaling for upscaling using content adaptive resampler. IEEE Trans. Image Process. 2020, 29, 4027–4040. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Anwar, S.; Barnes, N. Densely residual Laplacian super-resolution. arXiv 2019, arXiv:1906.12021. [Google Scholar] [CrossRef] [PubMed]
- Sajjadi, M.S.; Vemulapalli, R.; Brown, M. Frame-recurrent video super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6626–6634. [Google Scholar]
- Jo, Y.; Wug Oh, S.; Kang, J.; Joo Kim, S. Deep video super-resolution network using dynamic upsampling filters without explicit motion compensation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3224–3232. [Google Scholar]
- Kim, S.Y.; Lim, J.; Na, T.; Kim, M. 3DSRnet: Video Super-resolution using 3D Convolutional Neural Networks. arXiv 2018, arXiv:1812.09079. [Google Scholar]
- Märtens, M.; Izzo, D.; Krzic, A.; Cox, D. Super-resolution of PROBA-V images using convolutional neural networks. Astrodynamics 2019, 3, 387–402. [Google Scholar] [CrossRef]
- Molini, A.B.; Valsesia, D.; Fracastoro, G.; Magli, E. DeepSUM: Deep neural network for Super-resolution of Unregistered Multitemporal images. IEEE Trans. Geosci. Remote Sens. 2019, 58, 3644–3656. [Google Scholar] [CrossRef] [Green Version]
- Deudon, M.; Kalaitzis, A.; Goytom, I.; Rifat-Arefin, M.; Lin, Z.; Sankaran, K.; Michalski, V.; Kahou, S.; Cornebise, J.; Bengio, Y. HighRes-net: Recursive Fusion for Multi-Frame Super-Resolution of Satellite Imagery. arXiv 2020, arXiv:2002.06460. [Google Scholar]
- Yu, J.; Fan, Y.; Yang, J.; Xu, N.; Wang, Z.; Wang, X.; Huang, T. Wide activation for efficient and accurate image super-resolution. arXiv 2018, arXiv:1808.08718. [Google Scholar]
- Kelvins—ESA’s Advanced Concepts Competition Website. PROBA-V Super Resolution Post Mortem. Available online: https://kelvins.esa.int/proba-v-super-resolution-post-mortem/leaderboard (accessed on 23 September 2020).
- Kelvins—ESA’s Advanced Concepts Competition Website. PROBA-V Super Resolution Competition. Available online: https://kelvins.esa.int/proba-v-super-resolution (accessed on 23 September 2020).
- Wolters, E.; Dierckx, W.; Iordache, M.D.; Swinnen, E. PROBA-V Products User Manual. VITO. Available online: https://proba-v.vgt.vito.be/sites/proba-v.vgt.vito.be/files/products_user_manual.pdf (accessed on 18 November 2020).
- Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.P.; Bishop, R.; Rueckert, D.; Wang, Z. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1874–1883. [Google Scholar]
- Krasser, M. Single Image Super-Resolution with EDSR, WDSR and SRGAN. Available online: https://github.com/krasserm/super-resolution (accessed on 23 September 2020).
- Van der Walt, S.; Schönberger, J.L.; Nunez-Iglesias, J.; Boulogne, F.; Warner, J.D.; Yager, N.; Gouillart, E.; Yu, T. Scikit-image: Image processing in Python. PeerJ 2014, 2, e453. [Google Scholar] [CrossRef] [PubMed]
- Lim, B.; Son, S.; Kim, H.; Nah, S.; Mu Lee, K. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 136–144. [Google Scholar]
- Fan, Y.; Shi, H.; Yu, J.; Liu, D.; Han, W.; Yu, H.; Wang, Z.; Wang, X.; Huang, T.S. Balanced two-stage residual networks for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 161–168. [Google Scholar]
- Zhao, H.; Gallo, O.; Frosio, I.; Kautz, J. Loss functions for neural networks for image processing. arXiv 2015, arXiv:1511.08861. [Google Scholar]
- Dorr, F. 3DWDSR: Multiframe Super Resolution Framework Applied to PROBA-V Challenge. Available online: https://github.com/frandorr/PROBA-V-3DWDSR (accessed on 18 November 2020).
- Bajo, M. Multi-Frame Super Resolution of Unregistered Temporal Images Using WDSR Nets. Available online: https://github.com/mmbajo/PROBA-V (accessed on 18 November 2020).
Method | Patch | Frames | Loss | Normalization | Score (lower is better) | Memory Requirement
---|---|---|---|---|---|---
DeepSUM | 96 × 96 (bicubic) | 9 | cMSE | Instance | 0.94745 | +++
HighRes-net | 64 × 64 | 16 | cMSE | Batch | 0.94774 | ++
3DWDSRnet (ours) | 34 × 34 | 5 | cMAE | Weight | 0.97933 | +
3DWDSRnet (ours) | 34 × 34 (aug) | 5 | cMAE | Weight | 0.96422 | +
3DWDSRnet (ours) | 34 × 34 (aug) | 7 | cMAE | Weight | 0.94625 | +
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).