# An Enhanced Deep Convolutional Model for Spatiotemporal Image Fusion


## Abstract


## 1. Introduction

## 2. Methods

#### 2.1. DCSTFN Introduction

#### 2.2. EDCSTFN Architecture

#### 2.2.1. Overall Architecture

#### 2.2.2. Compound Loss Function

#### 2.2.3. Enhanced Data Strategy

#### 2.2.4. Detailed Design

## 3. Experiments

#### 3.1. Study Area and Datasets

#### 3.2. Experiment Settings

## 4. Results and Discussion

#### 4.1. Evaluation Indices

#### 4.2. Experimental Results

#### 4.2.1. The Guangdong Region

#### 4.2.2. The Shandong Region

#### 4.3. Discussion

## 5. Conclusions and Prospects

## Author Contributions

## Funding

## Conflicts of Interest

## References


**Figure 1.** Comparison of the general architectures of the deep convolutional spatiotemporal fusion network (DCSTFN) and the enhanced deep convolutional spatiotemporal fusion network (EDCSTFN) for Moderate Resolution Imaging Spectroradiometer (MODIS)–Landsat image fusion. In the DCSTFN model, the input of the low-temporal- but high-spatial-resolution (LTHS) encoder is a Landsat image at reference time ${t}_{k}$, and the reference MODIS image at time ${t}_{k}$ and the MODIS image on the prediction date ${t}_{1}$ share the same high-temporal- but low-spatial-resolution (HTLS) encoder. In the EDCSTFN model, the inputs of the residual encoder include at least one pair of reference images at time ${t}_{k}$ and a MODIS image at prediction time ${t}_{1}$; the output is a Landsat-like image on the prediction date.

**Figure 4.** The detailed design of the EDCSTFN model for MODIS–Landsat fusion (the three parameters of each convolution are the kernel size and the numbers of input and output channels; the kernel size is empirically set to 3 except for the last layer; ⨁ denotes element-wise addition of multi-dimensional arrays).
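The two building blocks named in the caption can be illustrated with a minimal NumPy sketch: a "same"-padded 3×3 convolution parameterized by the kernel size and input/output channel counts, and the element-wise addition denoted by ⨁. The function names and shapes here are illustrative only; the paper's actual implementation is a PyTorch network.

```python
import numpy as np

def conv3x3(x, weights):
    """Naive 'same'-padded 3x3 convolution.
    x: (in_ch, H, W); weights: (out_ch, in_ch, 3, 3)."""
    out_ch, in_ch, k, _ = weights.shape
    H, W = x.shape[1:]
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero padding keeps H x W output
    out = np.zeros((out_ch, H, W))
    for o in range(out_ch):
        for i in range(in_ch):
            for di in range(k):
                for dj in range(k):
                    out[o] += weights[o, i, di, dj] * xp[i, di:di + H, dj:dj + W]
    return out

def residual_add(features, skip):
    """The element-wise addition (⨁) combining two feature maps."""
    return features + skip
```

A deep-learning framework would fuse these loops into a single optimized convolution; the sketch only makes the parameterization (kernel size, input channels, output channels) explicit.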

**Figure 5.** The study area (MODIS tiles are outlined in orange; Landsat scenes are rendered in light grey; the experiment areas are labeled in light blue).

**Figure 7.** Quantitative evaluation results for the Guangdong area (for root mean square error (RMSE), relative global dimensionless synthesis error (ERGAS), spectral angle mapper (SAM), and multi-scale structural similarity (MS-SSIM), the values are averaged over all four bands).
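For reference, three of the per-band indices named in the caption above (RMSE, SAM, and ERGAS; MS-SSIM is omitted) can be computed with a short NumPy sketch. The function names, the `ratio` argument, and the epsilon guard are illustrative assumptions, not taken from the paper, and conventions for the ERGAS resolution ratio vary across the literature.

```python
import numpy as np

def rmse(pred, truth):
    """Root mean square error between two arrays of equal shape."""
    return np.sqrt(np.mean((pred - truth) ** 2))

def sam(pred, truth, eps=1e-12):
    """Mean spectral angle (radians) between per-pixel spectra.
    pred, truth: (bands, H, W)."""
    num = np.sum(pred * truth, axis=0)
    den = np.linalg.norm(pred, axis=0) * np.linalg.norm(truth, axis=0) + eps
    return np.mean(np.arccos(np.clip(num / den, -1.0, 1.0)))

def ergas(pred, truth, ratio):
    """ERGAS = 100 * ratio * sqrt(mean over bands of (RMSE_b / mean_b)^2),
    where ratio is the high-to-low resolution ratio
    (e.g. 30/500 for 30 m Landsat vs. 500 m MODIS)."""
    band_terms = [
        (rmse(pred[b], truth[b]) / np.mean(truth[b])) ** 2
        for b in range(truth.shape[0])
    ]
    return 100.0 * ratio * np.sqrt(np.mean(band_terms))
```

Lower is better for all three: a perfect prediction gives RMSE = 0, SAM = 0, and ERGAS = 0.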

**Figure 8.** The fusion results on 7 December 2016 in the P122R043 region (the first column shows the standard true-color composite images; the second column gives the bias between the prediction and the ground truth; the third column shows zoomed-in details of the red rectangles marked in the first column; the last column is the normalized difference vegetation index (NDVI) calculated for the third column).

**Figure 12.** The fusion results on 24 November 2017 in the P122R035 region (the first column gives overviews of the whole scene; the second column shows zoomed-in details of the red rectangles marked in the first column; the third column gives the bias between the fusion results and the ground truth for the second column; the fourth column presents zoomed-in details of the yellow rectangles in the second column; the last column is the NDVI calculated for the fourth column).

| | STARFM | FSDAF | DCSTFN | EDCSTFN-S | EDCSTFN-M |
|---|---|---|---|---|---|
| RMSE | 0.0220 | 0.0226 | 0.0201 | 0.0176 | 0.0174 |
| ERGAS | 2.2708 | 2.3424 | 1.8838 | 1.5199 | 1.5268 |
| SAM | 0.0678 | 0.0681 | 0.0689 | 0.0562 | 0.0562 |
| SSIM | 0.9079 | 0.9001 | 0.9060 | 0.9294 | 0.9290 |

| | STARFM-I | STARFM-II | ESTARFM | DCSTFN | StfNet | EDCSTFN-I | EDCSTFN-II |
|---|---|---|---|---|---|---|---|
| RMSE | 0.0243 | 0.0221 | 0.0260 | 0.0230 | 0.0206 | 0.0172 | 0.0154 |
| ERGAS | 1.2541 | 1.1436 | 1.4637 | 1.2242 | 1.1737 | 0.9249 | 0.8353 |
| SAM | 0.0738 | 0.0676 | 0.0624 | 0.0783 | 0.0792 | 0.0616 | 0.0507 |
| SSIM | 0.8963 | 0.8948 | 0.9216 | 0.8472 | 0.9161 | 0.9161 | 0.9352 |

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Tan, Z.; Di, L.; Zhang, M.; Guo, L.; Gao, M.
An Enhanced Deep Convolutional Model for Spatiotemporal Image Fusion. *Remote Sens.* **2019**, *11*, 2898.
https://doi.org/10.3390/rs11242898
