

Effective Fusion of Multi-Modal Remote Sensing Data in a Fully Convolutional Network for Semantic Labeling

Key Laboratory of Spatial Information Processing and Application System Technology, Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China
School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Huairou District, Beijing 100049, China
Institute for Applied Computer Science, Bundeswehr University Munich, Werner-Heisenberg-Weg 39, D-85577 Neubiberg, Germany
Author to whom correspondence should be addressed.
Remote Sens. 2018, 10(1), 52;
Received: 9 November 2017 / Revised: 17 December 2017 / Accepted: 28 December 2017 / Published: 29 December 2017


In recent years, Fully Convolutional Networks (FCN) have led to great improvements in semantic labeling for various applications, including multi-modal remote sensing data. Although different fusion strategies have been reported for multi-modal data, there is no in-depth study of the reasons for their performance limits. For example, it is unclear why an early fusion of multi-modal data in an FCN does not lead to satisfying results. In this paper, we investigate the contribution of individual layers inside the FCN and propose an effective fusion strategy for the semantic labeling of color or infrared imagery together with elevation (e.g., Digital Surface Models). The sensitivity and contribution of layers with respect to classes and multi-modal data are quantified by recall and the descent rate of recall in a multi-resolution model. The contribution of the different modalities to the pixel-wise prediction is analyzed, explaining the poor performance caused by plain concatenation of the modalities. Finally, based on this analysis, an optimized scheme for fusing layers with image and elevation information into a single FCN model is derived. Experiments are performed on the ISPRS Vaihingen 2D Semantic Labeling dataset (infrared and RGB imagery as well as elevation) and the Potsdam dataset (RGB imagery and elevation). Comprehensive evaluations demonstrate the potential of the proposed approach.
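The layer analysis summarized above rests on two quantities: per-class recall and its descent rate. The sketch below shows one way to compute them on flattened pixel labels. It assumes that the descent rate is the relative drop in a class's recall when one layer's contribution is removed from the multi-resolution model, i.e. (R_full - R_ablated) / R_full. The paper's exact definition may differ, and the function names `per_class_recall` and `descent_rate` are hypothetical.

```python
from typing import Dict, Sequence


def per_class_recall(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> Dict[int, float]:
    """Recall per class = TP / (TP + FN), over flattened pixel labels."""
    tp = [0] * num_classes       # correctly labeled pixels per reference class
    support = [0] * num_classes  # TP + FN: all reference pixels of each class
    for t, p in zip(y_true, y_pred):
        support[t] += 1
        if t == p:
            tp[t] += 1
    # Classes absent from the reference get NaN instead of a division by zero.
    return {c: (tp[c] / support[c] if support[c] else float("nan"))
            for c in range(num_classes)}


def descent_rate(recall_full: Dict[int, float],
                 recall_ablated: Dict[int, float]) -> Dict[int, float]:
    """Assumed definition: relative recall drop after removing one layer.

    A large value suggests the removed layer matters for that class.
    """
    return {c: ((r - recall_ablated[c]) / r if r > 0 else float("nan"))
            for c, r in recall_full.items()}


if __name__ == "__main__":
    # Toy example: 8 pixels, 2 classes (e.g. 0 = building, 1 = vegetation).
    reference = [0, 0, 0, 0, 1, 1, 1, 1]
    pred_full = [0, 0, 0, 1, 1, 1, 1, 1]     # full multi-resolution model
    pred_ablated = [0, 0, 1, 1, 1, 1, 0, 1]  # same model with one layer removed
    r_full = per_class_recall(reference, pred_full, 2)
    r_ablated = per_class_recall(reference, pred_ablated, 2)
    print(r_full, r_ablated, descent_rate(r_full, r_ablated))
```

In this toy case, recall falls from 0.75 to 0.5 for class 0 and from 1.0 to 0.75 for class 1. The corresponding descent rates are about 0.33 and 0.25, which would indicate that the removed layer contributes more to class 0.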
Keywords: semantic labeling; Fully Convolutional Networks; multi-modal dataset; fusion nets

Figure 1

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).


MDPI and ACS Style

Zhang, W.; Huang, H.; Schmitz, M.; Sun, X.; Wang, H.; Mayer, H. Effective Fusion of Multi-Modal Remote Sensing Data in a Fully Convolutional Network for Semantic Labeling. Remote Sens. 2018, 10, 52.




Remote Sens. EISSN 2072-4292. Published by MDPI AG, Basel, Switzerland.