Review and Evaluation of Deep Learning Architectures for Efficient Land Cover Mapping with UAS Hyper-Spatial Imagery: A Case Study Over a Wetland
Abstract
1. Introduction
2. Deep Learning for Semantic Image Segmentation
- the introduction of more advanced and deeper CNN feature encoders that can be trained efficiently using recently developed optimization algorithms.
- the application of more advanced decoding strategies to the final low-resolution encoded feature maps in an encoder–decoder architecture, using deconvolution or dilated convolution to efficiently increase their resolution for pixel-wise prediction.
- the use of skip connections to combine low-level spatial detail with high-level semantic information, building feature maps that accurately represent pixel-level information.
2.1. Feature Encoders
- VGG-Net. VGG-Net [55] was introduced in 2014 by Oxford’s Visual Geometry Group as a successful effort to build and train a very deep CNN. VGG-Net showed that the depth of a network is a critical component of CNNs for achieving high performance in recognition or classification tasks. By shrinking the convolution kernels to 3 × 3 while increasing the number of stacked convolutional layers and the number of feature maps in each convolutional layer, VGG-Net is able to train a deeper architecture with an effective receptive field comparable to that of AlexNet for recognition tasks.
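To make the idea of stacking small kernels concrete, the following minimal PyTorch sketch (not taken from any reviewed implementation; layer widths are illustrative assumptions) builds a VGG-style encoder stage of two 3 × 3 convolutions followed by 2 × 2 max pooling.

```python
import torch
import torch.nn as nn

# Minimal sketch of a VGG-style encoder stage: two 3x3 convolutions followed by
# 2x2 max pooling. Stacking small kernels enlarges the receptive field while
# keeping the parameter count lower than a single large kernel would require.
def vgg_stage(in_channels: int, out_channels: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=2, stride=2),
    )

# Example: the first two stages of a VGG-16-like encoder.
encoder = nn.Sequential(vgg_stage(3, 64), vgg_stage(64, 128))
x = torch.randn(1, 3, 224, 224)
print(encoder(x).shape)  # torch.Size([1, 128, 56, 56])
```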
- GoogLeNet. GoogLeNet [56] (a.k.a. Inception Net) was proposed by Szegedy et al. at Google in 2015 with the objective of reducing computational complexity compared to traditional CNNs. The inception module, which forms the building block of the network, combines 1 × 1, 3 × 3, and 5 × 5 convolutional kernels and a pooling layer in parallel. The motivation behind the inception module is to increase the receptive field without losing fine information. By learning and combining features at different scales in parallel in each inception module, GoogLeNet is able to learn the feature hierarchy in a multi-scale manner, while its innovative architecture reduces the number of trainable parameters in a very deep framework (22 layers) to fewer than 5 million, in comparison with 62 million and 138 million parameters in AlexNet and VGG-Net, respectively. To train a deep stack of inception modules efficiently, a bottleneck approach is exploited in which extra 1 × 1 convolutions reduce the dimensionality of the feature maps entering the inception module from the previous layer. This helps to avoid parameter explosion in inception modules and the overfitting problem in the whole network. Figure 1 illustrates the architecture of the inception module. Other versions of the inception module, including BN-Inception [59], Inception V2, and Inception V3 [60], were later proposed. In order to increase the efficiency and performance of inception modules, in 2017, Szegedy et al. proposed a combined version of inception modules and residual network (ResNet) modules known as Inception-ResNet [61]. Xception [62], which stands for extreme inception, was proposed by Chollet in 2017. The motivation behind it is to map cross-channel and spatial correlations in feature maps separately, under the assumption that they are sufficiently decoupled. As a result, the depthwise separable convolutions of the inception modules are modified in Xception modules as pointwise convolutions followed by depthwise convolutions.
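As an illustration of the parallel multi-scale design described above, the sketch below implements a generic inception-style module in PyTorch; the branch widths are illustrative values and BN/ReLU layers are omitted for brevity, so this is a simplified sketch rather than the exact GoogLeNet block.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Sketch of an inception-style module: parallel 1x1, 3x3, and 5x5
    convolutions plus a pooling branch, with 1x1 bottleneck convolutions
    reducing channel dimensionality before the larger kernels."""
    def __init__(self, in_ch, c1, c3_red, c3, c5_red, c5, pool_proj):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, c1, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, c3_red, kernel_size=1),          # bottleneck
            nn.Conv2d(c3_red, c3, kernel_size=3, padding=1),
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, c5_red, kernel_size=1),          # bottleneck
            nn.Conv2d(c5_red, c5, kernel_size=5, padding=2),
        )
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, kernel_size=1),
        )

    def forward(self, x):
        # Concatenate the multi-scale branches along the channel dimension.
        return torch.cat(
            [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)],
            dim=1,
        )

m = InceptionModule(192, 64, 96, 128, 16, 32, 32)
print(m(torch.randn(1, 192, 28, 28)).shape)  # torch.Size([1, 256, 28, 28])
```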
- ResNet. As mentioned above, deeper networks can improve the performance of deep learning approaches on complex visual tasks, but they are also more prone to the notorious problem of vanishing/exploding gradients during training. This may lead not only to saturated accuracy, but also to degradation of training accuracy. ResNet [57], designed by He et al. in 2015, exploits residual blocks to overcome the vanishing gradient problem in very deep CNNs by introducing identity shortcut connections to successive convolution layers, as shown in Figure 2. The shortcut connections in residual blocks help gradients flow easily in the back-propagation step, which leads to gaining accuracy during the training phase in a very deep network. Referring to Figure 2, each unit calculates a residual function $F(x) := H(x) - x$, in which $x$ is the output of the previous residual unit and $H(x)$ denotes the desired underlying mapping. More precisely, if $x_{l+1}$ is the output of the $l$th residual unit with weights $W_l$, then $x_{l+1} = x_l + F(x_l, W_l)$. According to Figure 3, different variants of the residual unit were proposed, which consist of different combinations of convolutional layers, batch normalization (BN) [59], and the rectified linear unit (ReLU) [63] activation function [57,64]. In our experiment, we use the full pre-activation variant of the residual unit proposed by He et al. [57,64] to build the architectures that use ResNet as their feature encoder. ResNeXt [65], proposed by Xie et al. in 2017, is a highly modularized version of the ResNet architecture based on a split-transform-merge strategy similar to the inception module for image classification. Its innovative, simple design results in a homogeneous, multi-branch architecture that has only a few hyper-parameters to set. This approach exposes a new dimension called cardinality, the size of the set of transformations, as an essential factor in addition to other critical factors such as depth and width. The network is constructed by stacking repeated building blocks that aggregate a set of transformations with the same topology. Inspired by the residual network, several modifications, new designs, and architectures were proposed for different image understanding tasks [57,66,67,68]. For instance, Figure 4 illustrates an inception-ResNet block called the Inception-ResNet-A module of the Inception-ResNet-v2 network [61]. Other variants of inception-ResNet blocks, including the Inception-ResNet-B and Inception-ResNet-C modules, were also proposed by Szegedy et al. [61] in 2017.
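The residual mapping above can be illustrated with a short PyTorch sketch of a full pre-activation residual unit; it assumes equal input and output channel counts (no projection shortcut) and is only a minimal illustration of the relation x_{l+1} = x_l + F(x_l, W_l).

```python
import torch
import torch.nn as nn

class PreActResidualUnit(nn.Module):
    """Sketch of a full pre-activation residual unit (BN -> ReLU -> conv),
    implementing x_{l+1} = x_l + F(x_l, W_l) with an identity shortcut."""
    def __init__(self, channels: int):
        super().__init__()
        self.residual = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
        )

    def forward(self, x):
        # Identity shortcut: gradients can flow directly through the addition.
        return x + self.residual(x)

unit = PreActResidualUnit(64)
print(unit(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```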
- DenseNet. Inspired by ResNet and the idea that shorter connections between layers close to the input and those close to the output can help to train substantially deeper CNNs more accurately and efficiently, Huang et al. proposed DenseNet [58] in 2017. The architecture consists of densely connected CNN blocks in which the output feature maps of each layer are concatenated and passed to all subsequent layers within a dense block, as shown in Figure 5. If the $l$th layer receives the feature maps of all preceding layers, $x_0, x_1, \ldots, x_{l-1}$, as input, then $x_l = H_l([x_0, x_1, \ldots, x_{l-1}])$, where $[x_0, x_1, \ldots, x_{l-1}]$ denotes the concatenation of the feature maps produced in layers $0, \ldots, l-1$ and $H_l(\cdot)$ is a composite function of operations such as BN, ReLU, and convolution.
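A minimal PyTorch sketch of a dense block is given below to show the concatenation pattern x_l = H_l([x_0, ..., x_{l-1}]); it assumes a simple BN-ReLU-conv composite function H_l and uses illustrative channel sizes.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Sketch of a dense block: each layer receives the concatenation of all
    preceding feature maps, x_l = H_l([x_0, ..., x_{l-1}])."""
    def __init__(self, in_channels: int, growth_rate: int, num_layers: int):
        super().__init__()
        self.layers = nn.ModuleList()
        for l in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(in_channels + l * growth_rate),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels + l * growth_rate, growth_rate,
                          kernel_size=3, padding=1, bias=False),
            ))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # H_l applied to the concatenation of all previous outputs.
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)

block = DenseBlock(in_channels=48, growth_rate=12, num_layers=4)
print(block(torch.randn(1, 48, 64, 64)).shape)  # torch.Size([1, 96, 64, 64])
```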
- MobileNet. Since the advance of deep learning, the general trend has been to make deeper and more complicated networks to improve model performance [55,60,61]. However, these advances do not necessarily make networks more efficient with respect to size and speed. In many real-world applications, such as self-driving cars, robotics, and augmented reality, prediction and recognition tasks need to be carried out in near real time on a computationally limited platform. Inspired by depthwise separable convolutions [69] to reduce the computation in the first few layers, a class of efficient models called MobileNets [70,71] for mobile and embedded vision applications was introduced by Howard et al. in 2017. This class of models presents a streamlined architecture that uses depthwise separable convolutions to build lightweight deep neural networks. According to Figure 6, the depthwise separable convolution is a form of factorized convolution that splits a standard convolution into a depthwise convolution, which applies a single filter to each input channel, and a 1 × 1 convolution, called a pointwise convolution, which changes the dimensionality and linearly combines the output feature maps of the depthwise convolution. The depthwise separable convolution technique results in a drastic reduction in computational complexity and model size. Figure 7 illustrates two variants of MobileNet architectures. According to Figure 7, in MobileNetV1 [70], each block comprises two layers, a depthwise convolution and a pointwise convolution; M and N are the numbers of input and output channels, respectively, and D_F and D_K denote the spatial size of the feature maps and the filter size, respectively. BN and the ReLU activation function are applied after each convolutional layer. MobileNet introduces two hyper-parameters to the network: a width multiplier, α, to control the width of each convolutional layer, and a resolution multiplier, ρ, to control the input image resolution of the network. α = 1 and ρ = 1 correspond to the baseline MobileNet, while α < 1 and ρ < 1 define reduced-computation MobileNets. Computational cost and the number of parameters are reduced by roughly α². However, the accuracy drops off as α and ρ decrease. MobileNetV2 [71] is a significant improvement over MobileNetV1, with high potential of reaching state-of-the-art performance for mobile visual recognition tasks. It is also built upon the idea of depthwise separable convolutions, already applied in MobileNetV1, as efficient building blocks. In MobileNetV2, there are two types of blocks: a residual block with a stride of 1 and a second block with a stride of 2 for downsampling. Both blocks include three layers. The first layer of each block is a 1 × 1 convolution with a ReLU activation function, the second layer is a depthwise convolution, and the third layer is another 1 × 1 convolution without any activation function.
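The depthwise separable factorization can be sketched in a few lines of PyTorch; the block below is a simplified MobileNetV1-style unit with illustrative channel counts, not the exact published configuration.

```python
import torch
import torch.nn as nn

def depthwise_separable_conv(in_ch: int, out_ch: int, stride: int = 1) -> nn.Sequential:
    """Sketch of a MobileNetV1-style block: a depthwise 3x3 convolution
    (one filter per input channel, via groups=in_ch) followed by a 1x1
    pointwise convolution that mixes channels; BN and ReLU follow each."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride, padding=1,
                  groups=in_ch, bias=False),                  # depthwise
        nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),  # pointwise
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

block = depthwise_separable_conv(32, 64)
print(block(torch.randn(1, 32, 112, 112)).shape)  # torch.Size([1, 64, 112, 112])
```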
2.2. Decoding Approaches
2.3. Transfer Learning
2.4. Performance Metrics
3. Materials and Methods
3.1. Study Site
3.2. Data Collection and Preparation
3.3. Deep Learning Architectures
- Encoder–Decoder (SegNet). The SegNet architecture, displayed in Figure 9, is one of the earlier deep learning networks for semantic image segmentation examined in this study. It uses the VGG network as its encoder to hierarchically extract features from input images. The encoder network consists of 13 convolutional layers corresponding to the first 13 convolutional layers of the VGG-16 network. In our experiment, we use weights from a pre-trained VGG-16 network to initialize the training process. Each encoder layer has a corresponding decoder layer that upsamples the feature maps using the stored max-pooling indices.
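The index-based upsampling that distinguishes SegNet from other encoder–decoder designs can be illustrated with the following minimal PyTorch sketch; channel and feature map sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Sketch of SegNet-style decoding: the encoder stores max-pooling indices, and
# the decoder uses them to place values back at their original locations
# before convolving to densify the sparse upsampled maps.
pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)
decode_conv = nn.Conv2d(64, 64, kernel_size=3, padding=1)

features = torch.randn(1, 64, 64, 64)
pooled, indices = pool(features)      # encoder stage: keep the pooling indices
upsampled = unpool(pooled, indices)   # decoder stage: non-learned upsampling
dense = decode_conv(upsampled)        # convolution fills in the sparse map
print(pooled.shape, dense.shape)      # (1, 64, 32, 32) (1, 64, 64, 64)
```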
- U-Net. U-Net is a well-known deep architecture based on the encoder–decoder principle that, instead of using pooling indices, transfers and exploits the entire feature maps from the encoder in the decoder. The upsampling strategy can have a great impact on the final accuracy of pixel-wise image classification, so comparing the performance of the SegNet and U-Net architectures can tell us more about the effectiveness of these two strategies. Figure 10 illustrates the U-Net architecture with the ResNet-34 network used for feature extraction in this study.
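For comparison with the SegNet sketch above, the following minimal PyTorch sketch shows a single U-Net-style decoder stage that upsamples and then concatenates the full encoder feature map; layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class UNetDecoderStage(nn.Module):
    """Sketch of a U-Net decoder stage: upsample with a transposed convolution,
    concatenate the encoder feature map (skip connection), then refine."""
    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)
        x = torch.cat([x, skip], dim=1)  # entire encoder feature map, not indices
        return self.conv(x)

stage = UNetDecoderStage(in_ch=256, skip_ch=128, out_ch=128)
out = stage(torch.randn(1, 256, 32, 32), torch.randn(1, 128, 64, 64))
print(out.shape)  # torch.Size([1, 128, 64, 64])
```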
- FC-DenseNet. To explore the efficiency of the DenseNet architecture in feature learning for pixel-wise classification of coastal wetland images, the one hundred layers tiramisu model (FC-DenseNet), shown in Figure 11, is employed; the variant used here has 56 convolutional layers, with four layers per dense block and a growth rate of 12. Similar to the U-Net architecture, FC-DenseNet exploits a U-shaped encoder–decoder structure with skip connections between the downsampling and upsampling paths to add higher-resolution information to the final feature map. The characteristic feature reuse, compactness, and substantially reduced number of parameters of the FC-DenseNet architecture are evaluated in our experiment based on its performance when training the network from scratch on a limited dataset, which is the case here.
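A minimal sketch of the transition-down and transition-up modules that link dense blocks along the tiramisu's downsampling and upsampling paths is shown below; it omits dropout and uses illustrative channel counts, so it is a simplified view of the published design (the dense block itself is sketched in Section 2.1).

```python
import torch
import torch.nn as nn

# Sketch of FC-DenseNet (tiramisu) transitions between dense blocks.
def transition_down(channels: int) -> nn.Sequential:
    # 1x1 convolution followed by 2x2 max pooling on the downsampling path.
    return nn.Sequential(
        nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
        nn.Conv2d(channels, channels, kernel_size=1, bias=False),
        nn.MaxPool2d(kernel_size=2, stride=2),
    )

def transition_up(in_channels: int, out_channels: int) -> nn.ConvTranspose2d:
    # Transposed convolution doubling the spatial resolution before
    # concatenation with the corresponding skip connection.
    return nn.ConvTranspose2d(in_channels, out_channels, kernel_size=3,
                              stride=2, padding=1, output_padding=1)

x = torch.randn(1, 96, 64, 64)
down = transition_down(96)(x)
print(down.shape)                       # torch.Size([1, 96, 32, 32])
print(transition_up(96, 96)(down).shape)  # torch.Size([1, 96, 64, 64])
```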
- DeepLabV3+. The effectiveness of ASPP for encoding multi-scale contextual information in images acquired over a complex coastal wetland is investigated by examining the DeepLabV3+ architecture illustrated in Figure 12. This architecture performs several parallel atrous convolutions with different rates.
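The following minimal PyTorch sketch illustrates an ASPP head with parallel atrous convolutions and image-level pooling; the rates (6, 12, 18) follow common DeepLabV3+ settings, while channel sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Sketch of atrous spatial pyramid pooling: parallel atrous (dilated)
    convolutions with different rates plus image-level pooling, concatenated
    and projected with a 1x1 convolution."""
    def __init__(self, in_ch: int, out_ch: int, rates=(6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([nn.Conv2d(in_ch, out_ch, kernel_size=1)])
        for r in rates:
            self.branches.append(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r))
        self.image_pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(in_ch, out_ch, kernel_size=1))
        self.project = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, kernel_size=1)

    def forward(self, x):
        outs = [branch(x) for branch in self.branches]
        pooled = F.interpolate(self.image_pool(x), size=x.shape[2:],
                               mode="bilinear", align_corners=False)
        outs.append(pooled)
        return self.project(torch.cat(outs, dim=1))

aspp = ASPP(512, 256)
print(aspp(torch.randn(1, 512, 32, 32)).shape)  # torch.Size([1, 256, 32, 32])
```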
- PSPNet. As illustrated in Figure 13, PSPNet, which uses a pyramid pooling module for more reliable prediction, is also investigated in this study. Specifically, this module extracts global context information by aggregating regional context information at different scales.
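A minimal PyTorch sketch of a pyramid pooling module with the commonly used bin sizes (1, 2, 3, 6) is shown below; channel sizes are illustrative assumptions and the final convolution/classifier of PSPNet is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPoolingModule(nn.Module):
    """Sketch of the pyramid pooling module: average pooling at several grid
    sizes, 1x1 convolutions to reduce channels, bilinear upsampling back to
    the input size, and concatenation with the original feature map."""
    def __init__(self, in_ch: int, bins=(1, 2, 3, 6)):
        super().__init__()
        reduced = in_ch // len(bins)
        self.stages = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(b),
                          nn.Conv2d(in_ch, reduced, kernel_size=1, bias=False))
            for b in bins
        ])

    def forward(self, x):
        h, w = x.shape[2:]
        pyramids = [x] + [
            F.interpolate(stage(x), size=(h, w), mode="bilinear", align_corners=False)
            for stage in self.stages
        ]
        return torch.cat(pyramids, dim=1)  # global + regional context

ppm = PyramidPoolingModule(512)
print(ppm(torch.randn(1, 512, 32, 32)).shape)  # torch.Size([1, 1024, 32, 32])
```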
- MobileU-Net. Combining the idea of depthwise separable convolution from MobileNet with the feature map upsampling of the U-Net architecture, the MobileU-Net architecture, illustrated in Figure 14, is implemented in this study. The performance of this architecture in pixel-wise labeling of hyper-spatial UAS images gives an indication of the accuracy achievable in near real-time land cover mapping.
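Since no single reference implementation is prescribed here, the sketch below shows one plausible MobileU-Net-style decoder stage, combining bilinear upsampling and a U-Net skip connection with a depthwise separable refinement block; all names and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MobileUNetDecoderStage(nn.Module):
    """Sketch of a MobileU-Net-style decoder stage: U-Net-like upsampling and
    skip concatenation, refined by a depthwise separable convolution."""
    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        cat_ch = in_ch + skip_ch
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.refine = nn.Sequential(
            nn.Conv2d(cat_ch, cat_ch, kernel_size=3, padding=1,
                      groups=cat_ch, bias=False),             # depthwise
            nn.BatchNorm2d(cat_ch), nn.ReLU(inplace=True),
            nn.Conv2d(cat_ch, out_ch, kernel_size=1, bias=False),  # pointwise
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)
        return self.refine(torch.cat([x, skip], dim=1))

stage = MobileUNetDecoderStage(in_ch=256, skip_ch=128, out_ch=128)
out = stage(torch.randn(1, 256, 32, 32), torch.randn(1, 128, 64, 64))
print(out.shape)  # torch.Size([1, 128, 64, 64])
```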
4. Results
5. Discussion
6. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Boon, M.; Greenfield, R.; Tesfamichael, S. Wetland assessment using unmanned aerial vehicle (UAV) photogrammetry. In Proceedings of the XXIII ISPRS Congress, Prague, Czech Republic, 12–19 July 2016.
- Laliberte, A.S.; Rango, A.; Herrick, J. Unmanned aerial vehicles for rangeland mapping and monitoring: A comparison of two systems. In Proceedings of the ASPRS 2007 Annual Conference, Tampa, FL, USA, 7–11 May 2007.
- Pashaei, M.; Starek, M.J. Fully Convolutional Neural Network for Land Cover Mapping In A Coastal Wetland with Hyperspatial UAS Imagery. In Proceedings of the IGARSS 2019-2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 6106–6109.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Hariharan, B.; Arbeláez, P.; Girshick, R.; Malik, J. Simultaneous detection and segmentation. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 297–312.
- Stedman, S.M.; Dahl, T.E. Status and Trends of Wetlands in the Coastal Watersheds of The Eastern United States, 1998 to 2004. 2008. Available online: https://www.fws.gov/wetlands/Documents/Status-and-Trends-of-Wetlands-in-the-Coastal-Watersheds-of-the-Eastern-United-States-1998-to-2004.pdf (accessed on 13 March 2020).
- Pendleton, L.H. The Economic and Market Value of Coasts and Estuaries: What’s at Stake; Restore America’s Estuaries: Arlington, VA, USA, 2011.
- Olmsted, I.C.; Armentano, T.V. Vegetation of Shark Slough, Everglades National Park; South Florida Natural Resources Center, Everglades National Park Homestead: Homestead, FL, USA, 1997.
- Belluco, E.; Camuffo, M.; Ferrari, S.; Modenese, L.; Silvestri, S.; Marani, A.; Marani, M. Mapping salt-marsh vegetation by multispectral and hyperspectral remote sensing. Remote Sens. Environ. 2006, 105, 54–67.
- Smith, G.M.; Spencer, T.; Murray, A.L.; French, J.R. Assessing seasonal vegetation change in coastal wetlands with airborne remote sensing: An outline methodology. Mangroves Salt Marshes 1998, 2, 15–28.
- Cahoon, D.R.; Guntenspergen, G.R. Climate change, sea-level rise, and coastal wetlands. Natl. Wetl. Newsl. 2010, 32, 8–12.
- Silvestri, S.; Marani, M.; Marani, A. Hyperspectral remote sensing of salt marsh vegetation, morphology and soil topography. Phys. Chem. Earth Parts A/B/C 2003, 28, 15–25.
- Taramelli, A.; Valentini, E.; Cornacchia, L.; Monbaliu, J.; Sabbe, K. Indications of dynamic effects on scaling relationships between channel sinuosity and vegetation patch size across a salt marsh platform. J. Geophys. Res. Earth Surf. 2018, 123, 2714–2731.
- Myint, S.W.; Gober, P.; Brazel, A.; Grossman-Clarke, S.; Weng, Q. Per-pixel vs. object-based classification of urban land cover extraction using high spatial resolution imagery. Remote Sens. Environ. 2011, 115, 1145–1161.
- Hsieh, P.F.; Lee, L.C.; Chen, N.Y. Effect of spatial resolution on classification errors of pure and mixed pixels in remote sensing. IEEE Trans. Geosci. Remote Sens. 2001, 39, 2657–2663.
- Tso, B.C.; Mather, P.M. Classification of multisource remote sensing imagery using a genetic algorithm and Markov random fields. IEEE Trans. Geosci. Remote Sens. 1999, 37, 1255–1260.
- Blaschke, T. Object based image analysis for remote sensing. ISPRS J. Photogramm. Remote Sens. 2010, 65, 2–16.
- Dronova, I.; Gong, P.; Wang, L. Object-based analysis and change detection of major wetland cover types and their classification uncertainty during the low water period at Poyang Lake, China. Remote Sens. Environ. 2011, 115, 3220–3236.
- Small, C.; Milesi, C. Multi-scale standardized spectral mixture models. Remote Sens. Environ. 2013, 136, 442–454.
- Pande-Chhetri, R.; Abd-Elrahman, A.; Liu, T.; Morton, J.; Wilhelm, V.L. Object-based classification of wetland vegetation using very high-resolution unmanned air system imagery. Eur. J. Remote Sens. 2017, 50, 564–576.
- Li, M.; Zang, S.; Zhang, B.; Li, S.; Wu, C. A review of remote sensing image classification techniques: The role of spatio-contextual information. Eur. J. Remote Sens. 2014, 47, 389–411.
- Whiteside, T.G.; Boggs, G.S.; Maier, S.W. Comparing object-based and pixel-based classifications for mapping savannas. Int. J. Appl. Earth Obs. Geoinf. 2011, 13, 884–893.
- Gao, Y.; Mas, J.F. A comparison of the performance of pixel-based and object-based classifications over images with various spatial resolutions. Online J. Earth Sci. 2008, 2, 27–35.
- Rollet, R.; Benie, G.; Li, W.; Wang, S.; Boucher, J. Image classification algorithm based on the RBF neural network and K-means. Int. J. Remote Sens. 1998, 19, 3003–3009.
- Blanzieri, E.; Melgani, F. Nearest neighbor classification of remote sensing images with the maximal margin principle. IEEE Trans. Geosci. Remote Sens. 2008, 46, 1804–1811.
- Goncalves, M.; Netto, M.; Costa, J.; Zullo Junior, J. An unsupervised method of classifying remotely sensed images using Kohonen self-organizing maps and agglomerative hierarchical clustering methods. Int. J. Remote Sens. 2008, 29, 3171–3207.
- Civco, D.L. Artificial neural networks for land-cover classification and mapping. Int. J. Geogr. Inf. Sci. 1993, 7, 173–186.
- Vapnik, V. Statistical Learning Theory; Springer: Berlin/Heidelberg, Germany, 1998.
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
- Chen, Y.; Lin, Z.; Zhao, X.; Wang, G.; Gu, Y. Deep learning-based classification of hyperspectral data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 2094–2107.
- Zou, Q.; Ni, L.; Zhang, T.; Wang, Q. Deep learning based feature selection for remote sensing scene classification. IEEE Geosci. Remote Sens. Lett. 2015, 12, 2321–2325.
- Chen, Y.; Zhao, X.; Jia, X. Spectral–spatial classification of hyperspectral data based on deep belief network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 2381–2392.
- Cheng, G.; Zhou, P.; Han, J. Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 7405–7415.
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848.
- Lin, G.; Shen, C.; Van Den Hengel, A.; Reid, I. Efficient piecewise training of deep structured models for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3194–3203.
- Dai, J.; He, K.; Sun, J. Instance-aware semantic segmentation via multi-task network cascades. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3150–3158.
- Garcia-Garcia, A.; Orts-Escolano, S.; Oprea, S.; Villena-Martinez, V.; Garcia-Rodriguez, J. A review on deep learning techniques applied to semantic segmentation. arXiv 2017, arXiv:1704.06857.
- Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How transferable are features in deep neural networks? In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 3320–3328.
- Maggiori, E.; Tarabalka, Y.; Charpiat, G.; Alliez, P. Convolutional neural networks for large-scale remote-sensing image classification. IEEE Trans. Geosci. Remote Sens. 2016, 55, 645–657.
- Romero, A.; Gatta, C.; Camps-Valls, G. Unsupervised deep feature extraction for remote sensing image classification. IEEE Trans. Geosci. Remote Sens. 2015, 54, 1349–1362.
- Paoletti, M.; Haut, J.; Plaza, J.; Plaza, A. A new deep convolutional neural network for fast hyperspectral image classification. ISPRS J. Photogramm. Remote Sens. 2018, 145, 120–147.
- Palsson, F.; Sveinsson, J.R.; Ulfarsson, M.O. Multispectral and hyperspectral image fusion using a 3-D-convolutional neural network. IEEE Geosci. Remote Sens. Lett. 2017, 14, 639–643.
- Liu, T.; Abd-Elrahman, A.; Morton, J.; Wilhelm, V.L. Comparing fully convolutional networks, random forest, support vector machine, and patch-based deep convolutional neural networks for object-based wetland mapping using images from small unmanned aircraft system. GIScience Remote Sens. 2018, 55, 243–264.
- Liu, T.; Abd-Elrahman, A. Deep convolutional neural network training enrichment using multi-view object-based analysis of Unmanned Aerial systems imagery for wetlands classification. ISPRS J. Photogramm. Remote Sens. 2018, 139, 154–170.
- Pouliot, D.; Latifovic, R.; Pasher, J.; Duffe, J. Assessment of convolution neural networks for wetland mapping with landsat in the central Canadian boreal forest region. Remote Sens. 2019, 11, 772.
- Hu, Y.; Zhang, J.; Ma, Y.; An, J.; Ren, G.; Li, X. Hyperspectral coastal wetland classification based on a multiobject convolutional neural network model and decision fusion. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1110–1114.
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
- Romera-Paredes, B.; Torr, P.H.S. Recurrent instance segmentation. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 312–329.
- Zeiler, M.D.; Fergus, R. Visualizing and understanding convolutional networks. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 818–833.
- Zeiler, M.D.; Taylor, G.W.; Fergus, R. Adaptive deconvolutional networks for mid and high level feature learning. ICCV 2011, 1, 6.
- Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes (voc) challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
- Nair, V.; Hinton, G.E. Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21–24 June 2010; pp. 807–814.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems; NIPS, Inc.: Montreal, QC, Canada, 2012; pp. 1097–1105.
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv 2015, arXiv:1502.03167.
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826.
- Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A.A. Inception-v4, inception-resnet and the impact of residual connections on learning. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017.
- Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258.
- Glorot, X.; Bordes, A.; Bengio, Y. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 11–13 April 2011; pp. 315–323.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 630–645.
- Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1492–1500.
- Huang, G.; Sun, Y.; Liu, Z.; Sedra, D.; Weinberger, K.Q. Deep networks with stochastic depth. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 646–661.
- Veit, A.; Wilber, M.J.; Belongie, S. Residual networks. In Advances in Neural Information Processing Systems; NIPS, Inc.: Montreal, QC, Canada, 2016; pp. 550–558.
- Wu, Z.; Shen, C.; Van Den Hengel, A. Wider or deeper: Revisiting the resnet model for visual recognition. Pattern Recognit. 2019, 90, 119–133.
- Sifre, L.; Mallat, S. Rigid-Motion Scattering for Image Classification. Ph.D. Thesis, CMAP Ecole Polytechnique, Palaiseau, France, 2014.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
- Zeiler, M.D.; Krishnan, D.; Taylor, G.W.; Fergus, R. Deconvolutional networks. CVPR 2010, 10, 7.
- Noh, H.; Hong, S.; Han, B. Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1520–1528.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing And Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241.
- Burt, P.; Adelson, E. The Laplacian pyramid as a compact image code. IEEE Trans. Commun. 1983, 31, 532–540.
- Lin, G.; Milan, A.; Shen, C.; Reid, I. Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1925–1934.
- Jégou, S.; Drozdzal, M.; Vazquez, D.; Romero, A.; Bengio, Y. The one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 11–19.
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv 2014, arXiv:1412.7062.
- Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. arXiv 2015, arXiv:1511.07122.
- Holschneider, M.; Kronland-Martinet, R.; Morlet, J.; Tchamitchian, P. A real-time algorithm for signal analysis with the help of the wavelet transform. In Wavelets; Springer: Berlin/Heidelberg, Germany, 1990; pp. 286–297.
- Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587.
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.
- Pan, S.J.; Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 2009, 22, 1345–1359.
- Goutte, C.; Gaussier, E. A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. In Proceedings of the European Conference on Information Retrieval, Santiago de Compostela, Spain, 21–23 March 2005; pp. 345–359.
- Rahman, M.A.; Wang, Y. Optimizing intersection-over-union in deep neural networks for image segmentation. In Proceedings of the International Symposium on Visual Computing, Las Vegas, NV, USA, 12–14 December 2016; pp. 234–244.
- Paine, J.G.; White, W.A.; Smyth, R.C.; Andrews, J.R.; Gibeaut, J.C. Mapping coastal environments with lidar and EM on Mustang Island, Texas, US. Lead. Edge 2004, 23, 894–898.
- Nguyen, C.; Starek, M.; Tissot, P.; Gibeaut, J. Unsupervised clustering method for complexity reduction of terrestrial lidar data in marshes. Remote Sens. 2018, 10, 133.
- Nguyen, C.; Starek, M.J.; Tissot, P.; Gibeaut, J. Unsupervised Clustering of Multi-Perspective 3D Point Cloud Data in Marshes: A Case Study. Remote Sens. 2019, 11, 2715.
- Westoby, M.J.; Brasington, J.; Glasser, N.F.; Hambrey, M.J.; Reynolds, J.M. ‘Structure-from-Motion’ photogrammetry: A low-cost, effective tool for geoscience applications. Geomorphology 2012, 179, 300–314.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Duchi, J.; Hazan, E.; Singer, Y. Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 2011, 12, 2121–2159.
- Tieleman, T.; Hinton, G. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA Neural Netw. Mach. Learn. 2012, 4, 26–31.
| Algorithm | OA-Train | OA-Val | Prec | Recall | F1 | mIoU | Veg | TF | Water | Road |
|---|---|---|---|---|---|---|---|---|---|---|
| FC-DenseNet | | | | | | | | | | |
| U-Net | | | | | | | | | | |
| DeepLabV3+ | | | | | | | | | | |
| PSPNet | | | | | | | | | | |
| MobileU-Net | | | | | | | | | | |
| SegNet | | | | | | | | | | |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).