1.1. Background and Problem
Automated species mapping of forest trees using remote sensing data has long been a goal of remote sensing scientists and forest ecologists [1]. Conducting remote inventories of forest species composition with an imaging platform instead of field surveys would save time and money, and would support analysis of species composition over vast spatial extents [12]. Accurate assessments of tree species composition in forest environments would be an asset for forest ecologists, land managers, and commercial harvesters, and could be used to study biodiversity patterns, estimate timber stocks, or improve estimates of forest fire risk. Operational remote sensing and field sensors are improving, but new classification tools are needed to bridge the gap between data-rich remote sensing imagery and the demand for high-resolution information about forests. Our work sought to improve current automated tree species mapping techniques.
Tree species mapping from remote sensing imagery has proven a difficult challenge in the past [11], owing to the lack of (1) widely available high-resolution spatial and spectral imagery; (2) machine learning classifiers sophisticated enough to account for the lighting, shape, size, and pattern of trees as well as the spectral mixing within the canopies themselves; and (3) spatially precise ground data for training the classifiers. Efforts to overcome these challenges have taken different approaches to remote sensing data sources and classification techniques. High-resolution multispectral satellite remote sensing [16], hyperspectral airborne imagery [2], and even airborne Light Detection and Ranging (LiDAR, see Table A1) without spectral imagery [18] have been used to discriminate tree species. Many methods take a data fusion approach, combining LiDAR with multispectral [19] or hyperspectral imagery [21] to classify tree species. Discriminating individual tree crowns at both high spatial and high spectral resolution requires airborne imagery. The Hyperion Imaging Spectrometer was the first and only imaging spectrometer to collect science-grade data from space [24], and it has been used to map minerals [25], coral reefs [26], and invasive plant distributions from orbit [27]. Its 30 m spatial resolution and low signal-to-noise ratio make it unsuitable for mapping individual trees; however, spaceborne hyperspectral sensors have demonstrated their viability for mapping areas inaccessible to airborne platforms such as the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) [28].
The use of airborne hyperspectral imagers in forested environments was advanced by the Carnegie Airborne Observatory (CAO), which surveyed large swaths of carbon-rich forest in the Airborne Spectranomics Project, mapping canopy chemistry, functional plant traits, and individual tree species in diverse tropical forests [29]. The CAO combines hyperspectral imagery with LiDAR, allowing a three-dimensional, chemical characterization of the landscape based on spectral absorption features that is informative about the composition of plant communities. This sensor combination was adopted by the National Ecological Observatory Network (NEON) Airborne Observation Platform (AOP), which provides openly available data at 81 monitoring sites in 20 eco-climatic domains across the conterminous USA, Alaska, Hawaii, and Puerto Rico [33]. AOP imaging instruments include a small-footprint waveform LiDAR to measure three-dimensional (3D) canopy structure, a high-resolution hyperspectral imaging spectrometer, and a broadband visible/shortwave infrared imaging spectrometer. Data are collected at a spatial resolution (sub-meter to meter) sufficient to study individual organisms and observe stands of trees. In addition to manned aerial flights, unmanned aerial vehicles have recently been used to identify tree species using hyperspectral imagery and point cloud data [34].
Over the past decade, there have been significant advances in applying a variety of machine learning classifiers to hyperspectral imagery for tree species classification. Classifiers have included Random Forest (a decision tree ensemble method), Support Vector Machines, and artificial neural networks, and they have been applied to (sub)tropical wet and dry forests [4], temperate and boreal forests [8], plantations and agroforestry [10], and urban forests [41]. These machine learning classifiers achieved accuracies (averaged across species) ranging from 63% to 98% when applied to 4–40 tree species, using tens to occasionally hundreds of trees per species for training. Classification accuracies typically varied more widely among species in these studies (e.g., per-species accuracies from 44% to 100%) than among the machine learning and other classifiers when they were compared.
Convolutional Neural Networks (CNNs) are supervised machine learning classifiers that, in addition to characterizing spectral signatures, analyze the spatial context of each pixel. To our knowledge, CNNs have not previously been applied to tree species classification from airborne hyperspectral imagery. CNNs can perform concurrent analysis of spectra and shape using multiple deep layers of pattern abstraction, which are learned through numerical optimization over training data. In this study, we parameterized and tested a CNN classifier applied to high-resolution airborne hyperspectral imagery of a forested area for tree species identification with sparsely distributed training labels.
1.2. Convolutional Neural Networks
Convolutional Neural Networks (CNNs) achieve high performance on a variety of image classification and computer vision problems. CNNs use computational models composed of multiple convolutional layers to learn representations of data at multiple levels of abstraction [44]. These algorithms have dramatically improved the state of the art in image recognition, visual object recognition, and semantic segmentation by discovering intricate structure in large data sets. CNNs consist of many sets of convolution and pooling layers separated by non-linear activation functions (such as the rectified linear unit [ReLU]). These “deep learning” models are trained using the backpropagation algorithm and variants of stochastic gradient descent [45]. CNNs have been used for over two decades in applications that include handwritten character classification [46], document recognition [47], traffic sign recognition [48], sentence classification [49], and facial recognition [50]. Biological imaging applications of CNNs include identifying plant species based on photographs of leaves [51], interpreting wildlife camera trap imagery [52], and new crowdsourced applications such as the iNaturalist mobile application, which uses user photographs, geographic locations, and CNNs to identify species of plants and animals [53].
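As a concrete illustration of the building blocks described above, the following is a minimal NumPy sketch of a single convolution–ReLU–max-pooling stage. The image, kernel values, and array sizes are hypothetical and chosen only for illustration; a real CNN learns many such kernels via backpropagation rather than using hand-set values.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation: slide the kernel over the image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Non-linear activation: zero out negative responses."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Downsample by taking the maximum over non-overlapping windows."""
    h, w = x.shape
    h, w = h - h % size, w - w % size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

# Toy 6x6 single-band "image" and an illustrative 3x3 horizontal-gradient kernel
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[-1., 0., 1.],
                   [-1., 0., 1.],
                   [-1., 0., 1.]])

# One conv-ReLU-pool stage: 6x6 input -> 4x4 feature map -> 2x2 pooled map
feature_map = max_pool(relu(conv2d(image, kernel)))
print(feature_map.shape)  # (2, 2)
```

Stacking several such stages, each with many learned kernels, yields the hierarchy of increasingly abstract features that deep CNNs exploit.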
Application of CNN methods was long limited by computational resources and by the need to implement the neural network and backpropagation algorithms from scratch for each new classification problem [54]. Over the past decade, graphics processing units (GPUs) that support massive parallelization of matrix computations, GPU-based matrix algebra libraries, and high-level neural network libraries with automatic differentiation have made training and application of neural networks accessible outside of research labs, including in industry settings.
CNNs are conceptually well suited to spatial prediction problems. Geospatial images contain spatially structured information. In the case of trees, the spatial structure of canopies is related to tree size relative to pixel size; in high-resolution imagery, if a pixel falls within a tree canopy, its neighbors are also likely to lie in the same canopy and carry similar information [55]. Information in neighboring pixels is related to information in a focal pixel, and these relationships decay with distance. CNN classifiers operate on this same principle, finding patterns in groups of nearby pixels and relating them to ‘background’ information. For automated species mapping, individual trees are represented as clusters of similar pixels at fine spatial resolution and, at a coarser spatial scale, stands of trees are clusters of individuals; both scales might be informative in determining the species of each tree.
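This neighborhood principle is what a CNN's input windows exploit: the classifier sees a small patch of pixels around each focal pixel rather than the focal pixel alone. A minimal sketch of extracting such a patch, assuming a hypothetical (height, width, bands) hyperspectral array and an illustrative patch radius:

```python
import numpy as np

def extract_patch(cube, row, col, half=7):
    """Return the square neighborhood of side 2*half+1 around a focal pixel.

    `cube` is a (height, width, bands) hyperspectral array; the patch keeps
    all spectral bands, so a CNN sees spatial context and spectra together.
    Patches near the scene edge are clipped to the image bounds.
    """
    h, w, _ = cube.shape
    r0, r1 = max(0, row - half), min(h, row + half + 1)
    c0, c1 = max(0, col - half), min(w, col + half + 1)
    return cube[r0:r1, c0:c1, :]

# Hypothetical 100x100-pixel scene with 32 spectral bands
cube = np.random.rand(100, 100, 32)
patch = extract_patch(cube, row=50, col=50)
print(patch.shape)  # (15, 15, 32)
```

Each labeled tree thus contributes a patch whose spatial pattern (crown shape, neighboring crowns) and spectral content are analyzed jointly by the network.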
CNNs have been applied to moderate spatial and/or spectral resolution imagery to classify land use [56] and to reconstruct missing data in Moderate Resolution Imaging Spectroradiometer (MODIS) and Landsat Enhanced Thematic Mapper remotely sensed imagery [57]. Deep Recurrent Neural Networks have also been used to classify hyperspectral imagery with sequential data, such as spectral bands [58]. CNNs have been used to classify high spatial resolution imagery into land use and land cover categories [59], but not tree species.