Light detection and ranging (LiDAR) data provide high spatial resolution, detailed representations of bare earth landscapes and have been shown to be valuable for mapping features of geomorphic and archeological interest. For example, Jaboyedoff et al. [1] suggest that LiDAR is an essential tool for detecting, characterizing, monitoring, and modelling landslides and other forms of mass movement. Chase et al. [2] argue that LiDAR technologies have resulted in a paradigm shift in archeological research, as they allow for the mapping of ancient anthropogenic features and landscapes even under dense canopy cover. For example, LiDAR has recently improved our understanding of ancient Mesoamerican cultures by mapping ancient cities now obscured by dense forest cover, a mapping task that is too labor intensive for field-based survey methods alone [2]. Further, LiDAR data are becoming increasingly available for public download, especially in Europe and North America. For example, the United States has implemented the 3D Elevation Program (3DEP) (https://www.usgs.gov/core-science-systems/ngp/3dep) with a goal of providing LiDAR coverage for the entire country, excluding Alaska [3]. In this spirit, The Earth Archive project has argued for the need for 3D data of the entire Earth surface to create a historic record for future generations, and is currently soliciting donations to support this project [4].
Despite the increasing availability of high spatial resolution digital terrain data, and the wealth of information that can be derived from such data, the extraction of features from these data to support archeological, geomorphic, and landscape change research is in many cases dominated by manual interpretation, as previously noted by [5]. With the exception of some notable studies (e.g., [6]), generic and automated mapping of topographic features from digital elevation data has proved to be a particularly challenging task. However, deep learning (DL), and in particular mask region-based convolutional neural networks (Mask R-CNN), may make it possible to realize the potential of digital elevation data for automated mapping of topographic features.
Therefore, this study explores the use of Mask R-CNN for mapping valley fill faces (VFFs) resulting from mountaintop removal (MTR) surface coal mining in the Appalachian region of the eastern United States. MTR is a common mining method in this region which results in extensive modifications to the landscape, and therefore mapping VFFs is of significant interest for environmental modelers. From our findings, we comment on the application of this DL method for extracting anthropogenic and natural terrain features from LiDAR-derived data based on distinct topographic and spatial patterns. Since LiDAR data are not commonly available to represent historic terrain conditions due to only recent development of this technology for mapping large spatial extents, we also explore the transferability of the models to older, photogrammetrically-derived elevation data. This study therefore has two objectives:
1.1. LiDAR and Digital Terrain Mapping
LiDAR is an active remote sensing method that relies on laser range finding. A laser pulse is emitted by a sensor. When the emitted photons strike an object, a portion of the energy is reflected back to the sensor. Using the two-way travel time of reflected laser pulses detected by the sensor, global positioning system (GPS) locations, and aircraft orientation and motion from an inertial measurement unit (IMU), horizontal and elevation coordinates of the reflecting surface can be estimated at a high spatial resolution. Further, a single laser pulse can potentially result in multiple returns, allowing for vegetation canopy penetration and the mapping of subcanopy terrain, in contrast to other elevation mapping methods [10].
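The range-finding step described above can be illustrated with a minimal sketch. It assumes the pulse travels at the vacuum speed of light (a real system would correct for atmospheric effects and combine the range with GPS and IMU data to obtain coordinates), and the function name and return times are hypothetical values chosen for illustration only.

```python
# Assumption: vacuum speed of light; atmospheric corrections are ignored.
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def range_from_travel_time(two_way_time_s):
    """Return the one-way sensor-to-target distance (m) for a two-way travel time (s)."""
    # The pulse travels to the target and back, so the distance is halved.
    return SPEED_OF_LIGHT_M_PER_S * two_way_time_s / 2.0


# Hypothetical return times (s) for a single pulse that produced three returns,
# for example from upper canopy, lower canopy, and the ground.
return_times_s = [6.60e-6, 6.65e-6, 6.70e-6]
ranges_m = [range_from_travel_time(t) for t in return_times_s]

# The farthest return is the best candidate for the subcanopy terrain surface.
last_return_range_m = max(ranges_m)
```

In this sketch the last return is simply taken as the farthest one; operational ground classification is considerably more involved.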
LiDAR data have been applied to a variety of terrain mapping and analysis tasks. For example, many studies have investigated the mapping of slope failures, such as landslides, using terrain variables derived from LiDAR [1]. Another common application is modeling the likelihood of slope failure occurrence or landslide risk [14]. In a 2012 review of the use of LiDAR in landslide investigations, Jaboyedoff et al. [1] suggest that LiDAR is an essential tool for landslide risk management and that there is a need to develop methods to extract useful information from such data. Older, photogrammetrically-derived elevation data have also been used for terrain mapping and analysis tasks and offer a means to characterize historic terrain conditions. For surface mine mapping specifically, Maxwell and Warner [18] found that historic, photogrammetric elevation data were of great value for differentiating grasslands resulting from mine reclamation from other grasslands, while DeWitt et al. [19] provided a comparison of different digital elevation data sources for mapping terrain change resulting from surface mining.
Object-based image analysis (OBIA) has been applied to LiDAR data for the mapping of landslides [20] and geomorphic landforms in general [9]. OBIA incorporates segmentation of raster-based data into regions or polygons, based on measures of similarity or homogeneity. These polygons are the spatial unit of analysis and classification [21]. Part of the interest in OBIA for geomorphic mapping is the ability to incorporate spatial context information into the mapping process, facilitated by the data segmentation. Nevertheless, choosing the scale of the segmentation is a major hurdle in OBIA, and indeed some research indicates it is necessary to choose multiple scales [22]. In contrast, spatial context information can be included in DL by using convolutional neural networks (CNNs) in a manner that does not require a priori specification of the scale. Thus, applying CNN-based DL to digital terrain data holds great promise.
1.2. Deep Learning
DL algorithms are derived from, and offer an extension to, artificial neural networks (ANNs). Traditional ANNs generally have a small number of hidden layers, whereas DL algorithms have many hidden layers. In contrast to traditional machine learning methods, which are shallow learners, it has been suggested that DL is able to provide a higher level of data abstraction, potentially resulting in improved predictive power, generalization, and transferability [24]. Although the additional layers result in a model that is much more complex and has many more parameters, they allow for multiple levels of data abstraction to learn complex patterns. Like other supervised machine learning methods, DL requires example training data with associated labels in order to build the model. A measure of error or performance, generally termed loss, is used to guide the algorithm to improve predictions as it iterates through the training data multiple times [24].
CNNs extend the deep ANN architecture to incorporate context information into the prediction. CNNs include convolutional layers that learn filters that transform input image values, similar to moving window kernels traditionally used in remote sensing for image edge detection and smoothing. However, in the case of CNNs, the algorithm produces optimal filters to aid in predicting the labels associated with the training images. The addition of this context information has offered substantial advancements in computer vision and scene labeling problems [24]. In remote sensing applications, CNNs allow for the analysis of spatial context information when applied to high spatial resolution data (for example, [28]), spectral patterns when applied to hyperspectral data (for example, [31]), and temporal patterns when applied to time series products (for example, [32]). Thus, DL with convolution allows for the integration of contextual information in the spatial, spectral, and temporal domains.
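The analogy between convolutional layers and moving window kernels can be made concrete with a small sketch. The kernels below are the conventional hand-specified mean (smoothing) and Sobel (edge detection) filters; in a CNN the kernel weights would instead be learned from the training data. The function name and the tiny test image are assumptions made for illustration.

```python
def apply_kernel(image, kernel):
    """Slide a kernel over a 2D list of numbers and return the filtered grid.

    No padding is used, so the output is smaller than the input.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        out_line = []
        for c in range(out_cols):
            total = 0.0
            # Weighted sum of the window whose upper-left corner is (r, c).
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            out_line.append(total)
        output.append(out_line)
    return output


# Hand-specified kernels; a CNN would learn these weights.
MEAN_3X3 = [[1.0 / 9.0] * 3 for _ in range(3)]       # smoothing
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]       # vertical edge detection

# Hypothetical 3 x 4 image containing a vertical step edge.
step_image = [[0, 0, 10, 10] for _ in range(3)]
edge_response = apply_kernel(step_image, SOBEL_X)
smoothed = apply_kernel(step_image, MEAN_3X3)
```

As in most DL libraries, the kernel is applied without flipping (strictly a cross-correlation), which makes no practical difference when the weights are learned.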
Traditional CNNs have primarily been used for scene labeling problems, for example, entire images or image chips categorized by different land cover type. Traditional CNNs do not allow for pixel-level or semantic labeling. However, the introduction of fully convolutional neural networks (FCNs) alleviated this limitation by combining convolution and deconvolution layers with up-sampling, which allows for the final feature map to be produced at the original image resolution with a prediction at each cell location [27], similar to traditional remote sensing classification products. Example FCN architectures include SegNet [34] and UNet [35].
In this study, we use instance segmentation methods, in which the goal is to distinguish each individual instance of a feature in the scene separately. For example, each tree in a scene can be identified as a separate instance of the tree class. We specifically implement the Mask R-CNN method. This method is an extension of Faster R-CNN, which allows for convolution to be applied on regions of the image as opposed to the entire scene. This involves generating convolution feature maps that are then applied to individual subsets of the image, called regions of interest (RoI), defined by the region proposal network (RPN). The process of RoI pooling allows convolution features to be applied to regions of the image of different sizes and rectangular shapes [38]. Mask R-CNN extends this framework to allow for polygon masks to be generated within each RoI, essentially performing semantic segmentation within each RoI using FCNs. This requires better alignment between the RoI pooling layers and the RoIs than is provided by Faster R-CNN. Therefore, a RoIAlign layer is applied to improve the spatial alignment [39]. Since there are multiple components of the model, multiple loss measures are used to assess performance. Specifically, the total loss is the sum of the loss for the bounding box, classification, and mask predictions [38]. For a full discussion of Mask R-CNN, please consult He et al. [39], who introduced this method.
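The total loss described above can be written compactly as a sum of its three components. The subscripts are notation assumed here for illustration (classification, bounding box, and mask), not symbols taken from this text, and specific implementations may add further terms, such as losses for the RPN.

```latex
L_{\text{total}} = L_{\text{cls}} + L_{\text{box}} + L_{\text{mask}}
```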
DL methods have shown promise in remote sensing mapping and data processing tasks including scene labeling, pixel-level classification, object detection, data fusion, and image registration [24]. For example, Microsoft has recently used DL to map 125 million building footprints across the entire United States [40]. Kussul et al. [26] explored DL for differentiating crops using a time series of Landsat-8 multispectral and Sentinel-1 synthetic-aperture radar (SAR) data and documented improved overall and class-specific classification performance in comparison to shallow learners, such as random forests (RF). Li et al. [41] used DL and QuickBird satellite imagery to map individual oil palm trees with precision and recall rates greater than 94%.
It should be noted that there are some complexities in implementing these methods and applying them to remotely sensed data, such as the need for a large number of training samples, the difficulty of model optimization and parameterization, and large computational demands [24]. Also, the processes of training models and predicting to new data can differ from those used in traditional image classification and machine learning; for example, convolution requires training on and predicting to small rectangular image extents, or image chips, as opposed to individual pixels or image objects. Thus, researchers and analysts must adapt workflows and learn new techniques for implementing DL algorithms [24].
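To illustrate what training on and predicting to image chips involves, the following minimal sketch tiles a raster into fixed-size chips. The function names, chip size, and stride are hypothetical choices for illustration; the sketch also assumes that partial chips at the right and bottom edges are simply dropped, whereas a real workflow might pad or overlap them.

```python
def chip_origins(height, width, chip_size, stride):
    """Return the (row, col) upper-left corners of all full chips in a raster."""
    origins = []
    for row in range(0, height - chip_size + 1, stride):
        for col in range(0, width - chip_size + 1, stride):
            origins.append((row, col))
    return origins


def extract_chip(raster, row, col, chip_size):
    """Return a chip_size x chip_size subset of a 2D list, starting at (row, col)."""
    return [line[col:col + chip_size] for line in raster[row:row + chip_size]]


# Hypothetical 4 x 6 raster whose cell values encode their position.
raster = [[r * 10 + c for c in range(6)] for r in range(4)]

# Non-overlapping 2 x 2 chips (stride equal to the chip size).
chips = [extract_chip(raster, r, c, 2) for r, c in chip_origins(4, 6, 2, 2)]
```

Setting the stride smaller than the chip size would produce overlapping chips, which is one common way to reduce edge effects when predictions are merged back into a continuous map.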
A review of the literature suggests that the application of DL to LiDAR and digital terrain data is still limited. There has been some research relating to using DL for extracting ground returns from LiDAR point clouds for digital terrain model (DTM) generation (for example, [33]). Specifically, Hu and Yuan [44] suggest that DL-based algorithms can outperform the current methods that are most commonly used for ground return classification. Others have investigated the classification of features in 3D space represented as point clouds [45].
There is a need to investigate mapping anthropogenic and natural terrain features from digital terrain data using DL, as the research on this topic is currently lacking; however, there have been some notable studies. Trier et al. [6] investigated the identification of prehistoric structures from LiDAR-derived raster data. From the LiDAR data, a DTM was interpolated and a measure of local relief was then derived, which was provided as input to the ResNet18 CNN algorithm as an RGB image. They reported mixed results, with some areas predicted well and other areas suffering from many false positives. Behrens et al. [48] explored digital soil mapping using DTM raster data and DL and obtained a more accurate output than that produced by RF.
Interestingly, a number of studies attempt to map features that are at least partially characterized by geomorphic and terrain characteristics using spectral data only, without using terrain data. For example, Li et al. [49] mapped craters from image data using Faster R-CNN and obtained a mean average precision (mAP) higher than 0.90. As an example of a study that combined spectral and terrain data, Ghorbanzadeh et al. [50] used RapidEye satellite data and measures of plan curvature, topographic slope, and topographic aspect to detect landslides. They noted comparable performance between CNNs and traditional shallow classifiers: ANN, SVM, and RF.
Mask R-CNN has seen limited application in remote sensing at the time of this writing. Zhang et al. [51] assessed the method for mapping Arctic ice-wedge polygons from high spatial resolution aerial imagery and documented that 95% of individual ice-wedge polygons were correctly delineated and classified, with an overall accuracy of 79%. Zhao et al. [37] found that Mask R-CNN outperformed UNet for pomegranate tree canopy segmentation. Stewart et al. [52] used the method to detect lesions on maize plants from northern leaf blight using unmanned aerial vehicle (UAV) data. Given the small number of studies that have applied this algorithm to remotely sensed data, there is a need for further exploration of this algorithm within the discipline. We found a lack of research associated with mapping terrain features from digital terrain data using DL methods, and no published studies that apply this algorithm to raster-based digital terrain data for mapping geomorphic features. We attribute this to the only recent advancement of DL for semantic and instance segmentation and to the lack of available data to train DL models.
1.3. Mountaintop Removal Coal Mining and Valley Fills
In this study we apply Mask R-CNN to detect instances of VFFs from digital terrain data derived from LiDAR. Valley fills are a product of MTR coal mining, which has been practiced in southern West Virginia, eastern Kentucky, and southwestern Virginia in the Appalachian region of the eastern United States for several decades. This surface mining process involves using heavy machinery to extract thin interbedded coal seams. Valley fills are generated from the redistribution of overburden rock material. Since the coal seams are interbedded with other rock types of limited commercial value, a large volume of displaced material is produced. Due to the original steepness of the slopes, it is not possible to reclaim the landscape to the approximate original contour. Therefore, excess overburden material is placed in adjacent valleys, raising the valley elevation and changing the landscape.
The excavation and subsequent reclamation associated with valley fills result in substantial alterations to land cover, soil, and the topography and contour of the landscape [53]. Forests are lost and fragmented [63]; mountaintop elevations are lowered by tens to hundreds of meters [53]; soils are compacted [58]; and human quality of life and health are affected by exposure to chemicals, dust, and particulates [62]. Because valley fills bury headwater streams [54], and the fill material is hydrologically dissimilar to undisturbed land, hydrology is particularly affected. Valley fills tend to increase stream conductivity and alter hydrologic regimes downstream [53]. Wood and Williams [61] documented a decrease in salamander abundance in headwater streams impacted by valley fills in comparison to reference streams. In summary, valley fills profoundly alter the landscape, resulting in a variety of complex effects on the physical environment and its inhabitants, making it of vital importance that these features be monitored and mapped over time to facilitate environmental modeling.
provides examples of valley fills within the study area. Note that these features are generally characterized by steep slopes, a terraced pattern to encourage stability, placement in headwater stream valleys adjacent to mines and reclaimed mines, and drainage ditches to transport water away from the mine site. In short, they have a unique topographic signature and are readily observable in digital terrain data representations, such as hillshades and slopeshades. Due to this unique signature and their potential environmental impacts, we argue that this is a valuable case study in which to assess the use of Mask R-CNN for detecting and mapping topographic features. Here, we are specifically attempting to map the valley fill faces (VFF; i.e., the graded slope that faces the downstream valley not yet filled). Since the true extent of the filled area and excavated areas are not readily observable and grade into one another, the upper extent of each fill is hard to distinguish. Therefore, we focus exclusively on the VFF.
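Because steep, terraced slopes are central to the topographic signature described above, it may help to see how a slope surface can be derived from gridded elevation data. The sketch below uses simple central differences on interior cells only; it is an illustrative assumption and not necessarily the method used to produce the terrain representations in this study. The function name and the small elevation grid are hypothetical.

```python
import math


def slope_degrees(dem, cell_size):
    """Return slope (degrees) for the interior cells of a 2D elevation grid.

    Uses central differences, so the outermost rows and columns are omitted.
    """
    rows, cols = len(dem), len(dem[0])
    slopes = []
    for r in range(1, rows - 1):
        slope_line = []
        for c in range(1, cols - 1):
            # Elevation change per unit distance in the x and y directions.
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2.0 * cell_size)
            dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2.0 * cell_size)
            gradient = math.hypot(dz_dx, dz_dy)
            slope_line.append(math.degrees(math.atan(gradient)))
        slopes.append(slope_line)
    return slopes


# Hypothetical 3 x 3 grid rising 1 m per 1 m cell toward the east.
dem = [[0.0, 1.0, 2.0] for _ in range(3)]
slopes = slope_degrees(dem, cell_size=1.0)
```

GIS software commonly uses a 3 x 3 weighted neighborhood rather than this two-point difference, so values from such tools may differ slightly on rough terrain.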