Detection of Tailings Dams Using High-Resolution Satellite Imagery and a Single Shot Multibox Detector in the Jing–Jin–Ji Region, China

The timely and accurate mapping and monitoring of mine tailings dams is crucial for decision makers to improve management practices and to prevent disasters caused by dam failures. Because of complex topography, varying geomorphological characteristics, the diversity of ore types and mining activities, and the range of operational scales and production processes involved, tailings dams vary in scale, color, shape, and surrounding background in remote sensing imagery. The use of high-resolution satellite imagery for automatic detection of tailings dams at large spatial scales has rarely been reported. In this study, a deep-learning-based target detection method was developed to automatically identify the locations of tailings ponds and map their geographical distribution from high-resolution satellite imagery. Training samples were produced based on the characteristics of tailings ponds in satellite images, and the Single Shot Multibox Detector (SSD) model was fine-tuned during training according to these sample characteristics. The results showed that a detection accuracy of 90.2% and a recall rate of 88.7% could be obtained. Using the optimized SSD model, 2221 tailings ponds were extracted from Gaofen-1 high-resolution imagery of the Jing–Jin–Ji region in northern China. In this region, the majority of tailings ponds are located at high altitudes in remote mountainous areas. At the city level, the tailings ponds are located mainly in Chengde, Tangshan, and Zhangjiakou. The results demonstrate that the deep learning method is very effective at detecting complex land-cover features in remote sensing images.


Introduction
Mining activities and infrastructure expansion cause great changes to the landscape and have social, economic, and environmental impacts on the surrounding areas [1,2]. Mine tailings are mixtures of crushed rock and processing effluents generated during the extraction of metals, minerals, or coal from mines [3]. To prevent the formation of acid mine drainage, tailings are typically stored under water in ponds (tailings ponds) or impounded behind dams (tailings dams). Tailings storage facilities and waste rock dumps account for a large proportion of the area of mining sites [4]. Flow failures of mine waste dumps and tailings dams between 1928 and 2000 caused deaths and damage to the environment and infrastructure [5]. Failures of mine tailings dams and the resulting mudflows have caused the loss of almost 2000 lives over the past century [1]. These dam failures … Du et al. [29] developed a new InSAR time-series approach to derive ground displacement maps for dam safety monitoring. Tailings facilities are often built in areas with complex topography and geomorphology, and the ore type, mining activities, beneficiation process, scale of operations, and production processes also vary considerably between mining sites. As a result, tailings ponds differ in scale, shape, color, and background in remote sensing imagery. Despite the wide application of multi-source remote sensing data, automatic and accurate extraction of tailings ponds from remote sensing data at large spatial scales remains difficult.
Target detection not only recognizes target categories but also localizes each target within a bounding box. Based on the semantic and spatial information contained in an image, traditional target detection methods use several steps, namely proposal generation, feature vector extraction, and region classification, to assign category labels to regions of interest. Deep learning has emerged as a leading machine learning technique in image recognition and computer vision. In contrast to traditional methods, deep convolutional neural networks can generate hierarchical feature representations ranging from the pixel level to high-level semantic information, and these representations improve further when a large dataset is available. Target detection algorithms based on deep learning therefore have superior feature representation capability and can be optimized end to end. Previous studies have shown that target detection methods using deep networks outperform traditional methods and have thus driven great progress in the field [30]. In particular, convolutional neural networks (CNNs) perform outstandingly well at object detection and have been applied in areas such as automation, robotics, and agriculture. In the ImageNet Large-Scale Visual Recognition Challenge, a deep convolutional model trained on the ImageNet dataset proved to be a significant improvement on existing approaches [31].
Target detection techniques based on deep networks can be divided into two families: two-stage detectors and one-stage detectors. Two-stage detectors first generate a set of region proposals and then use a region classifier to predict the category of each proposed region. R-CNN [32] and its descendants Fast R-CNN [33] and Faster R-CNN [34] are two-stage object detectors that significantly improved both detection accuracy and computing speed. One-stage detectors treat all positions in the image as potential objects and directly predict the object category at each location of the feature maps. One-stage detectors such as OverFeat [35] and You Only Look Once (YOLO) [36] have achieved accuracy similar to, or even higher than, that of two-stage detectors while reducing computational complexity. By combining predictions from multiple feature maps with different resolutions, the Single Shot Multibox Detector (SSD) [37] has proved suitable for detecting objects of various scales.
The open data policies of Earth observation programs such as NASA's Landsat, ESA's Copernicus, and China's high-resolution Gaofen (GF) satellites have made remote sensing data with high spectral and spatial resolutions and frequent revisit times widely available [38][39][40]. Among them, the historical records and daily updated wide-field-of-view data of the GF-1 and GF-6 satellites are shared freely through the China National Space Administration's GEO platform to support global sustainable development, disaster prevention and mitigation, and climate change adaptation. The recent development of high-performance computing resources such as GPU clusters and cloud platforms has further facilitated the application of deep learning networks to land-cover classification [41,42], semantic segmentation [43], and information extraction from remote sensing images [44,45]. Using deep neural networks, Balaniuk et al. performed country-wide identification and classification of mines and tailings dams in Brazil from freely available Sentinel-2 satellite imagery [46]. This work demonstrated the feasibility of large-scale mapping and environmental analysis of mining areas using open remote sensing data and deep learning techniques. Satellite imagery with a spatial resolution finer than 5 m provides abundant spatial, structural, and semantic information on natural and manmade objects. To our knowledge, the use of high-resolution satellite imagery for automatic detection of tailings dams at large spatial scales has rarely been reported. Taking the Jing-Jin-Ji region in northern China as the study area, we aimed to explore the use of deep learning techniques for detecting mine tailings ponds from high-resolution satellite imagery at the regional scale.
Remote Sens. 2020, 12, 2626 4 of 18
Due to the large variation in the spectral, textural, and geometric characteristics of tailings dams, a deep detection architecture based on the Single Shot Multibox Detector was developed and fine-tuned. Experiments were conducted on the detection of tailings ponds from Gaofen-1 satellite data of the Jing-Jin-Ji region. Finally, the spatial distribution and characteristics of tailings dams were analyzed in detail.

Study Area
The Beijing-Tianjin-Hebei or Jing-Jin-Ji region is located in the northeastern part of mainland China (Figure 1). It includes Beijing municipality, Tianjin municipality, and Hebei province. The study area extends from 113°4′ E to 119°53′ E and from 36°1′ N to 42°37′ N, covering approximately 216,000 km². The Jing-Jin-Ji region is one of the most heavily urbanized and industrialized regions in China [47,48]. With a population of more than 100 million, the economy of the study area contributed 8.5% of China's GDP in 2019. Beijing is the capital of China as well as its political and cultural center. As the largest coastal city in northern China, Tianjin plays a dominant role in the nation's heavy engineering and manufacturing. Since the coordinated development of Beijing, Tianjin, and Hebei was put forward as a major national strategy in 2014, the Jing-Jin-Ji region has gradually become the most important urban agglomeration in northern China.
The terrain of Jing-Jin-Ji is high in the northwest and descends toward the southeast, with mountains, hills, and plateaus in the northwest and vast plains in the center and southeast. Hebei is the only province in China that contains plateaus, mountains, hills, plains, lakes, and coastline. The region is rich in mineral resources, with more than 100 types of ore deposits; 78 of these types have proven reserves, and 45 rank among the top 10 in China. Iron ore, limestone, coal, and oil are the most abundant mineral types. According to the potential losses in the event of a dam failure, mine tailings ponds are classified into four danger categories: extreme, very high, high, and low. Of the 12,655 tailings dams in China in 2013, 613 were classed as extremely dangerous, 1265 as very highly dangerous, 3032 as highly dangerous, and 7745 as low danger [49].
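As a quick consistency check, the four category counts reported from [49] sum exactly to the national total of 12,655, and roughly 39% of the dams fell into one of the three higher-danger categories. The short Python snippet below only restates that arithmetic; it adds no data beyond the figures quoted above.

```python
# Tailings dams in China in 2013 by danger category, as reported above [49].
counts = {"extreme": 613, "very high": 1265, "high": 3032, "low": 7745}

total = sum(counts.values())

# Percentage share of each category, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

# Dams in the three higher-danger categories combined.
higher_risk = counts["extreme"] + counts["very high"] + counts["high"]
higher_risk_pct = round(100 * higher_risk / total, 1)

print(total)
print(shares)
print(higher_risk, higher_risk_pct)
```

The script reports a total of 12,655, shares of about 4.8%, 10.0%, 24.0%, and 61.2%, and 4910 higher-danger dams (38.8%).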
Hebei province, where mining is widespread, has the largest number of tailings dams in China. Tailings facilities pose serious environmental and safety risks to life, property, and the environment in the surrounding areas. Since 2006, nine accidents involving tailings ponds in Hebei have been recorded by the Ministry of Environmental Protection. As of May 31, 2012, accidents had occurred at a total of 137 tailings dams in the province. In many cases, these accidents threatened downstream rivers and the safety of drinking water sources. There is therefore an urgent need for accurate and timely monitoring of the tailings dams in the study area.

Satellite Data
The GF-1 satellite, launched on April 26, 2013, is the first satellite of China's high-resolution Earth observation program. Its payload consists of two cameras providing 2-m panchromatic and 8-m multispectral imagery and four 16-m resolution wide-field-of-view cameras (Table 1). The high-resolution GF-1 images used in this study were acquired by the panchromatic (PAN) and multispectral cameras in 2017. The data were downloaded from the Land Observation Satellite Data Service Platform of the China Center for Resources Satellite Data and Application. To reduce atmospheric effects, we selected good-quality images covering the study area with cloud cover of less than 10%. The original GF-1 images were first processed using Infoterra France's Pixel Factory geoprocessing software, a commercial package that provides tools for producing accurate, high-quality Earth observation products from satellite, UAV, and aerial imagery.

Methodology
The proposed detection workflow consisted of five main steps (Figure 2): (1) data preprocessing; (2) identifying the characteristics of tailings ponds in satellite imagery; (3) preparing training and testing samples for the SSD network; (4) training the SSD network and optimizing its parameters; and (5) assessing the accuracy of the SSD network and detecting tailings ponds across the Jing-Jin-Ji region.

Data Preprocessing
Data preprocessing included image selection, radiometric calibration, orthorectification, image fusion, image mosaicking, and image slicing. L1A-level GF-1 scenes of good quality were selected by visually checking image clarity and noise. Radiometric calibration was then applied to the digital numbers of the selected raw data. To correct geometric distortion and improve the geometric accuracy of the original images, the calibrated images were orthorectified, mainly using the rational polynomial coefficient (RPC) files and digital elevation model (DEM) data. An adaptive segmented linear-stretching method was applied to the panchromatic and multispectral images to improve their contrast and clarity. The panchromatic and multispectral data were then fused (pansharpened) to produce 2-m resolution multispectral images. During mosaicking, color balancing was performed to improve color and tone consistency among image scenes. The mosaic of pansharpened GF-1 imagery is shown in Figure 1. Each image scene was divided into 1500 × 1500-pixel slices for model training and prediction, using a 1500 × 1500-pixel sliding window moved from left to right and from top to bottom. A total of 50,257 image slices were generated for the Jing-Jin-Ji region.
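The slicing step can be sketched as a window generator. This is a minimal illustration, not the authors' implementation: it assumes non-overlapping windows (stride equal to the 1500-pixel slice size, since no overlap is mentioned) and assumes that edge slices are clipped to the image bounds rather than padded or discarded. The function name `tile_windows` is hypothetical.

```python
from typing import Iterator, Tuple

TILE = 1500  # slice size in pixels, as stated in the text


def tile_windows(width: int, height: int, tile: int = TILE) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (col_off, row_off, w, h) windows, scanning left to right, then top to bottom.

    Assumption: windows do not overlap (stride == tile), and the last column/row
    of windows is clipped to the image bounds, so edge slices can be smaller.
    """
    for row_off in range(0, height, tile):
        for col_off in range(0, width, tile):
            w = min(tile, width - col_off)
            h = min(tile, height - row_off)
            yield col_off, row_off, w, h


# A hypothetical 4000 x 3200-pixel scene yields a 3 x 3 grid of windows.
windows = list(tile_windows(4000, 3200))
print(len(windows), windows[0], windows[-1])
```

For the hypothetical scene this gives 9 windows, starting with (0, 0, 1500, 1500) and ending with the clipped corner window (3000, 3000, 1000, 200). At 2 m per pixel, each full slice covers 3 km × 3 km, so ponds approaching 3000 m in length can straddle slice boundaries; an overlapping stride would reduce such splits at the cost of duplicate detections that must be merged afterwards. The (col_off, row_off, width, height) tuples match the argument order of a windowed raster reader such as rasterio's `Window`.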
Figure 2. Workflow for the proposed tailings dam detection using the SSD.

Characteristics of Tailings Dams in Satellite Imagery
The layout of a surface tailings storage facility depends on natural geomorphic features as well as engineered features [50]. Owing to factors such as topography, landforms, the minerals mined, the mining technology used, and the scale of operations, tailings ponds can have different layouts. Based on their shapes and geomorphological setting, most above-ground tailings ponds can be classified into four categories: cross-valley, hillside, stockpile, and cross-river (Figure 3).
A cross-valley tailings pond is formed by damming the mouth of a valley. Ponds of this type normally consist of three parts: the dam, the tailings, and the wastewater. The dam is located on one side, at the valley mouth, and the other sides are enclosed by the valley. At the initial stage, the dam is relatively short, while the reservoir area is relatively long and large. The density of tailings in the water gradually decreases with distance from the dam; that is, the closer to the wastewater area, the sparser the tailings and the more the water body resembles ordinary water in remote sensing imagery. Of the four types, cross-valley tailings ponds are the most common and the easiest to identify; they are also the most widespread type in China.
Hillside tailings ponds are formed by building a dam with three or more sides against a hill, with the hillside serving as the remaining side. Dams of this kind are relatively long, and the reservoir area is relatively narrow. Most of the small tailings dams in China's mountainous and hilly areas belong to this category. The structure of hillside tailings ponds is very similar to that of the cross-valley type, except that hillside dams usually have three sides, which can be clearly seen in remote sensing imagery.
A stockpile tailings pond is formed by building a dam on flat ground. This kind of tailings pond requires a large amount of work during initial construction and later maintenance. A stockpile tailings pond consists of the dam body and the tailings. Because stockpile tailings ponds are located in flat areas such as plains or deserts, the dam must be built to surround the entire reservoir area. The main benefit of this layout is that surface runoff cannot inundate the tailings storage area, so the contained water comes entirely from processing operations or from precipitation.
A cross-river tailings pond is formed by damming a riverbed at both its upstream and downstream reaches. Tailings ponds of this kind mostly have irregular rectangular shapes. Because cross-river tailings ponds are uncommon in China, they are also treated there as a type of stockpile tailings pond; besides being few in number, they look similar to stockpile ponds in imagery.
Beyond these layout types, tailings dams in different areas also differ considerably in remote sensing imagery in shape, brightness, contrast, hue, background, and scale. The interpretation keys of tailings dams were analyzed to guide sample collection from satellite imagery. Figure 4 shows some examples of tailings ponds with varying characteristics. Depending on the construction method, tailings ponds take different shapes, such as rectangles, triangles, circles, and irregular polygons. Depending on the region and mineral type, tailings show a variety of colors, such as gray, gray-white, black, yellow, and reddish-brown. Tailings ponds also have different backgrounds, such as vegetation, bare land, and sandy land. Although the characteristics of tailings ponds vary, their combination can be used to distinguish them from the background. Based on interpretation keys such as size, shape, color, hue, texture, and shadow, samples of tailings ponds were labeled manually in the GF-1 satellite imagery.

Sample Preparation
The length, width, perimeter, and area of the bounding boxes of the tailings pond samples were analyzed. The lengths and widths of the bounding boxes ranged from 50 to 3000 m, but were mainly between 50 and 1400 m. The perimeters ranged from 300 to 12,600 m, mainly between 300 and 2600 m, and the areas ranged from 8700 to 9,430,000 m², mainly between 8700 and 308,700 m². The width:height ratios, perimeters, and areas of the bounding boxes reflect the diversity of tailings ponds in scale and provided a basis for adjusting the SSD network parameters.
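The bounding-box statistics above can be derived from pixel-space annotations once the ground sampling distance is known. The sketch below assumes axis-aligned boxes given as (xmin, ymin, xmax, ymax) in pixels and a 2 m/pixel GSD for the pansharpened GF-1 imagery; the helper names `box_stats` and `metres_to_pixels` are hypothetical and not taken from the study.

```python
from dataclasses import dataclass

GSD_M = 2.0  # assumed ground sampling distance of the pansharpened GF-1 imagery (m/pixel)


@dataclass
class BoxStats:
    width_m: float
    height_m: float
    perimeter_m: float
    area_m2: float
    aspect_ratio: float  # width / height


def box_stats(xmin: float, ymin: float, xmax: float, ymax: float, gsd: float = GSD_M) -> BoxStats:
    """Ground-scale statistics for an axis-aligned box given in pixel coordinates."""
    width = (xmax - xmin) * gsd
    height = (ymax - ymin) * gsd
    return BoxStats(
        width_m=width,
        height_m=height,
        perimeter_m=2 * (width + height),
        area_m2=width * height,
        aspect_ratio=width / height,
    )


def metres_to_pixels(length_m: float, gsd: float = GSD_M) -> float:
    """Convert a ground length in metres to a length in pixels."""
    return length_m / gsd


# A hypothetical labeled box of 100 x 50 pixels.
s = box_stats(0, 0, 100, 50)
print(s)
# The reported 50-3000 m size range corresponds to 25-1500 pixels at 2 m/pixel.
print(metres_to_pixels(50), metres_to_pixels(3000))
```

The hypothetical 100 × 50-pixel box corresponds to 200 m × 100 m, a 600 m perimeter, a 20,000 m² area, and a 2:1 aspect ratio. Summaries like these, aggregated over all samples, are the kind of evidence that can inform the default-box scales and aspect ratios of an SSD network.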
To increase sample diversity, we selected tailings ponds with different characteristics to build the final sample sets. To accommodate regional variability in spectral signatures across image scenes, a certain number of samples were created in each county of the study area. During interpretation and sample collection, we found that some natural or manmade objects were easily confused with tailings ponds because of their similar structure, texture, and hue, which could lead to false detections. These objects were used as negative samples in training the SSD model; introducing negative samples can reduce the false detection of such objects as tailings ponds and improve detection accuracy. Figure 5 shows some examples of negative samples. The negative samples can be divided mainly into four categories.

(1) Objects related to mining activities. This category mainly includes mining pits, mining fields, and waste rock dumps. Because their hue, texture, and shape are similar to those of tailings ponds, they are often mistakenly detected.
(2) Water reservoirs. The color of water reservoirs is similar to that of the wastewater in tailings ponds, and their shape is comparable to that of cross-valley tailings ponds. This is especially true in winter, when the frozen water surface of a reservoir can appear as bright as a tailings pond. Despite these similarities, the dam of a tailings pond is wide and shows stacked layers, and the pond generally shows radial, gradational textures from tailings discharge, which can be captured by the feature maps of the deep network. In contrast, water reservoirs and ponds generally have a narrow dam, and the water body has a relatively uniform hue and texture.
(3) Bare land. The reservoir area of an inactive tailings pond contains only a tailings beach with little or no water. Bare land in dry or small reservoirs and ponds resembles the tailings of inactive tailings ponds in shape, texture, and other characteristics, and can thus cause many false detections. Because tailings ponds are usually near mining areas and concentrator facilities, this contextual information was used to select negative samples of bare land and dry reservoirs.
(4) Clouds. Cloud clusters in remote sensing images are easily distinguished from tailings ponds during sample preparation. However, their hue, shape, and other characteristics are similar to those of white or gray-white tailings, and some of them tend to be incorrectly detected as tailings dams.

SSD Network Training and Optimization
Because it can produce highly accurate results quickly, we chose an SSD network to detect objects in high-resolution satellite imagery in this study (Figure 6). A convolutional neural network (VGG16) was used as the base network of the SSD to extract feature information. Feature maps, generated by applying convolutional layers (filters) to the input image or to another feature map, can highlight different features of the input image, such as lines, background, or foreground. In addition to the base VGG16 network, the SSD feature maps were created by the convolutional layers Conv4_3, Conv6, Conv7, Conv8_2, Conv9_2, Conv10_2, and Conv11_2, whose resolutions decrease progressively. Each feature layer produces its own set of detection predictions, so the SSD network can predict detections at multiple scales. Default bounding boxes were associated with the cells of each feature map. During SSD training, the default boxes were matched to the ground-truth boxes, and an overall objective loss, constructed as a weighted sum of the localization loss and the confidence loss, was minimized.
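In the original SSD formulation [37], default-box scales are spaced linearly between s_min = 0.2 and s_max = 0.9 across the m prediction layers, a default box of scale s and aspect ratio a has width s√a and height s/√a, and each default box is labeled positive if its Jaccard overlap (IoU) with a ground-truth box exceeds 0.5, with every ground truth additionally keeping its single best-overlapping default box. The training objective is L = (L_conf + α·L_loc)/N, where N is the number of matched default boxes, L_conf is a softmax confidence loss, L_loc is a Smooth L1 loss on box offsets, and α weights the two terms. The Python sketch below illustrates only box generation and matching under those original defaults. It is a simplification (no extra 1:1 box at scale √(s_k·s_{k+1}), no 3 and 1/3 aspect ratios, no clipping), and it does not reflect the modified scales or layers used for tailings ponds here, which are not specified.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), normalized to [0, 1]


def layer_scales(m: int, s_min: float = 0.2, s_max: float = 0.9) -> List[float]:
    """Default-box scale for each of m feature maps, spaced linearly from s_min to s_max."""
    return [s_min + (s_max - s_min) * k / (m - 1) for k in range(m)]


def default_boxes(fmap_size: int, scale: float,
                  ratios: Sequence[float] = (1.0, 2.0, 0.5)) -> List[Box]:
    """One box per aspect ratio, centred on every cell of a square fmap_size x fmap_size map."""
    boxes: List[Box] = []
    for i in range(fmap_size):          # rows
        for j in range(fmap_size):      # columns
            cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size
            for a in ratios:
                w, h = scale * math.sqrt(a), scale / math.sqrt(a)
                boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


def iou(a: Box, b: Box) -> float:
    """Jaccard overlap (intersection over union) of two boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match(defaults: List[Box], truths: List[Box], threshold: float = 0.5) -> List[int]:
    """Index of the matched ground-truth box for every default box, or -1 for background."""
    if not truths:
        return [-1] * len(defaults)
    overlaps = [[iou(d, t) for t in truths] for d in defaults]  # [n_defaults][n_truths]
    # A default box is positive if its best overlap with any ground truth exceeds the threshold.
    assigned = []
    for row in overlaps:
        best_t = max(range(len(truths)), key=lambda t: row[t])
        assigned.append(best_t if row[best_t] > threshold else -1)
    # Every ground truth also keeps the default box it overlaps most, even below the threshold.
    for t in range(len(truths)):
        best_d = max(range(len(defaults)), key=lambda d: overlaps[d][t])
        assigned[best_d] = t
    return assigned


print([round(s, 2) for s in layer_scales(6)])
defaults = default_boxes(2, 0.5, ratios=(1.0,))
print(match(defaults, [(0.0, 0.0, 0.6, 0.5)]))
```

With six prediction layers the scales are 0.2, 0.34, 0.48, 0.62, 0.76, and 0.9. In the toy 2 × 2 example, the ground-truth box overlaps the top-left default box with an IoU of about 0.83, so only that box becomes positive and the other three are treated as background.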
The lengths and widths of the tailings ponds ranged from 50 to 3000 m, equivalent to 25 to 1500 pixels. The receptive field of the original SSD is only 740 pixels, which is not large enough to detect large tailings ponds. To enable the network to extract the features of large tailings dams, we added convolutional layers to the original SSD network and set the stride to 2 pixels, which increased the receptive field to 2499 pixels. Accordingly, all convolutional layers were enlarged to improve the detection accuracy.
This network training process was conducted iteratively to optimize the parameters of the SSD model and, finally, to obtain an ideal model. Specifically, interpretation keys and initial tailings dam samples were collected by manual interpretation, based on the known locations of major mining sites in the official investigation database of tailings dams. The initial samples were used as the training samples for the first round of model training. If the prediction accuracy did not meet the requirement, newly detected tailings dams were selected and added to the sample dataset, and model training and prediction were repeated with the updated dataset until the accuracy met the requirement. In this study, a total of 367 samples were labeled as the initial samples. The model trained on these samples converged after 80k iterations, and the detection accuracy was only around 28%. Another 695 tailings ponds from the prediction results were selected manually and added to the sample dataset, and objects that were mistakenly detected as tailings dams were taken as negative samples.
In the second round, the model trained on the new samples converged after 100k iterations, and the accuracy was close to 60%. A further 438 tailings dams from the detection results were added to the sample dataset as positive samples, and 1400 mistakenly detected objects were taken as negative samples. After these iterations of sample labeling, screening, and correction, a total of 1500 positive samples (367 + 695 + 438) and 3000 negative samples were created. Using this sample dataset, the SSD model was trained and the tailings dams in the entire study area were predicted.
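The iterative sample-refinement procedure described above can be summarized as a loop. This is a minimal sketch under stated assumptions: `train`, `predict`, `review`, and `evaluate` are hypothetical callables standing in for Caffe training, inference over the imagery, manual screening of detections by interpreters, and accuracy assessment; they are not functions from the study's code, and `target` is a placeholder because the study only says training continued "until the accuracy met the requirement".

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Sequence, Tuple


@dataclass
class SampleSet:
    positives: List[Any] = field(default_factory=list)  # labeled tailings ponds
    negatives: List[Any] = field(default_factory=list)  # confirmed false detections


def refine_samples(
    samples: SampleSet,
    train: Callable[[SampleSet], Any],
    predict: Callable[[Any], Sequence[Any]],
    review: Callable[[Sequence[Any]], Tuple[List[Any], List[Any]]],
    evaluate: Callable[[Any], float],
    target: float,
    max_rounds: int = 10,
) -> Tuple[Optional[Any], List[float]]:
    """Retrain until accuracy reaches `target`, growing the sample set each round."""
    model: Optional[Any] = None
    history: List[float] = []
    for _ in range(max_rounds):
        model = train(samples)
        accuracy = evaluate(model)
        history.append(accuracy)
        if accuracy >= target:
            break
        # Interpreters screen the new detections: confirmed ponds become new
        # positives, mistaken detections become new negatives.
        new_pos, new_neg = review(predict(model))
        samples.positives.extend(new_pos)
        samples.negatives.extend(new_neg)
    return model, history
```

Keeping the human review step as an injected callable reflects that, in the study, the growth of the sample set (367, then +695, then +438 positives) came from manual screening rather than from an automatic rule.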
For post-processing, an aggregation algorithm was used to merge overlapping or adjacent bounding boxes in the prediction results. The geographic location of the center point of each bounding box was calculated, and the prediction results of all images were combined to generate the final vector dataset of tailings ponds.
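The study does not specify its aggregation algorithm. One plausible sketch, assuming that overlapping or adjacent boxes belong to the same pond, groups boxes transitively (connected components found with union-find), replaces each group with its enclosing rectangle, and then maps the merged box's center pixel to map coordinates with an affine geotransform in GDAL's six-coefficient `GeoTransform` order. The `gap` tolerance, the function names, and the example geotransform are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixel coordinates
GeoTransform = Tuple[float, float, float, float, float, float]


def _touches(a: Box, b: Box, gap: float) -> bool:
    """True if boxes overlap or lie within `gap` pixels of each other."""
    return (a[0] - gap <= b[2] and b[0] - gap <= a[2] and
            a[1] - gap <= b[3] and b[1] - gap <= a[3])


def merge_boxes(boxes: List[Box], gap: float = 0.0) -> List[Box]:
    """Merge connected groups of touching boxes into their enclosing rectangles."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if _touches(boxes[i], boxes[j], gap):
                parent[find(i)] = find(j)
    groups: Dict[int, List[Box]] = {}
    for i, b in enumerate(boxes):
        groups.setdefault(find(i), []).append(b)
    return [(min(b[0] for b in g), min(b[1] for b in g),
             max(b[2] for b in g), max(b[3] for b in g)) for g in groups.values()]


def center_to_map(box: Box, gt: GeoTransform) -> Tuple[float, float]:
    """Map the box center (col, row) to map coordinates (GDAL GeoTransform order)."""
    col = (box[0] + box[2]) / 2
    row = (box[1] + box[3]) / 2
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


if __name__ == "__main__":
    detections = [(0, 0, 10, 10), (8, 8, 20, 20), (100, 100, 110, 110)]
    merged = merge_boxes(detections)
    gt = (500000.0, 2.0, 0.0, 4500000.0, 0.0, -2.0)  # hypothetical 2-m north-up grid
    print(merged, [center_to_map(b, gt) for b in merged])
```

Grouping is computed on the original boxes only, so a merged rectangle that happens to reach a box none of its members touched is not merged again; a second pass over `merged` would handle that case if needed.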
The SSD network was trained within the Caffe framework on an NVIDIA Titan XP GPU, with CUDA 2.1 and an Intel Xeon E5 CPU. A large batch size improves memory utilization and speeds up training; given the size of the input image slices, the batch size was set to 4 to make maximum use of the Titan XP GPU's 12 GB of memory. The initial learning rate was set to 0.0001, and the maximum number of iterations to 100,000. Gamma, momentum, and weight decay were set to 0.1, 0.9, and 0.00005, respectively. The learning rate decreased with the iteration number: it was reduced to 0.00001 when the iteration number reached 20,000 and to 0.000001 once it exceeded 30,000.
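The learning-rate schedule described above is a step-wise decay: the base rate of 0.0001 is multiplied by gamma = 0.1 at 20,000 and again at 30,000 iterations. This matches the behavior of Caffe's `multistep` learning-rate policy with `stepvalue` entries at those iterations, although the study does not publish its solver file, so that mapping is an assumption. The sketch below reproduces the schedule and a single SGD update with momentum 0.9 and L2 weight decay 0.00005, using the Caffe-style update v ← μv − lr(∇ + λw), w ← w + v; the function names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def multistep_lr(iteration: int, base_lr: float = 1e-4, gamma: float = 0.1,
                 steps: Sequence[int] = (20_000, 30_000)) -> float:
    """Learning rate at `iteration`: base_lr * gamma ** (number of steps reached)."""
    return base_lr * gamma ** bisect_right(steps, iteration)


def sgd_momentum_step(weights: List[float], grads: List[float], velocity: List[float],
                      lr: float, momentum: float = 0.9,
                      weight_decay: float = 5e-5) -> Tuple[List[float], List[float]]:
    """One element-wise SGD update with momentum and L2 weight decay."""
    new_v = [momentum * v - lr * (g + weight_decay * w)
             for w, g, v in zip(weights, grads, velocity)]
    new_w = [w + v for w, v in zip(weights, new_v)]
    return new_w, new_v


if __name__ == "__main__":
    for it in (0, 19_999, 20_000, 30_000, 99_999):
        print(it, multistep_lr(it))
```

With `bisect_right`, the rate drops exactly at iteration 20,000 and at iteration 30,000, which is how Caffe's step counter advances; the text's "above 30,000" differs from this by at most one iteration.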

Accuracy Assessment
A true positive (TP) is a tailings pond that was correctly detected. A false positive (FP) is an object that is not a tailings pond but was mistakenly extracted as one. A false negative (FN) is a tailings pond that was not extracted. Precision, recall, and the F1 score were used in this study to assess the performance of the optimized SSD model. Precision is the proportion of extracted objects that are actually tailings ponds, and recall is the proportion of actual tailings ponds that were extracted. The F1 score is the harmonic mean of precision and recall and therefore accounts for both. These metrics are calculated as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
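The three metrics can be computed directly from the TP, FP, and FN counts; substituting the first two into the third shows that F1 simplifies to 2TP/(2TP + FP + FN). The same functions can be applied to the counts recorded at each confidence threshold, as in the threshold sweep summarized in Table 2, selecting the threshold with the highest F1. This is a minimal sketch; the function names are hypothetical and the counts in the example are invented for illustration, not values from Table 2.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Counts:
    tp: int  # detections that are real tailings ponds
    fp: int  # detections that are not tailings ponds
    fn: int  # tailings ponds that were missed


def precision(c: Counts) -> float:
    return c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0


def recall(c: Counts) -> float:
    return c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0


def f1(c: Counts) -> float:
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if p + r else 0.0


def best_threshold(sweep: Dict[float, Counts]) -> float:
    """Confidence threshold whose counts give the highest F1 score."""
    return max(sweep, key=lambda t: f1(sweep[t]))


if __name__ == "__main__":
    # Hypothetical counts for three thresholds (not the study's data).
    sweep = {0.1: Counts(90, 40, 5), 0.3: Counts(85, 10, 10), 0.5: Counts(60, 3, 35)}
    for t, c in sweep.items():
        print(f"{t:.1f}: P={precision(c):.3f} R={recall(c):.3f} F1={f1(c):.3f}")
    print("best threshold:", best_threshold(sweep))
```

Raising the threshold trades FP for FN, so precision tends to rise while recall falls; F1 peaks where the two are balanced, which is the behavior the study reports when choosing its threshold.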

Detection Results of SSD
Remote Sens. 2020, 12, x FOR PEER REVIEW 11 of 19
Figure 7 illustrates the loss and accuracy curves obtained over 100,000 iterations of SSD model training. The loss quickly decreases to below 0.4 and remains stable up to about 100,000 iterations, which indicates that the network learned the data efficiently. At the same point, the accuracy reached 84.5%.
The confidence threshold had a great impact on detection accuracy. We increased the confidence threshold from 0.1 to 0.6 in steps of 0.1 and recorded the detection performance of the model at each threshold. Table 2 lists the TP, FP, FN, precision, recall, and F1 score for each threshold in this range. The best confidence threshold was determined by considering the F1 score, precision, and recall. As the confidence threshold increased from 0.1 to 0.6, the number of false detections gradually decreased, whereas the number of tailings ponds that were not detected gradually increased.
As the confidence threshold increased, precision increased and recall decreased. The highest F1 score was obtained at a confidence threshold of 0.3, so the threshold was set to 0.3 to achieve the best performance of the SSD model.
The deep learning-based method proposed in this study addresses the problem that traditional monitoring methods cannot be used for large-scale, high-frequency monitoring of tailings ponds. The method identified tailings ponds automatically and with satisfactory accuracy. Compared with existing methods, our approach has several advantages. First, the deep learning method extracts the characteristics of tailings ponds automatically, without manual interpretation, which reduces errors caused by the subjectivity of interpreters. Second, the extraction can be performed efficiently and with a high degree of accuracy. After considering the characteristics of tailings ponds in high-resolution remote sensing images, we fine-tuned the original SSD network to improve the detection accuracy, which allowed tailings ponds in the study area to be located effectively (Figure 8). Applied to the whole Jing-Jin-Ji region, the SSD extracted the tailings ponds from the 2-m multispectral Gaofen-1 imagery within 1 h using 2 GPUs.
To enhance the adaptability of the SSD model to complex scenes, samples with different surroundings were added to the training dataset. During sample preparation, the bounding boxes of the samples were labeled to include the surrounding areas. Hence, features such as edges, corner points, shape, context, and semantic information of both the tailings dams and their neighborhoods were extracted in the multi-level feature maps of the SSD. The object detection ability of a deep neural network can be improved by increasing the diversity of its training data.
Tailings ponds are normally composed of the dam body, tailings, and waste water, whose textures and tones are very similar to those of bare land, water bodies, and other land-cover types, respectively. In addition, the tailings ponds in the study area varied considerably in shape, tone, background, type, and scale. As a result, the model in some cases misidentified vegetation, bare land, water, snow, and cloud as tailings ponds, which lowered its precision. To address this limitation, the network architecture could be further improved to enhance the performance and generalization ability of the model [51,52]. Model training and detection could also be performed on a cloud platform, which might increase detection efficiency. It should be possible to extend the proposed technique to detect tailings dams at national and global scales in the future.

Tailings Dams in the Jing-Jin-Ji Region
Image interpreters validated the detection results of the SSD network against the GF-1 imagery and classified them into different types. False positives, that is, objects mistakenly detected as tailings ponds, were identified and excluded from the results. The distribution of the 246 false positive targets and example results are illustrated in Figure 9. The misclassifications were mainly located in mountainous areas containing targets related to mining activities, such as open pits, waste rock dumps, and stone quarries. Reservoirs with characteristics similar to those of tailings ponds were also misclassified. There are other types of false positive targets, such as landfills and substations, but their number is relatively small.
A total of 2221 tailings ponds were finally recognized across the Jing-Jin-Ji region (Figure 10). The detected tailings ponds can serve as the basis for analyzing the distribution of tailings facilities. By type, there were 1208 cross-valley ponds, 239 hillside ponds, and 774 stockpile tailings ponds, which accounted for 54.39%, 10.76%, and 34.85% of the total, respectively.
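The type breakdown can be checked arithmetically: the three counts sum to the 2221 detected ponds, and each share rounds to the reported percentage. The dictionary keys below are illustrative labels.

```python
# Counts of detected tailings ponds by type, as reported above.
counts = {"cross-valley": 1208, "hillside": 239, "stockpile": 774}
total = sum(counts.values())
# Percentage share of each type, rounded to two decimals.
shares = {name: round(100 * n / total, 2) for name, n in counts.items()}
print(total, shares)  # 2221 {'cross-valley': 54.39, 'hillside': 10.76, 'stockpile': 34.85}
```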
According to the extraction results, there are a large number of tailings ponds in the Jing-Jin-Ji region, particularly in Hebei province. Spatially, the tailings ponds are unevenly distributed: they are located mainly in the northern and western parts of Hebei province, and no tailings ponds were detected in the southeastern part of the region. At the county level, tailings ponds were detected in 49 counties and districts within the study area. The counties containing tailings ponds account for more than half of the total area of Hebei province. Among these 49 counties, Laiyuan county in Baoding city has the highest density of tailings ponds (around 200/km²). At the city level, the tailings ponds in this region are mainly located in Chengde, Tangshan, and Zhangjiakou, whereas there are only a few in Baoding and Shijiazhuang. There is also a small number of tailings dams in Beijing, most of which are located in Miyun district in the northeastern part of the city. No tailings ponds were detected in Tianjin. The three cities with the largest numbers of tailings ponds are Chengde (686), Tangshan (440), and Zhangjiakou (446), all of which are in Hebei province. Iron ore production in these cities is also very large, ranking in the top three in Hebei province.
Based on the detection results and digital elevation model (DEM) data, we further analyzed the distribution of tailings ponds across elevations (Figure 11). Cross-valley tailings ponds were found mainly at altitudes of 200-800 m. Compared with the other two types, cross-valley tailings dams usually cover a larger area. Because they are built in high-altitude areas and have large reservoir capacities, the collapse of these dams can cause serious damage and losses to neighboring communities. Hillside tailings ponds were found mainly at altitudes of 200-400 m.
The detected stockpile tailings ponds were mainly located at altitudes of 50-400 m. The corresponding topographic characteristics show that the tailings ponds are mainly located at high altitudes in remote mountainous areas with steep slopes (the Yanshan and Taihang Mountains). Most of the ponds, particularly the cross-valley and hillside tailings ponds, are located in small valleys, whereas the stockpile tailings ponds are mainly located in large, flat areas in valleys or near rivers. The spatial distribution of tailings ponds exhibits clear patterns of spreading and scattering along mountain valleys, river valleys, and rivers. Overall, tailings ponds tend to be clustered in valleys in the study area.
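The elevation analysis can be sketched as sampling the DEM at each pond's center point and counting ponds per elevation band and type. This minimal sketch assumes a north-up DEM held as a nested list, a geotransform in GDAL's six-coefficient order, and 200-m bands for illustration; the study does not describe its exact procedure, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

GeoTransform = Tuple[float, float, float, float, float, float]


def map_to_pixel(x: float, y: float, gt: GeoTransform) -> Tuple[int, int]:
    """(row, col) of the DEM cell containing map point (x, y); north-up rasters only."""
    col = math.floor((x - gt[0]) / gt[1])
    row = math.floor((y - gt[3]) / gt[5])
    return row, col


def elevation_profile(ponds: Sequence[Tuple[str, float, float]],
                      dem: List[List[float]], gt: GeoTransform,
                      band: float = 200.0) -> Dict[str, Counter]:
    """Count ponds of each type per elevation band [lo, lo + band).

    Points are assumed to fall inside the DEM; no bounds checking is done.
    """
    profile: Dict[str, Counter] = {}
    for pond_type, x, y in ponds:
        row, col = map_to_pixel(x, y, gt)
        z = dem[row][col]
        lo = math.floor(z / band) * band
        profile.setdefault(pond_type, Counter())[(lo, lo + band)] += 1
    return profile


if __name__ == "__main__":
    gt = (0.0, 100.0, 0.0, 200.0, 0.0, -100.0)   # toy 2x2 grid of 100-m cells
    dem = [[150.0, 450.0], [650.0, 50.0]]        # toy elevations in meters
    ponds = [("cross-valley", 50.0, 150.0), ("hillside", 150.0, 150.0),
             ("stockpile", 50.0, 50.0), ("stockpile", 150.0, 50.0)]
    print(elevation_profile(ponds, dem, gt))
```

Using `math.floor` rather than `int` keeps both the pixel indices and the elevation bands correct for negative offsets or below-sea-level values.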
Tailings dams are normally monitored by regular visual inspection and in situ equipment. Improved management procedures in the mining sector can prevent or reduce tailings dam failures to a large extent. In addition to the tailings dam inventory data that can be obtained from high-resolution satellite images, satellite data from multiple sources can be used to monitor critical features of tailings dams remotely. Interferometric synthetic aperture radar (InSAR) and light detection and ranging (LiDAR) data can be used to monitor structural changes and displacement of a tailings dam before failure [29,53,54]. Optical and thermal satellite data can be used to monitor environmental impacts, such as water and soil pollution, near tailings dams [55,56]. Monitoring systems and platforms based on multi-source remote sensing data can effectively help reduce the probability of failures and mitigate their consequences.

Conclusions
To meet the requirement for fast and accurate extraction of tailings ponds, a target detection method based on the Single Shot Multibox Detector (SSD) deep learning model was developed in this study. The SSD network was improved and adjusted according to the range of characteristics of tailings ponds seen in satellite imagery. The method achieved a detection accuracy of 90.2% and a recall rate of 88.7%. Using the optimized SSD model, tailings ponds were recognized automatically and rapidly from 2-m resolution GF-1 satellite imagery, and a total of 2221 tailings ponds were extracted in the Jing-Jin-Ji region in northern China. The majority of these tailings ponds are located at high altitudes in remote mountainous areas with steep slopes. At the city level, the largest concentrations of tailings ponds are in Chengde, Tangshan, and Zhangjiakou. These results demonstrate that the deep learning method can be used effectively to detect complex land-cover features in remote sensing images. The application of the proposed method at national and global scales can be investigated in future work.