Object-Oriented Open-Pit Mine Mapping Using Gaofen-2 Satellite Image and Convolutional Neural Network, for the Yuzhou City, China

Abstract: Our society’s growing need for mineral resources brings with it the associated risk of degrading our natural environment as well as impacting neighboring communities. To better manage this risk, especially for open-pit mine (OM) operations, new earth observation tools are required for more accurate baseline mapping and subsequent monitoring. The purpose of this paper is to propose an object-oriented open-pit mine mapping (OOMM) framework for Gaofen-2 (GF-2) high-spatial-resolution satellite images (HSRSIs), based on convolutional neural networks (CNNs). To better represent the different land use categories (LUCs) in the OM area, a minimum heterogeneity criterion-based multi-scale segmentation method was used, while a mean area ratio method was applied to optimize the segmentation scale of each LUC. After image segmentation, three object-feature domains were obtained from the GF-2 HSRSI: spectral, texture, and geometric features. Then, the gradient boosting decision tree and the Pearson correlation coefficient were used as an object feature information reduction (FIR) method to identify the distinguishing features that describe open-pit mines (OMs). Finally, the CNN was used to map the OMs by combining the significant features. In total, 105 OM sites were extracted from the interpretation of GF-2 HSRSIs, and the boundary of each OM was validated by field work and used as input to evaluate the open-pit mine mapping (OMM) accuracy.
The results revealed that: (1) the FIR tool had a positive impact on effective OMM; (2) when the segmented objects were split into two groups, training and testing sets composed of 70% of the objects and validation sets formed by the remaining 30%, and the selected feature subsets were combined for training, the framework achieved an overall accuracy (OA) of 90.13% and a Kappa coefficient (KC) of 0.88 on the whole dataset; (3) compared with a state-of-the-art method, the support vector machine (SVM), the proposed framework outperformed SVM by more than 7.28% in OA, 8.64% in KC, 6.15% in producer accuracy of OM, and 9.31% in user accuracy of OM. To the best of our knowledge, this is the first time that multiscale segmentation of HSRSI has been integrated with a CNN to obtain OMM results. The proposed framework can not only provide reliable technical support for the scientific management and environmental monitoring of open-pit mining areas, but is also general enough to be applied to other kinds of land use mapping in mining areas using HSR images.


Introduction
Open-pit mining can damage the natural environment, and is prone to cause water pollution, air pollution, solid waste pollution, and geological disasters [1][2][3][4][5]. Mine environmental monitoring, mainly open-pit mine (OM) monitoring, has always been the top priority of mine governance and reclamation. Currently, China is speeding up green mine construction and strengthening the mine area ecological protection. Obtaining detailed information of OM is essential for the national project.
Traditional remote sensing methods addressing OM problems were mainly based on combining visual interpretation and field surveys. The results provided by the analysts are commonly subjective, time-consuming, and costly, even if the results are reliable [6]. Nowadays, with the development of high spatial resolution satellite imagery (HSRSI) and machine learning methods, many researchers have applied machine learning methods to classify the land cover in OM areas [7][8][9] and most of the algorithms are pixel-based [10][11][12][13][14][15].
Although the pixel-based open-pit mine mapping (OMM) approaches have achieved satisfactory OMM results, they usually ignore geometric and contextual information in multi-source image data, especially when using HSRSI [16]. Recently, some researchers have applied object-oriented methods, in which the units are image objects composed of pixels with similar spectral characteristics, to map the OM area [17][18][19][20]. Effective object-oriented open-pit mine mapping (OOMM) needs to define object features from remote sensing and thematic data, e.g., image band, texture, and geometric features, and the number of features ranges from several to dozens. Nonetheless, there is no general rule for feature selection, even though the choice of features affects the OOMM accuracy. When many features are available, extracting appropriate features for different research areas is a complicated process, and a variety of OOMM methods can also be utilized [21].
Recently, deep learning has become one of the most active topics in machine learning and pattern recognition. In deep learning, the most discriminative and representative features can be learnt hierarchically in an end-to-end fashion [22]. This breakthrough was due to the interest in modeling advanced feature representations using multilayer neural networks without the need to manually design features or rules. A popular deep learning method, the convolutional neural network (CNN), has achieved state-of-the-art results in multiple fields, e.g., visual recognition [23], image retrieval [24], and scene annotation [25]. Due to its advantages in advanced scene understanding and feature representation, the CNN shows great potential in several remote sensing tasks, such as vehicle detection [26,27], road network extraction [28], remotely sensed scene classification [25,29], and semantic segmentation [30].
In this paper, a new effective OOMM framework for Gaofen-2 (GF-2) HSRSI based on CNNs is proposed. First, a multi-scale segmentation (MSS) method is used to extract three object-feature domains from the GF-2 HSRSI: spectral, texture, and geometric features. Then, the gradient boosting decision tree (GBDT) and the Pearson correlation coefficient are used as an object feature information reduction (FIR) tool to identify the important features that describe OMs. Finally, the CNN is used to map the OMs by combining the significant features, and the mapping results are compared with those of the support vector machine. In total, 105 OM sites were extracted from the interpretation of GF-2 HSRSI, and the boundary of each OM was validated by field work and used as input to evaluate the OMM accuracy. This research aims to explore the applicability of FIR methods and the CNN algorithm under an object-oriented framework for OMM using 'GF-2' HSRSI.
The remainder of the paper is organized as follows: Section 2 describes the study area, the data used in this study, and introduces the detailed OOMM framework; Section 3 presents the experimental results; discussions are given in Section 4, followed by conclusions in the last section.

Study Area
The study area is located in the north of Yuzhou City, Henan Province, between 113.375°E and 113.561°E and between 34.256°N and 34.352°N, with an area of 102.59 km² (Figure 1). The terrain types in this area are mainly mountains and hills. The elevation of this area ranges from 128 to 779 m with an average of 454 m. The area has a warm temperate monsoon climate in the north, with four distinct seasons. The study area is sited in the transitional zone between the Yudong Plain and the Funiu Mountain Range. It is Songying platform uprise, with Baisha and Yuzhou city syncline, Huishan, Fenghouling anticline, and Jiaozishan anticline. Yuzhou City is rich in mineral resources; the main mineral deposits are coal, bauxite, iron, ceramic clay, limestone, and sulfur, and most of them are extracted by open-pit mining. The combination of mountainous and hilly terrain, rich precipitation, and open-pit mining activities may cause many environmental problems, which conflicts with the green mining policy of the Chinese government. Mapping the OMs is therefore necessary to support future studies and government decision makers in strengthening the ecological construction of mines.

Data Sources
A GF-2 HSRSI captured on 16/04/2018 was employed in this study, which has four multispectral bands (Red, Green, Blue, and Near Infrared) with a spatial resolution of 4 m. Land use categories (LUCs) of the study area were classified into seven dominating groups, including OM, waste-dump area (WDA), buildings (BUI), vegetation (VEG), road (ROA), water (WAT), and bare soil (BAS), according to the land use map provided by the Geological Environmental Monitoring Institute of Henan Province.
OM areas in this study area were visually identified on GF-2 images. Then, a series of field surveys was carried out from 27 August to 26 September 2018, with the help of the Geological Environmental Monitoring Institute of Henan Province. After the field surveys, 105 OMs, which covered a total area of 9.36 km², were mapped, accounting for 9.12% of the study area. The largest OM covers 0.65 km² while the smallest covers 1161 m². Finally, all the mapped OMs were digitized and rasterized to the same resolution (4 × 4 m) as the HSRSI used for the study in Environmental Systems Research Institute's ArcGIS software (Version 10.3.0). Figure 2 shows the open-pit visual interpretation key and field photos of the field work in the study area.

Methodology
The proposed OOMM framework (see Figure 3) is composed of three steps: image segmentation, feature selection, and open-pit mine mapping. In the first step, image objects were obtained from GF-2 HSRSI by using the MSS method based on an optimal segmentation scale selection algorithm. Then, a sample set was created by combining the objects with the existing OM and land use vectors, and the objects in the sample set were labeled by using the centroid inclusion principle, in which an object is assigned to the LUC that contains its centroid.
In the second step, a FIR method based on the GBDT and the Pearson correlation coefficient was utilized to reduce the feature sample data size. The importance of the features was computed, and the features with the lowest importance values were dropped. Then, the relationships among the remaining features were analyzed, and the most relevant features were used in the OMM procedure.
In the final OMM process, after reducing the feature sets, two thirds of the objects were assigned as training and testing sets (TTS) and the remaining one third as validation sets (VS). Then, a CNN model was applied to classify the study area, and finally, the accuracies of classification were assessed.

Image Segmentation
For object-oriented methods, image segmentation is the first and necessary pre-condition, because the contour quality (such as shape and size) of the target object directly affects the subsequent image classification accuracy [31]. Among the several available segmentation approaches, multi-scale segmentation (MSS) is selected in this study: it overcomes the limitation of a single segmentation scale, which cannot extract all types of target objects, by considering the characteristics of multiple layers and multiple patterns on the actual surface [32].
A bottom-up region-merging technique is used to form objects in the MSS procedure, where the smallest object contains one pixel. Based on the chosen scale (ScP), shape (ShP), and color (CP) parameters, which define the growth in heterogeneity between adjacent image objects, smaller image objects are merged into larger ones. A larger ScP leads to larger image objects, which may cause under-segmentation, while a smaller ScP results in smaller image objects, which may lead to over-segmentation [33,34]. ScP is an abstract term that determines the maximum allowable heterogeneity in the resulting image object [35]. The other two important parameters are CP (representing the uniformity of the spectrum) and ShP (used to define the texture uniformity of the resulting image object). The weights of CP and ShP each range from 0 to 1 and sum to 1 in the eCognition ® software (Version 9.0.1). ShP is generally split into two components: smoothness, which optimizes the resulting image objects with regard to the smoothness of their borders; and compactness, which optimizes the resulting image objects with regard to their overall compactness. Besides these three parameters, a threshold t has to be set for the bottom-up merge process. The increase in heterogeneity is calculated before two adjacent objects are fused. If the result exceeds the threshold t, the segmentation stops.
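The weighting of the color and shape criteria described above is commonly written in the following form, given here only as a sketch in the style of the widely used multiresolution segmentation formulation of Baatz and Schäpe. The symbols $f$, $w_{color}$, $w_{shape}$, $w_{cmpct}$, $w_{smooth}$, and the $\Delta h$ terms are notation introduced for this illustration and are not taken from the paper; the exact criterion implemented in eCognition may differ in detail.

```latex
% Increase in heterogeneity f when two adjacent objects are merged
f = w_{color}\,\Delta h_{color} + w_{shape}\,\Delta h_{shape},
\qquad w_{color} + w_{shape} = 1

% The shape term is itself a weighted sum of compactness and smoothness
\Delta h_{shape} = w_{cmpct}\,\Delta h_{cmpct} + w_{smooth}\,\Delta h_{smooth},
\qquad w_{cmpct} + w_{smooth} = 1

% Two adjacent objects are merged only while f stays below the threshold t
f < t
```

Under this reading, CP corresponds to $w_{color}$ and ShP to $w_{shape}$, so a CP weight close to 1 makes the merge decision depend almost entirely on spectral similarity.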
Good segmentation results require that the segmented image objects completely represent the contour information of a target ground object while, at the same time, the interior of the target ground object is not segmented into too many fragments. The optimal segmentation scale of the object is calculated for this purpose. In this study, a mean area ratio (MAR) method was used to calculate the best ScP by using Equation (1). This method usually selects several targets of a certain type of ground object to represent all ground objects of that type, and then calculates the MAR of each type according to Equation (1) to determine the optimal segmentation scale.
The quantities in Equation (1) are the mean area ratio; the total number of target LUCs of the same type in the entire image; the number of objects generated by segmenting the i-th target, which is greater than or equal to 1; the actual area of the i-th target LUC; and the total area of the objects generated for the i-th target by the image segmentation.

Object Features Calculation
In this study, a total of 58 features from three object-feature domains (OFDs), namely layer features (LFs), texture features (TFs), and geometry features (GFs), were extracted from the GF-2 HSRSI by utilizing the eCognition ® software (Version 9.0.1) (see Table 1) [36,37].
Table 1 (excerpt). GFs (14): Area, Border length, Border index, Length, Width, Length-width ratio, Shape index, Roundness, Elliptic fit, Rectangular fit, Compactness, Main direction, Asymmetry, Density.
The stdv value is the standard deviation of the intensity values of all pixels that form an image object. The Ratio value is the average value of the image object in one layer divided by the sum of its average values in all layers. The MaxDiff value is the absolute value of the difference between the maximum and the minimum object average values over the layers, divided by the object brightness B, which is defined as the sum of the object averages of the layers divided by the number of layers.
Shadow vegetation index (SVI), normalized difference vegetation index (NDVI), normalized difference water index (NDWI), soil brightness index (SBI), soil adjust vegetation index (SAVI), and ratio vegetation index (RVI) are calculated by using Equations (4)-(9), where the band terms are the reflectance of the near-infrared, red, and green bands, respectively. Mean Diff. to Neighbors is the mean difference between the feature value of an image object and its neighbors of a selected class. Mean Diff. to brighter/lower neighbors is the mean difference between the feature value of an image object and the feature values of its neighbors of a selected class, which have brighter/lower values than the image object itself.
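As a rough illustration of how such band-ratio indices are computed, the sketch below uses the commonly cited textbook forms of NDVI, NDWI, RVI, and SAVI. These forms are an assumption and may differ from the paper's Equations (4)-(9); SVI and SBI are omitted because their definitions vary between sources. The function names and the soil factor of 0.5 are illustrative choices, and nonzero denominators are assumed.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index (standard form)."""
    return (nir - red) / (nir + red)


def ndwi(green: float, nir: float) -> float:
    """Normalized difference water index (green/NIR form)."""
    return (green - nir) / (green + nir)


def rvi(nir: float, red: float) -> float:
    """Ratio vegetation index (simple NIR/red ratio)."""
    return nir / red


def savi(nir: float, red: float, soil_factor: float = 0.5) -> float:
    """Soil-adjusted vegetation index; soil_factor=0.5 is a common default,
    not a value stated in the paper."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


if __name__ == "__main__":
    # Hypothetical mean reflectances of one image object
    nir, red, green = 0.5, 0.1, 0.1
    print(ndvi(nir, red), ndwi(green, nir), rvi(nir, red), savi(nir, red))
```

In an object-oriented workflow the inputs would be the per-object mean reflectances of each band rather than per-pixel values.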
The gray level co-occurrence matrix (GLCM) and gray level co-occurrence vector (GLCV) formed the texture features in this study. Eight- and four-connected textures were calculated from GLCM and GLCV, respectively [38]. In this research, we calculated only the grey-level co-occurrence frequencies for all directions of neighboring pixels in symmetric matrices, and the sums of all directional GLCMs (GLCMall dir.) and GLCVs (GLCVall dir.) were used to directly calculate eight and four rotation-invariant texture measures of each band per image object using eCognition ® software [39].
The geometrical features used in this study were area, border length, border index, length, width, length-width ratio, shape index, roundness, elliptic fit, rectangular fit, compactness, main direction, asymmetry, and density. The border length of an image object is defined as the sum of the image object edges shared with other image objects or located at the edge of the entire scene. The border index feature is calculated as the ratio between the border lengths of the image object and the smallest enclosing rectangle. The length-width ratio is the ratio of length and width of an object. The shape index is calculated by dividing the boundary length feature of the image object by four times the square root of its area. The roundness feature is calculated by the difference of the enclosing ellipse and the enclosed ellipse. The calculation of elliptic fit is based on an ellipse with the same area as the selected image object. The calculation of the rectangular fit feature is based on a rectangle with the same area as the image object. The compactness feature of an image object is the product of the length and the width, divided by the number of pixels. The definition of the main direction feature of an image object is the direction of the eigenvector that belongs to the larger of the two eigenvalues. The asymmetry feature describes the relative length of an image object, compared to a regular polygon. The density feature is calculated by dividing the number of pixels forming the image object by its approximate radius according to the covariance matrix.
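Three of the geometric descriptors above have simple closed forms, which the following sketch implements directly from the verbal definitions in the text (shape index, length-width ratio, and compactness). The function names are illustrative, and the inputs are assumed to be expressed in consistent units (pixel units here).

```python
import math


def shape_index(border_length: float, area: float) -> float:
    """Border length divided by four times the square root of the area."""
    return border_length / (4.0 * math.sqrt(area))


def length_width_ratio(length: float, width: float) -> float:
    """Ratio of object length to object width."""
    return length / width


def compactness(length: float, width: float, n_pixels: int) -> float:
    """Product of length and width divided by the number of pixels."""
    return length * width / n_pixels


if __name__ == "__main__":
    # A 10 x 10 pixel square gives the minimum shape index of 1.0
    print(shape_index(40, 100), compactness(10, 10, 100))
    # A 20 x 5 pixel rectangle with the same area is more elongated
    print(shape_index(50, 100), length_width_ratio(20, 5))
```

A square object yields a shape index of 1, and the value grows as the border becomes longer relative to the area, which is why the feature responds to ragged or elongated outlines.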

Feature Selection
Large feature sets may cause numerous problems in the classification process, such as inefficiency due to the large amount of resources required [40], accuracy loss when the number of features is significantly larger than the optimal number [41], and unrelated input features that may cause the model to over-fit. Thus, eliminating redundant or irrelevant features in the input layer is crucial to improving the classification accuracy for a specific research area.
In this study, the GBDT, as implemented in the LightGBM library for Python, was used to calculate the importance value of each feature [42,43]. The GBDT method combines decision trees with ensemble learning and was designed to improve on the performance of a single predictive model by combining many models [44]. It is a sequential training process that trains multiple trees in series: each tree in the model learns from the classification results and residuals of all previous trees, and the final result is a weighted accumulation of the node values of all decision trees. The importance values of all features are then normalized and converted into importance percentages. Features with an importance value of 0 make no contribution to the classification prediction process and need to be removed. In addition, according to specific research needs and accuracy requirements, a cumulative importance threshold can be set manually by trial and error, and the features beyond the threshold can be removed. After that, a collinearity analysis of the remaining features was carried out by using the Pearson correlation coefficient, because even features that play an important role in classification may be strongly correlated with one another.
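The two reduction steps described above can be sketched as follows. This is a minimal, library-free illustration that assumes the raw importance values have already been produced by a GBDT; it is not the authors' implementation. The function names are hypothetical, and the rule of keeping the more important feature of a correlated pair is an assumption, since the text only says that one feature of the pair is removed. Feature columns are assumed to be non-constant.

```python
import math


def select_by_cumulative_importance(importances, threshold=0.99):
    """Keep the top-ranked features until their normalized importances
    reach the cumulative threshold; zero-importance features are dropped."""
    total = sum(importances.values())
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for name, value in ranked:
        if value <= 0 or cumulative >= threshold:
            break
        kept.append(name)
        cumulative += value / total
    return kept


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def drop_collinear(columns, ranked_names, threshold=0.98):
    """Walk features from most to least important and skip any feature whose
    absolute correlation with an already kept feature exceeds the threshold."""
    kept = []
    for name in ranked_names:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Hypothetical importances and feature columns, for illustration only
    importances = {"a": 50, "b": 30, "c": 19.5, "d": 0.5, "e": 0}
    columns = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.1], "c": [4, 1, 3, 2]}
    ranked = select_by_cumulative_importance(importances)
    print(ranked)
    print(drop_collinear(columns, ranked))
```

With the thresholds of 0.99 and 0.98 used as defaults here, the first step removes the tail of the importance ranking and the second removes near-duplicates among what remains.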

Convolutional Neural Network (CNN)
A CNN is a multilayer feedforward neural network specifically intended to handle large-scale image or sensory data in the form of multiple arrays by taking global and local stationary characteristics into consideration [45]. A CNN usually consists of multiple layers interconnected with each other via a group of learnable weights and biases [46]. Each layer takes small patches of the image that together cover the entire image to acquire distinct local- and global-scale features. These image patches are generalized by further convolutional layers and pooling/subsampling layers within the CNN framework until advanced features are obtained, fully connected, and classified [47]. In addition, there may be several feature maps after each convolution, and the convolution kernel weights within the same feature map are shared. With this setting, the network can learn different functions while keeping the number of parameters manageable. Furthermore, nonlinear activation functions (such as the sigmoid, hyperbolic tangent, and rectified linear unit) are applied after the convolutional layer to enhance non-linearity [48]. In this paper, the Keras library in the Python language was applied to develop the CNN algorithm [49].

Support Vector Machine (SVM)
SVM is a multi-variable non-linear machine learning method with the advantages of quick training convergence, high training efficiency, and good generalization performance [50]. SVMs are mainly constructed based on the structural risk minimization principle and statistical learning theory. Generally, the input variables are mapped from a relatively low-dimensional feature space to a higher-dimensional feature space through the kernel function of the SVM model to solve the problems of linear inseparability and data dimension calculation. The radial basis function (RBF) is selected as the kernel function of the SVM in this study, because of its high efficiency in nonlinear mapping [10]. Besides the selection of the kernel function, two other parameters need to be set appropriately in the SVM: the penalty term C and the kernel function parameter gamma. A cross-validation method, which was proposed by [51], is recommended for finding the optimal combination of these parameters. In this paper, the Sklearn library in Python language was used to develop the SVM algorithm [52].
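For reference, the RBF kernel is usually written as below, which is the standard form and also the parameterization used by scikit-learn. The symbols $\mathbf{x}$, $\mathbf{x}'$, and $\gamma$ are introduced here for illustration; $\gamma$ corresponds to the kernel function parameter gamma mentioned above.

```latex
K(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\gamma \,\lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\right),
\qquad \gamma > 0
```

A larger $\gamma$ makes the kernel decay faster with distance, so each training object influences a smaller neighborhood of the feature space, while the penalty term C controls how strongly misclassified training objects are penalized.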

Accuracy Assessment
To assess the performance of the proposed CNN-based OMM (CNNOMM) method, four standard evaluation metrics based on the confusion matrix are used. These evaluation metrics are overall accuracy (OA), Kappa coefficient (KC), producer accuracy (PA), and user accuracy (UA), and their values were compared with those of SVM. OA indicates the overall performance of the models. KC was used to evaluate the reliability of the models. PA shows how often real ground samples are correctly shown on the classified map, while UA indicates how often the class on the map will actually be present on the ground. Moreover, McNemar's test is also used to test whether there is a statistically significant difference between the CNNOMM method and the SVM-based OMM method. McNemar's test is a well-known statistical test of whether the proportions of two dichotomous variables are equal [53].
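The four metrics can be computed from a confusion matrix as in the following sketch. It assumes the convention that rows hold the reference (ground) classes and columns hold the mapped classes; the function name and the two-class example matrix are hypothetical and are not results from the paper.

```python
def accuracy_metrics(matrix):
    """Return (OA, Kappa, PA per class, UA per class) for a square confusion
    matrix whose rows are reference classes and columns are mapped classes."""
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[i][j] for i in range(n_classes)) for j in range(n_classes)]
    diagonal = [matrix[i][i] for i in range(n_classes)]

    overall = sum(diagonal) / total
    # Agreement expected by chance, from the row and column marginals
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (overall - expected) / (1.0 - expected)
    producer = [d / r for d, r in zip(diagonal, row_sums)]  # 1 - omission error
    user = [d / c for d, c in zip(diagonal, col_sums)]      # 1 - commission error
    return overall, kappa, producer, user


if __name__ == "__main__":
    # Hypothetical two-class confusion matrix
    oa, kc, pa, ua = accuracy_metrics([[40, 10], [5, 45]])
    print(oa, kc, pa, ua)
```

Under this convention a low PA for a class signals omission errors (reference objects mapped as something else), while a low UA signals commission errors (objects of other classes mapped into it).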

Results of Image Segmentation
The trial-and-error method is used to find the suitable CP, ShP, smoothness, and compactness parameters for the implementation of MSS using eCognition ® software. In order to keep the segmented image objects representing the contour information of a target ground object, the CP, ShP, smoothness, and compactness values were set to 0.9, 0.1, 0.5 and 0.5, respectively ( Table 2).
The optimal ScP parameter of each LUC was determined by using the MAR method. Figure 4 shows how the MAR values vary with the ScP; see also Table 2. Then, the GF-2 image of the study area was segmented into 48,277 objects by using eCognition ® software. Finally, a sample set with 8326 sample objects was created based on the centroid inclusion principle by combining the segmented objects with the existing OM and land use vectors. All the sample objects were divided into two parts in the ratio of 2:1: the TTS and the VS. The TTS is used to train and test the model, while the VS is used to validate the results. Then, the sample objects in the TTS were split into two groups, a training set and a testing set, at the ratio of 7:3. The details of the sample set are listed in Table 3.

Results of Feature Selection
The importance value of each feature was set to the average over 10 GBDT training runs to reduce the random error. Since there is no universal guideline for determining the threshold of the cumulative importance value, we set the threshold to 0.99 after several trial-and-error attempts to keep the most important features. Features that do not contribute to reaching 99% of the total importance value were eliminated. Figure 5 shows the overall cumulative importance curve. After the elimination of the least important features, the collinearity among the 51 remaining features was analyzed by using the Pearson correlation coefficient. For each feature pair whose absolute correlation was greater than the threshold value, one feature of the pair was removed. The trial-and-error method was also used to determine the threshold value of the Pearson correlation coefficient, which was set to 0.98 in this study to avoid eliminating too many features, and five features were removed. The heat map of these 51 features and the five eliminated features are shown in Figures 6 and 7. The five eliminated features are Mean_2, Ratio_2, Ratio vegetation index, and Mean Diff. to Neighbors_ (1,3). In total, the features employed in the final OMM were reduced from 58 to 46 (see Table 4).

CNN-Based Open-Pit Mine Mapping (CNNOMM)
A CNN network structure, including two convolutional layers, two pooling layers, two fully connected layers, and one dropout layer, is constructed by using the sequential model in Keras. One-dimensional convolution is used, and the network structure is shown in Figure 8. The number of convolution kernels (CKs) of the convolution layers C1 and C2 is set to 32 and 64, respectively, after several trial-and-error attempts. The size of the CKs in C1 is set to 6 × 1 with a step size of 1, while the size of the CKs in C2 is set to 3 × 1 with a step size of 1. The pooling layers P1 and P2 adopt the maximum pooling method to compress the features and retain the extracted ones. The size of both P1 and P2 is 2 × 1, and the step size of both is 2. The result of convolution and pooling is flattened and linearly mapped to the new feature space through the fully connected layer F1. The parameter of the Dropout layer is set to 0.5, which means that 50% of the connections between the nodes are randomly cut off during training. There are seven neurons in the fully connected layer F2, corresponding to the seven LUCs in this study, which are connected to the softmax classifier to achieve classification and serve as the output layer.
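The layer sizes above imply a chain of output lengths that can be traced with simple arithmetic. The sketch below does so under two assumptions that the text does not state: each object is fed to the network as a 46 × 1 sequence (the 46 selected features), and no padding is applied ("valid" convolution and pooling). The function names are illustrative.

```python
def conv1d_out(length: int, kernel: int, stride: int = 1) -> int:
    """Output length of an unpadded one-dimensional convolution."""
    return (length - kernel) // stride + 1


def pool1d_out(length: int, pool: int, stride: int) -> int:
    """Output length of an unpadded one-dimensional max pooling."""
    return (length - pool) // stride + 1


def trace_shapes(n_features: int = 46) -> dict:
    """Trace (length, channels) through C1, P1, C2, P2 and the flatten step."""
    c1 = conv1d_out(n_features, kernel=6, stride=1)   # 32 kernels of size 6
    p1 = pool1d_out(c1, pool=2, stride=2)
    c2 = conv1d_out(p1, kernel=3, stride=1)           # 64 kernels of size 3
    p2 = pool1d_out(c2, pool=2, stride=2)
    return {
        "C1": (c1, 32),
        "P1": (p1, 32),
        "C2": (c2, 64),
        "P2": (p2, 64),
        "flatten": p2 * 64,
    }


if __name__ == "__main__":
    for layer, shape in trace_shapes().items():
        print(layer, shape)
```

If the actual model uses padding or a different input layout, the intermediate lengths change accordingly, but the same two formulas still apply.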
After the CNN is constructed, the TTS is employed to train the model and verify the learning effect. The cross-entropy loss function [54] is selected to measure the differences between the predicted result and the reference value, while Adam is applied as the optimization function. Figure 9 shows the convergence curves of the CNN on the TTS. From Figure 9 we can see that by the 35th epoch the accuracy of the model on the test samples reaches 0.89. At the same time, the loss of the test samples also tends to be stable. Then, the well-trained CNN was applied to the total 48,227 objects; Figure 10 shows the CNN-based OMM results.
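For a softmax output layer, the cross-entropy loss for a single object is conventionally written as below. This is the standard categorical form, shown as a reminder rather than as the exact expression in [54]; the symbols $z_k$ (output of the k-th neuron of F2), $p_k$ (predicted class probability), and $y_k$ (one-hot reference label) are notation introduced here, with $K = 7$ LUCs.

```latex
p_k = \frac{\exp(z_k)}{\sum_{j=1}^{K} \exp(z_j)},
\qquad
L = -\sum_{k=1}^{K} y_k \log p_k
```

Because $y_k$ is 1 only for the reference class, the loss reduces to the negative log of the probability that the network assigns to the correct LUC.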

SVM-Based Open-Pit Mine Mapping (SVMOMM)
We used the same TTS as in the CNN model to train and validate the SVM model. Then, the 10-fold cross-validation method [50] was applied to identify the appropriate SVM model parameters. The cross-validation suggested setting the cost constant C and the kernel function parameter gamma to 10 and 0.045, respectively. Finally, the SVM-based OMM results are obtained (Figure 11). Figure 11 shows nearly the same performance as Figure 10, which means the SVMOMM model can also achieve satisfactory results. However, the omission phenomenon (OM misclassified as WDA) still exists and is more serious than in the CNNOMM model. We discuss this further in Section 4.1.

Assessment of Mapping Results
The assessment of the OMM results is based on the VS. The confusion matrix and the accuracy assessment of the two methods are listed in Tables 5 and 6, respectively. In general, CNNOMM was more effective in classifying all seven LUCs, with higher PA and UA values, which means that CNNOMM mapped a greater number of samples correctly than SVMOMM. Among these seven LUCs, the PA values are higher than the UA values for OM, WDA, and ROA in both CNNOMM and SVMOMM, while BAS has a higher PA in CNNOMM but a lower PA in SVMOMM. For the other three LUCs, BUI, VEG, and WAT, both methods obtained a high UA. A higher PA means that more objects were misclassified into other LUCs, and a higher UA means that the number of mapped samples was less than the total number of samples. Figures 12 and 13 show the incorrect mapping results of CNNOMM and SVMOMM, respectively. These results are consistent with Table 6, as most of the wrongly mapped LUCs are around the mining area. Figure 14 shows the incorrect identifications of CNNOMM and SVMOMM together. The yellow color represents the LUCs that are incorrectly mapped by both CNNOMM and SVMOMM. Most of them belong to the WAT or OM category, which indicates that these two categories are easily confused with each other; the main reason is discussed in Section 4.1.

McNemar's Test
McNemar's test is performed on the pair of CNNOMM and SVMOMM models. Table 7 shows the paired classification models, the number of objects correctly classified by one model but misclassified by the other, as well as the corresponding chi-square value and p-value. It can be concluded that: (1) CNNOMM significantly outperformed SVMOMM, since the chi-square value was larger than 3.84 and the p-value was smaller than 0.05, which indicates that the mapping performance of the CNNOMM method is significantly improved; (2) CNNOMM and SVMOMM are significantly different.
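The test statistic can be reproduced from the two discordant counts, as in the sketch below. Whether the paper applied the continuity correction is not stated, so it is exposed as an option here; the function names and the counts in the example are hypothetical. The p-value uses the fact that, for one degree of freedom, the chi-square survival function equals erfc(sqrt(x / 2)).

```python
import math


def mcnemar_chi2(b: int, c: int, continuity: bool = True) -> float:
    """Chi-square statistic from the discordant counts: b objects correct only
    for model A, c objects correct only for model B."""
    if b + c == 0:
        raise ValueError("no discordant objects; the test is undefined")
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)  # continuity correction (optional assumption)
    return diff ** 2 / (b + c)


def chi2_p_value_1df(chi2: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    # Hypothetical discordant counts, not taken from Table 7
    stat = mcnemar_chi2(30, 10)
    print(stat, chi2_p_value_1df(stat))
    # The critical value 3.84 corresponds to p = 0.05
    print(chi2_p_value_1df(3.841458820694124))
```

A statistic above 3.84 therefore corresponds to rejecting, at the 5% level, the hypothesis that the two models have equal error proportions.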

Affecting Factors of Mapping Accuracy
It is significant that CNNOMM improves the mapping accuracy of all seven LUCs of the open-pit mine area. The accuracy of the five LUCs not related to mining activities, namely WAT, VEG, BUI, BAS, and ROA, is generally higher than the overall accuracy. The main reason is that these five types are common land use types whose image and texture features are more distinct, which makes them easy to distinguish. There are certain similarities between the land use types that are caused by mining activities (OM and WDA). Sometimes these two LUCs are not easily distinguished, because the materials that form the WDA are extracted from the OM. There are also differences within these two LUCs caused by the influence of topography, landforms, and human activities. This reflects, from one aspect, both the difficulty and the importance of mining information extraction.
There are some misclassifications between OM and WDA, for two main reasons. One is that the OM and WDA are closely related in the process of their formation: the waste gravel piles and dumps in the OM and WDA are all formed along with the OM, so they are naturally similar in spectrum and texture. The other reason is that, as the country has vigorously managed the mining environment in recent years, a large number of OMs have been shut down and remediated. The lack of mining activity may result in muck piles of rocks inside the pits. This part of the OM darkens in the spectrum and its texture begins to appear patchy, so that over time it comes to resemble mine waste and is easily confused with it (Figure 15). Figure 15b shows the mapping results of CNNOMM. From Figure 15a, it is hard to accurately determine the type by visual interpretation. According to the field validation of the mapping results in 2018, it can be determined that the area is an open pit that has been abandoned for many years, resulting in confusion in type.

Uncertainties in OMM
Although the CNNOMM model achieves reasonable and promising results in the study area, there are still some uncertainties. Firstly, uncertainties are introduced into OMM in the image segmentation process, because the OMs and land use references cannot be subdivided into exact objects. Although the MAR method can optimize the scale values, which improves the segmentation results, errors are inevitable and will cause a decrease in the OMM accuracy [21]. Secondly, uncertainties may be introduced in the process of selecting features, because the importance of such features with the two methods is not the same. These will also decrease the mapping accuracy of OM, even though their importance can be calculated [22]. Last but not least, the OM inventory of the study area was not fully complete and consistent. This insufficiency may also lead to uncertainties, because some new OMs in the research area may not have been recognized in the existing OM inventory datasets.

Conclusions
In this paper, we proposed a hybrid framework of object-oriented open-pit mine mapping to classify the open-pit mines in Yuzhou City from 'GF-2' HSRSIs. The framework utilizes the greater generalization performance of CNN and the effective elimination of redundant features by a FIR tool.
In total, 58 features were extracted from GF-2 HSRSI from three object-feature domains. By using the FIR tools, which were GBDT and Pearson correlation coefficient, 46 features were selected to form the features subset. The final mapping results demonstrate that the proposed FIR tools can provide valid information for OOMM.
A sample set with 8326 sample objects was created by combining the segmented objects with the existing OM and land use vectors based on the centroid inclusion principle. Two thirds of the sample set formed the TTS, while the remaining one third was used as the VS. The TTS (combined with the features subset) was used to train and test the CNN and SVM models, and the OA and KC on the remaining VS were found to be 90.13% and 0.88 for CNN, and 84.01% and 0.81 for SVM, respectively. This indicates that the CNN outperformed the SVM in the proposed OOMM framework. Thus, it can be concluded that, based on the features obtained from 'GF-2' HSRSI, our feature information reduction method (combined with CNN) can provide an effective way for OOMM in Yuzhou City.