Automatic Identification and Dynamic Monitoring of Open-Pit Mines Based on Improved Mask R-CNN and Transfer Learning

Abstract: As the ecological problems caused by mine development become increasingly prominent, the conflict between mining activity and environmental protection is gradually intensifying, and how to effectively monitor mineral exploitation activities has become an urgent problem. In order to automatically identify and dynamically monitor open-pit mines in Hubei Province, an open-pit mine extraction model based on Improved Mask R-CNN (Region Convolutional Neural Network) and Transfer learning (IMRT) is proposed, a multi-source open-pit mine sample database consisting of Gaofen-1, Gaofen-2 and Google Earth satellite images with a resolution of two meters is constructed, and an automatic batch production process for open-pit mine targets is designed. In this paper, pixel-based and object-based evaluation indexes are used to compare the recognition performance of IMRT, Faster R-CNN, Maximum Likelihood (MLE) and Support Vector Machine (SVM). The IMRT model has the best performance in Pixel Accuracy (PA), Kappa and MissingAlarm, with values of 0.9718, 0.8251 and 0.0862, respectively, which shows that the IMRT model is more effective for the automatic identification of open-pit mines. The identification results are also used as evaluation units for the environmental damage of the mines. The evaluation results show that level I (serious) land occupation and destruction in key mining areas accounts for 34.62%, and that 36.2% of topographical landscape damage approached level I. This study has great practical significance in terms of realizing the coordinated development of mines and ecological environments.


Introduction
With the development of remote sensing technology, target recognition has been widely used. It has important application significance and research value in guiding social economic construction and mineral resource development [1,2]. As the land encroachment and vegetation destruction caused by mining development become more prominent, the contradiction between mining activity and environmental protection is gradually intensifying [3,4]. As a research hotspot in the field of remote sensing image processing, target identification can effectively identify and dynamically monitor mines through their rough texture and radiation intensity, so as to guide the environmental protection, management and ecological restoration of mines [5,6].
Throughout the history of rectification and standardization of mining operations in China, the mining management departments have attached great importance to illegal and destructive mining activity. However, the identification and monitoring of open-pit mines still mainly rely on traditional manual extraction methods [7,8]. The visual interpretation method needs abundant expert interpretation [...] dynamics of forest cover in the Mufu Mountain mining area, Zhang et al. [26] discussed the land cover changes and the landscape remodeling caused by long-term surface mining. It is of great significance in realizing the coordinated development of mines and the ecological environment.
Building on the above research, this paper proposes an Improved Mask R-CNN (Region Convolutional Neural Network) and Transfer learning (IMRT) model and constructs a multi-source mine sample database consisting of Gaofen-1, Gaofen-2 and Google Earth satellite images to automatically identify and dynamically monitor open-pit mines [27,28]. Meanwhile, a variety of pixel-based and object-based indicators are used to evaluate the open-pit mine identification results quantitatively and to verify the feasibility of the proposed method. The IMRT model can give full play to the small-target representation ability of low-level features, which improves the detection accuracy of open-pit mines without affecting the detection accuracy of conventional targets. Based on these results, multi-time-series monitoring of key mining areas in Hubei Province was finally carried out.

Study Area
The study area is situated in Hubei Province, which is located in central China at Longitude 108°21′~116°08′ E, Latitude 29°02′~33°17′ N [29]. The topography of the province fluctuates greatly, and the landforms are diverse. The province spans two first-order tectonic units, the Qinling fold system and the Yangtze paraplatform. Various types of magmatic rocks are widespread, and the metamorphic rocks are rich in mineral resources [30]. As shown in Figure 1, the iron, copper, gold and silver deposits in Hubei Province are mainly distributed in Proterozoic metamorphic rocks, Mesozoic magmatic rocks and their contact metamorphic zones in Daye, Ezhou, Huangshi, Yangxin, Zhushan and Yunxi. The phosphate mines are located in the Han River alluvial plain and in hilly regions such as Yicheng and Zhongxiang. Limestone and dolomite mines are distributed in Paleozoic sedimentary strata in Yichang, Wuchang, Jingmen, Xiangfan and Tongshan. Coal and pyrite mines mainly occur in the Paleozoic sedimentary rocks in Yichang, Enshi and Jianshi. The quarries are mainly distributed in Shiyan, Suizhou, Huanggang and other places. The experimental data all come from the key mining areas of remote sensing interpretation in Hubei Province, which cover about 5500 km², excluding Xiantao, Tianmen and Qianjiang.

Improved Mask R-CNN
As Figure 2 shows, Mask R-CNN builds on the Faster R-CNN framework and adds a segmentation branch that runs in parallel with target detection and border regression [31,32]. In this model, a feature pyramid network (FPN) combined with ResNet101 serves as the backbone network for feature extraction. After feature extraction, the model uses the Region Proposal Network (RPN) to carry out end-to-end training of the bounding boxes of open-pit mine targets. Dilated convolution is also involved in the feature calculation, and RoI (Region of Interest) Align is finally used to solve the problem of large deviation from the original image [33].

Convolutional Backbone and Dilated Convolution
The convolutional backbone is a series of convolutional layers used to extract feature maps of open-pit mines, and it mainly applies the structure of ResNet101. The ResNet101 network is a residual network proposed by four scholars from Microsoft Research, and its internal residual blocks use skip connections to alleviate the vanishing-gradient problem caused by increasing network depth. The desired mapping is designed as H(x) = F(x) + x, which can also be converted to a residual function F(x) = H(x) − x. When F(x) = 0, this formula constitutes an identity map H(x) = x, so the residual is easier to fit than the full mapping [34]. As Figure 3 shows, Mask R-CNN divides the ResNet101 network into five stages, denoted C1, C2, C3, C4 and C5. The five stages output feature maps at five scales, which are used to build the feature pyramid of the FPN.
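The residual formulation H(x) = F(x) + x can be illustrated with a few lines of code. The following is only a minimal sketch in plain Python: the function names and the toy residual functions are illustrative assumptions, not part of ResNet101 or of the model described here. It shows that when the residual branch outputs zero, the block reduces to the identity map.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, residual_fn: Callable[[Vector], Vector]) -> Vector:
    """Return H(x) = F(x) + x, where residual_fn plays the role of F."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("F(x) must have the same shape as x for the skip connection")
    # The skip connection adds the unchanged input to the residual branch.
    return [f + xi for f, xi in zip(fx, x)]


def zero_residual(x: Vector) -> Vector:
    """Toy residual branch with F(x) = 0 (hypothetical, for illustration)."""
    return [0.0 for _ in x]


def small_residual(x: Vector) -> Vector:
    """Toy residual branch with F(x) = 0.1 * x (hypothetical, for illustration)."""
    return [0.1 * xi for xi in x]


x = [1.0, -2.0, 3.0]
identity_out = residual_block(x, zero_residual)    # equals x, since F(x) = 0
perturbed_out = residual_block(x, small_residual)  # x plus a small correction
```

In a real network the residual branch is a stack of convolution, normalization and activation layers; the sketch only isolates the role of the skip connection.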
Local receptive fields are a very important concept in the Convolutional Neural Network (CNN). When a CNN performs instance segmentation, the final feature map is much smaller than the input image, so the final predicted segmentation mask is coarse. Dilated convolution, however, controls the rate of the convolution kernel and thereby obtains different receptive fields [35]. It therefore resolves the contradiction between enlarging the receptive field and maintaining the size of the feature map in a CNN. Figure 4a shows the local receptive field of a traditional 3 × 3 convolution kernel, which is the same as that of a 3 × 3 dilated convolution kernel with a rate of 1. In Figure 4b, when the dilated convolution kernel has a rate of 2, the local receptive field increases to 7 × 7 when the kernel is stacked on the rate-1 layer; the rate-2 kernel on its own spans 5 × 5. In this paper, a dilated convolution kernel with a rate of 2 is added to the structure of the feature pyramid, and the dilated convolution operation is carried out on the output features in each pyramid stage. As a result, the accuracy of mask prediction can be effectively improved in the pixel-level category prediction stage.
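The receptive-field arithmetic for dilated kernels can be checked with a short calculation. The sketch below assumes stride-1 convolutions and uses illustrative function names that do not come from the paper. It computes the extent of a single dilated kernel, k + (k − 1)(r − 1), and the receptive field of a stack of layers, which grows by (k − 1)·r per layer.

```python
from typing import Iterable, Tuple


def effective_kernel_size(kernel_size: int, rate: int) -> int:
    """Spatial extent of one dilated kernel: k + (k - 1) * (rate - 1)."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def stacked_receptive_field(layers: Iterable[Tuple[int, int]]) -> int:
    """Receptive field of stacked stride-1 convolutions.

    `layers` is a sequence of (kernel_size, rate) pairs, ordered from the
    input towards the output. Stride 1 is assumed throughout.
    """
    field = 1
    for kernel_size, rate in layers:
        field += (kernel_size - 1) * rate
    return field


single_rate1 = effective_kernel_size(3, 1)            # 3 x 3 kernel, rate 1
single_rate2 = effective_kernel_size(3, 2)            # 3 x 3 kernel, rate 2
stacked = stacked_receptive_field([(3, 1), (3, 2)])   # rate-1 layer, then rate-2 layer
```

Under these assumptions a single 3 × 3 kernel with a rate of 2 covers a 5 × 5 window, and applying it on top of a rate-1 3 × 3 layer gives a 7 × 7 receptive field without reducing the feature-map size.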

RPN Framework and RoI Align
RPN is a fully convolutional network (FCN). It carries out end-to-end training of the bounding boxes of open-pit mine targets by adding classification and regression convolutional layers on top of the CNN. The framework lays anchors with different aspect ratios on the original image and, at the same time, generates candidate boxes that match open-pit mine targets of various scales and delineate their boundaries. RPN first traverses the CNN feature map output with a 3 × 3 sliding window. At each position, the point in the pixel space of the open-pit mine image onto which the center of the sliding window maps is the anchor [36,37]. With its instance segmentation branch, Mask R-CNN establishes an m × m binary mask to distinguish the foreground and background of each target object. The feature pyramid consists of P1, P2, P3, P4 and P5. Mask R-CNN abandons the P1 features of stage 1 and downsamples the stage 5 features (P5) to obtain P6. The RPN generates anchor boxes on the five feature maps of different scales (P2, P3, P4, P5 and P6), and each preset proposal gives the size and coordinates of the corresponding area in the open-pit mine image. In this way, the five feature maps are input into the RPN to generate RoIs. Because their strides differ, RoI Align is performed on the four feature maps (P2, P3, P4 and P5) with the corresponding stride. RoI Align aims to solve the problem of large deviation from the original image that is caused by quantization when candidate regions are formed [38]. The RoI Align outputs are concatenated, and the network is then divided into three parts: a fully connected branch that predicts the class, a fully connected branch that predicts the rectangular box, and a fully convolutional branch that predicts the pixel segmentation. These three parts yield the class, the target border and the mask of open-pit mines, respectively.
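To make the role of RoI Align more concrete, the following is a minimal sketch in plain Python of pooling a region with fractional coordinates by bilinear interpolation instead of rounding them to integer cells. The function names, the 2 × 2 output size and the choice of four regularly spaced sampling points per bin averaged together are assumptions made for illustration; they are not taken from the implementation used in this study.

```python
import math
from typing import List, Tuple

FeatureMap = List[List[float]]


def bilinear_sample(feature: FeatureMap, y: float, x: float) -> float:
    """Sample a feature map at a fractional (y, x) position."""
    height, width = len(feature), len(feature[0])
    y = min(max(y, 0.0), height - 1.0)
    x = min(max(x, 0.0), width - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    wy, wx = y - y0, x - x0
    top = feature[y0][x0] * (1 - wx) + feature[y0][x1] * wx
    bottom = feature[y1][x0] * (1 - wx) + feature[y1][x1] * wx
    return top * (1 - wy) + bottom * wy


def roi_align(
    feature: FeatureMap,
    box: Tuple[float, float, float, float],
    output_size: int = 2,
    samples_per_bin: int = 2,
) -> List[List[float]]:
    """Pool a box (y1, x1, y2, x2), given in feature-map coordinates.

    The box is split into output_size x output_size bins without rounding,
    and each bin is the average of regularly spaced bilinear samples.
    """
    y1, x1, y2, x2 = box
    bin_h = (y2 - y1) / output_size
    bin_w = (x2 - x1) / output_size
    pooled = []
    for i in range(output_size):
        row = []
        for j in range(output_size):
            total = 0.0
            for sy in range(samples_per_bin):
                for sx in range(samples_per_bin):
                    yy = y1 + (i + (sy + 0.5) / samples_per_bin) * bin_h
                    xx = x1 + (j + (sx + 0.5) / samples_per_bin) * bin_w
                    total += bilinear_sample(feature, yy, xx)
            row.append(total / samples_per_bin ** 2)
        pooled.append(row)
    return pooled


# Toy feature map whose value equals the column index, so results are easy to check.
ramp = [[float(col) for col in range(4)] for _ in range(4)]
aligned = roi_align(ramp, (0.5, 0.5, 2.5, 2.5))
```

Because the box edges at 0.5 and 2.5 are never rounded, the pooled values follow the underlying feature exactly on this linear ramp, whereas a quantized pooling step would shift the region by up to half a cell.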

Border Regression and Loss Function
For a given border $(P_x, P_y, P_w, P_h)$, target border regression is utilized to obtain the final regression border $(F_x, F_y, F_w, F_h)$ and make it closer to the real border $(G_x, G_y, G_w, G_h)$. That is, we need to find a mapping $f$ such that

$$f(P_x, P_y, P_w, P_h) = (F_x, F_y, F_w, F_h) \approx (G_x, G_y, G_w, G_h),$$

where the subscripts $x$, $y$, $w$ and $h$ represent the horizontal coordinate of the center point, the vertical coordinate of the center point, the width and the height of each of the three borders, respectively. The border regression learns these transformations, and the translation $(t_x, t_y)$ and scale zooming $(t_w, t_h)$ are calculated from the parameters of the real border and the given border:

$$t_x = \frac{G_x - P_x}{P_w}, \quad t_y = \frac{G_y - P_y}{P_h}, \quad t_w = \log\frac{G_w}{P_w}, \quad t_h = \log\frac{G_h}{P_h}.$$

The objective function is expressed as $F_*(P) = d_*^{T}\phi_5(P)$, where $d_*$ is the parameter to learn ($*$ represents $x$, $y$, $w$, $h$), $\phi_5(P)$ is the eigenvector of the predicted border, and $F_*(P)$ is the regression value obtained. The goal is to minimize the difference between the regression value and the true value $t_* = (t_x, t_y, t_w, t_h)$; the regression loss $Loss_{reg}$ measures this difference. A discrete probability distribution $p$ is used to represent the probability of the target and the background of the open-pit mines, in which the real border is labeled as

$$p^* = \begin{cases} 0, & \text{negative label} \\ 1, & \text{positive label} \end{cases}$$

and the classification loss $Loss_{cls}$ is obtained from $p$ and $p^*$. The mean value of the sigmoid function was calculated for each pixel of the mask, which was defined as the average binary cross entropy loss function $Loss_{mask}$. This method can effectively improve the effect of instance segmentation. Therefore, the loss function of Mask R-CNN consists of three loss functions: classification loss, regression loss and segmentation loss [39]. The total loss function is as follows:

$$Loss = Loss_{cls} + Loss_{reg} + Loss_{mask}.$$
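The border regression targets and the three loss terms can be sketched in a few lines of Python. This is a hedged illustration rather than the exact formulation used in this study: it assumes the standard R-CNN parameterization of the translation and scale targets, a sum of squared differences for the regression loss, a two-class log loss for the classification loss, and an unweighted sum for the total loss. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (center x, center y, width, height)


def encode_targets(proposal: Box, ground_truth: Box) -> Box:
    """Translation (t_x, t_y) and log-scale (t_w, t_h) targets from P and G."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = ground_truth
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))


def decode_box(proposal: Box, deltas: Box) -> Box:
    """Apply predicted (t_x, t_y, t_w, t_h) to a proposal to get the final border."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    return (px + pw * tx, py + ph * ty, pw * math.exp(tw), ph * math.exp(th))


def regression_loss(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Sum of squared differences between regression values and targets (assumed form)."""
    return sum((t - p) ** 2 for p, t in zip(predicted, target))


def binary_cross_entropy(p: float, label: int, eps: float = 1e-12) -> float:
    """Log loss for one probability p and a 0/1 label."""
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def mask_loss(mask_probs: List[List[float]], mask_labels: List[List[int]]) -> float:
    """Average per-pixel binary cross entropy over an m x m mask."""
    losses = [
        binary_cross_entropy(p, y)
        for prob_row, label_row in zip(mask_probs, mask_labels)
        for p, y in zip(prob_row, label_row)
    ]
    return sum(losses) / len(losses)


def total_loss(cls_loss: float, reg_loss: float, seg_loss: float) -> float:
    """Unweighted sum of classification, regression and segmentation losses."""
    return cls_loss + reg_loss + seg_loss


proposal = (50.0, 50.0, 20.0, 10.0)
ground_truth = (60.0, 55.0, 40.0, 10.0)
targets = encode_targets(proposal, ground_truth)
recovered = decode_box(proposal, targets)  # reproduces the ground-truth border
```

Encoding the targets relative to the proposal's width and height makes them scale-invariant, which is why the same regressor can serve proposals of very different sizes.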

Transfer Learning
The essence of transfer learning is the transfer and reuse of knowledge. The existing knowledge is called the source domain, while the new knowledge to be learned is called the target domain [40]. By definition, transfer learning can be divided into three types: distributed differential transfer learning, characteristic differential transfer learning and tag differential transfer learning. Distributed differential transfer learning refers to a difference in the marginal distribution or conditional probability distribution between the source domain and the target domain. Characteristic differential transfer learning refers to a difference in the feature space between the source domain and the target domain. Tag differential transfer learning refers to a difference in the tag space between the source domain and the target domain. Generally, the target domain differs from the source domain in terms of data distribution, characteristic dimensions and model output change conditions [41].
Mathematically, transfer learning contains two elements: domains and tasks [42]. A domain $U$ contains two elements: the sample feature space $\mathcal{X}$ and the probability distribution $p(x)$ of the population $x$, where the sample set $X = (x_1, x_2, \ldots, x_n) \in \mathcal{X}$. For a domain $U = \{\mathcal{X}, p(x)\}$, we define the task $\Gamma$ on $U$ as containing two elements: the label space $\mathcal{Y}$ and the function $f$, where $\mathcal{Y}$ is a discrete random variable with uniform distribution. The set $Y = (y_1, y_2, \ldots, y_n) \in \mathcal{Y}$ is the sample of the population, and $\mathcal{Y}$ is called the label space of $Y$, where $f: \mathcal{X} \rightarrow \mathcal{Y}$ can be obtained from the training samples $\{x_i, y_i\}$, $x_i \in X$, $y_i \in Y$. From the view of probability, $f$ can be thought of as the conditional probability $p(y|x)$, and the task can be represented as $\Gamma = \{\mathcal{Y}, p(y|x)\}$. Two tasks are different if and only if they differ in at least one of the label space $\mathcal{Y}$ and the conditional probability $p(y|x)$. When the source domain-labeled data $n_S$ and the target domain non-labeled data $n_T$ are fixed, we can finally minimize the binary loss function to improve the learning effect of the target task. In the absence of target domain calibration data, knowledge transfer can be accomplished by reducing the distribution difference between the source domain and the target domain [43].
Because manually labeled open-pit mine samples are scarce, the Mask R-CNN network should first be pre-trained on the source dataset to prevent the model from overfitting. The label spaces of the source domain (a dataset on which the model has already been trained) and the target domain (the untrained open-pit mine dataset) are different, so the transfer learning of open-pit mines belongs to tag differential transfer learning. Thus, we must make full use of the Mask R-CNN trained in the source domain to guide target recognition on the new open-pit mine dataset [44]. Mask R-CNN already has trained weights for the automatic identification of about 80 categories, such as aircraft and pedestrians. By selecting pre-trained ResNet101 to initialize the model, the transfer values of open-pit mines were saved in a dataset (source domain). As Figure 5 shows, the pre-trained process generated 1024 features through the fully connected layer, and 80 features were finally generated in the Softmax layer. On the basis of the source domain weights, we used transfer learning to find the category with the closest characteristics to open-pit mines, and the generalization performance of the model was greatly improved.
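The idea of reusing source-domain weights while the label space changes can be sketched as follows. This is only an illustration in plain Python under stated assumptions: the layer names (`shared_fc`, `class_logits`), the toy matrix sizes and the choice of two target classes (background and open-pit mine) are hypothetical, and a real implementation would load and fine-tune the weights inside a deep learning framework. The sketch copies every layer that does not depend on the number of classes and re-initializes the class-dependent output layer.

```python
import random
from typing import Dict, List

Matrix = List[List[float]]
Weights = Dict[str, Matrix]

# Hypothetical names of layers whose output size depends on the number of classes.
CLASS_DEPENDENT_LAYERS = ("class_logits",)


def init_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random initialization for a rows x cols weight matrix."""
    return [[rng.gauss(0.0, 0.01) for _ in range(cols)] for _ in range(rows)]


def transfer_weights(source: Weights, num_target_classes: int, seed: int = 0) -> Weights:
    """Copy source-domain weights and rebuild the class-dependent output layer."""
    rng = random.Random(seed)
    target: Weights = {}
    for name, matrix in source.items():
        if name in CLASS_DEPENDENT_LAYERS:
            # The label space differs, so this layer is re-initialized with
            # one row per target class and the same number of input features.
            in_features = len(matrix[0])
            target[name] = init_matrix(num_target_classes, in_features, rng)
        else:
            # Shared feature layers are reused as the starting point for fine-tuning.
            target[name] = [row[:] for row in matrix]
    return target


rng = random.Random(42)
source_weights: Weights = {
    "shared_fc": init_matrix(1024, 16, rng),     # toy layer producing 1024 features
    "class_logits": init_matrix(80, 1024, rng),  # 80 source-domain categories
}
target_weights = transfer_weights(source_weights, num_target_classes=2)
```

After this step only the new output layer starts from random values, so far fewer labeled open-pit mine samples are needed than when training the whole network from scratch.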

Data Source and Identification Index
Data sources include image data and auxiliary data. Image data are mainly used for extraction

Experiments
. For a domain U = {Ӽ, φ(Ӽ)}, we defined the task Г on U as containing two elements: the label space У and the function f, where У is a discrete random variable with uniform distribution. The set Y = (y 1 ,y 2 , … ,y n ) ∈ У is the sample of the population, and У is called the label space of Y, where f: Ӽ У can be obtained from the training sample {x i ,y i }, x i ∈ X,y i ∈ Y. From the view of probability, f can be thought of as the conditional probability p(y|x), and the task can be represented as Г = {У, p(y|x)}. The two tasks are different, if and only if the label space У has at least one difference when compared to conditional probability p(y|x). When the source domain-labeled data n S and target domain non-labeled data n T are fixed, we can finally minimize the binary loss function to improve the learning effect of the target task. In the absence of the target domain calibration data, knowledge transfer can be accomplished by reducing the distribution difference between the source domain and the target domain [43].
Due to the small number of open-pit mine manual labeling target datasets, the Mask R-CNN network should be pre-trained on the dataset firstly to prevent the model from overfitting. It can be seen that the label space of the source domain (a dataset has already been trained) and target domain (untrained open-pit mines dataset) are different, and the transfer learning of open-pit mines belongs to tag-differential transfer learning. Thus, we must make full use of the Mask R-CNN in the source domain to guide the target recognition of the new open-pit mine dataset [44]. Mask R-CNN has already trained the weights for automatic identification of about 80 categories, such as aircraft and pedestrians. By selecting pre-trained ResNet101 to initialize the model, the transfer values of openpit mines were saved in a dataset (source domain). As the Figure 5 shows, the pre-trained process generated 1024 features through the fully connected layer, and in the Softmax layer, 80 features were finally generated. On the basis of source domain weight, we used transfer learning to find the category with the closest characteristics of open-pit mines, and the generalization performance of the model was greatly improved.
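Since the label spaces of the source and target domains differ, a common way to reuse source domain weights is to copy every layer whose shape is unchanged and to re-initialize only the class-dependent output layers. The following is a minimal sketch of that bookkeeping, not the authors' implementation: the layer names, the weight shapes and the two-class target head (open-pit mine plus background) are hypothetical, and only the 1024 fully connected features and the 80 source outputs come from the text.

```python
# Hypothetical sketch: decide which pre-trained layers can be reused when the
# number of output classes changes between source and target domain.

SOURCE_CLASSES = 80  # outputs of the source-domain Softmax layer (from the text)
TARGET_CLASSES = 2   # assumption: open-pit mine + background


def build_shapes(num_classes):
    """Return a toy map of layer name -> weight shape (all names are made up)."""
    return {
        "backbone_resnet101": (2048, 1024),       # class-independent features
        "fc_1024": (1024, 1024),                  # 1024-feature fully connected layer
        "class_logits": (1024, num_classes),      # feeds the Softmax layer
        "bbox_deltas": (1024, num_classes * 4),   # per-class box regression
        "mask_output": (256, num_classes),        # per-class mask prediction
    }


def plan_transfer(source_shapes, target_shapes):
    """Split target layers into those copied from the source and those re-initialized."""
    copied, reinitialized = [], []
    for name, shape in target_shapes.items():
        if source_shapes.get(name) == shape:
            copied.append(name)          # same shape: reuse source-domain weights
        else:
            reinitialized.append(name)   # shape depends on label space: train anew
    return copied, reinitialized


copied, reinitialized = plan_transfer(
    build_shapes(SOURCE_CLASSES), build_shapes(TARGET_CLASSES)
)
print("copied from source domain:", copied)
print("re-initialized for open-pit mines:", reinitialized)
```

In this sketch only the three class-dependent layers are trained from scratch, which is why a small labeled target dataset can be sufficient.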

Experiment

Experiment Data

Data Source and Identification Index
Data sources include image data and auxiliary data. Image data are mainly used for the extraction and verification of open-pit mines, while auxiliary data are mainly used to show the spatial distribution characteristics of open-pit mines. We used Gaofen-1 and Gaofen-2 satellite images to monitor the geological environment of open-pit mines, and for areas that were not covered or had unclear images, we supplemented the image data with Google Earth satellite images of the same period. The satellite images were acquired from January to August 2019. All the satellite images were preprocessed by radiometric calibration, atmospheric correction, orthorectification and image fusion. According to the distribution of open-pit mines and mineral resources in Hubei Province, the identification indexes of the main open-pit mines were established from the mining degree and the spatial texture characteristics of the surface [45]. The interpretation marks established for the open-pit mines were verified by field investigation and are summarized in Figure 6. The surface of an open-pit mine is mainly gray or gray-black with a simple texture and a blocky shape, which contrasts strongly with the background forest vegetation, which has a rough texture and a reddish-brown reflection.

Sample Database
The sample database of IMRT is mainly divided into user samples and model samples. User samples are mainly extracted from field investigation or visual interpretation of remote sensing images and are the basis of the model samples. Based on the user samples, remote sensing images are converted through a series of data processing steps into the sample format required for model training and testing. The experimental source domain data come from the COCO dataset, which contains 80 labeled object categories. The target domain data are the non-labeled open-pit mines

Training Environment and Function Analysis
This experiment was conducted on a 64-bit Windows operating system with 16 GB of RAM, a quad-core 9th-generation Intel Core i5 CPU and a GeForce GTX 1650 Ti graphics card. The model runs on Python 3.5.6, and the framework is TensorFlow. The training of the IMRT model mainly depends on libraries such as TensorFlow, Keras, OpenCV and PIL. TensorFlow uses a data flow graph to design the computational flow, which enables users to train large-scale neural networks with parallel operations. Keras is a high-level library for rapidly prototyping deep learning models, and it extends TensorFlow well. PIL mainly obtains the color of the pictures by comparing them with the color library, whereas OpenCV recognizes colors by strictly distinguishing the HSV (Hue, Saturation, Value) components of the picture. These libraries provide a good foundation for obtaining the low-level and high-level features of open-pit mine images.
The sample size was set to 600 × 600 pixels, and the true-color images (experimental samples) and the false-color images (reference samples) were trained separately. There were 600 experimental samples and 400 reference samples, and both were trained in batches. With an initial learning rate of $10^{-3}$ [48], the network had a total of 100,000 training epochs. During the experiment, the COCO dataset divided all the data into a training set and a test set, and the validation set was included in the training set.
By holding the batch size constant, we analyzed how the training and validation accuracy changed as the ratio of the training set to the validation set was adjusted. We then held this ratio constant to obtain the influence of batch size on the training and validation accuracy. As shown in Figure 8, the validation accuracy of the true-color images with a ratio of 70%:30% was lower than that with a ratio of 80%:20%; however, the training accuracy of the true-color images and both the training and validation accuracies of the false-color images were higher than with the other ratios. When the ratio of the training set to the validation set remains unchanged and the batch size increases, the accuracy of the true-color and false-color images shows an overall trend of increasing and then stabilizing. As shown in Figure 9, a batch size of 1000 has an advantage. Therefore, a training-to-validation ratio of 70%:30% and a batch size of 1000 were used for open-pit mine identification.
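The ratio experiment described above can be sketched as a reproducible split of the sample identifiers. This is only an illustration under assumptions: the helper name `split_samples`, the fixed random seed and the use of integer sample identifiers are hypothetical, while the 600 true-color samples and the 70%:30% and 80%:20% ratios come from the text.

```python
import random


def split_samples(sample_ids, train_fraction, seed=0):
    """Shuffle sample ids reproducibly and split them into training and validation sets."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)             # fixed seed keeps the split repeatable
    n_train = round(len(ids) * train_fraction)   # e.g. 0.7 * 600 -> 420
    return ids[:n_train], ids[n_train:]


true_color_ids = range(600)  # 600 true-color (experimental) samples, as in the text

for fraction in (0.7, 0.8):
    train_ids, val_ids = split_samples(true_color_ids, fraction)
    print(f"ratio {fraction:.0%}: {len(train_ids)} training, {len(val_ids)} validation")
```

Keeping the seed fixed while changing only the ratio (or only the batch size) matches the one-factor-at-a-time design of the experiment.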

Accuracy Evaluation
The evaluation of open-pit mine extraction accuracy is mainly carried out in two ways: pixel-based evaluation and object-based evaluation. The pixel-based evaluation methods evaluate the extracted pixels and aim to reflect the consistency of the extracted results in geometric accuracy and shape similarity. The object-based evaluation methods can avoid the pixel bias error. With the purpose of analyzing the potential causes of extraction errors, the evaluation results can be correlated with the parameters by quantifying the number of extracted targets. In order to solve the problems such as a fuzzy boundary or complex internal structure, in this paper, we used the two accuracy evaluation methods to evaluate the extraction results.
The pixel-based evaluation indexes are pixel accuracy (PA), the comprehensive evaluation index (F1) and the Kappa coefficient [49]. PA is the proportion of predicted pixel values that match the real pixel values. The higher the PA, the higher the matching degree between the predicted value and the real value. F1 treats the extraction as a binary classification of open-pit mine targets and background. The Kappa coefficient represents the degree of coincidence between the classified image and the reference image, and it is an objective standard for testing their consistency.
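The Kappa coefficient is computed as $\mathrm{Kappa} = (p_o - p_e) / (1 - p_e)$, where $p_o$ is the number of correctly classified samples divided by the total number of samples (the observed agreement), and $p_e$ is the agreement expected by chance, computed from the row and column totals of the confusion matrix.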
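The three pixel-based indexes can be illustrated with a small sketch on binary masks (1 for open-pit mine, 0 for background). It assumes the standard definitions of pixel accuracy, F1 and Cohen's Kappa, with $p_e$ taken as the chance agreement; the function names and the toy masks are hypothetical and are not taken from the paper.

```python
def confusion_counts(pred, truth):
    """Count true/false positives and negatives for two flat binary masks."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def pixel_metrics(pred, truth):
    """Return PA, F1 and Kappa for a predicted mask against a reference mask."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    n = tp + fp + fn + tn
    pa = (tp + tn) / n                                   # observed agreement p_o
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Chance agreement p_e from the row and column totals of the confusion matrix.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (pa - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return {"PA": pa, "F1": f1, "Kappa": kappa}


# Toy 8-pixel example (hypothetical values).
predicted = [1, 1, 0, 0, 1, 0, 0, 0]
reference = [1, 0, 0, 0, 1, 1, 0, 0]
print(pixel_metrics(predicted, reference))
```

Because background pixels usually dominate a scene, PA can be high even for a poor extraction, which is why F1 and Kappa are reported alongside it.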
In terms of object-based evaluation, Precision, Recall, FalseAlarm and MissingAlarm [50–52] were used as the evaluation indexes.

Discussion
Figure 10 compares the intelligent extraction model with traditional extraction methods for multi-source remote sensing images. The extraction results of SVM and MLE are more complete on the whole, but these traditional methods are strongly affected by surface objects with high reflectivity, such as bare ground and roads, which they easily misclassify. Faster R-CNN can accurately locate and classify each open-pit mine target, but it lacks boundary mask information. Compared with these three methods, the IMRT model performs better in boundary location precision, fragmentation degree and the integrity of the extracted boundary. The accuracy evaluation methods described above were used to quantify the extraction results of the IMRT, Faster R-CNN, SVM and MLE models. The pixel-based accuracy evaluation results are shown in Table 1, and the object-based accuracy evaluation results are shown in Table 2. The comparison results are as follows:

Open-Pit Mine Identification Results
In true-color images, SVM and MLE extracted as many pixels as possible that matched the supervised classification characteristics of open-pit mines, but there were also obvious errors in the extraction results. In Figure 10A, the two traditional extraction methods mistakenly classified the marginal road as an open-pit mine. As shown in Figure 10C, farmland and other land types were classified as open-pit mines. Owing to the influence of spectral characteristics, the open-pit mine extraction of the MLE classification was severely fragmented and low in accuracy. All methods performed well on the false-color images. However, the deep learning model can extract the complete boundary information of block regions. To some extent, the undeveloped areas in the center of open-pit mines also participated in the overall target extraction, so the traditional model performed a little better than the deep learning method, as shown in Figure 10B. Faster R-CNN had a relatively good identification ability for open-pit mines, but for large open-pit mines, the area selected by the identification box was too large. As shown in Figure 10D, when many targets are in the area, it is difficult to accurately locate the open-pit mine targets in the image. Compared with the traditional extraction methods, the IMRT model was able to extract the complete structures and obvious edge features of small open-pit mines, and it did not cause omission or misclassification. As shown in Figure 10D, because the false-color image eliminates the influence of water vapor, the extraction effect with the false-color image is even better than with the true-color image. That is to say, the extraction results obtained by IMRT are more consistent with the real surface values.
In terms of pixel-based evaluation, SVM performs better than MLE among the two traditional extraction methods: its F1 is 0.7148 and its Kappa coefficient is 0.6943, but its PA is slightly lower than that of MLE, which is 0.9320. Between the two deep learning models, the evaluation indexes of the IMRT model are higher than those of the faster R-CNN model. Moreover, the lowest accuracy index of the deep learning methods is higher than the highest one of the traditional methods, indicating that deep learning has clear advantages over the traditional methods in open-pit mine recognition. The IMRT model has the best performance in PA and Kappa coefficient, with values of 0.9718 and 0.8251, respectively.
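The pixel-based indexes discussed above can be illustrated with a small sketch. It assumes the standard confusion-matrix definitions of PA, F1 and Cohen's Kappa for a binary mask (1 = open-pit mine, 0 = background); the function name `pixel_metrics` and the toy masks are hypothetical and are not taken from the paper.

```python
import numpy as np

def pixel_metrics(pred, truth):
    """Return PA, F1 and Kappa for two binary masks (assumed standard definitions)."""
    pred = np.asarray(pred, dtype=bool).ravel()
    truth = np.asarray(truth, dtype=bool).ravel()

    # Confusion-matrix counts for the "open-pit mine" class.
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    tn = int(np.sum(~pred & ~truth))
    n = tp + fp + fn + tn

    # Pixel Accuracy: share of correctly labelled pixels.
    pa = (tp + tn) / n

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Kappa: observed agreement corrected for chance agreement pe.
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (pa - pe) / (1 - pe) if pe != 1 else 0.0

    return {"PA": pa, "F1": f1, "Kappa": kappa}

# Hypothetical 2 x 4 masks, for illustration only.
pred = [[1, 1, 0, 0], [0, 0, 0, 0]]
truth = [[1, 0, 1, 0], [0, 0, 0, 0]]
print(pixel_metrics(pred, truth))
```

Because background pixels usually dominate a scene, PA can remain high even when F1 and Kappa are noticeably lower, which is one possible reading of why the two traditional methods rank differently on PA than on F1 and Kappa.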

To sum up, for the identification of open-pit mines in multi-source remote sensing images, the deep learning identification methods are better than the traditional target identification methods in precision and effect. Specifically, the faster R-CNN model achieves the best Precision, F1 and FalseAlarm, whereas IMRT is better in PA, Kappa coefficient, Recall and MissingAlarm, which shows that both the IMRT model and the faster R-CNN model are effective for the target identification of open-pit mines. Compared with faster R-CNN, IMRT performs better on MissingAlarm, indicating that this model omits fewer targets during identification. Thus, IMRT is more suitable for the identification of open-pit mines in multi-source remote sensing images.
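The object-based indexes compared in this summary can be sketched as follows. The definitions used here are an assumption, since the paper's own formulas are not reproduced in this section: FalseAlarm is taken as the share of detected objects that are wrong (1 − Precision) and MissingAlarm as the share of real objects that are missed (1 − Recall). The function name `object_metrics` and the counts are hypothetical.

```python
def object_metrics(tp, fp, fn):
    """Object-based indexes from detection counts (assumed definitions).

    tp: correctly detected open-pit mines
    fp: detections that are not open-pit mines
    fn: open-pit mines that were not detected
    """
    detected = tp + fp   # all objects reported by the model
    actual = tp + fn     # all objects in the reference data
    return {
        "Precision": tp / detected if detected else 0.0,
        "Recall": tp / actual if actual else 0.0,
        # Assumed: FalseAlarm = FP / (TP + FP) = 1 - Precision.
        "FalseAlarm": fp / detected if detected else 0.0,
        # Assumed: MissingAlarm = FN / (TP + FN) = 1 - Recall.
        "MissingAlarm": fn / actual if actual else 0.0,
    }

# Hypothetical counts, for illustration only.
print(object_metrics(tp=9, fp=3, fn=1))
```

Under these assumed definitions, a model with the best Precision also has the best FalseAlarm, and a model with the best Recall also has the best MissingAlarm, which matches the pairing of results reported for faster R-CNN and IMRT.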

Open-Pit Mine Dynamic Monitoring
The

Assessment of Mine Environment Damages
The characteristics and intensity of mines' geological environment problems are closely related to mines' geological environment background and topographical landscape damages. Therefore, the damages rate of the topographical landscape (DROTL) is an important index of the impact on mines' geological environments. In this paper, three levels are used to evaluate the degree of impact on topographical landscapes: level I corresponds to a DROTL above 40%, level II to a DROTL between 20% and 40%, and level III to a DROTL below 20%. The evaluation formula is as follows:

$$\mathrm{DROTL} = \frac{U_k}{U_m} \times 100\%$$

where $U_k$ is the topographical landscape damaged area and $U_m$ is the open-pit mining area. The damages of the topographical landscape are evaluated with the recognition results of IMRT and the remote sensing interpretation results.

As shown in Figure 13, Shennongjia has the smallest number of key mining areas, with only two. The DROTL level of all its key mining areas is III, showing that the damage in this area is relatively slight. Yichang has the largest number of key mining areas, with 24; their DROTL levels are mostly III, with only one at level I. This indicates that the mines in this region are being exploited reasonably. The topographical landscapes in Huangshi, Xianning and Jingmen are seriously damaged, with level I accounting for 75.0%, 68.8% and 53.8% of the topographical landscape damage in their regions, respectively. A total of 36.2% of topographical landscape damage in the key mining areas of Hubei Province reached level I, which shows that mineral exploitation has caused serious damage to the topographical landscape.
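The three-level grading described above can be sketched in a few lines. This is a minimal illustration that assumes DROTL is the damaged area divided by the open-pit mining area, expressed as a percentage, and that the boundary values 20% and 40% both belong to level II; the text does not state how the boundaries are handled. The function names and the sample areas are hypothetical.

```python
def drotl_percent(damaged_area, mining_area):
    """Damages rate of the topographical landscape, in percent (assumed ratio form)."""
    if mining_area <= 0:
        raise ValueError("mining_area must be positive")
    return 100.0 * damaged_area / mining_area

def drotl_level(rate_percent):
    """Map a DROTL percentage to level I, II or III.

    Assumption: 20% and 40% themselves are treated as level II.
    """
    if rate_percent > 40.0:
        return "I"
    if rate_percent >= 20.0:
        return "II"
    return "III"

# Hypothetical areas in the same unit (for example, hectares).
for damaged, mined in [(45.0, 100.0), (30.0, 100.0), (10.0, 100.0)]:
    rate = drotl_percent(damaged, mined)
    print(f"DROTL = {rate:.1f}% -> level {drotl_level(rate)}")
```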
In the remote sensing interpretation of mines' geological environments, the level of land occupation and destruction (Table 3) is mainly evaluated through the degree of occupation and destruction of farmland, forest land (or grassland) and unused land. As shown in Table 4, there are 45 key mining areas at level I, 37 at level II, 44 at level III, and 4 key mining areas without mining land occupation or destruction. Level I land occupation and destruction thus accounts for 34.62% of the key mining areas, which is close to the share of key mining areas with level I topographical landscape damage (36.2%).
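The 34.62% figure can be checked directly from the counts quoted from Table 4. The short sketch below assumes the percentage is taken over all key mining areas, including the four without land occupation or destruction; the total of 130 is inferred by summing the counts and is not stated explicitly in the text.

```python
# Counts of key mining areas per land occupation and destruction level,
# as quoted in the text from Table 4.
counts = {"I": 45, "II": 37, "III": 44, "none": 4}

# Assumption: the denominator is the sum of all four groups.
total = sum(counts.values())

# Share of each level, in percent, rounded to two decimals.
shares = {level: round(100 * n / total, 2) for level, n in counts.items()}

print(total)
print(shares)
```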
Large-scale and high-intensity development of mines directly leads to geological environment problems such as the destruction of water resource balance, the reduction of vegetation coverage, and land occupation. It is necessary for the relevant departments to exercise their functions, regulate mineral resource exploitation activities and ensure the orderly exploitation of mineral resources. Mine monitoring is of great practical significance for providing these departments with accurate and reliable data and for realizing the coordinated development of mines and the ecological environment.

Figure 13. The damages rate of the topographical landscape of cities in Hubei.

Conclusions
Deep learning provides an approach to learn effective features automatically from a training set, and it can perform unsupervised feature learning from an enormous original image dataset. In view of the low extraction accuracy, low efficiency and low degree of automation of traditional methods for extracting mines from remote sensing images, we proposed an open-pit mine extraction model based on Improved Mask R-CNN and Transfer learning (IMRT), constructed a set of multi-source mine sample databases consisting of Gaofen-1, Gaofen-2 and Google Earth satellite images with a resolution of two meters, and designed an automatic batch production process of open-pit mine targets in order to automatically identify and dynamically monitor open-pit mines. The main conclusions are as follows:
(1) The experimental results show that the IMRT model is superior to traditional methods in precision, generalization, automation and efficiency. At the same time, this model has good applicability to Gaofen-1, Gaofen-2 and Google Earth satellite images, which expands the data sources for open-pit mines and enhances the practicability of the model.

Funding: This research received no external funding.