3.2.1. Data Pre-Processing
In this study, the preprocessing of the original geological data is divided into four steps.
(1) Screen out geochemical elements favorable for mineralization. In this study, a support vector model was used to obtain the area under the curve (AUC) value of each element, and the ZAUC value was calculated using Formulas (1)–(3):
The random variable ZAUC follows the standard normal distribution, and its critical value is obtained from the standard normal distribution table; the statistic tests whether the AUC differs significantly from 0.5. The results are shown in Table 2. At the 0.01 significance level, the critical value is 2.58; the elements whose ZAUC exceeds this value, namely Ag, Au, Sn, Cu, Ba, Sb, Hg, and Mo, are selected as favorable prospecting factors.
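Formulas (1)–(3) are not reproduced here; a standard way to test an AUC against 0.5 is the Mann–Whitney estimate of the AUC combined with the Hanley–McNeil standard error. The sketch below assumes that form (the function names and the sample scores are hypothetical, not the authors' code) and applies the 2.58 critical value used above:

```python
import numpy as np

def auc_mann_whitney(scores_pos, scores_neg):
    """AUC via the Mann-Whitney U statistic (ties counted as 0.5)."""
    scores_pos = np.asarray(scores_pos, dtype=float)
    scores_neg = np.asarray(scores_neg, dtype=float)
    wins = (scores_pos[:, None] > scores_neg[None, :]).sum()
    ties = (scores_pos[:, None] == scores_neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(scores_pos) * len(scores_neg))

def z_auc(auc, n_pos, n_neg):
    """Z statistic testing AUC against 0.5 (Hanley-McNeil variance)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc**2 / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc**2)
           + (n_neg - 1) * (q2 - auc**2)) / (n_pos * n_neg)
    return (auc - 0.5) / np.sqrt(var)

# hypothetical element scores at known-ore cells vs. barren cells
rng = np.random.default_rng(0)
pos = rng.normal(1.0, 1.0, 60)
neg = rng.normal(0.0, 1.0, 60)
auc = auc_mann_whitney(pos, neg)
# keep the element when |ZAUC| exceeds the 0.01-level critical value 2.58
keep = abs(z_auc(auc, len(pos), len(neg))) > 2.58
```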
(2) The inverse distance weighting method was used to interpolate the above eight geochemical element data sets, and the corresponding element concentration contour maps were obtained. The interpolation procedure follows [40], and the experiment generated T element concentration maps of W × H grid points, where W is the width of the image, H is the height, and T is the number of element distribution maps. The calculation formula of the inverse distance weighting method is shown in (4) and (5), where di is the Euclidean distance from the discrete point (xi, yi) to the grid point (x0, y0) being estimated, zi is the observed value at the discrete point, and N is the number of discrete points involved in the calculation. In this study, the inverse distance weighting method was applied to the eight elements Ag, Au, Sn, Cu, Ba, Sb, Hg, and Mo to generate eight element concentration contour maps with a size of 1560 × 1560. Finally, the element concentration contour maps, the geological layer, and the fault structure layer were each superimposed with the known ore deposit layer to generate 10 new images of geological prospecting factors, as shown in Figure 2.
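Since Formulas (4) and (5) are the standard inverse-distance weights, the interpolation step can be sketched as follows (a minimal NumPy illustration with hypothetical sample points, not the authors' implementation; a power of 2 is assumed):

```python
import numpy as np

def idw_grid(px, py, pz, width, height, power=2.0, eps=1e-12):
    """Inverse-distance-weighted interpolation of scattered samples onto a
    width x height grid: z(g) = sum(w_i * z_i) / sum(w_i), w_i = 1 / d_i^p."""
    grid = np.empty((height, width))
    for r in range(height):
        for c in range(width):
            d = np.hypot(px - c, py - r)
            if d.min() < eps:              # grid point coincides with a sample
                grid[r, c] = pz[d.argmin()]
                continue
            w = 1.0 / d**power
            grid[r, c] = (w * pz).sum() / w.sum()
    return grid

# three hypothetical sample points for one element (e.g. Au concentration)
px = np.array([0.0, 4.0, 2.0])
py = np.array([0.0, 4.0, 3.0])
pz = np.array([10.0, 30.0, 20.0])
surface = idw_grid(px, py, pz, width=5, height=5)
```

Because each estimate is a convex combination of the observations, the interpolated surface always stays within the range of the sampled values.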
(3) This step further processes the geoimages generated in Step 2 to obtain the geoimage data set. First, a sliding window is defined, and the element content at each sampling point is obtained by sliding it over the geoimage with an appropriate step size. After many experiments, the geoimage data set required by this research was generated. Suppose the geoinformation training set contains N samples, where each sample consists of the various characteristic element maps together with its corresponding real label, and C is the number of geological information channels; the label 0 represents “no ore” data and 1 represents “ore” data. In this experiment, a 128 × 128 window with a stride of 128 pixels was slid over the geoinformation maps to generate the geoimage data set needed for the final model training. This data set includes 428 images (128 × 128 × 10), of which 342 are in the training set (56 “ore” and 286 “no ore”) and 86 are in the test set (14 “ore” and 72 “no ore”).
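The windowing operation above can be sketched as follows (an illustrative NumPy version, not the authors' code). Note that a 1560-pixel side at stride 128 yields 12 windows per axis, i.e., 144 non-overlapping tiles per pass; the 428 samples reported above evidently come from additional sampling settings, so treat the count here as illustrative:

```python
import numpy as np

def sliding_windows(stack, win=128, stride=128):
    """Cut a (H, W, C) stack of prospecting-factor maps into win x win tiles."""
    h, w, c = stack.shape
    tiles = []
    for top in range(0, h - win + 1, stride):
        for left in range(0, w - win + 1, stride):
            tiles.append(stack[top:top + win, left:left + win, :])
    return np.stack(tiles)

# 10 prospecting-factor layers (8 elements + geology + faults), all zeros here
stack = np.zeros((1560, 1560, 10), dtype=np.float32)
tiles = sliding_windows(stack)     # non-overlapping 128 x 128 x 10 tiles
```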
(4) Augment the geological images generated in the previous step with SMOTE. In order to effectively train the deep learning network model, this experiment also augmented the image data by adding Gaussian noise with a mean of 0 and a variance of 0.01. Following the SMOTE algorithm [41] and the SMOTE method in [42], we oversampled the original geoscience image data set, improving the completeness of the training data for the deep learning model. Through this step, the final data set includes 654 images (128 × 128 × 10), of which 524 are in the training set (224 “ore” and 300 “no ore”) and 130 are in the test set (56 “ore” and 74 “no ore”). After these four preprocessing steps, the generated geoimage data set can be used as the input data of the MFAF in this study.
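The two augmentations can be sketched as below. This is a simplified illustration, not the authors' pipeline: true SMOTE interpolates toward one of the k nearest minority neighbors, whereas this sketch picks a random minority partner; the array shapes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)

def add_gaussian_noise(x, mean=0.0, var=0.01):
    """Noise augmentation: mean-0, variance-0.01 Gaussian, as in the text."""
    return x + rng.normal(mean, np.sqrt(var), size=x.shape)

def smote_minority(samples, n_new):
    """SMOTE-style oversampling: synthesize points on the segment between a
    minority sample and another randomly chosen minority sample."""
    samples = np.asarray(samples, dtype=float)
    out = []
    for _ in range(n_new):
        i, j = rng.choice(len(samples), size=2, replace=False)
        lam = rng.random()                 # position along the segment
        out.append(samples[i] + lam * (samples[j] - samples[i]))
    return np.stack(out)

# 56 flattened "ore" tiles (hypothetical values); 168 synthetic ones give
# the 224 "ore" training samples reported above
ore = rng.normal(size=(56, 128 * 128))
synthetic = smote_minority(ore, n_new=168)
noisy = add_gaussian_noise(ore[:4])
```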
3.2.2. Multiscale Feature Attention Framework (MFAF)
The framework of the MFAF is shown in Figure 3. It mainly consists of two parts: the multiscale feature channel attention net (MFCA-Net) and the convolution spatial attention net (CSA-Net). MFCA-Net uses the set of expansion coefficients α = {α1, α2, α3, …, αn} together with the squeeze-and-excitation (SE-Net) channel attention module. The framework operates in three steps: (1) The expansion coefficient set α is used to generate convolution kernels of different scales, and feature maps of different scales are obtained. This addresses the problem of the small number of known deposits in the study area and provides data support for the convolution operations to extract more, and more detailed, feature information in this area. (2) The feature maps generated by the expansion convolutions pass through the channel attention module, which comprises a squeeze process and an excitation process. In the squeeze stage, a globally compressed feature descriptor is obtained via global pooling of the feature map. In the excitation stage, the weight of each channel in the feature map is obtained through a two-layer fully connected bottleneck structure, and the weighted feature map is used as the input to the next layer of the network. The extracted features are thus re-calibrated, and different weight values are assigned to the features in different channels, which addresses the problem that different geochemical elements influence mineralization to different degrees. (3) The CSA-Net module mainly consists of a series of convolution operations. Considering that the element contents at different spatial locations in the feature map influence mineralization differently, a spatial attention module is added after the last convolution layer to assign different weight coefficients to the features at different locations. In order to reduce the number of training parameters and accelerate model convergence, each channel is finally classified by a shared fully connected layer.
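Step (2) is the standard squeeze-and-excitation recalibration, which can be sketched in NumPy as follows (a minimal single-sample illustration with random weights, not the trained MFCA-Net; the reduction ratio is an assumption):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(feat, w1, w2):
    """Squeeze-and-excitation: global-average-pool each channel ("squeeze"),
    pass the descriptor through a two-layer bottleneck ("excitation"),
    and rescale the channels with the resulting weights."""
    # feat: (C, H, W); w1: (C//r, C); w2: (C, C//r)
    squeeze = feat.mean(axis=(1, 2))          # (C,) global average pooling
    hidden = np.maximum(w1 @ squeeze, 0.0)    # ReLU bottleneck
    scale = sigmoid(w2 @ hidden)              # (C,) per-channel weights
    return feat * scale[:, None, None]

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 16, 16))           # 8 feature channels
w1 = rng.normal(size=(2, 8)) * 0.1            # bottleneck, reduction ratio 4
w2 = rng.normal(size=(8, 2)) * 0.1
out = se_block(feat, w1, w2)
```

Because the sigmoid keeps every channel weight in (0, 1), the block can only attenuate channels relative to one another, which is what makes the recalibration stable to train.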
3.2.3. Multi-Scale Feature Fusion
As the number of layers in a deep learning network increases, its semantic expression ability is enhanced; however, the resolution of the feature maps decreases, and many detailed features become increasingly blurred after the convolution operations of a multi-layer network. In traditional target detection models, this reduces the effective information about small targets in the last feature map. In this paper, multi-scale feature fusion is used to solve this problem. Instead of using only the feature map of the last layer for detection, features from multiple layers are selected, fused, and then used for detection, so that images at multiple scales are obtained; a classification algorithm is then adopted to perform the image classification task [43,44]. The multi-scale feature fusion diagram is shown in Figure 4.
In order to solve the problem of the few mineral points and few ore-bearing label images in the study area, geological layers, fracture images, and a variety of geochemical elements can be used to generate geological image data at different scales, and image data sets can be generated by sliding windows with different step sizes to increase the diversity of the data samples. For the input geological data set, the expansion coefficient set α = {α1, α2, α3, …, αn} generates convolution kernels of different sizes for the convolution operation and produces multi-scale feature images. Specifically, we apply the convolution kernel together with the expansion coefficient set α to obtain the multi-scale feature map set F = {F1, F2, …, Fn}, where the ith feature map is Fi. The specific generation Formula (6) is shown as follows, where xi represents the ith feature element, Mi represents the convolution weight corresponding to the generation of the ith feature map, αi represents the expansion convolution coefficient corresponding to the generation of the ith feature map, and r represents the convolution channel.
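An expansion (dilation) coefficient spreads the kernel taps apart, enlarging the receptive field without adding parameters; applying one kernel under several coefficients yields the multi-scale set F. A minimal single-channel NumPy sketch (illustrative, not Formula (6) itself):

```python
import numpy as np

def dilated_conv2d(img, kernel, dilation):
    """'Same'-padded 2D convolution whose taps are spaced apart by the
    expansion coefficient; effective kernel size = dilation*(k-1)+1."""
    k = kernel.shape[0]
    pad = (dilation * (k - 1) + 1) // 2
    padded = np.pad(img, pad)
    out = np.zeros_like(img, dtype=float)
    for i in range(k):
        for j in range(k):
            di, dj = i * dilation, j * dilation
            out += kernel[i, j] * padded[di:di + img.shape[0],
                                         dj:dj + img.shape[1]]
    return out

img = np.random.default_rng(0).normal(size=(32, 32))
kernel = np.ones((3, 3)) / 9.0     # one shared 3 x 3 averaging kernel
# multi-scale feature maps F1, F2, F3 from expansion coefficients 1, 2, 3
F = [dilated_conv2d(img, kernel, a) for a in (1, 2, 3)]
```

All three outputs keep the input resolution; only the spatial extent each output pixel summarizes grows with the coefficient.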
3.2.4. Channel Attention
In deep learning research, the channel attention (CA) mechanism is a resource allocation mechanism that allows the training of a neural network to focus on the important features of an image, improving the efficiency and accuracy of the network. Each channel of a feature map acts as a feature detector, and the channel attention mechanism pays different amounts of attention to different image channels [45]. The channel attention module is shown in Figure 5. For the input features, maximum pooling and average pooling are first applied in parallel; the two pooled descriptors are then transformed by a shared multilayer perceptron (MLP); finally, the two transformed results are combined and passed through the sigmoid function to obtain the channel attention result. The calculation procedure is shown in Formula (7), where Mc is the channel attention result, F is the input feature, σ is the sigmoid function, MLP is the multilayer perceptron, AvgPool is average pooling, and MaxPool is maximum pooling.
In this paper, mineral geological images and geochemical element images are used to study the identification and prediction of ore deposits. Different data sources contain a variety of geological prospecting factors, and different geological prospecting factors influence ore deposits to different degrees. Therefore, in order to reduce the influence of human factors, this study adopted a channel attention module during training. According to the loss value in the experiment, the weights of the different channels are adjusted dynamically through backpropagation: the weights of important features are increased, the importance of features with little influence is suppressed, and the representational power of our network training model is improved. By assigning the optimal weight value to each channel, the convergence of the network model is accelerated, and the accuracy of the deposit prospecting prediction is improved. In the geological image data, the learned weight coefficients thus reflect the degree of correspondence between the labeled mineral point images and the different geological prospecting factors.
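Assuming Formula (7) takes the common form Mc = σ(MLP(AvgPool(F)) + MLP(MaxPool(F))) with a shared two-layer MLP, the module can be sketched as follows (random weights for illustration, not the trained model):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Mc = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F))), with one shared
    two-layer MLP applied to both pooled channel descriptors."""
    avg = feat.mean(axis=(1, 2))              # (C,) average-pooled descriptor
    mx = feat.max(axis=(1, 2))                # (C,) max-pooled descriptor
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)
    return sigmoid(mlp(avg) + mlp(mx))        # (C,) channel weights

rng = np.random.default_rng(1)
feat = rng.normal(size=(10, 8, 8))            # 10 prospecting-factor channels
w1 = rng.normal(size=(5, 10)) * 0.1           # shared bottleneck weights
w2 = rng.normal(size=(10, 5)) * 0.1
mc = channel_attention(feat, w1, w2)
weighted = feat * mc[:, None, None]           # re-calibrated feature map
```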
3.2.5. Spatial Attention
In the study of using artificial intelligence to analyze image data, different areas of an image contribute to the task to different degrees, and we need to pay the most attention to the areas related to the task. Spatial attention (SA) can be regarded as an adaptive spatial region selection mechanism: it decides where to focus [46]. In the spatial attention module of Figure 6, we first reduce the dimension along the channel axis, obtain the results of maximum pooling and mean pooling, respectively, stack them into a feature map, and then use a convolution layer for learning. The calculation procedure is shown in Formula (8), where Ms is the spatial attention result, F is the input feature, σ is the sigmoid function, f7×7 is a convolution kernel of size 7 × 7, AvgPool is average pooling, and MaxPool is maximum pooling.
In this paper, considering that geological prospecting factors at different spatial locations differ in their influence on mineralization, the spatial attention module is adopted in the model. Spatial attention serves as a supplement to the convolution operation, enhancing or suppressing image features at different spatial locations.
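Assuming Formula (8) takes the common form Ms = σ(f7×7([AvgPool(F); MaxPool(F)])), where the pooling runs along the channel axis, a minimal NumPy sketch is (random kernel for illustration, not the trained model):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(feat, kernel):
    """Ms = sigmoid(f7x7([AvgPool(F); MaxPool(F)])): pool along the channel
    axis, stack the two maps, convolve, and squash to (0, 1)."""
    avg = feat.mean(axis=0)                   # (H, W) channel-wise average
    mx = feat.max(axis=0)                     # (H, W) channel-wise maximum
    k = kernel.shape[-1]
    pad = k // 2
    stacked = np.stack([np.pad(avg, pad), np.pad(mx, pad)])
    h, w = avg.shape
    out = np.zeros((h, w))
    for ch in range(2):                       # 2-channel 'same' convolution
        for i in range(k):
            for j in range(k):
                out += kernel[ch, i, j] * stacked[ch, i:i + h, j:j + w]
    return sigmoid(out)                       # (H, W) spatial weights

rng = np.random.default_rng(2)
feat = rng.normal(size=(10, 16, 16))
kernel = rng.normal(size=(2, 7, 7)) * 0.05    # the 7 x 7 kernel of Formula (8)
ms = spatial_attention(feat, kernel)
weighted = feat * ms[None, :, :]              # re-weighted feature map
```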
3.2.6. Fully Connected Layer, Softmax, and Voting
In the CNN structure, after the image data pass through multiple convolution and pooling layers, one or more fully connected (FC) layers follow, and each neuron in an FC layer is fully connected to all the neurons in the previous layer. The FC layers can integrate the different types of information from the convolution and pooling layers. In order to improve the performance of the CNN, the activation function of each neuron in the fully connected layer generally adopts the ReLU function [47]. After the fully connected layer, we used softmax to classify the image data and used a voting mechanism to predict and classify the positions with and without ore in the study area. Through the above steps, we could improve the overall prediction accuracy of the network model.
In this paper, the input geological prospecting factor features are passed through expansion convolutions with different coefficients to obtain the feature map set F = {F1, F2, …, Fn}. Our network model first extracts features from the set F, then performs global pooling operations, allocates different weights to different positions through the spatial attention module, and finally obtains its output through the fully connected layer. The cross entropy method was adopted to optimize the network structure of the classification model, as shown in Formula (9), where W is the weight parameter in SACNet, yi is the label value of the ith geological prospecting factor feature, and loss(·) is the cross entropy loss calculated after softmax activation.
After the above calculation, we vote on the probability distributions calculated using softmax for each channel network and obtain the final prediction result. The softmax output is a probability distribution (p0, p1), where p0 is the probability predicted to be “no ore” and p1 is the probability predicted to be “ore”. We obtain the prediction result of each channel network through Formula (10) and determine the final prediction result through the vote in Formula (11).
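The softmax-and-vote stage can be sketched as follows (a minimal illustration of majority voting over per-channel argmax decisions, with hypothetical logits; the exact forms of Formulas (10) and (11) are assumptions):

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def vote(channel_logits):
    """Each channel network outputs two logits; softmax gives (p0, p1) =
    ("no ore", "ore"); each channel votes argmax; the majority wins
    (ties default to "no ore")."""
    probs = softmax(np.asarray(channel_logits, dtype=float))   # (n, 2)
    votes = probs.argmax(axis=1)               # per-channel class prediction
    return int(votes.sum() > len(votes) / 2)   # 1 = "ore", 0 = "no ore"

# hypothetical logits from three channel networks
logits = [[0.2, 1.5], [1.1, -0.3], [0.0, 2.0]]
pred = vote(logits)   # two of the three channels favor "ore"
```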