Mineral Identification Based on Deep Learning Using Image Luminance Equalization

Abstract: Mineral identification is an important part of geological research. Traditional mineral identification methods rely heavily on the skill of the identifier and on external instruments, and therefore incur high labor and equipment costs. Deep learning-based mineral identification offers a new solution that both saves labor costs and reduces identification errors. However, the accuracy of existing recognition efforts is often affected by factors such as Mohs hardness, color, picture scale, and especially light intensity. To reduce the impact of light intensity on recognition accuracy, we propose an efficient deep learning-based mineral recognition method that uses a luminance equalization algorithm. We first propose a new algorithm combining histogram equalization (HE) and the Laplace algorithm, then use it to process the luminance of the samples, and finally use the YOLOv5 model to identify the samples. The experimental results show that our method achieves 95.6% accuracy in identifying 50 common minerals.


Introduction
Mineral identification occupies an important position in geological research. Traditional methods identify minerals mainly by the naked eye or with observation instruments. Naked-eye identification relies heavily on the discriminatory ability of the identifier. Instrument-based observation, such as the identification of clay minerals and hydrocarbons using near-infrared spectroscopy [1] or mineral identification and mapping by imaging spectroscopy [2], requires special instruments. Both approaches are labor intensive, and their accuracy is often influenced by the experience and ability of the identifier. In recent years, researchers have used deep learning techniques to reduce these effects. For example, Porwal et al. [3] used artificial neural networks in mineral potential mapping, and Li et al. [4] used convolutional neural networks based on geological big data for mineral prospect prediction.
Many works on mineral identification also use intelligent algorithms. These methods fall into three categories according to the test method and the type of data obtained: identification based on chemical composition analysis, identification based on spectral analysis, and identification based on optical pictures. The main types of data used in chemical composition analysis [5] are energy-dispersive spectroscopy (EDS) [6], electron probe microanalysis (EPMA) [7], and laser-induced breakdown spectroscopy (LIBS) [8]. Identification based on spectral analysis [9] is the most reliable method, but it requires expensive testing instruments and is therefore difficult to deploy widely. Optical picture-based identification is the most common method and can be performed on microscopic images [10][11][12][13][14][15] or ordinary photographs [16][17][18].
Table 1 summarizes the current mineral identification methods.
All of the above studies enable the identification of minerals, but usually only for a small number of mineral species, and they lack stable, high identification accuracy. In addition, one difficulty of photo-based mineral identification is that photos taken in the field are often affected by light intensity and shadows. The resulting photos differ in photometric detail, which can easily lead to identification errors. For example, the same mineral may appear in two different colors under strong and weak light, and color is one of the important features used for mineral identification. It is therefore difficult to achieve high accuracy by directly identifying photos taken with cell phones or cameras.
Studies using image enhancement techniques to eliminate the effects of extraneous factors on photographs have emerged and demonstrated utility in many other applications. For example, Zhi et al. [19] investigated a new method to improve the change detection accuracy of synthetic aperture radar (SAR) remote sensing images by combining image enhancement algorithms based on wavelet and spatial domains and power law. Regarding the effect of luminance, Xiao et al. [20] relied on Retinex theory and used a two-step approach combining candidate regions and object locations to achieve object recognition in low-luminance situations. Xiong et al. [21] identified ripe litchi under different lighting conditions based on Retinex image enhancement and improved the accuracy of image identification. We compare the accuracy of mineral identification approaches by image type in more detail in Section 4.3.
Table 1. Comparison of different mineral identification methods.

Methods | Studies | Characteristics
Instrument Observation | [1] | Wide range of applications.
Instrument Observation | [2] | Spectrometer with very high pixel count.
Chemical Composition Analysis | [6] | Fast data acquisition.
Chemical Composition Analysis | [7] | High accuracy of chemical element identification.
Chemical Composition Analysis | [8] | Low sample loss.
Spectral Analysis | [9] | Reliable and has international datasets.
Micro-optical Picture Analysis | [15] | Good performance in petrographic thin sections.
Traditional Image Analysis | [16] | Combined with mineral hardness.
Traditional Image Analysis | [17] | High accuracy of malachite and blue copper mineral identification.
Traditional Image Analysis | [18] | Can distinguish the formation minerals of different granite types.
There are many models for object detection, such as EfficientDet [22] and YOLOv5. YOLOv5 (described in Section 3) extends YOLOv4 [23], which is one of the most effective object detection models available. YOLOv5 has been used in many practical applications such as face recognition [24] and aircraft target detection [25].
In this paper, we combine image enhancement techniques with YOLOv5 for mineral detection to address the effects of illumination factors on image chromatic aberrations. With this method, we achieved the accurate identification of mineral images without relying on specialized instruments for obtaining identification data. In addition, our method enables the more accurate identification of samples with poor lighting conditions (too bright or too dark) than other efforts to identify minerals based on image data. Moreover, our work expands the range of mineral species that can be identified to a greater extent than other works. Our detailed contributions are shown below.

• We first propose a novel image enhancement algorithm that combines histogram equalization (HE) and the Laplace algorithm. In subsequent experiments, the algorithm shows strong results.
• We achieve an efficient identification of 50 minerals, which is a significant expansion of the number of mineral species identified compared to existing works.
• Experiments show that our method achieves 95.6% accuracy in mineral identification, surpassing existing mineral identification methods.
The remainder of this paper is organized as follows. In Section 2, we introduce a novel image enhancement approach that combines histogram equalization (HE) and the Laplace algorithm. In Section 3, we describe the structure of the model we use and briefly describe the training environment and process. In Section 4, we present our experimental results, compare them with other methods, and evaluate the effectiveness of the model using objective evaluation metrics. In Section 5, we conclude the article and propose future work.

Histogram Equalization
Histogram equalization [26] is an important method based on the statistical analysis of the image grayscale distribution and is useful for images where both the background and foreground are too bright or too dark. This method reveals more detail in overexposed or underexposed [27] photographs. The traditional histogram equalization method uses the cumulative distribution function of the gray-level probabilities of the image as the transformation function, and this transformation yields an image whose gray-level probability density is approximately uniform. The cumulative distribution function can be expressed as:
$$s_k = T(r_k) = \sum_{j=0}^{k} p_r(r_j) = \sum_{j=0}^{k} \frac{n_j}{n}$$
where $r_j$ is the normalized gray level before the transformation, $T(r_k)$ is the transformation function, $s_k$ is the normalized gray level after the transformation, $n_j$ is the number of pixels with the j-th gray level in the original image, $n$ is the total number of pixels in the image, and $p_r(r_j)$ is the probability of the j-th gray level in the image before the transformation. However, because it processes all data unselectively, histogram equalization may increase the contrast of background noise and decrease the contrast of useful signals. In addition, the number of gray levels in the transformed image is reduced, so some details may be lost. Images whose histograms have pronounced peaks tend to show an unnatural over-enhancement of contrast after processing.
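As an illustration of the transformation described above, the following is a minimal sketch of histogram equalization on a small grayscale image. It is written in plain Python for portability; the function name `equalize_histogram` and the toy image are assumptions made for this example, and a practical pipeline would normally rely on an image processing library instead.

```python
def equalize_histogram(image, levels=256):
    """Equalize a grayscale image given as rows of ints in [0, levels - 1]."""
    pixels = [value for row in image for value in row]
    n = len(pixels)

    # n_j: number of pixels at gray level j.
    counts = [0] * levels
    for value in pixels:
        counts[value] += 1

    # s_k = T(r_k) = sum over j <= k of n_j / n (cumulative distribution).
    cdf = []
    running = 0
    for count in counts:
        running += count
        cdf.append(running / n)

    # Map each normalized output level s_k back to an integer gray level.
    lookup = [round(s * (levels - 1)) for s in cdf]
    return [[lookup[value] for value in row] for row in image]


# Toy underexposed image: all values are squeezed into the range 10..14.
dark = [[10, 10, 12, 12],
        [10, 11, 12, 13],
        [11, 11, 13, 13],
        [10, 12, 13, 14]]
equalized = equalize_histogram(dark)
```

In this toy case the five original gray levels are spread over a much wider range, which is the contrast stretching effect that the method relies on.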

Laplace Operator Image Enhancement
Laplace operator [28] image enhancement is a second-order differential method that is widely used in digital image processing. It enhances gray-level contrast and thus makes a blurred image sharper. The essence of image blurring is that the image has been subjected to averaging or integration operations, so the inverse operation can be applied to the image. For example, differential operations can highlight image details and make the image sharper. Since the Laplace operator is a differential operator, its application enhances areas of sudden gray-level change in the image and attenuates areas of slow gray-level change. The Laplace operator can therefore be applied to the original image to produce an image describing the abrupt grayscale changes, and the sharpened image is then produced by superimposing this Laplace image on the original image. The basic method of Laplace sharpening can be represented by the following equation:
$$g(x, y) = f(x, y) + t\,\nabla^2 f(x, y)$$
where $f(x, y)$ denotes the two-dimensional image, $\nabla^2 f(x, y)$ denotes its Laplacian, $g(x, y)$ denotes the sharpened image, and $t$ is the neighborhood center comparison coefficient. This simple sharpening method produces the effect of Laplace sharpening while preserving the background information. By superimposing the original image on the result of the Laplace operator, we preserve each gray value in the image while enhancing the contrast at abrupt gray-level changes. The final outcome is to bring out small details in the image while preserving the image background. However, this tends to produce a double response at image edges, which will affect the experimental results.
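The sharpening step can be sketched as follows. This is an illustrative example only: it assumes a 4-neighborhood Laplace kernel with a negative center, in which case the coefficient t is taken as -1, and it replicates border pixels. The names `laplacian` and `sharpen` and the toy images are assumptions made for this sketch.

```python
LAPLACE_4 = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]


def laplacian(image, kernel=LAPLACE_4):
    """Apply a 3x3 Laplace kernel, replicating border pixels."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    total += kernel[dy + 1][dx + 1] * image[yy][xx]
            out[y][x] = total
    return out


def sharpen(image, t=-1):
    """g(x, y) = f(x, y) + t * Laplacian(f)(x, y), clipped to [0, 255]."""
    lap = laplacian(image)
    return [[min(255, max(0, f + t * l)) for f, l in zip(f_row, l_row)]
            for f_row, l_row in zip(image, lap)]


flat = [[100] * 4 for _ in range(4)]               # no gray-level change
edge = [[50, 50, 200, 200] for _ in range(4)]      # vertical step edge
sharp_flat = sharpen(flat)
sharp_edge = sharpen(edge)
```

The flat image is returned unchanged, while on the step edge the pixels on both sides of the edge change (one is pushed down to 0 and the other up to 255). This illustrates both the preserved background and the double response at edges discussed above.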

A New Algorithm Based on HELaplace
To overcome the shortcomings of the classical histogram equalization and Laplace algorithms described above, and to take advantage of image fusion, this paper proposes a new image enhancement algorithm, HELaplace. Following the idea of image fusion, we first process the image with the histogram equalization algorithm and with the Laplace operator, respectively, and then fuse the two processed images into a new image by a weighted average with a certain proportion. This approach demonstrates a good enhancement effect within a certain range of proportions.
We convert the input image G into YCrCb (a color coding method) [29] space, then separate the YCrCb image channels and equalize the image histogram using the CLAHE [30] algorithm, which improves the details of the image while avoiding excessive contrast enhancement. The processed channel and the unprocessed channels are combined and then converted to the RGB image A. The image is then sharpened and enhanced by image convolution with the 8-neighborhood Laplace operator with center 5, and the enhanced image is denoted as B. The weighted average image fusion algorithm can be expressed as:
$$F(i, j) = \omega_A A(i, j) + \omega_B B(i, j), \qquad \omega_A + \omega_B = 1$$
where the input image $A(i, j)$ represents the illumination function of the image after HE processing, $B(i, j)$ represents the illumination function of the image after Laplace processing, $\omega_A$ and $\omega_B$ are the weighting proportions, and the output image $F(i, j)$ represents the fused image. The size of the image is 256 × 256 pixels, $i$ and $j$ are the coordinates of a pixel in the image with $1 \le i, j \le 256$, and $A, B \in [0, 255]$. The algorithm description of HELaplace is shown in Algorithm 1. We apply the HELaplace algorithm and the other algorithms to the same image, and the results are shown in Figure 1. The comparison shows that the image processed by HELaplace is of better quality.
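The fusion step alone can be sketched as below, assuming that the two processed images A and B have already been computed. The function name `fuse`, the parameter `weight_a`, the value 0.6 and the small stand-in arrays are all hypothetical; the proportion actually used is not stated in this section.

```python
def fuse(a, b, weight_a=0.5):
    """Weighted-average fusion F = w*A + (1 - w)*B of two equally sized images."""
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    weight_b = 1.0 - weight_a
    fused = []
    for row_a, row_b in zip(a, b):
        fused.append([
            # Round to an integer gray value and keep it inside [0, 255].
            min(255, max(0, round(weight_a * pa + weight_b * pb)))
            for pa, pb in zip(row_a, row_b)
        ])
    return fused


equalized_img = [[64, 112], [175, 239]]   # stand-in for image A (after HE)
sharpened_img = [[50, 0], [255, 200]]     # stand-in for image B (after Laplace)
fused_img = fuse(equalized_img, sharpened_img, weight_a=0.6)
```

Setting `weight_a` to 1 or 0 returns A or B unchanged, so the weight acts as a simple dial between the two enhancement results.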

Description of Our Model
The main procedure of the experiment is shown in Figure 2. First, we collect data on a variety of minerals. Then, we label all the data and split the dataset into a training set and a test set. HELaplace processing is performed on the data from both the test set and the training set. The resulting training set is then used to train a convolutional neural network with the YOLOv5 model. Finally, the class of each mineral picture in the test set is predicted and the accuracy is recorded.
Figure 3 illustrates the specific structure of the YOLOv5 network. It consists of four parts: input, backbone, neck, and prediction. The input side uses Mosaic data enhancement [23] and adaptive anchor frame calculation. The backbone uses the focus structure and the cross-stage-partial-connections (CSP) structure. The neck uses a feature pyramid network (FPN) + path aggregation network (PAN) structure. The prediction part uses non-maximum suppression (NMS) to filter the targets, so it has high accuracy. Unlike traditional algorithms that require a fixed input size, YOLOv5, as a newer deep neural network (DNN), uses adaptive image scaling and places no requirement on image size. We also modified the letterbox function in datasets.py of the YOLOv5 code so that only a minimal black border is added when adapting the original image, which reduces redundant information and therefore greatly improves processing speed.
The CSP structure of YOLOv5s divides the original input into two branches, each of which applies a convolution that halves the number of channels. One branch then performs the Bottleneck × N operation, after which the two branches are concatenated. This allows the input and output of BottleneckCSP to be the same size, which enables the model to learn more features. The neck of YOLOv5 has the same FPN+PAN structure as in YOLOv4. However, the neck of YOLOv4 uses regular convolution operations. In contrast, the neck of YOLOv5 uses the CSP2 structure inspired by the CSPNet [31] design to enhance network feature fusion and improve identification accuracy.
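The idea of adding only a minimal border during adaptive scaling can be sketched as follows. This is not the modified letterbox code itself. It is a simplified illustration that assumes a target size of 640 and a network stride of 32, and the function name `letterbox_shape` is hypothetical.

```python
def letterbox_shape(height, width, target=640, stride=32):
    """Return (new_h, new_w, pad_h, pad_w) for an aspect-preserving resize.

    The image is scaled so that its longer side fits `target`; the shorter
    side is then padded only up to the next multiple of `stride` rather
    than all the way to a full target x target square.
    """
    scale = min(target / height, target / width)
    new_h, new_w = round(height * scale), round(width * scale)
    pad_h = (target - new_h) % stride
    pad_w = (target - new_w) % stride
    return new_h, new_w, pad_h, pad_w


square = letterbox_shape(256, 256)   # a 256 x 256 sample needs no border
wide = letterbox_shape(400, 640)     # a 400 x 640 photo needs a small border
```

For the 400 × 640 example, only 16 rows of border are added (giving a height of 416) instead of the 240 rows needed to reach a full 640 × 640 square, which is where the saving in redundant pixels comes from.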

Model Training
In this paper, the deep learning integrated development environment is PyCharm. The test environment consists of an NVIDIA GTX 1060 GPU, 8 GB of memory, an Intel Core i7-8750H CPU, and Python 3.8 as the programming language. The parameters we used for model training are shown in Table 2. Parameters not listed in the table are left at their default values. In our experiments, we use the GIoU function [32] as our loss function; a smaller value indicates more accurate results. Its expression is written in terms of p, the index of a predicted positive example, and IoU_p, the intersection-over-union ratio of the predicted positive box p and its corresponding ground-truth box. We recorded the changes in the GIoU loss during training and tested the accuracy of the model on the validation set after each pass over the training set. The change in GIoU loss during training is shown in Figure 4. It can be seen that the model converges effectively, and the GIoU loss reaches a low level after 50 epochs. According to the figure, the model achieves the best accuracy on the validation set after the 90th iteration, and the accuracy decreases when training continues, probably due to some overfitting.
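For readers unfamiliar with this loss, the sketch below computes the generalized intersection over union of two axis-aligned boxes using the commonly cited definition (IoU minus the fraction of the smallest enclosing box not covered by the union). It is an assumption that this matches the exact form used in training; the function names and the corner format (x1, y1, x2, y2) are chosen for this example only.

```python
def giou(box_a, box_b):
    """Generalized IoU of two boxes (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union

    # Area of the smallest axis-aligned box enclosing both inputs.
    hull = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return iou - (hull - union) / hull


def giou_loss(predicted, truth):
    """Loss in [0, 2]; it is 0 only when the two boxes coincide."""
    return 1.0 - giou(predicted, truth)


perfect = giou_loss((0, 0, 2, 2), (0, 0, 2, 2))
partial = giou_loss((0, 0, 2, 2), (1, 1, 3, 3))
disjoint = giou_loss((0, 0, 1, 1), (2, 0, 3, 1))
```

Unlike a plain IoU loss, the value keeps growing as disjoint boxes move further apart, so it still provides a useful training signal when a predicted box does not overlap its ground-truth box at all.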

Test Result and Discussion
To test the accuracy of our method, we selected 13,911 images from a collection of 220,057 images to test our neural network model. When an image is input into the neural network, the mineral category with the highest probability is returned. We evaluate the performance of our method in terms of accuracy and compare the results with those of other methods.

Data
Training YOLOv5 for mineral identification, as well as validating and testing it, requires a large amount of data. The more data available for training, the more generalizable and robust the model will be, and the higher its accuracy. To obtain a large amount of specialized image data for a wide range of minerals, we chose to use image data from Mindat [33]. Mindat is a community-led global mineral and provenance database website and the world's largest database of mineral information. In this paper, one mineral from each major mineral category, as defined by the mineral category criteria [33], is selected from the database as a training representative to obtain adequate category coverage. To further extend the categories covered, we added 26 minerals from those covered in [16]. In total, images of 50 minerals were collected as experimental samples. The names of the minerals and the number of samples are shown in Table 3. Some minerals have few samples because they are rare, which makes it difficult to obtain a large number of images. It is worth noting that all mineral samples in this paper are labeled according to the classification criteria of Mindat.
Some of the images obtained directly from the website were taken under a microscope or had been post-processed, which may influence the experimental results. Therefore, we manually removed the images that did not meet the requirements during the collection process. We shuffled the images of each mineral and split them into a training set, a validation set, and a test set in a ratio of 10:1:1. An example of the mineral images is shown in Figure 5.
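A per-mineral 10:1:1 split of this kind can be sketched as follows. The function name `split_per_class`, the fixed random seed and the file names are hypothetical; the sketch only illustrates how each class is shuffled and divided so that every mineral appears in all three subsets.

```python
import random


def split_per_class(samples_by_class, ratios=(10, 1, 1), seed=0):
    """Shuffle each class and split it into train/val/test by the given ratios."""
    total = sum(ratios)
    splits = {"train": [], "val": [], "test": []}
    rng = random.Random(seed)
    for label, samples in samples_by_class.items():
        items = list(samples)
        rng.shuffle(items)
        n_val = len(items) * ratios[1] // total
        n_test = len(items) * ratios[2] // total
        n_train = len(items) - n_val - n_test   # remainder goes to training
        splits["train"] += [(p, label) for p in items[:n_train]]
        splits["val"] += [(p, label) for p in items[n_train:n_train + n_val]]
        splits["test"] += [(p, label) for p in items[n_train + n_val:]]
    return splits


# Hypothetical file names used only to exercise the function.
toy = {
    "Azurite": [f"azurite_{i}.jpg" for i in range(24)],
    "Galena": [f"galena_{i}.jpg" for i in range(12)],
}
splits = split_per_class(toy)
```

Splitting within each class, rather than over the pooled images, keeps rare minerals from ending up entirely in one subset.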

Test Result
We tested both the plain YOLOv5 neural network and the HELaplace+YOLOv5 neural network on images with too little light and with too much light, and the average accuracy obtained is shown in Table 4. The test results show that combining HELaplace with YOLOv5 can greatly improve the identification accuracy. Figure 6 shows the accuracy of mineral identification for all 50 categories. Except for a few specific minerals, all minerals are identified with an accuracy of more than 80%. Four minerals (Moissanite, Nitratine, Ozocerite, and Selenium) have relatively low accuracy because of their small number of training samples. Using HELaplace in combination with YOLOv5 improved the accuracy of all mineral species compared to the results without HELaplace; in particular, the identification accuracy of Azurite, Chalcopyrite, Galena, and Topaz improved by 10%. The main reason is that images taken in insufficient or excessive light exhibit chromatic aberrations. Because many minerals have similar shapes and textures, these chromatic aberrations make it difficult for the model to identify them correctly from the images. For Adularia, Magnetite, and Malachite, applying HELaplace does not significantly improve the accuracy, because these minerals are themselves dark and are less influenced by light. It can be seen from Table 4 and Figure 6 that combining HELaplace with YOLOv5 improves the identification accuracy of most minerals.
Table 5 lists the number of minerals identified and the accuracy of existing mineral detection methods. In contrast to the dual-energy CT chemometric calibration method [34], our work does not require instruments for the medical X-ray tomography of minerals. Compared to the method that uses polarized light microscopy to obtain images [10], which can only identify five minerals, our method can identify 50 species with similar accuracy. Similarly, compared to the work of Julio et al. [11], which could only distinguish between resin and quartz, we are able to differentiate more minerals while maintaining a similar accuracy. Furthermore, we do not need special instruments to obtain picture data under a polarized light microscope; we only need ordinary photographs of the minerals to perform the identification. In contrast to the work of Zeng et al. [16], who used Mohs hardness and images to identify minerals, our work does not require instruments to measure Mohs hardness. It is worth noting that the aforementioned image-based works were evaluated on images taken under normal lighting; when experiments are conducted on images taken under excessively dark or excessively bright conditions, their accuracy is reduced to varying degrees.
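Per-category accuracy of the kind reported in Figure 6 can be computed from predicted and true labels as in the following sketch. The labels below are toy values invented for illustration and are not results from our experiments; the function name `per_class_accuracy` is likewise an assumption.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Return {class: fraction of its test images predicted correctly}."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: correct[label] / totals[label] for label in totals}


# Toy labels for illustration only; they are not results from the paper.
truth = ["Azurite"] * 5 + ["Galena"] * 4 + ["Selenium"] * 2
preds = ["Azurite"] * 5 + ["Galena"] * 3 + ["Topaz"] + ["Selenium", "Galena"]

accuracy = per_class_accuracy(truth, preds)
below_80 = sorted(label for label, acc in accuracy.items() if acc < 0.8)
overall = sum(t == p for t, p in zip(truth, preds)) / len(truth)
```

Reporting per-class values next to the overall figure makes it easier to spot categories with few samples, whose accuracy can differ markedly from the average.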

Objective Evaluation Indicators
Since it is difficult to obtain the normally illuminated image corresponding to an image taken under abnormal illumination, we used the natural image quality evaluator (NIQE) [36] to assess image quality after enhancement. NIQE is a no-reference image quality index, and a smaller NIQE indicates better quality of the measured picture. In addition, we used the lightness-order-error (LOE) [37] to compare the enhanced image with the original image. LOE reflects how well the naturalness of the image is retained, and a smaller value indicates that the image better preserves the order of luminance and therefore looks more natural. Table 6 shows the objective evaluation data of the corresponding methods in Figure 1. From the data in the table, we can see that the LOE of our algorithm is lower than that of the Laplace algorithm and is the lowest among all algorithms, indicating that our method best maintains the naturalness of the image. Furthermore, the NIQE value of our algorithm is the lowest among all algorithms, which indicates that our method does not introduce much detail loss, blurring, or color distortion into the original image.
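To make the LOE measure more concrete, the sketch below follows its commonly used definition: the lightness of a pixel is its maximum color channel, and the error counts how often the relative lightness order of a pixel pair differs between the original and the enhanced image, averaged over all pixels. It is an assumption that this matches the exact variant in [37]; the function names and the tiny test images are chosen for this example. Because the cost grows quadratically with the number of pixels, such a measure is usually evaluated on downsampled images.

```python
def lightness(image):
    """Flatten an RGB image (rows of (r, g, b) tuples) to max-channel lightness."""
    return [max(pixel) for row in image for pixel in row]


def lightness_order_error(original, enhanced):
    """Mean number of pixel pairs whose relative lightness order flips."""
    l_orig, l_enh = lightness(original), lightness(enhanced)
    m = len(l_orig)
    flips = 0
    for i in range(m):
        for j in range(m):
            # Compare the order relation of pair (i, j) before and after.
            flips += (l_orig[i] >= l_orig[j]) != (l_enh[i] >= l_enh[j])
    return flips / m


base = [[(10, 10, 10), (50, 50, 50)],
        [(100, 100, 100), (200, 200, 200)]]
brighter = [[tuple(c + 40 for c in px) for px in row] for row in base]
inverted = [[tuple(255 - c for c in px) for px in row] for row in base]

loe_brighter = lightness_order_error(base, brighter)
loe_inverted = lightness_order_error(base, inverted)
```

A uniform brightening keeps every pairwise order and gives an error of zero, whereas inverting the image reverses every order and gives the maximum error for this image size.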

Conclusions and Future Work
In this paper, we propose a deep learning mineral identification method based on luminance equalization. Compared with traditional mineral identification methods, we reduce the reliance on the researcher's experience and on instruments. Compared with traditional mineral identification algorithms, we reduce the influence of illumination intensity on mineral identification and greatly improve the accuracy. In the deep learning recognition part, we used the optimized YOLOv5 to further improve the identification accuracy. However, the identification method presented in this paper has a limitation: when the input picture shows a mineral other than the fifty covered, the closest of the fifty minerals is still returned. In the future, we will collect more mineral data to address this issue. We will also introduce more features, such as combining the density and transparency of minerals with photos, to further improve the accuracy of mineral identification.