Mineral Identification Based on Deep Learning That Combines Image and Mohs Hardness

Abstract: Mineral identification is an important part of geological analysis. Traditional identification methods rely either on the experience of the appraiser or on measuring instruments, so they are either easily influenced by the appraiser's experience or labor-intensive. To address these problems, some studies have used image recognition and intelligent algorithms to identify minerals. However, current studies can identify only a few minerals, and their accuracy is low. To increase both the number of identifiable minerals and the accuracy, we propose a method that uses both mineral photo images and Mohs hardness in a deep neural network. Experimental results show that the method reaches 90.6% top-1 accuracy and 99.6% top-5 accuracy for 36 common minerals. An app based on the model was implemented on smartphones and requires neither internet access nor a communication signal. Tested on 73 real mineral samples, the app achieved a top-1 accuracy of 89% when both the mineral image and hardness were used and 71.2% when only the mineral image was used.


Introduction
Mineral identification is an important part of geological analysis. Traditional mineral identification is mainly based on visual observation or physical experiments [1]. Visual observations, such as streak (powder) color, luster or reactivity with HCl, depend on the experience of the appraiser, and most physical experiments require special instruments. Both approaches require considerable labor. Deep learning has been used in the geosciences to reduce this workload. For example, Porwal et al. [2] used artificial neural networks in mineral-potential mapping, Karimpouli et al. [3] used convolutional neural networks for coal cleat/fracture segmentation, Juliani and Ellefmo [4] used radial basis function neural networks in prospectivity mapping of mineral deposits, and Sun et al. [5] used machine learning and deep learning for modeling mineral prospectivity. Intelligent algorithms have also been used in mineral identification, and the methods fall into two categories: traditional identification and neural network identification. Traditional identification methods include laser-induced breakdown spectroscopy [6], color tracking [7], cascade approaches [8], optimal spherical neighborhoods [9,10] and image processing and analysis [11]. Neural network methods identify minerals by using digital features or image features. Harris and Pan [12], Thompson et al. [13] and others have used digital features. Among image features, spectral images, microscopic images and photo images can all be used to identify minerals [14-18], although they are not on the same scale. Ishikawa and Gulick [14] used spectral images to identify minerals and achieved an accuracy of 83% for six species of mineral. Using microscope images, Zhang et al. [15] identified four species of mineral with an accuracy of 90.9%, and Baykan and Yılmaz [16] achieved an accuracy of 93.86% for five species of mineral. Among studies using photo images, Solar et al. [17] achieved 91% accuracy for six species of mineral, and Liu et al. [18] achieved 74.2% accuracy for 12 species of mineral. Commercial products such as IMAGO [19] and KORE GEOSYSTEMS [20] can also identify minerals automatically, but the methods they use and the accuracy they achieve are not available for analysis. The above studies can identify rocks and minerals automatically, but either the number of minerals identified or the accuracy achieved needs to be improved.
One difficulty in identifying minerals from photographs (at the macroscopic scale) is that minerals of the same species may vary in color and shape, while minerals of different species may have similar colors, shapes and textures. For example, allochromatic minerals may have different colors depending on their impurities. It is therefore difficult to achieve high accuracy using photo features alone. Studies that use multiple features to improve accuracy have appeared in many other applications [21,22] and have proved effective. For example, Wang et al. [21] combined word-level and character-level features to extract more language information and improve accuracy in natural language processing. Hai et al. [22] combined the original image with the grayscale image and text features in breast cancer detection, which also improved accuracy.
Hardness is an important property of minerals, and the Mohs hardness test is one of the most important tests for identifying them [23,24]. The Mohs hardness of a mineral can easily be obtained with portable hardness picks [24]. Different species of mineral may have very different Mohs hardness [24]. For example, hematite and sphalerite look very similar in photo images but have different Mohs hardness ranges: 5.0-6.5 for hematite and 3.5-4.0 for sphalerite. Hardness can therefore serve as key information for distinguishing them. In this paper, Mohs hardness is combined with image features in a deep neural network (DNN) to identify minerals, with the aim of alleviating the above difficulties of image-only identification. The whole DNN is trained at once, so adding hardness does not increase the training difficulty. Experiments on 36 species of common mineral showed that the proposed method achieves 90.6% top-1 accuracy and 99.6% top-5 accuracy.

Data
The training, validation and testing of a DNN for mineral identification require a large amount of data. Here, the data are the photo images and the hardness of the minerals. The purpose of training is for the neural network to learn the species of a mineral from the training photo images and the corresponding hardness. The more data used for training, the stronger the generalization ability and robustness of the model, the higher the identification accuracy and the more species that can be identified. Validation determines when training should be stopped, based on the performance of the neural network on the validation data, so the validation data should not be used in training. Testing determines how well the model performs on unseen mineral data, so the testing data should not be used in training either.
The mineral photo images used in this paper were obtained from Mindat [25] using a web spider. Mindat is a mineral database with a large number of samples. The mineral photo images in the database come from all over the world and have been identified. Thirty-six species of mineral are used in this paper, as shown in Table 1.
All the obtained mineral images were cleaned manually to remove images that contain no minerals, were taken through a microscope, or do not match their label. Images of jewelry after artificial processing were also removed because our method identifies minerals in their natural state, similar to those in the field. The 183,688 images remaining after cleaning were shuffled and split into training, validation and test sets at a ratio of 10:1:1. Examples of mineral images are shown in Figure 1.
The hardness of a real mineral can be obtained with portable Mohs hardness picks [24]. The Mohs hardness is a value between 1 and 10, and the larger the value, the harder the mineral [24]. Training our DNN requires a Mohs hardness value for each image. However, it is impossible to obtain real Mohs hardness values for more than 100,000 mineral images in the dataset, so the values are generated randomly within the Mohs hardness range of each species. Because the accuracy of a general Mohs hardness tester is usually 0.5 [24], we set the accuracy of the generated hardness values to 0.5. For example, the hardness range of sulfur is 1.5-2.5, so the hardness of each sulfur image is randomly generated as 1.5, 2.0 or 2.5.
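The data preparation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the fixed random seed and the rounding used for the 10:1:1 split are hypothetical and are not taken from the paper. Only the 0.5 step, the sulfur range and the split ratio come from the text.

```python
import random

def hardness_choices(low, high, step=0.5):
    """List every value from low to high (inclusive) on a grid of the given step."""
    n_steps = int(round((high - low) / step))
    return [low + i * step for i in range(n_steps + 1)]

def sample_hardness(low, high, rng):
    """Draw one synthetic Mohs hardness value from the species' range."""
    return rng.choice(hardness_choices(low, high))

def split_10_1_1(items, rng):
    """Shuffle the items and split them into train/validation/test at 10:1:1."""
    items = list(items)
    rng.shuffle(items)
    n_val = len(items) // 12    # 1 part of 12 (rounding down is an assumption)
    n_test = len(items) // 12   # 1 part of 12
    n_train = len(items) - n_val - n_test
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test

rng = random.Random(0)  # fixed seed so the sketch is reproducible
sulfur_range = (1.5, 2.5)  # range quoted in the text
sulfur_hardness = [sample_hardness(*sulfur_range, rng) for _ in range(5)]
```

Each sulfur image would receive one of 1.5, 2.0 or 2.5, matching the example in the text.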

Architecture of the Neural Network
The architecture of the neural network used in this paper is shown in Figure 2. It has three parts: image feature extraction (Figure 2a), hardness feature extraction (Figure 2b) and the combination of image and hardness features (Figure 2c).
The image feature extraction in Figure 2a uses the deep convolutional neural network EfficientNet-b4 [26]. EfficientNet-b4 extracts image features automatically and scales the convolutional neural network in three dimensions (width, depth and image resolution), so it has fewer parameters and a higher top-1 accuracy [26]. The convolutional neural network (CNN), a typical DNN, has been used in many real applications, such as face recognition, crowd counting and road crack detection [27]. More work on image classification using DNNs can be found in [27]. EfficientNet-b4 takes input images of 380 × 380 pixels, so each mineral image is first scaled to this size. To maintain high accuracy, the original image should not be too small (for example, less than 300 × 300 pixels).
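As an illustration of the input preparation, the sketch below scales an image to 380 × 380 and normalizes it with the ImageNet channel statistics that the paper says are used in training. Several details are assumptions rather than the authors' code: the image is represented as nested lists of 8-bit RGB tuples, nearest-neighbour sampling stands in for whatever interpolation was actually used, and the function names are hypothetical.

```python
# ImageNet channel statistics (RGB), defined for pixel values scaled to [0, 1].
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)
INPUT_SIZE = 380  # EfficientNet-b4 input size given in the text

def resize_nearest(image, size):
    """Resize an H x W image (nested lists of (r, g, b) tuples) to size x size.

    Nearest-neighbour sampling is an assumption; the paper does not say
    which interpolation was used.
    """
    height, width = len(image), len(image[0])
    return [
        [image[row * height // size][col * width // size] for col in range(size)]
        for row in range(size)
    ]

def normalize_pixel(pixel):
    """Map an 8-bit RGB pixel to ImageNet-normalized floats."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(pixel, IMAGENET_MEAN, IMAGENET_STD)
    )

def preprocess(image, size=INPUT_SIZE):
    """Resize to the network input size, then normalize every pixel."""
    resized = resize_nearest(image, size)
    return [[normalize_pixel(pixel) for pixel in row] for row in resized]
```

In practice an image library would do the resizing; the point here is only the order of operations (scale to the fixed input size, then normalize per channel).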
Hardness feature extraction in Figure 2b transforms the Mohs hardness from a scalar value into a vector by using two fully connected layers of dimensions 36 and 1024. The vector is then concatenated with the image feature vector output by EfficientNet-b4. The combination of image and hardness features in Figure 2c uses a fully connected layer to obtain the final result. The method that transforms the Mohs hardness value $x$ into a 36-dimensional vector $Y$ follows the scheme in Figure 3. In Figure 3 there are 36 connections, and each connection represents a linear operation $y_i = \omega_i x + b_i$ ($i = 1, \dots, 36$), where $\omega_i$ and $b_i$ are determined in the training process described below. The other operations in Figure 2 are similar.
When training the neural network in Figure 2, data augmentation and transfer learning are adopted. Data augmentation improves the ability of the model to generalize to conditions not seen in training (for example, minerals under different lighting) and is implemented by randomly flipping, cropping and zooming the training images and changing their contrast and brightness. Transfer learning uses weights pretrained on ImageNet [28] as the initial weights, which greatly speeds up the convergence of the model compared with random initialization. The normalization mean and standard deviation used in training are the same as those of ImageNet, and the optimizer is Adam. The learning rate decays exponentially from $10^{-3}$ to $10^{-4}$. The loss function is focal loss (FL) [29], which addresses the species imbalance caused by differences in the number of images of different species of mineral. The accuracy and cross-entropy loss curves on the training and validation sets are shown in Figure 4, where the curves using both images and hardness converge faster than the curves using images only.
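The hardness branch and the fusion step described above can be sketched as a plain-Python forward pass. This is a minimal sketch, not the authors' implementation. The layer sizes 36 and 1024 and the 36 output classes come from the text. The image feature width of 1792, the random stand-in weights, the absence of activation functions and the random vector used in place of real EfficientNet-b4 features are all assumptions made for illustration.

```python
import math
import random

NUM_CLASSES = 36          # mineral species (from the text)
HARDNESS_DIM_1 = 36       # first fully connected layer (from the text)
HARDNESS_DIM_2 = 1024     # second fully connected layer (from the text)
IMAGE_FEATURE_DIM = 1792  # assumption: width of the EfficientNet-b4 feature vector

def make_linear(n_in, n_out, rng):
    """Create random weights and zero biases standing in for trained parameters."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def linear(layer, x):
    """Apply y = W x + b to a list of floats."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]

def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

class HardnessImageClassifier:
    """Hypothetical stand-in for parts (b) and (c) of Figure 2."""

    def __init__(self, rng):
        self.fc1 = make_linear(1, HARDNESS_DIM_1, rng)
        self.fc2 = make_linear(HARDNESS_DIM_1, HARDNESS_DIM_2, rng)
        self.head = make_linear(IMAGE_FEATURE_DIM + HARDNESS_DIM_2, NUM_CLASSES, rng)

    def forward(self, image_features, hardness):
        y = linear(self.fc1, [hardness])     # y_i = w_i * x + b_i for i = 1..36
        h = linear(self.fc2, y)              # 36 -> 1024
        combined = list(image_features) + h  # concatenate image and hardness features
        return softmax(linear(self.head, combined))

rng = random.Random(0)
model = HardnessImageClassifier(rng)
fake_features = [rng.random() for _ in range(IMAGE_FEATURE_DIM)]  # placeholder features
probs = model.forward(fake_features, hardness=2.0)
```

With trained weights, `probs` would hold the 36 class probabilities from which the top-5 mineral names are read.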
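Two of the training choices named above, focal loss and the exponential learning-rate decay, can be written out as small functions. This sketch rests on assumptions: the focusing parameter `gamma = 2.0` is the common default from the focal loss literature rather than a value reported in this paper, no class-weighting term is included, and the number of decay steps is a free parameter because the paper does not state it.

```python
import math

def cross_entropy(probs, target):
    """Standard cross-entropy for one sample: -log(p_t)."""
    return -math.log(probs[target])

def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one sample: -(1 - p_t)^gamma * log(p_t).

    p_t is the predicted probability of the true class. Well-classified
    samples (p_t close to 1) are down-weighted, so rare or hard classes
    contribute relatively more to the total loss.
    """
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

def exponential_lr(step, total_steps, lr_start=1e-3, lr_end=1e-4):
    """Decay geometrically from lr_start at step 0 to lr_end at total_steps."""
    return lr_start * (lr_end / lr_start) ** (step / total_steps)
```

With `gamma = 0` the focal loss reduces to the ordinary cross-entropy, which makes the role of the focusing term easy to check.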

Test Results and Discussion
To show the accuracy of our method, 15,291 of the 183,688 images obtained from Mindat [25], together with their Mohs hardness, were used to test our neural network model; as stated in Section 3, these images were not used in the training process. Training and testing of the model were carried out on a server with an NVIDIA Tesla P100 graphics card. After a mineral image and its Mohs hardness are input into the neural network in Figure 2, the five mineral names with the highest probability are given. Accuracy and the confusion matrix [30] were used to evaluate the performance of our method. Comparisons with other similar methods were also made, and the results are given. A smartphone app based on the method was implemented, and its results are also shown in this section.

Our Test Results
Tests were carried out using the image only (with the EfficientNet-b4 network), the hardness only (with a fully connected network) and both image and hardness (with the ensemble network shown in Figure 2). The accuracy results are shown in Table 2. Top-1 accuracy means that the mineral name to which the model assigns the highest probability is exactly the mineral to be identified, and top-5 accuracy means that the mineral to be identified is among the five names with the highest probability. Table 2 shows that combining image and hardness greatly improves top-1 accuracy. Comparisons of the accuracy for specific mineral species are shown in Figure 5. After adding the hardness, the accuracy of almost all mineral species improves, especially for minerals 2, 4, 9, 11, 15, 19, 20, 25, 26, 28, 29, 31, 32, 33, 34 and 35, which improved by more than 15%. The main reason is that many minerals have similar shapes, textures or colors, or are often associated with quartz [31], so that quartz often appears in the image; this makes it difficult for the model to identify the minerals correctly from images alone. The accuracy of minerals 1 and 4 was slightly reduced after adding the hardness because their hardness ranges overlap with those of other minerals and their images are also similar to those of others. Overall, Table 2 and Figure 5 show that adding hardness increases the accuracy for most minerals.
The confusion matrices obtained using images only and using the combination of images and hardness are shown in Figure 6, which shows that adding hardness improves the accuracy of mineral identification. The larger the values of the cells on the diagonal of the confusion matrix (generally darker), the higher the accuracy. The smaller the values of the off-diagonal cells (generally lighter), the lower the probability of incorrect identification and the better the model performs. Compared with the left panel of Figure 6, the diagonal in the right panel is darker and the other cells are generally lighter, which shows that adding hardness significantly improved the accuracy of the model.
Table 3 compares the number of identified minerals and the accuracy of related studies. Compared with the method that uses spectral images obtained from Raman spectroscopy [14], which can identify only six minerals, our method can identify 36 minerals with higher accuracy. Moreover, our method does not require special instruments to obtain spectral data, and the Mohs hardness it needs can easily be obtained with portable picks. Compared with the work of Zhang et al. [15] and Baykan and Yılmaz [16], who identified minerals using microscopic images, our work does not need a microscope, and the number of minerals identified increases from four and five to 36 with almost the same accuracy. Compared with Solar et al. [17] and Liu et al. [18], whose work used photo images only, our work increases the number of identified minerals to 36, while the accuracy is only 0.4 percentage points lower than that of Solar et al. [17] and 16.4 percentage points higher than that of Liu et al. [18].
It is important to note that the numbers in Table 3 were collected from published papers that used different sensing technologies, reference materials and samples to build and test their models. The numbers should therefore be considered indicative rather than a direct comparison based on a well-planned experimental design.
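The evaluation measures used in this section (top-1 accuracy, top-5 accuracy and the confusion matrix) can be computed as in the following sketch. The function names and the three-class toy probabilities are hypothetical and serve only to show the definitions; they are not results from the paper.

```python
def top_k_accuracy(prob_rows, labels, k):
    """Fraction of samples whose true label is among the k most probable classes."""
    hits = 0
    for probs, label in zip(prob_rows, labels):
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)

def confusion_matrix(prob_rows, labels, num_classes):
    """Count samples per (true class, predicted class) pair; rows are true classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for probs, label in zip(prob_rows, labels):
        predicted = max(range(len(probs)), key=lambda i: probs[i])
        matrix[label][predicted] += 1
    return matrix

# Toy example with three classes and four samples (illustrative values only).
prob_rows = [
    [0.7, 0.2, 0.1],
    [0.2, 0.5, 0.3],
    [0.5, 0.3, 0.2],
    [0.1, 0.3, 0.6],
]
labels = [0, 1, 1, 2]

top1 = top_k_accuracy(prob_rows, labels, k=1)
top2 = top_k_accuracy(prob_rows, labels, k=2)
matrix = confusion_matrix(prob_rows, labels, num_classes=3)
```

In the toy data the third sample is misclassified at top-1 but recovered at top-2, which is the same effect that separates the top-1 and top-5 figures reported in Table 2. Correct predictions accumulate on the diagonal of `matrix`, and errors appear off the diagonal.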

Application of the Model on Smartphones
An app for smartphones using our trained model was implemented, and it can identify the 36 species of mineral listed in Table 1. The app is available on request by email to the authors. Figure 7 shows four pages of the app. Page (a) appears after pressing the red round button at the lower right corner of the homepage and allows the user to choose the image to be identified. The image can be taken with the camera or chosen from an album on the phone. If the camera image is taken in the dark, the phone flash is triggered to ensure that the image is bright enough. Page (b) shows that the image to be identified can be cropped manually to keep its main part. Page (c) shows the cropped image and the field for entering the Mohs hardness; the identification result is displayed at the bottom after pressing the "OK" button. If the Mohs hardness of the mineral cannot be obtained, the app can identify the mineral from its photo image alone, without a Mohs hardness value, as shown on Page (d).
The app was tested on 73 samples obtained from the National Infrastructure of Mineral Rock and Fossil Resources for Science and Technology [32]; the 73 samples belong to the 36 species listed in Table 1. Photos were taken from outside the glass cover, and the Mohs hardness value was chosen randomly from each sample's Mohs hardness range because touching the samples was not allowed. The test results are shown in Table 4. As described in Section 4.1, the use of hardness increases the accuracy of mineral identification. Table 4 shows that minerals with typical characteristics, such as stibnite, pyrite and almandine, are easier to identify, whereas minerals with similar colors or crystal shapes, such as orpiment and sulfur, are easily confused. This is the reason why the sphalerite sample in Table 4 was not identified. Using the real hardness value measured with portable Mohs hardness picks, rather than a random value from the Mohs hardness range, is expected to increase the accuracy.
The app does not require access to the internet or communication signals, which will benefit the general public, beginners and professionals in the field with no need for carrying heavy instruments such as microscopes.

Conclusions
In this paper, a mineral identification method based on deep learning that combines images and hardness is proposed. Compared with traditional visual or microscopic observation, the proposed method reduces the reliance on experienced experts and requires less work. Compared with the method that uses mineral images only, our method, which adds Mohs hardness, identifies more minerals and greatly improves the accuracy. An app based on our method that works without the internet or communication signals was implemented. By taking a photo with the phone camera and obtaining the Mohs hardness of the mineral with portable picks, we can identify minerals easily, quickly and accurately using the app, which provides a reliable method for mineral identification, especially in the field, with no need to carry heavy instruments such as microscopes. In the future, more features, such as transparency, color and density, will be introduced, and the DNN will be replaced with newer state-of-the-art models as they appear to improve the accuracy further. This method, like all neural network methods for identification, has a general limitation: if a mineral that is not among the 36 training species is input into the model, the closest of the 36 minerals will be returned, which is incorrect. Collecting data for more mineral species is one way to alleviate this problem. Research on rock identification is also under consideration. We hope our research will be helpful to commercial products such as IMAGO [19] and KORE GEOSYSTEMS [20].
Author Contributions: Conceptualization, X.Z. and X.J.; methodology, X.Z. and Y.X.; software, X.Z. and Y.X.; validation, X.J. and G.W.; writing-original draft preparation, X.Z.; writing-review and editing, X.J. All authors have read and agreed to the published version of the manuscript.