An Enhanced Rock Mineral Recognition Method Integrating a Deep Learning Model and Clustering Algorithm

Abstract: Rock mineral recognition is a costly and time-consuming task when using traditional methods, during which physical and chemical properties are tested at micro- and macro-scales in the laboratory. As a solution, a comprehensive recognition model of 12 kinds of rock minerals can be utilized, based upon deep learning and transfer learning algorithms. In the process, the texture features of images are extracted, and a color model for rock mineral identification can also be established by the K-means algorithm. Finally, a comprehensive identification model is made by combining the deep learning model and the color model. The test results of the comprehensive model reveal that color and texture are important features in rock mineral identification, and that deep learning methods can effectively improve identification accuracy. To prove that the comprehensive model could extract effective features of mineral images, we also established a support vector machine (SVM) model and a random forest (RF) model based on Histogram of Oriented Gradient (HOG) features. The comparison indicates that the comprehensive model has the best performance of all.


Introduction
The classification and identification of rock minerals are indispensable in geological work. Traditional recognition methods are based on the physical and chemical properties of rock minerals, which are used to identify them at macro- and micro-scales [1,2]. With the development of computer science and artificial intelligence, recognition models can also be established using machine learning methods [3–7]. Traditional recognition methods have a definite physical meaning, while machine learning methods are driven by data [8,9]. The two approaches have their own strengths and weaknesses.
The significant strength of the traditional identification model is that it can be explained. Vassilev and Vassileva [10,11] analyzed the optical properties of low-temperature ash (LTA) and high-temperature ash (HTA) of coal samples to obtain the weight of each mineral on a crystalline matter basis. They then classified the inorganic matter in coal using the composition from the HTA and obtained good results. However, the experimental environment and operating procedures of this method are difficult to implement. Zaini [12] used wavelength position, linear spectral unmixing and a spectral angle mapper to explore carbonate mineral compositions and their distribution on the rock mineral surface, and also obtained good results. However, Zaini's method requires a SisuCHEMA SWIR sensor integrated with a computer workstation to collect the data. Adep [13] developed an expert system for hyperspectral data classification with neural network technology. Their system was also used to

Deep Learning Algorithm
Deep learning was proposed by Hinton and Salakhutdinov [32] and has been widely applied in semantic recognition [33], image recognition [34], object detection [35], and the medical field [36]. In image recognition, training is end-to-end: in each iteration of the training process, each layer adjusts its parameters according to the feedback from the training data to improve the accuracy of the model. Moreover, nonlinear function mapping ability is added to the deep learning model. As a consequence, it reduces the computational time for extracting features from images and improves the accuracy of the deep learning model.
In this paper, the Inception-v3 model [37] was used as the pre-trained model. At the front of the Inception-v3 network there are five convolutional layers and two max-pooling layers. To reduce the computational cost, each 5 × 5 convolution kernel is replaced by two 3 × 3 kernels. Then, 11 mixed layers follow. These mixed layers can be divided into three categories, named block module 1, block module 2 and block module 3. Block module 1 has three repeated mixed layers, each connected by a concatenation (concat) layer. The input size of the first mixed layer is (35, 35, 192), and its output size is (35, 35, 256). The input and output sizes of the remaining mixed layers in block module 1 are (35, 35, 288). Batch size is the number of training samples in each step. At the front of block module 2 there is a small mixed layer, whose input size is (35, 35, 288) and whose output size is (17, 17, 768). The input and output sizes of the remaining mixed layers in block module 2 are (17, 17, 768). In block module 3, the input size is (17, 17, 768) and the output size is (8, 8, 2048). After the mixed layers, a convolutional layer, a mean-pooling layer and a dropout layer follow to extract image features. Finally, the softmax classifier is trained at the end of the network.
During convolution, a color image is converted into a 3D matrix containing all RGB values; then each pixel is evaluated using the kernel and pooling during each iteration. The average-pooling layer and the max-pooling layer compute the average value and the maximum value of a matrix, respectively. Finally, the common features can be extracted from a large number of images. In the mixed layers, convolution and pooling are performed in parallel, which achieves better performance in feature extraction.
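The pooling operations and the kernel factorization described above can be illustrated with a small sketch. It is not the Inception-v3 implementation: the function names `pool2d`, `mean` and `kernel_weights`, the 4 × 4 toy matrix and the non-overlapping 2 × 2 windows are all assumptions chosen for illustration.

```python
def pool2d(matrix, size, reduce_fn):
    """Apply non-overlapping size x size pooling to a 2D list of numbers."""
    rows, cols = len(matrix), len(matrix[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        out_row = []
        for c in range(0, cols - size + 1, size):
            window = [matrix[r + i][c + j] for i in range(size) for j in range(size)]
            out_row.append(reduce_fn(window))
        pooled.append(out_row)
    return pooled


def mean(values):
    return sum(values) / len(values)


def kernel_weights(kernel_size, count=1):
    """Weights per input/output channel pair for `count` stacked square kernels."""
    return count * kernel_size * kernel_size


# Hypothetical single-channel toy image.
image = [
    [1, 3, 2, 0],
    [5, 7, 4, 6],
    [9, 8, 1, 1],
    [2, 4, 3, 5],
]
max_pooled = pool2d(image, 2, max)   # [[7, 6], [9, 5]]
avg_pooled = pool2d(image, 2, mean)  # [[4.0, 3.0], [5.75, 2.5]]

# One 5 x 5 kernel versus two stacked 3 x 3 kernels (same 5 x 5 receptive field).
weights_5x5 = kernel_weights(5)           # 25
weights_two_3x3 = kernel_weights(3, 2)    # 18
```

Per channel pair, the two stacked 3 × 3 kernels need 18 weights instead of 25, which is where the saving in computation comes from.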

Transfer Learning Method
In transfer learning, the parameters of models trained in related domains are reused to establish a new model efficiently. In a new image recognition task, the parameters of a pre-trained model are set as the initial parameters. Transfer learning helps researchers avoid building a model from scratch and make full use of the data, which reduces the computational cost and the dependence on big datasets.
In this research, the Inception-v3 model was selected as the pre-trained model for rock mineral image recognition. A pre-trained deep learning model reduces the computational and time cost of training a new model; therefore, transfer learning based on deep learning models has been widely used [38,39]. The original dataset of the Inception-v3 model contains about 1.2 million images from more than 1000 categories, and the model involves about 25 million parameters. However, with the transfer learning method, training based on this model can be finished quickly even on a computer without a GPU. Figure 2 shows the process of the transfer learning method.
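The idea of reusing fixed pre-trained parameters and training only a new classifier can be sketched in plain Python. This is a toy illustration under stated assumptions, not the training code used in this research: `frozen_features` is a hypothetical stand-in for the pre-trained network, the four RGB samples are invented, and only the softmax layer is fitted.

```python
import math


def frozen_features(pixel_rgb):
    """Stand-in for a pre-trained network: a fixed mapping that is never updated."""
    r, g, b = pixel_rgb
    total = r + g + b + 1e-9
    return [r / total, g / total, b / total, 1.0]  # last entry acts as a bias input


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(weights, x):
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def train_softmax_head(features, labels, num_classes, steps=500, learning_rate=0.5):
    """Fit only the final softmax layer by gradient descent on cross-entropy."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    for _ in range(steps):
        grads = [[0.0] * dim for _ in range(num_classes)]
        for x, y in zip(features, labels):
            probs = softmax(class_scores(weights, x))
            for k in range(num_classes):
                error = probs[k] - (1.0 if k == y else 0.0)
                for d in range(dim):
                    grads[k][d] += error * x[d]
        for k in range(num_classes):
            for d in range(dim):
                weights[k][d] -= learning_rate * grads[k][d] / len(features)
    return weights


def predict(weights, x):
    scores = class_scores(weights, x)
    return scores.index(max(scores))


# Hypothetical toy data: mean RGB of reddish (class 0) and greenish (class 1) samples.
samples = [(200, 40, 30), (180, 60, 50), (40, 190, 60), (60, 170, 80)]
labels = [0, 0, 1, 1]
feats = [frozen_features(s) for s in samples]
head = train_softmax_head(feats, labels, num_classes=2)
predictions = [predict(head, f) for f in feats]
```

Because the feature extractor is never updated, each training step touches only the small weight matrix of the final layer, which is why retraining of this kind remains feasible without a GPU.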

Clustering Algorithm
Clustering algorithms are widely applied in pattern recognition (PR). The algorithm used here is based on the Euclidean distance, which measures how close data points are by their absolute distance. The data points can then be ranked according to these distances.
In this research, the color model is established based on the K-means algorithm. In the algorithm, the absolute distances between objects are calculated, and the objects with the closest absolute distances are assigned to the same category. Finally, a silhouette coefficient is used to evaluate the clustering effect:
$$ s_i = \frac{b_i - a_i}{\max(a_i, b_i)} \quad (1) $$
In Equation (1), $a_i$ is the average distance from the ith point to the other points of the same kind, which is called intra-cluster dissimilarity; $b_i$ is the average distance from the ith point to the points of any other kind, which is called inter-cluster dissimilarity. When the number of kinds is more than 3, the average value is set as the silhouette coefficient. The silhouette coefficient lies in [−1, 1]. When training the color model, the model with the maximum silhouette coefficient is the one that is established.
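A minimal sketch of K-means followed by silhouette-based selection of the number of clusters is given below. It assumes the standard definition in which $b_i$ is the smallest mean distance to any other cluster; the function names, the six RGB points and the initial centers are hypothetical, and at least two clusters are required.

```python
import math


def kmeans(points, initial_centers, iterations=20):
    """Plain K-means (Lloyd's algorithm) with Euclidean distance."""
    centers = [tuple(c) for c in initial_centers]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster becomes empty
                centers[k] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centers


def silhouette(points, labels):
    """Mean of s_i = (b_i - a_i) / max(a_i, b_i) over all points."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:  # singleton cluster: s_i is conventionally 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == k)
            / labels.count(k)
            for k in clusters if k != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical RGB points: three reddish and three greenish pixels.
points = [(250, 10, 10), (240, 20, 15), (245, 15, 5),
          (20, 200, 30), (25, 210, 40), (15, 205, 35)]
candidate_centers = {
    2: [points[0], points[3]],
    3: [points[0], points[1], points[3]],
}
scores = {}
for k, centers in candidate_centers.items():
    cluster_labels, _ = kmeans(points, centers)
    scores[k] = silhouette(points, cluster_labels)
best_k = max(scores, key=scores.get)  # number of clusters with the largest coefficient
```

Splitting the reddish pixels into two clusters lowers the mean coefficient, so the two-cluster model is the one retained in this toy case.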

Support Vector Machine
Support Vector Machine (SVM) is a supervised learning method. Given a binary classification dataset $\{(x_n, y_n)\}_{n=1}^{N}$ with $x_n \in \mathbb{R}^{D}$ and $y_n \in \{-1, 1\}$, an SVM model can be established for each $(x_n, y_n)$ in the dataset by the equation:
$$ f(x_n) = w^{T} x_n + b \quad (2) $$
The optimal weight vector $w^{T}$ and bias $b$ are obtained with the hinge loss function [40]. In this research, the Histogram of Oriented Gradient (HOG) algorithm was used to extract the features of rock mineral images, and an SVM model was then trained with these features.
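As a hedged illustration of fitting a weight vector and bias with the hinge loss, the sketch below trains a linear SVM by subgradient descent. The regularization strength, learning rate and the 2D toy points are assumptions; the toy points stand in for feature vectors and are not HOG features.

```python
def train_linear_svm(samples, labels, epochs=200, learning_rate=0.1, reg=0.01):
    """Minimize reg/2 * ||w||^2 + mean hinge loss by subgradient descent."""
    dim = len(samples[0])
    w = [0.0] * dim
    b = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [reg * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # hinge loss max(0, 1 - margin) is active
                for d in range(dim):
                    grad_w[d] -= y * x[d] / n
                grad_b -= y / n
        w = [wi - learning_rate * g for wi, g in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b


def classify(w, b, x):
    """Sign of the linear decision function w^T x + b."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable feature vectors with labels in {-1, 1}.
samples = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(samples, labels)
predicted = [classify(w, b, x) for x in samples]
```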

Random Forest
Random Forest (RF) was proposed by Leo Breiman. The steps for establishing an RF model are as follows:
Step 1: Let T be the size of the training dataset and M the number of features.
Step 2: Input the number of features used at each node (m, m << M) to obtain the decision result at a node in the tree.
Step 3: Randomly select K samples with replacement from a dataset N. Then, K classification trees are constructed from these selected samples. The rest of the dataset is used to test the model.
Step 4: Randomly select m features at each node and find the best splitting mode using the selected features.
Step 5: Each tree is grown completely without pruning, although pruning may be adopted after a normal tree classifier is built.
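The two sources of randomness in the steps above, sampling with replacement and per-node feature selection, can be sketched as follows. The helper names, the fixed seed and the sizes (10 samples, M = 36, m = 6) are hypothetical, and tree construction itself is omitted.

```python
import random


def bootstrap_sample(dataset_size, sample_count, rng):
    """Draw indices with replacement; the never-drawn ones form the held-out set."""
    in_bag = [rng.randrange(dataset_size) for _ in range(sample_count)]
    out_of_bag = sorted(set(range(dataset_size)) - set(in_bag))
    return in_bag, out_of_bag


def candidate_features(total_features, per_node, rng):
    """Pick m of the M feature indices, without repeats, for one node split."""
    return sorted(rng.sample(range(total_features), per_node))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
in_bag, out_of_bag = bootstrap_sample(dataset_size=10, sample_count=10, rng=rng)
features = candidate_features(total_features=36, per_node=6, rng=rng)
```

The samples that are never drawn play the role of "the rest of the dataset" in Step 3 and can be used to test the tree built from the drawn samples.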

Algorithm Implementation
Rock minerals have unique physical properties, such as color, streak, transparency, aggregate shape and luster [41]. By analyzing these properties, rock minerals can be identified more accurately. Table 1 shows the properties of the 12 kinds of rock minerals. In Table 1, color is a key feature for rock mineral image recognition, because different compositions of rock minerals lead to different colors. The composition of a rock mineral also often varies, so its colors can vary widely. Some rock minerals with a single composition have a pure color; for example, cinnabar is usually red, malachite is green, and calcite is white. A rock mineral with a mixed composition often shows a mixed color on its surface, which is usually described by a two-part name, such as lime-green. However, whether a particular type of rock mineral has a pure or mixed composition, minerals of that type always share certain colors. Therefore, color is an important classification criterion.
Texture is also an important feature for rock mineral image recognition. The main components of rock mineral texture are aggregate shape and cleavage, and different rock minerals have distinctive textures. For example, cinnabar's aggregate is granular, with perfect cleavage and uneven to sub-conchoidal fracture; aquamarine's aggregate is clustered, with imperfect cleavage and conchoidal to irregular fracture; and malachite's aggregate is multi-form, as shown in Figure 3. Therefore, the combination of color and texture can improve the accuracy of the rock mineral recognition model.
Minerals 2019, 9, x FOR PEER REVIEW

Mineral Texture Feature Extraction
The visual features of rock minerals also include the texture of rock mineral aggregates and cleavage. Under illumination, different cleavages reflect light differently. Meanwhile, the reflection of a given rock mineral follows a pattern because of its own characteristics. Therefore, the texture of rock minerals can be extracted based on the brightness and color variation of images.
In areas of the image where the brightness changes significantly, the rock mineral texture can be outlined. According to the RGB color system, color images can be converted to grayscale to show brightness variations more effectively. The conversion rule [43] is given in Equation (3), where gray is the grayscale value of a pixel, and R, G and B are the three primary color values of the pixel. Under illumination, the brightness in the opaque area of a rock mineral image is distinctly different from that of the surrounding pixels. If the rock mineral's luster is brilliant in the image, some pixels have a higher brightness value than their surrounding area, while the rest of the pixels do not. If the luster of a rock mineral is dull in the image, which indicates low reflection, pixels have a lower brightness value than their surrounding pixels, as shown in Figure 4.
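A grayscale conversion of this kind can be sketched as below. The weights 0.299, 0.587 and 0.114 are the widely used ITU-R BT.601 luma coefficients and are an assumption here; the coefficients of Equation (3) are not reproduced in this text, and the function name `to_grayscale` is hypothetical.

```python
def to_grayscale(image_rgb):
    """Convert rows of (R, G, B) tuples to grayscale with BT.601 luma weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row]
            for row in image_rgb]


# Hypothetical 1 x 3 image: white, black and pure red pixels.
gray_row = to_grayscale([[(255, 255, 255), (0, 0, 0), (255, 0, 0)]])[0]
```

With these weights, green contributes most to brightness and blue least, so a bright green cleavage surface stands out more strongly in the grayscale image than an equally saturated blue one.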

In Equation (4), $\Delta Z_i$ is the variation of brightness for each pixel, with the grayscale value of each pixel determined by Equation (3), and $\overline{gray}$ is the mean grayscale value of the central pixel and the 8 pixels around it. To eliminate edge influence, the maximum and minimum are removed, as shown in Equations (5) and (6), where $gray_i$ is the grayscale value of the ith pixel, $gray_{max}$ and $gray_{min}$ are the maximum and minimum grayscale values of the pixels being calculated, and $|\Delta Z_i|'$ is the variation of the ith pixel's grayscale value. T is the threshold value; after experimentation, we found that T = 15 was suitable. A point that satisfies $|\Delta Z_i|' > T$ is set as a feature point.
In addition, texture features can be extracted from color changes at the interfaces of rock minerals, which can be measured by the variation of the RGB proportions. Three linearly independent indexes, $C_1$, $C_2$ and $C_3$, were selected; each is a ratio of RGB channel values, calculated by Equation (7), in which n was set to 0.01 to avoid a denominator of zero. $C$ was then calculated to evaluate the variation of $C_1$, $C_2$ and $C_3$ comprehensively, as shown in Equation (8), where K is the expansion coefficient; in this research, K was set to 1. $C$ changes sharply when any one of $C_1$, $C_2$ and $C_3$ changes. $\Delta C$ was set as the maximum difference in coefficients (Equation (9)), where $\Delta C_i$ is the difference between the pixel and its surrounding pixels. $T_1$ was used as a threshold value to determine whether a point was a texture boundary point, and $T'$ was a matrix used to store the maximum distances within each class. $T_1$ indicated the maximum distance among the 12 rock minerals, which was calculated by K-means. First, a matrix containing the RGB values of every type of rock mineral image was clustered by the K-means algorithm. Then, a result matrix containing the maximum distance for each of the 12 kinds of rock mineral was assigned to $T'$:
$$ T' = [D_1, D_2, \ldots, D_m] \quad (10) $$
where $D_i$ is the maximum distance in each category from any point's $C$ value to the mean colors' RGB values, and m is the number of mineral classes. Finally, the maximum value in $T'$ was assigned to $T_1$. When $T_1$ is smaller than $\Delta C$, the pixel is marked as a texture boundary point; otherwise, it is not marked.
The texture and cleavage can be outlined in the image by combining the brightness and color changes, and the recognition accuracy of rock mineral images can be increased by extracting the texture and cleavage of rock minerals. In short, the features of rock mineral images can be extracted by calculating the changes in brightness values and color values. In this research, texture features of rock minerals were extracted to strengthen the texture features of the original rock mineral images by marking and outlining the texture boundary points, as shown in Figure 5. After extraction, the Inception-v3 model and the color model are trained.
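The two thresholding rules can be sketched as follows. This is one plausible reading, stated as an assumption: the variation of a pixel is taken as the absolute difference between its grayscale value and the mean of its 3 × 3 neighborhood after one maximum and one minimum are dropped. The function names, the toy image and the class distances are hypothetical, and the color indexes themselves are not computed.

```python
def trimmed_neighborhood_mean(gray, row, col):
    """Mean of the 3x3 neighborhood after dropping one maximum and one minimum."""
    window = sorted(gray[r][c]
                    for r in range(row - 1, row + 2)
                    for c in range(col - 1, col + 2))
    kept = window[1:-1]  # 9 values minus the two extremes leaves 7
    return sum(kept) / len(kept)


def brightness_feature_points(gray, threshold=15):
    """Return (row, col) of interior pixels whose variation exceeds the threshold."""
    points = []
    for row in range(1, len(gray) - 1):
        for col in range(1, len(gray[0]) - 1):
            variation = abs(gray[row][col] - trimmed_neighborhood_mean(gray, row, col))
            if variation > threshold:
                points.append((row, col))
    return points


def is_texture_boundary(delta_c, class_max_distances):
    """Mark a pixel when delta_c exceeds T1, the largest entry of T'."""
    t1 = max(class_max_distances)
    return delta_c > t1


# Hypothetical grayscale image with one bright pixel on a uniform background.
gray = [
    [50, 50, 50, 50, 50],
    [50, 50, 50, 50, 50],
    [50, 50, 120, 50, 50],
    [50, 50, 50, 50, 50],
    [50, 50, 50, 50, 50],
]
feature_points = brightness_feature_points(gray)  # [(2, 2)]
```

Dropping the extremes means that a single outlier inside the window does not shift the local mean, so only the outlier itself is flagged and its neighbors are not.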

Color Model of Rock Mineral
The colors of the 12 types of rock minerals considered here are shown in Table 1. A computer generates colors by combining red, green and blue to build the RGB color space, in which red, green and blue each take values in [0, 255]. As a result, there are 16,777,216 (256 × 256 × 256 = 16,777,216) colors in the RGB color space. Any change in the value of R, G or B leads to color variation, so it is possible to recognize rock mineral images based on color. The rock mineral label can be determined by the following equation:
$$ \rho_{ji} = \sqrt{(R_i - R_{ji})^2 + (G_i - G_{ji})^2 + (B_i - B_{ji})^2} \quad (11) $$
where $\rho_{ji}$ is the Euclidean distance; $R_i$, $G_i$ and $B_i$ are the mean RGB values of the rock mineral image; and $R_{ji}$, $G_{ji}$ and $B_{ji}$ are the mean RGB values of the ith color in the jth rock mineral. According to the color distance $S_{ji}$, the six top-ranked rock minerals are recognized and shown in the result. See Equations (11) and (12).
$$ Score_j = 1 - \frac{\min S_{ji}}{\sum_{j=1}^{n} \min S_{ji}} \quad (12) $$
Here, a color model for rock minerals is established based on their colors. The color model is used to assist the Inception-v3 model in identifying rock minerals. To identify the type of a mineral, the Inception-v3 model identifies the mineral first, and a set of six candidate results is then passed to the color model. The final result is given by the color model after matching the color of the mineral against the results from the Inception-v3 model.
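The scoring and re-ranking step can be sketched as follows, assuming that the color distance is the Euclidean distance between the mean RGB value of the image and each reference color. The reference colors, the candidate list and the function names are hypothetical, and only three candidates are shown instead of six.

```python
import math


def color_scores(image_mean_rgb, reference_colors):
    """Score each mineral as 1 - (its smallest color distance / sum over minerals)."""
    nearest = {
        mineral: min(math.dist(image_mean_rgb, color) for color in colors)
        for mineral, colors in reference_colors.items()
    }
    total = sum(nearest.values())  # assumed non-zero in this sketch
    return {mineral: 1 - distance / total for mineral, distance in nearest.items()}


def rerank(candidates, scores):
    """Return the candidate from the deep learning model with the best color score."""
    return max(candidates, key=lambda mineral: scores[mineral])


# Hypothetical cluster-center colors per mineral (not values from Table 3).
reference_colors = {
    "cinnabar": [(200, 30, 30), (150, 20, 25)],
    "malachite": [(30, 160, 90), (20, 120, 70)],
    "calcite": [(240, 240, 235)],
}
scores = color_scores((190, 40, 35), reference_colors)
best = rerank(["malachite", "cinnabar", "calcite"], scores)
```

In this toy case the reddish image is closest to a cinnabar reference color, so cinnabar is returned even though it was not the first candidate in the list.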

Image Processing
The dataset [42] consists of 4178 images of 12 kinds of elaborate rock minerals.The information about rock mineral images in each category is shown in Table 2. Twenty images were selected randomly in each category as a test dataset, and the remaining images were used to retrain the Inception-v3 model.
In establishing the model, numerous factors, including the number and clarity of rock mineral images, background noise and differences between the features of minerals, all influence its accuracy. Generally, accuracy increases as the number of rock mineral images increases. However, an imbalance in the number of images among different types may lead to low accuracy. Therefore, we collected the images and ensured that each category contained at least 150 images. The proportion of each image occupied by rock minerals was then adjusted to at least 80%. Finally, image noise, such as complex backgrounds and labels, was removed, and the images were given the same size. For training the Inception-v3 model and the color model, the texture features of all images were extracted and strengthened. All images were also converted into RGB matrices. Then, the brightness values and a matrix containing the $\Delta C$ values of the points of each image were calculated by Equation (3) and Equations (7)–(9), respectively. Points determined to be boundary points were marked and outlined. Some pre-processed results are shown in Figure 5. Then, the Inception-v3 model was trained.

Model Training
The processed images were used as the raw data. The input image size was set to 299 × 299, and all the input images were pre-processed for training. The input had three dimensions, for length, width and color. The number of training steps was set to 20,000, and the learning rate was set to 0.01. The prediction results were compared with the true labels in each step, so that the training and validation accuracies could be calculated and the weights in the model updated. The true label is the class name from the dataset. The primary measures used to evaluate training effectiveness are the training accuracy, validation accuracy and training cross-entropy, as shown in Figure 6.
The lighting conditions influence the RGB values of the images. To obtain a better color model, the self-adaptive K-means algorithm, which is optimized by the silhouette coefficient, was used to train the model on 10 image slices (each about 300 × 300 pixels) for each category under different lighting conditions. There were therefore about 900,000 points in each category. These points were then used to train the color model by K-means. When the model was trained, the silhouette coefficient and the maximum distance of each category from a point to the mean value, which is the $D_i$ in Equation (10), were calculated. The training process ended when the silhouette coefficient reached its maximum. The training results of the color model are shown in Table 3. Additionally, Figure 7 shows samples of the rock mineral images used to train the color model.
To ensure that the models in Table 4 are statistically feasible, the precision rate, recall rate and F1-measure value are used to validate the models. In Table 4, the Inception-v3 model retrained on raw images reached a validation accuracy of 73.1%, and its top-1 and top-3 accuracies were 64.1% and 96.0%, respectively. The test accuracies (precision rates) of SVM-HOG and RF-HOG are 32.8% and 31.2%, respectively. These accuracies are low, which indicates that the HOG features are not effective here. In contrast, the comprehensive model performs markedly better, which suggests that it can extract the texture features. The test accuracies of each category (recall rate) of these models are shown in Figure 8, and the F1-measure values of the models are shown in Table 5. The accuracy of each category of the comprehensive model is much higher than that of the other models, which indicates that the comprehensive model is suitable for rock mineral identification. The F1-measure value of the comprehensive model indicates that the model is statistically feasible.
of the comprehensive model is much higher than that of the rest models, which indicates that the comprehensive model is suitable for rock mineral identification.The F1-measure value of the comprehensive model indicates that the model is statistically feasible.It revealed that the accuracy of the Inception-v3 model, with the texture feature extraction images, is higher than that with the raw images in Table 4.The Inception-v3 model with the texture feature extraction images reached a higher accuracy, which means the texture is a significant feature for rock mineral identification.The texture extraction makes it easier to distinguish different rock minerals, which improves the accuracy of the retrained model.
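The silhouette-driven K-means training of the color model described in this section can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the candidate range of cluster counts, and the synthetic RGB points are all assumptions made for illustration, and the per-cluster maximum distance is only assumed to correspond to D_i in Equation (10). The code is written in plain Python to stay self-contained; at the scale of about 900,000 points per category, a library such as scikit-learn and a subsample for the silhouette coefficient would normally be used.

```python
import math
import random


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm on a list of RGB tuples."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Move the center to the mean of its members (keep it if empty).
            new_centers.append(
                tuple(sum(ch) / len(members) for ch in zip(*members))
                if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


def silhouette(points, labels, k):
    """Mean silhouette coefficient; O(n^2), so only for small samples."""
    clusters = [[p for p, lab in zip(points, labels) if lab == c]
                for c in range(k)]
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for c, other in enumerate(clusters) if c != lab and other)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def fit_color_model(points, k_range=range(2, 5)):
    """Keep the cluster count with the highest silhouette coefficient."""
    best = None
    for k in k_range:
        centers, labels = kmeans(points, k)
        score = silhouette(points, labels, k)
        if best is None or score > best[0]:
            best = (score, k, centers, labels)
    score, k, centers, labels = best
    # Largest point-to-center distance per cluster (assumed analogue of D_i).
    max_dist = [max((math.dist(p, centers[c])
                     for p, lab in zip(points, labels) if lab == c),
                    default=0.0)
                for c in range(k)]
    return k, centers, max_dist, score


# Synthetic stand-in for pixels of one category under two lighting conditions.
rng = random.Random(42)
reddish = [(rng.gauss(200, 10), rng.gauss(40, 10), rng.gauss(40, 10))
           for _ in range(60)]
bluish = [(rng.gauss(40, 10), rng.gauss(70, 10), rng.gauss(200, 10))
          for _ in range(60)]
k, centers, max_dist, score = fit_color_model(reddish + bluish)
print(k, round(score, 3), [round(d, 1) for d in max_dist])
```

Selecting the cluster count by the silhouette coefficient avoids fixing it by hand, which matters here because the number of distinct color groups in a category depends on how many lighting conditions the slices cover.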
The comprehensive model achieved the highest accuracy among the three models. Color properties increased identification accuracy considerably: the top-1 test accuracy of the comprehensive model is 74.2%, and the top-3 accuracy is 99.0%. By combining the retrained Inception-v3 model and the color model, a comprehensive identification can be made for different rock minerals. Texture and color were the features used most during identification.
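The top-1 and top-3 accuracies quoted above, and the precision, recall and F1-measure used to validate the models, can be made concrete with a short sketch. It uses the standard definitions of these measures, which are assumed (not confirmed by the text) to match those behind Table 4 and Table 5. The function names and the four toy predictions are hypothetical and are not results from the paper.

```python
def top_k_accuracy(scores, true_labels, k):
    """Fraction of samples whose true class is among the k best-scored classes."""
    hits = 0
    for row, truth in zip(scores, true_labels):
        ranked = sorted(row, key=row.get, reverse=True)[:k]
        hits += truth in ranked
    return hits / len(true_labels)


def per_class_prf(true_labels, predicted):
    """Return {class: (precision, recall, f1)} from paired label lists."""
    result = {}
    pairs = list(zip(true_labels, predicted))
    for cls in sorted(set(true_labels) | set(predicted)):
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        result[cls] = (precision, recall, f1)
    return result


# Toy class scores for four images (illustrative values only).
scores = [
    {"calcite": 0.6, "azurite": 0.3, "magnetite": 0.1},
    {"calcite": 0.5, "azurite": 0.4, "magnetite": 0.1},
    {"calcite": 0.1, "azurite": 0.2, "magnetite": 0.7},
    {"calcite": 0.2, "azurite": 0.7, "magnetite": 0.1},
]
truth = ["calcite", "azurite", "magnetite", "azurite"]

# The top-1 prediction is simply the best-scored class of each image.
predicted = [max(row, key=row.get) for row in scores]

top1 = top_k_accuracy(scores, truth, 1)
top2 = top_k_accuracy(scores, truth, 2)
prf = per_class_prf(truth, predicted)
print(top1, top2, prf)
```

In the toy data the second image is misclassified at top-1 but recovered at top-2, which is the same effect that separates the reported top-1 and top-3 accuracies.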
Some recognition samples are shown in Figure 9. The models sometimes produced different results. When the texture features are similar but the colors differ, the color model can identify the minerals successfully; for example, the cleavage of magnetite is similar to that of azurite, but their colors are not. When the colors are similar but the cleavage differs, as with calcite and gypsum, the Inception-v3 model trained with feature extraction images can identify the minerals successfully. Therefore, the comprehensive model has the advantages of both the color model and the Inception-v3 model trained with feature extraction images. Occasionally, however, the comprehensive model is wrong when a mineral's texture features are vague or its color is abnormal, as shown in Figure 10. The specimen in Figure 10 is magnetite, but because of the blue lighting and vague texture features, all of the models returned an incorrect result.
Furthermore, elaborate mineral images with clear mineral features were used in this study. If features such as cleavage and luster can be extracted, the comprehensive model can be further tested for mineral specimen identification, and even for field surveys.

Figure 1. The training process and identification process of a comprehensive model.

Figure 4. The brightness distribution of rock mineral images. (a) Calcite; (b) brightness distribution of calcite image; (c) cinnabar; (d) brightness distribution of cinnabar image.

Under the same lighting conditions, the brightness value fluctuates around a fixed value, and the magnitude of this fluctuation is denoted |∆Z_i|. Beside a texture, the brightness value changes dramatically, by far more than |∆Z_i|, namely |∆Z| >> |∆Z_i|. The variation for a pixel is calculated:

Figure 8. Test accuracies of each category.

Figure 9. Samples of recognition results. (a-c) are from the comprehensive model, (d-f) are from the Inception-v3 model trained with feature extraction images, (g-j) are from the Inception-v3 model without feature extraction.

Author Contributions: Modules design and development, C.L.; Overall framework design, M.L.; Data collection and application test, Y.Z. and Y.Z.; Algorithm and model improvement, S.H.

In this research, three models based on the Inception-v3 model were established. The deep learning model coupled with the color model reached top-1 and top-3 accuracies of 74.2% and 99.0%, respectively. The retrained model using raw images reached top-1 and top-3 accuracies of 64.1% and 96.0%, respectively. The retrained model using texture extraction images reached top-1 and top-3 accuracies of 67.5% and 98.3%, respectively. The comparison of the three models indicates that the comprehensive model performs best. The SVM-HOG model achieved a validation accuracy of 32.8%, and the RF-HOG model achieved a validation accuracy of 31.2%. The comparison between these traditional models and the deep learning models shows that the deep learning models extract more effective image features and are much more effective than the traditional ones. The deep learning algorithm provides a new approach to rock mineral identification, and the clustering algorithm can further improve the performance of the deep learning model. Therefore, the combination of a deep learning algorithm and a clustering algorithm is an effective way to identify rock minerals. The color and texture features of rock minerals are significant properties for identification, and the identification accuracy of the Inception-v3 model can be improved greatly by texture and color extraction.

Table 1. Physical properties of the twelve kinds of rock minerals.

Table 2. The types and numbers of rock mineral images.

Table 4. Results of five models.

Table 5. Validation parameters of models.
