Segmentation for Multi-Rock Types on Digital Outcrop Photographs Using Deep Learning Techniques

The basic identification and classification of sedimentary rocks into sandstone and mudstone are important in the study of sedimentology and they are executed by a sedimentologist. However, such manual activity involves countless hours of observation and data collection prior to any interpretation. When such activity is conducted in the field as part of an outcrop study, the sedimentologist is likely to be exposed to challenging conditions such as the weather and their accessibility to the outcrops. This study uses high-resolution photographs which are acquired from a sedimentological study to test an alternative basic multi-rock identification through machine learning. While existing studies have effectively applied deep learning techniques to classify the rock types in field rock images, their approaches only handle a single rock-type classification per image. One study applied deep learning techniques to classify multi-rock types in each image; however, the test was performed on artificially overlaid images of different rock types in a test sample and not of naturally occurring rock surfaces of multiple rock types. To the best of our knowledge, no study has applied semantic segmentation to solve the multi-rock classification problem using digital photographs of multiple rock types. This paper presents the application of two state-of-the-art segmentation models, namely U-Net and LinkNet, to identify multiple rock types in digital photographs by segmenting the sandstone, mudstone, and background classes in a self-collected dataset of 102 images from a field in Brunei Darussalam. Four pre-trained networks, including Resnet34, Inceptionv3, VGG16, and Efficientnetb7 were used as a backbone for both models, and the performances of the individual models and their ensembles were compared. We also investigated the impact of image enhancement and different color representations on the performances of these segmentation models. 
The experimental results of this study show that, among the individual models, LinkNet with Efficientnetb7 as a backbone had the best performance, with a mean intersection over union (MIoU) value of 0.8135 over all of the classes, while the ensemble of U-Net models (with all four backbones) performed slightly better still, with an MIoU of 0.8201. When different color representations and image enhancements were explored, the best performance (MIoU = 0.8178) was observed for the L*a*b* color representation with Efficientnetb7 using U-Net segmentation. For the individual classes of interest (sandstone and mudstone), U-Net with Efficientnetb7 was found to be the best model for the segmentation. Thus, this study demonstrates the potential of semantic segmentation for automating the reservoir characterization process, whereby the rock patches of interest can be extracted for deeper study and modeling.


Introduction
Rock type identification is a critical first step in resource exploration and development [1][2][3]. This involves a visual examination of the specimens for specific properties that are typically based on color, composition, sedimentary structures, and granularity [1].
Ringer and Yoon [6] used focused ion beam-SEM (FIB-SEM) images to identify sandstone and shale rock types. For the 2D CNN classification, U-Net, U-VGG16 and U-ResNet were used, and for the 3D case, a 3D U-Net was used. Images of 128 × 128 pixels were sampled from two 3D rock images, creating a total of 2569 segmented samples. The U-ResNet and U-VGG16 models achieved mean IoU scores above 0.85. Alfarisi et al. [7] discussed the different types of machine learning algorithms that have been applied to CT and MRI scans of rocks to calculate the properties that determine the rock types, such as permeability and porosity. Jin et al. [9] developed a multi-module densely connected U-Net to perform semantic segmentation on digital camera photographs of boreholes to determine the ore distribution and delineate the ore and waste rock boundary.
While Ran et al. [1], Pascual et al. [10] and Liang et al. [12] effectively applied deep learning techniques to classify the rock types in field rock images, their approaches could only handle a single rock-type classification per image. In our work, we apply deep learning to achieve a multi-rock-type classification. Liu et al. [2] applied deep learning techniques to classify multiple rock types in each image; however, the test was performed on artificially overlaid images of different rock types and not on images of naturally occurring rock surfaces with multiple rock types. In our work, the model was trained and evaluated using images containing naturally occurring multi-rock-type surfaces. Semantic segmentation using deep learning was applied to classify the rock types by Ringer and Yoon [6], Alfarisi et al. [7] and Xu et al. [8], but these studies used special images obtained from CT, MRI and SEM, not normal camera photographs. While Cheng and Guo [11] applied a CNN to classify rock granularity and Niu et al. [15] applied a CNN to µCT and SEM images to segment sandstone data and measure properties relating to the rock type, these problems are not within the scope of this work. In our work, we applied semantic segmentation using deep learning techniques to digital photographs containing multiple rock types. Thus, our contribution is the application of semantic segmentation using U-Net and LinkNet to identify multiple rock types in digital photographs. Table 1 illustrates the types of deep learning rock type classification problems and their techniques, and the research gap addressed by this work.

Table 1. Different types of deep learning rock type classification problems and solutions.

| Literature | Type of Images | Rock Types per Image | Techniques | Semantic Segmentation |
|---|---|---|---|---|
| Ran et al. [1] | digital camera photographs | S | CNN | N |
| Liu et al. [2] | digital camera photographs | M ¹ | faster R-CNN, simplified VGG16 | N |
| Ringer and Yoon [6] | SEM, CT | S | U-Net, U-VGG16, U-Resnet | Y |
| Alfarisi et al. [7] | SEM, CT | — | machine learning algorithms | Y |
| Cheng and Guo [11] | polarized light microscopy | S | CNN | N |
| Liang et al. [12] | digital camera photographs | S | EfficientNetB0 | N |
| Niu et al. [15] | µCT, SEM | S | CNN | Y |
| This work | digital camera photographs | M | U-Net, LinkNet | Y |

¹ Tested on artificially overlaid images of different rock types rather than naturally occurring multi-rock surfaces.

In this work, we utilize high-resolution images that were acquired from an earlier outcrop study [16], which are dominated by sedimentary rocks. Here, the two main sedimentary rock types are sandstone beds and mudstone beds, which range in thickness from centimeters to decimeters. However, the majority of the beds are classified as thin beds of less than 1 m. In the outcrop study, the individual beds were measured, described, identified and interpreted, adhering to a typical sedimentological study. Such a study requires a significant amount of time and effort to acquire, analyze and understand the geology and its controlling factors. This forms an integral part of an analogue of the subsurface, for which data availability and/or accessibility can be limited. Such an analogue can be used as a proxy for some of the hydrocarbon fields that exist in NW Borneo as a means to interpret their geological significance and importance. The sedimentological study identified a total of 1483 sandstone beds with an equal number of mudstone beds over 237 m of stacked beds. The geological observation, including acquiring the high-resolution photos, took five weeks.
The aims of this study are: (1) to identify the two main rock types from a series of training images that were acquired from the outcrop study through the use of various deep learning techniques, and (2) to investigate the workflows that can be used as a tool for the rock type classifications from similar images and/or photographs.
In terms of the ongoing efforts towards the automation and decision support in reservoir characterization, we developed an automatic rock type classification system to identify the rocks of interest, in particular sandstone and mudstone. In this work, we aim to develop a multi-rock type classification system using deep learning techniques on digital outcrop rock images. To the best of our knowledge, there has been no study applying semantic segmentation for the multi-rock type classifications of field rock images. This is a preliminary step towards the sedimentological characterization of the facies that have been identified. Using semantic segmentation to classify sandstone and mudstone, we can quickly extract the rock patches of interest for a much deeper study and modelling to be conducted, such as calculating the rock distribution [9].
In this work, two different models, U-Net and LinkNet, were applied for the segmentation of the sandstone and mudstone rocks with various encoder backbones, Efficientb7, Resnet34, Inceptionv3 and VGG16 [17,18], and ensemble learning, and their results were analyzed both quantitatively and qualitatively ( Figure 1).

The remainder of this paper is organized as follows. In Section 2, we explain the methodology that was applied to perform the sandstone and mudstone rock segmentation of the digital outcrop photos. In Section 3, we present, interpret and discuss the results that were obtained from our investigations. Finally, we summarize our work and share the key lessons in the conclusion section.

Materials and Methods
The proposed methodology that was used in this study is depicted in Figure 2. The details of each step are explained in the following sections.


Dataset Collection and Annotation
A total of 102 images used in this study were collected using a high-resolution digital camera. Each image contained multiple instances of sandstone and mudstone. The resolution of the images was 6000 × 4000 pixels. The images were annotated by a field expert to generate semantic segmentation masks for training the deep learning models. Each image pixel was assigned a label indicating whether it belongs to the mudstone, sandstone or background class (Figure 3). In total, 375 instances of sandstone, 312 instances of mudstone and 138 background instances were labeled for the training, validation and testing of the semantic segmentation models. The APEER (https://www.apeer.com/, accessed on 24 January 2022) annotation tool was used to annotate the dataset and generate the ground truth labels.



Pre-Processing
Together with their ground truth annotations, all of the input images were resized to 256 × 256 pixels to reduce the computational complexity of processing the higher resolution images. A standard preprocessing step was then applied to maximize the performance gain for each backbone encoder. For example, when Resnet was used as an encoder backbone, all of the input images were first converted from RGB to BGR color channels, and each color channel was then zero-centered with respect to the ImageNet dataset statistics, without pixel scaling. On the other hand, backbone encoders such as Efficientb7 did not require any pixel scaling, as the model contains a rescaling layer that automatically scales the input images during model training and inference.
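As a concrete illustration, the Resnet-style preprocessing described above can be sketched in a few lines of NumPy. The channel means below are the commonly used ImageNet values from the Caffe-style convention; the exact constants used in the study are not stated, so treat them as an assumption:

```python
import numpy as np

# Commonly used ImageNet channel means (BGR order, Caffe-style convention).
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def preprocess_resnet_style(rgb):
    """RGB -> BGR, then zero-center each channel on the ImageNet means,
    with no further pixel scaling."""
    bgr = rgb[..., ::-1].astype(np.float32)  # reverse the channel axis
    return bgr - IMAGENET_BGR_MEAN
```

An image whose channels equal the means maps exactly to zeros, which is a quick sanity check that the channel reversal and centering are consistent.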
Color transformations and image enhancement (e.g., histogram equalization) have been found to be helpful in improving the classification and segmentation of images in previous studies [19][20][21]. We evaluated the effect of histogram equalization and color transformations of RGB images on the segmentation performance. To apply histogram equalization to a multichannel color image (i.e., RGB), the image was converted to the YCrCb format and the histogram equalization was applied to the luminance channel. The training/testing dataset was generated by converting the output image back to the RGB format. The impact of color transformations was also investigated using three common color-space models: YCrCb, L*a*b* and HSV.
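A minimal NumPy sketch of this luminance-only equalization follows. It uses the standard YCrCb conversion coefficients; the actual library and coefficients used in the study are not specified, so this is illustrative only:

```python
import numpy as np

def equalize_hist(channel):
    """Histogram-equalize a single uint8 channel via its cumulative histogram."""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    # Map each intensity through the normalized CDF (clip guards the low end).
    lut = np.clip(np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255), 0, 255)
    return lut.astype(np.uint8)[channel]

def equalize_rgb_luminance(rgb):
    """Equalize only the luminance (Y) channel of an RGB uint8 image via a
    YCrCb round trip, leaving the chroma channels untouched."""
    rgb = rgb.astype(np.float32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = (r - y) * 0.713 + 128.0
    cb = (b - y) * 0.564 + 128.0
    y_eq = equalize_hist(np.clip(y, 0, 255).astype(np.uint8)).astype(np.float32)
    # Invert the YCrCb transform with the equalized luminance.
    r2 = y_eq + 1.403 * (cr - 128.0)
    g2 = y_eq - 0.714 * (cr - 128.0) - 0.344 * (cb - 128.0)
    b2 = y_eq + 1.773 * (cb - 128.0)
    out = np.stack([r2, g2, b2], axis=-1)
    return np.clip(out, 0, 255).astype(np.uint8)
```

On a low-contrast input, the equalized output spans close to the full 0-255 range while the hue is preserved, which is the intended effect of equalizing only the luminance.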

Segmentation Models
Two different semantic segmentation networks were compared to assess their performance when used for the segmentation of the sandstone and mudstone rocks with various encoder backbones. These models have achieved state-of-the-art results in different segmentation tasks with varying degrees of complexity [22]. A summary of these models is given below.
U-Net: U-Net was first proposed for performing segmentation tasks in the biomedical domain [23]. Due to its performance and efficiency with smaller datasets, it has now been adapted in different domains. Like the other semantic segmentation networks, U-Net follows an encoder-decoder architecture where an encoder down-samples the image to capture the context, while a decoder part of the network symmetrically expands to enable the precise localization of the objects in an image. A 3 × 3 convolution extracts the features while down-sampling the image, and a de-convolution process is then applied to up-sample the feature map. Skip connections are added to the encoder and the decoder parts to copy the feature maps to avoid the occurrence of information loss. Finally, a 1 × 1 convolution is applied to generate a segmentation mask by classifying each pixel into a specific class.
LinkNet: LinkNet architecture was proposed with the aim of improving the information sharing between the encoder part and the decoder part [24]. This was achieved by replacing the convolution layers of the U-Net architecture with the Resnet blocks on the encoder and decoder parts. In addition, LinkNet uses the addition method instead of the stacking method (present in U-Net) to transform the synthesis features.
These models follow a generic encoder-decoder structure where the encoder extracts the key features from the input image and then uses the decoder part to project the features into the pixel space to obtain a pixel-level classification. The encoder part is usually a pre-trained model, which is used as a backbone encoder, to improve the network learning speed and performance by taking advantage of the good weight initialization. In this study, we assessed the performance of the U-Net and LinkNet models on four widely used classification networks. These models include VGG16, Resnet34, Inceptionv3 and Efficientb7, and they have been widely adopted as the backbone feature extractors for various computer vision tasks [17,18]. While using a pre-trained network as an encoder, the normal convolution layers are replaced with the backbone module layers. For example, the convolution layers in a standard U-Net architecture are replaced with the Resnet blocks when one is using Resnet as the network backbone encoder. Similarly, the Resnet modules are adopted in the decoder part to replace the deconvolution layers.

Network Training
For both U-Net and LinkNet, a batch size of 4 images was used with an Adam optimizer with a learning rate of 0.0001. All of the backbone architectures were initialized with the ImageNet weights, since it has been repeatedly shown that initializing a network with pre-trained weights helps speed up the training and network convergence [25]. An early stopping criterion was used to find the best-performing model within the first 100 training epochs by monitoring the validation loss. All of the networks were trained with a combination of focal loss and dice loss, with class weights of 0.2 for the background class and 0.4 each for the mudstone and sandstone classes. This weighting encourages the models to perform better on the segmentation of the mudstone and sandstone classes.
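A NumPy sketch of the two class-weighted loss terms described above is given below. Only the class weights (0.2/0.4/0.4) come from the study; the focal parameter gamma = 2 and the exact way the two losses are combined are assumptions made for illustration:

```python
import numpy as np

def weighted_dice_loss(y_true, y_pred, class_weights, eps=1e-7):
    """Per-class soft Dice loss with per-class weights.
    y_true: one-hot mask (H, W, C); y_pred: softmax probabilities (H, W, C)."""
    losses = []
    for c, w in enumerate(class_weights):
        t, p = y_true[..., c], y_pred[..., c]
        inter = (t * p).sum()
        dice = (2.0 * inter + eps) / (t.sum() + p.sum() + eps)
        losses.append(w * (1.0 - dice))
    return float(sum(losses))

def weighted_focal_loss(y_true, y_pred, class_weights, gamma=2.0, eps=1e-7):
    """Categorical focal loss with per-class weights; down-weights easy pixels."""
    p = np.clip(y_pred, eps, 1.0 - eps)
    w = np.asarray(class_weights)  # shape (C,), broadcast over pixels
    fl = -w * y_true * (1.0 - p) ** gamma * np.log(p)
    return float(fl.sum(axis=-1).mean())
```

Both terms go to zero for a perfect prediction, and in training the two would typically be summed into a single scalar objective.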
All of the experiments were conducted using the TensorFlow deep learning framework on a machine that was equipped with an Intel Core i7 and NVIDIA GeForce RTX 3070 with 32 GB RAM.

Ensemble Predictions
We also investigated the use of an ensemble strategy to improve the performance of the predicted output mask. Ensemble learning is commonly used to combine the performance of weak models to make final predictions [26]. When one is performing a model ensemble, the results of each model prediction are aggregated via a weighted average on a test set. We adopted the weighted average ensemble prediction technique from the individual models to make the final predicted mask. All of the models were equally weighted to contribute to the ensemble prediction.
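With equal weights, the ensemble described above amounts to averaging the per-model softmax maps before taking the argmax; a minimal sketch:

```python
import numpy as np

def ensemble_predict(prob_maps, weights=None):
    """Weighted average of per-model softmax outputs, then argmax.
    prob_maps: list of arrays, each of shape (H, W, C)."""
    if weights is None:  # equal weighting, as used in this study
        weights = [1.0 / len(prob_maps)] * len(prob_maps)
    avg = sum(w * p for w, p in zip(weights, prob_maps))
    return np.argmax(avg, axis=-1)  # (H, W) class-index mask
```

When two models disagree on a pixel, the class with the larger averaged probability wins, so a confident model can outvote an uncertain one.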

Performance Evaluation
In this study, we evaluated the performance of the models using mean intersection over union (MIoU), which is a widely adopted metric for measuring the performance of segmentation models [27]. The MIoU calculates an average score over all of the classes (background, mudstone and sandstone) by measuring the overlap between the target and predicted classes. This is achieved by finding the ratio of the true positives over the sum of the true positives, false positives and false negatives of the segmented pixels:

MIoU = (1 / N_cls) Σ_x [ N_xx / (N_xx + N_yx + N_xy) ]    (1)

where N_xx is the true positive pixels, N_cls is the total number of classes, N_yx is the false negative pixels and N_xy is the false positive pixels. In this study, we used an overlap threshold of 0.5 to determine the IoU score of the model predictions. Further, the precision (2), recall (3) and F1-score (4) were also computed to observe the performances of the different models:

Precision = TP / (TP + FP)    (2)
Recall = TP / (TP + FN)    (3)
F1-score = 2 × Precision × Recall / (Precision + Recall)    (4)
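These metrics can be computed directly from the predicted and ground-truth label masks; a NumPy sketch (the 0.5 IoU thresholding step used for scoring is omitted for brevity):

```python
import numpy as np

def segmentation_metrics(y_true, y_pred, num_classes):
    """Per-class IoU, precision, recall and F1 from integer label masks,
    plus the mean IoU over all classes."""
    ious, stats = [], {}
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))  # true positives
        fp = np.sum((y_pred == c) & (y_true != c))  # false positives
        fn = np.sum((y_pred != c) & (y_true == c))  # false negatives
        iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
        prec = tp / (tp + fp) if (tp + fp) else 0.0
        rec = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        ious.append(iou)
        stats[c] = {"iou": iou, "precision": prec, "recall": rec, "f1": f1}
    return float(np.mean(ious)), stats
```

The mean IoU is simply the unweighted average of the per-class IoUs, matching Equation (1).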

Results and Discussion
A total of 102 images were used, with 70% of the data used for training, 15% for validation and 15% as a test set. Each model was optimized to reflect its best performance by tuning the model hyperparameters on the training set. The MIoU was used to evaluate the performance of each model. Tables 2 and 3 show the results on the validation and test sets. When observing the performance of the U-Net architecture, the use of a pretrained backbone feature extractor has a clear advantage, as it significantly improves model convergence and performance over the vanilla architecture. This is evident in the training and validation loss graphs for both the vanilla U-Net and the one using a backbone extractor. Using the vanilla/standard U-Net without any backbone feature extractor, the network struggles to converge until around 90 epochs. On the other hand, using a pretrained backbone enables the network to converge faster even with such a small training dataset, as seen in Figure 4.
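The 70/15/15 split described above can be reproduced with a simple seeded shuffle; the seed and the exact rounding behavior are assumptions, as the paper does not state them:

```python
import random

def split_dataset(items, train=0.70, val=0.15, seed=42):
    """Shuffle and split a list of image paths into train/val/test subsets."""
    items = list(items)
    random.Random(seed).shuffle(items)  # deterministic shuffle for reproducibility
    n = len(items)
    n_train = round(n * train)
    n_val = round(n * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```

For the 102 images in this study, this yields roughly 71 training, 15 validation and 16 test images, with every image landing in exactly one subset.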


Individual Models vs. Ensemble
From the results in Tables 2 and 3, we can observe that using the ensemble learning strategy resulted in slightly higher performance than the individual models obtained, with the exception of the Efficientb7-based models. For example, with the use of ensemble learning in the U-Net model, an MIoU of 0.8201 was achieved, which is higher than those achieved by the individual backbone encoders Efficientb7, Resnet34, Inceptionv3 and VGG16. This performance gain was comparable for both segmentation models.
When looking at the individual class performance, a similar pattern can be observed in both the ensemble and individual models, with a higher performance in the segmentation of the background class than in the segmentation of mudstone and sandstone. However, an ensemble of U-Net models shows a slightly higher performance for the mudstone and sandstone classes on the test set, with MIoUs of 0.7726 and 0.8003, respectively, which are almost 2% better than those of the best-performing U-Net Efficientb7 model. This suggests that the ensemble learning strategy offers a better performance both when assessing the individual classes and the overall model performance.
The average values of precision, recall and the F1-score are plotted in Figures 5 and 6 for the U-Net and LinkNet models with different backbones, respectively. The superior performance of the Efficientb7 backbone can be observed for both the U-Net and LinkNet segmentation models. A higher value of the precision metric was noticed for all of the backbones in both models, indicating that a larger share of the retrieved pixels were relevant. Since the goal of the segmentation models was to retrieve the sandstone class more accurately, we further analyzed the performance of the models at the individual class level. For the sandstone class, the overall best performance was observed for the LinkNet model with the Efficientb7 backbone, with a precision, recall and F1-score of 0.9001, 0.8685 and 0.8840, respectively. A slightly better precision (0.9103) was seen for the U-Net model with the Efficientb7 backbone on the sandstone class, with a slightly lower recall (0.8500).

Figure 6. Average values of precision, recall and F1-score for LinkNet models with different backbones for test data.

U-Net vs. LinkNet
The U-Net and LinkNet models performed comparably on both the validation and test sets (Tables 2 and 3). Despite the LinkNet architecture having a Resnet-inspired encoder structure, the U-Net model with the Resnet34 encoder performed better than LinkNet, with an MIoU of 0.7854 on the test set versus 0.7634 for LinkNet with the Resnet34 encoder. This may be attributed to its slightly higher number of trainable parameters. However, looking at the overall performance of both model variants, the LinkNet model also performed better on some individual classes. For example, LinkNet with the Efficientb7 encoder achieved an MIoU of 0.7922 on the sandstone class, which is slightly higher than that achieved by the Efficientb7 encoder in the U-Net architecture. The results also suggest that the relative performance of the two architectures varied depending on the backbone encoder that was used.

Comparison of Different Backbone Architecture
Each of the backbone architectures presents unique features for the segmentation network. Based on the results, it is clear that using Efficientb7 as a backbone feature encoder enabled the segmentation network to generate more discriminating features than the other backbone encoders could, owing to its higher number of parameters. Both the U-Net and LinkNet architectures outperformed the other models when using Efficientb7 as a backbone encoder on the validation and test sets. The remaining backbone architectures, VGG16, Resnet34 and Inceptionv3, performed comparably to one another. We can also notice that VGG16 as an encoder outperformed both Resnet34 and Inceptionv3 on the test set, despite having fewer model parameters. It is likely that the simple stacking of the convolution layers in VGG16, which served as the encoder part, can extract meaningful features from the images. Moreover, since both sandstone and mudstone appeared in almost equal proportions relative to the image size, a less complex network such as VGG16 can be an effective encoder.
When looking at the individual class performance of the backbone encoders, the performance on the background class exceeded that on both the sandstone and mudstone segmentation. This suggests that, regardless of the encoder architecture used, all of the models found the rock classes harder to segment than the background, as the background class had a fairly uniform appearance in almost all of the images. Furthermore, all of the models struggled more with the mudstone class than with the sandstone class, likely because the sandstone class occupied a larger proportion of the images than the mudstone class did.

Effect of Image Enhancement and Color Transformations
The effect of histogram equalization and color transformations of RGB images on the segmentation performance was also analyzed. Figures 7 and 8 present the MIoUs for the U-Net and LinkNet models, respectively, with the various backbone networks and different color representations. The best performance (MIoU = 0.8178) was noticed for the L*a*b* color representation with Efficientb7 using U-Net segmentation, which was slightly higher than the MIoU of the RGB color representation (0.8102) with the same model and backbone. More than a 2% improvement was noticed in the performance of the U-Net model with the Inceptionv3 backbone for the L*a*b* color representation as compared to the RGB color representation. No significant improvement was observed for histogram equalization over the RGB representation in the performance of either the U-Net or the LinkNet models. We also found that the HSV color representation resulted in a poor performance for the segmentation of the three classes, especially for LinkNet with the VGG16 backbone. In general, the performance did not vary much across the representations, except for the models based on the HSV color representation. The class-wise IoU values for the U-Net and LinkNet models with the different color representations are presented in Tables 4 and 5, respectively. The IoU for the sandstone class varies between 0.6762 (HSV with Inceptionv3) and 0.8012 (L*a*b* with Efficientb7) for the U-Net segmentation models. For most of the backbones with the U-Net model, the IoU value for the sandstone class is above 0.7. In contrast to the U-Net model, the best IoU value for sandstone was 0.7998 for the LinkNet model with the YCrCb color model and the Efficientb7 backbone network. Again, a poor performance was noticed for the HSV color representation, especially with the VGG16 backbone network. Overall, Efficientb7 was observed to be the best individual backbone for both the U-Net and LinkNet models.

Qualitative Analysis of Segmentation Models
Figures 9 and 10 present the comparison of the predicted masks for the different backbone encoders on selected images from the test set. The visual inspection of the output suggests that for the wider and more intact sandstone regions, the models performed better, while for a mix of sandstone and mudstone or for broken, small pieces of sandstone, the performance of the models was low. For example, when looking at the image of row three in Figures 9 and 10, most of the mudstone class has been misclassified as a sandstone class. In contrast, rows five to seven show a better segmentation performance for all of the models for the wide and intact sandstones. In general, all of the models tended to perform better on the segmentation of the background class than they did for the other classes. Though the error is observable in Figures 9 and 10, the precision analysis that is presented in Section 3.1 provides more insight into the efficacy of the semantic segmentation models. For the sandstone class, higher values of the precision and recall scores were noticed for both the LinkNet and U-Net models with the Efficientb7 backbone.
An important point to note is that the intact and wider sandstones are of more interest to the domain experts for the reservoir/rock characterization. The overall objective of applying semantic segmentation for the given data set is to roughly estimate the amount of sandstone that is present in the rock/reservoir of interest. The percentage of the sandstone for an individual image can be computed using semantic segmentation and for the whole data set (consisting of n images), the total percentage of sandstone that is present in the rock can be computed by combining these individual results.
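Given the predicted masks, the pooled sandstone percentage described above reduces to a pixel count over the whole data set; a minimal sketch (the sandstone class index is an assumption about the label encoding):

```python
import numpy as np

SANDSTONE = 2  # assumed class index for sandstone in the predicted masks

def sandstone_percentage(masks, class_id=SANDSTONE):
    """Percentage of pixels predicted as sandstone, pooled over a list of
    (H, W) integer class masks covering the whole data set."""
    total = sum(m.size for m in masks)
    sand = sum(int(np.sum(m == class_id)) for m in masks)
    return 100.0 * sand / total
```

Pooling the pixel counts before dividing, rather than averaging per-image percentages, keeps images of different sizes correctly weighted.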

Conclusions and Future Work
This study assessed the extent to which machine learning can be used to segment the sandstone and mudstone classes in digital images collected from the field. Two existing state-of-the-art models, U-Net and LinkNet, were compared with different backbone encoders. The results suggest that these models can obtain a reasonable performance when segmenting sandstone and mudstone rocks even with a small training dataset, and that using pretrained encoders as network backbones can further improve their performance. Specifically, a backbone encoder such as Efficientb7 offers a greater performance advantage than other encoders such as Resnet34, VGG16 and Inceptionv3. Finally, using an ensemble of models improved the performance both when assessing the individual class performance and the overall MIoU. No significant improvements were noticed in the performance of the segmentation models after performing image enhancement (histogram equalization) or changing the color representation from RGB to other color types.

Based on the result of this study, the future work will focus on designing a more efficient encoder backbone to work with a small training sample. Other techniques such as few-shot learning for semantic segmentation will be explored to reduce the reliance on a large training dataset. To increase the confidence in rock identification for sandstone, there is an opportunity to incorporate sedimentary structures as part of the overall machine learning workflow.
To extend the automation capability of multi-rock type identification, we will investigate our existing deep learning framework and extend its capability to quantify the proportions of the different rock types found in the digital photographs. The automation of such a process will assist geologists in processing, identifying and quantifying the different rock types in large volumes of photographs. This will involve measuring the segmented areas of interest relating to the rock types and evaluating them against the human-measured proportions.

Data Availability Statement: These data were 3rd party data. Data were obtained from I.P. and they are privately owned.

Conflicts of Interest:
The authors declare no conflict of interest.