Application of Deep Learning in Petrographic Coal Images Segmentation

The study of the petrographic structure of medium- and high-rank coals is important from both a cognitive and a utilitarian point of view. The petrographic constituents and their individual characteristics and features are responsible for the properties of coal and the way it behaves in various technological processes. This paper considers the application of convolutional neural networks to the segmentation of coal petrographic images. A U-Net-based segmentation model was proposed. The network was trained to segment inertinite, liptinite, and vitrinite. Segmentations prepared manually by a domain expert were used as the ground truth. The results show that inertinite and vitrinite can be segmented successfully, with minimal difference from the ground truth. Liptinite turned out to be much more difficult to segment; moderate results were obtained only after transfer learning was applied. Nevertheless, the application of the U-Net-based network to petrographic image segmentation was successful. The results are good enough to consider the method as a supporting tool for domain experts in everyday work.


Introduction
Coal petrography is a science that, despite the passage of many years, is developing and updating its knowledge with a view to new directions for use in the energy industry. Particular emphasis is placed on clean coal technologies as well as the recovery of critical elements from coal [1][2][3][4][5][6].
Coal is a heterogeneous substance in terms of its chemical composition. Its heterogeneity is due to the variation in the peat-forming plant material from which it was formed and the variation in the conditions, time, pressure, and temperature to which the organic material was subjected during both its biochemical and geochemical phases [7]. The basic units of the structure of coal, homogeneous in physical and chemical terms, are macerals. The study of the petrographic structure of coal is important from both a cognitive and a utilitarian point of view [7][8][9][10][11]. It is the petrographic constituents and their individual characteristics and features that are responsible for the properties of coal and the way it behaves in various technological processes [7,8,12,13,14,15].
Knowledge of the percentage of individual petrographic constituents in coal is very important, as the petrographic constituents differ in terms of their physical and chemical properties, such as volatile matter content, elemental composition, vitrinite reflectance, and specific density, all of which affect the chemical, physical, and technological properties of coal [7,13,16,17,18].
The knowledge of the petrographic and mineral composition of the coal deposit, and the properties resulting from this composition, should be the basis for optimizing the conditions in coal preparation plants [6,19,20]. Such an approach makes it possible to control the properties of the final product in order to obtain concentrates with precisely reflectograms for maceral identification [51]. The coal samples were submerged in resin and polished. After the resin regions were masked out in the image, the cumulative curve of reflectance was computed using the gray values of the remaining pixels. The shape of the curve reflects the composition of the coal. The method turned out to be successful, but it required appropriate sample preparation and an appropriate imaging protocol (e.g., the use of red dye for the resin and green light for image acquisition). The gray level values had to be calibrated so that the resulting curve could be interpreted in terms of maceral content. The idea was further enhanced by the simultaneous analysis of optical and SEM images [52]. The use of optical microscopy images for automated maceral identification was also considered by other researchers [53][54][55][56][57][58]. Młynarczuk and Skiba proposed the use of machine learning (ML) and artificial intelligence methods in maceral identification [59]. The maceral group identification is based on a color feature vector computed for the square neighborhood of the selected pixel. The k nearest neighbors (kNN) method and a multilayer perceptron (MLP) were used as classifiers. The results were very promising, and the method was developed further to identify the macerals within the inertinite group [60]. The feature vector was extended to include both color and texture properties of the pixels. The results were satisfactory, but the effectiveness depended on the maceral. However, none of these methods attempted to semantically segment the image.
One attempt in this direction was made by Wang et al. [56]. In this attempt, the shapes of maceral groups were identified using a clustering procedure, namely a modified k-means algorithm. Then, the discovered objects were classified using morphological, color, and texture features. It should be emphasized that in addition to the analysis of images from various types of microscopes, other methods were also used for carbon analysis, examples of which can be found in [61,62].
Semantic segmentation is a topic of much research interest nowadays [63,64]. The application of deep learning (DL) and convolutional neural networks (CNN) allowed for achieving stunning results. The DL was used as a tool for microfossils, core images, petrographic and rock images classification [65]. Attempts are being made to apply these approaches to the analysis of coal characteristics and particularly its petrography using visual information. The most fundamental characteristic of coal's run-of-mine (ROM) distinguishes between coal rocks and the accompanying gangue. Pu et al. used the VGG16 CNN for the classification of images presenting coal or gangue in different configurations (as stockpiles, during transportation, photographed in laboratory conditions, etc.) with satisfactory results [66]. Li et al. developed a solution for the identification of coal and gangue rocks on images [67]. The proposed framework processed the Gaussian pyramid of the input image. The rock grains were detected and classified as coal or gangue. The authors reported impressive accuracy, exceeding 98%, in rock type recognition. The application of semantic segmentation to maceral group identification was developed by Lei et al. [68]. The proposed network utilizes the U-Net [69] network enhanced with the attention gates. The authors used the multi-class form of the output layer of the network. The segmentation results were very good and proved the robustness of DL methods.
The identification of the macerals directly on the image, by assigning a maceral label to every pixel, would be beneficial in many ways. First, it would allow the determination of maceral composition. Second, it would provide the scientist with information allowing them to judge whether the individual parts of the image have been correctly identified. Third, calibration should not be critical in maceral identification, because not only the color statistics are considered, but also the spatial arrangement of pixels constituting the maceral groups (for medium- and high-rank coals). Therefore, attempts were made to develop a method suitable for such analysis of coal petrographic images. This paper proposes such a method using the deep learning approach. The highlights of the presented results include the development of the coal petrographic images database, the method of image preparation and augmentation, and the development of a U-Net [69]-based convolutional neural network for the semantic segmentation of coal petrographic images. The proposed approach is based on single-class classification: a separate model of the same architecture was trained for each of the macerals.

Materials and Methods
The identification of macerals is based on the microscopic evaluation of grain morphology and color. On this basis, three groups of macerals were distinguished: liptinite, vitrinite, and inertinite (Figure 1) [9][10][11]. The color of liptinite changes from brown through dark grey to light grey in the microscopic image. Under incident light, depending on coal rank, the color of vitrinite changes from dark grey through light grey to almost white. On the other hand, in the same light conditions, the color of inertinite in coal is always the brightest and changes from light grey to white and bright white. The reflectance of all macerals increases with the increasing carbonization of the organic matter of the coal (Figure 2). At the vitrinite reflectance (%Rr) level of about 1.5%, the simultaneous differences in reflectance and in color between liptinite and vitrinite disappear, and with a %Rr about 2.4%, the differences between vitrinite and inertinite also disappear.

For the purposes of this study, medium-sized samples were prepared for petrographic analyses, according to the PN-ISO 7404-2:2005 Methods for the petrographic analysis of bituminous coal and anthracite - Part 2: Method of preparing coal samples, from selected coal samples in which vitrinite reflectance did not exceed 0.8%. Coal samples were taken from coals originating from Polish coal basins: the Upper Silesian Coal Basin and the Lublin Coal Basin. Data on the tested coal samples (rank, the origin of the samples, and maceral compositions) are presented in Table 1. The microscopic specimens were prepared by the immersion of coal dust in a mixture of epoxy resin and hardener, obtained by mixing the components at a ratio of 8:1. The immersed microscopic specimens were left for at least 24 h until solidification. The solidified specimens were ground and polished using a Struers LaboForce-3 grinding/polishing machine (Struers Inc., Cleveland, OH, USA). A Zeiss Axio Imager Z 2m microscope (Carl Zeiss AG, Oberkochen, Germany) (Figure 3) was used for the study. A magnification of 500 times and white light reflected in oil immersion were used. Surfaces were selected for which photographs were taken using an Axiocam 506 color camera. The set of microscope photographs obtained showed different macerals for which a mask set was developed. In the petrographic analysis, the proportion of maceral groups is most important. The results of the determination of the mineral substance are rarely used. Therefore, they were omitted in the first stage of the research. We plan to address this problem in the future.
The images were captured at a resolution of 3072 × 2304 pixels in 8-bit RGB color space. For further processing, the images were cut into 512 × 512 parts. Then, the manual segmentation of the vitrinite, inertinite, and liptinite was performed by a domain expert. The segmentation was used as the ground truth for further processing. Separate masks were created for each of the macerals. The completed database consisted of 162 images for which the masks were created (three masks were created for each input image). An example image and its masks are presented in Figure 4.
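The cutting step described above can be illustrated with a minimal sketch. The paper does not state how the 2304-pixel dimension, which is not a multiple of 512, was handled; the sketch assumes that only complete tiles are kept and that 3072 is the image width. The function name `cut_into_tiles` is hypothetical.

```python
import numpy as np


def cut_into_tiles(image, tile=512):
    """Cut an (H, W, C) image into non-overlapping tile x tile parts.

    Assumption: incomplete tiles at the bottom/right borders are dropped.
    """
    height, width = image.shape[:2]
    tiles = []
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            tiles.append(image[top:top + tile, left:left + tile])
    return tiles


# Dummy 8-bit RGB image with the resolution reported in the text.
image = np.zeros((2304, 3072, 3), dtype=np.uint8)
tiles = cut_into_tiles(image)
print(len(tiles), tiles[0].shape)
```

Under these assumptions each photograph yields 4 rows of 6 tiles, and the bottom 256 pixel rows are discarded.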
It was decided to use a separately trained single-class model for the identification of each maceral group. This approach has its advantages, dictated by both practical and computational considerations. In petrographic practice, the identification of macerals is used in various variants. For example, as far as the analysis of the vitrinite reflectance index is concerned, the recognition of only one maceral, collotelinite, is required. Similarly, the research carried out in order to determine the coke-forming properties requires the recognition of two types of petrographic components, namely reactive ones, which include macerals from the vitrinite and liptinite groups, and inert ones, which include macerals from the inertinite group. A single-class approach makes it possible to obtain a network that is particularly sensitive to a specific group of macerals. Such a network is expected to be easier to train than a multiclass network and allows for a simpler architecture without sacrificing performance. The same holds for the preparation of the set of training images. It also provides the possibility to optimize the network architecture for each group of macerals. The use of single-class models does not prevent the joint analysis of all of the maceral groups. The outputs of the models can be combined into a single result showing all classes on one image, for example by using the argmax function (the argmax function returns the argument for which the maximum output value is achieved).
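The argmax combination mentioned above might look like the following sketch. It assumes that each single-class model returns a probability map of the same size; the `threshold` for leaving a pixel unassigned and the label numbering are illustrative assumptions that are not taken from the paper.

```python
import numpy as np

# Label 0 is reserved for "unassigned"; labels 1..3 follow this order.
MACERALS = ("inertinite", "liptinite", "vitrinite")


def combine_single_class_outputs(prob_maps, threshold=0.5):
    """Merge per-maceral probability maps into one label image via argmax."""
    stacked = np.stack(prob_maps, axis=0)       # shape: (n_classes, H, W)
    labels = np.argmax(stacked, axis=0) + 1     # winning class, numbered 1..n
    labels[np.max(stacked, axis=0) < threshold] = 0  # no model is confident
    return labels


# Tiny 2 x 2 example standing in for three sigmoid outputs.
inertinite = np.array([[0.90, 0.10], [0.20, 0.10]])
liptinite = np.array([[0.05, 0.70], [0.10, 0.20]])
vitrinite = np.array([[0.10, 0.20], [0.80, 0.30]])
labels = combine_single_class_outputs([inertinite, liptinite, vitrinite])
print(labels)
```

In the example the bottom-right pixel stays unassigned because none of the three outputs reaches the assumed threshold.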
For the segmentation experiments, the U-Net convolutional semantic segmentation network was used [71]. The architecture of the network is presented in Figure 5. The U-Net is an example of an autoencoder network. It can be divided into three parts: the contraction part (4 blocks, each composed of two convolutional steps and a pooling step), the bottleneck (the convolutional layer with 1024 channels), and the expansion part (4 blocks, each composed of two convolutional layers followed by an upscaling layer). All convolutional layers use the rectified linear unit (ReLU) as the activation function. The input layer in the constructed U-Net-based network has a 512 × 512 resolution, which matches the input image size. The output layer was constructed as a 1 × 1 convolutional layer with a sigmoid activation function. The list of layers along with their shapes is presented in Table 2.
The network architecture was implemented using the TensorFlow deep learning framework. The input images were split randomly into training and validation sets. The validation set was formed from 10% of all images. The binary cross-entropy function was used as the loss function during network training. During the training, the pixel-wise accuracy (PA), intersection-over-union (IoU), and mean intersection-over-union (MIoU) were also monitored as effectiveness measures. The Adam optimizer was chosen for model learning [72].
The training of the network was performed in two stages. In the first stage, the batch size and the learning rate range were estimated. The model training was stopped after just a few epochs and the training results were analyzed. The upper limit for the learning rate was established by choosing the value at which the model improved its performance in at least 4 consecutive epochs. The batch size was limited by the size of the input dataset: the bigger the batch size, the fewer steps per epoch the training procedure can make. It was assumed that the biggest batch size should still allow for at least several dozen steps per epoch. The second stage was devoted to model training with a decreasing learning rate. The model was trained for 50 epochs with a constant rate. If no improvement in the loss function was observed, the learning rate was decreased by a factor of 10 and the training process was repeated. The accuracy for the validation set was observed as an indicator of possible overtraining. The training was stopped once the validation set accuracy started to decrease. The presented learning procedure was used for each of the macerals. During the training process, two kinds of data modifications were performed:

1. The images which do not show the given maceral were excluded from the training set. For example, if the model was trained for vitrinite segmentation, all images where vitrinite was not present were excluded from the training set;
2. Basic image augmentation was performed. The augmentation was limited to rotation by π/2, π, 3π/2 and mirroring horizontally and vertically.
The order of the images during each epoch was randomized. All input images were in RGB color space. All masks were binary. No image preprocessing except for the described augmentation was performed.
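The two data modifications listed above can be sketched with NumPy as follows. This is an illustration only: the function names are hypothetical, and it is an assumption that rotations and mirrorings were applied separately rather than combined, giving six variants per image.

```python
import numpy as np


def shows_maceral(mask):
    """Step 1: keep an image only if its binary mask marks at least one pixel."""
    return bool(np.any(mask))


def augment_pair(image, mask):
    """Step 2: return the original plus rotated and mirrored (image, mask) pairs.

    The same transform is applied to the image and to its mask so that
    the ground truth stays aligned with the pixels.
    """
    pairs = [(image, mask)]
    for k in (1, 2, 3):  # rotation by pi/2, pi and 3*pi/2
        pairs.append((np.rot90(image, k, axes=(0, 1)),
                      np.rot90(mask, k, axes=(0, 1))))
    pairs.append((np.fliplr(image), np.fliplr(mask)))  # horizontal mirror
    pairs.append((np.flipud(image), np.flipud(mask)))  # vertical mirror
    return pairs


# Small stand-ins for a 512 x 512 RGB tile and its binary mask.
image = np.arange(4 * 4 * 3).reshape(4, 4, 3)
mask = np.zeros((4, 4), dtype=bool)
mask[0, 1] = True

pairs = augment_pair(image, mask) if shows_maceral(mask) else []
print(len(pairs))
```

Because the tiles are square, every variant keeps the original shape and can be fed to the same input layer.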
All the calculations were performed on an MS Windows workstation equipped with an Intel i7 processor running at 3.6 GHz (maximum), 32 GB RAM, and an NVIDIA GeForce GTX 1080 graphic card. All software necessary for computation was prepared with the Python programming language using the TensorFlow framework [73].
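As a rough check of the architecture described in this section, the feature-map shapes along the network can be traced in plain Python. Table 2 is not reproduced here, so the sketch assumes the channel widths of the original U-Net (64 channels in the first block, doubling at each level), 'same' padding, and 2 × 2 pooling; only the 512 × 512 input, the four blocks per path, the 1024-channel bottleneck, and the single-channel sigmoid output come from the text.

```python
def unet_feature_shapes(input_size=512, base_channels=64, depth=4):
    """Trace (name, height, width, channels) through a U-Net-like network.

    Assumptions: 'same' padding (convolutions keep the spatial size),
    2 x 2 pooling and upscaling, channel count doubling per level.
    """
    shapes = []
    size, channels = input_size, base_channels
    for level in range(1, depth + 1):           # contraction part
        shapes.append((f"down{level}", size, size, channels))
        size //= 2                              # pooling halves the resolution
        channels *= 2
    shapes.append(("bottleneck", size, size, channels))
    for level in range(depth, 0, -1):           # expansion part
        size *= 2                               # upscaling doubles the resolution
        channels //= 2
        shapes.append((f"up{level}", size, size, channels))
    shapes.append(("output", size, size, 1))    # 1 x 1 convolution + sigmoid
    return shapes


shapes = unet_feature_shapes()
for name, height, width, channels in shapes:
    print(f"{name:<10} {height:>3} x {width:>3} x {channels}")
```

With these assumptions the bottleneck operates on 32 × 32 feature maps with 1024 channels, which agrees with the bottleneck width stated above.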

Results and Discussion
The learning rate during the first stage of experiments was varied from 10⁻¹ to 10⁻⁶. It was observed that learning rates greater than 10⁻⁴ caused huge changes in the loss function between consecutive epochs, and the loss hardly showed any improvement. Therefore, the rate of 10⁻⁴ was chosen as the largest learning rate used in the calculations. After every 50 epochs during the training, the results were examined and the learning rate was decreased once the loss function values started to oscillate from one epoch to another. The calculations were stopped once the validation set accuracy started to decrease. At this moment, the learning rate was as low as 10⁻⁷.
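The stepwise schedule can be summarized in a short sketch. It is a simplification under stated assumptions: each 50-epoch block is represented by a single loss value, "no improvement" means that this value is not lower than the best seen so far, and the loss values in the example are invented for illustration. The function name and the lower bound `min_lr` are hypothetical.

```python
def run_schedule(block_losses, start_lr=1e-4, factor=10.0, min_lr=1e-7):
    """Return the learning rate used in each block and the rate after the last one.

    block_losses: one representative loss value per 50-epoch block.
    The rate is divided by `factor` whenever a block brings no improvement.
    """
    lr = start_lr
    best_loss = float("inf")
    used_rates = []
    for loss in block_losses:
        used_rates.append(lr)
        if loss < best_loss:
            best_loss = loss                  # progress: keep the current rate
        else:
            lr = max(lr / factor, min_lr)     # stagnation: reduce the rate
    return used_rates, lr


# Five blocks of 50 epochs (250 epochs) with invented loss values.
used_rates, final_lr = run_schedule([0.40, 0.30, 0.31, 0.25, 0.26])
print(used_rates, final_lr)
```

In this invented run the rate drops twice, from 10⁻⁴ to 10⁻⁶, over 250 epochs. The stopping rule based on validation accuracy is left out to keep the sketch short.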
The training for inertinite started from a randomly initialized network, using the Xavier initializer [74]. During the 250 epochs of the training process, the learning rate was changed from 10⁻⁴ to 10⁻⁶. The final accuracy computed for the validation set was equal to 0.9385. The values of IoU and MIoU were equal to 0.79 and 0.85, respectively. The segmentation results compared with the input image and the ground truth for selected images are presented in Figure 6.
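For reference, the three measures reported here can be computed from a binary prediction and its ground-truth mask as in the sketch below. The paper does not spell out its MIoU definition; the sketch assumes the mean of the IoU of the maceral class and the IoU of the background, and a 0.5 threshold on the sigmoid output. Both are assumptions, as are the function names.

```python
import numpy as np


def binarize(prob_map, threshold=0.5):
    """Turn a sigmoid output into a binary mask (the threshold is an assumption)."""
    return np.asarray(prob_map) >= threshold


def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label equals the ground truth."""
    return float(np.mean(pred == truth))


def iou(pred, truth):
    """Intersection-over-union of the positive (maceral) class."""
    intersection = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return float(intersection / union) if union else 1.0


def mean_iou(pred, truth):
    """Mean of the IoU of the maceral class and of the background (assumed)."""
    background = iou(np.logical_not(pred), np.logical_not(truth))
    return 0.5 * (iou(pred, truth) + background)


truth = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]], dtype=bool)
pred = binarize([[0.9, 0.8, 0.7, 0.1],
                 [0.6, 0.2, 0.1, 0.0],
                 [0.1, 0.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0, 0.0]])

pa, class_iou, miou = pixel_accuracy(pred, truth), iou(pred, truth), mean_iou(pred, truth)
print(pa, class_iou, miou)
```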
The presented segmentation results are the raw output values from the last layer of the U-Net-based network. These values, being the values of the sigmoid function, vary from 0 to 1, which is reflected by the grayscale level in the picture. The segmentation quality can be assessed as very good, though not perfect. Some minor artifacts are visible on each of the presented images. The network has difficulties in recognizing the tiny structures of inertinite visible among other macerals of similar greyscale and texture. Apart from that, large structures were correctly detected and marked by the U-Net.
The training procedure and the results obtained for vitrinite were similar to those for inertinite. The learning process also lasted 250 epochs, though the learning rate was changed over a wider range. It started at 10⁻⁴, but ended at 10⁻⁷. The selected results of the segmentation are presented in Figure 7. The accuracy computed for the validation set was equal to 0.9176. The IoU was equal to 0.78 and the MIoU was equal to 0.75.
The analysis of the results shows that the quality of segmentation is similar to that for inertinite; however, a slightly higher level of artifacts was observed, which is also visible in the presented images (see Figure 7). The results can be considered very good and suitable for practical applications.
As expected for medium- and high-rank coals, the segmentation of liptinite turned out to be the most difficult. Training the network from randomly initialized weights was not successful. Moderately satisfactory results were obtained when the training process for liptinite started from the weights of the model trained for inertinite segmentation. This form of transfer learning made it possible to obtain acceptable liptinite segmentation, but errors and artifacts are clearly visible in the resulting images. The accuracy value for the validation set was 0.9791. Such a large value, despite the relatively low quality of segmentation, results from the small area covered by liptinite in the analyzed images. The calculated values of the IoU (0.18) and the MIoU (0.58) show that the segmentation is indeed poor, and can be treated as a rough identification of liptinite's presence. The results of segmentation are presented in Figure 8.
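The gap between a high accuracy and a low IoU for a maceral that covers little area can be reproduced with a small worked example. The numbers are hypothetical: one 512 × 512 tile in which liptinite covers about 2% of the pixels, and a model that marks no pixel at all. MIoU is assumed to be the mean of the class IoU and the background IoU.

```python
def metrics_from_counts(tp, fp, fn, tn):
    """Pixel accuracy, class IoU and mean IoU from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    class_union = tp + fp + fn
    background_union = tn + fp + fn
    class_iou = tp / class_union if class_union else 1.0
    background_iou = tn / background_union if background_union else 1.0
    return accuracy, class_iou, 0.5 * (class_iou + background_iou)


total_pixels = 512 * 512                 # one tile
liptinite_pixels = 5243                  # about 2% of the tile (hypothetical)

# A model that predicts "no liptinite" everywhere: every liptinite pixel is missed.
accuracy, class_iou, miou = metrics_from_counts(
    tp=0, fp=0, fn=liptinite_pixels, tn=total_pixels - liptinite_pixels)
print(round(accuracy, 4), class_iou, round(miou, 4))
```

Even this useless predictor reaches an accuracy of about 0.98, while its class IoU is zero. Under the same MIoU assumption, the reported pair of IoU 0.18 and MIoU 0.58 would correspond to a background IoU of about 0.98, which is consistent with liptinite occupying only a small part of the images.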
A summary of the obtained values of accuracy, IoU, and MIoU is presented in Table 3. The quality of segmentation obtained for inertinite and vitrinite was good enough to be used as the basis for the development of an autonomous maceral identification method. The imperfections were small, and the results did not differ much from the ground truth. Moreover, during the analysis of the results, it turned out that the network was able to identify small inertinite structures overlooked during manual segmentation. It seems reasonable to use the U-Net-based convolutional network for the segmentation of these macerals with only a little attention from a domain expert. Unfortunately, this is definitely not true for liptinite. The network was able to identify liptinite only roughly. The result should instead be treated as approximate, possible locations of liptinite structures which have to be verified and corrected by a domain expert. The training for liptinite was also more difficult than for the other macerals. This is probably caused by its more varied appearance. In addition, liptinite covered small areas in the images and was present in only a relatively small number of them. Nevertheless, such support in assessing the maceral can be useful in practice. The obtained results can be related to others reported in the literature [56,68]; however, the comparison is not straightforward, as the mentioned papers do not provide the measures for the maceral groups separately. Therefore, it is reasonable to compare the mean values of the IoU and MIoU presented in Table 3 with the values of the same measures presented in [68].
The presented U-Net-based network gives better results than the non-DL methods. The results obtained by the improved U-Net (enhanced with attention gates) are better than those presented in this paper, though the difference is small (IoU~0.8554 and MIoU~0.631 for the best enhanced network presented in [68]). However, it is impossible to assess how these values are distributed among the individual maceral groups. When liptinite, with the worst results, is omitted, the mean IoU and MIoU for inertinite and vitrinite are much greater. As the proposed models address the segmentation of each of the macerals individually, they should be treated as complementary to the model presented in [68]. Wang et al. present the results obtained using different deep learning network architectures, such as U-Net (standard multi-class architecture), SegNet, and DeepLab V3+ [57]. The results are provided for each of the macerals separately. The comparison is presented in Table 4. The proposed simplified U-Net-based network did very well in segmenting inertinite, achieving a better result than the much more sophisticated DeepLab V3+ network (the best of the architectures compared in [57]). The results obtained for vitrinite are slightly worse than for the other two networks. There are very large differences in the case of liptinite. The network architecture was probably too simple to cope successfully with the maceral group that is most difficult to recognize. The results obtained for the two other maceral groups are encouraging. In particular, the IoU measure for inertinite is good enough to support the assumptions made and to demonstrate the network's robustness. The proposed network can be efficiently used for inertinite and vitrinite identification in petrographic images.
The discussed approaches present different ways of solving the problem of maceral group identification. The use of different models trained for each maceral group separately gives the opportunity for fine-tuning. It also allows the architecture of the network to be kept relatively simple (e.g., simpler than the original U-Net) while still providing good performance, at least for inertinite and vitrinite. The results are also encouraging for research aimed at discovering the simplest and most robust neural network structure for the efficient analysis of petrographic images.

Conclusions
The application of a U-Net-based CNN for maceral segmentation on coal petrographic optical microscope images has been presented. The set of images was manually segmented by experts and used further as the ground truth. The network was trained to segment inertinite, liptinite, and vitrinite. During the training, basic image augmentation was used (horizontal and vertical flipping, rotation by multiples of π/2). The results show that very good segmentation can be achieved for inertinite. Vitrinite was segmented slightly worse, but also at a very good level. Liptinite was the most difficult to process. Moderately good results were obtained after transfer learning was applied. Even so, the segmentation was noticeably worse than for inertinite and vitrinite.
The obtained results show that the proposed convolutional autoencoder can be used effectively for maceral segmentation. Although the results for liptinite were worse than those for the other macerals, due to the advanced rank of the analyzed coal samples, the network was in most cases able to indicate the approximate location of the maceral. The inertinite and vitrinite segmentations are good enough to be considered as a base for autonomous petrographic processing. Although the results do not justify such a claim in the case of liptinite, the network can still be a valuable tool supporting the expert during petrographic image analysis. Data augmentation and transfer learning in particular proved their effectiveness in at least partially solving the problems in difficult cases. The comparison of the results with similar research showed that the obtained values of IoU and MIoU are better than those reported in the literature for the ML models, and are similar to those achieved by using the DL models (for inertinite and vitrinite). The segmentation of liptinite with the simplified, U-Net-based network is still a challenge and requires further research. The ML methods as well as image analysis methods are very promising, and have been utilized for coal analysis by many scientists with satisfactory results. The approach proposed here, though encouraging, addresses only a small portion of the scientific challenges related to coal petrography. Further research work in this field is required.