Coal is an important fossil fuel in China, where power generation accounts for approximately 45% of coal consumption [1]. The microcomponent composition of coal determines its physical and chemical properties, technological characteristics, and industrial applications. Petrographic analysis of coal has been widely applied in commercial coal quality testing and in industrial fields such as coal coking, gasification, liquefaction, and oil and gas exploration [2].
Petrographic analysis of coal still relies primarily on manual identification and measurement, which is labor-intensive and time-consuming. In addition, subjective differences between observers lead to poor comparability of identification data between laboratories. Automatic coal–rock measurement technology has made automatic vitrinite reflectance measurement possible both in China and abroad, but problems remain: vitrinite identification is inaccurate and the identification rate is low, especially in mixed coal where the vitrinite content is low or the particles are small. Existing identification technologies cannot identify vitrinite accurately and efficiently, and they suffer from missed and false detections [3]. Identifying the vitrinite in mixed coal efficiently, accurately, and completely is therefore the prerequisite for measuring the vitrinite reflectance of coal.
Sun Tao et al. [4] developed a CA_Poly_DeepLab v3+ network tailored for coal–rock image segmentation. This network integrates a channel attention module into the DeepLabv3+ network and employs an adaptive strategy to adjust the network’s learning rate. The approach improves the accuracy of target segmentation and edge processing, producing contours that are closer to reality. However, the method still lacks robustness, which limits its practical application. Shi Guangliang et al. [5] proposed a coal–rock identification method based on Kalman optimal estimation of load data from the shearer arm pin. They used the Kalman optimal estimation algorithm for noise reduction and then determined the interval in which the real-time load values fall to identify the coal–rock interface on the cutting face. Experiments verified that this approach reduced data fluctuation and effectively increased data discernibility. However, the overall algorithm is complex, and its ability to generalize is unclear. Jiang Song et al. [6] proposed a refined segmentation method for blast heap blocks based on the DeepLabv3+ network. They first introduced a multi-branch separable attention mechanism into the backbone network to fuse features from different channels, then used a point rendering module to reduce the loss of semantic information, and finally employed a dynamic learning rate adjustment strategy to accelerate the model’s convergence. This method achieves better overall performance, particularly in edge and small-object segmentation. However, because collecting images in mining areas is difficult and risky, and semantic segmentation annotation is laborious, the dataset remains limited. Qinpeng Guo et al. [7] proposed a segmentation method for blast rock images based on an improved watershed algorithm. They first obtained a preprocessed binary image, then performed a distance transformation and selected an appropriate grayscale threshold to obtain contours, and finally applied the watershed algorithm for segmentation. This algorithm can accurately mark seed points and perform watershed segmentation on blast rock images, effectively reducing the likelihood of incorrect segmentation. However, the method still cannot meet real-time requirements. Zhiwei Li et al. [8] proposed a coal–rock fracture segmentation method based on contour evolution and gradient direction consistency. They first established a fracture contour evolution model to obtain preliminary segmentation results. Then, they used adaptive median filtering to remove high-density noise from the image and employed 3D bilateral filtering to enhance fracture boundaries. Finally, they optimized the preliminary segmentation results with a gradient direction consistency model. This method can accurately capture the boundaries of fractures with weak edges and offers high segmentation efficiency and strong adaptability. However, it does not fully exploit scene information, which limits its detection accuracy. Xiao Dong et al. [9] combined spectroscopy with deep learning algorithms to propose a rapid field identification method for coal types. They first preprocessed the spectral data of various coal and rock types, then used a convolutional neural network to extract 2D spectral features and an extreme learning machine to classify these features, and finally optimized the model parameters using the whale optimization algorithm. This method can quickly and accurately identify coal types, but the complexity of the algorithm design leads to relatively low efficiency.
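To make the noise-reduction idea reviewed for [5] concrete, the following is a minimal sketch of a scalar Kalman filter applied to a one-dimensional load signal, followed by a lookup of the interval in which the filtered value falls. It is not the implementation from [5]: the constant-state model, the variance values, the function names `kalman_smooth` and `interval_label`, and the interval bounds and labels are all assumptions chosen for illustration.

```python
def kalman_smooth(measurements, process_var=1e-3, measurement_var=0.5):
    """Smooth a 1D signal with a scalar Kalman filter.

    Assumes a constant-state (random-walk) model; both variances are
    hypothetical tuning values, not parameters taken from [5].
    """
    estimate = measurements[0]  # initialise with the first reading
    error_var = 1.0             # initial uncertainty of the estimate
    smoothed = [estimate]
    for z in measurements[1:]:
        # Predict: the state is assumed unchanged, uncertainty grows.
        error_var += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= 1.0 - gain
        smoothed.append(estimate)
    return smoothed


def interval_label(value, intervals):
    """Return the label of the half-open interval [low, high) containing value."""
    for low, high, label in intervals:
        if low <= value < high:
            return label
    return None


if __name__ == "__main__":
    # Hypothetical noisy load readings oscillating around 10.
    raw = [10.0 + (1.0 if i % 2 == 0 else -1.0) for i in range(50)]
    filtered = kalman_smooth(raw)
    # Hypothetical load intervals; real bounds would come from calibration.
    bounds = [(0.0, 5.0, "coal"), (5.0, 15.0, "rock")]
    print(round(filtered[-1], 3), interval_label(filtered[-1], bounds))
```

A smaller `process_var` relative to `measurement_var` gives heavier smoothing at the cost of a slower response to genuine changes in load, which is the trade-off any such filter has to balance near a coal–rock interface.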
To identify each type of vitrinite accurately while reducing the complexity of identification, this paper builds on the methods reviewed above and proposes a new improvement strategy based on the DeepLabv3+ network [10]. First, to address slow prediction speed and low prediction accuracy, the lightweight MobileNetV2 network is used as the backbone network. Second, to obtain multi-scale features with a larger receptive field, the ASPP module is extended with an additional atrous convolution layer with a dilation rate of 24, which reduces feature loss and increases identification accuracy. Third, to avoid amplifying erroneous or unimportant features, several mainstream attention mechanisms are added to different parts of the DeepLabv3+ network and their effects on the identification results are compared in order to select an appropriate attention module. In addition, experiments are conducted with channel multipliers of 5, 8, 12, and 16 in the channel module of the CBAM attention mechanism to choose a suitable value. Finally, to keep each channel’s output feature map consistent for each type of coal vitrinite, a corrective convolution module is added at the network’s output, further improving the model’s identification accuracy.
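The effect of the additional dilation rate on the receptive field can be checked with simple arithmetic. A k×k atrous convolution with dilation rate r covers a span of k + (k − 1)(r − 1) pixels per side. The sketch below tabulates this span for each ASPP branch. The baseline rates of 6, 12, and 18 are an assumption (they are the commonly used DeepLabv3+ setting, not a value stated in this paper), and the function names are hypothetical.

```python
def effective_kernel_size(kernel_size, dilation):
    """Side length, in pixels, spanned by one atrous (dilated) kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


# Assumption: baseline ASPP dilation rates commonly used with DeepLabv3+.
BASELINE_RATES = (6, 12, 18)
# The additional branch described in the text uses a dilation rate of 24.
EXTENDED_RATES = BASELINE_RATES + (24,)


def aspp_branch_spans(rates, kernel_size=3):
    """Map each dilation rate to the span of its 3x3 atrous kernel."""
    return {rate: effective_kernel_size(kernel_size, rate) for rate in rates}


if __name__ == "__main__":
    for rate, span in aspp_branch_spans(EXTENDED_RATES).items():
        print(f"dilation {rate:2d}: 3x3 kernel spans {span}x{span} pixels")
```

Under these assumptions the widest baseline branch spans 37×37 pixels, while the branch with rate 24 spans 49×49 pixels, which illustrates how the extra layer enlarges the context seen by a single kernel without adding weights beyond one more 3×3 convolution.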