Abstract
Coal block image segmentation is of great significance for obtaining the particle size distribution and specific gravity information of ores. However, existing methods are limited by harsh environments, such as dust, complex shapes, and the uneven distribution of light, color, and texture. To address these challenges, we propose a lightweight deep convolutional network (DIRU-Net) for coal block image segmentation, built on the U-Net encoder-decoder backbone and combining the characteristics of dilated convolution and inverted residual structures. We have also constructed a high-quality dataset of conveyor belt coal block images, addressing the current lack of publicly available datasets. We comprehensively evaluated DIRU-Net on the coal block dataset and compared it with other state-of-the-art coal block segmentation methods. DIRU-Net outperforms all of them in both segmentation performance and model lightness: its segmentation accuracy reaches 94.8%, and its parameter size is only 0.77 MB.
MSC:
68T07
1. Introduction
The mining process of coal mainly includes excavation by wheel excavators, belt transportation, and sorting in coal storage bins. Figure 1 shows the site environment of coal mining, where the mined coal is transported to the coal storehouse through a conveyor belt. The uses and prices of coal vary with its particle size. The particle size of coal blocks mined by wheeled excavators generally does not exceed 200 mm, and coal blocks with a particle size of 10 to 200 mm are used in chemical plants. When a circulating fluidized bed boiler burns coal, the particle size needs to be controlled between 8 and 13 mm: particles that are too fine or too large adversely affect normal operation, as they may prevent the fluidized layer from forming. Therefore, both over-crushing and under-crushing should be avoided to ensure the normal operation of the circulating fluidized bed boiler, and coal blocks larger than 13 mm need to be crushed a second time [1]. Thus, the size distribution of coal blocks is very important to the coal user. To calculate the distribution range and proportion of coal particle sizes, a fast and accurate system is needed to detect and count block sizes [2].

To solve this problem, we studied the application of semantic image segmentation to ore images and propose a fast and accurate coal block segmentation model. Our research objects are coal block images on the conveyor belt. Accurately extracting coal block contours from these images is very challenging for two main reasons. First, the poor working environment and dust interference reduce input image quality, and the color difference between coal blocks and their shadows is not obvious, which produces similar color features and leads to over-segmentation [3]. Second, coal fragments have complex and diverse shapes, their surface color and texture are uneven, and fragments adhere to and occlude one another [4]. These factors blur or even erase the boundary lines between fragments and cause under-segmentation.

With the wide application of artificial intelligence and pattern recognition in various fields, methods such as oil sand ore size estimation [5], OTSU and its improvements [6,7], and the watershed algorithm and its improvements [8,9] have been proposed for ore image processing. These earlier methods leave room for improvement in both segmentation accuracy and speed.
Figure 1.
Belt transportation process in the coal mining site.
With the development of new convolutional neural network (CNN) models such as U-Net [10], ResNet [11,12], and MobileNet [13,14,15], and their outstanding research results, CNNs have gradually come into wide use in many fields [16,17]. Haq et al. [18] proposed the DCNNBT model, achieving high-precision classification and detection of four types of brain tumors by optimizing several key hyperparameters. Aguilera et al. [19] used a Blockchain Convolutional Neural Network (BCNN) combined with audiovisual emotional patterns to recognize and handle medical emergencies. Wang et al. [20] proposed a new 4D-CNN-based method for cyanobacterial bloom prediction, which, combined with remote sensing images, accurately predicts the outbreak process.
Based on U-Net, Ma et al. [21] proposed a new belt ore image segmentation method composed of a classification model, a coarse image segmentation (CIS) algorithm, and a fine image segmentation (FIS) algorithm. Li et al. [22] deleted the crop operation in the traditional U-Net, halved each layer's number of convolution kernels, and proposed an improved ore image segmentation method with faster speed, stronger robustness, and higher precision than U-Net. Xiao et al. [23] combined the DU-Net [24] model with the residual structure of ResNet and proposed a new RDU-Net model. Wang et al. [25] proposed the Boundary-Aware U-Net model, a multi-task learning framework based on U-Net, which improves segmentation accuracy over the benchmark U-Net. Huang et al. [26] designed a group cross-channel attention residual U-Net model (GCAU-Net) that can take full advantage of low-level features of the tumor area; their work introduces a parallel detail recovery (DR) algorithm for recovering brain tumor details and proposes a group cross-channel attention (GCA) module for emphasizing important feature groups and channels. Gu et al. [27] introduced a novel dense atrous convolution (DAC) block and a residual multi-kernel pooling (RMP) module and proposed a context encoding network (CE-Net) that captures more high-level and spatial information for two-dimensional medical image segmentation, addressing the problem of capturing the global features of each dimension. Li et al. [28] designed a triple attention network (TA-Net) that, by exploring the attention mechanism, can simultaneously identify global context information in the channel domain, spatial domain, and feature internal domain. Shi et al. [29] proposed a multi-scale dense network (MD-Net) that makes the most of multi-scale information and encoder features; in their work, a residual atrous spatial pyramid pooling (Res-ASPP) module is added to extract multi-scale information and improve the information flow.
In our study, a new network model (DIRU-Net) is designed to speed up image processing and reduce the network model size. We performed comparative statistical experiments to demonstrate the feasibility and superiority of our model. It combines the inverted residual structure with dilated convolution [30] to shorten the segmentation time of each image while maintaining high accuracy, which better meets the speed, accuracy, and real-time requirements of practical applications.
Our significant contributions and originality in this work are summarized as follows:
- (1)
- Due to the current lack of a public coal block image dataset, we recorded videos of coal blocks on a conveyor belt in a coal mine and manually annotated them with LabelMe software (version 5.2) to obtain a high-quality coal block dataset. To address the complex contours of coal blocks and the long manual annotation time, a data augmentation method was used to expand the limited dataset, saving a significant amount of annotation time. Since the RGB color information of the original images does not affect the result of the coal-versus-background segmentation task, the images were first converted to grayscale, which reduces the amount of image data by two-thirds and helps accelerate model training. Then, contrast limited adaptive histogram equalization (CLAHE), bilateral filtering, and gamma processing were used to enhance the contrast between coal blocks and the brightness of the image.
- (2)
- Based on the U-shaped deep learning segmentation model, this paper introduces dilated convolution and inverted residual structures into the U-Net network structure and proposes a new coal block image segmentation model: DIRU-Net. The dilated convolutional layers of this model capture a wider field of view, and the depthwise separable convolutional layers reduce the number of model parameters by having one convolution kernel handle one channel, achieving fast and accurate segmentation of coal block images.
- (3)
- To prove the effectiveness of the proposed model, comparative experiments were conducted with classic image segmentation models. The results demonstrate that the algorithm proposed in this paper achieves more accurate segmentation than the other segmentation models while having the smallest number of parameters.
2. Materials and Methods
2.1. Dataset Production
Our experimental objects are coal block images from the Yimin open-pit coal mine in Inner Mongolia. The coal seam in this coalfield is thick and flat, and most of it is lignite, as shown in Figure 1. Lignite weathers and breaks easily in air and has a low ignition point; its stacking height should not exceed two meters, which limits its transportation and market. Therefore, it is mainly used as fuel for local power plants.
We captured the experimental data in this coalfield. Figure 2 illustrates the schematic diagram of the data-capturing system, which consists of a vibrator, a conveyor belt, an industrial camera, a computer, and an ore sample outlet. The camera is installed vertically above the conveyor belt to record videos. The moving speed of the conveyor belt follows the actual working state, which is 4 m/s. Images are extracted frame by frame from the video as the original images of the dataset (Figure 3a).
Figure 2.
The image acquisition system.
Figure 3.
Manually produced dataset image; (a) original image; (b) corresponding ground truth image; (c) mask image.
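For illustration, frame extraction from the belt video can be performed with OpenCV as follows. This is a minimal sketch; the file names and the sampling interval are our assumptions, not the authors' settings:

```python
import os
import cv2

VIDEO_PATH = "belt_video.mp4"   # hypothetical input video
FRAME_STEP = 30                 # keep every 30th frame (illustrative choice)

os.makedirs("frames", exist_ok=True)
cap = cv2.VideoCapture(VIDEO_PATH)
idx, saved = 0, 0
while True:
    ok, frame = cap.read()       # read the next frame from the video
    if not ok:
        break
    if idx % FRAME_STEP == 0:    # subsample frames at a fixed interval
        cv2.imwrite(f"frames/frame_{saved:05d}.png", frame)
        saved += 1
    idx += 1
cap.release()
```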
We made the corresponding ground truth (Figure 3b) and mask images (Figure 3c) of the original images using LabelMe software. The mask images are used to exclude the area outside the conveyor belt.
The original image size is 1080 × 720. To reduce the computational difficulty of the model, increase its running speed, and avoid overfitting, the limited dataset was expanded. During training and testing, we first randomly cropped 128 × 128-pixel image patches from the original images, as shown in Figure 4. Our dataset consists of 4000 sample images, divided into a training set, a validation set, and a test set in a ratio of 6:2:2.
Figure 4.
Typical 128 × 128 training patches. (a) The original image, (b) the preprocessed image, (c) the ground truth image.
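A minimal sketch of the random 128 × 128 cropping and the 6:2:2 split described above (our illustration; the paper does not publish its data pipeline):

```python
import random
import numpy as np

def random_patch(image, mask, size=128):
    """Randomly crop a size x size patch and the matching mask patch."""
    h, w = image.shape[:2]
    top = random.randint(0, h - size)
    left = random.randint(0, w - size)
    return (image[top:top + size, left:left + size],
            mask[top:top + size, left:left + size])

# 6:2:2 split of the 4000 sample indices, matching the reported ratio.
indices = np.random.permutation(4000)
train_idx = indices[:2400]
val_idx = indices[2400:3200]
test_idx = indices[3200:]
```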
2.2. Image Preprocessing
Due to the poor environment, the color of the coal fragments is similar to that of their shadows, the surface color and texture are uneven, and there is considerable noise in the ore images. Therefore, a preprocessing step was necessary to reduce noise and sharpen the boundaries between fragments. Using color images does not improve segmentation accuracy but increases segmentation time, so we used grayscale images as the network input. In a grayscale image, one byte represents one pixel, which reduces the amount of data by two-thirds compared with a color image and helps accelerate the model's computation.
The three methods used in image preprocessing are graying, contrast limited adaptive histogram equalization (CLAHE), and bilateral filtering. Graying reduces the amount of image data and speeds up model training and prediction. CLAHE is implemented with the algorithm built into OpenCV; it effectively limits noise amplification, enlarges the contrast, and makes the gaps between coal blocks more obvious. The preprocessing steps are shown in Figure 5. In addition, we note that the preprocessing techniques are used for the convenience of model training and do not affect the comparative experimental results of this paper, because all models are tested on the same dataset.
Figure 5.
Preprocessing steps. (a) Original image, (b) grayed image, (c) CLAHE processed image, (d) bilateral filtered image.
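A minimal OpenCV sketch of this preprocessing chain (graying, CLAHE, bilateral filtering, plus the gamma step mentioned in the contributions) is shown below. The clip limit, tile size, filter parameters, and gamma value are illustrative assumptions, since the paper does not report them:

```python
import cv2
import numpy as np

def preprocess(bgr_image, clip_limit=2.0, tile=(8, 8), gamma=1.2):
    """Grayscale -> CLAHE -> bilateral filter -> gamma correction.
    All parameter values here are assumptions, not the authors' settings."""
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)      # 1/3 the data
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile)
    eq = clahe.apply(gray)                                  # local contrast
    smooth = cv2.bilateralFilter(eq, 9, 75, 75)             # edge-preserving
    # Gamma correction via a precomputed lookup table.
    lut = np.array([(i / 255.0) ** (1.0 / gamma) * 255 for i in range(256)],
                   dtype=np.uint8)
    return cv2.LUT(smooth, lut)
```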
3. DIRU-Net Framework
3.1. DIRU-Net Structure
As the layers of a traditional segmentation network deepen, the number of parameters and the computational complexity increase sharply, and, without changing the hardware environment, the calculation time of the model is extended. This results in slow prediction, whereas practical application scenarios demand both high accuracy and high speed. To lighten the network and improve prediction speed, inspired by MobileNet, we propose DIRU-Net, which combines dilated convolution with inverted residuals. Figure 6 shows the DIRU-Net model structure, which is composed of eight DIR blocks and a U-Net backbone architecture (encoder on the left and decoder on the right). A summary of each layer, the number of parameters, and the output shape of DIRU-Net is given in Table 1 below.
Figure 6.
DIRU-Net network structure.
Table 1.
Network structure of DIRU-Net.
There are several reasons why we improved the convolution layer structure based on U-Net.
- (1)
- The encoder-decoder design of U-Net is very popular in the segmentation field and delivers high segmentation accuracy, so this structure is also used as the trunk of our model.
- (2)
- The depth of the network is very important for learning features with stronger expressive ability, but as the layers deepen, the back-propagated gradient becomes unstable under repeated multiplication, growing particularly large or small and causing gradient explosion or vanishing. Although batch normalization and ReLU can mitigate this problem, network performance still degrades as depth increases. Therefore, He et al. [11] proposed the residual connection to solve this problem. We also adopt this idea in our network to avoid performance degradation; it also compensates for the feature information between the upper and lower layers and improves efficiency.
- (3)
- The rapid application and development of deep learning networks in various fields have pushed network models toward smaller volumes and faster speeds. To lighten our model and accelerate prediction, we combine dilated convolution with inverted residuals and propose the DIR block, which reduces the calculation cost and accelerates prediction.
3.2. DIR Block Architecture
3.2.1. DIR Block
As shown in Figure 7, the DIR block is composed of three convolutions: a 5 × 5 dimension-increasing dilated convolution, a depth-wise convolution, and a pointwise convolution. Dilated convolution inserts zero values in the height and width directions of a standard convolution kernel to increase the receptive field; it therefore has an additional hyper-parameter, the dilation rate, which adjusts the spacing of the original convolution kernel. To ensure that the feature information between the upper and lower layers is not lost, a residual connection is added. The number of feature maps after the depth-wise convolution is the same as the number of channels in the input layer, so this operation cannot expand the number of feature map channels. Moreover, it convolves each channel of the input layer independently and does not effectively use the feature information of different channels at the same spatial position. Therefore, pointwise convolution is needed to combine these feature maps into new ones. This combined operation is called depth-wise separable convolution, which lightens the network structure. To retain the low-dimensional features, we did not use ReLU activation after the dimensionality reduction in the third layer; instead, the input is added to the output dimension-wise, and ReLU activation is applied after the residual connection is completed. The experiments of MobileNetV3 [15] found that the H-Swish function performs better than ReLU in deeper network layers; inspired by this, we used the H-Swish activation function in the 4th and 5th layers of the network in place of the traditional ReLU.
Figure 7.
DIR block.
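For concreteness, the DIR block described above can be sketched in PyTorch as follows. This is our reading of Figure 7, not the authors' released code: the expansion factor, dilation rate, normalization placement, and the exact layers using H-Swish are assumptions.

```python
import torch
import torch.nn as nn

class DIRBlock(nn.Module):
    """Sketch of a DIR block: 5x5 dilated expansion conv, depth-wise conv,
    linear pointwise conv, and a projection-shortcut residual connection."""
    def __init__(self, in_ch, out_ch, expand=4, dilation=2):
        super().__init__()
        mid = in_ch * expand
        # (1) 5x5 dilated, dimension-increasing convolution ('same' padding).
        self.expand_conv = nn.Sequential(
            nn.Conv2d(in_ch, mid, 5, padding=2 * dilation,
                      dilation=dilation, bias=False),
            nn.BatchNorm2d(mid),
            nn.Hardswish(),
        )
        # (2) 3x3 depth-wise convolution: one kernel per channel.
        self.depthwise = nn.Sequential(
            nn.Conv2d(mid, mid, 3, padding=1, groups=mid, bias=False),
            nn.BatchNorm2d(mid),
            nn.Hardswish(),
        )
        # (3) 1x1 pointwise convolution; no activation here, so the
        # low-dimensional features are retained (linear bottleneck).
        self.pointwise = nn.Sequential(
            nn.Conv2d(mid, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        # Input and output channels differ, so a 1x1 residual convolution
        # aligns the shortcut with the output (the Figure 8b form).
        self.shortcut = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        y = self.pointwise(self.depthwise(self.expand_conv(x)))
        return self.act(y + self.shortcut(x))  # ReLU after the residual add
```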
3.2.2. Residual Connection
The residual connection addresses the gradient explosion problem of deep network back-propagation and accelerates convergence. Because the residual connection directly adds the output of a previous layer to the input of a later layer, the later layer can obtain the feature information of the previous layer, thus enhancing feature transfer between layers. As shown in Figure 8, there are two kinds of residual connection structures: when the size and number of channels of the input and output feature maps are the same, the input and output are added directly; when they differ, the input is first adjusted to the output's size and channels through a residual convolution and then added to the output. In our DIR module, the number of output channels differs from the number of input channels, so we use the second form (Figure 8b). The equation of the residual convolution structure can be expressed as follows:
$$y = F(x) + G(x)$$

where $x$ and $y$ are the input and output feature maps, respectively, $G(x)$ represents the dimension-increasing convolution of the input (the right branch in Figure 8b), and $F(x)$ represents the middle depth-wise and pointwise convolution part of the DIR module.
Figure 8.
Schematic diagram of two residual connection forms. (a) The input is the same as the output, (b) the input is different from the output.
3.2.3. Depth-Wise Separable Convolution
Depth-wise separable convolution combines depth-wise and pointwise convolution to extract features. In our DIR module, the combination of the second and third layers is a depth-wise separable convolution, which effectively reduces the number of parameters and computation of the model.
Figure 9a shows the ordinary convolution layer: it has O filters, and each filter contains C convolution kernels. In Figure 9b, the input and output channels of the depth-wise convolution layer are the same; it has O filters, each containing one convolution kernel, and one kernel is responsible for one channel. In Figure 9c, the pointwise convolution layer has O filters, each with C convolution kernels of size 1 × 1. The calculation amounts of the three convolution layers can be expressed as follows:
$$Q_a = K^2 \times C \times O \times W' \times H', \quad Q_b = K^2 \times C \times W' \times H', \quad Q_c = C \times O \times W' \times H'$$

$$W' = \frac{W - K + 2P}{S} + 1, \quad H' = \frac{H - K + 2P}{S} + 1$$

where $Q_a$, $Q_b$, and $Q_c$ are the calculation quantities of the three convolution operations (a), (b), and (c) in Figure 9, respectively. K indicates the size of the convolution kernel, S indicates the step size, P indicates the padding pixels, W and H represent the width and height of the input feature map, C is the number of input channels, and O is the number of output channels. In our model, when the kernel size of the depth-wise convolution is 3 × 3, the padding equals 1; when the kernel size is 1 × 1, the padding equals 0. The step size of all convolution layers is 1, so the output feature map keeps the input size ($W' = W$, $H' = H$).
Figure 9.
Three convolutions: (a) ordinary convolution filter, (b) deep convolution filter, (c) pointwise convolution filter. M is the number of input channels, N is the number of output channels, and the width and height of the convolution kernel is K.
Assuming that the calculation amount of depth-wise convolution plus pointwise convolution is smaller than that of ordinary convolution, the comparison is as follows:

$$\frac{Q_b + Q_c}{Q_a} = \frac{K^2 \times C \times W' \times H' + C \times O \times W' \times H'}{K^2 \times C \times O \times W' \times H'}$$

The values of $C \times W' \times H'$ and O are more than zero, so we have the equation below:

$$\frac{Q_b + Q_c}{Q_a} = \frac{1}{O} + \frac{1}{K^2}$$

Because the value of K is fixed, $\frac{1}{K^2}$ can be set equal to a constant M, so

$$\frac{Q_b + Q_c}{Q_a} = \frac{1}{O} + M$$

Because $0 < M < 1$, when $O > \frac{1}{1 - M}$, the inequality $\frac{Q_b + Q_c}{Q_a} < 1$ is established. It is concluded that the computational complexity of depth-wise convolution plus pointwise convolution is less than that of ordinary convolution.
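As a numerical illustration of this ratio (our sketch, assuming stride 1 and 'same' padding so the output keeps the W × H size), the following snippet counts the multiply operations of the three convolutions in Figure 9 and confirms the $\frac{1}{O} + \frac{1}{K^2}$ factor:

```python
def conv_ops(K, C, O, W, H):
    """Multiply counts for the three convolutions in Figure 9
    (stride 1, 'same' padding, output size W x H)."""
    q_ordinary = K * K * C * O * W * H   # standard convolution
    q_depthwise = K * K * C * W * H      # one kernel per channel
    q_pointwise = C * O * W * H          # 1x1 channel combination
    return q_ordinary, q_depthwise, q_pointwise

# Example: K=3, C=32, O=64 on a 128 x 128 feature map.
qa, qb, qc = conv_ops(3, 32, 64, 128, 128)
print((qb + qc) / qa, 1 / 64 + 1 / 9)  # both ~0.1267, i.e., 1/O + 1/K^2
```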
4. Experimental Results
4.1. Experimental System Configuration
The computer for this experiment is configured with an Intel Core i5-7500 3.40 GHz CPU, an NVIDIA GTX 1050Ti GPU, 8 GB of RAM, and a 500 GB Western Digital hard disk. A single GPU with a total capacity of 4.00 GiB is used; cuDNN 10.2.89 and CUDA 11.1 are used to speed up training and prediction. We trained for 100 epochs, with the initial learning rate of the Adam optimizer set to 0.001; when the accuracy does not change for 15 consecutive epochs, the learning rate is multiplied by 0.1. The authenticity and diversity of the dataset are important for evaluating the overall performance of a model in practical applications; our dataset was collected at a real site, and the corresponding ground truth images were manually annotated as the reference for the prediction results. We used this self-made dataset to conduct comparative experiments on different typical segmentation networks. To ensure the fairness of the experiment, all methods in this paper use the cross-entropy loss function.
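For reproducibility, the reported optimizer and learning-rate schedule can be expressed in PyTorch roughly as follows (a sketch; the placeholder model and the training loop internals are our assumptions, not the authors' code):

```python
import torch
import torch.nn as nn

# Reported setup: Adam at lr = 0.001, 100 epochs, learning rate multiplied
# by 0.1 after 15 epochs without accuracy improvement.
model = nn.Conv2d(1, 1, 3, padding=1)  # placeholder, not the real DIRU-Net
criterion = nn.BCEWithLogitsLoss()     # cross-entropy loss for binary masks
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.1, patience=15)

for epoch in range(100):
    # ... forward pass, criterion(output, target), backward, step ...
    val_acc = 0.0  # replace with the measured validation accuracy
    scheduler.step(val_acc)  # reduces the LR when accuracy plateaus
```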
The cross-entropy loss function is used for classification problems, especially binary and multi-class problems. For binary classification, it computes the cross-entropy between the true label and the predicted probability; for multi-class problems, it computes the cross-entropy between the true label and the predicted probability of each category and takes the average as the loss. The coal block image segmentation in this paper is a binary classification problem (coal versus background), so we chose the cross-entropy loss function, which is defined as follows:
$$L = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right]$$

where n is the number of samples, $y_i$ is the true label, and $\hat{y}_i$ is the predicted probability.
The advantage of the cross-entropy loss function lies in its ability to effectively penalize misclassification, and it is not sensitive to the form of the probability distribution of the model output. Furthermore, when gradient descent is used for training, the derivative of the cross-entropy loss function has a simple form that is easy to calculate.
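As a sanity check of the definition above, a direct NumPy implementation of the binary cross-entropy (our illustration) is:

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Direct implementation of the cross-entropy formula above."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred)
                    + (1 - y_true) * np.log(1 - y_pred))

# Per-pixel labels (coal = 1, background = 0) vs. predicted probabilities.
print(binary_cross_entropy(np.array([1, 0, 1, 1]),
                           np.array([0.9, 0.2, 0.8, 0.6])))
```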
4.2. Evaluation Criteria
The purpose of the DIRU-Net model proposed in this paper is to reduce the number of parameters and the computational load compared with existing methods, making the model lightweight. Therefore, in addition to the metrics for segmentation accuracy (ACC, Precision, Recall, and F1-Score (F1)), our evaluation metrics include those for model volume and speed: Time, FLOPs, and Params.
ACC is a metric that measures the ratio between correctly classified pixels and total pixels in a dataset. Precision refers to the proportion of true positive samples among all predicted positive samples. Recall represents the proportion of correctly predicted positive cases to the actual total number of positive cases. F1 is the harmonic mean of Precision and Recall. These indicators take the following form:

$$ACC = \frac{TP + TN}{TP + TN + FP + FN}$$

$$Precision = \frac{TP}{TP + FP}$$

$$Recall = \frac{TP}{TP + FN}$$

$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$$
where TP indicates the true positive results, FP indicates the false positive results, TN indicates the true negative results, and FN indicates the false negative results.
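To make the definitions concrete, here is a small NumPy sketch (our illustration, not the authors' evaluation script) that computes the four metrics from a pair of binary masks:

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """ACC, Precision, Recall, and F1 from binary masks (coal = 1).
    Assumes both classes occur, so no denominator is zero."""
    tp = np.sum((pred == 1) & (truth == 1))  # coal predicted as coal
    tn = np.sum((pred == 0) & (truth == 0))  # background kept as background
    fp = np.sum((pred == 1) & (truth == 0))  # background mislabeled as coal
    fn = np.sum((pred == 0) & (truth == 1))  # coal missed
    acc = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return acc, precision, recall, f1
```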
FLOPs is the abbreviation of floating-point operations: the number of floating-point operations, understood as the amount of computation, which can be used to measure the time complexity of an algorithm or model. The FLOPs count is closely linked to the speed of the model, and since our purpose is to maintain high accuracy while speeding up the model, the amount of calculation is set as one of the evaluation criteria. The fewer FLOPs required to process one picture, the more pictures can be processed in the same time. FLOPs are related to many factors, such as the number of network layers, the parameter quantity, and the chosen activation functions. Generally speaking, the lower the parameter quantity of the network, the smaller the FLOPs and the lower the hardware memory requirements, which makes the model friendlier to embedded systems.
Convolving a feature map means multiplying the convolution kernel values by the pixels in the receptive field and summing them to obtain a new pixel; the kernel is then moved by the step size, and this operation is repeated until every pixel of the feature map has been included in the calculation.
For performing one convolution, the ordinary K × K convolution requires $K \times K$ multiplication operations and $K \times K - 1$ addition operations (to add up the multiplication results). Therefore, one convolution operation requires $2K^2 - 1$ multiplications and additions. The number of convolutions to be performed on a feature map is

$$N = \left(\frac{W - K + 2P_w}{S} + 1\right) \times \left(\frac{H - K + 2P_h}{S} + 1\right)$$

where W and H are the width and height of the input image, respectively; K indicates the size of the convolution kernel; $P_w$ and $P_h$ are the pixels filled in the width and height directions, respectively; and S indicates the moving step of the convolution kernel.

The number of convolution operations on C feature maps is $C \cdot N$. After convolution on the input feature maps, the C convolution results at each position need to be added to obtain an output feature map; adding C results requires $C - 1$ additions. So the amount of calculation required to output a feature map is

$$Q_{map} = N \times \left[ C \times (2K^2 - 1) + (C - 1) \right] = N \times (2K^2C - 1)$$

This is the amount of calculation required to output one channel. The network needs to output O channels, so the calculation amount is multiplied by O; the calculation amount needed to output O feature maps is

$$Q = O \times N \times (2K^2C - 1)$$
The above formula gives the amount of computation required for one layer of a convolutional neural network; adding up the calculation amount of each layer gives the calculation amount of the whole network.
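The derivation can be checked numerically; the following sketch (our illustration) implements the operation count $Q = O \times N \times (2K^2C - 1)$ for a single convolutional layer:

```python
def conv_layer_ops(W, H, K, C, O, S=1, Pw=0, Ph=0):
    """Operation count of one convolutional layer following the derivation
    above: each K x K convolution costs 2K^2 - 1 operations, summing the C
    channel results adds C - 1 more, with N output positions per channel
    and O output channels."""
    n = ((W - K + 2 * Pw) // S + 1) * ((H - K + 2 * Ph) // S + 1)
    per_position = C * (2 * K * K - 1) + (C - 1)  # equals 2*K*K*C - 1
    return O * n * per_position

# Example: 3x3 conv, stride 1, padding 1, 32 -> 64 channels on 128 x 128.
print(conv_layer_ops(128, 128, 3, 32, 64, S=1, Pw=1, Ph=1))
```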
4.3. Comparative Experiment
To prove the superiority of our proposed model over typical segmentation models in speed and accuracy, we performed comparative experiments with the U-Net, U-Net++, ResUnet, R2U-Net, and SegNet models; the segmentation efficiency of each model is measured by seven evaluation criteria. The results of the comparative experiment are shown in Table 2 and Figure 10.
Table 2.
Performance of the six tested models. (Bold indicates the best score, and underline indicates the second best.)
Figure 10.
Experimental results based on different network models.
Table 2 records the evaluation criteria values obtained from the comparative experiments of the six segmentation networks on the coal block dataset. The unit of ACC, Precision, Recall, and F1 is percent; Time is the average time consumed by a network model to predict a single picture, in seconds; and the unit of FLOPs and Params is MB.
From Table 2, we can see that the values of the evaluation indicators differ significantly, which reveals the advantages and disadvantages of each model, especially for the three indicators we pay most attention to: Time, FLOPs, and Params.
The ACC value of our network reaches 0.948, and its Recall and F1 values are also higher than those of the other networks. The reported segmentation accuracy (94.8%) is the average of five experimental runs. These criteria show that we achieved the goal of ensuring high prediction accuracy. With prediction accuracies at roughly the same level, the prediction time of our network for a single picture is about 6.25 s, which is 2.88 times faster than that of U-Net++ (18 s).
In terms of FLOPs and Params, our network also shows very superior performance: with a 128 × 128 grayscale image as input, the parameter amount and FLOPs reach minimum values of 0.77 MB and 399.14 MB, respectively, which are 11.89 times and 21.84 times smaller than those of the second-ranked U-Net++ network. In other words, the model reduces the amount of calculation, shortens the prediction time, and speeds up the network while maintaining prediction accuracy.
Figure 10 shows the prediction results of the six compared network models. From the figure, it can be seen that the R2U-Net results are clearly poor, with hardly any correctly segmented ore. The segmentation errors are mainly caused by the following external factors: first, insufficient light in parts of the original image makes the segmentation lines between ore particles difficult to distinguish; second, smaller particles on the surfaces of the blocks lead to the black holes visible inside the segmented blocks.
4.4. Ablation Study
The module proposed in this paper is the DIR block. In the ablation study, the proposed network without the DIR blocks serves as the baseline (U-Net). The ablation experiment results are shown in Table 3 and Figure 11:
Table 3.
Performance comparison of ablation experiments.
Figure 11.
Coal block segmentation results of the ablation study.
4.5. Loss Function Curve
Figure 12 shows the loss curve of the proposed method during training. The loss value indicates how well the network fits the training data: the lower the value, the better the fit. The training results show that the proposed DIRU-Net correctly fits the coal block image data in the training set and can accurately classify the labels of the vast majority of pixels.
Figure 12.
Training loss curve of the model.
4.6. Real-Time Performance Testing on Different Hardware Types
To test the real-time performance of the proposed method on different hardware types, we conducted runtime benchmarks on the i5-7500 CPU and the GTX 1050Ti GPU. The experimental results are shown in Table 4, from which it can be seen that DIRU-Net takes the least time and has the best real-time performance.
Table 4.
Real-time test results of all methods on different hardware types.
5. Conclusions
In coal production, the segmentation of coal blocks is of great significance for the statistical analysis of their size and quantity. Traditional coal block segmentation models suffer from slow speed, large volume, and high cost. Based on the U-Net encoder-decoder backbone and combining the characteristics of dilated convolution and inverted residual structures, we proposed a lightweight deep convolutional network (DIRU-Net) for coal block image segmentation. We comprehensively evaluated DIRU-Net on the coal block dataset and compared it with other state-of-the-art coal block segmentation methods; DIRU-Net outperforms all of them in both segmentation performance and model size, with a coal block segmentation accuracy of 94.8% and only 0.77 MB of model parameters. DIRU-Net thus provides a highly accurate and lightweight method for coal block segmentation.
Author Contributions
Conceptualization, J.L. and G.F.; methodology, B.M.; software, B.M. and J.L.; validation, G.F. and B.M.; formal analysis, G.F.; writing—original draft preparation, J.L. and B.M.; writing—review and editing, J.L. and G.F.; visualization, J.L.; funding acquisition, J.L. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported in part by the National Key R&D Program of China under Grant 2022YFB2703304; in part by the National Natural Science Foundation of China under Grant 52074064; in part by the Natural Science Foundation of Science and Technology Department of Liaoning Province under Grant 2024-MSLH-524; and in part by the Fundamental Research Funds for the Central Universities under Grant N25GFZ010 and N25GFZ028; and in part by the Liaoning Provincial Science and Technology Plan-Joint Program of Key Research and Development Project under Grant 2023JH2/101800046.
Data Availability Statement
The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Sun, Z. Research on Comprehensive Mechanized Coal Mining Technology in Coal Mines. Inn. Mong. Coal Econ. 2025, 15, 55–57.
- Liu, Y.; Wang, X.; Zhang, Z.; Deng, F. Deep learning in image segmentation for mineral production: A review. Comput. Geosci. 2023, 180, 105455.
- Zhang, H.; Xiao, D. A high-precision and lightweight ore particle segmentation network for industrial conveyor belt. Expert Syst. Appl. 2025, 273, 126891.
- Li, F.; Liu, X.; Yin, Y.; Li, Z. DDR-Unet: A High-Accuracy and Efficient Ore Image Segmentation Method. IEEE Trans. Instrum. Meas. 2023, 72, 5027920.
- Mukherjee, D.P.; Potapovich, Y.; Levner, I.; Zhang, H. Ore Image Segmentation by Learning Image and Shape Features. Pattern Recognit. Lett. 2009, 30, 615–622.
- Zhan, Y.; Zhang, G. An Improved OTSU Algorithm Using Histogram Accumulation Moment for Ore Segmentation. Symmetry 2019, 11, 431.
- Zhang, G.Y.; Liu, G.Z.; Zhu, H.; Qiu, B. Ore Image Thresholding Using Bi-Neighbourhood Otsu’s Approach. Electron. Lett. 2010, 46, 1666.
- Dong, K.; Jiang, D. Automated Estimation of Ore Size Distributions Based on Machine Vision. In Unifying Electrical Engineering and Electronics Engineering; Xing, S., Chen, S., Wei, Z., Xia, J., Eds.; Lecture Notes in Electrical Engineering; Springer: New York, NY, USA, 2014; Volume 238, pp. 1125–1131. ISBN 978-1-4614-4980-5.
- Zhang, G.; Liu, G.; Zhu, H. Segmentation Algorithm of Complex Ore Images Based on Templates Transformation and Reconstruction. Int. J. Miner. Metall. Mater. 2011, 18, 385–389.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2015; Volume 9351, pp. 234–241. ISBN 978-3-319-24573-7.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Identity Mappings in Deep Residual Networks. In Proceedings of the Computer Vision—ECCV 2016—14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
- Howard, A.; Sandler, M.; Chen, B.; Wang, W.; Chen, L.-C.; Tan, M.; Chu, G.; Vasudevan, V.; Zhu, Y.; Pang, R.; et al. Searching for MobileNetV3. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324.
- Zhou, Z.; Yuan, H.; Cai, X. Rock Thin Section Image Identification Based on Convolutional Neural Networks of Adaptive and Second-Order Pooling Methods. Mathematics 2023, 11, 1245.
- Gu, Y.; Deng, L. STAGCN: Spatial–Temporal Attention Graph Convolution Network for Traffic Forecasting. Mathematics 2022, 10, 1599.
- Haq, M.A.; Khan, I.; Ahmed, A.; Eldin, S.M.; Alshehri, A.; Ghamry, N.A. DCNNBT: A Novel Deep Convolution Neural Network-Based Brain Tumor Classification Model. Fractals 2023, 31, 2340102.
- Aguilera, R.C.; Ortiz, M.P.; Banda, A.A.; Aguilera, L.E.C. Blockchain CNN Deep Learning Expert System for Healthcare Emergency. Fractals 2021, 29, 2150227.
- Wang, L.; Wang, X.; Zhao, Z.; Wu, Y.; Xu, J.; Zhang, H.; Yu, J.; Sun, Q.; Bai, Y. Multi-Factor Status Prediction by 4d Fractal CNN Based on Remote Sensing Images. Fractals 2022, 30, 2240101.
- Ma, X.; Zhang, P.; Man, X.; Ou, L. A New Belt Ore Image Segmentation Method Based on the Convolutional Neural Network and the Image-Processing Technology. Minerals 2020, 10, 1115.
- Li, H.; Pan, C.; Chen, Z.; Wulamu, A.; Yang, A. Ore Image Segmentation Method Based on U-Net and Watershed. Comput. Mater. Contin. 2020, 65, 563–578.
- Xiao, D.; Liu, X.; Le, B.T.; Ji, Z.; Sun, X. An Ore Image Segmentation Method Based on RDU-Net Model. Sensors 2020, 20, 4979.
- Jin, Q.; Meng, Z.; Pham, T.D.; Chen, Q.; Wei, L.; Su, R. DUNet: A Deformable Network for Retinal Vessel Segmentation. Knowl.-Based Syst. 2019, 178, 149–162.
- Wang, W.; Li, Q.; Xiao, C.; Zhang, D.; Miao, L.; Wang, L. An Improved Boundary-Aware U-Net for Ore Image Semantic Segmentation. Sensors 2021, 21, 2615.
- Huang, Z.; Zhao, Y.; Liu, Y.; Song, G. GCAUNet: A Group Cross-Channel Attention Residual UNet for Slice Based Brain Tumor Segmentation. Biomed. Signal Process. Control 2021, 70, 102958.
- Gu, Z.; Cheng, J.; Fu, H.; Zhou, K.; Hao, H.; Zhao, Y.; Zhang, T.; Gao, S.; Liu, J. CE-Net: Context Encoder Network for 2D Medical Image Segmentation. IEEE Trans. Med. Imaging 2019, 38, 2281–2292.
- Li, Y.; Yang, J.; Ni, J.; Elazab, A.; Wu, J. TA-Net: Triple Attention Network for Medical Image Segmentation. Comput. Biol. Med. 2021, 137, 104836.
- Shi, Z.; Wang, T.; Huang, Z.; Xie, F.; Liu, Z.; Wang, B.; Xu, J. MD-Net: A Multi-Scale Dense Network for Retinal Vessel Segmentation. Biomed. Signal Process. Control 2021, 70, 102977.
- Yu, F.; Koltun, V. Multi-Scale Context Aggregation by Dilated Convolutions. arXiv 2016, arXiv:1511.07122.
- Zhou, Z.; Siddiquee, M.M.R.; Tajbakhsh, N.; Liang, J. UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation. IEEE Trans. Med. Imaging 2020, 39, 1856–1867.
- Diakogiannis, F.I.; Waldner, F.; Caccetta, P.; Wu, C. ResUNet-a: A Deep Learning Framework for Semantic Segmentation of Remotely Sensed Data. ISPRS J. Photogramm. Remote Sens. 2020, 162, 94–114.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Alom, M.Z.; Yakopcic, C.; Taha, T.M.; Asari, V.K. Nuclei Segmentation with Recurrent Residual Convolutional Neural Networks Based U-Net (R2U-Net). In Proceedings of the NAECON 2018—IEEE National Aerospace and Electronics Conference, Dayton, OH, USA, 23–26 July 2018; pp. 228–233.