# Automatic Coal and Gangue Segmentation Using U-Net Based Fully Convolutional Networks


## Abstract


## 1. Introduction

## 2. Input Data

#### 2.1. Raw Data Collection

#### 2.2. Data Preparation

#### 2.3. Data Augmentation

## 3. The Proposed Approach

#### 3.1. Network Architecture

#### 3.2. Model Training

#### 3.3. Evaluation Metrics

## 4. Results and Discussion

#### 4.1. Visual Results

#### 4.2. Network Performance

#### 4.3. Effects of Data Augmentation

#### 4.4. Impacts of Input Image Size

#### 4.5. Comparisons with Other Methods

## 5. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

**Figure 1.** An overview of our deep learning-based coal and gangue segmentation technique. (**a**) Training stage; (**b**) testing stage.

**Figure 2.** Data collection and gangue separating systems: (**a**) data collection system; (**b**) gangue grabbing manipulator.

**Figure 6.** Results of gangue segmentation using the U-Net based approach. First column: testing images; second column: manually labeled ground truth; third column: probability maps generated by the trained model; last column: results overlaid on the original images.

**Figure 8.** The P-R curves and the ROC curves with their corresponding AUC values. (**a**) The P-R curve; (**b**) the ROC curve.

**Figure 10.** Performance of the trained model without data augmentation. (**a**) The P-R curve; (**b**) the ROC curve.

**Figure 11.** Performance of the trained model with an input image size of $256\times 256$. (**a**) The P-R curve; (**b**) the ROC curve.

Predicted Result | Ground Truth: Gangue (P) | Ground Truth: Background (N) |
---|---|---|
Gangue (${\mathrm{P}}^{\prime}$) | True Positive (TP) | False Positive (FP) |
Background (${\mathrm{N}}^{\prime}$) | False Negative (FN) | True Negative (TN) |

$\mathrm{ACCURACY}=\frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{P}+\mathrm{N}}$

$\mathrm{PRECISION}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}$

$\mathrm{RECALL}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}$

$\mathrm{TPR}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}$

$\mathrm{FPR}=\frac{\mathrm{FP}}{\mathrm{FP}+\mathrm{TN}}$
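The definitions above translate directly into a pixel counting routine. The following is a minimal sketch in plain Python, assuming the predicted and ground-truth masks have already been binarized and flattened (1 = gangue, 0 = background). The function names `confusion_counts` and `pixel_metrics` and the toy masks are illustrative and are not taken from the paper.

```python
from typing import Dict, Iterable, Tuple


def confusion_counts(pred: Iterable[int], truth: Iterable[int]) -> Tuple[int, int, int, int]:
    """Count (TP, FP, FN, TN) over two flattened binary masks."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p == 1 and t == 1:
            tp += 1  # predicted gangue, labeled gangue
        elif p == 1 and t == 0:
            fp += 1  # predicted gangue, labeled background
        elif p == 0 and t == 1:
            fn += 1  # predicted background, labeled gangue
        else:
            tn += 1  # predicted background, labeled background
    return tp, fp, fn, tn


def pixel_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Apply the formulas above; assumes every denominator is non-zero."""
    total = tp + fp + fn + tn  # equals P + N
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),  # identical to TPR
        "fpr": fp / (fp + tn),
    }


# Hypothetical 8-pixel masks, used only to exercise the functions.
pred = [1, 1, 0, 0, 1, 0, 1, 0]
truth = [1, 0, 0, 1, 1, 0, 1, 0]
counts = confusion_counts(pred, truth)
metrics = pixel_metrics(*counts)
print(counts, metrics)
```

A production version would guard against empty denominators, for example an image that contains no gangue pixels at all.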

Image | AUROC | AUPRC | Accuracy | Precision | Recall | IoU |
---|---|---|---|---|---|---|
Test01.png | 0.94 | 0.90 | 0.88 | 0.83 | 0.94 | 0.80 |
Test02.png | 0.98 | 0.99 | 0.93 | 0.97 | 0.81 | 0.79 |
Test03.png | 0.99 | 0.99 | 0.98 | 0.98 | 0.98 | 0.96 |
Test04.png | 0.94 | 0.90 | 0.90 | 0.83 | 0.99 | 0.82 |
Test05.png | 0.99 | 0.99 | 0.95 | 0.95 | 0.96 | 0.92 |
Test06.png | 0.96 | 0.97 | 0.93 | 0.90 | 0.91 | 0.82 |
Mean ($\sigma $) | 0.96 (2.13%) | 0.96 (4.07%) | 0.93 (3.24%) | 0.90 (6.19%) | 0.94 (6.04%) | 0.86 (6.44%) |

Metric | Without Augmentation | | With Augmentation | |
---|---|---|---|---|
Confusion matrix, row 1 | 654,689 | 77,528 | 661,849 | 70,368 |
Confusion matrix, row 2 | 73,321 | 767,326 | 40,913 | 799,734 |
Accuracy | 0.90 | | 0.93 | |
Precision | 0.89 | | 0.90 | |
Recall | 0.90 | | 0.94 | |
IoU | 0.81 | | 0.86 | |
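As a consistency check, the reported scores for the augmented model can be recomputed from its confusion matrix. This worked example rests on two assumptions that the table itself does not state: that the cells follow the layout of the confusion matrix table given earlier (TP and FP in the first row, FN and TN in the second), and that IoU is the standard $\mathrm{TP}/(\mathrm{TP}+\mathrm{FP}+\mathrm{FN})$.

$$
\begin{aligned}
\mathrm{ACCURACY} &= \frac{661{,}849+799{,}734}{1{,}572{,}864} \approx 0.929 \\
\mathrm{PRECISION} &= \frac{661{,}849}{661{,}849+70{,}368} \approx 0.904 \\
\mathrm{RECALL} &= \frac{661{,}849}{661{,}849+40{,}913} \approx 0.942 \\
\mathrm{IoU} &= \frac{661{,}849}{661{,}849+70{,}368+40{,}913} \approx 0.856
\end{aligned}
$$

Rounded to two decimals these give 0.93, 0.90, 0.94, and 0.86, matching the table. The total of 1,572,864 pixels equals $6\times 512\times 512$, that is, six test images at the $512\times 512$ input size.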

Input Size | Prediction Time | Training Time | Physical Size per Pixel |
---|---|---|---|
$512\times 512$ | 48.2 ms per image | 5.8 h | 1.56 mm |
$256\times 256$ | 18.5 ms per image | 1.8 h | 3.13 mm |
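The trade-off in this table can be made concrete with a little arithmetic. The sketch below derives an upper bound on throughput from the prediction time and the scene width implied by the per-pixel size. Both derived quantities are illustrations rather than figures reported in the paper, and the throughput bound ignores image acquisition and any pre- or post-processing.

```python
# Values copied from the table above, keyed by input side length in pixels.
configs = {
    512: {"predict_ms": 48.2, "mm_per_pixel": 1.56},
    256: {"predict_ms": 18.5, "mm_per_pixel": 3.13},
}


def images_per_second(predict_ms: float) -> float:
    """Upper bound on throughput if prediction were the only cost."""
    return 1000.0 / predict_ms


def covered_width_mm(side_pixels: int, mm_per_pixel: float) -> float:
    """Scene width implied by the image side length and the per-pixel size."""
    return side_pixels * mm_per_pixel


summary = {
    side: (
        images_per_second(cfg["predict_ms"]),
        covered_width_mm(side, cfg["mm_per_pixel"]),
    )
    for side, cfg in configs.items()
}

for side, (fps, width) in summary.items():
    print(f"{side}x{side}: about {fps:.1f} images/s, about {width:.0f} mm across")
```

Both settings imply roughly the same scene width (about 0.8 m), so halving the input size trades spatial resolution for speed rather than changing the field of view.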

Method | Dataset Size | Image Attributes | Coal and Gangue Position | Algorithm Category | Detection Accuracy |
---|---|---|---|---|---|
LeNet [3] | 20,000 | $224\times 224$ 8-bit | Single | Classification | 95.9% |
AlexNet [4] | 2012 | $224\times 224$ RGB | Single | Classification | 96.0% |
CG-RPN [10] | 2316 | $1024\times 1024$ RGB | Sparse | Classification with box | 98.3% |
Our method | 60 | $512\times 512$ 8-bit | Multiple heaped | Pixel-wise segmentation | 93.0% |

