# Revealing the Boundaries of Selected Gastro-Intestinal (GI) Organs by Implementing CNNs in Endoscopic Capsule Images

## Abstract

## 1. Introduction

- CADe by FUJIfilm, PENTAX,
- GI Genius by Medtronic,
- EndoBRAIN by Cybernet,
- Endo-AID by Olympus (as of this time, pending approval).

## 2. Materials and Methods

#### 2.1. Data and Resources

- create a validation set from 20% of the training data, to serve as unseen data during the training process;
- resize the images from the original 576 × 576 to 256 × 256;
- set the batch size to 16;
- use autotuned prefetching to make the model faster by decoupling the time when data is produced from the time when it is consumed: a background thread and an internal buffer prefetch elements from the input dataset ahead of the time they are requested, and the autotune setting prompts `tf.data` to tune the buffer size dynamically at runtime;
- normalize the data by scaling all pixel values to the [0, 1] interval.
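
The steps above can be illustrated without TensorFlow to show what each one does. The following is a minimal, standard-library-only sketch, not the authors' pipeline and not how `tf.data` is implemented internally: `split_train_val`, `normalize`, `batched`, and `prefetch` are hypothetical helper names, the toy "images" are made up, and resizing is omitted because it needs an imaging library.

```python
import queue
import threading


def split_train_val(items, val_fraction=0.2):
    """Hold out the last `val_fraction` of the items as validation data."""
    n_val = int(len(items) * val_fraction)
    return items[:len(items) - n_val], items[len(items) - n_val:]


def normalize(pixels):
    """Map 8-bit pixel values (0..255) into the [0, 1] interval."""
    return [p / 255.0 for p in pixels]


def batched(items, batch_size=16):
    """Yield consecutive batches; the last one may be smaller."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def prefetch(iterable, buffer_size=2):
    """Produce elements in a background thread, ahead of the consumer."""
    buffer = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of the stream

    def producer():
        for element in iterable:
            buffer.put(element)  # blocks while the buffer is full
        buffer.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        element = buffer.get()
        if element is done:
            return
        yield element


# Toy "images" (made up): 100 flat lists of three 8-bit pixel values each.
images = [[i % 256, 128, 255] for i in range(100)]
train, val = split_train_val(images)
train = [normalize(img) for img in train]
batches = list(prefetch(batched(train, batch_size=16)))
```

The sketch fixes the buffer size by hand; the autotune setting described above instead lets the runtime choose that value dynamically.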

#### 2.2. Proposed CNNs 4-Class Classifiers

#### 2.2.1. CNN–Model 1

#### 2.2.2. CNN–Model 2

#### 2.2.3. CNN–Model 3

## 3. Results of CNNs Training

## 4. Methods of Multiclass CNNs Independent Evaluation

- $t_{k}=\sum_{i}^{K} C_{ik}$ is the number of times class k truly occurred,
- $p_{k}=\sum_{i}^{K} C_{ki}$ is the number of times class k was predicted,
- $c=\sum_{k}^{K} C_{kk}$ is the total number of samples that were correctly predicted,
- $s=\sum_{i}^{K}\sum_{j}^{K} C_{ij}$ is the overall number of samples, and
- $c\times s-\sum_{k}^{K} p_{k}\times t_{k}$ includes the elements wrongly classified by the model and covers multiplicative terms that are weaker than the product $c \times s$.
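
The quantities listed above are the ingredients of the multiclass Matthews correlation coefficient in its usual form, $MCC = (c \times s - \sum_{k} p_{k} t_{k}) / \sqrt{(s^{2} - \sum_{k} p_{k}^{2})(s^{2} - \sum_{k} t_{k}^{2})}$. A minimal sketch, assuming a square confusion matrix given as nested lists; the function name `multiclass_mcc` and the toy matrix are illustrative and not taken from the paper:

```python
import math


def multiclass_mcc(C):
    """Multiclass MCC from a K x K confusion matrix C (nested lists)."""
    K = len(C)
    t = [sum(C[i][k] for i in range(K)) for k in range(K)]  # t_k: sum over the first index
    p = [sum(C[k][i] for i in range(K)) for k in range(K)]  # p_k: sum over the second index
    c = sum(C[k][k] for k in range(K))   # correctly predicted samples
    s = sum(sum(row) for row in C)       # all samples
    numerator = c * s - sum(pk * tk for pk, tk in zip(p, t))
    denominator = math.sqrt(
        (s * s - sum(pk * pk for pk in p)) * (s * s - sum(tk * tk for tk in t))
    )
    # A common convention: report 0 when the denominator vanishes.
    return numerator / denominator if denominator else 0.0


# Hypothetical 3-class matrix, for illustration only.
toy = [[5, 1, 0],
       [0, 4, 2],
       [1, 0, 6]]
print(round(multiclass_mcc(toy), 4))
```

Because the formula treats $t_{k}$ and $p_{k}$ symmetrically, transposing the matrix (swapping the roles of actual and predicted classes) leaves the value unchanged.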

The value of precision of the ith class, $P_{i}$, stands for the fraction of positive class predictions that were actually positive. The value of sensitivity or recall of the ith class, $R_{i}$, stands for the fraction of all positive samples that were correctly predicted as positive by the classifier.
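
These per-class definitions can be written out directly in terms of one-vs-rest counts. A minimal sketch; `one_vs_rest_metrics` is a hypothetical helper name, and the counts in the example are the esophagus row of Table 4:

```python
def one_vs_rest_metrics(tp, tn, fp, fn):
    """Per-class metrics from one-vs-rest counts (TP, TN, FP, FN)."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
    }


# Esophagus vs. rest, Model 1 (Table 4): TP = 27, TN = 87, FP = 8, FN = 2.
esophagus = one_vs_rest_metrics(tp=27, tn=87, fp=8, fn=2)
for name, value in esophagus.items():
    print(f"{name}: {value:.4f}")
```

With these counts the sketch gives a precision of 27/35 ≈ 0.7714 and a sensitivity of 27/29 ≈ 0.9310, in line with the Model 1/Class 1 row of Table 7.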

The test results are reported in the form $X^{2}$ (degrees of freedom, N = sample size) = chi-square statistic value, p = p value, while our p-values are calculated with Pearson’s chi-squared test, using Equation (8):

Here, $E_{i,j}$ is the “theoretical frequency” for a cell, given the hypothesis of independence, and $O_{i,j}$ is the observed frequency for that cell.
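
For reference, Pearson's statistic has the standard form $\chi^{2} = \sum_{i}\sum_{j} (O_{i,j} - E_{i,j})^{2} / E_{i,j}$, where under independence the expected count of a cell is (row total × column total) / N. A minimal sketch in plain Python; `pearson_chi2` is a hypothetical helper name, the 2 × 2 table is a toy example rather than data from this study, and the p-value lookup is left out because it needs the chi-squared distribution from a statistics library:

```python
def pearson_chi2(observed):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table.

    Assumes every row total and column total is nonzero.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n  # expected count under independence
            chi2 += (o_ij - e_ij) ** 2 / e_ij
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof


# Toy 2 x 2 table (illustrative only): rows and columns perfectly associated.
chi2, dof = pearson_chi2([[10, 0], [0, 10]])
print(chi2, dof)
```

For a 4 × 4 contingency table, as used in the reports below, the same function yields 9 degrees of freedom.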

## 5. Results of CNN Models Testing

## 6. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

Layer Name | Layer Type | Number of Filters | Output Shape | Number of Parameters |
---|---|---|---|---|
Conv2D | Conv2D | 16 | (256, 256, 16) | 448 |
Max_pooling2D | MaxPooling2D | | (128, 128, 16) | 0 |
Conv2D_1 | Conv2D | 32 | (128, 128, 32) | 4,640 |
Max_pooling2D_1 | MaxPooling2D | | (64, 64, 32) | 0 |
Conv2D_2 | Conv2D | 64 | (64, 64, 64) | 18,496 |
Max_pooling2D_2 | MaxPooling2D | | (32, 32, 64) | 0 |
Flatten | Flatten | | 65,536 | 0 |
Dense | Dense | | 128 | 8,388,736 |
Dense_1 | Dense | | 4 | 516 |
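
The parameter counts in the table above can be checked by hand. A small sketch, assuming 3 × 3 kernels and a 3-channel (RGB) input; neither is stated in the table, but both are consistent with the listed counts. The helper names are illustrative:

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_features, units):
    """Weights plus one bias per unit."""
    return (in_features + 1) * units


# Assumed settings: 3 x 3 kernels, RGB input (inferred, not stated in the table).
model_1 = {
    "Conv2D": conv2d_params(3, 3, 3, 16),
    "Conv2D_1": conv2d_params(3, 3, 16, 32),
    "Conv2D_2": conv2d_params(3, 3, 32, 64),
    "Dense": dense_params(32 * 32 * 64, 128),  # flattened 32 x 32 x 64 feature map
    "Dense_1": dense_params(128, 4),
}
for layer, count in model_1.items():
    print(f"{layer}: {count:,}")
```

Pooling and flatten layers have no trainable parameters, which is why they are listed with 0.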

Layer Name | Layer Type | Number of Filters | Output Shape | Number of Parameters |
---|---|---|---|---|
Conv2D | Conv2D | 32 | (254, 254, 32) | 896 |
Max_pooling2D | MaxPooling2D | | (127, 127, 32) | 0 |
Conv2D_1 | Conv2D | 64 | (125, 125, 64) | 18,496 |
Max_pooling2D_1 | MaxPooling2D | | (62, 62, 64) | 0 |
Conv2D_2 | Conv2D | 128 | (60, 60, 128) | 73,856 |
Max_pooling2D_2 | MaxPooling2D | | (30, 30, 128) | 0 |
Conv2D_3 | Conv2D | 256 | (28, 28, 256) | 295,168 |
Max_pooling2D_3 | MaxPooling2D | | (14, 14, 256) | 0 |
Flatten | Flatten | | 50,176 | 0 |
Dense | Dense | | 256 | 12,845,312 |
Dense_1 | Dense | | 4 | 1,028 |

Layer Name | Layer Type | Number of Filters | Output Shape | Number of Parameters |
---|---|---|---|---|
Conv2D | Conv2D | 32 | (252, 252, 32) | 2,432 |
Max_Pooling2D | MaxPooling2D | | (126, 126, 32) | 0 |
Conv2D_1 | Conv2D | 32 | (122, 122, 32) | 25,632 |
Max_Pooling2D_1 | MaxPooling2D | | (61, 61, 32) | 0 |
Conv2D_2 | Conv2D | 32 | (57, 57, 32) | 25,632 |
Max_Pooling2D_2 | MaxPooling2D | | (28, 28, 32) | 0 |
Flatten | Flatten | | 25,088 | 0 |
Dense | Dense | | 256 | 6,422,784 |
Dropout | Dropout | | 256 | 0 |
Dense_1 | Dense | | 4 | 1,028 |
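
The output shapes in the table above follow from simple layer arithmetic. A sketch, assuming unpadded ("valid") 5 × 5 convolutions with stride 1 and 2 × 2 max pooling with stride 2; these settings are inferred from the listed shapes and parameter counts, not stated in the table, and the helper names are illustrative:

```python
def conv_valid(size, kernel):
    """Spatial size after an unpadded, stride-1 convolution."""
    return size - kernel + 1


def pool(size, window=2):
    """Spatial size after non-overlapping pooling (remainder dropped)."""
    return size // window


size = 256                     # input images are 256 x 256
trace = []
for _ in range(3):             # three Conv2D + MaxPooling2D stages
    size = conv_valid(size, 5)  # assumed 5 x 5 kernel
    trace.append(size)
    size = pool(size)
    trace.append(size)

flattened = size * size * 32   # 32 filters in the last convolution
print(trace, flattened)
```

The odd size 61 is halved to 28 after the next convolution and pooling step because pooling drops the leftover row and column of the 57 × 57 map.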

**Table 4.** Report from testing 496 frames (124 previously unseen frames of each organ) with CNN Model 1.

Matrix C | | Predicted Values | | | | |
---|---|---|---|---|---|---|
Actual values: class k | Organ | $TP_{ii}$ | TN | $\sum_{i=1}^{4} FP_{i}$ | $\sum_{i=1}^{4} FN_{i}$ | Total $t_{k}$ |
1 | Esophagus | $C_{11}$ = 27 | 87 | 8 | 2 | $t_{k=1}$ = 124 |
2 | Stomach | $C_{22}$ = 25 | 83 | 4 | 12 | $t_{k=2}$ = 124 |
3 | Small Intestine | $C_{33}$ = 26 | 91 | 4 | 3 | $t_{k=3}$ = 124 |
4 | Colon | $C_{44}$ = 29 | 94 | 1 | 0 | $t_{k=4}$ = 124 |
Total $p_{k}$ | | c = 107 | $p_{k=2}$ = 355 | $p_{k=3}$ = 17 | $p_{k=4}$ = 17 | s = 992 |

4 × 4 contingency table. Significance level: 0.05, $X^{2}$ (N = 124) = 249.92, p < 0.00001. Significant at p < 0.05.

**Table 5.** Report from testing 496 frames (124 previously unseen frames of each organ) with CNN Model 2.

Matrix C | | Predicted Values | | | | |
---|---|---|---|---|---|---|
Actual values: class k | Organ | $TP_{ii}$ | TN | $\sum_{i=1}^{4} FP_{i}$ | $\sum_{i=1}^{4} FN_{i}$ | Total $t_{k}$ |
1 | Esophagus | $C_{11}$ = 28 | 90 | 5 | 1 | $t_{k=1}$ = 124 |
2 | Stomach | $C_{22}$ = 30 | 84 | 3 | 7 | $t_{k=2}$ = 124 |
3 | Small Intestine | $C_{33}$ = 26 | 93 | 2 | 3 | $t_{k=3}$ = 124 |
4 | Colon | $C_{44}$ = 29 | 94 | 1 | 0 | $t_{k=4}$ = 124 |
Total $p_{k}$ | | c = 113 | $p_{k=2}$ = 361 | $p_{k=3}$ = 11 | $p_{k=4}$ = 11 | s = 992 |

4 × 4 contingency table. Significance level: 0.05, $X^{2}$ (N = 124) = 219.76, p = 0.00001. Significant at p < 0.05.

**Table 6.** Report from testing 496 frames (124 previously unseen frames of each organ) with CNN Model 3.

Matrix C | | Predicted Values | | | | |
---|---|---|---|---|---|---|
Actual values: class k | Organ | $TP_{ii}$ | TN | $\sum_{i=1}^{4} FP_{i}$ | $\sum_{i=1}^{4} FN_{i}$ | Total $t_{k}$ |
1 | Esophagus | $C_{11}$ = 27 | 95 | 0 | 2 | $t_{k=1}$ = 124 |
2 | Stomach | $C_{22}$ = 25 | 84 | 3 | 12 | $t_{k=2}$ = 124 |
3 | Small Intestine | $C_{33}$ = 22 | 95 | 0 | 7 | $t_{k=3}$ = 124 |
4 | Colon | $C_{44}$ = 29 | 77 | 18 | 0 | $t_{k=4}$ = 124 |
Total $p_{k}$ | | c = 103 | $p_{k=2}$ = 351 | $p_{k=3}$ = 21 | $p_{k=4}$ = 21 | s = 992 |

4 × 4 contingency table. Significance level: 0.05, $X^{2}$ (N = 124) = 223.19, p < 0.00001. Significant at p < 0.05.

**Table 7.** Performance metrics of our CNN Models 1, 2 and 3 for independent validation (data not seen before).

Model/Class | Accuracy | Error Rate | Precision | Specificity | Sensitivity (Recall) | F1 Score | MCC |
---|---|---|---|---|---|---|---|
Model 1/Class 1 vs. rest | 0.9193 | 0.0806 | 0.7714 | 0.9157 | 0.9310 | | |
Model 1/Class 2 vs. rest | 0.8709 | 0.1290 | 0.8620 | 0.9540 | 0.6756 | | |
Model 1/Class 3 vs. rest | 0.9435 | 0.0564 | 0.8666 | 0.9578 | 0.8965 | | |
Model 1/Class 4 vs. rest | 0.9919 | 0.0080 | 0.9666 | 0.9894 | 1 | | |
Model 1/Average macro | 0.9314 | 0.0685 | 0.8667 | 0.9542 | 0.8758 | 0.871188 | 62.53618 |
Model 2/Class 1 vs. rest | 0.9516 | 0.0483 | 0.8484 | 0.9473 | 0.9655 | | |
Model 2/Class 2 vs. rest | 0.9193 | 0.0806 | 0.9090 | 0.9655 | 0.8108 | | |
Model 2/Class 3 vs. rest | 0.9596 | 0.0403 | 0.9285 | 0.9789 | 0.8965 | | |
Model 2/Class 4 vs. rest | 0.9919 | 0.0080 | 0.9666 | 0.9894 | 1 | | |
Model 2/Average macro | 0.9556 | 0.0443 | 0.9131 | 0.9703 | 0.9182 | 0.915655 | 69.9200 |
Model 3/Class 1 vs. rest | 0.9838 | 0.0161 | 1 | 1 | 0.9310 | | |
Model 3/Class 2 vs. rest | 0.8790 | 0.1209 | 0.8928 | 0.9655 | 0.6756 | | |
Model 3/Class 3 vs. rest | 0.9435 | 0.0564 | 1 | 1 | 0.7586 | | |
Model 3/Class 4 vs. rest | 0.8548 | 0.1451 | 0.6170 | 0.8105 | 1 | | |
Model 3/Average macro | 0.9153 | 0.0846 | 0.8775 | 0.9440 | 0.8413 | 0.858994 | 57.63078 |
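
The macro-average rows are unweighted means of the per-class values. A minimal sketch using the Model 1 counts from Table 4; taking F1 as the harmonic mean of the macro-averaged precision and recall is an assumption about how the reported score was obtained (it lands close to the tabulated value but does not reproduce every digit), and the helper name `macro_average` is illustrative:

```python
# (TP, TN, FP, FN) per class for Model 1, copied from Table 4.
model_1_counts = {
    "Esophagus": (27, 87, 8, 2),
    "Stomach": (25, 83, 4, 12),
    "Small Intestine": (26, 91, 4, 3),
    "Colon": (29, 94, 1, 0),
}


def macro_average(values):
    """Unweighted mean over classes."""
    values = list(values)
    return sum(values) / len(values)


macro_precision = macro_average(
    tp / (tp + fp) for tp, tn, fp, fn in model_1_counts.values()
)
macro_recall = macro_average(
    tp / (tp + fn) for tp, tn, fp, fn in model_1_counts.values()
)
# Assumption: F1 taken as the harmonic mean of macro precision and macro recall.
macro_f1 = 2 * macro_precision * macro_recall / (macro_precision + macro_recall)

print(f"{macro_precision:.4f} {macro_recall:.4f} {macro_f1:.4f}")
```

Macro averaging gives each organ the same weight regardless of how many test frames it contributes, so a weak class such as the stomach lowers the average as much as a strong class raises it.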

