# Automated Diatom Classification (Part B): A Deep Learning Approach

## Abstract

## 1. Introduction

## 2. Materials

#### 2.1. Data Labeling

#### 2.2. Image Processing

#### 2.3. Data Augmentation

#### 2.4. Dataset Building

#### 2.4.1. Segmented Dataset

1. Binary thresholding: the image is segmented automatically using Otsu’s thresholding.
2. Maximum area: the largest connected region (by area) is selected.
3. Hole filling: interior holes in that region are filled, if present.
4. Segmentation: the ROI is cropped using the bounding-box coordinates of the largest region (step 2).
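
The four steps above could be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names (`otsu_threshold`, `flood`, `largest_region`, `fill_holes`, `segment`) are hypothetical, a single-channel 8-bit image with at least one foreground pixel is assumed, regions are assumed to be 4-connected, and the `object_is_bright` flag is an assumption about contrast polarity that the text does not specify.

```python
from collections import deque

import numpy as np


def otsu_threshold(gray):
    """Otsu threshold of an 8-bit image: maximizes between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # probability of levels <= t
    mu = np.cumsum(prob * np.arange(256))    # cumulative mean up to level t
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu[-1] * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0                # undefined where one class is empty
    return int(np.argmax(sigma_b))


def flood(mask, seeds):
    """Pixels of `mask` that are 4-connected to any seed (breadth-first search)."""
    h, w = mask.shape
    reached = np.zeros((h, w), dtype=bool)
    queue = deque()
    for r, c in seeds:
        if mask[r, c] and not reached[r, c]:
            reached[r, c] = True
            queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w and mask[nr, nc] and not reached[nr, nc]:
                reached[nr, nc] = True
                queue.append((nr, nc))
    return reached


def largest_region(mask):
    """Step 2: keep only the connected region with the largest area."""
    best = np.zeros_like(mask, dtype=bool)
    todo = mask.copy()
    while todo.any():
        r, c = np.argwhere(todo)[0]
        region = flood(todo, [(r, c)])
        if region.sum() > best.sum():
            best = region
        todo &= ~region
    return best


def fill_holes(region):
    """Step 3: background not reachable from the image border is a hole."""
    h, w = region.shape
    border = [(r, c) for r in range(h) for c in (0, w - 1)]
    border += [(r, c) for c in range(w) for r in (0, h - 1)]
    outside = flood(~region, border)
    return ~outside


def segment(gray, object_is_bright=True):
    """Steps 1-4: threshold, largest region, hole filling, bounding-box crop."""
    t = otsu_threshold(gray)
    mask = gray > t if object_is_bright else gray <= t
    region = fill_holes(largest_region(mask))
    rows, cols = np.nonzero(region)
    r0, r1 = rows.min(), rows.max() + 1
    c0, c1 = cols.min(), cols.max() + 1
    return gray[r0:r1, c0:c1], region[r0:r1, c0:c1]
```

Calling `segment(image)` returns the cropped ROI together with the filled binary mask restricted to the same bounding box. In practice an image-processing library would normally replace the pure-Python flood fill, which is kept here only to make the sketch self-contained.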

#### 2.4.2. Normalized Dataset

#### 2.4.3. Original + Normalized Dataset

## 3. Deep Learning

#### 3.1. Convolutional Neural Networks

#### 3.2. Training

#### 3.3. Testing

#### 3.4. Validation

- True Positive. The instance belongs to the class and is predicted as positive.
- False Positive. The instance does not belong to the class but is predicted as positive. This is the so-called Type I error.
- True Negative. The instance does not belong to the class and is predicted as negative.
- False Negative. The instance belongs to the class but is predicted as negative. This is the so-called Type II error.

- True Positive Rate (TPR) or Sensitivity. Defined in Equation (1), it measures the proportion of positive samples correctly classified.

  $$\mathrm{TPR} = \frac{\sum \text{True positive}}{\sum \text{True positive} + \sum \text{False negative}} \qquad (1)$$
- True Negative Rate (TNR) or Specificity. Defined in Equation (2), it measures the proportion of negative samples correctly classified.

  $$\mathrm{TNR} = \frac{\sum \text{True negative}}{\sum \text{False positive} + \sum \text{True negative}} \qquad (2)$$
- Accuracy. Defined in Equation (3), it measures the proportion of correctly classified samples (positives and negatives) against the total population (number of samples that have been classified).

  $$\mathrm{Accuracy} = \frac{\sum \text{True positive} + \sum \text{True negative}}{\sum \text{Total population}} \qquad (3)$$
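
As a minimal sketch of how these quantities might be computed in a multi-class setting, the snippet below derives the per-class counts from a confusion matrix in a one-vs-rest fashion and then evaluates sensitivity, specificity and accuracy. The function names and the convention that rows are true classes and columns are predicted classes are assumptions made for illustration; every class is also assumed to have at least one positive and one negative sample, so no division by zero occurs.

```python
import numpy as np


def per_class_counts(conf):
    """One-vs-rest counts; conf[i, j] = samples of true class i predicted as j."""
    conf = np.asarray(conf)
    tp = np.diag(conf)                  # predicted as the class and belonging to it
    fn = conf.sum(axis=1) - tp          # belonging to the class, predicted otherwise
    fp = conf.sum(axis=0) - tp          # predicted as the class, belonging elsewhere
    tn = conf.sum() - tp - fn - fp      # everything else
    return tp, fp, tn, fn


def rates(conf):
    """Per-class sensitivity (TPR), specificity (TNR) and accuracy."""
    tp, fp, tn, fn = per_class_counts(conf)
    total = np.asarray(conf).sum()
    tpr = tp / (tp + fn)
    tnr = tn / (fp + tn)
    acc = (tp + tn) / total
    return tpr, tnr, acc


# Toy 3-class confusion matrix (made-up numbers, for illustration only).
toy = [[8, 2, 0],
       [1, 9, 0],
       [0, 1, 9]]
tpr, tnr, acc = rates(toy)
```

For the toy matrix, the first class has 8 true positives and 2 false negatives, so its sensitivity is 8 / (8 + 2) = 0.8.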

## 4. Results and Discussion

#### 4.1. Original Dataset

#### 4.2. Segmented Dataset

- Cymbella excisa var angusta (19) and Cymbella excisa var excisa (20);
- Encyonopsis alpina (32) and Encyonopsis minuta (33), the same as in the original dataset;
- Eolimna minima (34) and Eolimna rhombelliptica (35), the same as in the original dataset;
- Epithemia adnata (37) and Epithemia turgida (39);
- Nitzschia amphibia (66) and Nitzschia fossilis (71).

#### 4.3. Normalized Dataset

- Encyonema reichardtii (29): Achnanthidium atomoides (2), Achnanthidium caravelense (3), Achnanthidium eutrophilum (6), Gomphonema minutum (52).
- Achnanthidium rivulare (9): Achnanthes subhudsonis (1), Achnanthidium atomoides (2), Achnanthidium eutrophilum (6), Eolimna minima (34), Eolimna rhombelliptica (35).
- Nitzschia fossilis (71): Nitzschia amphibia (66), Nitzschia tropica (74).

- Encyonopsis alpina (32) with Encyonopsis minuta (33).
- Nitzschia amphibia (66) with Nitzschia fossilis (71).

#### 4.4. Original + Normalized Dataset

#### 4.5. Discussion

## 5. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References

1. The European Parliament and the Council of the European Union. Directive 2000/60/EC. Establishing a Framework for Community Action in the Field of Water Policy; Official Journal of the European Community: Maastricht, the Netherlands, 2000.
2. Blanco, S.; Becares, E. Are biotic indices sensitive to river toxicants? A comparison of metrics based on diatoms and macro-invertebrates. Chemosphere **2010**, 79, 18–25.
3. Smol, J.; Stoermer, E. The Diatoms: Applications for the Environmental and Earth Sciences; Cambridge University Press: Cambridge, UK, 2010.
4. Bueno, G.; Deniz, O.; Pedraza, A.; Salido, J.; Cristobal, G.; Saul, B. Automated Diatom Classification (Part A): Handcrafted feature approaches. Appl. Sci. **2017**, in press.
5. Dimitrovski, I.; Kocev, D.; Loskovska, S.; Dzeroski, S. Hierarchical classification of diatom images using ensembles of predictive clustering trees. Ecol. Inform. **2012**, 7, 19–29.
6. Krizhevsky, A.; Sutskever, I.; Hinton, G. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems; Neural Information Processing Systems Foundation, Inc.: La Jolla, CA, USA, 2012; pp. 1097–1105.
7. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
8. Du Buf, H.; Bayer, M. Automatic Diatom Identification; Series in Machine Perception and Artificial Intelligence; World Scientific Publishing Co.: Munich, Germany, 2002.
9. Pappas, J.; Stoermer, E. Legendre shape descriptors and shape group determination of specimens in the Cymbella cistula species complex. Phycologia **2003**, 42, 90–97.
10. Lai, Q.T.; Lee, K.C.; Tang, A.H.; Wong, K.K.; So, H.K.; Tsia, K.K. High-throughput time-stretch imaging flow cytometry for multi-class classification of phytoplankton. Opt. Express **2016**, 24, 28170–28184.
11. Gonzalez, R.; Woods, R. Digital Image Processing; Pearson Prentice Hall: Upper Saddle River, NJ, USA, 2008.
12. Duchi, J.; Hazan, E.; Singer, Y. Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. **2011**, 12, 2121–2159.
13. Nesterov, Y. Gradient Methods for Minimizing Composite Objective Function; Technical Report; University College London: London, UK, 2007.
14. Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. **2006**, 27, 861–874.

**Figure 8.** Original dataset main confusions. Each row shows an example of the true species, a sample misclassified between the two species, and an example of the second species.

**Figure 12.** Segmented dataset main confusions. Each row shows an example of the true species, a sample misclassified between the two species, and an example of the second species.

**Figure 16.** Normalized dataset main confusions (part 1). Each row shows an example of the true species, a sample misclassified between the two species, and an example of the second species.

**Figure 17.** Normalized dataset main confusions (part 2). Each row shows an example of the true species, a sample misclassified between the two species, and an example of the second species.

Year [Reference] | Num. Species | Num. Samples | Num. of Features and Type | Classifier | Accuracy (%) |
---|---|---|---|---|---|
2002 [8] | 37 | 781 | 321 from Geometrical, Textural, Morphological and Frequency | Bagging Tree | 96.9 |
2003 [9] | 1 | 66 | 10 Morphological | Multiple Discriminant Analysis | 80.3 |
2012 [5] | 38 | 837 | 30 Morphological and 200 Texture | Random forest | 97.97 |
2012 [5] | 48 | 1019 | 30 Morphological and 200 Texture | Random forest | 97.15 |
2012 [5] | 55 | 1098 | 30 Morphological and 200 Texture | Random forest | 96.17 |
2016 [10] | 14 | 10,000 | 4 Geometrical, 7 Moments and 33 Morphological | SVM | 94.7 |
2017 [4] | 80 | 24,000 | 273 Morphological, Statistical, Textural, Space-Frequency | Bagging Tree | 98.1 |
Proposed | 80 | 24,000 | CNN-AlexNet | Softmax | 95.62 |
Proposed | 80 | 160,000 | CNN-AlexNet | Softmax | 99.51 |

Species | Num. Samples | Species | Num. Samples |
---|---|---|---|
1. Achnanthes subhudsonis | 984 | 2. Achnanthidium atomoides | 1032 |
3. Achnanthidium caravelense | 472 | 4. Achnanthidium catenatum | 1496 |
5. Achnanthidium druartii | 744 | 6. Achnanthidium eutrophilum | 776 |
7. Achnanthidium exile | 784 | 8. Achnanthidium jackii | 1000 |
9. Achnanthidium rivulare | 2440 | 10. Amphora pediculus | 936 |
11. Aulacoseira subarctica | 904 | 12. Cocconeis lineata | 648 |
13. Cocconeis pediculus | 392 | 14. Cocconeis placentula var euglypta | 936 |
15. Craticula accomoda | 688 | 16. Cyclostephanos dubius | 680 |
17. Cyclotella atomus | 792 | 18. Cyclotella meneghiniana | 824 |
19. Cymbella excisa var angusta | 632 | 20. Cymbella excisa var excisa | 1928 |
21. Cymbella excisiformis var excisiformis | 1136 | 22. Cymbella parva | 1416 |
23. Denticula tenuis | 1448 | 24. Diatoma mesodon | 920 |
25. Diatoma moniliformis | 1072 | 26. Diatoma vulgaris | 704 |
27. Discostella pseudostelligera | 656 | 28. Encyonema minutum | 960 |
29. Encyonema reichardtii | 1216 | 30. Encyonema silesiacum | 864 |
31. Encyonema ventricosum | 808 | 32. Encyonopsis alpina | 848 |
33. Encyonopsis minuta | 712 | 34. Eolimna minima | 1392 |
35. Eolimna rhombelliptica | 1056 | 36. Eolimna subminuscula | 752 |
37. Epithemia adnata | 576 | 38. Epithemia sorex | 680 |
39. Epithemia turgida | 744 | 40. Fragilaria arcus | 744 |
41. Fragilaria gracilis | 432 | 42. Fragilaria pararumpens | 592 |
43. Fragilaria perminuta | 712 | 44. Fragilaria rumpens | 392 |
45. Fragilaria vaucheriae | 656 | 46. Gomphonema angustatum | 688 |
47. Gomphonema angustivalva | 440 | 48. Gomphonema insigniforme | 720 |
49. Gomphonema micropumilum | 712 | 50. Gomphonema micropus | 936 |
51. Gomphonema minusculum | 1264 | 52. Gomphonema minutum | 744 |
53. Gomphonema parvulum f saprophilum | 416 | 54. Gomphonema pumilum var elegans | 1024 |
55. Gomphonema rhombicum | 512 | 56. Humidophila contenta | 840 |
57. Karayevia clevei var clevei | 672 | 58. Luticola goeppertiana | 1088 |
59. Mayamaea permitis | 320 | 60. Melosira varians | 1168 |
61. Navicula cryptotenella | 1088 | 62. Navicula cryptotenelloides | 856 |
63. Navicula gregaria | 400 | 64. Navicula lanceolata | 616 |
65. Navicula tripunctata | 792 | 66. Nitzschia amphibia | 992 |
67. Nitzschia capitellata | 984 | 68. Nitzschia costei | 576 |
69. Nitzschia desertorum | 568 | 70. Nitzschia dissipata var media | 648 |
71. Nitzschia fossilis | 608 | 72. Nitzschia frustulum var frustulum | 1808 |
73. Nitzschia inconspicua | 2040 | 74. Nitzschia tropica | 520 |
75. Nitzschia umbonata | 728 | 76. Rhoicosphenia abbreviata | 752 |
77. Skeletonema potamos | 1240 | 78. Staurosira binodis | 752 |
79. Staurosira venter | 696 | 80. Thalassiosira pseudonana | 560 |

Layer Type | Size | Number of Kernels | Number of Neurons |
---|---|---|---|
Image input | 224×224×3 | | 150,528 |
Convolution | 11×11×3 | 96 | 253,440 |
ReLU | | | |
Channel normalization | | | |
Pooling | | | |
Convolution | 5×5×48 | 256 | 186,624 |
ReLU | | | |
Channel normalization | | | |
Pooling | | | |
Convolution | 3×3×256 | 384 | 64,896 |
ReLU | | | |
Convolution | 3×3×192 | 384 | 64,896 |
ReLU | | | |
Convolution | 3×3×192 | 256 | 43,264 |
ReLU | | | |
Pooling | | | |
Fully connected | | | 4096 |
ReLU | | | |
Dropout | | | |
Fully connected | | | 4096 |
ReLU | | | |
Dropout | | | |
Fully connected | | | 80 |
Softmax | | | |
Classification | | | |

Dataset | Samples per Class | Mean Accuracy (%) | Standard Deviation |
---|---|---|---|
Original | 300 | 96.35 | 0.44 |
Original | 700 | 98.64 | 0.13 |
Original | 1000 | 99.24 | 0.09 |

Dataset | Samples per Class | Mean Accuracy (%) | Standard Deviation |
---|---|---|---|
Segmented | 300 | 95.62 | 0.48 |
Segmented | 700 | 98.27 | 0.15 |
Segmented | 1000 | 98.81 | 0.15 |

Dataset | Samples per Class | Mean Accuracy (%) | Standard Deviation |
---|---|---|---|
Normalized | 300 | 93.23 | 0.26 |
Normalized | 700 | 97.55 | 0.12 |
Normalized | 1000 | 98.84 | 0.15 |

Dataset | Samples per Class | Mean Accuracy (%) | Standard Deviation |
---|---|---|---|
Original + Normalized | 300 | 92.69 | 0.41 |
Original + Normalized | 700 | 96.91 | 0.25 |
Original + Normalized | 1000 | 98.22 | 0.17 |
Original + Normalized | 2000 | 99.51 | 0.048 |

Class | Sensitivity | Class | Sensitivity |
---|---|---|---|
Encyonopsis alpina | 0.93 | Navicula cryptotenella | 0.99 |
Eolimna minima | 0.96 | Nitzschia fossilis | 0.99 |
Achnanthidium rivulare | 0.97 | Rhoicosphenia abbreviata | 0.99 |
Encyonema reichardtii | 0.975 | Achnanthes subhudsonis | 0.995 |
Nitzschia amphibia | 0.98 | Achnanthidium atomoides | 0.995 |
Achnanthidium catenatum | 0.985 | Achnanthidium eutrophilum | 0.995 |
Encyonopsis minuta | 0.985 | Achnanthidium jackii | 0.995 |
Gomphonema insigniforme | 0.99 | Cyclostephanos dubius | 0.995 |
Achnanthidium exile | 0.99 | Cyclotella atomus | 0.995 |
Amphora pediculus | 0.99 | Eolimna rhombelliptica | 0.995 |
Cymbella excisa var excisa | 0.99 | Epithemia adnata | 0.995 |
Gomphonema angustivalva | 0.99 | Fragilaria gracilis | 0.995 |
Gomphonema micropus | 0.99 | Gomphonema pumilum var elegans | 0.995 |
Gomphonema minutum | 0.99 | Navicula cryptotenelloides | 0.995 |

**Table 9.** Examples of activations in the first two convolution layers of the neural network. Each row shows a classified diatom sample and the corresponding activations.

Diatom Sample | Convolution 1 Activations | Convolution 2 Activations |
---|---|---|

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Pedraza, A.; Bueno, G.; Deniz, O.; Cristóbal, G.; Blanco, S.; Borrego-Ramos, M.
Automated Diatom Classification (Part B): A Deep Learning Approach. *Appl. Sci.* **2017**, *7*, 460.
https://doi.org/10.3390/app7050460
