Automated Diatom Classiﬁcation (Part A): Handcrafted Feature Approaches

: This paper deals with automatic taxa identiﬁcation based on machine learning methods. The aim is therefore to automatically classify diatoms, in terms of pattern recognition terminology. Diatoms are a kind of algae microorganism with high biodiversity at the species level, which are useful for water quality assessment. The most relevant features for diatom description and classiﬁcation have been selected using an extensive dataset of 80 taxa with a minimum of 100 samples/taxon augmented to 300 samples/taxon. In addition to published morphological, statistical and textural descriptors, a new textural descriptor, Local Binary Patterns (LBP), to characterize the diatom’s valves


Introduction
Diatoms are a major group of algae and are among the most common microorganisms in marine and freshwater habitats. They are important contributors to the primary production in aquatic ecosystems, placed at the bottom of the food chain. The diatoms have been shown to be increasingly important worldwide in studies related to climate change, as well as in the development of functions that allow the modeling of such change. Moreover, they are good indicators of environmental conditions and are commonly used in water quality assessment [1,2].
Diatom indices are known to correlate more significantly with water chemical variables, within continental waters, while macroinvertebrate-or plant-based methods are more sensitive to changes affecting structural parameters [3].
Diatoms have several advantages over other indicators that make them ideal as indicators of water quality. These features are: (a) their ability to spread over a variety of habitats; (b) they are relatively easy to sample, and such sampling has no impact on the ecosystem during collection; (c) they have a quick response to variation in environmental conditions; and (d) they are sensitive to changes in environmental conditions that may not be observed in other communities.
Diatom cells are enclosed within a unique silica cell wall known as a frustule made up of two valves, which fit into each other like a pill box. The frustules show a wide diversity of shapes and sizes, though mainly, they can be divided into centric diatoms (radial symmetry, rounded) and pennate diatoms (bilateral symmetry, elongated). The main taxonomic features used in diatom identification are related to the morphology and ornamentation of the frustule. The presence of raphe and the ornamentation of the frustule, stigmata and other features are important in identifying these organisms [4].
The diatom size, in the range of 2-2000 µm, is suitable for observation of most species using an optical microscope. However, diatoms require specialized skills for their classification, that is trained diatomists (phycologists specializing in diatom taxonomy). As stated in [5], more than 200,000 diatom species are estimated to exist, although just half of them have been described. Several intercalibration tests have shown that the results of biological indices based on diatoms are highly sensitive to the level of accuracy in taxonomic classification [6]. The identification task is very difficult due to the huge number of species estimated to exist [7].
Classical taxonomic diagnosis is performed using key features or by visual comparison with type samples or reference iconographies [8]. The conventional approach is to analyze these microalgae using light microscopy (brightfield, DIC, RIC, etc.). Diatom-based metrics are calculated based on the relative abundance of different taxa in an assemblage and the autecological parameters characterizing each species. However, the current manual analysis of images is tedious, requiring highly qualified staff, and it is time consuming. This is the case when diatoms are used in the context of water quality, as in this study. According to a European directive, in order to compute an index score, the identification of a minimum of 400 valves per sample is required [9].
In the case of transparent specimens such as diatoms, brightfield microscopy presents some difficulties. Some details are barely distinguishable from the background, and some other alternative modalities such as phase contrast, DIC or dark field need to be considered. There are few species of the Nitzschia, which are good examples of that (see Nitzschia costei, Nitzschia frustulum var frustulum and Nitzschia inconspicua in Figure 2, Images 68, 72 and 73). This is probably due to a not well-developed silicification process.
Phase contrast microscopy is a suitable technique for visualizing transparent specimens. However, it produces some artifacts (halos and shading-off) that limit its usefulness in some applications. Halos are unresolved images with reverse contrast, and shading-off is a contrast-decreasing effect from the edge of the specimen towards the center of it [10]. DIC microscopy is also a popular method for improving the contrast of unstained specimens. DIC is absent of the halo effect found in phase contrast microscopy producing pseudo-relief images that can be understood as the derivative of the optical path length defined as product of the index of refraction times the specimen thickness. In dark field microscopy, the specimen is illuminated with a hollow cone of light, which is too wide to enter the objective lens. Dark field is a suitable modality for diatom visualization that can be a substitute of phase contrast or DIC in many cases, although both are out of the scope of this work. Scanning electron microscopy is another suitable modality, especially in taxonomy for revealing structural details [11], which is also out of the scope of this paper. On the other hand, most of the diatom databases that are publicly available use brightfield microscopy.
Other difficulties attached to microscopy are: images partially focused and multiple orientations (views). Both are related to the projection of 3D objects into 2D images. The main challenges faced by automatic identification and classification methods are due not only to the high number of species to recognize, but to the great similarities between them and even the presence of polymorphisms within species. In some cases, analysis is simply unfeasible due to the huge amount of information and images involved. Advances in digital microscopy and image analysis systems offer a potentially advantageous solution compared to manual methods of counting and classification.
Before going further, it is worth pointing out the difference between classification and identification in terms of biology and pattern recognition terminology. In biology, it may be said that the term identification answers the question: 'What is the name of the taxon in front of me?'. However, classification answers questions of the sort: How is this taxon related to other taxa? According to the pattern recognition terminology, a class is a set of objects that within a given context is recognized as similar. Such a class has usually a unique name, the class name. The individual objects within a class have a label that refers to this name. Additionally, classification is the assignment of a class name to an object by evaluating a trained classifier for that object. Therefore, in this study, our objects or classes are the diatoms, and we will classify them to give them a label with their taxon name. Henceforth, a diatom taxa will be referred to as a class.
A good review of morphometric methods for shape analysis and landmark-based analysis used in diatom research is presented in Pappas et al. [12]. Another attempt to describe the morphology and geometry of diatoms is the work of Kloster et al. [13], who developed a system that allows the segmentation and feature extraction from diatom contours, although it does not provide textural information about the frustule. As mentioned by Pappas et al. [12], two areas of study may be identified with regard to the methods used in diatom research, namely shape analysis and pattern recognition. We will focus on pattern recognition or machine learning methods, but notice that the traditional machine learning methods are based on a previous definition of a set of features describing the objects to be classified that may or may not have a biologically-meaningful interpretation.
Methods for diatom detection and identification have been studied in Cairns et al., 1979 [14], Culverhouse et al., 1996 [15], Pech-Pacheco and Alvarez-Borrego, 1998 [16], Pech-Pacheco 2001 [17]. Cairns et al. [15] have proposed some diatom identification methods based on coherent optics and holography. However, such work did not have any impact on diatom research mainly due to the fact that the identification system was too specialized and probably too expensive to be used by a diatomist community [18]. Culverhouse et al. [15] derived some methods for phytoplankton identification based on neural networks, but again, they do not provide a fully-automatic method. Pech-Pacheco et al. [16] have proposed a hybrid optical-digital method for the identification of five different species of phytoplankton through the use of operators invariant to translation rotation and scale. Pappas and Stoermer 2003 [19], used form descriptors by Legendre polynomials and principal component analysis in the identification of the Cymbella cistula species.
An important attempt to automate diatom classification was conducted for the ADIAC project (Automatic Diatom Identification And Classification) [18,20]. Several accuracy results were reported with a database composed of different numbers of diatom taxa ranging from 37-55 classes. In ADIAC, 171 features were used for diatom classification. These features are intended to describe the diatom symmetry, shape, geometry and texture by means of different descriptors, such as: rectangularity, circularity, compactness, shape of poles, length, width, length-width ratio, size, stria density orientation, horizontal frequency, Gray-Level Co-occurrence Matrix (GLCM), moment invariants, Gabor wavelets, Fourier and Scale-invariant Feature Transform (SIFT) descriptors. The classifiers that perform better are bagging of decision trees and random forest of predictive clustering trees, all of them evaluated with 10-fold cross-validation (10 fcv). The best results, up to 97.97% accuracy, were obtained with 38 classes using Fourier and SIFT descriptors with random forest. Performance decreased down to 96.17% when classifying 55 classes with the same descriptors and classifier [21].
New techniques based on Convolutional Neural Networks (CNN) have also been explored to classify sea plankton (Kuang 2015 [22], and Dai et al., 2016 [23]). Note, however, that these images are different than those studied in this paper since this type of plankton varies from phytoplankton (diatoms). In [22], a database of 30,000 images belonging to 121 classes was used. The results were poor with a maximum performance of 73.90%. In [23], a database of 30,000 images belonging to 33 classes was used. They obtained an accuracy up to 96.3%. The other work related to CNN applied to diatoms is the one published by the authors (see the next paper companion [24]). The work presented here and the methodology have been compared to the CNN approach [24].
Thus, most of the efforts are still carried out with handcrafted approaches or "hand-designed" methods where a set of fixed features is used. That is, the methods rely on expert knowledge to extract the most relevant features versus CNN approaches that learn features from data. However, still, the handcrafted methods present limited results as in [25], where 14 classes were classified with Support Vector Machine (SVM) 10 fcv, using 44 GLCM features that describe only geometric and morphological properties. They obtained an accuracy of 94.7%. Therefore, the automated classification of diatoms (in terms of pattern recognition) or taxon identification remains a challenge. At present, there is no system capable of taking into account variations in both the contour and the texture in a relatively large number of species. One of the reasons is the difficulty in acquiring a big dataset of tagged data with a sufficient number of samples per species. The classification of diatoms is tedious and laborious, even for an expert diatomist.
In this paper, we present a complete study of relevant features to describe and classify diatoms. The main purpose is to define the most discriminant features and to make a comparison of classifiers based on these features versus the CNN approach. For that, we collected an important database of 80 diatom taxa with 300 samples per taxon described in Section 2. The database was composed of an average of 100 distinct diatoms per class and augmented by means of computational simulations up to 300 samples per class. In order to extract the main diatom features, we propose in Section 3 a segmentation to apply descriptors to the contour and inner diatom regions. In Section 4, a complete list of descriptors is provided. Those are handcrafted features that describe, in terms of computer vision, the discriminant properties of diatoms. Sections 5 and 6 describe classification strategies and some classifiers. Experimental results are presented in Section 7 where an overall accuracy of 98.11% is presented, which improves previous related works. Finally, Section 8 concludes the paper addressing unresolved challenging problems.

Materials: Dataset Preparation
Once having collected the diatom samples from the rivers, the chemical treatment of the sample is carried out in the laboratory with hydrogen peroxide (120 vol.), which causes the digestion of the organic matter and allows one to obtain suspensions of frustules and valves free of organic remains. The process is done at a temperature of 70-90 • C, to accelerate the reaction. A few drops of the sample are taken and deposited in a round coverslip. After evaporation of the water, the diatom frustules remain in the cover-objects. Then, using a synthetic resin (Naphrax) with an optical refractive index of 1.7, diatoms are attached to the glass slide for later classification under brightfield microscopy following standard protocols [9].
The 80 diatom species studied here were collected during the years 2003 and 2015 from the Duero Basin water in Spain [26]. Those are the 80 dominant taxa in terms of relative abundance and occurrence. A Westbury SP/40 Brunel microscope and a Brunel AMA 050 camera were employed to capture the images at 60× magnification, with a numeric aperture of 0.85 and a physical resolution of 7.91 pixels/µm. An average of 100 distinct diatom valves per each diatom class were then manually cropped and labeled by an expert diatomist. The exact number of the cropped diatom valves is shown in Table 1. To complete the dataset with up to 300 image samples per taxon, a data augmentation was performed by means of applying rotations of 90 • , 180 • , 270 • and up-down and right-left flips to the cropped images. These 6 transformations were performed on the original images and only if needed, to obtain up to 300 image samples per class. Thus, we end up with 24,000 images to classify into 80 diatom taxa.
Data augmentation aims at increasing the number of images in the dataset by representing image data in different orientations. That is, different copies of the same image are made, but from different perspectives or visual angles. Rotating and mirroring the data in different orientations may eventually help with identifying a similar object in different orientations. A more robust classifier will be obtained if the data are randomly rotated in multiple orientations.
In Figure 1, a capture with a manually-selected diatom is shown. The elements that form the ornamentation of the frustule in a diatom are illustrated in the cropped diatom. Features of the stria are key in diatom taxonomy, such as: areola and lineolae. Areola is a perforation (or pore) in the diatom valve, and lineolae are areola elongated in the apical direction. The lineolae density is calculated in ppm (pixels per micron). Every sample was manually cropped to ensure the best diatom samples avoiding as much as possible nearby samples or debris. The list of the 80 species with the number of original selected images is indicated in Table 1, and some examples are depicted in Figure 2. The number of the taxon corresponding to the one listed in Table 1 is shown in the upper left corner of each picture. The database can be obtained by request (see the contact at http://aqualitas-retos.es/en/), and it will be publicly available at the end of the project. Figure 3 shows the same diatom species at different views ("valvar view" and "girdle view") and sizes. Due to the deposition process of the sample, in most situations, the diatoms appear in valvar view, although sometimes appear in lateral view (less than 10% of cases).

Valve Segmentation: Binary Thresholding
There are several works about diatom detection mainly related to the above-mentioned ADIAC project. It is out of the scope of this paper to present all of the segmentation methods; for further details, the reader is referred to the reviews mentioned in Section 1 [12,13,18]. In this work, we present an automatic method used to do an initial quick segmentation, which required visual supervision afterward to include only the correct segmented diatoms. Some of the images have been acquired with low contrast and background noise, which produces a poor segmentation in terms of valve overlapping with other structures. That means visual supervision is needed for discarding segmentation errors. The valve is the most significant region of the diatoms where structural differences can be distinguished. Therefore, the segmentation process should accurately extract such a region for extracting relevant features. A proper segmentation of the valve is expected to affect textural, frequential and statistical descriptors. This segmentation is done here by means of a binary thresholding where the segmented region is the binary masks where descriptors must be computed.
The process to obtain the binary masks consists of four steps: 1. Binary thresholding: automatic segmentation based on Otsu's thresholding.

2.
Maximum area: calculation of the largest region (area).

3.
Hole filling: interior holes are filled if present, using mathematical morphology operators.

4.
Segmentation: the ROI is cropped with the coordinates of the bounding-box of the largest area ( Step 2).
Due to the presence of debris and overlapping with different diatom taxa (or debris), the segmentation was afterward manually checked to discard bad segmented diatoms or finish correction; see some examples of the segmented diatoms and their masks in Figure 4.

Diatom Handcrafted Feature Descriptors
An effort must be made in translating the knowledge of the diatomists to distinguish the different diatom species and describe the most relevant features in terms of computer vision and pattern classification. The goal is to mimic human perception and the ability to recognize a 3D object from a 2D image, in this case diatoms. Even if it is sometimes unclear what features are used by the expert to distinguish among very similar diatom species, we proposed and described diatom features in terms of automatic pattern classification.
Along the next subsections, the handcrafted features are presented in groups according to their formulation with a brief explanation. The different groups of descriptors are indicated in Table 2. A total of 1460 descriptors is computed, and all of them are calculated uniquely in those pixels belonging to the segmented binary masks.

Morphological Descriptors
Morphological features related to frustule's contour and area are computed from the binary masks.

Area
This descriptor is calculated as the sum of pixels in the binary mask (B ∈ (0, 1)) of size MxN:

Eccentricity
These descriptors reflect elongation in relation with the binary mask's center of mass, also called the centroid and defined as: The first Eccentricity 1 is defined as a quotient of the maximum and minimum distance between the centroid and binary mask's border (i.e., frustule's contour), also called outer and inner circumference radius.
Similarly, Eccentricity 2 is calculated as the quotient of the semi-axes of the best fitting ellipse for the mask, and Eccentricity 3 is the ratio of the inertia moments of the two semi-axes of the best fitting ellipse. The moments are defined in Section 4.4.

Perimeter
This descriptor is the number of pixels that belong to the diatom's outline (i.e., a pixel belongs to the perimeter if it is nonzero and is connected with at least one pixel equal to zero).

Shape
This descriptor is a measure of the elongation of an object. It is given by:

Fullness
This descriptor is the ratio of the mask area to the bounding box area.

First Order Statistical: Histogram
These descriptors, listed in Table 3, calculate common statistics in the image histogram h(i) calculated on a 255-bin H. This group of descriptors is sensible to variations of gray pixel levels, but they ignore their local correlation. Table 3. First-order statistical descriptors.

Second Order Statistical: Co-Occurrence Matrix
The co-occurrence matrix c(m, n), which is defined as the distribution of co-occurring pixel values at a given distance (d) and direction ( • ), can be used for measuring the texture of an image. The distances and direction used in this study are d equal to 1, 3 and 5 pixels and ( • ) equal to 0 • , 45 • , 90 • and 135 • (see Table 2). Feature descriptors extracted from co-occurrence matrices are also called Haralick features (Haralick) [27]. The 19 second order statistical features used in this study are listed in Table 4. Table 4. Second order statistical descriptors.
H bins number, HX and HY entropy of p x and p y .

Local Binary Patterns
The Local Binary Pattern (LBP) operator is based on the idea that textural properties within homogeneous regions can be represented as patterns [28][29][30][31]. These patterns represent micro-features. It analyzes a "texture spectrum", e.g., using a 3 × 3 mask and comparing their values with the central pixel. The pixels with a lower value than the central one are labeled with "0", otherwise with "1". A linear combination is applied where the labeled pixels are multiplied by a fixed weighting function and summed to obtain a label: LBP(g c ) = ∑ 7 p=0 s(g p − g c )2 p , where {g p |p = 0, . . . , 7} are the neighbors of g c , and the comparison function is defined as: The mask may be defined using a circular neighborhood [32], denoted by (P, R), where P is the number of sampling points and R is the radius of the neighborhood. Ojala et al. [32] observed that over 90% of patterns can be described with few LBP patterns, so they introduced a uniformity measure U(LBP P,R (g c )) = |s(g P−1 − g c ) − s(g 0 − g c )| + ∑ P−1 p=1 |s(g p − g c ) − s(g p−1 − g c )|, which corresponds to the number of transitions (0/1) in the labeled LBP.
In this way, the uniform-LBP (LBP uni P,R ) can be obtained as: After the LPB operator, a labeled image is obtained. Once this process is completed, the pixel-wise information from the labeled image is encoded as a histogram, so that it can be interpreted as a fingerprint of the analyzed diatom area. LBP uni P,R produces (P + 2)-bin histograms [33] where the statistical descriptors are computed. As far as the authors know, LBP has never been tested before for diatom classification. Figure 5 shows some examples of LBP corresponding to some diatom species.

Hu Moments
Image moments provide a shape description both morphologically and statistically [34]. Hu moments are invariant with respect to translation, scale and rotation and can be generated from the central moments. The central moments for an image g(m, n) can be formulated as follows: where m c = M 10 M 00 and n c = M 01 M 00 are the components of the centroid and M pq raw moments M pq = ∑ m ∑ n m p · n q · g(m, n).

Texture Descriptors in the Space-Frequency Domain
The previously-introduced statistical descriptors are calculated in the space-frequency domain. It is assumed that features, somehow hidden, arise with higher visibility in this domain. Thus, some transformations must be applied to the image to analyze its properties in this domain. In this study, a log-Gaborfilters were used to characterize the texture of diatoms. Then, for every transformed domain or sub-band, a total number of 964 statistical descriptors is calculated; see Section 4.2.

Log Gabor Transform
Log-Gabor filters are defined in the frequency domain as Gaussian functions shifted from the origin to avoid the singularity of the log function. In addition, the Gaussian envelope is modulated by a complex exponential with even and odd phases, which is effective for characterizing edges. Here, log Gabor filters proposed by Fischer et al. [35] have been used. The log Gabor descriptor is based on the energy calculated at every scaled level: where F −1 is the inverse Fourier transform, G lo is the log Gabor filter with L scales and O orientations in log-polar coordinates and (ρ, θ) and (σ ρ , σ θ ) are the angular and radial bandwidths; see [35] for more details. Again, L = 4, O = 6, and the residual DC-component is discarded. Figure 6 shows the log Gabor filters (four bands: G1, G2, G3 and G4) applied to a diatom example.

Discriminant Analysis
The handcrafted feature descriptors described previously produce 1460 characteristics. However, not all of the features are discriminating for the problem we are faced with, due to the fact that they are describing properties that are widely spread along all classification groups or because they are redundant (correlated) with respect to other features. In that case, such descriptors provide useless information that will likely impair classification not only in terms of performance and accuracy, but also in terms of speed due to the higher dimensionality [36,37]. Therefore, a feature selection process is required to remove redundant information. To this end, we used the correlation coefficient as the similarity measure between two or more features.

Correlation
The entire bank of first and second order statistical descriptors is calculated and extracted from the LBP labeled image and from each decomposition band of the log Gabor transformed image. This means that the total number of descriptors becomes four-times larger given a four-level decomposition transform. This approach increases the workload and may cause highly correlated variables.
Correlated variables have been calculated for unsupervised feature selection using the maximal information compression index as the feature similarity measure [38]. The maximal information compression index is defined as the smallest eigenvalue, λ, of the covariance matrix of the set of variables under consideration. λ is zero when the features are linearly dependent and increases as the amount of dependency decrease. A threshold value equal to 95% is used from which features are considered redundant. Classification accuracy, according to calculations, was not significantly affected if varying the threshold from 95-99%.
An overall 81% dimensionality reduction was achieved with this technique. Thus, the 1460 initial descriptors were reduced to 273.

Sequential Forward Feature Selection
In order to elucidate the most discriminant descriptors, a discriminant analysis was done by means of a Sequential Forward Feature Selection (SFFS), also known as the floating search method [39]. The algorithm selects a subset of features that best predict the objects to be classified. Thus, features are sequentially added to an empty candidate set until the addition of further features does not decrease the misclassification error rate (i.e., the number of misclassified observations divided by the number of observations) of a learning algorithm (quadratic discriminant analysis). Another variant of the method is Backward Selection (SBS), in which features are sequentially removed from a full candidate set until the removal of further features increases the misclassification error.
This methodology gives a list of features ordered by discrimination capacity. In Table 5, such a list is shown for the first 100 most discriminant handcrafted features. The percentage of morphological descriptors is only 4%, although two of them are on the top. For the rest of descriptors, the percentages are as follows: statistical (28%); logGabor (40%) and LBP (30%).

Classification
A classifier is a function that maps the features extracted from descriptors to an output probability. This output is the probability that the input features belong to a given class. There are many approaches to develop these functions; it is out of the scope of this paper to explain these methods in detail. For this, the reader is referred to [40]. The purpose here is to find a suitable set of discriminant features and the classifier that best maps these features to the correct taxon. Thus, we have compared an extensive range of classifiers like nearest-neighbor, k-means and SVM, Decision Trees (DT) by means of random forest and bagging trees, the quadratic Bayes normal classifier and the Fisher classifier. The methods used here are all supervised learning methods. The best results were obtained with SVM and bagging decision trees. SVM achieved an accuracy up to 95.38% and bagging DT up to 98.11%; therefore, we show here the results of the latest method.
To train and test the algorithms, a 10-fold cross-validation has been applied, that is the full set of images was split into a training and a test set, where nine folds was used for training and the remaining one in each iteration for testing. Both the training and test processes are based on a random selection of samples; therefore, the ordering of the taxa does not affect the accuracy. Experiments with leave-one-out were done for 20 classes, and the results were very similar, with just 0.04% less accuracy than using 10 fcv; however, since the process takes too long for more classes, we decided to report the widely-used 10 fcv for the 80 classes.

Bagging Trees
A Decision Tree (DT) is a method in which classification is performed through a tree graph. The input feeds an initialization node (root node) from which a given test sample is tested at each stage (internal node) of the classification tree, all the way down through the leaves or internal node to the end of a tree branch or terminal node. The 'path' followed by the sample depends on the conditions associated with each internal node. These conditions are established during training rather manually or automatically. To select automatically the optimal conditions, DT algorithms consist of testing all potential variables and selecting the variable that maximizes a given criterion.
The bagging tree classifier is used to improve robustness and classification accuracy. Bagging improves variance by averaging/majority selection of the outcome from multiple fully-grown trees on variants of the training set. It uses bootstrap with replacement to generate multiple training sets. All trees are fully-grown binary trees (unpruned), and at each node in the tree, one searches over all features for splitting a node, that is to find the feature that best splits the data at that node [41]. In this study, a bagging tree with 200 learners and 30 splits provided the best results. All of the experiments were done using the classification learner apps of MATLAB R2016a.

Results
The results are organized into several experiments to show concrete aspects of descriptors and classifiers. To this end, the list of handcrafted feature descriptors (1460 in total) after feature reduction with 95% correlation (273 features) and the selected classifier (bagging DT) are used. Notice that the morphological, statistical and moment-based features are invariant to rotation and mirroring, but not the LBP and log Gabor features used in this study. Therefore, in total, only 255 features out of the 1460 are invariant to rotation and mirroring. This is reduced to 42 features after the correlation process, which means that only 15% of features are invariant to the data augmentation performed.
While data augmentation may introduce some bias in the experiments, mainly related to the invariant features, our aim in adding the augmentation was to compare classic methods with deep learning (see [24]), for which the same augmentation is done. Moreover, as mentioned above, most features (85%) are not invariant to rotation and mirroring, and therefore, data represented with these features can be considered without bias.

Experiment 1: Comparing Descriptor Types
All handcrafted feature descriptors mentioned above were tested here according to their category (see Table 2). The selected classifier was bagging DT.From the plot in Figure 7, most descriptors lead to similar classification results when they operate individually, with an overall accuracy around 95% for bagging DT. Furthermore, two of them are above the average (95%); these are LBP and statistical, though the moments are very low. This supports morphological and textural features. Thus, it corroborates that macro-features' analysis obtained from morphological descriptors and textural patterns already provides competitive accuracy compared to local micro-feature analysis done by spatio-frequential descriptors. Experiment 1, shown in Figure 7, has been carried out with 100 samples per class to avoid any bias of the data. A similar conclusion was obtained with 300 samples per class.

Experiment 2: Combinations of Descriptor Types
The discrimination capacity of the classifiers may be enhanced by the combination of different types of descriptors. In Figure 8, morphological and statistical descriptors provide high accuracy rates. These descriptors bring together the two main handcrafted discriminating features: shape and texture. Thus, morphological and statistical descriptors constitute a baseline for comparison. The results show that statistical descriptors combined with morphological descriptors provided together an improvement of 3%. The combination of statistical with texture (LBP) improves the performance of the morphological descriptors. This proves the relevance and importance of textural features versus morphological properties. The combination of statistical with the space-frequency descriptors provided an overall accuracy improvement of 2%, while the combination of all features (textural + morphological + frequency + statistical) is the most accurate with an improvement of 4%. Note that the moments were discarded due to the low individual performance they gave and also the low relevance obtained when calculating the SFFS (see Section 5.2).
Thus, the combination of morphological + textural + space-frequency + statistical descriptors provided the lowest error rate, with an overall accuracy (Acc.) equal to 98.11% using a bagging DT with 10 fcv. The combination of textural and space-frequency (LBP + Log Gabor), that is using the data represented without any augmentation bias, provided an overall accuracy of 97.63% using bagging DT with 10 fcv. This is highlighted in Figure 8.

Experiment 3: Dataset Dimension
Some questions arise at this point about the database dimensionality. How does this affect the performance? Is the classifier able to learn more with a higher number of samples? Is the classifier able to get the same performance with a lower number of data, that is without the use of augmented data? Thus, we tested with several subsets of 20 diatom types and 40 diatom types with 300 samples per diatom; a subset with the 35 classes that had a minimum of 100 samples per diatom; as well as with 80 diatom types with 2000 samples per diatom versus the previous dataset with 300 samples per diatom. To extend the dataset, a data augmentation with rotations every 2 • was performed. Figure 9 shows the results when classifying with the bagging DT and 10 fcv. It represents the minimum and maximum Acc. value obtained in the different trials. The decrease in the number of classes with 300 samples per class increased its accuracy, though not significantly. Additionally, the increase in the number of samples lowered the error. The use of 100 samples per class classifying 35 classes with no data augmentation decreased accuracy to 96.25% versus 98.11% with 300 samples per class. A similar decrease happens when classifying 20 classes, where an accuracy of 97.56% (100 samples/class without data augmentation) is obtained versus 98.82% (300 samples/class with data augmentation).
Results are usually illustrated with confusion matrices. Each column of a confusion matrix represents the instances in a predicted class while each row represents the instances in an actual class. Since there are many classes in this study, heat maps have been used to display the % of correct and incorrect classifications. A heat map displays the confusion matrix as an image whose color intensities reflect the magnitude of its values. In this case, green values indicate the % of correct classifications and pink-red values indicate the % of instances incorrectly classified. These percentages, as well as the true positive rate (green column) and false negative rate (pink-red column) for each class are shown when no more than 20 classes are classified, as well as the true positive rate (green column) and false negative rate (pink-red column).
The confusion matrix heat map for the classification of the 80 diatom classes with the bagging tree is shown in Figure 10. Figure 11 shows the confusion matrix heat map for the classification of the diatom classes without data augmentation. Figure 11 shows 20 classes chosen randomly for visualization purpose. The confusion matrix heat map of several trials done with the 20 classes is shown in Figure 12. The main diagonal of Figure 12 shows how some classes obtained 100% correct classification. Figure 13 shows those diatoms species that produce the major misclassification errors, i.e., Nitzschia costei, Gomphonema angustatum, Gomphonema micropumilum, Gomphonema minusculum and Gomphonema minutum. Automation methods may be a tool to help both the expert and the non-expert, but in difficult cases, it should always be the expert who makes the final decision. We believe that it is the diatomist who must consider if an automatic classification system is acceptable, assuming a certain level of accuracy to classify different taxa. A reject option is also possible, so that only difficult cases are presented to the diatomist, while the easy ones are classified automatically.
Notice that this methodology may be applied to other taxa not included in this study. Moreover, it is currently being applied to another 20 different taxa not shown in this study, and similar results are being obtained.

Conclusions
The main contributions of this work are: (a) big database collection: a big database of diatoms composed of 80 diatom types with an average of 100 distinct diatom valves per type manually cropped has been collected; this database is being extended to 100 types; and (b) the study of the main handcrafted features to classify diatoms, proposing an efficient method for classification.
Thus, the state of the art in morphological, statistical and texture descriptors has been exhaustively tested for classifying 80 diatom types. Some of them, like texture based on LBP and log Gabor descriptors, had not been evaluated before in this field. The combination of these features provided an improvement up to 98.11% accuracy compared to previous related works. We concluded that the combination of different descriptors categories, such as morphological and statistical descriptors, together with space-frequency representations, specifically log Gabor and texture by means of LBP, provided the best classification accuracy rates.
Along this research, we also come across several other challenging issues, such as the classification of the same diatom type during its life cycle and from different views. A diatom type may present different appearances according to its view with respect to the z-view, and consequently, its morphological and also statistical descriptors can drastically vary. This causes the morphological descriptors to not always analyze the most discriminating features. This could be solved by having more data representing all possible cases or alternatively by using a hierarchical tree classification. Thus, it could be recommended to simplify the classification using morphological refinement in two classes: centric diatoms and pennate diatoms. Afterwards, depending on the taxon, the search would focus on some other details, like texture, striae density (number of stria per micron), lineolae density (pixels per micron), etc. It should be noted that some diatom taxa are almost the same, even for experts, which proves the difficulty of this task.
Precise segmentation is a critical point for the whole classification process. This is a limitation for handcrafted feature approaches. Effectively, segmentation should be done accurately, although our purpose in the current study focuses on comparing descriptor's discriminant capacity. Therefore, we do not pursue perfect binary masks, but suitable enough to be equally shared by all descriptors.
In order to handle the above-mentioned difficulties, the authors suggest to explore new classification techniques based on deep learning (Convolutional Neural Networks (CNN)) able to learn further from larger datasets and without segmentation. In this study, it has been proven that the accuracy in traditional methods does not always improve with augmented data (it is also counterproductive due to orientation-invariant features and possible overfitting). An accuracy up to 97.56% was obtained classifying 20 classes with no data augmentation (100 samples/class) versus 98.82% with data augmentation (300 samples/class).