A Tuning Method for Diatom Segmentation Techniques

Phytoplankton such as diatoms or desmids are useful for monitoring water quality. Manual image analysis is impractical because of the huge diversity of this group of microalgae and its great morphological plasticity, hence the importance of automating the analysis procedure. High-resolution images of phytoplankton cells can now be acquired with digital microscopes, which facilitates automating the analysis and identification of specimens. New image analysis systems are therefore potentially advantageous compared to manual counting and identification. Segmentation is an important step in the analysis of phytoplankton images. Many standard techniques, such as thresholding and edge detection, are employed to segment diatoms and other phytoplankton in microscopy images. However, they generally require several parameters to be fixed beforehand by the user in order to get the best results. This is usually done by comparing results and looking for the best parameters. To automate this process, we propose an automatic tuning method, called Parametric Segmentation Tuning (PST), that finds the optimal parameters in an iterative procedure. The technique compares successive segmentation results and chooses the ones with maximal similarity. In this paper, tuning is formulated as an optimization problem using a similarity function within the solution space. This space consists of the set of binary images generated by the segmentation technique to be tuned, where these binary images are seen as a function of the original images and the segmentation parameters. The PST technique was tested with two of the most popular techniques employed to segment phytoplankton images: Canny edge detection and a binarisation method. The results of the thresholding technique were validated by comparing them with those of the Otsu method, and the results of the Canny method by comparing them with a ground truth. They show that PST is effective at finding the best parameters.


Introduction
Segmentation is a crucial step in the analysis and identification of diatoms and other phytoplankton organisms because it allows the cells to be separated from the background. Image segmentation is commonly addressed by standard techniques, such as thresholding and edge detection, in which some parameters usually have to be fixed beforehand. Moreover, no automatic method exists that tunes the segmentation procedure without prior knowledge of the employed technique. Many segmentation methods have been proposed, but the problem cannot be completely solved, as image segmentation is an ill-posed problem without a clear unique solution.

Materials and Methods
To develop the proposed method, 50 diatom images taken from the public data of the Automatic Diatom Identification and Classification (ADIAC) project were used [12]. The mathematical algorithms to tune the segmentation methods were developed using Matlab. The software was run on an Intel Core i7-4500U 1.8 GHz computer with 1.6 GB RAM running Windows 8.1. The method will become publicly available through the lab webpage http://www.gatv.ssr.upm.es/~jmm/PST_8989776.zip.
The methods commonly used to segment phytoplankton usually require some parameters to be fixed by the user before analysis and classification. The segmentation result is generally a binarised image where the background appears in black and the foreground in white, or vice versa. The resulting images, obtained with different parameter values, are usually compared with each other or with a ground truth image, i.e., the image produced by an expert. This comparison is made to find the values that give the best segmentation.
As shown in Figure 1, by varying a parameter within a range of values in a segmentation method (Canny), results move from under-segmented images (Figure 1B,C) to over-segmented ones (Figure 1E,F), passing through an intermediate value where the best possible result is obtained (Figure 1D). It can be seen that changes between successive under-segmented or over-segmented images are more abrupt than those between images closer to the optimal result. This behaviour is expected because the optimal result seeks to get closer to what is actually seen in the original image. The under-segmented and over-segmented results correspond to the values farthest from the optimum. In these results, true edges or regions are eliminated (under-segmentation) or false ones are produced (over-segmentation), introducing the abrupt changes observed between them.
Appl. Sci. 2017, 7, 762
Therefore, if two successive under-segmented images are compared, the first one will be less under-segmented than the next one. Assuming that the first image is the ground truth, we will have a high number of false positives and a low number of false negatives. Similarly, when comparing two successive over-segmented images, the first one will be less over-segmented than the second one. In this case, if the first image is the ground truth, we will have a low number of false positives and a high number of false negatives. It must be mentioned that this observation is valid for typical diatom and phytoplankton images, where segmentation is intended to detect the shapes of the organisms. However, it is not valid for images with textures or complex backgrounds. Based on this premise, the following method for tuning parameters is proposed.
Let $T(I, \vec{p})$ represent the transformation of an image $I$ into a binary one as a result of a segmentation algorithm given a certain number $r$ of parameters, i.e., $\vec{p} = \{p_1, p_2, \ldots, p_r\}$. In the binary image, level 1 represents the object of interest, and level 0 the background. Therefore, an $r$-dimensional solution space $P^r \subseteq \mathbb{R}^r$ generated by the transformation $T(I, \vec{p})$ can be defined. In the solution space, each coordinate is given by a parameter and each point of the space represents a binary image.
Considering image segmentation as an optimization problem, the best solution can be found by maximizing a similarity function or minimizing a distance used as cost function $\Psi$ in $P^r$. In this way, the best solution can be found by sweeping the solution space while evaluating the cost function between each pair of successive binary images. Each binary image $I_B(\vec{p}) = T(I, \vec{p})$ is obtained by modifying at least one parameter of the segmentation algorithm $T(I, \vec{p})$. For the sake of simplicity, we will now consider the number of parameters equal to one, i.e., $r = 1$. In this case, the technique consists of an iterative process that attempts to minimise the error by comparing two successive segmented images $I_B(p = m) = T(I, p = m)$ and $I_B(p = m - 1) = T(I, p = m - 1)$, i.e., when the parameter takes the values $m$ and $m - 1$.

Definition of the Segmentation Tuning as an Optimisation Problem
To compare each pair of successive images in the space of parameters, it is necessary to formalize the properties of the binary images. In addition, some definitions and operators are established to be applied in our tuning technique.
A binary image $I$ of size $N \times M$, where $N$ and $M$ are the width and height of the image respectively, can be represented as a binary vector $\alpha$ of size $u = N \times M$.
Definition 1. Let the binary set be represented by $Z_2 = \{0, 1\}$; then the $u$-dimensional binary space is given by $Z_2^u = Z_2 \times Z_2 \times \cdots \times Z_2$ ($u$ times).
Definition 2. An element $\alpha$ of $Z_2^u$ is a $u$-tuple $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_u)$, with $\alpha_i \in Z_2$.
Definition 3. Given $\alpha \in Z_2^u$, the complement $\bar{\alpha}$ is defined as the vector obtained by inverting all the elements of $\alpha$: $\bar{\alpha} = (\bar{\alpha}_1, \bar{\alpha}_2, \ldots, \bar{\alpha}_u)$, where $\bar{\alpha}_i = 0$ if $\alpha_i = 1$ and $\bar{\alpha}_i = 1$ if $\alpha_i = 0$.
Definition 4. Given $\alpha \in Z_2^u$, the length of $\alpha$ is defined as a function from the $u$-dimensional binary space to the positive integers $\mathbb{Z}^+$.
Definition 5. Given $\alpha \in Z_2^u$, the norm of $\alpha$ is defined as a function $N$ from the $u$-dimensional binary space to the positive real numbers $\mathbb{R}^+$.
Operations between the elements defined in the space $Z_2^u$
Definition 6. Dot operation. Let $\alpha, \beta$ represent two elements of the space $Z_2^u$. The dot operation is defined between them.
Definition 7. Difference operation. Let $\alpha, \beta$ represent two elements of the space $Z_2^u$. The difference operation is defined as a function that takes two binary images $\alpha, \beta \in Z_2^u$, compares their elements $\alpha_i, \beta_i$ and assigns them a third image given by the element-wise difference between them.
Order rules between elements of the space $Z_2^u$
Given $\alpha, \beta \in Z_2^u$, an order rule indicating the similarity between the elements $\alpha, \beta$ is defined. The operations needed to define the order rules classify each pair of elements; for instance, when $\alpha_i = 0$ and $\beta_i = 1$, there is a false positive.

Definition 8. Matches operation
Let $\alpha, \beta$ represent elements of the space $Z_2^u$. The matches operation is defined as a function that takes two binary images $\alpha, \beta$, compares their elements $\alpha_i, \beta_i$ and assigns them a value. Given two successive binary images $I_{Bn}$ and $I_{B(n-1)}$, a similarity function $\Psi$ can be defined so that $\Psi(I_{Bn}, I_{B(n-1)})$ compares $I_{Bn}$ with $I_{B(n-1)}$ in all the space of $P^r$ (see Definition 8).
If $I_{Bn}$ is congruent with $I_{B(n-1)}$, then $I_{Bn} \cong I_{B(n-1)} = I^*_{Bn}$ is the optimum segmented image, which depends on the optimum parameter $\vec{p}^*$. The $\Psi$ function is an index associated with each pair of successive binary images. From the argument of the optimum of $\Psi$, the best binary segmented image $I^*_{Bn}$ is found.
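The element-wise comparison behind the matches operation can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: images are flattened into binary vectors as described above, $\alpha$ is treated as the reference and $\beta$ as the candidate, and the names `Matches`, `count_matches` and `flatten` are hypothetical helpers, not part of the published implementation (which was written in Matlab).

```python
from typing import List, NamedTuple, Sequence


class Matches(NamedTuple):
    tp: int  # alpha_i = 1 and beta_i = 1 (true positive)
    fp: int  # alpha_i = 0 and beta_i = 1 (false positive)
    fn: int  # alpha_i = 1 and beta_i = 0 (false negative)
    tn: int  # alpha_i = 0 and beta_i = 0 (true negative)


def flatten(image: Sequence[Sequence[int]]) -> List[int]:
    """Turn an N x M binary image into a binary vector of size u = N * M."""
    return [pixel for row in image for pixel in row]


def count_matches(alpha: Sequence[int], beta: Sequence[int]) -> Matches:
    """Compare alpha (reference) and beta (candidate) element by element."""
    if len(alpha) != len(beta):
        raise ValueError("both vectors must have the same length u")
    tp = fp = fn = tn = 0
    for a, b in zip(alpha, beta):
        if a and b:
            tp += 1
        elif b:
            fp += 1
        elif a:
            fn += 1
        else:
            tn += 1
    return Matches(tp, fp, fn, tn)


reference = flatten([[0, 1, 1], [0, 1, 0]])
candidate = flatten([[0, 1, 0], [1, 1, 0]])
print(count_matches(reference, candidate))
```

The four counts are the raw material from which rate-based indicators such as the TPR and FPR are usually computed.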

Similarity Functions
Similarity functions [13], also called indexes or indicators based on qualitative (binary) attribute data, were first used in ecology for grouping either biotic communities or ecologically related species. The literature refers to over 50 similarity or dissimilarity indexes. The following indexes are derived from Definition 8.

Definition 9. Similarity function
A similarity function or indicator $\Psi(\alpha, \beta)$ is a measure of the degree of similarity between two vectors $\alpha, \beta$ of the $u$-dimensional binary space. The matches operation is used to define different similarity functions between $\alpha$ and $\beta$, such as the following.

Maximum sensitivity area indicator
This indicator relates the true positive rate (TPR) and the false positive rate (FPR). The relation between the TPR and the FPR is represented by the well-known receiver operating characteristic (ROC) curve [14]. If the TPR tends to 1 and the FPR tends to 0, then the correlation between $\alpha$ and $\beta$ is high. The indicator $\Psi_{IS}(\alpha, \beta)$ is defined through two metrics, $k_1$ and $k_2$. If $k_1$ approaches 0 and $k_2$ approaches infinity, then $\Psi_{IS}(\alpha, \beta)$ tends to 1. If the false negative relation between $\alpha$ and $\beta$ tends to 0, then $k_1$ tends to 0; this happens when $\beta$ tends to $\alpha$. If the false positive relation between $\alpha$ and $\beta$ tends to 0, then $k_2$ tends to infinity; this also happens when $\beta$ tends to $\alpha$. If $\Psi_{IS}(\alpha, \beta)$ approaches 1 (maximum area), then the results obtained in the classification are accurate (Figure 2A).

Minimum distance indicator
This indicator measures the minimum distance between the point (FPR, TPR) and the perfect segmentation point (0,1) in the ROC space, as shown in Figure 2B.
If $k_1$ approaches 0 and $k_2$ approaches infinity, then $\Psi_{dmin}(\alpha, \beta)$ tends to 0 (minimum distance). The conditions for $k_1$ and $k_2$ are identical to those of the previous indicator.
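As a rough illustration of the distance just described, the following Python sketch computes the point (FPR, TPR) for a pair of binary vectors and its Euclidean distance to the perfect segmentation point (0,1). The function names are hypothetical, and the convention of returning a rate of 0 when its denominator is empty is an assumption of this sketch, not something stated in the paper.

```python
import math
from typing import Sequence, Tuple


def roc_point(alpha: Sequence[int], beta: Sequence[int]) -> Tuple[float, float]:
    """Return (FPR, TPR) with alpha as the reference and beta as the candidate."""
    tp = sum(1 for a, b in zip(alpha, beta) if a == 1 and b == 1)
    fp = sum(1 for a, b in zip(alpha, beta) if a == 0 and b == 1)
    fn = sum(1 for a, b in zip(alpha, beta) if a == 1 and b == 0)
    tn = sum(1 for a, b in zip(alpha, beta) if a == 0 and b == 0)
    # Assumed convention: a rate with an empty denominator is reported as 0.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return fpr, tpr


def min_distance_indicator(alpha: Sequence[int], beta: Sequence[int]) -> float:
    """Euclidean distance between (FPR, TPR) and the ideal point (0, 1)."""
    fpr, tpr = roc_point(alpha, beta)
    return math.hypot(fpr - 0.0, tpr - 1.0)


alpha = [0, 1, 1, 0, 1, 0]
beta = [0, 1, 0, 1, 1, 0]
print(roc_point(alpha, beta))
print(min_distance_indicator(alpha, beta))
```

When the two vectors are identical and contain both classes, the point is (0, 1) and the distance is 0, which matches the behaviour described for the indicator.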

Coverage indicator of the segmented area
The coverage or superposition indicator compares the reference set $\alpha$ with the segmented one $\beta$, and presents the one-to-one correspondence between the pixels of the two sets. The coverage indicator of the segmented area $\Psi_{IC}(\alpha, \beta)$ is defined in [13], where $0 \le \Psi_{IC}(\alpha, \beta) < 1$ if $\alpha \neq \beta$ and $\Psi_{IC}(\alpha, \beta) = 1$ if $\alpha = \beta$.
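One common overlap measure with exactly these bounds is the Jaccard index (intersection over union). The sketch below uses it as a stand-in; that the coverage indicator of [13] takes this form is an assumption of this example, and `coverage_indicator` is a hypothetical name.

```python
from typing import Sequence


def coverage_indicator(alpha: Sequence[int], beta: Sequence[int]) -> float:
    """Jaccard overlap between two binary vectors (assumed form, see lead-in)."""
    if len(alpha) != len(beta):
        raise ValueError("both vectors must have the same length u")
    intersection = sum(1 for a, b in zip(alpha, beta) if a and b)
    union = sum(1 for a, b in zip(alpha, beta) if a or b)
    # Two empty foregrounds are identical, so they are given full coverage.
    return intersection / union if union else 1.0


alpha = [0, 1, 1, 0, 1, 0]
beta = [0, 1, 0, 1, 1, 0]
print(coverage_indicator(alpha, beta))   # 2 shared pixels out of 4 in the union
print(coverage_indicator(alpha, alpha))  # identical vectors
```

The value is 1 only when the two foregrounds coincide and is strictly smaller otherwise, in line with the bounds stated above.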


Parameter Segmentation Tuning Technique (PST)
The optimal values of the $r$ parameters of a segmentation method can be found by using the PST technique illustrated in Figure 3. A similarity indicator is employed to compare successive binary images, i.e., $I_{Bn}$ and $I_{B(n-1)}$.

Figure 3. Flowchart of the Parametric Segmentation Tuning technique. The iterative process is carried out by modifying the segmentation parameters of the technique to be tuned, which is used to create the space of binary images. These are compared to get the best result, using a similarity indicator as the criterion.

The corresponding algorithm works, in essence, by modifying each of the parameters in steps, obtaining the segmented images and comparing them in pairs to find the parameters that produce the closest similarity between them, as follows.

The PST technique pseudo-code
Input: I: image to segment
The optimum of the similarity function $\Psi(I_{Bn}(\vec{p}_n), I_{B(n-1)}(\vec{p}_{n-1}))$, in the space of parameters $P^r \subseteq \mathbb{R}^r$, converges to a region of local minima. This means that, although infrequent, it is possible to find more than one solution to the segmentation problem, e.g., with the Canny edge detector. Those solutions are very close and visually almost identical, as can be observed in Figure 4.

Validation of the PST Technique
To validate the PST approach, two segmentation techniques were tuned: the Canny edge detector [8] and a binarisation procedure. These techniques are usually used as steps in the analysis and identification of diatoms and the detection of other phytoplankton organisms [7]. Figure 5 shows nine images taken from the employed dataset [12]. Different kinds of diatoms were chosen.

Canny edge detection technique
The Canny edge detector [8] is denoted as $I_B = T_C(I, [h_{min}, h_{max}, \sigma])$. It employs three parameters: $\sigma$, the standard deviation of a convolution mask given by the first derivative of the Gaussian function, and $h_{min}$ and $h_{max}$, the thresholds used in the hysteresis process. The purpose of this process is to reduce the appearance of false contours and local maxima produced by noise.

Binarisation technique
To test the PST technique on a thresholding method, we developed our own thresholding algorithm. This segmentation technique consists of separating the pixels of the image into two classes, high- and low-intensity pixels. It transforms a greyscale image into a binary one, $I_B$. This makes it possible to differentiate objects from the background by identifying a threshold $t$. In this case, the pixels labelled with 1 belong to the object, while the pixels labelled with 0 belong to the background.
We propose the following method to find the optimal threshold t * using similarity functions.

in other case
The optimal threshold value $t^*$ is the argument of the maximum of the similarity function, subject to conditions expressed in terms of $\mu$, $\sigma$ and $\kappa$, where $\mu$ is the mean and $\sigma$ the standard deviation of the image, and $\kappa$ is a normalisation factor to be tuned. $\kappa$ was found by the PST in the range $0.6 < \kappa < 5$. The value $\kappa\sigma$ defines the quality of the binarisation.
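The labelling step of the binarisation can be sketched in a few lines of Python. This is a minimal illustration, not the authors' algorithm: the text does not preserve which side of the threshold is labelled 1, so the polarity is exposed as a hypothetical `object_is_bright` flag, and the image is assumed to be a flat list of 8-bit grey levels.

```python
from typing import List, Sequence


def binarise(pixels: Sequence[int], t: int, object_is_bright: bool = True) -> List[int]:
    """Label each grey level as object (1) or background (0) for a threshold t.

    object_is_bright is an assumption of this sketch: when True, grey levels
    at or above t are labelled 1; when False, grey levels below t are.
    """
    if object_is_bright:
        return [1 if value >= t else 0 for value in pixels]
    return [1 if value < t else 0 for value in pixels]


grey = [12, 30, 180, 200, 25, 190]
print(binarise(grey, 100))                          # bright pixels as object
print(binarise(grey, 100, object_is_bright=False))  # dark pixels as object
print(binarise(grey, 0), binarise(grey, 256))       # extreme thresholds
```

With extreme thresholds every pixel falls into a single class, which is why successive images at the ends of the threshold range barely differ from one another.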

Results and Discussion
The results of the binarisation algorithm were validated by comparing them with those of the Otsu method, one of the most popular and efficient thresholding techniques, and the results of the Canny detector were validated by comparing them with a ground truth.

Tuning of the Parameters of the Canny Edge Detector
Figure 6 shows the results obtained by tuning the Canny detector with the PST and by an expert in a blind test. As shown, the best manual segmentation took around 25 min to find, while the PST took around 42 s.
Table 1 presents the optimal values obtained from the PST and the expert. As can be seen in Figure 6, the results are very similar, even though the values from the user are sometimes far from those obtained from the PST.

Tuning of the Binarisation Algorithm Using PST
To understand how the PST finds the optimal parameters, a deeper analysis of the binarisation algorithm is presented. The graph in Figure 7 shows the evolution of the sensitivity and coverage similarity indicators for image A in Figure 1. It must be noted that very similar successive segmented images can also be found when certain parameters take very low or very high values, i.e., values outside the useful range of a parameter. For example, when an image is thresholded by hand, it is easy to see that if the threshold is too low or too high, the resulting images will be almost completely white or black, and the changes between two successive binarised images will be very small. Therefore, scanning extreme values should be avoided, which also speeds up the search for the best parameter values. In any case, if the ranges of useful values are unknown, the problem is easily solved by excluding the maximum and minimum similarity values found at the ends of each parameter range, as shown in Figure 7. Figure 8 shows the graphs of the four indexes for image A5 in Figure 9. As can be observed, the shapes of the sensitivity and the total coverage indexes are very similar and the best threshold is t = 170. The indexes of minimum distance (Figure 8B) and co-linearity (Figure 8C) vary in a similar way, and the first local maximum, employed as the optimal threshold, is also located close to t = 170.
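The idea of discarding the extrema found at the ends of the parameter range can be sketched as follows. This is one possible reading, assuming an index whose useful optimum is a local minimum: only strict local minima away from the two end points are kept, and the lowest of them is returned. The function name and the numbers in the example are made up for illustration and are not taken from the paper's figures.

```python
from typing import Optional, Sequence


def best_interior_parameter(params: Sequence[float],
                            values: Sequence[float]) -> Optional[float]:
    """Return the parameter at the lowest strict local minimum of `values`,
    ignoring the first and last samples (the ends of the parameter range)."""
    if len(params) != len(values):
        raise ValueError("params and values must have the same length")
    candidates = [
        i for i in range(1, len(values) - 1)
        if values[i] < values[i - 1] and values[i] < values[i + 1]
    ]
    if not candidates:
        return None  # no interior local minimum was found
    best = min(candidates, key=lambda i: values[i])
    return params[best]


# Synthetic curve: minima at both ends of the range and one interior minimum.
thresholds = [0, 35, 70, 105, 140, 175, 210, 255]
index_values = [0.00, 0.20, 0.12, 0.03, 0.10, 0.25, 0.05, 0.00]
print(best_interior_parameter(thresholds, index_values))
```

For an index whose optimum is a maximum, the same selection can be applied to the negated values. Noisy curves may produce spurious small minima near the ends, so restricting the scanned range, as suggested in the text, remains the more robust option.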


Conclusions
Several methods have been proposed for diatom segmentation. However, they generally require some parameters to be fixed by hand. To make this procedure automatic, a tuning method was introduced. This technique appears to be the first attempt to achieve this.
In this paper, the problem of image segmentation was posed as an optimization problem, and the best parameter values were found in the space of feasible solutions contained in the u-dimensional binary space. The operations and relations among the elements of the binary set were defined, as well as the objective function. The parameters associated with the algorithm were optimized using the new Parametric Segmentation Tuning (PST) technique and different similarity functions. The PST generates the u-dimensional binary space, and the similarity functions are employed to compare segmented images to find the optimal one.
To test the technique, two segmentation algorithms were tuned using the PST approach. In the first one, the Canny edge detection algorithm, the PST made it possible to find diatom edges correctly. In the second, the PST was employed to find the best thresholded image, and the results were in line with those obtained with the Otsu method, showing the capacity of the PST method. In this way, the PST was validated by comparing the tuning results of the Canny method against those of an expert, and those of the binarisation against the Otsu algorithm. It was found that our method is quicker than manual tuning and efficient, giving results similar to those obtained by the expert and the Otsu method. This makes the PST a convenient tool for finding optimal parameters in diatom segmentation processes, saving researchers time by automating these techniques. The method can also be employed to tune similar segmentation procedures used to analyse phytoplankton images.

Figure 1. (A) Original image taken from the public data of the Automatic Diatom Identification and Classification (ADIAC) project. Outcomes of the variation of a parameter (hmax) within a range of values (0, 1) with the Canny edge detector, where results move from under-segmented images (B,C) to over-segmented ones (E,F), passing through an intermediate value where the best possible result is obtained (D).

If $Z_2^u$ is the $u$-dimensional binary space (see Definition 1) and $T$ is a transformation of image $I$ into a binary image $I_B$ ($I_B \subset Z_2^u$) depending on $\vec{p}$, then the transformation $T$ generates a group of binary images $I_{Bn} = T(I, \vec{p}_n)$ depending on each $\vec{p}_n$ in all the space of $P^r$. The similarity function is evaluated subject to $I_{Bn}, I_{B(n-1)} \in Z_2^u$ and $\vec{p}_n, \vec{p}_{(n-1)} \in P^r$.

Figure 2. (A) A receiver operating characteristic (ROC) curve obtained from tuning the Canny edge detector with Figure 1A. The index of sensitivity was obtained from the ROC curve. It represents a relationship between the true positive rate (TPR) when it tends to 1 and the false positive rate (FPR) when it tends to 0. (B) Minimum distance between the perfect segmentation point (0,1) and the point (FPR, TPR).



Input:
    I: image to segment
    $\vec{p}$: segmentation parameters
    T: segmentation technique to be tuned
Functions:
    sim(a, b): similarity function Ψ between binary images a and b
Definitions:
    p_10, p_20, ..., p_r0: initial values of the parameters
    p_1f, p_2f, ..., p_rf: final values of the parameters in $\vec{p}$
Creates space of segmented images and compares successive images:
    $\vec{p}$ ← {p_1 ← p_10, p_2 ← p_20, ..., p_r ← p_r0}    initialisation of the parameters
    min_sim ← very high value    initialisation of the minimum similarity index
    I_{B−1} ← T(I, $\vec{p}$)    segmented image for the initial parameters
    repeat for every parameter in $\vec{p}$ until p_1 = p_1f, p_2 = p_2f, ..., p_r = p_rf
    {
        p_n ← p_{n+1}    increase a parameter p_n in $\vec{p}$
        I_B ← T(I, $\vec{p}$)    output segmented image for the given selection of parameters $\vec{p}$
        min_sim_temp ← sim(I_B, I_{B−1})
        if (min_sim_temp < min_sim)
        {
            min_sim ← min_sim_temp
            $\vec{p}$* ← $\vec{p}$
        }
        I_{B−1} ← I_B
    }
Output:
    $\vec{p}$*: best parameters
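The loop in the pseudo-code can be turned into a small runnable Python sketch. It is a sketch under stated assumptions rather than the published Matlab implementation: the parameter grid is visited in the lexicographic order produced by `itertools.product`, the index is minimised as in the pseudo-code, and `pst`, `threshold_segment` and `mismatch_fraction` are hypothetical names. The mismatch fraction is only a stand-in index for the example, not one of the paper's indicators.

```python
from itertools import product
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

Binary = List[int]
Params = Tuple[float, ...]


def pst(image: Sequence[int],
        segment: Callable[[Sequence[int], Params], Binary],
        sim: Callable[[Binary, Binary], float],
        grids: Sequence[Iterable[float]]) -> Tuple[Optional[Params], float]:
    """Sweep the parameter grid, compare successive binary images and keep
    the parameters of the pair with the lowest index value."""
    best_params: Optional[Params] = None
    min_sim = float("inf")            # "very high value" in the pseudo-code
    previous: Optional[Binary] = None
    for params in product(*grids):    # one tuple of values per grid point
        current = segment(image, params)
        if previous is not None:
            score = sim(current, previous)
            if score < min_sim:       # strict test keeps the first optimum
                min_sim = score
                best_params = params
        previous = current
    return best_params, min_sim


def threshold_segment(image: Sequence[int], params: Params) -> Binary:
    """Stand-in one-parameter technique T: label grey levels >= t as 1."""
    (t,) = params
    return [1 if value >= t else 0 for value in image]


def mismatch_fraction(a: Binary, b: Binary) -> float:
    """Stand-in index: fraction of pixels that differ between two images."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


grey = [30, 45, 50, 70, 75, 85, 200, 205, 210, 215]
best, score = pst(grey, threshold_segment, mismatch_fraction, [[40, 60, 80, 100, 120]])
print(best, score)
```

The grid in the example deliberately excludes extreme thresholds, since successive images at the ends of the range are trivially similar. For an indicator that should be maximised, its negated value can be passed as `sim`.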

Figure 5. Nine of the 50 diatom images taken from the public data of the Automatic Diatom Identification and Classification (ADIAC) project employed to test the Parameter Segmentation Tuning (PST) technique.


Figure 7. Coverage and sensitivity indexes. As can be observed, these similarity functions have three local minima. The local minima located at the ends of the range of values are not useful and are ignored. The local minimum located close to 105 provides the best threshold value.


Figure 8. Similarity indexes from Image A5 in Figure 9: (A) sensitivity and coverage index; (B) minimum distance index; (C) co-linearity index. The best threshold is given by the local minimum or maximum located around t = 170.

Figure 9 and Table 2 compare the threshold outcomes obtained by the PST technique with the results obtained by means of the Otsu algorithm. It can be seen that the results are similar, showing the quality of the PST in finding an optimum threshold.



Figure 9. The thresholding results obtained using the PST and Otsu techniques: (A) original images; (B) segmented images using the PST approach ($t$); (C) segmented images using the Otsu technique ($t_o$).
