A Tuning Method for Diatom Segmentation Techniques

Rojas Camacho, Oswaldo; Forero, Manuel Guillermo; Menéndez, José Manuel

doi:10.3390/app7080762

by Oswaldo Rojas Camacho ¹,*, Manuel Guillermo Forero ² and José Manuel Menéndez ³

¹ Departamento de Ingeniería de Sistemas e Industrial, Universidad Nacional de Colombia, Bogotá 111321, Colombia

² Semillero Lún, Group D+TEC, Facultad de Ingeniería, Universidad de Ibagué, Ibagué 730001, Colombia

³ Escuela Técnica Superior de Ingenieros de Telecomunicación, Universidad Politécnica de Madrid, 28040 Madrid, Spain

* Author to whom correspondence should be addressed.

Appl. Sci. 2017, 7(8), 762; https://doi.org/10.3390/app7080762

Submission received: 6 May 2017 / Revised: 14 June 2017 / Accepted: 22 June 2017 / Published: 27 July 2017

(This article belongs to the Special Issue Automated Analysis and Identification of Phytoplankton Images)

Abstract: Phytoplankton such as diatoms or desmids are useful for monitoring water quality. Manual image analysis is impractical due to the huge diversity of this group of microalgae and its great morphological plasticity, hence the importance of automating the analysis procedure. High-resolution images of phytoplankton cells can now be acquired by digital microscopes, which facilitates automating the analysis and identification of specimens. Therefore, new image analysis systems are potentially advantageous compared to manual methods of counting and identification. Segmentation is an important step in the analysis of phytoplankton images. Many standard techniques, such as thresholding and edge detection, are employed in the segmentation of diatoms and other phytoplankton, which are crucial organisms in microscopy images. However, in general, they require several parameters to be fixed beforehand by the user in order to get the best results. This is usually done by comparing results and looking for the best parameters. To automate this process, we propose an automatic tuning method, called Parametric Segmentation Tuning (PST), that finds the optimal parameters in an iterative procedure. This technique compares successive segmentation results, choosing the one that yields the maximal similarity. In this paper, tuning is formulated as an optimization problem using a similarity function within the solution space. This space consists of the set of binary images that are generated by the segmentation technique to be tuned, where these binary images are seen as a function of the original images and the segmentation parameters. The PST technique was tested with two of the most popular techniques employed to segment phytoplankton images: the Canny edge detection and a binarisation method. The results of the thresholding technique were validated by comparing them to those of the Otsu method, and the results of the Canny method by comparing them to a ground truth. They show that PST is effective in finding the best parameters.

Keywords: diatom; segmentation; tuning; thresholding; phytoplankton

1. Introduction

Segmentation is a crucial step in the analysis and identification of diatoms and other phytoplankton organisms because it allows for the separation of the cells from the background. Image segmentation is commonly addressed by standard techniques, such as thresholding and edge detection, in which some parameters are usually required to be fixed beforehand. Moreover, there is not an automatic method that does not require prior knowledge of the employed technique to tune the segmentation procedure. Many segmentation methods have been proposed, but the problem cannot be completely solved, as image segmentation is an ill-posed problem without a clear unique solution. Therefore, users have to tune parameters by comparing the results obtained with different parameter values to get the best ones. This task is time-consuming and subjective, given that there is not an ideal ground-truth image.

The few works published in this domain are mainly related to a particular segmentation technique. Howe [1] developed a stability heuristic criterion that helps to tune a method to binarise documents. Susukida et al. [2] proposed an evaluation criterion of segmentation, based on a measure of image entropy, to automatically optimize the granularity of a graph-based segmentation technique for mammography. Pignalberi et al. [3] used genetic algorithms to tune range image segmentation. Martin et al. [4] proposed a method to tune the key parameters by means of a preliminary supervised learning stage. Khan et al. [5] introduced an automatic feedback-based image processing method that uses a fuzzy formulation of a priori knowledge of the characteristics of the objects of interest to adapt the segmentation parameters. However, a literature review found no tuning techniques for diatom segmentation methods. Several methods have been employed for diatom segmentation. Fischer et al. [6] find two optimal segmentation thresholds heuristically. Kloster et al. [7] developed a toolbox that includes several segmentation techniques, such as the Canny [8] edge detector and the Otsu [9] threshold. These techniques are tested by users to build a workflow for the segmentation and feature extraction of diatoms. Jalba et al. [10] developed a method based on the watershed segmentation [11]. These techniques need several parameters to be fixed heuristically, given that there is no tuning technique that helps to find the best parameters. Therefore, in this paper, a formal and complete definition of a tuning segmentation technique is introduced. It is shown how this technique can be employed to find the best parameters for diatom segmentation.

The technique was tested with two of the most popular techniques to segment phytoplankton images [7]: the Canny edge detection [8] and a binarisation method. Results of the binarisation algorithm were compared to those obtained by the Otsu method and those obtained by applying Canny with a ground truth. They show that the proposed tuning method is effective and useful to find the optimal parameters.

The paper is organized as follows: Section 2 introduces the formulation and mathematical definition of the proposed method. Section 3 provides experiments and results. Lastly, some conclusions and perspectives of this work are given.

2. Materials and Methods

To develop the proposed method, 50 diatom images taken from the public data of the Automatic Diatom Identification and Classification (ADIAC) project were used [12]. The mathematical algorithms to tune the segmentation methods were developed using Matlab. The software was run on an Intel Core i7-4500U 1.8 GHz computer with 1.6 GB RAM running Windows 8.1. The method will become publicly available through the lab webpage http://www.gatv.ssr.upm.es/~jmm/PST_8989776.zip.

The methods commonly used to segment phytoplankton usually require some parameters to be fixed by the user before their analysis and classification. The segmentation result is generally a binarised image where the background appears in black and the foreground in white or vice versa. The resulting images, obtained with different parameter values, are usually compared with each other or with a ground truth image, i.e., the image produced by an expert. This comparison is made to find the values that yield the best segmentation.

As shown in Figure 1, by varying a parameter within a range of values in a segmentation method (Canny), results move from under-segmented images (Figure 1B,C) to over-segmented ones (Figure 1E,F), passing through an intermediate value where the best possible result is obtained (Figure 1D). It can be seen that changes between successive under-segmented and over-segmented images are more abrupt compared to those produced between images closer to the optimal result. This behaviour is expected because the optimal result seeks to get closer to what is actually seen in the original image. The under-segmented and over-segmented results correspond to the farthest values from the optimum. In these results, true edges or regions are eliminated (under-segmentation) or false ones are produced (over-segmentation), introducing the abrupt changes observed between them.

Therefore, if two successive under-segmented images are compared, the first one will be less under-segmented than the next one. Assuming that the first image is the ground truth, we will have a high number of false positives and a low number of false negatives. Similarly, when comparing two successive over-segmented images, the first one will be less over-segmented than the second one. In this case, if the first image is the ground truth, we will then have a low number of false positives and a high number of false negatives. It must be mentioned that this observation is valid in diatom and phytoplankton typical images, where segmentation is developed to detect the shapes of the organisms. However, this is not valid in images where textures or complex background are observed. Based on this premise, the following method for tuning parameters is proposed.

Let $T(I, \vec{p})$ represent the transformation of an image I into a binary one as a result of a segmentation algorithm given a certain number r of parameters, i.e., $\vec{p} = \{p_{1}, p_{2}, \dots, p_{r}\}$. In the binary image, level 1 represents the object of interest, and level 0 the background. Therefore, an r-dimensional solution space $P^{r} \subseteq ℝ^{r}$ generated by the transformation $T(I, \vec{p})$ can be defined. In the solution space, each coordinate is given by a parameter and each point of the space represents a binary image.

Considering image segmentation as an optimization problem, the best solution can be found by maximizing a similarity function or minimizing a distance used as cost function $Ψ$ in $P^{r}$. In this way, the best solution can be found by sweeping the solution space, while evaluating the cost function between each pair of successive binary images. Each binary image $I_{B}(\vec{p}) = T(I, \vec{p})$ is obtained by modifying at least one parameter of the segmentation algorithm $T(I, \vec{p})$. For the sake of simplicity, we will now consider the number of parameters equal to one, i.e., r = 1. In this case, the technique consists of an iterative process that attempts to minimise the error by comparing two successive segmented images $I_{B}(p = m) = T(I, p = m)$ and $I_{B}(p = m - 1) = T(I, p = m - 1)$, i.e., when the parameter takes the values m and m − 1.

2.1. Definition of the Segmentation Tuning as an Optimisation Problem

To compare each pair of successive images in the space of parameters, it is necessary to formalize the properties of the binary images. In addition, some definitions and operators are established to be applied in our tuning technique.

A binary image I of size $N \times M$, where N and M are the width and height of the image respectively, can be represented as a binary vector $α$ of size $u = N \times M$.

Definition 1.

Let the binary set be represented by $Z_{2} = \{0, 1\}$; then the u-dimensional binary space is given by:

$$Z_{2}^{u} = \underbrace{Z_{2} \times Z_{2} \times \dots \times Z_{2}}_{u \text{ times}} .$$

Definition 2.

An element $α$ of $Z_{2}^{u}$ is a u-tuple $α = (α_{1}, α_{2}, \dots, α_{u})$, with $α_{i} \in Z_{2}$. That is:

$$Z_{2}^{u} = \{α = (α_{1}, α_{2}, \dots, α_{u}) \mid α \in Z_{2} \times Z_{2} \times \dots \times Z_{2} \land α_{i} \in Z_{2}\} .$$

Definition 3.

If $α \in Z_{2}^{u}$, the complement $\bar{α}$ is defined as the vector obtained by inverting all the elements of $α$:

$$\bar{α} = (\bar{α}_{1}, \bar{α}_{2}, \dots, \bar{α}_{u}), \quad \text{where } \bar{α}_{i} = 0 \text{ if } α_{i} = 1 \text{ and } \bar{α}_{i} = 1 \text{ if } α_{i} = 0 .$$

Definition 4.

If $α \in Z_{2}^{u}$, the length of $α$ is defined as a function $ℓ$ from the u-dimensional binary space to the positive integers $ℤ^{+}$:

$$ℓ : Z_{2}^{u} ⟶ ℤ^{+}, \quad ℓ(α) = \sum_{i = 1}^{u} α_{i} .$$

Definition 5.

If $α \in Z_{2}^{u}$, the norm of $α$ is defined as a function $N$ from the u-dimensional binary space to the positive real numbers $ℝ^{+}$:

$$N : Z_{2}^{u} ⟶ ℝ^{+}, \quad N(α) = ‖α‖ = \sqrt{\sum_{i = 1}^{u} α_{i}^{2}} = \sqrt{\sum_{i = 1}^{u} α_{i}} = \sqrt{ℓ(α)} .$$
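As a minimal illustration of Definitions 1 to 5, the sketch below represents a small binary image as a u-tuple and implements the complement, the length and the norm. The function names (`to_vector`, `complement`, `length`, `norm`) and the example image are assumptions made for illustration only; they are not taken from the authors' Matlab software.

```python
import math

def to_vector(image):
    """Flatten a binary image (list of rows) into a u-tuple, u = N * M."""
    return tuple(pixel for row in image for pixel in row)

def complement(alpha):
    """Definition 3: invert every element of the binary vector."""
    return tuple(1 - a for a in alpha)

def length(alpha):
    """Definition 4: number of elements equal to 1."""
    return sum(alpha)

def norm(alpha):
    """Definition 5: Euclidean norm, which equals sqrt(length) for binary data."""
    return math.sqrt(sum(a * a for a in alpha))

# Hypothetical 2 x 3 binary image used only as an example.
image = [[0, 1, 1],
         [0, 1, 0]]
alpha = to_vector(image)

print(alpha)              # (0, 1, 1, 0, 1, 0)
print(complement(alpha))  # (1, 0, 0, 1, 0, 1)
print(length(alpha))      # 3
print(norm(alpha) == math.sqrt(length(alpha)))  # True
```

Because every element is 0 or 1, squaring leaves it unchanged, which is why the norm reduces to the square root of the length.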

Operations between the elements defined in the space $Z_{2}^{u}$.

Definition 6.

Let $α, β$ represent two elements of the space $Z_{2}^{u}$. The dot operation $⨀$ is defined as:

$$α ⨀ β = \langle α, β \rangle = α \cdot β = α_{1} β_{1} + α_{2} β_{2} + \dots + α_{u} β_{u} = ‖α‖ ‖β‖ \cos ϕ .$$

Definition 7.

Difference operation $⊖$

Let $α, β$ represent two elements of the space $Z_{2}^{u}$. The difference operation is defined as a function $⊖ : Z_{2}^{u} \times Z_{2}^{u} \to \{-1, 0, 1\}$.

It takes two binary images $α, β \in Z_{2}^{u}$, compares their elements $α_{i}$, $β_{i}$ and assigns to a third image $α ⊖ β$ the difference between the elements, in the following way:

OPERATION $⊖$
$α_{i} ⊖ β_{i}$	$α_{i} = 1$	$α_{i} = 0$
$β_{i} = 1$	0	−1
$β_{i} = 0$	1	0

Order rules between elements of the space $Z_{2}^{u}$.

If $α, β \in Z_{2}^{u}$, an order rule indicating the similarity between the elements $α, β$ is defined as:

$$ℛ(α, β) = \left\{ \begin{matrix} [α > β] : \text{false negative} \\ [α < β] : \text{false positive} \\ [α = β] : \text{true negative or true positive} \end{matrix} \right.$$

The following operations are necessary to define the order rules:

If $ℛ(α, β) = [α_{i} = β_{i}]$, then $α_{i} ⊖ β_{i} = α_{i} - β_{i} = 0$. If $α_{i} = β_{i} = 1$, they are true positives. If $α_{i} = β_{i} = 0$, they are true negatives.

If $ℛ(α, β) = [α_{i} > β_{i}]$, then $α_{i} ⊖ β_{i} = α_{i} - β_{i} = 1$, i.e., $α_{i} = 1$ and $β_{i} = 0$. Therefore, there is a false negative.

If $ℛ(α, β) = [α_{i} < β_{i}]$, then $α_{i} ⊖ β_{i} = α_{i} - β_{i} = -1$ (11 in 2's complement), i.e., $α_{i} = 0$ and $β_{i} = 1$. Therefore, there is a false positive.
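The difference operation and the order rules can be sketched as a per-pixel labelling, where the first vector plays the role of the reference (ground truth) and the second that of the segmented image. The function name `classify` and the label strings are assumptions chosen for this illustration.

```python
def classify(alpha, beta):
    """Label each pixel pair using the element-wise difference alpha_i - beta_i."""
    labels = []
    for a, b in zip(alpha, beta):
        diff = a - b
        if diff == 1:        # alpha_i = 1, beta_i = 0
            labels.append("FN")
        elif diff == -1:     # alpha_i = 0, beta_i = 1
            labels.append("FP")
        elif a == 1:         # difference 0 with both elements equal to 1
            labels.append("TP")
        else:                # difference 0 with both elements equal to 0
            labels.append("TN")
    return labels

# Hypothetical vectors covering the four possible cases.
reference = (1, 1, 0, 0)
segmented = (1, 0, 1, 0)
print(classify(reference, segmented))  # ['TP', 'FN', 'FP', 'TN']
```

A difference of 0 alone does not distinguish true positives from true negatives, so the sketch also inspects the value of the reference element in that case.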

Definition 8.

Matches operation $△$

Let $α, β$ represent elements of the space $Z_{2}^{u}$. The matches operation is defined as a function $△ : Z_{2}^{u} \times Z_{2}^{u} \to ℤ^{+}$.

It takes two binary images $α, β$, compares their elements $α_{i}$, $β_{i}$ and assigns them a value, written as $α △ β$. From the matches operation, it can be observed that:

$α △ β$ is the number of matches when $α_{i} = β_{i} = 1$
$\bar{α} △ \bar{β}$ is the number of matches when $α_{i} = β_{i} = 0$
$\bar{α} △ β$ is the number of matches when $α_{i} = 0, β_{i} = 1$
$α △ \bar{β}$ is the number of matches when $α_{i} = 1, β_{i} = 0$ .
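The four counts listed above can be sketched with the dot product of the vectors and their complements. This assumes that the matches operation can be computed as an inner product of binary vectors, which agrees with the inner-product notation used for the indicators later in the text. The names `dot`, `complement` and `matches` are illustrative.

```python
def dot(alpha, beta):
    """Inner product of two binary vectors: counts positions where both are 1."""
    return sum(a * b for a, b in zip(alpha, beta))

def complement(alpha):
    return tuple(1 - a for a in alpha)

def matches(alpha, beta):
    """Return the four match counts, with alpha taken as the reference image."""
    not_alpha, not_beta = complement(alpha), complement(beta)
    return {
        "TP": dot(alpha, beta),          # alpha_i = beta_i = 1
        "TN": dot(not_alpha, not_beta),  # alpha_i = beta_i = 0
        "FP": dot(not_alpha, beta),      # alpha_i = 0, beta_i = 1
        "FN": dot(alpha, not_beta),      # alpha_i = 1, beta_i = 0
    }

# Hypothetical example vectors.
reference = (1, 1, 0, 0, 1)
segmented = (1, 0, 1, 0, 1)
counts = matches(reference, segmented)
print(counts)  # {'TP': 2, 'TN': 1, 'FP': 1, 'FN': 1}
```

The four counts always add up to u, the number of pixels, which is a quick sanity check on any implementation.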

If $Z_{2}^{u}$ is a u-dimensional binary space (see Definition 1) and $T$ is a transformation of image I into a binary image $I_{B}$ ($I_{B} \in Z_{2}^{u}$), depending on $\vec{p}$, then the transformation T generates a group of binary images $I_{Bn} = T(I, \vec{p}_{n})$ depending on each $\vec{p}_{n}$ in all the space $P^{r}$.

Given $I_{Bn} = T(I, \vec{p}_{n})$ and $I_{B(n-1)} = T(I, \vec{p}_{(n-1)})$, a similarity function can be defined as

$$Ψ : Z_{2}^{u} \times Z_{2}^{u} ⟶ [0, 1] ,$$

so that $Ψ(I_{Bn}, I_{B(n-1)})$ compares $I_{Bn}$ with $I_{B(n-1)}$ in all the space $P^{r}$ (see Definition 8).

If $I_{Bn}$ is congruent with $I_{B(n-1)}$, then $I_{Bn} ≅ I_{B(n-1)} = I_{Bn}^{*}$ is the optimum segmented image, which depends on the optimum parameter vector $\vec{p}^{*} = [p_{1}^{*}, p_{2}^{*}, \dots, p_{r}^{*}]$, i.e.,

$$\vec{p}^{*} = [p_{1}^{*}, p_{2}^{*}, \dots, p_{r}^{*}] \leftarrow \arg \, \mathrm{optimum} \; Ψ\left(I_{Bn}(\vec{p}_{n}), I_{B(n-1)}(\vec{p}_{(n-1)})\right) \qquad (1)$$

subject to $I_{Bn}, I_{B(n-1)} \in Z_{2}^{u}$ and $\vec{p}_{n}, \vec{p}_{(n-1)} \in P^{r}$.

The $Ψ$ function is an index associated to each pair of successive binary images $(I_{Bn}, I_{B(n-1)})$ of $Z_{2}^{u} \times Z_{2}^{u}$. From the argument of the optimum of $Ψ$, the best binary segmented image $I_{Bn}^{*}$ is found.

2.2. Similarity Functions

Similarity functions [13], also called indexes or indicators based on qualitative (binary) attribute data, were first used in ecology for grouping of either biotical communities or ecologically related species. The literature refers to over 50 similarity or dissimilarity indexes. The following indexes are derived from Definition 8.

Definition 9.

Similarity function

A similarity function or indicator $Ψ(α, β)$ is a measure of the degree of similarity between two vectors $α, β$ of the u-dimensional binary space. $Ψ(α, β)$ has the following properties:

$$0 \leq Ψ(α, β) < 1 \text{ if } α \neq β, \qquad Ψ(α, β) = 1 \text{ if } α = β,$$

$$Ψ(α, β) = Ψ(β, α) .$$

The operation $△$ is used to define different similarity functions between $α, β$, such as:

1. Co-linearity indicator

The co-linearity indicator is proposed based on the $ℛ(α, β) = [α_{i} = β_{i}]$ relation with matches $\langle α, β \rangle$, or true positive relations. It is the cosine of the angle $ϕ$ of Definition 6, with $‖α‖ = N(α)$ as in Definition 5:

$$Ψ_{ICo}(α, β) = λ = \frac{\langle α, β \rangle}{‖α‖ \, ‖β‖} \qquad (2)$$

If $λ = 1$, then $α, β$ are co-linear. If $λ = 0$, then $α, β$ are orthogonal.
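A minimal sketch of the co-linearity indicator follows. It assumes the normalisation by the product of the norms implied by Definition 6, so that identical images give λ = 1. The function name `colinearity` and the convention of returning 0 when either image is empty are assumptions, not part of the original method.

```python
import math

def colinearity(alpha, beta):
    """Cosine of the angle between two binary vectors."""
    overlap = sum(a * b for a, b in zip(alpha, beta))
    # For binary vectors, ||alpha|| * ||beta|| = sqrt(length(alpha) * length(beta)).
    denominator = math.sqrt(sum(alpha) * sum(beta))
    if denominator == 0:
        return 0.0  # assumed convention when one image has no foreground
    return overlap / denominator

print(colinearity((1, 1, 0, 0), (1, 1, 0, 0)))  # 1.0 (co-linear)
print(colinearity((1, 1, 0, 0), (0, 0, 1, 1)))  # 0.0 (orthogonal)
print(round(colinearity((1, 1, 0, 0), (1, 0, 0, 0)), 4))  # 0.7071
```

Taking a single square root of the product of the lengths, rather than multiplying two square roots, avoids a small floating-point error when the two images are identical.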

2. Maximum sensitivity area indicator

This indicator relates the true positive rate (TPR) and the false positive rate (FPR). The relation between the TPR and the FPR is represented by the well-known receiver operating characteristic (ROC) curve [14]. If the TPR tends to 1 and the FPR tends to 0, then the correlation between $α, β$ is high. The following relations define the metrics of the indicator:

$$TPR = \frac{1}{1 + k_{1}}, \quad FPR = \frac{1}{1 + k_{2}}, \quad k_{1} = \frac{\langle α, \bar{β} \rangle}{\langle α, β \rangle}, \quad k_{2} = \frac{\langle \bar{α}, \bar{β} \rangle}{\langle \bar{α}, β \rangle}$$

$$Ψ_{IS}(α, β) = Ψ_{area-max}(α, β) = 0.5 \left(1 + \frac{1}{1 + k_{1}} - \frac{1}{1 + k_{2}}\right) \qquad (3)$$

If $k_{1}$ approaches 0 and $k_{2}$ approaches infinity, then $Ψ_{IS}(α, β)$ tends to 1.

If the false negative relation $\langle α, \bar{β} \rangle$ tends to 0, then $k_{1}$ tends to 0. That is, if $\bar{β}$ tends to $\bar{α}$, then:

$$\langle α, \bar{β} \rangle = \langle α, \bar{α} \rangle = 0 .$$

If the false positive relation $\langle \bar{α}, β \rangle$ tends to 0, then $k_{2}$ tends to infinity. Thus, if $β$ tends to $α$, then:

$$\langle \bar{α}, β \rangle = \langle \bar{α}, α \rangle = 0 .$$

If $Ψ_{IS}(α, β)$ approaches 1 (maximum area), then the results obtained in the classification are accurate (Figure 2A).
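The sensitivity indicator of Equation (3) can be sketched as follows. The rates are computed directly from the match counts, TPR = TP / (TP + FN) and FPR = FP / (FP + TN), which is algebraically the same as 1 / (1 + k1) and 1 / (1 + k2) but avoids dividing by zero when a count vanishes. The function names and the choice of returning a rate of 0 for an empty class are assumptions.

```python
def rates(alpha, beta):
    """Return (TPR, FPR) with alpha as reference and beta as segmented image."""
    tp = sum(a * b for a, b in zip(alpha, beta))
    fn = sum(a * (1 - b) for a, b in zip(alpha, beta))
    fp = sum((1 - a) * b for a, b in zip(alpha, beta))
    tn = sum((1 - a) * (1 - b) for a, b in zip(alpha, beta))
    tpr = tp / (tp + fn) if tp + fn else 0.0  # assumed value when alpha has no 1s
    fpr = fp / (fp + tn) if fp + tn else 0.0  # assumed value when alpha has no 0s
    return tpr, fpr

def sensitivity_indicator(alpha, beta):
    """Equation (3): 0.5 * (1 + TPR - FPR)."""
    tpr, fpr = rates(alpha, beta)
    return 0.5 * (1 + tpr - fpr)

# Hypothetical example: TP = 2, FN = 1, FP = 1, TN = 1.
reference = (1, 1, 0, 0, 1)
segmented = (1, 0, 1, 0, 1)
print(round(sensitivity_indicator(reference, segmented), 4))  # 0.5833
print(sensitivity_indicator(reference, reference))            # 1.0
```

In the example, TPR = 2/3 and FPR = 1/2, so the indicator is 0.5 (1 + 2/3 − 1/2) = 7/12.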

3. Minimum distance indicator

This indicator measures the minimum distance between the points (FPR, TPR) and (0,1) in the ROC space, as shown in Figure 2B.

$$Ψ_{dmin}(α, β) = \sqrt{\left(1 - \frac{1}{1 + k_{1}}\right)^{2} + \left(\frac{1}{1 + k_{2}}\right)^{2}} \qquad (4)$$

If $k_{1}$ approaches 0 and $k_{2}$ approaches infinity, then $Ψ_{dmin}(α, β)$ tends to 0 (minimum distance). The conditions for $k_{1}$ and $k_{2}$ are identical to those of the previous indicator.
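A short sketch of Equation (4) is given below: the Euclidean distance in ROC space between the point (FPR, TPR) and the perfect segmentation point (0, 1). As before, the rates are computed from the match counts rather than from k1 and k2, and the function names and zero-denominator conventions are assumptions.

```python
import math

def rates(alpha, beta):
    """Return (TPR, FPR) with alpha as reference and beta as segmented image."""
    tp = sum(a * b for a, b in zip(alpha, beta))
    fn = sum(a * (1 - b) for a, b in zip(alpha, beta))
    fp = sum((1 - a) * b for a, b in zip(alpha, beta))
    tn = sum((1 - a) * (1 - b) for a, b in zip(alpha, beta))
    tpr = tp / (tp + fn) if tp + fn else 0.0  # assumed convention
    fpr = fp / (fp + tn) if fp + tn else 0.0  # assumed convention
    return tpr, fpr

def minimum_distance(alpha, beta):
    """Equation (4): distance from (FPR, TPR) to the ideal point (0, 1)."""
    tpr, fpr = rates(alpha, beta)
    return math.hypot(1 - tpr, fpr)

reference = (1, 1, 0, 0, 1)
segmented = (1, 0, 1, 0, 1)
print(round(minimum_distance(reference, segmented), 4))  # 0.6009
print(minimum_distance(reference, reference))            # 0.0
```

Unlike the other indicators, this one is a distance, so the best agreement corresponds to its minimum rather than its maximum.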

4. Coverage indicator of the segmented area

The coverage or superposition indicator compares the reference set $α$ with the segmented one $β$, and presents the one-to-one correspondence between the pixels of both sets. The coverage indicator of the segmented area $Ψ_{IC}(α, β)$ is defined as [13]:

$$Ψ_{IC}(α, β) = \frac{2 \langle α, β \rangle}{ℓ(α) + ℓ(β)} \qquad (5)$$

where $ℓ$ is the length of Definition 4 (the number of foreground pixels), so that $0 \leq Ψ_{IC}(α, β) < 1$ if $α \neq β$ and $Ψ_{IC}(α, β) = 1$ if $α = β$.
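The coverage indicator can be sketched as below. The sketch assumes that the denominator counts the foreground pixels of each image (the length ℓ of Definition 4), since that is the normalisation under which identical images give exactly 1, as the stated property requires. The name `coverage` and the value returned for two empty images are assumptions.

```python
def coverage(alpha, beta):
    """Twice the overlap divided by the total number of foreground pixels."""
    overlap = sum(a * b for a, b in zip(alpha, beta))
    total = sum(alpha) + sum(beta)
    if total == 0:
        return 1.0  # assumed convention: two empty images are identical
    return 2 * overlap / total

reference = (1, 1, 0, 0, 1)
segmented = (1, 0, 1, 0, 1)
print(round(coverage(reference, segmented), 4))  # 0.6667
print(coverage(reference, reference))            # 1.0
print(coverage(reference, (0, 0, 1, 1, 0)))      # 0.0
```

The indicator is symmetric in its two arguments, in line with the properties listed in Definition 9.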

2.3. Parameter Segmentation Tuning Technique (PST)

The optimal values of the r parameters of a segmentation method can be found by using the PST technique illustrated in Figure 3. A similarity indicator is employed to compare successive binary images, i.e., $I_{Bn}$, $I_{B(n-1)}$.

The corresponding algorithm works, in essence, by modifying each one of the parameters in steps, getting the segmented images and comparing them by pairs to find the parameters that produce the closest similarity between them as follows.

The PST technique pseudo-code

Input:
    I: image to segment
    $\vec{p}$: segmentation parameters
    T: segmentation technique to be tuned
Functions:
    sim(a, b): similarity function $Ψ$ between binary images a and b
Definitions:
    $p_{10}, p_{20}, \dots, p_{r0}$: initial values of the parameters
    $p_{1f}, p_{2f}, \dots, p_{rf}$: final values of the parameters in $\vec{p}$
Creates space of segmented images and compares successive images:
    $\vec{p} \leftarrow \{p_{1} \leftarrow p_{10}, p_{2} \leftarrow p_{20}, \dots, p_{r} \leftarrow p_{r0}\}$    initialisation of the parameters
    min_sim $\leftarrow$ very high value    initialisation of the minimum similarity index
    $I_{B-1} \leftarrow T(I, \vec{p})$    segmented image for the initial selection of parameters $\vec{p}$
    repeat for every parameter in $\vec{p}$ until $p_{1} = p_{1f}, p_{2} = p_{2f}, \dots, p_{r} = p_{rf}$
    {
        $p_{n} \leftarrow p_{n+1}$    increase a parameter $p_{n}$ in $\vec{p}$
        $I_{B} \leftarrow T(I, \vec{p})$    segmented image for the current selection of parameters
        Compare segmented images $I_{B}, I_{B-1}$:
        min_sim_temp $\leftarrow$ sim($I_{B}, I_{B-1}$)
        If (min_sim_temp < min_sim)
        {
            min_sim $\leftarrow$ min_sim_temp
            $\vec{p}^{*} \leftarrow \vec{p}$
        }
        $I_{B-1} \leftarrow I_{B}$
    }
Output:
    $\vec{p}^{*}$: best parameters.
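The loop above can be sketched in Python for the single-parameter case (r = 1). This is an illustrative sketch, not the authors' Matlab implementation: the segmentation function `threshold`, the similarity function `coverage`, the toy grey-level data and the parameter grid are all assumptions. The `optimum` argument is also an addition, so that the same loop serves similarity-type indicators (pass `max`) and distance-type indicators (pass `min`, which corresponds to the `min_sim` test of the pseudo-code).

```python
def threshold(image, t):
    """Toy segmentation technique T: 1 where the grey level exceeds t."""
    return tuple(1 if pixel > t else 0 for pixel in image)

def coverage(alpha, beta):
    """Similarity between two binary vectors (overlap-based indicator)."""
    total = sum(alpha) + sum(beta)
    if total == 0:
        return 1.0  # assumed convention for two empty images
    return 2 * sum(a * b for a, b in zip(alpha, beta)) / total

def pst(image, segment, values, sim, optimum=max):
    """Sweep one parameter and compare each segmented image with the previous one."""
    previous = segment(image, values[0])
    scores = []
    for p in values[1:]:
        current = segment(image, p)
        scores.append((sim(current, previous), p))
        previous = current
    best_score, best_p = optimum(scores, key=lambda item: item[0])
    return best_p, scores

# Hypothetical flattened image: dark background with three bright object pixels.
image = (20, 25, 30, 22, 200, 205, 210, 28)
values = [10, 60, 110, 160, 220]

best, curve = pst(image, threshold, values, coverage)
print(best)  # 110
```

In this toy run the segmented image stops changing between t = 60 and t = 160, so the similarity between successive images reaches 1 at t = 110, the first value at which two successive results coincide. When several parameter values tie, `max` keeps the first one encountered.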

The optimum of the similarity function $\mathrm{Optimum} \; Ψ\left(I_{Bn}(\vec{p}_{n}), I_{B(n-1)}(\vec{p}_{n-1})\right)$, in the space of parameters $P^{r} \subseteq ℝ^{r}$, converges to a region of local minima. This means that, although infrequent, it is possible to find more than one solution to the segmentation problem, e.g., with the Canny edge detector. Those solutions are very close, and are visually almost identical, as can be observed in Figure 4.

2.4. Validation of the PST Technique

To validate the PST approach, two segmentation techniques were tuned: the Canny edge detector [8] and a binarisation procedure. These techniques are usually used as steps in the analysis and identification of diatoms and the detection of other phytoplankton organisms [7]. Figure 5 shows nine images taken from the employed dataset [12]. Different kinds of diatoms were chosen.

1. Canny edge detection technique

The Canny edge detector [8] is denoted as $I_{B} = T_{C}(I, [h_{min}, h_{max}, σ])$. It employs three parameters: $\vec{p} = [h_{min}, h_{max}, σ]$, where $σ$ is the standard deviation of a convolution mask given by the first derivative of the Gaussian function, and $h_{min}$ and $h_{max}$ are the thresholds used in the hysteresis process. The purpose of this process is to reduce the appearance of false contours and local maximum values produced by noise.

2. Binarisation technique

To test the PST technique in a thresholding method, we developed our own thresholding algorithm. This segmentation technique consists of separating the pixels of the image into two classes, high- and low-intensity pixels. It transforms a greyscale image into a binary one, $I_{B}$. This makes it possible to differentiate objects from the background by identifying a threshold $t$. In this case, the pixels labelled with 1 belong to the object, while the pixels labelled with 0 belong to the background.

We propose the following method to find the optimal threshold $t^{*}$ using similarity functions.

$$I_{B} = T(I, t(κ)) = \left\{ \begin{matrix} 1 & \text{if } I(x, y) > t \\ 0 & \text{otherwise} \end{matrix} \right.$$

The optimal threshold value $t^{*}$ is the argument of the maximum of the similarity function, and it is subject to one of the following conditions:

$$\text{if } \arg\max Ψ(I_{n}, I_{n-1}) \leq μ \Rightarrow t^{*} = \arg\max Ψ(I_{n}, I_{n-1}) + κ σ$$

$$\text{if } \arg\max Ψ(I_{n}, I_{n-1}) > μ \Rightarrow t^{*} = \arg\max Ψ(I_{n}, I_{n-1}) - κ σ$$

where $μ$ is the mean and $σ$ the standard deviation of the image, and $κ$ is a normalisation factor to be tuned. $κ$ was found by the PST in the range $0.6 < κ < 5$. The value $κ σ$ defines the quality of the binarisation.
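The two conditions can be sketched as a small correction step applied to the threshold that maximises the similarity. In this sketch, `t_hat` stands for that arg max and is supplied by the caller; the function name, the toy pixel values and the use of the population standard deviation are assumptions, since the text does not say which estimator is used.

```python
import statistics

def adjusted_threshold(pixels, t_hat, kappa):
    """Shift t_hat towards the image mean by kappa standard deviations."""
    mu = statistics.mean(pixels)
    sigma = statistics.pstdev(pixels)  # population standard deviation (assumed)
    if t_hat <= mu:
        return t_hat + kappa * sigma
    return t_hat - kappa * sigma

# Hypothetical grey levels: mean 25, standard deviation sqrt(125), about 11.18.
pixels = (10, 20, 30, 40)
print(round(adjusted_threshold(pixels, 20, 1.0), 2))  # 31.18
print(round(adjusted_threshold(pixels, 30, 1.0), 2))  # 18.82
```

In both branches the correction moves the threshold in the direction of the mean grey level, by an amount controlled by κ.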

3. Results and Discussion

The results of the binarisation algorithm were validated by comparing them with those of the Otsu method, one of the most popular and efficient thresholding techniques, and those of Canny with a ground truth.

3.1. Tuning of the Parameters of the Canny Edge Detector

Figure 6 shows the results obtained by tuning the Canny detector with the PST and by an expert in a blind test. The best manual segmentation took around 25 min to be found, while the PST took around 42 s.

Table 1 presents the optimal values obtained from the PST and the expert. As can be seen in Figure 6, the results are very similar, even though the values from the user are sometimes far from those obtained from the PST.

3.2. Tuning of the Binarisation Algorithm Using PST

To understand how the PST finds the optimal parameters, a deeper analysis of the binarisation algorithm is carried out. The plot in Figure 7 shows the evolution of the sensitivity and coverage similarity indicators for image A in Figure 1. It must be noted that very similar successive segmented images can also be found when certain parameters take very low or very high values, i.e., values outside the useful range of a parameter. For example, when an image is thresholded by hand, it is easy to see that if the threshold is too low or too high, the resulting images will be almost completely white or black, and the changes between two successive binarised images will be very small. Therefore, the scanning of extreme values should be avoided, which will also speed up the search for the best parameter values. In any case, if the ranges of useful values are unknown, the problem is easily solved by excluding the maximum and minimum similarity values found at the ends of each parameter range, as shown in Figure 7.
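One possible reading of the exclusion step is sketched below: the plateau of repeated similarity values at each end of the curve, where successive images are all white or all black, is dropped before the optimum is searched. The function name `trim_ends`, the toy similarity curve and the plateau criterion itself are assumptions; the text does not specify how the end values are detected.

```python
def trim_ends(curve):
    """Drop the run of values equal to the first and last scores of the curve."""
    first, last = curve[0][1], curve[-1][1]
    lo = 0
    while lo < len(curve) and curve[lo][1] == first:
        lo += 1
    hi = len(curve)
    while hi > lo and curve[hi - 1][1] == last:
        hi -= 1
    return curve[lo:hi]

# Hypothetical (threshold, similarity) pairs with saturated values at both ends.
curve = [(0, 1.0), (10, 1.0), (20, 0.4), (30, 0.8),
         (40, 0.95), (50, 0.7), (60, 1.0), (70, 1.0)]

interior = trim_ends(curve) or curve  # fall back if everything was trimmed
best = max(interior, key=lambda item: item[1])
print(interior)  # [(20, 0.4), (30, 0.8), (40, 0.95), (50, 0.7)]
print(best)      # (40, 0.95)
```

Without the trimming step, the maximum of this toy curve would be found at threshold 0, where the image is simply saturated.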

Figure 8 shows the plots of the four indexes for image A5 in Figure 9. As can be observed, the shapes of the sensitivity and the total coverage indexes are very similar and the best threshold is t = 170. The minimum distance (Figure 8B) and co-linearity (Figure 8C) indexes vary in a similar way, and the first local maximum, employed as optimal threshold, is also located close to t = 170.

Figure 9 and Table 2 compare the thresholds obtained by the PST technique with those obtained by means of the Otsu algorithm. It can be seen that the results are similar, showing the quality of the PST in finding an optimum threshold.

4. Conclusions

Several methods have been proposed for diatom segmentation. However, they generally require some parameters to be fixed by hand. To make this procedure automatic, a tuning method was introduced. This technique seems to be the first attempt to achieve this.

In this paper, the problem of image segmentation was posed as an optimization problem, and the best parameters values were found in the space of feasible solutions contained in the u-dimensional binary space. The operations and relations among the elements of the binary set were defined, as well as the objective function. The parameters associated to the algorithm were optimized using the new Parametric Segmentation Tuning (PST) technique, and through different similarity functions. The PST generates the u-dimensional binary space and the similarity functions are employed to compare segmented images to find the optimal one.

To test the technique, two segmentation algorithms were tuned by using the PST approach. In the first one, the Canny edge detection algorithm, the PST made it possible to find diatom edges correctly. In the second one, the PST was employed to find the best thresholded image and the results were in line with those obtained with the Otsu method, showing the capacity of the PST method. In this way, the PST was validated by comparing the tuning results of the Canny method against those of an expert, and those of the binarisation against the Otsu algorithm. It was found that our method is quicker than manual tuning and efficient, giving results similar to those obtained by the expert and the Otsu method. This makes the PST a convenient tool to find optimal parameters in diatom segmentation processes, saving researchers time by automating these techniques. This method can be employed to tune similar segmentation procedures used to analyse phytoplankton images.

Author Contributions

Oswaldo Rojas Camacho and Manuel Guillermo Forero conceived and designed the experiments; Oswaldo Rojas Camacho performed the experiments and developed the software; Oswaldo Rojas Camacho and Manuel Guillermo Forero analyzed the data; Manuel Guillermo Forero and José Manuel Menéndez were joint supervisors; Oswaldo Rojas Camacho, Manuel Guillermo Forero and José Manuel Menéndez wrote the paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Howe, N.R. Document binarization with automatic parameter tuning. Int. J. Doc. Anal. Recognit. 2013, 16, 247–258.
2. Susukida, H.; Ma, F.; Bajger, M. Automatic tuning of a graph-based image segmentation method for digital mammography applications. In Proceedings of the 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, Paris, France, 14–17 May 2008; pp. 89–92.
3. Pignalberi, G.; Cucchiara, R.; Cinque, L.; Levialdi, S. Tuning range image segmentation by genetic algorithm. EURASIP J. Appl. Signal Process. 2003, 8, 780–790.
4. Martin, V.; Maillot, N.; Thonnat, M. A learning approach for adaptive image segmentation. In Proceedings of the Fourth IEEE International Conference on Computer Vision Systems, New York, NY, USA, 4–7 January 2006.
5. Khan, M.; Reischl, M.; Schweitzer, B.; Weiss, C.; Mikut, R. Feedback driven design of normalization techniques for biological images using fuzzy formulation of a priori knowledge. In Computational Intelligence in Intelligent Data Analysis; Springer: Berlin, Germany; Volume 445, pp. 167–178.
6. Fischer, S.; Gilomen, K.; Bunke, H. Identification of diatoms by grid graph matching. In Proceedings of the Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition, Windsor, ON, Canada, 6–9 August 2002; pp. 94–103.
7. Kloster, M.; Kauer, G.; Beszteri, B. SHERPA: An image segmentation and outline feature extraction tool for diatoms and other objects. BMC Bioinf. 2014, 15, 218.
8. Canny, J. A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 1986, PAMI-8, 679–698.
9. Otsu, N. A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66.
10. Jalba, A.C.; Wilkinson, M.; Roerdink, J. Automatic segmentation of diatom images for classification. Microsc. Res. Tech. 2004, 65, 72–85.
11. Beucher, S. The watershed transformation applied to image segmentation. In Proceedings of the 10th Pfefferkorn Conference on Signal and Image Processing in Microscopy and Microanalysis, Cambridge, UK, 27–30 September 1991; pp. 16–19.
12. ADIAC. Available online: http://rbg-web2.rbge.org.uk/ADIAC/pubdat/downloads/public_images.htm (accessed on 20 April 2017).
13. Hubalek, Z. Coefficients of association and similarity, based on binary (presence-absence) data: An evaluation. Biol. Rev. 1982, 57, 669–689.
14. Brown, D.; Davis, H.T. Receiver operating characteristics curves and related decision measures: A tutorial. Chemom. Intell. Lab. Syst. 2006, 80, 24–38.

Figure 1. (A) Original image taken from the public data of the Automatic Diatom Identification and Classification (ADIAC) project. (B–F) Outcomes of varying a parameter (hmax) within a range of values (0, 1) with the Canny edge detector, where results move from under-segmented images (B,C) to over-segmented ones (E,F), passing through an intermediate value where the best possible result is obtained (D).

Figure 2. (A) A receiver operating characteristic (ROC) curve obtained from tuning the Canny edge detector with Figure 1A. The index of sensitivity was obtained from the ROC curve. It represents a relationship between the true positive rate (TPR) when it tends to 1 and the false positive rate (FPR) when it tends to 0. (B) Minimum distance between the perfect segmentation point (0,1) and the point (FPR, TPR).

Figure 3. Flowchart of the Parametric Segmentation Tuning technique. The iterative process modifies the segmentation parameters of the technique to be tuned, which is used to create the space of binary images. These images are then compared, using a similarity indicator as the criterion, to obtain the best result.
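
The iterative loop described in Figure 3 might be sketched as follows. This is only an illustration under stated assumptions: a plain global threshold stands in for the segmentation technique, the Jaccard index stands in for the similarity indicator (it is not claimed to be one of the paper's indexes), and the names `threshold`, `jaccard` and `tune` are hypothetical.

```python
def threshold(image, t):
    """Stand-in segmentation: binarise a 2-D list of grey levels at t."""
    return [[1 if p > t else 0 for p in row] for row in image]


def jaccard(a, b):
    """Stand-in similarity between two binary images (assumption)."""
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += pa & pb
            union += pa | pb
    return inter / union if union else 1.0


def tune(image, segment, params, similarity):
    """Compare successive segmentations and keep the most similar pair.

    Returns (parameter, score) for the later member of the best pair.
    """
    masks = [segment(image, p) for p in params]
    best_param, best_score = None, -1.0
    for prev, cur, p in zip(masks, masks[1:], params[1:]):
        score = similarity(prev, cur)
        if score > best_score:
            best_param, best_score = p, score
    return best_param, best_score


# Synthetic 2x3 image: dark background (10) and a bright object (200).
image = [[10, 10, 200], [10, 200, 200]]
print(tune(image, threshold, [5, 50, 100, 150, 250], jaccard))
```

In this sketch two empty masks count as identical, so a practical version would need to exclude degenerate all-background or all-foreground results before comparing.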

Figure 4. Results obtained with the Canny edge detector. (A) Original image. (B) Results obtained with three different sets of parameters [h_min, h_max, σ]: [0.095, 1.0, 0.1], [0.095, 1.0, 0.3], [0.15, 1.0, 0.9]. (C) Original image. (D) Results obtained with [h_min, h_max, σ]: [0.15, 0.6, 0.7], [0.15, 0.5, 0.7], [0.15, 0.7, 0.9]. Only one result is reproduced for each image, given that they are almost identical.

Figure 5. Nine of the 50 diatom images taken from the public data of the Automatic Diatom Identification and Classification (ADIAC) project and employed to test the Parameter Segmentation Tuning (PST) technique.

Figure 6. Optimal Canny segmentation; (A) Original images; (B) PST results; (C) Manual results obtained by an expert.

Figure 7. Coverage and sensitivity indexes. As can be observed, these similarity functions have three local minima. The local minima located at the ends of the range of values are not useful and are ignored. The local minimum located close to 105 provides the best threshold value.
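
The selection rule stated in the Figure 7 caption (ignore the minima at the ends of the range and keep the interior one) can be sketched as below. The function names, the `margin` parameter and the sample curve are hypothetical; the curve is synthetic and only mimics the shape described in the caption.

```python
def interior_local_minima(values, margin=1):
    """Indices of strict local minima, skipping `margin` samples at each end."""
    margin = max(margin, 1)
    return [
        i
        for i in range(margin, len(values) - margin)
        if values[i] < values[i - 1] and values[i] < values[i + 1]
    ]


def best_interior_minimum(thresholds, values, margin=1):
    """Threshold at the lowest interior local minimum, or None if there is none."""
    candidates = interior_local_minima(values, margin)
    if not candidates:
        return None
    best = min(candidates, key=lambda i: values[i])
    return thresholds[best]


# Synthetic similarity curve: low at both ends, one interior dip.
thresholds = [0, 35, 70, 105, 140, 175, 210, 255]
values = [0.2, 0.6, 0.5, 0.3, 0.55, 0.7, 0.4, 0.1]
print(best_interior_minimum(thresholds, values))
```

Only strict minima are detected here, so a flat-bottomed dip would need extra handling, and a larger `margin` can be used when the unwanted minima sit near the ends rather than exactly on them.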

Figure 8. Similarity indexes from Image A5 in Figure 9: (A) sensitivity and coverage index; (B) minimum distance index; (C) co-linearity index. The best threshold is given by the local minimum or maximum located around t = 170.

Figure 9. Thresholding results obtained with the PST and Otsu techniques: (A) original images; (B) images segmented using the PST approach (t); (C) images segmented using the Otsu technique (t_o).

Table 1. Segmentation results.

Image	PST h_min	PST h_max	PST σ	PST time [s]	Manual h_min	Manual h_max	Manual σ	Manual time [min]
A1	0.095	0.7	0.8	24	0.15	0.7	0.2	16
A2	0.015	0.7	0.8	35	0.05	0.7	0.6	23
A3	0.015	0.5	0.8	25	0.05	0.5	0.4	21
A4	0.095	0.7	0.9	34	0.05	1.0	0.4	22
A5	0.095	0.7	0.9	45	0.05	0.6	0.7	24
A6	0.015	0.3	0.8	57	0.05	0.3	0.9	20
A7	0.095	0.7	0.6	55	0.09	0.8	0.9	35
A8	0.095	0.3	0.9	47	0.17	0.3	0.7	28
A9	0.095	0.4	0.4	59	0.09	0.4	0.3	40

Table 2. Thresholding Results.

Image	PST t	PST κ	Otsu t_o
A1	0.656	1.65	0.651
A2	0.394	3.50	0.490
A3	0.749	2.00	0.745
A4	0.518	3.50	0.576
A5	0.757	1.75	0.749
A6	0.578	2.25	0.588
A7	0.639	2.75	0.694
A8	0.742	3.40	0.529
A9	0.785	2.20	0.517
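
To make the comparison in Table 2 easier to read, the following sketch computes the absolute gap between the PST threshold t and the Otsu threshold t_o for each image. The values are copied as printed in Table 2; the names `table2` and `threshold_gaps` are hypothetical and used only for illustration.

```python
# (t, t_o) per image, as printed in Table 2.
table2 = {
    "A1": (0.656, 0.651),
    "A2": (0.394, 0.490),
    "A3": (0.749, 0.745),
    "A4": (0.518, 0.576),
    "A5": (0.757, 0.749),
    "A6": (0.578, 0.588),
    "A7": (0.639, 0.694),
    "A8": (0.742, 0.529),
    "A9": (0.785, 0.517),
}


def threshold_gaps(rows):
    """Absolute difference |t - t_o| per image, rounded to three decimals."""
    return {name: round(abs(t - t_o), 3) for name, (t, t_o) in rows.items()}


gaps = threshold_gaps(table2)
largest = max(gaps, key=gaps.get)
print(gaps)
print("largest gap:", largest)
```

The gap alone does not say which method is closer to the ground truth; it only flags the images where the two thresholds disagree most.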

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

