This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

This paper presents an image segmentation algorithm based on Gaussian multiscale aggregation, oriented to hand biometric applications. The method is able to isolate the hand from a wide variety of background textures such as carpets, fabric, glass, grass, soil or stones. The evaluation was carried out on a publicly available synthetic database with 408,000 hand images on different backgrounds, comparing accuracy and computational cost against two competitive segmentation methods from the literature, namely Lossy Data Compression (LDC) and Normalized Cuts (NCuts). The results show that the proposed method outperforms both methods with regard to computational cost, time performance, accuracy and memory usage.

Hand biometrics is currently receiving increasing attention because of its wide applicability in daily scenarios and the relation between user acceptance and identification/verification rates [

The non-invasiveness and acceptability of this biometric technique make hand biometrics an adequate method for verification and identification in devices such as PCs or mobile phones, since its system requirements are easily met with a standard camera and hardware processor.

However, as applications requiring hand biometrics tend towards contact-less, platform-free scenarios (e.g., smartphones [

Consequently, image pre-processing becomes compulsory to tackle this problem, by providing an accurate segmentation algorithm that isolates the hand from the background, whatever its nature, independently of environment and illumination conditions.

Thus, a segmentation method is proposed that is able to isolate the hand from different backgrounds, regardless of the environmental and illumination conditions.

The proposed approach is based on multiscale aggregation, gathering pixels along scales according to a given Gaussian similarity function. This method produces an iterative clustering aggregation, providing a solution for hand image segmentation with a quasi-linear computational cost and an accuracy adequate for biometric applications.

The method has been tested on a synthetic image database with around 408,000 images covering different backgrounds (e.g., soil, skins/fur, carpets, walls or grass) and illumination environments, and compared in terms of image segmentation to two competitive approaches from the literature. These approaches are Lossy Data Compression (LDC) [

Finally, the remainder of the paper is organized as follows: Section 2 provides an overview of the current literature, and Section 3 describes the proposed method. The database involved in the evaluation is presented in Section 4 and the results in Section 5. Section 6 provides conclusions and future work.

Segmentation is an important research field in image processing [

In fact, the overall performance in terms of identification accuracy relies strongly on the result provided by the segmentation and pre-processing procedure.

Concerning hand-based biometrics, segmentation received little attention in early works, since initial approaches carried out the acquisition procedure against a constrained and homogeneous background [

However, as hand biometrics evolves from contact and peg-based approaches to completely contact-less, peg-free and platform-independent scenarios, hand segmentation is becoming more difficult [

Several approaches in the literature tackle this problem by providing non-contact, platform-free scenarios but with a constrained background, usually a monochromatic color that is easily distinguishable from hand texture by means of simple image thresholding [

A possible solution for unconstrained and non-homogeneous backgrounds is a segmentation method based on multiscale aggregation [

The most common applications of this approach consider image segmentation and boundary detection based on texture [

The results obtained by multiscale aggregation in the fields of unsupervised image segmentation are certainly promising [

Nonetheless, several aspects must be improved in terms of computational cost and memory usage efficiency [

The proposed approach attempts to provide an accurate segmentation of a colour hand image. The algorithm strategy consists of aggregating similar nodes according to a specific criterion along different scales until a given goal is met, ensuring that aggregated nodes within segments satisfy certain properties.

The first step of the algorithm consists of imposing a particular structure on the elements within the image. As in other methods [

In this approach, the structure on the first scale is assumed to be a 4-neighbourhood strategy, while for subsequent scales, structure is provided by means of Delaunay triangulation [
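The 4-neighbourhood structure assumed on the first scale can be illustrated with a minimal sketch. The function name `four_neighbour_edges` and the row-by-row pixel numbering are assumptions made here for illustration; they are not taken from the original method.

```python
def four_neighbour_edges(height, width):
    """Return the edges of a 4-neighbourhood grid graph.

    Pixels are numbered row by row, so pixel (r, c) has index r * width + c.
    Each undirected edge is listed once as (smaller_index, larger_index).
    """
    edges = []
    for r in range(height):
        for c in range(width):
            node = r * width + c
            if c + 1 < width:       # right-hand neighbour
                edges.append((node, node + 1))
            if r + 1 < height:      # neighbour below
                edges.append((node, node + width))
    return edges


edges = four_neighbour_edges(3, 4)
print(len(edges))  # 3*(4-1) + (3-1)*4 = 17
```

Listing each edge only once keeps the edge set roughly twice the number of pixels, which matters for the memory usage of the first scale.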

In addition, each node is represented by a similarity function denoted by
_{i}_{i}

More in detail,

Thus, similarity functions lead to the concept of likelihood between the nodes joined by an edge, providing a definition of weights within graph 𝒢.

Given a graph 𝒢 = (𝒱, ℰ), the similarity among pair of nodes is provided by means of weights 𝒲, which are defined for each scale _{i}_{j}_{i}_{j}

^{[}^{s}^{]} associated to a pair of nodes _{i}_{j}

Therefore, graph 𝒢 = (𝒱, ℰ, 𝒲) contains not only structural information on a given scale

Furthermore, 𝒲_{i,j} can be regarded as the weight associated with edge e_{i,j}, so that 𝒲_{i,j} = 𝒲(e_{i,j}). Notice that weights are not defined for every pair of nodes in 𝒱, but only for those pairs of nodes connected by an edge in ℰ.

Some properties can be extracted from the definition of 𝒲_{i,j} ∈ 𝒲 as the similarity between two nodes _{i}_{j}_{i,j} satisfies ∀

𝒲_{i,j} ≥ 0

𝒲_{i,j} = 𝒲_{j,i}

𝒲_{i,j} = 1 ↔ _{i}_{j}

Property (1) results from the definition given by _{i}_{j}

These former properties stand for each scale
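As an illustration only, one similarity between two Gaussian node descriptions that is non-negative, symmetric and equal to 1 only for identical Gaussians is the Bhattacharyya coefficient. The choice of this coefficient is an assumption of this sketch; the exact weight definition used by the method is not reproduced here, and the function name `gaussian_similarity` is hypothetical.

```python
import math


def gaussian_similarity(mu_i, sigma_i, mu_j, sigma_j):
    """Bhattacharyya coefficient between N(mu_i, sigma_i^2) and N(mu_j, sigma_j^2).

    Assumed stand-in for the weight between two nodes; both sigmas must be > 0.
    """
    var_sum = sigma_i ** 2 + sigma_j ** 2
    scale = math.sqrt(2.0 * sigma_i * sigma_j / var_sum)
    return scale * math.exp(-((mu_i - mu_j) ** 2) / (4.0 * var_sum))


print(gaussian_similarity(10.0, 2.0, 10.0, 2.0))   # identical Gaussians give 1.0
print(gaussian_similarity(10.0, 2.0, 40.0, 3.0))   # dissimilar Gaussians give a value near 0
```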

Furthermore, each node _{i}

On the other hand, the essence of this algorithm relies on aggregation, which consists of grouping and clustering those similar nodes/segments in subgraphs, according to some criteria along scales.

The proposed method bases the aggregation procedure on the weights in 𝒲: pairs of nodes/subgraphs with higher weights are more similar than those with lower weights and therefore deserve to be aggregated under the same segment/subgraph. Thus, a function must be defined to impose an order on set 𝒲, so that the subgraphs in subsequent scales contain nodes with high weights and, therefore, high similarity.

Let Ω be an ordering function, which orders the edges in ℰ according to 𝒲, as follows: for edges e_{i,j}, e_{i,k} ∈ ℰ, if 𝒲_{i,j} = 𝒲(e_{i,j}) ≥ 𝒲_{i,k} = 𝒲(e_{i,k}), then Ω(e_{i,j}) ≥ Ω(e_{i,k}).
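In practice, an ordering function of this kind amounts to sorting the edges by decreasing weight. The following minimal sketch assumes that weights are stored in a dictionary keyed by edge; the name `order_edges` and the sample values are hypothetical.

```python
def order_edges(weights):
    """Return the edges sorted from highest to lowest weight.

    `weights` maps an edge (i, j) to its similarity weight.
    """
    return sorted(weights, key=weights.get, reverse=True)


weights = {(0, 1): 0.91, (1, 2): 0.35, (0, 3): 0.78}
print(order_edges(weights))  # [(0, 1), (0, 3), (1, 2)]
```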

In other words, let _{1}, . . ., _{m}_{i}_{j}_{i}_{𝒲} represent the weight set 𝒲 after Ω is applied.

Once the concept of ordering function is introduced, the algorithm aggregates pair of nodes based on this former weight ordering, ensuring that the dispersion of each segment remains bounded. This aggregation criteria is represented by the _{i,j} represent the dispersion of aggregating nodes _{i}_{j}

Once pairs of nodes have been ordered and an aggregation criteria have been stated, the Gaussian Multiscale Algorithm aggregates pair of nodes with previous criteria _{i,j} holds, otherwise, different segments are assigned to previous pair of nodes.
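The interplay between the weight ordering and a bounded-dispersion criterion can be sketched as a greedy merge over the ordered edges. This is a simplified illustration under stated assumptions, not the method's exact rule: the dispersion is taken to be the standard deviation of the intensities in the merged segment, the bound `max_dispersion` is a hypothetical parameter, and segments are tracked with a union-find structure.

```python
import math


def aggregate(intensities, ordered_edges, max_dispersion):
    """Greedily merge nodes along edges, in the given order.

    A merge is accepted only if the standard deviation of the merged
    segment's intensities stays at or below `max_dispersion`
    (a hypothetical bound, chosen for illustration).
    Returns a list giving the segment label of every node.
    """
    parent = list(range(len(intensities)))
    # Per-segment statistics: (count, sum, sum of squares).
    stats = [(1, x, x * x) for x in intensities]

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in ordered_edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # already in the same segment
        n = stats[ri][0] + stats[rj][0]
        s = stats[ri][1] + stats[rj][1]
        ss = stats[ri][2] + stats[rj][2]
        variance = max(ss / n - (s / n) ** 2, 0.0)
        if math.sqrt(variance) <= max_dispersion:
            parent[rj] = ri
            stats[ri] = (n, s, ss)
    return [find(i) for i in range(len(intensities))]


labels = aggregate([10, 12, 11, 200, 205], [(0, 1), (1, 2), (3, 4), (2, 3)], 5.0)
print(labels)  # [0, 0, 0, 3, 3]
```

In this toy example the edge joining the dark and bright groups is rejected because merging them would exceed the dispersion bound, so two segments remain.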

In addition, the number of assigned graphs in scale _{i,j} = 1 − _{i,j} as follows:
_{i,j} is defined as
_{𝒲}, until whether every element in Ω_{𝒲} is evaluated or every node in 𝒱 is assigned a segment in subsequent scale.

Gaussian Multiscale Aggregation assures that every node in scale

After aggregation, nodes in scale ^{[}^{s}^{]}, being ^{[}^{s}^{]} the number of nodes in scale

Consequently, let
^{[0]} is represented by a gaussian function of mean and deviation corresponding to the average and dispersion intensity of their neighbour nodes, as stated before.

Therefore, similarity functions can be completely defined as in

Concerning location, the position of subgraphs is obtained by averaging the position of the nodes contained on each subgraph. This is essential in order to provide a neighbourhood structure, since after aggregation every scale
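Averaging node positions to locate each subgraph can be written compactly. The sketch below assumes nodes are given as (x, y) tuples with one segment label per node; `segment_centroids` is a hypothetical helper name.

```python
def segment_centroids(positions, labels):
    """Average the (x, y) positions of the nodes in each segment."""
    sums = {}
    for (x, y), label in zip(positions, labels):
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


positions = [(0, 0), (2, 0), (1, 3), (10, 10)]
labels = [0, 0, 0, 1]
print(segment_centroids(positions, labels))  # {0: (1.0, 1.0), 1: (10.0, 10.0)}
```

The resulting centroids are the points that a triangulation step would take as input on the next scale.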

A Delaunay graph for a set _{1}, . . ., _{n}_{i}_{j}_{i}_{j}^{2}, knowing the locations of the endpoints permits a solution in

This operation represents the final step in the loop, since at this moment, there exist a new subgraph
^{[}^{s}^{+1]} are provided by Delaunay triangulation, and weights 𝒲^{[}^{s}^{+1]} are obtained based on

The whole loop is repeated until only two subgraphs remain, as stated at the beginning of this section. However, due to the constraints provided to aggregate (^{[}^{s}^{]} a factor able to avoid aggregation method from being stuck in the loop. This factor can be dynamically increased or decreased, according to previous method necessities. However, this value is initially set to ^{[}^{s}^{]} = 0.01, for each scale ^{[}^{s}^{]} to adapt the necessities of the algorithm remains as future work.

The computational cost of this algorithm is quasi-linear with the number of pixels, since each scale gathers nodes in the sense that nodes in subsequent scales are reduced by (in practice) a three times factor (
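The effect of a threefold reduction per scale on the total amount of work can be checked numerically. The sketch assumes an exact reduction factor of three at every scale and a cost per scale proportional to the number of nodes; both are simplifications, and `total_nodes` is a hypothetical helper.

```python
def total_nodes(n_pixels, reduction=3, stop_at=2):
    """Sum the number of nodes over all scales, dividing by `reduction` per scale."""
    total, nodes = 0, n_pixels
    while nodes > stop_at:
        total += nodes
        nodes //= reduction
    return total


n = 480_000
print(total_nodes(n) / n)  # stays below 1.5, the limit of the geometric series
```

Under these assumptions the nodes processed over all scales never exceed 1.5 times the number of pixels, so the first scale dominates the cost.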

Having presented the algorithm, the next section describes the creation of the database involved in the evaluation.

This section describes the creation of a synthetic database containing a total of 408,000 images of hands with a wide range of possible backgrounds like carpets, fabric, glass, grass, mud, different objects, paper, parquet, pavement, plastic, skin and fur, sky, soil, stones, tiles, tree, wall and wood.

The main aim of this database is twofold:

First, it provides a comparative evaluation framework for segmentation algorithms, in which existing approaches from the literature can be compared. In other words, this database makes it possible to assess to what extent a segmentation algorithm can satisfactorily isolate the hand from the background in real scenarios.

In addition, this database contains the ground-truth result for each image, providing a possible supervised evaluation criterion. These ground-truth images could be obtained because the hands were captured against a blue-coloured background, so that the hand can be easily extracted by simple thresholding [

The creation of the synthetic database (named GB2S Database) considers the hands extracted in former database and the set of the aforementioned different textures, which were obtained from the website

First of all, a straightforward segmentation was carried out with a threshold-based segmentation [_{h}_{b}

Afterwards, both masks are laid one over each other, with _{b}

In order to ensure there is no considerable difference in illumination between hand and background, each image is converted from RGB to YCbCr color space [
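For reference, a common RGB to YCbCr conversion is sketched below. It uses the full-range ITU-R BT.601 coefficients (as in JFIF); which YCbCr variant the database creation actually used, and how the illumination of hand and background was then matched, are not stated here, so this is an assumption.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to full-range YCbCr (BT.601 coefficients, as in JFIF)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


# White carries full luminance and neutral chroma (approximately 255, 128, 128).
print(rgb_to_ycbcr(255, 255, 255))
```

Separating luminance (Y) from chroma (Cb, Cr) is what makes it possible to compare or adjust illumination without altering colour.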

All these former operations attempt to ensure a fair scenario, simulating the conditions provided in real situations.

For each hand image, a total of 5 × 17 (five images and 17 textures) synthetic images are created, collecting a total of 120 × 2 × 20 × 5 × 17 = 408,000 images (120 individuals, two hands, 20 acquisitions per hand, five images and 17 textures) to properly evaluate segmentation on real scenarios. Some visual examples of this database are provided in
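The database size follows directly from the factors listed above, as this short check shows (the variable names are illustrative):

```python
individuals, hands, acquisitions = 120, 2, 20
images_per_hand_image, textures = 5, 17

hand_images = individuals * hands * acquisitions
synthetic_images = hand_images * images_per_hand_image * textures
print(hand_images, synthetic_images)  # 4800 408000
```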

This presented database is publicly available at

Once the database has been presented, the following section presents the evaluation of the algorithm and the results obtained.

This section contains the results of the comparative evaluation of the proposed approach to LDC [

Although there exist some unsupervised evaluation methods for image segmentation [

The proposed evaluation method is based on F-measure, [

Aiming at a fair comparison, the proposed algorithm is compared to two competitive segmentation methods from the literature, namely Lossy Data Compression (LDC) [

The evaluation of a segmentation method involves different aspects concerning accuracy, computational cost and parameter dependency.

The first aspect is the extent to which the algorithm is able to properly detect or isolate a specific object within an image. Concretely, in this paper, accuracy is understood as the capability of the proposed algorithm to properly isolate the hand from the background.

In addition, accuracy can also be evaluated visually.
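A supervised accuracy score of this kind can be computed from a segmentation mask and its ground truth. The sketch below uses the standard definition of the F-measure as the harmonic mean of precision and recall over hand pixels; whether the evaluation in this paper uses exactly this unweighted form is an assumption, and `f_measure` is a hypothetical helper.

```python
def f_measure(predicted, ground_truth):
    """F-measure between two binary masks given as flat sequences of 0/1."""
    tp = sum(1 for p, g in zip(predicted, ground_truth) if p and g)
    fp = sum(1 for p, g in zip(predicted, ground_truth) if p and not g)
    fn = sum(1 for p, g in zip(predicted, ground_truth) if not p and g)
    if tp == 0:
        return 0.0  # no hand pixel was correctly detected
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


predicted = [1, 1, 1, 0, 0, 0]
ground_truth = [1, 1, 0, 1, 0, 0]
print(f_measure(predicted, ground_truth))  # precision = recall = 2/3, so F = 2/3
```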

Secondly, concerning computational cost,

The results provided in

Finally, this section will study the dependency of two parameters strongly related to algorithm performance, namely

Factor

During the explanation of the method, the algorithm is said to be quasi-linear with the number of pixels. This statement is supported by

The application of hand biometrics to unconstrained, contact-less and platform-free environments increases the difficulty of the pre-processing and segmentation procedure in hand acquisition. Therefore, an unsupervised segmentation algorithm based on Gaussian multiscale aggregation has been proposed. This method iteratively groups pixels that are similar in texture and color into segments, until a certain number of clusters/segments is obtained.

This method is able to isolate the hand from a wide range of backgrounds (carpets, fabric, glass, grass, mud, different objects, paper, parquet, pavement, plastic, skin and fur, sky, soil, stones, tiles, tree, wall and wood), simulating real situations and unconstrained background scenarios.

In addition, the proposed approach has been evaluated on a publicly available synthetic database containing 408,000 hand image acquisitions with different background textures. The evaluation consisted of comparing accuracy and computational cost against two competitive segmentation methods from the literature, namely Lossy Data Compression (LDC) [

The results obtained show that the proposed algorithm outperforms existing segmentation algorithms in the literature, regarding not only accuracy and computational cost, but also memory usage, since the proposed algorithm is quasi-linear in the number of pixels.

As future work, we consider to implement the method with a dynamic

This research has been supported by the Ministry of Industry, Tourism and Trade of Spain, in the framework of the project CENIT-Segur@, reference CENIT-2007 2004.

Visual representation of two functions ^{[}^{s}^{]} and the weighted

Samples from the synthetic database in different backgrounds for a given acquisition.

A comparative study of the results provided by the segmentation algorithms in comparison to ground truth. The first column gathers examples from the first database, together with their segmentation in the second column, considered as ground truth. The third column presents synthetic images based on the first-column images, with the final segmentation result in the fourth column. The last two columns present the segmentation results provided by the Lossy Data Compression (LDC) [

Dependency of the aggregation process on parameter

Proportion of processing time for each scale. Most of the time is required by the aggregation procedure on the first scale.

Segmentation evaluation by means of F-measure in database GB2S with 17 different background textures, together with the corresponding standard deviation. In addition, the results for LDC and NCut are also provided for comparison.

Texture | Proposed | LDC | NCut |
---|---|---|---|
Carpets | 92.1 ± 0.1 | 73.7 ± 0.3 | 65.1 ± 0.3 |
Paper | 91.3 ± 0.1 | 83.2 ± 0.2 | 72.8 ± 0.4 |
Stones | 91.2 ± 0.1 | 78.2 ± 0.4 | 71.5 ± 0.3 |
Fabric | 88.4 ± 0.3 | 65.3 ± 0.1 | 60.1 ± 0.2 |
Parquet | 88.3 ± 0.2 | 66.1 ± 0.2 | 62.3 ± 0.3 |
Tiles | 90.1 ± 0.2 | 71.5 ± 0.3 | 68.7 ± 0.2 |
Glass | 94.1 ± 0.1 | 75.8 ± 0.1 | 71.4 ± 0.1 |
Pavement | 88.9 ± 0.2 | 67.8 ± 0.1 | 63.7 ± 0.2 |
Tree | 96.0 ± 0.2 | 73.4 ± 0.2 | 67.2 ± 0.1 |
Grass | 93.3 ± 0.2 | 70.1 ± 0.1 | 65.3 ± 0.2 |
Skin and Fur | 95.3 ± 0.3 | 82.3 ± 0.2 | 71.8 ± 0.3 |
Wall | 94.1 ± 0.1 | 70.9 ± 0.2 | 62.3 ± 0.2 |
Mud | 89.5 ± 0.2 | 68.3 ± 0.1 | 60.1 ± 0.2 |
Sky | 96.1 ± 0.1 | 77.2 ± 0.2 | 71.3 ± 0.1 |
Wood | 93.5 ± 0.1 | 82.5 ± 0.2 | 73.5 ± 0.1 |
Objects | 92.0 ± 0.1 | 70.1 ± 0.1 | 61.6 ± 0.3 |
Soil | 89.0 ± 0.2 | 67.2 ± 0.3 | 59.7 ± 0.2 |

Relation between time performance (in seconds), the dimension of the image, and the size in number of pixels, comparing the proposed method with LDC approach and Normalized Cuts (NCut).

Image Dimensions | Number of Pixels | Proposed (seconds) | LDC (seconds) | NCut (seconds) |
---|---|---|---|---|
600 × 800 | 480,000 | 30.1 | 233.1 | 321.7 |
450 × 600 | 270,000 | 19.8 | 63.4 | 129.5 |
300 × 400 | 120,000 | 9.4 | 52.1 | 25.1 |
150 × 200 | 30,000 | 3.1 | 32.8 | 7.2 |
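One way to read the timing table is to normalise the time of the proposed method by the number of pixels. The sketch below copies the figures from the table; the helper `microseconds_per_pixel` and the choice of unit are illustrative.

```python
# pixels: (proposed, LDC, NCut) in seconds, copied from the table above
timings = {
    480_000: (30.1, 233.1, 321.7),
    270_000: (19.8, 63.4, 129.5),
    120_000: (9.4, 52.1, 25.1),
    30_000: (3.1, 32.8, 7.2),
}


def microseconds_per_pixel(pixels, seconds):
    """Processing time normalised by image size."""
    return 1e6 * seconds / pixels


for pixels, (proposed, ldc, ncut) in sorted(timings.items()):
    print(pixels, round(microseconds_per_pixel(pixels, proposed), 1))
```

For the proposed method the per-pixel time stays within a factor of two over a sixteenfold range of image sizes, which is consistent with the quasi-linear behaviour claimed for the algorithm.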