# An Interactive Image Segmentation Method in Hand Gesture Recognition


## Abstract


## 1. Introduction

## 2. Modelling of Hand Gesture Images

#### 2.1. Single Gaussian Model

${\mathit{x}}_{i}$ in dataset $\mathit{X}=\left\{{\mathit{x}}_{1},{\mathit{x}}_{2},\dots ,{\mathit{x}}_{n}\right\}$ should be at least 3-dimensional. To address this problem, the concept of the multi-dimensional Gaussian distribution is introduced. The definition of the d-dimensional Gaussian distribution is:

$$N(\mathit{x};\mathit{\mu},\mathrm{\Sigma})=\frac{1}{\sqrt{{(2\pi )}^{d}\left|\mathrm{\Sigma}\right|}}\mathrm{exp}\left(-\frac{1}{2}{(\mathit{x}-\mathit{\mu})}^{T}{\mathrm{\Sigma}}^{-1}(\mathit{x}-\mathit{\mu})\right)\qquad (3)$$

where **μ** is a d-dimensional vector; for the RGB model, the components of **μ** represent the average red, green and blue color density values. $\mathrm{\Sigma}$ is the covariance matrix and ${\mathrm{\Sigma}}^{-1}$ is its inverse. ${(\mathit{x}-\mathit{\mu})}^{T}$ is the transpose of $(\mathit{x}-\mathit{\mu})$. To simplify Equation (3), θ is introduced to represent the parameters **μ** and $\mathrm{\Sigma}$, so the probability density function of the d-dimensional Gaussian distribution can be written as:

$$p(\mathit{x};\theta )=N(\mathit{x};\mathit{\mu},\mathrm{\Sigma})$$
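As a minimal sketch of how the d-dimensional Gaussian density could be evaluated for one RGB pixel, the snippet below uses NumPy. The function name `gaussian_pdf` and the sample pixel, mean and covariance values are hypothetical and chosen only for illustration; they are not taken from the experiments.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of a d-dimensional Gaussian N(x; mu, sigma) at a single point x."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    d = mu.shape[0]
    diff = x - mu
    # Mahalanobis term (x - mu)^T Sigma^{-1} (x - mu), computed by solving
    # a linear system instead of forming the explicit inverse.
    maha = diff @ np.linalg.solve(sigma, diff)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(sigma))
    return float(np.exp(-0.5 * maha) / norm)

# Hypothetical RGB pixel and model parameters, for illustration only.
pixel = [120.0, 90.0, 80.0]
mu = [118.0, 92.0, 79.0]
sigma = np.diag([25.0, 16.0, 9.0])
density = gaussian_pdf(pixel, mu, sigma)
```

Solving the linear system rather than inverting $\mathrm{\Sigma}$ is a design choice of this sketch; it gives the same value and tends to be better conditioned.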

#### 2.2. Gaussian Mixture Model of RGB Image

Here ${\pi}_{i}$ is the prior probability of **x** belonging to the i-th single Gaussian model, and $\sum _{i=1}^{k}{\pi}_{i}=1$. ${p}_{i}(\mathit{x};{\theta}_{i})$ is the probability density function of the i-th single Gaussian model, parameterized by ${\mathit{\mu}}_{i}$ and ${\mathrm{\Sigma}}_{i}$ in ${N}_{i}(\mathit{x};{\mathit{\mu}}_{i},{\mathrm{\Sigma}}_{i})$. $\mathrm{\Theta}$ is introduced as a parameter set [23], $\{{\pi}_{1},{\pi}_{2},\dots ,{\pi}_{k},{\theta}_{1},{\theta}_{2},\dots ,{\theta}_{k}\}$, to denote all ${\pi}_{i}$ and ${\theta}_{i}$.

**X** as a sample, its probability density is:

**X**. We then seek a set of parameters $\mathrm{\Theta}$ to complete the modelling. According to the maximum likelihood method [24], the next task is to find $\widehat{\mathrm{\Theta}}$ where:

**X** to estimate $\mathrm{\Theta}$, $\mathrm{\Theta}$ becomes the variable and **X** the fixed parameter, which is denoted in the second form. The value of $p(\mathit{X};\mathrm{\Theta})$ is usually too small to be represented by a computer, so it is replaced with the log-likelihood function [25]:
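The numerical motivation for the log-likelihood can be sketched as follows. The snippet assumes the pixels are treated as independent samples, so that the log-likelihood is a sum over pixels of the log of the mixture density, and it uses the log-sum-exp trick for stability. The helper names `log_gaussian` and `gmm_log_likelihood` and the synthetic data are assumptions of this sketch, not part of the original method.

```python
import numpy as np

def log_gaussian(X, mu, sigma):
    """Log-density of N(x; mu, sigma) for every row x of X (shape n x d)."""
    d = X.shape[1]
    diff = X - mu
    maha = np.einsum("nd,nd->n", diff @ np.linalg.inv(sigma), diff)
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def gmm_log_likelihood(X, pis, mus, sigmas):
    """Sum over samples of ln(sum_i pi_i N(x_j; mu_i, Sigma_i)), assuming i.i.d. samples."""
    # log_terms[i, j] = ln(pi_i) + ln N(x_j; mu_i, Sigma_i)
    log_terms = np.stack(
        [np.log(p) + log_gaussian(X, m, s) for p, m, s in zip(pis, mus, sigmas)]
    )
    # Log-sum-exp over the k components avoids underflow of tiny densities.
    peak = log_terms.max(axis=0)
    return float(np.sum(peak + np.log(np.exp(log_terms - peak).sum(axis=0))))

# Synthetic data and parameters, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
pis = [0.5, 0.5]
mus = [np.zeros(3), np.ones(3)]
sigmas = [np.eye(3), np.eye(3)]
loglik = gmm_log_likelihood(X, pis, mus, sigmas)
```

With a few thousand samples, the plain product of densities falls below the smallest representable double, while the sum of logarithms stays finite.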

#### 2.3. Expectation Maximum Algorithm

${Q}_{i}({\mathit{x}}_{j})$ is a posterior probability of ${\pi}_{i}$; in other words, it is the posterior probability of each ${\mathit{x}}_{j}$ from the dataset **X** belonging to the i-th single Gaussian model.

- Initialization: Initialize ${\mathit{\mu}}_{i0}$ with random numbers [27], and use identity matrices as the covariance matrices ${\mathrm{\Sigma}}_{i0}$ to start the first iteration. The mixing coefficients (prior probabilities) are set to ${\pi}_{i0}=\frac{1}{k}$.
- E-step: Compute the posterior probability of ${\pi}_{i}$ using current parameters:$${Q}_{i}({\mathit{x}}_{j}):=\frac{{\pi}_{i}{p}_{i}({\mathit{x}}_{j};{\theta}_{i})}{{\displaystyle \sum _{t=1}^{k}{\pi}_{t}{p}_{t}({\mathit{x}}_{j};{\theta}_{t})}}=\frac{{\pi}_{i}N({\mathit{x}}_{j};{\mathit{\mu}}_{i},{\mathrm{\Sigma}}_{i})}{{\displaystyle \sum _{t=1}^{k}{\pi}_{t}N({\mathit{x}}_{j};{\mathit{\mu}}_{t},{\mathrm{\Sigma}}_{t})}}$$
- M-step: Update the parameters:$${\pi}_{i}:=\frac{1}{n}{\displaystyle \sum _{j=1}^{n}{Q}_{i}({\mathit{x}}_{j})}$$$${\mathit{\mu}}_{i}:=\frac{{\displaystyle \sum _{j=1}^{n}{Q}_{i}({\mathit{x}}_{j}){\mathit{x}}_{j}}}{{\displaystyle \sum _{j=1}^{n}{Q}_{i}({\mathit{x}}_{j})}}$$$${\mathrm{\Sigma}}_{i}:=\frac{{\displaystyle \sum _{j=1}^{n}{Q}_{i}({\mathit{x}}_{j})({\mathit{x}}_{j}-{\mathit{\mu}}_{i}){({\mathit{x}}_{j}-{\mathit{\mu}}_{i})}^{T}}}{{\displaystyle \sum _{j=1}^{n}{Q}_{i}({\mathit{x}}_{j})}}$$
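The three steps above can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the initial means are drawn from the data points (one possible reading of "random numbers"), a small ridge `reg` is added to each covariance to keep it invertible, and the data are assumed to be scaled so that the densities do not underflow. The names `em_gmm` and `gaussian_pdf_rows` are hypothetical.

```python
import numpy as np

def gaussian_pdf_rows(X, mu, sigma):
    """N(x; mu, sigma) for every row x of X (shape n x d)."""
    d = X.shape[1]
    diff = X - mu
    maha = np.einsum("nd,nd->n", diff @ np.linalg.inv(sigma), diff)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(sigma))
    return np.exp(-0.5 * maha) / norm

def em_gmm(X, k, n_iter=50, seed=0, reg=1e-6):
    """Fit a k-component GMM to X with the E-step and M-step updates."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # Initialization: random means (drawn from the data, an assumption),
    # identity covariances, uniform mixing coefficients pi_i = 1/k.
    mus = X[rng.choice(n, size=k, replace=False)].astype(float)
    sigmas = np.stack([np.eye(d) for _ in range(k)])
    pis = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: Q[i, j] = pi_i N(x_j; mu_i, Sigma_i) / sum_t pi_t N(x_j; mu_t, Sigma_t)
        weighted = np.stack(
            [pis[i] * gaussian_pdf_rows(X, mus[i], sigmas[i]) for i in range(k)]
        )
        Q = weighted / weighted.sum(axis=0, keepdims=True)
        # M-step: update mixing coefficients, means and covariances.
        mass = Q.sum(axis=1)
        pis = mass / n
        mus = (Q @ X) / mass[:, None]
        for i in range(k):
            diff = X - mus[i]
            # reg * I is an added safeguard of this sketch, not part of the updates.
            sigmas[i] = (Q[i][:, None] * diff).T @ diff / mass[i] + reg * np.eye(d)
    return pis, mus, sigmas

# Two synthetic 3-D clusters, for illustration only.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (200, 3)), rng.normal(5.0, 1.0, (200, 3))])
pis, mus, sigmas = em_gmm(X, k=2)
```

For raw 8-bit RGB values, an E-step computed in the log domain would be more robust than the direct densities used here.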

## 3. Interactive Image Segmentation

**α**. By introducing it, the segmentation problem becomes a pixel labelling problem. With ${\alpha}_{j}\in \{0,1\}$, the value 0 labels background pixels and 1 labels foreground pixels.

${\mathit{x}}_{j}$, either from the background or the foreground model, is marked as ${\alpha}_{j}=0$ or 1. The parameters of each component become: ${\theta}_{i}=\{{\pi}_{i}({\alpha}_{j}),{\mathit{\mu}}_{i}({\alpha}_{j}),{\mathrm{\Sigma}}_{i}({\alpha}_{j});{\alpha}_{j}=0,1,i=1,\dots ,k\}$.

#### 3.1. Gibbs Random Field

**A** being in the state **a**. T is a constant parameter that corresponds to temperature in physics; its value is usually 1. $Z(T)$ is the partition function, and:

**a**. To apply GRF in image segmentation, the Gibbs energy [30] can be defined as follows:

**N**:

**N**, to adjust the exponential term. $E(x)$ in the equation below is the expectation:

#### 3.2. Automatic Seed Selection

**U** [31].

**B** is the background seed pixel set and **O** is the foreground seed set. After training over the training set **X**, the set **O** is obtained as the segmentation result, with $O\subset U$. The three pixel sets are shown in Figure 5.

**O**. We also define the pixels on the image edges as background seeds, which belong to set **B**, because the gestures are usually located far away from the edges of the images. The results of seed selection are displayed in Figure 6.
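The background-seed rule can be illustrated with a short sketch that collects the coordinates of all pixels on the image border. The function name `border_background_seeds` and the one-pixel border width are assumptions of this sketch; the foreground seed selection is not reproduced here.

```python
def border_background_seeds(height, width):
    """Return the set B of (row, col) coordinates lying on the image border."""
    seeds = set()
    for r in range(height):
        for c in range(width):
            # A pixel is a background seed if it lies on the first or last
            # row, or on the first or last column.
            if r in (0, height - 1) or c in (0, width - 1):
                seeds.add((r, c))
    return seeds

# Hypothetical 4 x 5 image, for illustration only.
B = border_background_seeds(4, 5)
```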

#### 3.3. Min-Cut/Max-Flow Algorithm

**N**, from pixel to pixel, from pixel to S and from pixel to T, denoted as $\overline{{\mathit{x}}_{u}{\mathit{x}}_{\mathit{v}}},\ \overline{{\mathit{x}}_{u}\mathrm{S}},\ \overline{{\mathit{x}}_{u}\mathrm{T}}$. Each link is assigned a weight, that is, the cost [34] incurred when the link is cut, as detailed in Table 1.

**U** region. Secondly, the parameter set $\mathrm{\Theta}$ is learned from the whole pixel set **X**. Thirdly, the min-cut is used to minimize the Gibbs energy of the whole image. The procedure then returns to the first step to start another round; after eight rounds, the optimal segmentation is achieved.
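A single min-cut step can be sketched on a toy one-dimensional "image" using a plain Edmonds-Karp max-flow. The link weights follow the three link types of Table 1, but the data terms for unknown pixels are replaced by a simple stand-in (squared distance to fixed intensities 0.0 and 1.0) instead of the GMM-based energy. The functions `min_cut` and `segment_row` and all numeric values are hypothetical, and the iterative re-learning of $\mathrm{\Theta}$ is omitted.

```python
import math
from collections import deque

def min_cut(capacity, source, sink):
    """Edmonds-Karp max-flow; returns (cut value, set of nodes on the source side)."""
    # Residual capacities, with reverse edges added at zero capacity.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0.0)
    flow = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            # Nodes still reachable from the source form the source side of the cut.
            return flow, set(parent)
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= push
            residual[v][u] += push
        flow += push

def segment_row(pixels, fg_seeds, bg_seeds, beta=10.0):
    """Label each pixel 1 (foreground, S side) or 0 (background, T side)."""
    n = len(pixels)
    cap = {u: {} for u in list(range(n)) + ["S", "T"]}
    # Pixel-to-pixel links: exp(-beta * ||x_u - x_v||^2) between neighbours.
    for u in range(n - 1):
        w = math.exp(-beta * (pixels[u] - pixels[u + 1]) ** 2)
        cap[u][u + 1] = w
        cap[u + 1][u] = w
    # K exceeds the total neighbour weight of any pixel, so seed links are never cut.
    K = 1.0 + max(sum(cap[u].values()) for u in range(n))
    for u in range(n):
        if u in fg_seeds:
            to_s, to_t = K, 0.0
        elif u in bg_seeds:
            to_s, to_t = 0.0, K
        else:
            # Stand-in data terms (assumption), replacing the GMM-based energy.
            to_s = (pixels[u] - 0.0) ** 2  # cost of labelling the pixel background
            to_t = (pixels[u] - 1.0) ** 2  # cost of labelling the pixel foreground
        cap["S"][u] = to_s
        cap[u]["T"] = to_t
    _, source_side = min_cut(cap, "S", "T")
    return [1 if u in source_side else 0 for u in range(n)]

# Toy intensities: first pixel is a background seed, last a foreground seed.
labels = segment_row([0.1, 0.2, 0.8, 0.9], fg_seeds={3}, bg_seeds={0})
```

In this toy case the cheapest cut separates the two dark pixels from the two bright ones, because the link between them has the smallest weight.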

## 4. Experimental Comparison

#### 4.1. Region Accuracy

${F}_{\beta}^{w}$-measure [38]. Compared with the normal ${F}_{\beta}$-measure, the two terms Precision and Recall become:

${\mathrm{Recall}}^{w}$ as to ${\mathrm{Precision}}^{w}$; normally β = 1. We then apply the ${F}_{1}^{w}$-measure to calculate the RA of different segmentation results. The higher the RA, the better the segmentation.
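For reference, the normal (unweighted) $F_{\beta}$-measure mentioned above can be sketched as below for two binary masks given as flat lists. This is only the standard definition; the weighted Precision and Recall of [38] are not reproduced, and the function name `f_beta` and the sample masks are hypothetical.

```python
def f_beta(pred, truth, beta=1.0):
    """Plain (unweighted) F_beta score for two equally sized binary masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    if tp == 0:
        # No true positives: the score is defined as zero here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)

# Hypothetical segmentation and ground-truth masks, for illustration only.
score = f_beta([1, 1, 0, 0, 1], [1, 0, 0, 1, 1], beta=1.0)
```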

#### 4.2. Boundary Accuracy

${B}_{GT}$ and ${B}_{SEG}$, as shown in Figure 10.

${B}_{GT}$ and $s\in {B}_{SEG}$, $\mathrm{dist}(\cdot )$ denotes the Euclidean distance, and $N(\cdot )$ is the number of pixels in the set. The value of BA shows the segmentation accuracy of boundaries.

#### 4.3. Results Analysis

## 5. Hand Gesture Recognition

## 6. Conclusions and Future Work

## Acknowledgments

## Author Contributions

## Conflicts of Interest

**Figure 3.** Color distributions of the gesture image. (**a**) Red distribution; (**b**) green distribution; (**c**) blue distribution.

Link Type | Weight | Precondition |
---|---|---|
$\overline{{\mathit{x}}_{u}{\mathit{x}}_{\mathit{v}}}$ | $\mathrm{exp}(-\beta {\Vert {\mathit{x}}_{u}-{\mathit{x}}_{v}\Vert}^{2})$ | ${\mathit{x}}_{u},{\mathit{x}}_{v}\in \mathit{N}$ |
$\overline{{\mathit{x}}_{u}\mathrm{S}}$ | $U(\alpha =0,i,\theta ,\mathit{X})$ | ${\mathit{x}}_{u}\in \mathit{U}$ |
$\overline{{\mathit{x}}_{u}\mathrm{S}}$ | K | ${\mathit{x}}_{u}\in \mathit{O}$ |
$\overline{{\mathit{x}}_{u}\mathrm{S}}$ | 0 | ${\mathit{x}}_{u}\in \mathit{B}$ |
$\overline{{\mathit{x}}_{u}\mathrm{T}}$ | $U(\alpha =1,i,\theta ,\mathit{X})$ | ${\mathit{x}}_{u}\in \mathit{U}$ |
$\overline{{\mathit{x}}_{u}\mathrm{T}}$ | 0 | ${\mathit{x}}_{u}\in \mathit{O}$ |
$\overline{{\mathit{x}}_{u}\mathrm{T}}$ | K | ${\mathit{x}}_{u}\in \mathit{B}$ |

where $K=1+\underset{{\mathit{x}}_{u}\in \mathit{X}}{\mathrm{max}}{\displaystyle \sum _{{\mathit{x}}_{u},{\mathit{x}}_{v}\in \mathit{N}}\mathrm{exp}(-\beta {\Vert {\mathit{x}}_{u}-{\mathit{x}}_{v}\Vert}^{2})}$

Gestures | Recognition Rates |
---|---|
Hand close | 86.7% |
Hand open | 73.3% |
Wrist extension | 100% |
Wrist flexion | 100% |
Fine pitch | 66.7% |
Overall rate | 85.3% |

Gestures | Recognition Rates |
---|---|
Hand close | 93.3% |
Hand open | 100% |
Wrist extension | 100% |
Wrist flexion | 100% |
Fine pitch | 100% |
Overall rate | 98.7% |

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license ( http://creativecommons.org/licenses/by/4.0/).

