# An Interactive Image Segmentation Method in Hand Gesture Recognition


## Abstract


## 1. Introduction

## 2. Modelling of Hand Gesture Images

#### 2.1. Single Gaussian Model

Each pixel sample ${\mathit{x}}_{i}$ in the dataset $\mathit{X}=\left\{{\mathit{x}}_{1},{\mathit{x}}_{2},\dots ,{\mathit{x}}_{n}\right\}$ should be at least 3-dimensional, since every RGB pixel carries three color components. To address this problem, the concept of the multi-dimensional Gaussian distribution is introduced. The probability density function of the d dimensional Gaussian distribution is:

$$N(\mathit{x};\mathit{\mu},\mathrm{\Sigma})=\frac{1}{{(2\pi)}^{d/2}{\left|\mathrm{\Sigma}\right|}^{1/2}}\mathrm{exp}\left(-\frac{1}{2}{(\mathit{x}-\mathit{\mu})}^{T}{\mathrm{\Sigma}}^{-1}(\mathit{x}-\mathit{\mu})\right)$$

where **μ** is a d dimensional mean vector; for the RGB model, each component of **μ** represents the average red, green or blue color density value. $\mathrm{\Sigma}$ is the covariance matrix and ${\mathrm{\Sigma}}^{-1}$ is its inverse matrix; ${(\mathit{x}-\mathit{\mu})}^{T}$ is the transpose of $(\mathit{x}-\mathit{\mu})$. To simplify the equation above, θ is introduced to represent the parameters **μ** and $\mathrm{\Sigma}$, so the probability density function of the d dimensional Gaussian distribution can be written as:

$$p(\mathit{x};\theta )=N(\mathit{x};\mathit{\mu},\mathrm{\Sigma})$$
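As a minimal numerical sketch of the density above (the mean and covariance values here are illustrative, not taken from the paper), the closed-form d-dimensional Gaussian can be evaluated directly with NumPy and cross-checked against the product of univariate normals, which is valid when the covariance is diagonal:

```python
import numpy as np

# Illustrative (not from the paper) 3-D Gaussian over RGB values:
# mu holds the mean R, G, B intensities, sigma their covariance.
mu = np.array([180.0, 140.0, 120.0])
sigma = np.diag([400.0, 300.0, 300.0])   # diagonal: independent channels

def gaussian_pdf(x, mu, sigma):
    """d-dimensional Gaussian density N(x; mu, Sigma)."""
    d = mu.size
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    return float(np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff) / norm)

pixel = np.array([175.0, 150.0, 118.0])
p = gaussian_pdf(pixel, mu, sigma)

# With a diagonal covariance the joint density factorizes into three
# univariate normals, which gives an independent cross-check.
p_1d = np.prod([np.exp(-0.5 * (pixel[c] - mu[c]) ** 2 / sigma[c, c])
                / np.sqrt(2 * np.pi * sigma[c, c]) for c in range(3)])
assert np.isclose(p, p_1d)
```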

#### 2.2. Gaussian Mixture Model of RGB Image

An RGB gesture image can be modelled by a mixture of k single Gaussian models. The mixing coefficient ${\pi}_{i}$ is the prior probability of a pixel **x** belonging to the i-th single Gaussian model, and $\sum _{i=1}^{k}{\pi}_{i}=1$. ${p}_{i}(\mathit{x};{\theta}_{i})$ is the probability density function of the i-th single Gaussian model, parameterized by ${\mu}_{i}$ and ${\mathrm{\Sigma}}_{i}$ in ${N}_{i}(\mathit{x};{\mathit{\mu}}_{i},{\mathrm{\Sigma}}_{i})$. $\mathrm{\Theta}$ is introduced as a parameter set [23], {${\pi}_{1},{\pi}_{2},\dots ,{\pi}_{k},{\theta}_{1},{\theta}_{2},\dots ,{\theta}_{k}$}, to denote all the ${\pi}_{i}$ and ${\theta}_{i}$.

Regarding the whole dataset **X** as a sample, its probability density is:

$$p(\mathit{X};\mathrm{\Theta})=\prod _{j=1}^{n}\sum _{i=1}^{k}{\pi}_{i}{p}_{i}({\mathit{x}}_{j};{\theta}_{i})$$

We then hope to find a set of parameters $\mathrm{\Theta}$ to finish the modelling. According to the maximum likelihood method [24], our next task is to find $\widehat{\mathrm{\Theta}}$ where:

$$\widehat{\mathrm{\Theta}}=\underset{\mathrm{\Theta}}{\mathrm{arg}\,\mathrm{max}}\ p(\mathit{X};\mathrm{\Theta})=\underset{\mathrm{\Theta}}{\mathrm{arg}\,\mathrm{max}}\ L(\mathrm{\Theta};\mathit{X})$$

Since we use the observed dataset **X** to estimate $\mathrm{\Theta}$, $\mathrm{\Theta}$ becomes the variable and **X** the fixed parameter, so the likelihood is denoted in the second form. The value of $p(\mathit{X};\mathrm{\Theta})$ is usually too small to be handled by a computer, so we replace it with the log-likelihood function [25]:

$$\mathrm{log}\,L(\mathrm{\Theta};\mathit{X})=\sum _{j=1}^{n}\mathrm{log}\sum _{i=1}^{k}{\pi}_{i}{p}_{i}({\mathit{x}}_{j};{\theta}_{i})$$
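The log-likelihood above can be sketched in a few lines of NumPy (a minimal illustration; the function name and the toy parameters are not from the paper):

```python
import numpy as np

def log_likelihood(X, pis, mus, sigmas):
    """log L(Theta; X) = sum_j log sum_i pi_i * N(x_j; mu_i, Sigma_i)."""
    n, d = X.shape
    k = len(pis)
    dens = np.empty((n, k))
    for i in range(k):
        diff = X - mus[i]
        inv = np.linalg.inv(sigmas[i])
        norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigmas[i]))
        # Mahalanobis distance of every sample to component i at once.
        dens[:, i] = np.exp(-0.5 * np.einsum('nd,de,ne->n', diff, inv, diff)) / norm
    return float(np.sum(np.log(dens @ np.asarray(pis))))

# Tiny sanity check: one standard-normal component, two 3-D samples.
X = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
ll = log_likelihood(X, [1.0], np.zeros((1, 3)), np.array([np.eye(3)]))
```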

#### 2.3. Expectation Maximum Algorithm

In the expectation maximum algorithm, the quantity ${Q}_{i}({\mathit{x}}_{j})$ is introduced. It is the posterior probability of ${\pi}_{i}$; in other words, the posterior probability of each ${\mathit{x}}_{j}$ from the dataset **X** belonging to the i-th single Gaussian model. The algorithm iterates the following steps:

- Initialization: Initialize ${\mathit{\mu}}_{i0}$ with random numbers [27], and use unit matrices as the covariance matrices ${\mathrm{\Sigma}}_{i0}$ to start the first iteration. The mixing coefficients, or prior probabilities, are initialized as ${\pi}_{i0}=\frac{1}{k}$.
- E-step: Compute the posterior probability of ${\pi}_{i}$ using the current parameters:$${Q}_{i}({\mathit{x}}_{j}):=\frac{{\pi}_{i}{p}_{i}({\mathit{x}}_{j};{\theta}_{i})}{{\displaystyle \sum _{t=1}^{k}{\pi}_{t}{p}_{t}({\mathit{x}}_{j};{\theta}_{t})}}=\frac{{\pi}_{i}N({\mathit{x}}_{j};{\mathit{\mu}}_{i},{\mathrm{\Sigma}}_{i})}{{\displaystyle \sum _{t=1}^{k}{\pi}_{t}N({\mathit{x}}_{j};{\mathit{\mu}}_{t},{\mathrm{\Sigma}}_{t})}}$$
- M-step: Update the parameters:$${\pi}_{i}:=\frac{1}{n}{\displaystyle \sum _{j=1}^{n}{Q}_{i}({\mathit{x}}_{j})}$$$${\mu}_{i}:=\frac{{\displaystyle \sum _{j=1}^{n}{Q}_{i}({\mathit{x}}_{j}){\mathit{x}}_{j}}}{{\displaystyle \sum _{j=1}^{n}{Q}_{i}({\mathit{x}}_{j})}}$$$${\mathrm{\Sigma}}_{i}:=\frac{{\displaystyle \sum _{j=1}^{n}{Q}_{i}({\mathit{x}}_{j})({\mathit{x}}_{j}-{\mu}_{i}){({\mathit{x}}_{j}-{\mathit{\mu}}_{i})}^{T}}}{{\displaystyle \sum _{j=1}^{n}{Q}_{i}({\mathit{x}}_{j})}}$$
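The E-step/M-step updates above can be sketched as a single NumPy iteration (a minimal illustration under the stated formulas, not the paper's implementation; the small diagonal jitter is an added numerical safeguard):

```python
import numpy as np

def em_step(X, pis, mus, sigmas):
    """One EM iteration for a k-component Gaussian mixture over pixels X."""
    n, d = X.shape
    k = len(pis)
    # E-step: posterior Q_i(x_j) for every pixel j and component i.
    dens = np.empty((n, k))
    for i in range(k):
        diff = X - mus[i]
        inv = np.linalg.inv(sigmas[i])
        norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigmas[i]))
        dens[:, i] = pis[i] * np.exp(
            -0.5 * np.einsum('nd,de,ne->n', diff, inv, diff)) / norm
    Q = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate pi_i, mu_i, Sigma_i from the posteriors.
    Nk = Q.sum(axis=0)
    pis = Nk / n
    mus = (Q.T @ X) / Nk[:, None]
    sigmas = []
    for i in range(k):
        diff = X - mus[i]
        sigmas.append((Q[:, i, None] * diff).T @ diff / Nk[i]
                      + 1e-6 * np.eye(d))   # jitter keeps Sigma invertible
    return pis, mus, np.array(sigmas)
```

In practice the step is repeated until the log-likelihood stops increasing.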

## 3. Interactive Image Segmentation

To segment the image, each pixel is given an opacity value **α**. By introducing it, we change the segmentation problem into a pixel labelling problem. As ${\alpha}_{j}\in \{1,0\}$, the value 0 is taken for labelling background pixels and 1 for foreground pixels. Each pixel ${\mathit{x}}_{j}$, drawn from either the background or the foreground model, is marked as ${\alpha}_{j}=1$ or 0. The parameters of each component become ${\theta}_{i}=\{{\pi}_{i}({\alpha}_{j}),{\mu}_{i}({\alpha}_{j}),{\mathrm{\Sigma}}_{i}({\alpha}_{j});\ {\alpha}_{j}=0,1,\ i=1,\dots ,k\}$.

#### 3.1. Gibbs Random Field

The Gibbs distribution gives the probability of a random field **A** being in the state **a**:

$$P(\mathit{A}=\mathit{a})=\frac{1}{Z(T)}\mathrm{exp}\left(-\frac{E(\mathit{a})}{T}\right)$$

T is a constant parameter, whose unit is temperature in physics, and its value is usually 1. $Z(T)$ is the partition function:

$$Z(T)=\sum _{\mathit{a}}\mathrm{exp}\left(-\frac{E(\mathit{a})}{T}\right)$$

where $E(\mathit{a})$ is the energy of the state **a**. To apply the GRF in image segmentation, the Gibbs energy [30] is defined as the sum of a data term, which measures how well each pixel fits the foreground or background model, and a smoothness term computed over the neighborhood system **N**:

$$V(\mathit{\alpha},\mathit{X})=\sum _{({\mathit{x}}_{u},{\mathit{x}}_{v})\in \mathit{N}}[{\alpha}_{u}\ne {\alpha}_{v}]\ \mathrm{exp}(-\beta {\Vert {\mathit{x}}_{u}-{\mathit{x}}_{v}\Vert}^{2})$$

The constant β is used in the smoothness term, computed over the neighborhood system **N**, to adjust the exponential term. $E(\cdot)$ in the equation below is the expectation:

$$\beta ={\left(2E({\Vert {\mathit{x}}_{u}-{\mathit{x}}_{v}\Vert}^{2})\right)}^{-1}$$
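As a small sketch of how β can be obtained from the expected squared color difference of neighboring pixels (the image here is a made-up toy; only horizontal neighbors are used for brevity):

```python
import numpy as np

# Toy RGB image; in practice this would be the gesture image.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 5, 3)).astype(float)

# Squared color differences of horizontally adjacent pixel pairs.
diffs = img[:, 1:, :] - img[:, :-1, :]
sq = np.sum(diffs ** 2, axis=-1)

# beta = (2 * E[||x_u - x_v||^2])^-1, then the smoothness weights.
beta = 1.0 / (2.0 * np.mean(sq))
w = np.exp(-beta * sq)
assert np.all((w > 0) & (w <= 1))
```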

#### 3.2. Automatic Seed Selection

The pixels are divided into three sets: the undetermined region **U** [31], the background seed pixel set **B**, and the foreground seed set **O**. After the training over the training set **X**, the set **O** is obtained as the segmentation result, with $O\subset U$. The three pixel sets are shown in Figure 5.

The automatically selected foreground seed pixels are added to set **O**. We also define the pixels on the image edges as background seeds, which belong to set **B**, because the gestures are usually located far away from the edges of the images. The results of seed selection are displayed in Figure 6 below.

#### 3.3. Min-Cut/Max-Flow Algorithm

A graph is constructed over the neighborhood system **N**, with links from pixel to pixel, from pixel to the source S, and from pixel to the sink T, denoted as $\overline{{\mathit{x}}_{u}{\mathit{x}}_{\mathit{v}}},\text{}\overline{{\mathit{x}}_{u}\mathrm{S}},\text{}\overline{{\mathit{x}}_{u}\mathrm{T}}$. Each link is assigned a certain weight, or cost [34], of being cut, as detailed in Table 1.

The segmentation is refined iteratively. Firstly, the pixels in the **U** region are assigned to the Gaussian components. Secondly, the parameter set $\mathrm{\Theta}$ is learned from the whole pixel set **X**. Thirdly, the min-cut is used to minimize the Gibbs energy of the whole image. The procedure then jumps back to the first step for another round; after eight iterations, the optimal segmentation is achieved.
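A toy illustration of the min-cut step (the three-pixel graph, its terminal costs, and its smoothness weights are made up for this sketch; a real implementation would use a dedicated graph-cut library). Here an Edmonds-Karp max-flow yields the minimum cut, and the pixels left on the source side are labelled foreground:

```python
from collections import deque

def max_flow(cap, s, t):
    """cap: dict {u: {v: capacity}}. Returns (flow value, source side)."""
    flow = {u: {v: 0.0 for v in cap[u]} for u in cap}
    def bfs():
        parent = {s: None}
        q = deque([s])
        while q:
            u = q.popleft()
            for v, c in cap[u].items():
                if v not in parent and c - flow[u][v] > 1e-12:
                    parent[v] = u
                    if v == t:
                        return parent
                    q.append(v)
        return parent
    while True:
        parent = bfs()
        if t not in parent:
            # Residual-reachable nodes form the source side of the min cut.
            return sum(flow[s].values()), set(parent)
        v, bottleneck = t, float('inf')
        while parent[v] is not None:          # find the bottleneck capacity
            u = parent[v]
            bottleneck = min(bottleneck, cap[u][v] - flow[u][v])
            v = u
        v = t
        while parent[v] is not None:          # augment along the path
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck
            v = u

# Directed capacities; pixel-pixel links get both directions.
cap = {
    'S':  {'p0': 5.0, 'p1': 1.0, 'p2': 0.2},
    'p0': {'T': 0.2, 'p1': 2.0, 'S': 0.0},
    'p1': {'T': 1.5, 'p0': 2.0, 'p2': 0.5, 'S': 0.0},
    'p2': {'T': 5.0, 'p1': 0.5, 'S': 0.0},
    'T':  {'p0': 0.0, 'p1': 0.0, 'p2': 0.0},
}
value, source_side = max_flow(cap, 'S', 'T')
# p0 and p1 stay connected to S (foreground); p2 falls to the T side.
```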

## 4. Experimental Comparison

#### 4.1. Region Accuracy

The region accuracy (RA) is evaluated with the weighted ${F}_{\beta}$-measure [38]. Compared with the normal ${F}_{\beta}$-measure, the two terms Precision and Recall become the weighted ${\mathrm{Precision}}^{w}$ and ${\mathrm{Recall}}^{w}$:

$${F}_{\beta}^{w}=(1+{\beta}^{2})\frac{{\mathrm{Precision}}^{w}\cdot {\mathrm{Recall}}^{w}}{{\beta}^{2}\cdot {\mathrm{Precision}}^{w}+{\mathrm{Recall}}^{w}}$$

where β signifies the effectiveness of ${\mathrm{Recall}}^{w}$ with respect to ${\mathrm{Precision}}^{w}$; normally β = 1. Then, we apply the ${F}_{1}^{w}$-measure to calculate the RA of different segmentation results. The higher the RA, the better the segmentation achieved.
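The combination step is simple enough to state directly (a sketch; the weighted precision/recall inputs below are hypothetical placeholders, not results from the paper):

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted F-beta combination of (weighted) precision and recall."""
    return (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)

# Hypothetical weighted precision/recall of one segmentation result.
ra = f_beta(0.92, 0.88)
```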

#### 4.2. Boundary Accuracy

To measure the boundary accuracy (BA), we extract the boundary pixel sets ${\mathit{B}}_{GT}$ of the ground truth and ${\mathit{B}}_{SEG}$ of the segmentation result, as shown in Figure 10. BA is computed from the distances between the two boundaries, where $g\in {\mathit{B}}_{GT}$ and $s\in {\mathit{B}}_{SEG}$, dist(**·**) denotes the Euclidean distance, and N(**·**) is the number of pixels in the set. The value of BA shows the segmentation accuracy of the boundaries.
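The paper's exact BA formula is not reproduced above; as an illustration of the ingredients it names, the sketch below averages, over the segmented boundary, each pixel's Euclidean distance to the nearest ground-truth boundary pixel (an assumed, common form of boundary distance, not necessarily the authors' definition):

```python
import numpy as np

def mean_boundary_distance(b_seg, b_gt):
    """Mean nearest-neighbor Euclidean distance from B_SEG to B_GT."""
    b_seg = np.asarray(b_seg, float)   # (n, 2) pixel coordinates
    b_gt = np.asarray(b_gt, float)     # (m, 2) pixel coordinates
    d = np.linalg.norm(b_seg[:, None, :] - b_gt[None, :, :], axis=-1)
    return float(d.min(axis=1).mean())

# Identical boundaries give distance 0.
assert mean_boundary_distance([(0, 0), (0, 1)], [(0, 0), (0, 1)]) == 0.0
```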

#### 4.3. Results Analysis

## 5. Hand Gesture Recognition

## 6. Conclusions and Future Work

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References

- Nardi, B.A. Context and Consciousness: Activity Theory and Human-Computer Interaction; MIT Press: Cambridge, MA, USA, 1996; p. 400. [Google Scholar]
- Chen, D.C.; Li, G.F.; Jiang, G.Z.; Fang, Y.F.; Ju, Z.J.; Liu, H.H. Intelligent Computational Control of Multi-Fingered Dexterous Robotic Hand. J. Comput. Theor. Nanosci.
**2015**, 12, 6126–6132. [Google Scholar] [CrossRef][Green Version] - Ju, Z.J.; Zhu, X.Y.; Liu, H.H. Empirical Copula-Based Templates to Recognize Surface EMG Signals of Hand Motions. Int. J. Humanoid Robot.
**2011**, 8, 725–741. [Google Scholar] [CrossRef] - Miao, W.; Li, G.F.; Jiang, G.Z.; Fang, Y.; Ju, Z.J.; Liu, H.H. Optimal grasp planning of multi-fingered robotic hands: A review. Appl. Comput. Math.
**2015**, 14, 238–247. [Google Scholar] - Farina, D.; Jiang, N.; Rehbaum, H.; Holobar, A.; Graimann, B.; Dietl, H.; Aszmann, O.C. The extraction of neural information from the surface EMG for the control of upper-limb prostheses: Emerging avenues and challenges. IEEE Trans. Neural Syst. Rehabil. Eng.
**2014**, 22, 797–809. [Google Scholar] [CrossRef] [PubMed] - Ju, Z.; Liu, H. Human Hand Motion Analysis with Multisensory Information. IEEE/ASME Trans. Mechatron.
**2014**, 19, 456–466. [Google Scholar] [CrossRef][Green Version] - Panagiotakis, C.; Papadakis, H.; Grinias, E.; Komodakis, N.; Fragopoulou, P.; Tziritas, G. Interactive Image Segmentation Based on Synthetic Graph Coordinates. Pattern Recognit.
**2013**, 46, 2940–2952. [Google Scholar] [CrossRef] - Yang, D.F.; Wang, S.C.; Liu, H.P.; Liu, Z.J.; Sun, F.C. Scene modeling and autonomous navigation for robots based on kinect system. Robot
**2012**, 34, 581–589. [Google Scholar] [CrossRef] - Wang, C.; Liu, Z.; Chan, S.C. Superpixel-Based Hand Gesture Recognition with Kinect Depth Camera. Trans. Multimed.
**2015**, 17, 29–39. [Google Scholar] [CrossRef] - Sinop, A.K.; Grady, L. A Seeded Image Segmentation Framework Unifying Graph Cuts and Random Walker Which Yields a New Algorithm. In Proceedings of the IEEE 11th International Conference on Computer Vision (ICCV), Rio de Janeiro, Brazil, 14–20 October 2007; pp. 1–8.
- Grady, L. Multilabel random walker image segmentation using prior models. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–26 June 2005; pp. 763–770.
- Couprie, C.; Grady, L.; Najman, L.; Talbot, H. Power watersheds: A new image segmentation framework extending graph cuts, random walker and optimal spanning forest. In Proceedings of the IEEE 12th International Conference on Computer Vision (ICCV), Kyoto, Japan, 27 September–4 October 2009; pp. 731–738.
- Varun, G.; Carsten, R.; Antonio, C.; Andrew, B.; Andrew, Z. Geodesic star convexity for interactive image segmentation. In Proceedings of the IEEE CVPR, San Francisco, CA, USA, 13–18 June 2010; pp. 3129–3136.
- Ju, Z.; Liu, H. A Unified Fuzzy Framework for Human Hand Motion Recognition. IEEE Trans. Fuzzy Syst.
**2011**, 19, 901–913. [Google Scholar] - Xu, Y.; Yu, G.; Wang, Y.; Wu, X.; Ma, Y. A Hybrid Vehicle Detection Method Based on Viola-Jones and HOG + SVM from UAV Images. Sensors
**2016**, 16, 1325. [Google Scholar] [CrossRef] [PubMed] - Fernando, M.; Wijjayanayake, J. Novel Approach to Use Hu Moments with Image Processing Techniques for Real Time Sign Language Communication. Int. J. Image Process.
**2015**, 9, 335–345. [Google Scholar] - Chen, Q.; Georganas, N.D.; Petriu, E.M. Real-time vision-based hand gesture recognition using haar-like features. In Proceedings of the IEEE Instrumentation & Measurement Technology Conference IMTC, Warsaw, Poland, 1–3 May 2007; pp. 1–6.
- Sun, R.; Wang, J.J. A Vehicle Recognition Method Based on Kernel K-SVD and Sparse Representation. Pattern Recognit. Artif. Intell.
**2014**, 27, 435–442. [Google Scholar] - Jiang, Y.V.; Won, B.-Y.; Swallow, K.M. First saccadic eye movement reveals persistent attentional guidance by implicit learning. J. Exp. Psychol. Hum. Percept. Perform.
**2014**, 40, 1161–1173. [Google Scholar] [CrossRef] [PubMed] - Ju, Z.; Liu, H.; Zhu, X.; Xiong, Y. Dynamic Grasp Recognition Using Time Clustering, Gaussian Mixture Models and Hidden Markov Models. Adv. Robot.
**2009**, 23, 1359–1371. [Google Scholar] [CrossRef] - Bian, X.; Zhang, X.; Liu, R.; Ma, L.; Fu, X. Adaptive classification of hyperspectral images using local consistency. J. Electron. Imaging
**2014**, 23, 063014. [Google Scholar] - Song, H.; Wang, Y. A spectral-spatial classification of hyperspectral images based on the algebraic multigrid method and hierarchical segmentation algorithm. Remote Sens.
**2016**, 8, 296. [Google Scholar] [CrossRef] - Hatwar, S.; Anil, W. GMM based Image Segmentation and Analysis of Image Restoration Techniques. Int. J. Comput. Appl.
**2015**, 109, 45–50. [Google Scholar] [CrossRef] - Couprie, C.; Najman, L.; Talbot, H. Seeded segmentation methods for medical image analysis. In Medical Image Processing; Springer: New York, NY, USA, 2011; pp. 27–57. [Google Scholar]
- Bańbura, M.; Modugno, M. Maximum likelihood estimation of factor models on datasets with arbitrary pattern of missing data. J. Appl. Econ.
**2014**, 29, 133–160. [Google Scholar] [CrossRef] - Simonetto, A.; Leus, G. Distributed Maximum Likelihood Sensor Network Localization. IEEE Trans. Signal Process.
**2013**, 62, 1424–1437. [Google Scholar] [CrossRef] - Ju, Z.; Liu, H. Fuzzy Gaussian Mixture Models. Pattern Recognit.
**2012**, 45, 1146–1158. [Google Scholar] [CrossRef] - Zhang, Y.; Brady, M.; Smith, S. Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Trans. Med. Imaging
**2001**, 20, 45–57. [Google Scholar] [CrossRef] [PubMed] - Song, W.; Cho, K.; Um, K.; Won, C.S.; Sim, S. Intuitive terrain reconstruction using height observation-based ground segmentation and 3D object boundary estimation. Sensors
**2012**, 12, 17186–17207. [Google Scholar] [CrossRef] [PubMed] - Wei, S.; Kyungeun, C.; Kyhyun, U.; Chee, S.; Sungdae, S. Complete Scene Recovery and Terrain Classification in Textured Terrain Meshes. Sensors
**2012**, 12, 11221–11237. [Google Scholar] - Liao, L.; Lin, T.; Li, B.; Zhang, W. MR brain image segmentation based on modified fuzzy C-means clustering using fuzzy Gibbs random field. J. Biomed. Eng.
**2008**, 25, 1264–1270. [Google Scholar] - Kakumanu, P.; Makrogiannis, S.; Bourbakis, N. A survey of skin-color modeling and detection methods. Pattern Recognit.
**2007**, 40, 1106–1122. [Google Scholar] [CrossRef] - Lee, G.; Lee, S.; Kim, G.; Park, J.; Park, Y. A Modified GrabCut Using a Clustering Technique to Reduce Image Noise. Symmetry
**2016**, 8, 64. [Google Scholar] [CrossRef] - Ning, J.; Zhang, L.; Zhang, D.; Wu, C. Interactive image segmentation by maximal similarity based region merging. Pattern Recognit.
**2010**, 43, 445–456. [Google Scholar] [CrossRef] - Grabcut Image Dataset. Available online: http://research.microsoft.com/enus/um/cambridge/projects/visionimagevideoediting/segmentation/grabcut.htm (accessed on 18 December 2016).
- Everingham, M.; Van, G.L.; Williams, C.K.; Winn, I.J.; Zisserman, A. The PASCAL Visual Object Classes Challenge 2009 (VOC2009) Results. Available online: http://host.robots.ox.ac.uk/pascal/VOC/voc2009/ (accessed on 26 December 2016).
- Rhemann, C.; Rother, C.; Wang, J.; Gelautz, M.; Kohli, P.; Rott, P. A perceptually motivated online benchmark for image matting. In Proceedings of the CVPR, Miami, FL, USA, 20–25 June 2009; pp. 1826–1833.
- Margolin, R.; Zelnik-Manor, L.; Tal, A. How to Evaluate Foreground Maps? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 248–255.
- Zhao, Y.; Nie, X.; Duan, Y. A benchmark for interactive image segmentation algorithms. In Proceedings of the IEEE Person-Oriented Vision, Kona, HI, USA, 7 January 2011; pp. 33–38.
- Zhou, Y.; Liu, K.; Carrillo, R.E.; Barner, K.E.; Kiamilev, F. Kernel-based sparse representation for gesture recognition. Pattern Recognit.
**2013**, 46, 3208–3222. [Google Scholar] [CrossRef] - Yu, F.; Zhou, F. Classification of machinery vibration signals based on group sparse representation. J. Vibroeng.
**2016**, 18, 1540–1545. [Google Scholar] [CrossRef]

**Figure 3.** Color distributions of the gesture image. (**a**) Red distribution; (**b**) green distribution; (**c**) blue distribution.

| Link Type | Weight | Precondition |
|---|---|---|
| $\overline{{\mathit{x}}_{u}{\mathit{x}}_{\mathit{v}}}$ | $\mathrm{exp}(-\beta \Vert {\mathit{x}}_{u}-{\mathit{x}}_{v}{\Vert}^{2})$ | ${\mathit{x}}_{u},{\mathit{x}}_{v}\in \mathit{N}$ |
| $\overline{{\mathit{x}}_{u}\mathrm{S}}$ | $U(\alpha =0,i,\theta ,\mathit{X})$ | ${\mathit{x}}_{u}\in \mathit{U}$ |
| | $K$ | ${\mathit{x}}_{u}\in \mathit{O}$ |
| | $0$ | ${\mathit{x}}_{u}\in \mathit{B}$ |
| $\overline{{\mathit{x}}_{u}\mathrm{T}}$ | $U(\alpha =1,i,\theta ,\mathit{X})$ | ${\mathit{x}}_{u}\in \mathit{U}$ |
| | $0$ | ${\mathit{x}}_{u}\in \mathit{O}$ |
| | $K$ | ${\mathit{x}}_{u}\in \mathit{B}$ |

where $K=1+\underset{{\mathit{x}}_{u}\in \mathit{X}}{\mathrm{max}}{\displaystyle \sum _{{\mathit{x}}_{u},{\mathit{x}}_{v}\in \mathit{N}}\mathrm{exp}(-\beta {\Vert {\mathit{x}}_{u}-{\mathit{x}}_{v}\Vert}^{2})}$

| Gestures | Recognition Rates |
|---|---|
| Hand close | 86.7% |
| Hand open | 73.3% |
| Wrist extension | 100% |
| Wrist flexion | 100% |
| Fine pinch | 66.7% |
| Overall rate | 85.3% |

| Gestures | Recognition Rates |
|---|---|
| Hand close | 93.3% |
| Hand open | 100% |
| Wrist extension | 100% |
| Wrist flexion | 100% |
| Fine pinch | 100% |
| Overall rate | 98.7% |

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license ( http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Chen, D.; Li, G.; Sun, Y.; Kong, J.; Jiang, G.; Tang, H.; Ju, Z.; Yu, H.; Liu, H. An Interactive Image Segmentation Method in Hand Gesture Recognition. *Sensors* **2017**, *17*, 253.
https://doi.org/10.3390/s17020253
