Our approach involves two phases. In the first, the system is trained: all of the parameters needed to define the fuzzy rules for skin color detection are estimated. The second consists of an extensive set of experiments on different databases to validate the proposed model. Before describing both phases, let us define some terms.

#### 2.1. Definitions

**Definition** **1.** A region R in an image I is said to be connex if there is a neighborhood relation between any pixel $p\in R$.

**Definition** **2.** A connex region ${R}_{F}$ in an image I contains a face F when the set of the main features of F, $\varphi $, is included in ${R}_{F}$.

The extracted features $\varphi $ may include information about geometrical features (position of the eyes, mouth, etc.), texture information, color spaces, etc. In our case, $\varphi $ will consider the face area, shape and skin color detection.

It is clear that any image I may contain several faces and that a connex region ${R}_{i}$ may not contain any face. Let us assume that a face ${F}_{i}$ is defined as the i-th rectangular connex region ${R}_{{F}_{i}}$ that contains the distinctive features of a face, $\varphi $, and is bounded by the pixels ${p}_{{1}_{i}},{p}_{{2}_{i}}$, located at the top and bottom vertices of the region.
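As a minimal sketch of Definition 1 and of the rectangular regions described above, the following Python function labels the connex regions of a binary mask and reports, for each one, its area and the two corner pixels of its bounding rectangle. It assumes a 4-neighborhood relation, which the text does not specify; the names `connex_regions`, `p1` and `p2` are illustrative only.

```python
from collections import deque

def connex_regions(mask):
    """Label 4-connected regions of truthy pixels in a 2D list.

    Returns one dict per region with its area and the top-left (p1) and
    bottom-right (p2) corners of its bounding rectangle, as (row, col).
    The 4-neighborhood is an assumption made for this sketch.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            area, r0, c0, r1, c1 = 0, r, c, r, c
            while queue:
                y, x = queue.popleft()
                area += 1
                r0, c0 = min(r0, y), min(c0, x)
                r1, c1 = max(r1, y), max(c1, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append({"area": area, "p1": (r0, c0), "p2": (r1, c1)})
    return regions
```

The area returned for each region is the kind of quantity that a later filtering step could use to discard regions too small to contain a face.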

In this work, we shall focus on face detection in color images. A color space is a method by which color can be specified, created and visualized.

**Definition** **3.** A color vector $c\in C$, where C is a color space, of a pixel p is defined as an l-tuple of color components $c\left(p\right)=\{{c}_{1}\left(p\right),{c}_{2}\left(p\right),\dots ,{c}_{l}\left(p\right)\}$, where ${c}_{i}\left(p\right)$, for $i=1,2,\dots ,l$, may have N different values.

For instance, for the RGB color space, l = 3 and $c\left(p\right)=\{{r}_{p},{g}_{p},{b}_{p}\}$, where $\{{r}_{p},{g}_{p},{b}_{p}\}\in [0,255]$.

After the face detection process is completed, the system creates an output binary image O where every pixel is labeled as foreground, i.e., face pixel, or background, i.e., non-face pixel. Therefore, a face detection problem is a segmentation problem, in which the main goal is to find image regions containing a face. This can be stated as follows:

**Definition** **4.** Let I be an image of size $W=n\times m$ pixels. Let ${\mathsf{\Gamma}}_{I}$ be a set of faces in I, formed by a set of k non-overlapping regions, ${R}_{{F}_{i}}$, each one containing a face: ${\mathsf{\Gamma}}_{I}=\{{R}_{{F}_{1}},{R}_{{F}_{2}},\dots ,{R}_{{F}_{k}}\}$. A face detector is a function that converts pixels ${p}_{j}$, for $j=1,2,\dots ,W$, in the original image I into a binary output image O as follows:

$$O\left({p}_{j}\right)=\begin{cases}1, & \text{if } {p}_{j}\in {R}_{{F}_{i}}\text{ for some } {R}_{{F}_{i}}\in {\mathsf{\Gamma}}_{I}\\ 0, & \text{otherwise}\end{cases}\qquad (1)$$

This definition can be used for any face detection scheme, such as knowledge-based, template matching or feature-based methods [5]. Because some conditions (such as lighting, face orientation or similar colors in faces and background) may prevent a face detection scheme from achieving an optimum segmentation, some post-processing filter must be applied to obtain the set ${\mathsf{\Gamma}}_{I}$. For our purposes, face detection will be performed using a skin color segmentation algorithm in a convenient color space C and a subsequent computation of the areas of the segmented connex regions ${R}_{i}\in {\mathsf{\Gamma}}_{I}$.
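Definition 4 can be illustrated with a short Python sketch that paints each face rectangle into a binary output image. The function name `face_mask`, the reading of $n\times m$ as rows by columns and the inclusive corner convention are assumptions made for the example, not part of the original method.

```python
def face_mask(n, m, face_regions):
    """Build a binary output image O for an image of n rows and m columns.

    face_regions is a list of ((top, left), (bottom, right)) inclusive
    corner pairs, one per face rectangle (an illustrative convention).
    """
    output = [[0] * m for _ in range(n)]
    for (top, left), (bottom, right) in face_regions:
        # Clip the rectangle to the image so out-of-range corners are safe.
        for y in range(max(top, 0), min(bottom, n - 1) + 1):
            for x in range(max(left, 0), min(right, m - 1) + 1):
                output[y][x] = 1  # foreground: pixel lies inside a face region
    return output
```

Every pixel outside all rectangles keeps the value 0 and is therefore labeled as background.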

#### 2.2. Fuzzy Sets and Skin Detection

In Section 1, we outlined the main approaches to skin color detection. In any image segmentation scheme, there is a high level of uncertainty for a classifier to automatically obtain an optimum segmentation [25]. This also holds for face detection and, in particular, for skin color segmentation. Thus, we find that applying fuzzy theory can be a convenient way to obtain good detection rates, since a fuzzy set-theoretic model provides a mechanism to represent and manipulate uncertainty within an image.

Color image segmentation using fuzzy classification is a pixel-based segmentation method. This method assigns a color class to each pixel of an input image by applying a set of fuzzy rules on it. We can use this approach to achieve our goal: a pixel can be classified as “skin” or “non-skin” according to a set of fuzzy rules extracted from a training stage using different color spaces. To do this, each color plane will be considered as a fuzzy set, so that the skin detection is performed through fuzzy functions representing the membership degree of each pixel to the different classes. This is accomplished as follows:

**Definition** **5.** Given a color image I of size $W=n\times m$ pixels, where each pixel is defined using a color vector c in a color space C, so that $c\left(p\right)=\{{c}_{1}\left(p\right),{c}_{2}\left(p\right),\dots ,{c}_{l}\left(p\right)\}$, $\forall p\in I$, the histogram of $C,\mathsf{\Psi}\left(C\right)$, is defined as a $b\times l$ array $\mathsf{\Psi}\left(C\right)=\{{f}_{1},{f}_{2},\dots ,{f}_{l}\}$, such that each ${f}_{i}$ is the frequency vector of the color component ${c}_{i}$, for $i=1,2,\dots ,l$, using b bins, on the image I.

As a result, the value of each bin is the number of pixels in image I having the color ${c}_{i}$. If $\mathsf{\Psi}\left(C\right)$ is normalized by W, then $\mathsf{\Psi}\left(C\right)$ takes the color space C into the interval [0, 1]; that is, $\mathsf{\Psi}\left(C\right)$ represents the probability distribution of each color ${c}_{i}$ to be present in image I.

According to Zadeh’s theory [26], a fuzzy set is a pair $(A,m)$ where A is a set and m: $A\to [0,1]$. This can be applied to the color histogram, where the fuzzy set can be defined as the pair $(C,\mathsf{\Psi})$, where C is the color space and Ψ: $C\to [0,1]$ is the normalized histogram. For each $c\in C,\mathsf{\Psi}\left(c\right)$ is the grade of membership of c, so that $c\in (C,\mathsf{\Psi})\iff c\in C$ and $\mathsf{\Psi}\left(c\right)\ne 0$.
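The construction of the fuzzy set $(C,\mathsf{\Psi})$ for a single color component can be sketched in Python as follows. The helper names `normalized_histogram` and `fuzzy_support`, the default of 256 bins and the 8-bit value range are assumptions chosen for illustration.

```python
def normalized_histogram(channel_values, bins=256, max_value=255):
    """Return the frequency vector of one color component, divided by W.

    channel_values is a flat list with the component value of every pixel,
    so W is simply its length.
    """
    total = len(channel_values)
    if total == 0:
        raise ValueError("the image has no pixels")
    counts = [0] * bins
    for v in channel_values:
        # Map the value to a bin index; with 256 bins this is the identity.
        idx = min(int(v * bins / (max_value + 1)), bins - 1)
        counts[idx] += 1
    return [count / total for count in counts]

def fuzzy_support(hist):
    """Color levels with non-zero grade of membership in (C, Psi)."""
    return [level for level, grade in enumerate(hist) if grade != 0]
```

For example, `normalized_histogram([0, 0, 255, 128])` assigns the grade 0.5 to level 0 and 0.25 to levels 128 and 255, and `fuzzy_support` then returns exactly those three levels.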

The research on skin color face detection is mostly based on the popular RGB, YCbCr and HSV color space models [27]. In addition, some previous works have modeled these color spaces by means of fuzzy sets and fuzzy relations. Thus, the RGB system was used to represent skin colors [28], where a fuzzy model is extracted after asking people to classify human faces according to their skin color; then, a modified RGB fuzzy skin detector was presented in [29], based on the work of [30], trying to improve the skin detection performance. For the HSV space and its variants, the hue component can be defined by means of a fuzzy representation in order to take into account the non-uniformity of the colors’ distribution. Consequently, the authors in [31] proposed to represent colors with trapezoidal or triangular fuzzy subsets, associating colors with fuzzy sets; fuzzy linguistic hierarchies with different numbers of labels, depending on the desired granularity, were used in [32]. Then, a fuzzy classifier to detect the presence of faces in small windows, with an HSV color model to detect skin, was presented in [33]. Finally, fuzzy representations of YCbCr for modeling skin can be found in [34,35].

Since RGB, YCbCr and HSV are the most common color space models for segmenting skin, we shall use these color systems in this paper as well. In order to use our fuzzy approach, we must first calculate the normalized histogram for each of the considered color spaces, RGB, HSV and YCbCr:

${\mathsf{\Psi}}_{RGB}\left(C\right)=\{\mathsf{\Psi}\left(R\right),\mathsf{\Psi}\left(G\right),\mathsf{\Psi}\left(B\right)\}$;

${\mathsf{\Psi}}_{HSV}\left(C\right)=\{\mathsf{\Psi}\left(H\right),\mathsf{\Psi}\left(S\right),\mathsf{\Psi}\left(V\right)\}$;

${\mathsf{\Psi}}_{YCbCr}\left(C\right)=\{\mathsf{\Psi}\left(Y\right),\mathsf{\Psi}\left(Cb\right),\mathsf{\Psi}\left(Cr\right)\}$.

Since the skin detection will be used as a pre-processing task for detecting a face in an image, the training has been performed using a set of 200 images from the XM2VTS database [36], extracting only the skin information, using different ethnic groups and changing lighting conditions. The XM2VTS face database contains eight recordings of 295 subjects each, acquired over a period of four months. The background in this set of images is homogeneous, with a good contrast for detecting skin. A group of images from the training set is shown in Figure 2.

The results after obtaining ${\mathsf{\Psi}}_{RGB}\left(C\right)$ are shown in Figure 3.

From Figure 3a, the membership functions ${\mu}_{{\mathit{SKIN}}_{i}}\left({c}_{i}\right)$ for the skin color in each plane can be modeled using a bell-shaped function, such that:

$${\mu}_{{\mathit{SKIN}}_{i}}\left({c}_{i}\right)={\beta}_{i}\,{e}^{-\frac{{\left({c}_{i}-{\alpha}_{i}\right)}^{2}}{2{\sigma}_{i}^{2}}}\qquad (2)$$

where $i=\{R,G,B\};\{{c}_{R},{c}_{G},{c}_{B}\}\in [0,255];$ ${\beta}_{i}=\mathrm{max}{\mathsf{\Psi}}_{RGB}\left({c}_{i}\right),{\sigma}_{i}^{2}$ is the variance of each fuzzy set ${c}_{i}$ and ${\alpha}_{i}=\mathrm{arg}\underset{{c}_{i}}{\mathrm{max}}{\mathsf{\Psi}}_{RGB}\left({c}_{i}\right)$. The results of the model for the skin pixels are shown in Figure 3b.
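The estimation of ${\alpha}_{i}$, ${\beta}_{i}$ and ${\sigma}_{i}^{2}$ from a normalized histogram can be sketched as below. The sketch assumes that the bell shape is a Gaussian of height ${\beta}_{i}$ centered at ${\alpha}_{i}$ and that ${\sigma}_{i}^{2}$ is the histogram-weighted variance about the mean; both choices are assumptions, and `fit_bell` and `mu_skin` are hypothetical names.

```python
import math

def fit_bell(hist):
    """Estimate (alpha, beta, sigma2) from a normalized histogram.

    beta is the histogram maximum and alpha the color level where it
    occurs. sigma2 is taken as the histogram-weighted variance about the
    mean, which is an assumption of this sketch.
    """
    beta = max(hist)
    alpha = hist.index(beta)
    mean = sum(level * p for level, p in enumerate(hist))
    sigma2 = sum(p * (level - mean) ** 2 for level, p in enumerate(hist))
    return alpha, beta, sigma2

def mu_skin(c, alpha, beta, sigma2):
    """Gaussian-shaped membership degree of color level c (sigma2 > 0)."""
    return beta * math.exp(-((c - alpha) ** 2) / (2.0 * sigma2))
```

Because the peak height equals ${\beta}_{i}$, the membership degree at ${c}_{i}={\alpha}_{i}$ equals the largest histogram value rather than 1.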

Finally, for the background pixels, i.e., the non-skin pixels in the image, let us consider a variation of the model introduced in [37], which identifies the fuzziness in the transition region between the object (in this case, the skin) and the background classes. Thus, the membership value of a point to the object is determined by applying an S-function and a Z-function to each color plane, so that:

where $i=\{R,G,B\}$; $\{{c}_{R},{c}_{G},{c}_{B}\}\in [0,255]$ and $\{{a}_{{S}_{i}},{b}_{{S}_{i}},{\gamma}_{{S}_{i}},{a}_{{Z}_{i}},{b}_{{Z}_{i}},{\gamma}_{{Z}_{i}}\}$ are the cross-over points that determine the shape of the membership functions for the non-skin pixels.
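To make the role of the three cross-over points concrete, the following Python sketch implements one commonly used generalized form of Zadeh's S-function, with the Z-function taken as its complement. This particular piecewise-quadratic form is an assumption and may differ from the variation of [37] adopted here; the parameter names `a`, `b` and `gamma` simply mirror the symbols in the text.

```python
def s_function(x, a, b, gamma):
    """Generalized S-function, assuming a < b < gamma.

    It is 0 up to a, rises quadratically through the cross-over point b
    and reaches 1 at gamma. The two quadratic pieces meet at x = b.
    """
    if x <= a:
        return 0.0
    if x <= b:
        return (x - a) ** 2 / ((gamma - a) * (b - a))
    if x <= gamma:
        return 1.0 - (x - gamma) ** 2 / ((gamma - a) * (gamma - b))
    return 1.0

def z_function(x, a, b, gamma):
    """Z-function, taken here as the complement of the S-function."""
    return 1.0 - s_function(x, a, b, gamma)
```

With `b` placed midway between `a` and `gamma`, this form reduces to the classical S-function, whose value at `b` is 0.5.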

The same process has been adopted for the HSV and YCbCr color spaces. Although many previous works claim that color information in the HSV system is mainly contained either only in the hue component [38] or in the hue and saturation components [39], our experiments showed that the recognition of skin is also influenced by the intensity (value) of the image. For this reason, we consider all three color planes in order to achieve an accurate skin color detection. The results of calculating the histogram for each color plane in HSV are shown in Figure 4a. Note that intensity values are between zero and 255.

In the same way, for the YCbCr space, the two chrominance components are usually used to extract skin clusters, whereas the luminance component Y is discarded. However, practically speaking, the skin-tone color is non-linearly dependent on luminance [35], so for our purposes, we have also taken into account all three components, whose values are between zero and 255. The histograms are shown in Figure 5a, and their Gaussian approximations can be seen in Figure 5b.
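For readers who wish to reproduce histograms of this kind, one possible way of obtaining HSV and YCbCr components in the range [0, 255] from 8-bit RGB values is sketched below. The text does not state which conversion was used, so the scaling of hue and saturation to 255 and the full-range JPEG/JFIF YCbCr coefficients are assumptions; the function names are illustrative.

```python
import colorsys

def rgb_to_hsv255(r, g, b):
    """Convert 8-bit RGB to HSV with every component scaled to [0, 255]."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return tuple(round(x * 255) for x in (h, s, v))

def rgb_to_ycbcr255(r, g, b):
    """Convert 8-bit RGB to full-range YCbCr (JPEG/JFIF coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Clamp after rounding so that values such as 255.5 stay in range.
    return tuple(min(255, max(0, round(x))) for x in (y, cb, cr))
```

Applying either function to every pixel yields three component planes whose histograms can be computed in the same way as for RGB.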

As in the RGB case, each plane ($H,S$ and V) and ($Y,Cb$ and $Cr$) can be modeled, respectively, using a bell-shaped function (see Figure 4b and Figure 5b) defined as stated in Equation (2), where $i=\{R,G,B,H,S,V,Y,Cb,Cr\};{\beta}_{i}=\mathrm{max}{\mathsf{\Psi}}_{k}\left({c}_{i}\right),$ ${\alpha}_{i}=\mathrm{arg}\underset{{c}_{i}}{\mathrm{max}}{\mathsf{\Psi}}_{k}\left({c}_{i}\right);$ $\{{c}_{R},{c}_{G},{c}_{B},{c}_{H},{c}_{S},{c}_{V},{c}_{Y},{c}_{Cb},{c}_{Cr}\}\in [0,255]$; k refers to RGB, HSV or YCbCr color systems, and ${\sigma}_{i}^{2}$ is the variance of each fuzzy set ${c}_{i}$.

Finally, for the non-skin pixels in the image, we will apply the S- and Z-functions defined by Equations (3) and (4), where $i=\{R,G,B,H,S,V,Y,Cb,Cr\}$.

Now, given an input image I, for any pixel $p\in I$, its color components are fuzzified, according to the parameters defined in Equations (2)–(4) and using some of the three color spaces considered above. In order to determine these parameters, a maximum entropy criterion will be used.

Shannon entropy [40] has been widely used in information theory. It is a measure of the uncertainty associated with a random variable. Specifically, Shannon entropy quantifies the expected value of the information contained in a message [41]. As a result, Shannon entropy is an extremely powerful approach in image segmentation [42,43,44]. In general terms, the selection of an appropriate threshold using Shannon entropy consists of finding an optimum threshold by maximizing the entropy function. In [45], Cheng et al. used the concept of fuzzy c-partition and the maximum entropy principle to select threshold values for gray-level images. When it is used for multilevel thresholding segmentation, the fuzzy c-partition entropy measures the quantity of extracted information in the image segmentation [46]. A larger value of entropy indicates that more information is extracted in this process. Taking this methodology into account, the skin color detection process is a segmentation process, as images are divided into a suitable number of fuzzy sets to represent skin and non-skin pixels. From the study developed in this section, it becomes clear that a fuzzy three-partition entropy approach for each color plane will segment images, giving as a result skin and non-skin regions.

Let ${p}_{{\mathit{SKIN}}_{i}},{p}_{{\mathit{NSKIN}}_{i}}^{S},{p}_{{\mathit{NSKIN}}_{i}}^{Z}$ be the probabilities of the three fuzzy sets resulting from Equations (2)–(4), defined as follows:

Then, the entropy of the fuzzy three-partition can be calculated as:

The optimal combination of the parameters in Equation (6) can be computed by the maximum entropy criterion; that is, the parameters $({a}_{{Z}_{i}},{b}_{{Z}_{i}},{\gamma}_{{Z}_{i}},{a}_{{S}_{i}},{b}_{{S}_{i}},{\gamma}_{{S}_{i}})$ are selected so that the entropy $H({a}_{{Z}_{i}},{b}_{{Z}_{i}},{\gamma}_{{Z}_{i}},{a}_{{S}_{i}},{b}_{{S}_{i}},{\gamma}_{{S}_{i}})$ is maximized. This process must be repeated for each plane and each color system used.
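The maximum entropy criterion for one color plane can be sketched in Python as follows. The sketch rests on several assumptions that the text does not confirm: the probability of each fuzzy set is taken as the histogram-weighted sum of its membership degrees, the entropy is the Shannon sum $-\sum p\,\mathrm{log}\,p$ over the three sets, the S- and Z-functions use a generalized quadratic form, and the search is a coarse exhaustive grid in which the Z parameters lie at or below ${\alpha}_{i}$ and the S parameters at or above it. All function names are hypothetical.

```python
import math
from itertools import combinations

def s_function(x, a, b, gamma):
    """Generalized S-function (assumed form), with a < b < gamma."""
    if x <= a:
        return 0.0
    if x <= b:
        return (x - a) ** 2 / ((gamma - a) * (b - a))
    if x <= gamma:
        return 1.0 - (x - gamma) ** 2 / ((gamma - a) * (gamma - b))
    return 1.0

def partition_entropy(hist, skin_mu, z_params, s_params):
    """Entropy of the fuzzy partition {skin, non-skin (Z), non-skin (S)}.

    hist is a normalized histogram and skin_mu the skin membership degree
    of every color level; z_params and s_params are (a, b, gamma) triples.
    """
    p_skin = p_z = p_s = 0.0
    for level, freq in enumerate(hist):
        p_skin += freq * skin_mu[level]
        p_z += freq * (1.0 - s_function(level, *z_params))  # Z = 1 - S
        p_s += freq * s_function(level, *s_params)
    # Sets with zero probability contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in (p_skin, p_z, p_s) if p > 0)

def best_parameters(hist, skin_mu, alpha, step=16):
    """Coarse exhaustive search for the entropy-maximizing parameters.

    Returns (entropy, z_params, s_params). Restricting the Z triple to
    levels <= alpha and the S triple to levels >= alpha is an assumption.
    """
    levels = range(0, len(hist), step)
    low = [t for t in combinations(levels, 3) if t[2] <= alpha]
    high = [t for t in combinations(levels, 3) if t[0] >= alpha]
    best = (-math.inf, None, None)
    for z_params in low:
        for s_params in high:
            h = partition_entropy(hist, skin_mu, z_params, s_params)
            if h > best[0]:
                best = (h, z_params, s_params)
    return best
```

A grid search keeps the example short; in practice, a finer grid or a dedicated optimizer could be substituted without changing the criterion itself.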

After the fuzzy inference system has been defined, skin-candidate regions are detected. Note that these segmented regions merely have the same color as skin. The next step is to determine whether they actually belong to a face, which is discussed in the following subsection.