Compact Hybrid Multi-Color Space Descriptor Using Clustering-Based Feature Selection for Texture Classification

Color texture classification aims to recognize patterns by the analysis of their colors and their textures. This process requires using descriptors to represent and discriminate the different texture classes. In most traditional approaches, these descriptors are used with a predefined setting of their parameters and computed from images coded in a chosen color space. The prior choice of a color space, a descriptor and its setting suited to a given application is a crucial but difficult problem that strongly impacts the classification results. To overcome this problem, this paper proposes a color texture representation that simultaneously takes into account the properties of several settings from different descriptors computed from images coded in multiple color spaces. Since the number of color texture features generated from this representation is high, a dimensionality reduction scheme by clustering-based sequential feature selection is applied to provide a compact hybrid multi-color space (CHMCS) descriptor. The experimental results carried out on five benchmark color texture databases with five color spaces and manifold settings of two texture descriptors show that combining different configurations always improves the accuracy compared to a predetermined configuration. On average, the CHMCS representation achieves 94.16% accuracy and outperforms deep learning networks and handcrafted color texture descriptors by over 5%, especially when the dataset is small.


Introduction
As texture and color are two salient visual cues in human perception, color textures provide essential information for object recognition and scene understanding. Therefore, color texture analysis is widely used in many computer vision applications. The development of several benchmark color image databases shows the interest of the scientific community in addressing imaging applications by color texture classification [1][2][3][4]. Color texture classification is the process of predicting the class of input data among a set of categories by the analysis of their colors and their textures. In traditional approaches, this process requires defining a descriptor to represent and discriminate the different texture classes, taking their inter-class and intra-class appearance variations into account. Thus, texture classification is typically divided into two sub-problems: representation and decision [5]. Texture representation consists of extracting features that describe color texture information, where both the spatial organization of pixels in the image plane and the distribution of their colors in a color space are considered. Numerous descriptors have been proposed in recent decades to represent color textures [6]. These representations can be divided into two categories depending on whether the descriptors are handcrafted (theory-driven representation) or designed directly from the data (data-driven representation). This paper focuses on the former with a comparison to the latter.
To address a color texture classification problem with handcrafted descriptors, it is first necessary to choose a color space and a texture descriptor which are well suited to the application. Most of these descriptors use parameters that must be carefully tuned depending on the application. The representation of the color textures then consists of extracting color texture features from the chosen descriptor, which is computed from images coded in the chosen color space. These choices directly impact the classification performance of color textures. To overcome the problem of a prior choice, many authors propose to combine multiple descriptors [7,8] and/or multiple color spaces [9][10][11], or manifold parameter settings of a same descriptor [12][13][14], but no work simultaneously combines these three elements for color texture classification purposes. The choice or the combination of different texture descriptors and color spaces, as well as the suitable adjustment of their parameters to produce interpretable, flexible, robust, invariant, and compact descriptors for color texture classification, are still open problems. To address this issue, this paper proposes an original approach that simultaneously takes into account the properties of several configurations of different descriptors computed from images coded in multiple color spaces. In order to highlight the contribution of the proposed approach, which combines three key elements of color texture representation (color spaces, texture descriptors and parameter settings), comparisons are performed when only two, one or none of these elements are combined. For fair assessments, standard descriptors and a basic classifier are used.
Since the proposed approach generates a high-dimensional representation with irrelevant or redundant color texture features, it suffers from the curse of dimensionality that appears especially when the number of features is too large compared to the number of training samples [15]. This phenomenon requires a dimensionality reduction scheme to improve the performance of the used classifier in terms of accuracy and computation time. Such a scheme can be achieved either by feature extraction or by feature selection during a learning process. Feature extraction techniques reduce the feature space dimensionality by transforming the original feature space into a new reduced-size feature space. However, this transformation loses the units and the explainability of the original feature space. Moreover, it still requires computing the whole initial feature set to obtain the new reduced feature space, which can be time consuming. This is why feature selection is preferred here. The goal of feature selection is to find a relevant subset from the initial feature space that can improve the overall performance of a classification algorithm with a better understanding of the data [16,17]. When dealing with high-dimensional data, many feature selection approaches can successfully remove irrelevant features but fail to remove redundant ones [18,19]. To overcome this problem, several feature selection algorithms that use feature clustering were proposed in recent decades in both supervised and unsupervised contexts [18][20][21][22][23][24]. This paper focuses on clustering-based feature selection approaches in a supervised context. These approaches aim to divide the initial feature space into a set of groups called clusters so that the features of a same cluster are considered to be redundant. This leads to the selection of one feature to represent each cluster.
The resulting feature subset is thus considered to be relevant and non-redundant [20]. Clustering-based feature selection algorithms can outperform the traditional feature selection methods by reducing the redundancy, reaching high accuracy and, in some cases, reducing the calculation time. Even though they have recently gained much attention, their number is still relatively limited and they need parameters to be adjusted [18]. In this context, we have previously proposed a clustering-based sequential feature selection approach for the classification of high-dimensional data where the feature clustering stage is fully automatic and does not require any parameter adjustment [9]. In this paper, we consider an adapted version of this approach which follows three stages. First, an automatic feature clustering algorithm is applied in order to divide the feature set into a number of clusters in which features are correlated. Then, one feature is sequentially selected per group to construct feature spaces with different dimensions. Finally, the dimensionality of the final feature space is determined by using the accuracy of a classifier.
Based on this scheme, this paper proposes a compact, hybrid and multi-color space (CHMCS) representation where texture features are computed and selected from several configurations of different descriptors and with multiple color spaces in order to simultaneously take into account various spatial and color properties. By combining the features, the proposed approach thus overcomes the difficulty of a prior choice of a relevant descriptor, a well-suited configuration of each descriptor and an appropriate color space. It thus aims to provide a comprehensible and interpretable representation of textures. Figure 1 illustrates how a texture is represented by the CHMCS descriptor. This figure shows the different stages of the proposed representation, which are detailed in the next sections of this paper.
Figure 1. The representation of a texture by the proposed compact hybrid multi-color space descriptor: the image of a color texture is first coded in multiple color spaces. Then, several configurations of p descriptors are used to generate a large set of color texture features. Finally, a dimensionality reduction scheme is applied to represent the color texture.
After having briefly presented related work on color texture representation in Section 2, Section 3 details how color texture features are extracted from several configurations of two different descriptors computed with images coded in multiple color spaces. Since the representation of the color textures with this feature set generates a high-dimensional feature space, Section 4 presents how a compact hybrid multi-color space (CHMCS) descriptor is derived from this space. This descriptor is based on a dimensionality reduction scheme that uses a clustering method for selecting the most discriminating features with acceptable processing times. In Section 5, experiments are carried out on five benchmark color texture databases. The experimental results show the relevance of the CHMCS descriptor and highlight its performances compared to traditional color texture descriptors and deep learning approaches.

Color Texture Representation
A very large number of texture descriptors have been proposed in recent decades [5]. Most of them have been developed or extended from their gray level definition in order to represent color textures [6]. With the emergence of deep learning, color texture descriptors have evolved from theory-driven descriptors which provide color texture features based on manually defined models into data-driven descriptors which are directly designed from image data.
Most data-driven approaches use convolutional neural networks (CNNs) or pretrained CNNs where a large number of free parameters are determined by training. Popular CNN architectures such as AlexNet, GoogleNet, ResNet or visual geometry group (VGG) have been proposed in the last decade [25][26][27][28]. They are trained on very large datasets and can be easily transferred to many other problems including texture classification. Liu et al. divided CNN-based texture representation methods into three categories: (1) using pretrained generic CNN models where the network is used as a feature extractor that generates features then used by a standard classifier; (2) using fine-tuned CNN models where a training dataset of a specific texture classification task is submitted to the pretrained network in order to fine-tune it (end-to-end learning); and (3) using handcrafted deep CNN where some parameters of the network are predetermined [5]. Several studies have shown that CNN-based methods are suitable for color texture classification tasks [7,13]. Although these deep learning and transfer models provide impressive and, on average, superior performances compared to other approaches, the generated representations and the decisions taken can be difficult to understand. In addition, they depend heavily on training data and tend to overfit with small training sets. To overcome this problem, different image data augmentation approaches have been proposed [29]. Moreover, the performance of CNN-based methods seems to decrease when dealing with fine-grained images, such as color textures [13]. Conversely, the results reached with the theory-driven approaches can be more easily explained and interpreted. Theory-driven descriptors are able to address applications that require fine-grained color texture representations and/or where only limited amounts of labeled training data may be available.
The traditional handcrafted descriptors combine spatial and color information following two main approaches depending on whether they are considered jointly or independently to generate color texture features [30,31]. In these approaches, color textures can be represented using different color spaces [10,32,33]. When spatial and color information are jointly considered, the color texture features take into account the spatial relationships within each of the three color components of the used color space (marginal representation). In addition, the spatial interactions between the color components of neighbor pixels can also be exploited by considering the opposing pairs of color components (opponent component representation). For example, Palm proposed the integrative cooccurrence matrices that combine single-channel cooccurrence matrices applied to a separate color component (for example, the R, G and B channels of a color image) and multi-channel cooccurrence matrices which capture the correlation between textures of different color channels (for example, the R-G, R-B and G-B pairs of channels of a color image) [34]. This concept was extended to the reduced size chromatic cooccurrence matrix (RSCCM) where the quantization level of the color component is decreased to compute cooccurrence matrices within and between color components from which inter-channel and intra-channel features can be extracted [35]. This strategy was also applied to the well-known local binary pattern (LBP) operator in which neighbor pixels are thresholded with the value of the central pixel to give a binary number that codes the local pattern. 
Several LBP variants have thus emerged: opponent color local binary patterns (OCLBP), where LBP is computed on each color channel separately and from pairs of channels [30]; extended opponent color local binary patterns (EOCLBP), where symmetric pairs of color channels are additionally taken into account [36]; improved opponent color local binary patterns (IOCLBP), in which thresholding is performed with the average value of the neighbor pixels [37]; local color vector binary patterns (LCVBP), which concatenate color norm patterns (thresholding is achieved with the color norm of the pixels, which jointly takes the three color channels into account) and color angular patterns (the ratio of pixel values between pairs of color channels is used for thresholding) [38]; and the spatially weighted order binary pattern (SWOBP), where a multi-channel color order pattern is used to jointly encode inter-channel features [39].
Choosing the right descriptor and color space for classifying textures is obviously a crucial but difficult problem to solve, given that classification results depend on the choice of the color texture features as well as the tuning of their parameters. To overcome this problem, many authors have proposed to combine various texture descriptors and/or several color spaces in order to take into account their different properties [7][8][9][10][11][40,41]. Porebski et al. proposed a multi-color space approach that selects EOCLBP histograms computed from images coded in nine different color spaces [10]. Considering spatial and color information independently, Khan et al. proposed to combine five texture descriptors with a pure color descriptor to obtain a single heterogeneous color texture representation [8]. Cusano et al. showed that the use of an ensemble of twelve handcrafted image descriptors in a multiple classifier framework increased the classification accuracy of texture images acquired under different lighting conditions [7]. They also combined this representation with convolutional neural networks to improve the classification performance. Banerji et al. proposed a 3D-LBP descriptor that they combined with the histograms of oriented gradients of its wavelet transform to produce a novel descriptor [11]. They showed that the fusion of color texture features extracted from this descriptor with seven different color spaces achieves a significantly better image classification performance than with each color space individually considered. Recently, Alimoussa et al. proposed extracting color texture features from two different descriptors and five color spaces which are combined in a single larger set of features [9].
All these works show the relevance of combining different descriptors and/or color spaces to classify textures. In most of them, the configurations of the used descriptors are predefined with a standard setting of their parameters. However, the texture properties of the different classes may be so variable that they require being represented with different descriptor configurations. Moreover, the appearance of textures can vary due to the change of observation conditions (illumination, field of view, spatial resolution, orientation, viewpoint, deformation, etc.) and texture representation has to take into account possible intra-class property variations. Pioneering work that combines the different configurations of a same descriptor for color texture classification was proposed by Mäenpää et al. and performed with the multi-resolution LBP operator by concatenating the histograms produced by this descriptor with different parameters [30]. This multiscale strategy was then applied by other authors and extended to different variants of the LBP descriptor [12,42]. Bello-Cerezo et al. conducted a comparative study between off-the-shelf CNN-based features and handcrafted image descriptors in which multiple-resolution feature vectors are extracted from each descriptor [13]. This multi-resolution representation is individually applied to different versions of the LBP operator, cooccurrence matrices, histograms of oriented gradients, image patch-based classifier and Gabor features in order to evaluate the performance of color texture classification obtained with each image descriptor and each compared deep learning network. Promising results are also provided by Alimoussa et al., who combine various configurations of EOCLBP and RSCCM descriptors independently considered [14].
Although all these works show the interest of considering several parameter settings of a same descriptor, they do not simultaneously take into account several descriptors with different configurations. The synthesis of these works shows that the authors have combined either multiple descriptors and/or multiple color spaces or several parameter settings of a same descriptor but, to our knowledge, no work has attempted to simultaneously combine these three elements for color texture classification purposes. The main contribution of this paper is to address this issue, which requires a dimensionality reduction because of the high number of generated candidate color texture features.

Dimensionality Reduction by Feature Selection
Feature selection is an essential technique to reduce the dimensionality of a feature space for data classification purposes. It helps improve performance in terms of prediction quality, computation time and model understanding. From a set of available features, it aims to generate a low-dimensional feature subset without irrelevant or redundant features. Feature selection has become the focus of many research applications, especially when datasets tend to be huge. However, traditional feature selection methods are not suitable for high-dimensional feature spaces [43]. This is the reason why approaches that use feature clustering techniques have more recently gained attention for their ability to improve the selection process.
Classical feature selection methods can be achieved by two main models named "filter" and "wrapper" [44]. Filter models deploy statistical measures to evaluate the discriminating power of features or subsets of features, whereas wrapper models compute the accuracy reached with a particular classifier to guide the search for determining the most discriminating feature subset. Other techniques, called "hybrid" or "embedded" models, combine both filter and wrapper approaches [16]. On the one hand, wrapper models tend to achieve better results than filter ones but suffer from a high computational cost since they depend on a classifier [19,45]. On the other hand, filter models are simple to design, classifier independent and faster. The embedded models that we propose to use in this paper take advantage of the speed of a filter model as well as the selection quality of a wrapper one.
In a supervised learning context, a feature selection method can be performed on a training dataset either by feature ranking or by feature subset search. Feature ranking algorithms individually rank features in order to select the most discriminating ones. Therefore, they are fast and easy to apply. However, it has been shown that the combination of individually relevant features does not necessarily yield a high classification performance [46]. This is mainly due to the non consideration of the interactions and the redundancy that may exist between features. To overcome this problem, a feature subset search, which evaluates groups of features, is preferred.
In the same way, clustering-based feature selection approaches can be performed either by feature ranking or by feature subset search. In each case, a filter or a wrapper model can be associated, leading to different strategies introduced in the following sections [9].

Clustering-Based Feature Ranking Approaches
In clustering-based feature ranking approaches, the feature space is firstly divided into a number of groups by means of a clustering algorithm. The proposed approaches differ depending on the clustering algorithm used. Then, the clusters and/or the features in each cluster are ranked in order to select the representative features of each group. The feature selection is either performed with the filter model [18,47] or with the wrapper model [22]. Harris and Niekerk proposed a feature clustering and ranking (FCR) approach where feature clustering is performed using the affinity propagation algorithm associated with a correlation coefficient as a similarity measure [21]. A single feature from each of the top ranked clusters is then selected by using either a filter model or a wrapper one according to two different evaluation measures.

Clustering-Based Feature Subset Search Approaches
Most of the clustering-based feature subset search approaches associate a clustering algorithm with a wrapper model to evaluate the feature subset relevance following three main schemes. In the first one, clustering is applied as a pre-processing stage where only one feature is selected from each group to constitute the feature set from which a feature subset search is performed [48]. Other schemes cluster the initial feature set into a predefined number of groups, and then evaluate the relevance of each group in order to remove irrelevant feature groups before merging the remaining groups and repeating the whole scheme [24]. In the last kind of scheme, the feature subset search is applied in each group defined by the clustering algorithm and the features selected from each group are merged to form the final selected feature subset [23].
Apart from the one we previously proposed, very few approaches use a filter model with a clustering-based feature subset search [9]. We showed that this approach provides a high level of dimensionality reduction and high classification accuracy with a reasonable processing time and a limited number of parameters to be adjusted compared to the other feature selection methods presented here. The selection approach proposed in this paper is inspired by this clustering-based filter feature selection method. It takes advantage of its fast filter model for searching feature subsets with different dimensions and adds a wrapper model to determine the dimensionality of the relevant feature space, thereby building a clustering-based embedded feature selection approach described in Section 4.
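To make the embedded idea concrete, the following is a minimal sketch of a clustering-based selection loop: a filter stage groups correlated features and keeps one representative per group, and a wrapper stage picks the dimensionality from classifier accuracy. It is not the algorithm of Section 4. The greedy correlation threshold (`threshold`), the Fisher-like score and the leave-one-out 1-NN classifier are all assumptions chosen for brevity; in particular, the clustering stage of the approach in [9] is described as parameter-free, unlike the threshold used here.

```python
import math

def pearson(a, b):
    """Pearson correlation between two feature columns (0.0 if one is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def cluster_features(columns, threshold=0.9):
    """Greedy stand-in clustering: a feature joins the first cluster whose
    first member it is strongly correlated with (assumed rule)."""
    clusters = []
    for j, col in enumerate(columns):
        for cluster in clusters:
            if abs(pearson(columns[cluster[0]], col)) >= threshold:
                cluster.append(j)
                break
        else:
            clusters.append([j])
    return clusters

def fisher_score(col, labels):
    """Filter measure (assumed): between-class over within-class scatter."""
    overall = sum(col) / len(col)
    between = within = 0.0
    for c in set(labels):
        vals = [v for v, lab in zip(col, labels) if lab == c]
        m = sum(vals) / len(vals)
        between += len(vals) * (m - overall) ** 2
        within += sum((v - m) ** 2 for v in vals)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within

def loo_1nn_accuracy(columns, labels, selected):
    """Wrapper measure (assumed): leave-one-out accuracy of a 1-NN classifier."""
    n = len(labels)
    hits = 0
    for i in range(n):
        best, best_d = None, float("inf")
        for k in range(n):
            if k == i:
                continue
            d = sum((columns[j][i] - columns[j][k]) ** 2 for j in selected)
            if d < best_d:
                best, best_d = k, d
        hits += labels[best] == labels[i]
    return hits / n

def select_features(columns, labels, threshold=0.9):
    """One representative per cluster, added sequentially by decreasing score;
    the subset size with the best accuracy is kept."""
    clusters = cluster_features(columns, threshold)
    reps = [max(c, key=lambda j: fisher_score(columns[j], labels)) for c in clusters]
    reps.sort(key=lambda j: fisher_score(columns[j], labels), reverse=True)
    best_subset, best_acc = [], -1.0
    for d in range(1, len(reps) + 1):
        acc = loo_1nn_accuracy(columns, labels, reps[:d])
        if acc > best_acc:
            best_subset, best_acc = reps[:d], acc
    return best_subset, best_acc
```

Here `columns[j]` holds the values of feature j over all training samples. With two perfectly correlated discriminative features and one noisy feature, the sketch keeps a single representative of the correlated pair.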

Color Texture Features
In this paper, we propose to apply our approach with two popular texture descriptors known for their computational simplicity: the reduced size chromatic cooccurrence matrix (RSCCM) [34,35] and the extended opponent color local binary pattern (EOCLBP) [10,36]. These two descriptors are computed from images coded in different color spaces and require defining a neighborhood N in which the spatial interactions within and between the color components of neighbor pixels are both taken into account. For the first descriptor, Haralick features are extracted from different configurations of RSCCM. For the second one, we propose extracting statistical features from the histograms of many EOCLBP configurations [14].

Color Spaces
Color images are usually acquired by devices that code the colors in the RGB color space. However, the color of pixels can be represented in different color spaces which respect different physical, physiologic, and psycho-visual properties. These color spaces can be categorized into four families: the primary spaces; the luminance-chrominance spaces; the perceptual spaces; and the independent color component spaces [35].
Since the choice of a color space directly impacts the classification results, many authors have tried to compare the results obtained using different color spaces in order to find the most suited one for a given application [30,32,33]. The synthesis of these works shows that there is no color space well suited to represent all types of textures. To solve this problem, a few studies have proposed multi-color space approaches [10]. These approaches simultaneously exploit the properties of multiple color spaces by combining them and thus overcome the difficulty of choosing a single relevant color space. Although these approaches have shown their relevance with variable numbers of considered color spaces, it appears that a limited number of color spaces representative of each family is sufficient to improve classification performances [3,10]. Moreover, many of these spaces require knowing the properties of the illumination and the acquisition system to be independent of the device. As these parameters are not known for all image datasets, we propose to describe textures with only device-dependent color spaces that do not need this knowledge. In addition to the RGB color space, a color space belonging to each of the four families is considered herein:
• the $R_nG_nB_n$ normalized primary space, obtained by dividing each color component value by the sum of the three;
• the YCbCr luminance-chrominance space, which separates the achromatic and chromatic signals for television signal transmission;
• the I1I2I3 independent color component space, whose components are as uncorrelated as possible;
• the HSV perceptual space, which attempts to quantify subjective human color perception using the intensity, hue and saturation components with the hexcone model.
Figure 2 shows the image of a texture acquired in the RGB color space and converted into each of the used color spaces. Converted images are displayed in false colors by additive mixing and their three color channels are displayed below in grayscale. This figure highlights that the texture can be perceived with different appearances depending on the color component and shows the interest of combining multiple color spaces.
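As an illustration of how a single 8-bit RGB pixel can be coded in the four additional spaces, the following sketch uses common textbook conversions. The exact variants are assumptions, since the text does not give the formulas: full-range BT.601 (JFIF) coefficients for YCbCr, Ohta's definition for I1I2I3, and the standard-library `colorsys` hexcone conversion for HSV. The function names are hypothetical.

```python
import colorsys

def rgb_to_normalized(r, g, b):
    """RnGnBn: each component divided by the sum of the three."""
    total = r + g + b
    if total == 0:
        # Assumption: black has no defined chromaticity; use the neutral point.
        return (1 / 3, 1 / 3, 1 / 3)
    return (r / total, g / total, b / total)

def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 (JFIF) YCbCr for 8-bit inputs (assumed variant)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (y, cb, cr)

def rgb_to_i1i2i3(r, g, b):
    """Ohta's I1I2I3 components (assumed definition)."""
    return ((r + g + b) / 3, (r - b) / 2, (2 * g - r - b) / 4)

def rgb_to_hsv(r, g, b):
    """HSV with the hexcone model, components scaled to [0, 1]."""
    return colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)

def code_pixel(r, g, b):
    """Return the pixel coded in the five color spaces used in this paper."""
    return {
        "RGB": (r, g, b),
        "RnGnBn": rgb_to_normalized(r, g, b),
        "YCbCr": rgb_to_ycbcr(r, g, b),
        "I1I2I3": rgb_to_i1i2i3(r, g, b),
        "HSV": rgb_to_hsv(r, g, b),
    }
```

Applying `code_pixel` to every pixel yields five three-channel images from which the descriptors can be computed.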

Figure 2. Image of a texture acquired in the RGB color space and converted into the HSV, I1I2I3, YCbCr and $R_nG_nB_n$ device-dependent color spaces.
The subsequent sections describe the color texture features extracted from images coded in these five three-dimensional spaces, whose color components are denoted $C_1$, $C_2$ and $C_3$ for a given $C_1C_2C_3$ color space.

Haralick Features Extracted from Chromatic Cooccurrence Matrices
This section presents the chromatic cooccurrence matrix and its possible configurations from which features are extracted to describe a color texture.

Chromatic Cooccurrence Matrices
This descriptor is the extension to color of the gray level cooccurrence matrix (GLCM) operator that is considered as a two-dimensional histogram of gray level pairs of neighbor pixels. An important property of this operator is its invariance to orientation changes when all the directions of neighbor pixels are taken into account. The chromatic cooccurrence matrix (CCM) considers both the spatial interactions within and between the color components of neighbor pixels in the image plane and the color distribution in a color space [34,35].
Let Q be the number of levels used to quantize the color components $C_1$, $C_2$ and $C_3$ of a given color space. A reduced size chromatic cooccurrence matrix (RSCCM) is a Q × Q CCM, where the parameter Q is reduced in order to decrease the memory storage cost and thus the time required to extract texture features from these matrices [35].
The normalized RSCCM $m_N^{C_g,C_{g'}}[I]$ measures the spatial interactions in the neighborhood N between the two color components $C_g$ and $C_{g'}$ ($g, g' \in \{1, 2, 3\}$) of an image I. In addition to the quantization level Q, the neighborhood N is a second parameter defined by the user.
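For an image coded in a color space $C_1C_2C_3$ with a quantization level Q and a given neighborhood N, six normalized RSCCMs can be computed: three within-component matrices, $m_N^{C_1,C_1}[I]$, $m_N^{C_2,C_2}[I]$ and $m_N^{C_3,C_3}[I]$, and three between-component matrices, $m_N^{C_1,C_2}[I]$, $m_N^{C_1,C_3}[I]$ and $m_N^{C_2,C_3}[I]$. The choice of component pairs (within and between component matrices) can be viewed as a third parameter of this descriptor. The next subsection deals with the different configurations that an RSCCM can take.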
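The computation of a normalized RSCCM can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: channels are 8-bit images stored as lists of rows, quantization is uniform, the neighborhood is made of the eight neighbors at infinity-norm distance `delta`, pairs falling outside the image are ignored, and the six matrices are taken to be the three within-component and three between-component pairs. The function names are hypothetical.

```python
def quantize(channel, q_levels, max_value=255):
    """Uniformly quantize a channel to q_levels levels (assumed scheme)."""
    return [[min(v * q_levels // (max_value + 1), q_levels - 1) for v in row]
            for row in channel]

def neighbor_offsets(delta):
    """Eight neighbors at infinity-norm distance delta (isotropic neighborhood)."""
    return [(dy, dx) for dy in (-delta, 0, delta) for dx in (-delta, 0, delta)
            if (dy, dx) != (0, 0)]

def rsccm(chan_a, chan_b, q_levels, delta):
    """Normalized Q x Q cooccurrence matrix between two color components:
    entry (a, b) is the proportion of neighbor pairs whose first pixel has
    level a in chan_a and whose second pixel has level b in chan_b."""
    qa, qb = quantize(chan_a, q_levels), quantize(chan_b, q_levels)
    height, width = len(qa), len(qa[0])
    counts = [[0] * q_levels for _ in range(q_levels)]
    total = 0
    for y in range(height):
        for x in range(width):
            for dy, dx in neighbor_offsets(delta):
                ny, nx = y + dy, x + dx
                if 0 <= ny < height and 0 <= nx < width:
                    counts[qa[y][x]][qb[ny][nx]] += 1
                    total += 1
    if total == 0:
        return [[0.0] * q_levels for _ in range(q_levels)]
    return [[c / total for c in row] for row in counts]

def six_rsccms(c1, c2, c3, q_levels, delta):
    """Three within-component and three between-component matrices (assumed pairs)."""
    channels = (c1, c2, c3)
    pairs = [(0, 0), (1, 1), (2, 2), (0, 1), (0, 2), (1, 2)]
    return {(g + 1, g2 + 1): rsccm(channels[g], channels[g2], q_levels, delta)
            for g, g2 in pairs}
```

Reducing `q_levels` shrinks every matrix quadratically, which is the storage and time saving that motivates the reduced-size variant.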

RSCCM Configurations
Before calculating a chromatic cooccurrence matrix, several parameters have to be set and adjusted. This configuration is complex when the color and spatial properties of the analyzed textures are heterogeneous. This principally depends on:
• the number of normalized RSCCMs taken into account for a given color space;
• Q, the quantization level that defines the size of the RSCCM;
• N, the pixel neighborhood in which cooccurrences are counted. N is controlled by two other parameters:
  - the neighborhood direction: four two-directional neighborhoods are usually used to compute direction-dependent cooccurrence matrices: 0°, 45°, 90°, and 135°. In order to simultaneously take into account all the possible directions of an observed texture, an isotropic 3 × 3 neighborhood is generally used with eight neighbors located in the four directions;
  - the neighborhood distance: this distance, denoted δ, is the spatial infinity-norm distance separating each pixel from its neighbors. In this paper, an isotropic (2 × δ + 1) × (2 × δ + 1) neighborhood is used with eight neighbors located in the four directions.
Thus, we propose to adjust RSCCM configurations depending on two parameters: the quantization level Q and the neighborhood distance δ, assuming that these two parameters control the representation of textures acquired with different observation conditions. Haralick features are thus extracted from RSCCM configurations resulting from each of the following pairs (δ, Q): (1, 16) (

Haralick Features Extracted from RSCCM
The cooccurrence matrices are able to represent the texture but they are not directly used for color texture classification purposes because of the large amount of information they contain. To reduce it while preserving the relevance of these descriptors, Haralick proposed statistical features that can be extracted from each matrix [34]. We propose using the first 13 Haralick features: energy, homogeneity, contrast, correlation, variance, inverse difference moment, sum average, sum entropy, entropy, difference variance, difference entropy, and two measures of correlation I and II [49].
A color texture is then represented by Haralick features extracted from RSCCM with different parameter settings and computed from images coded in multiple color spaces.
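As an illustration, the following minimal Python sketch counts cooccurrences between two quantized color components over the eight neighbors at infinity-norm distance δ, then extracts four of the Haralick features. It is a sketch under stated assumptions, not the implementation used in this paper: the function names (`quantize`, `cooccurrence`, `haralick_subset`), the 8-bit input range and the textbook feature formulas are all assumptions.

```python
import numpy as np

def quantize(channel, q):
    """Map 8-bit values (0..255) to q levels (0..q-1); 8-bit input is assumed."""
    return (channel.astype(np.int64) * q) // 256

def cooccurrence(c_a, c_b, q, delta):
    """Normalized q x q cooccurrence matrix between two quantized components.

    Entry (u, v) is the relative frequency with which a pixel of level u in c_a
    has a neighbor of level v in c_b, over the eight neighbors located at
    infinity-norm distance delta.
    """
    h, w = c_a.shape
    m = np.zeros((q, q), dtype=np.float64)
    offsets = [(dy, dx) for dy in (-delta, 0, delta) for dx in (-delta, 0, delta)
               if (dy, dx) != (0, 0)]
    for dy, dx in offsets:
        # Keep only the pixels whose neighbor at (dy, dx) lies inside the image.
        y0, y1 = max(0, -dy), min(h, h - dy)
        x0, x1 = max(0, -dx), min(w, w - dx)
        src = c_a[y0:y1, x0:x1]
        dst = c_b[y0 + dy:y1 + dy, x0 + dx:x1 + dx]
        np.add.at(m, (src.ravel(), dst.ravel()), 1.0)
    return m / m.sum()

def haralick_subset(p):
    """Four Haralick-style features of a normalized cooccurrence matrix p."""
    i, j = np.indices(p.shape)
    nz = p[p > 0]  # skip empty cells so that log() is defined
    return {
        "energy": float(np.sum(p ** 2)),
        "contrast": float(np.sum((i - j) ** 2 * p)),
        "inverse_difference_moment": float(np.sum(p / (1.0 + (i - j) ** 2))),
        "entropy": float(-np.sum(nz * np.log(nz))),
    }
```

Calling `cooccurrence` with the same component twice gives a within-component matrix, and calling it with two different components gives a between-component one. Repeating the computation for each pair (δ, Q) yields one feature set per configuration.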

Texture Features Extracted from Color Local Binary Pattern Histograms
This section presents the color local binary pattern and its possible configurations from which features are extracted to describe a color texture.

Color Local Binary Pattern Histograms
Color LBPs are extensions to color of the local binary pattern (LBP) operator, which captures the local texture properties of a gray-level image [36]. An important property of this operator is its invariance to monotonic gray-scale changes caused, for example, by illumination variations. In order to characterize the whole color texture image, the LBP operator is applied on each pixel and for each pair of components in the color space $C_1 C_2 C_3$. Considering a pair of components $(C_g, C_{g'})$, with $g, g' \in \{1, 2, 3\}$, the color LBP labels a pixel of the component $C_g$ by thresholding its neighborhood N in the component $C_{g'}$ and by encoding the result as a binary number.
The Extended Opponent Color LBP (EOCLBP) operator gives rise to nine LBP images:
• Three within-component LBP images ($g = g'$) with the pairs $(C_1, C_1)$, $(C_2, C_2)$ and $(C_3, C_3)$;
• Six between-component LBP images ($g \neq g'$) with the pairs $(C_1, C_2)$, $(C_2, C_1)$, $(C_1, C_3)$, $(C_3, C_1)$, $(C_2, C_3)$ and $(C_3, C_2)$.
The choice of the considered pairs of color components can be viewed as a parameter of this descriptor. The LBP images are usually not exploited directly and most authors prefer to use LBP histograms where histogram bins are considered as texture features [36]. Another original idea is to extract statistical features such as Haralick features from LBP images [50].
Instead of using the bins of EOCLBP histograms, we propose to extract two different types of statistical features from these histograms in order to provide color texture features consistent with those of other descriptors [14]. In order to characterize the textures acquired with different observation conditions, these features are extracted from several EOCLBP configurations presented in the next subsection.
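The labeling step can be sketched in Python as follows for the basic 3 × 3 neighborhood with P = 8 neighbors. This is an illustrative sketch rather than the implementation used in this paper: the function names, the neighbor ordering and the "neighbor ≥ center" thresholding convention are assumptions, and border pixels are simply skipped.

```python
import numpy as np

def opponent_lbp(center_comp, neigh_comp):
    """LBP codes of the interior pixels for one ordered pair of components.

    Each center value is read in center_comp and compared with its eight
    3x3 neighbors read in neigh_comp; every comparison sets one bit.
    """
    h, w = center_comp.shape
    c = center_comp[1:-1, 1:-1].astype(np.int64)
    # Clockwise neighbor offsets starting at the top-left corner (assumed order).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offsets):
        n = neigh_comp[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx].astype(np.int64)
        codes |= (n >= c).astype(np.int64) << bit
    return codes

def eoclbp_histograms(image):
    """Nine normalized 256-bin LBP histograms, one per ordered pair (g, g')."""
    hists = {}
    for g in range(3):
        for gp in range(3):
            codes = opponent_lbp(image[..., g], image[..., gp])
            hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
            hists[(g + 1, gp + 1)] = hist / hist.sum()
    return hists
```

The keys (1, 1), (2, 2) and (3, 3) hold the within-component histograms and the six remaining keys hold the between-component ones. Concatenating the nine histograms gives 9 × 2^8 = 2304 bins for this configuration.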

EOCLBP Configurations
Due to its popularity, many variants of the basic LBP operator, such as the rotation-invariant LBP or the uniform LBP that reduces its dimensionality, as well as a few extensions to color, have been proposed over the last two decades [36,51].
The definition of the original LBP operator with its 3 × 3 neighborhood has since been generalized by using a circular neighborhood N, so that the EOCLBP parameters are defined by:
• The number of LBP histograms taken into account for a given color space;
• A circular neighborhood N controlled by:
  - P, the number of neighbor pixels, which determines the dimensionality of the LBP histograms. For example, a 3 × 3 neighborhood with P = 8 neighbors gives rise to a $2^8 = 256$-dimensional LBP histogram. For each pair of color components, a color texture is thus described by a $2^P$-dimensional histogram;
  - δ, the distance between each pixel and its neighbors. This distance is equal to the radius of the circle around the central pixel. Generally, when a neighbor located on the circle does not coincide with a pixel center, a bilinear interpolation is used to estimate its value. Here, the neighborhood is pre-sampled.
With these parameters, many LBP configurations are available to characterize textures at different scales. In this paper, we propose to consider EOCLBP configurations resulting from each of the following pairs (P, δ):

Statistical Features Extracted from EOCLBP Histograms
With the EOCLBP operator, a color texture can be represented by nine LBP histograms that are concatenated to constitute a vector containing $9 \times 2^P$ features for a given color space $C_1 C_2 C_3$. Several approaches have been proposed to reduce the dimensionality of such a feature space, such as the uniform LBP operator. Some authors have selected the most discriminant bins that constitute the LBP histograms [36]. Other authors reduced the number of histograms by keeping only the three within-component LBP histograms or by adding only three out of the six between-component LBP histograms, assuming that opponent pairs such as $(C_1, C_2)$ and $(C_2, C_1)$ are highly redundant [30]. Another approach consists of selecting, among the nine LBP histograms, the most discriminant ones for the considered application [10].
In this paper, we propose to extract statistical features from each LBP histogram to constitute a statistical feature vector of reduced dimensionality [14]. For this purpose, two types of statistical features are proposed:
• Six first-order statistical features: mean, median, mode, standard deviation, and two interquartile ranges;
• Eleven second-order statistical features extended from the first 11 Haralick features presented in Section 3.2.3 and adapted to deal with histograms.

Compact Hybrid Multi-Color Space Descriptor
This section presents the method used to define the proposed CHMCS descriptor which provides a hybrid and compact representation of color textures.

Hybrid Multi-Color Space Representation
By combining the texture features extracted from several configurations of different descriptors computed with images coded in multiple color spaces, a color texture is represented in a high-dimensional feature space.
Let $N_{spac}$ be the number of considered color spaces, $N^p_{conf}$ the number of configurations associated with the pth descriptor, and $N^p_{feat}$ the number of color texture features extracted from each configuration of the pth descriptor. The total number D of color texture features is given by relation (1):

$$D = N_{spac} \times \sum_{p} N^p_{conf} \times N^p_{feat} \quad (1)$$

where $N^p_{conf}$ can be computed as the product of $N^p_{pair}$, the number of color component pairs considered in each color space for the pth descriptor, and $N^p_{para}$, the number of parameter combinations associated with the pth descriptor. Table 1 presents the possible dimensionalities of the feature space depending on these values. By simultaneously considering the RSCCM and EOCLBP descriptors as well as all the combinations presented in the last two rows of this table, we build a hybrid multi-color space descriptor by concatenating a total number of D = 19,695 color texture features. Due to the curse of dimensionality, it is essential to reduce this number in order to define a compact representation of color textures.
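The value D = 19,695 can be checked with a few lines of Python, assuming that relation (1) multiplies the number of color spaces by the sum, over the descriptors, of the number of configurations times the number of features per configuration. The counts below are those reported in this paper (five color spaces; 6 pairs, 25 parameter settings and 13 features for RSCCM; 9 pairs, 13 parameter settings and 17 features for EOCLBP); the function and dictionary names are illustrative only.

```python
def feature_count(n_spaces, descriptors):
    """D = N_spac * sum_p (N_pair^p * N_para^p * N_feat^p)."""
    return n_spaces * sum(
        d["pairs"] * d["params"] * d["features"] for d in descriptors.values()
    )

descriptors = {
    "RSCCM": {"pairs": 6, "params": 25, "features": 13},
    "EOCLBP": {"pairs": 9, "params": 13, "features": 17},
}

d_rsccm = feature_count(5, {"RSCCM": descriptors["RSCCM"]})     # RSCCM alone
d_eoclbp = feature_count(5, {"EOCLBP": descriptors["EOCLBP"]})  # EOCLBP alone
d_total = feature_count(5, descriptors)                         # both descriptors
```

The two per-descriptor counts match the 9750 and 9945 candidate features reported later for the MPMC representations, and their sum gives the 19,695 features of the hybrid descriptor.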

Compact Representation
The proposed clustering-based embedded feature selection (CEFS) approach consists of three stages:

1. First, an automatic feature clustering algorithm is applied in order to divide the feature set into a number of clusters in which features are redundant or correlated;
2. Then, one feature is sequentially selected per cluster;
3. Finally, the dimensionality of the feature space is determined.

The first two stages of the CEFS approach significantly speed up the selection procedure since a large number of redundant features are eliminated at each step. Indeed, the filter model-based sequential feature selection procedure is applied to all the features belonging to the different clusters so that only one feature per cluster is selected at each iteration step. The features belonging to the same cluster as the selected feature are then removed and thus not considered in the next steps of the selection procedure. The feature clustering stage is fully automatic and does not require any parameters to be adjusted. This multi-criterion approach associates two complementary measures presented in the following subsections: a correlation-based criterion to cluster the feature set and a distance-based criterion to evaluate the relevance of each candidate feature space. The last subsection presents the third stage of the CEFS approach, which uses the accuracy of a classifier operating in each feature subspace selected at different dimensions in order to determine the relevant feature space dimensionality.

Notations
Let $S = \{F_1, F_2, \dots, F_k, \dots, F_D\}$ be a set of D features, where $F_k$ is the kth feature of this set. Feature selection aims to define a subset $S_{\hat{d}} \subset S$ with a reduced number $\hat{d}$ of features.
In a supervised context where the class labels of the color textures are known, this procedure is applied on training data of N color texture samples. The training data can be represented in the D-dimensional feature space by the N × D data matrix $X = (x_i^k)$, where $x_i^k$ denotes the value of the feature $F_k$ measured on the ith sample. Each of the D columns of the matrix X is the N-dimensional feature vector $x_k = (x_1^k, \dots, x_N^k)^T \in \mathbb{R}^N$ that represents a feature $F_k$ ($k = 1, \dots, D$), and each of the N rows of the matrix X is the D-dimensional sample vector $x_i = (x_i^1, \dots, x_i^D) \in \mathbb{R}^D$ that represents the ith color texture sample ($i = 1, \dots, N$). Let $m = (m_1, m_2, \dots, m_k, \dots, m_D)$ be the D-dimensional mean feature vector, where $m_k$ is the mean of the N elements of $x_k$ defined by Equation (2):

$$m_k = \frac{1}{N} \sum_{i=1}^{N} x_i^k \quad (2)$$

The matrix X is associated with an N-dimensional vector $y = (y_1, y_2, \dots, y_i, \dots, y_N)^T$ that represents the class labels of the training data, where $y_i$ is the class label of the ith color texture sample ($i = 1, \dots, N$). Let $N_C$ be the number of classes.
Let $m^j = (m_1^j, m_2^j, \dots, m_k^j, \dots, m_D^j)$ be the D-dimensional mean feature vector of class j, where $m_k^j$ is the mean of the feature $F_k$ computed on the $N/N_C$ samples labeled with the jth class:

$$m_k^j = \frac{N_C}{N} \sum_{i \mid y_i = j} x_i^k \quad (3)$$

Feature Clustering
The clustering of the feature set S is based on a dependency graph whose nodes are the considered color texture features, two nodes being linked by an edge if the corresponding features are correlated. Two features are correlated if the absolute value of their Pearson's linear correlation coefficient is higher than a threshold [52]. The correlation ρ between two color texture features $F_k$ and $F_l$, represented by their vectors $x_k$ and $x_l$, respectively, is defined by the following equation:

$$\rho(x_k, x_l) = \frac{\sum_{i=1}^{N} (x_i^k - m_k)(x_i^l - m_l)}{\sqrt{\sum_{i=1}^{N} (x_i^k - m_k)^2 \, \sum_{i=1}^{N} (x_i^l - m_l)^2}} \quad (4)$$

where $x_i^k$ is the value of the feature $F_k$ on the ith sample and $m_k$ is its mean over the N samples. If $x_k$ and $x_l$ are totally correlated, ρ approaches 1 or −1, and if they are completely uncorrelated, ρ is close to zero. Features that are directly connected are considered to be "dependent" and features that are indirectly connected via other features are considered to be "long dependent". The proposed clustering algorithm aims to put the dependent and long dependent features into the same feature cluster. Given a correlation threshold, two features $F_k$ and $F_l$ belonging to S are considered to be long dependent if there exists $F_m \in S$ such that both $F_k$ and $F_l$ are dependent on, and thus connected to, $F_m$ in the dependency graph. As in many clustering algorithms such as K-means and affinity propagation, the parameter setting, generally adjusted by a user, is crucial because it directly impacts the clustering result. The originality of our approach is that it automatically adjusts the correlation threshold, which is the only parameter of the clustering algorithm. This operation is performed by varying the correlation threshold and then evaluating the clustering quality.
The clustering algorithm partitions the feature set S into a number $N_t$ of clusters depending on the value t of the correlation coefficient threshold ($t \in \{0.75, 0.8, 0.85, 0.9, 0.95\}$) so that $S = \{C_1, C_2, \dots, C_a, \dots, C_{N_t}\}$, where $C_a$ is the ath cluster of features ($a = 1, \dots, N_t$).
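For a fixed threshold t, grouping dependent and long dependent features amounts to extracting the connected components of the dependency graph. The Python sketch below illustrates this with a small union-find structure. It is a minimal sketch, not the implementation used in this paper: the function name `cluster_features` is hypothetical and constant features (whose correlation is undefined) are assumed to be absent.

```python
import numpy as np

def cluster_features(X, threshold):
    """Cluster the columns of X (N samples x D features).

    Two features are linked when the absolute value of their Pearson
    correlation exceeds `threshold`; clusters are the connected components
    of the resulting graph. Returns a list of lists of column indices.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    d = corr.shape[0]
    parent = list(range(d))

    def find(k):
        # Follow parent links to the representative, compressing the path.
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    for k in range(d):
        for l in range(k + 1, d):
            if corr[k, l] > threshold:
                parent[find(k)] = find(l)  # merge the two components

    clusters = {}
    for k in range(d):
        clusters.setdefault(find(k), []).append(k)
    return list(clusters.values())
```

Running this function for each candidate threshold gives one partition per value of t. Raising the threshold removes edges from the graph and can therefore only split clusters, never merge them.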
It is assumed that the better separated and the more compact the clusters are, the higher the clustering quality is. The clustering quality is thus evaluated using a measure Tr of cluster separability and compactness defined by Equation (5), where trace(A) is the trace of the matrix A, B is the between-cluster scatter matrix defined by Equation (6) and W is the within-cluster scatter matrix defined by Equation (7).
In these equations, $|C_a|$ is the cardinality of $C_a$, $\mu = (\mu_1, \mu_2, \dots, \mu_i, \dots, \mu_N)^T$ is the N-dimensional mean sample vector with $\mu_i$, the mean of the D elements of $x_i$, defined by Equation (8), and $\mu^a = (\mu_1^a, \mu_2^a, \dots, \mu_i^a, \dots, \mu_N^a)^T$ is the N-dimensional mean sample vector of the cluster a ($a = 1, \dots, N_t$) with $\mu_i^a$, the mean of the ith sample computed on the features belonging to the ath cluster, defined by Equation (9):

$$\mu_i = \frac{1}{D} \sum_{k=1}^{D} x_i^k \quad (8)$$

$$\mu_i^a = \frac{1}{|C_a|} \sum_{k \mid F_k \in C_a} x_i^k \quad (9)$$

where $x_i^k$ is the value of the feature $F_k$ on the ith sample. The correlation threshold $\hat{t}$ used by the clustering algorithm is the one for which Tr is maximum. Note that the higher the correlation threshold is, the fewer initial connections between features there are, the fewer correlated and long-correlated features there are, and therefore the greater the number of clusters is.

Sequential Feature Selection
Then, a sequential forward selection (SFS) approach, based on a filter model, is applied to the previously clustered feature set $S = \bigcup_{a=1}^{N_{\hat{t}}} \{C_a\}$.
Since features in the same cluster are considered redundant, only one feature from each cluster is selected by using the filter model.
Following a forward sequential strategy, the feature selection algorithm selects, at each iteration step, a feature from the candidate feature set depending on the value of the evaluation function. For this purpose, each of the remaining candidate features is added to the feature subset under construction in order to consider as many feature subsets as there are candidate features. As for the feature clustering step, we use a distance-based measure as an evaluation function. Previously, the trace criterion was applied to measure the cluster separability and compactness. In this stage of the feature selection process, this criterion, defined by Equation (5), is used to measure the class separability and compactness and to evaluate the discriminating power of a candidate feature space. Here, B represents the between-class scatter matrix defined by Equation (10), where the number of samples $N/N_C$ is equal for each class, and W represents the within-class scatter matrix defined by Equation (11). In these equations, the condition $y_i = j$ means that the sum is applied to all the samples whose class label $y_i$ is equal to j, the index of the considered class. The feature subset selected at each iteration step d of the procedure is the one for which the trace criterion is maximum.
Once a feature is added to this subset, the cluster to which this feature belongs is removed to update the remaining candidate feature set that will be evaluated at the next iteration step (d + 1). As a consequence, the number of candidate features dramatically decreases at each step. On the one hand, this feature cluster removal reduces the feature redundancy and, on the other hand, it accelerates the selection procedure compared to a classical sequential feature selection method.
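The selection loop with cluster removal can be sketched in Python as follows. This is a hedged illustration rather than the implementation used in this paper: the function names are hypothetical, and the evaluation function `trace_criterion` uses trace(W⁺B), one common scatter-based separability measure, as an assumption; the exact form of the criterion of Equation (5) may differ. Any other filter criterion can be passed through the `criterion` argument.

```python
import numpy as np

def trace_criterion(Xs, y):
    """Class separability of the feature subspace Xs (N samples x d features).

    Assumed form: trace(pinv(W) @ B), with B the between-class and W the
    within-class scatter matrix. This is a stand-in, not the paper's formula.
    """
    mean = Xs.mean(axis=0)
    d = Xs.shape[1]
    B = np.zeros((d, d))
    W = np.zeros((d, d))
    for label in np.unique(y):
        Xc = Xs[y == label]
        mc = Xc.mean(axis=0)
        diff = (mc - mean)[:, None]
        B += len(Xc) * (diff @ diff.T)
        centered = Xc - mc
        W += centered.T @ centered
    return float(np.trace(np.linalg.pinv(W) @ B))

def clustered_sfs(X, y, clusters, d_max, criterion=trace_criterion):
    """Sequential forward selection keeping at most one feature per cluster."""
    remaining = [list(c) for c in clusters]
    selected = []
    while remaining and len(selected) < d_max:
        best_score, best_feat, best_cluster = -np.inf, None, None
        # Try every remaining candidate in addition to the current subset.
        for ci, cluster in enumerate(remaining):
            for k in cluster:
                score = criterion(X[:, selected + [k]], y)
                if score > best_score:
                    best_score, best_feat, best_cluster = score, k, ci
        selected.append(best_feat)
        del remaining[best_cluster]  # drop the whole cluster of the winner
    return selected
```

The returned list gives the features in their order of selection, so that its first d entries form the d-dimensional subspace evaluated at step d. Because a whole cluster is discarded after each selection, the number of criterion evaluations shrinks faster than in a classical SFS.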

Determination of the Relevant Feature Space Dimensionality
Finally, the last step of the feature selection scheme consists of determining the dimension of the relevant feature subspace. Since the evaluation function associated with the filter model is monotonic, it cannot be directly used to determine the dimension of the final feature space. The proposed embedded method integrates a classifier whose accuracy is measured once a feature is added at each step d of the procedure. For this purpose, the training set is divided into a training image subset and a validation image subset following a K-fold evaluation: (K − 1) folds constitute the training image subset and the remaining fold is assigned to the validation image subset from which the classification accuracy is measured. This cross-validation procedure is repeated 10 times with different distributions of the training set images in the validation and training subsets [53]. For each d-dimensional feature subspace selected at step d, this accuracy is estimated as the mean rate $\bar{R}_d$ of well-classified validation images over the K folds and over the 10 repetitions. Since this measure tends to stabilize after a limited number of iteration steps, the procedure stops when a maximum number $d_{max}$ of iteration steps is reached or when all the clusters of features are removed. $d_{max}$ is a parameter that controls the learning processing time. The dimension $\hat{d}$ of the selected feature subspace is the one for which $\bar{R}_d$ is maximum ($d = 1, \dots, d_{max}$). The accuracy is measured using the nearest neighbor classifier associated with the L1 distance because no parameters need to be adjusted.
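A minimal Python sketch of this third stage is given below, assuming that the features have already been ordered by the sequential selection. The function names are hypothetical, the folds are drawn by a plain random permutation (whether the original folds are stratified by class is not specified here, so this is an assumption), and ties between dimensions are resolved in favor of the smallest one.

```python
import numpy as np

def nn_l1_predict(X_train, y_train, X_test):
    """Nearest neighbor classification with the L1 (city-block) distance."""
    dists = np.abs(X_test[:, None, :] - X_train[None, :, :]).sum(axis=2)
    return y_train[np.argmin(dists, axis=1)]

def mean_cv_accuracy(X, y, k_folds=3, repeats=10, seed=0):
    """Mean validation accuracy over K folds and several repetitions."""
    rng = np.random.default_rng(seed)
    rates = []
    for _ in range(repeats):
        folds = np.array_split(rng.permutation(len(y)), k_folds)
        for f in range(k_folds):
            val = folds[f]
            train = np.concatenate([folds[g] for g in range(k_folds) if g != f])
            pred = nn_l1_predict(X[train], y[train], X[val])
            rates.append(np.mean(pred == y[val]))
    return float(np.mean(rates))

def select_dimension(X, y, ordered_features, **cv_kwargs):
    """Return (d_hat, rates): the dimension maximizing the mean accuracy."""
    rates = [
        mean_cv_accuracy(X[:, ordered_features[:d]], y, **cv_kwargs)
        for d in range(1, len(ordered_features) + 1)
    ]
    return int(np.argmax(rates)) + 1, rates
```

With the settings used in the experiments of this paper, the call would use `k_folds=3` and `repeats=10` on a list of at most 100 ordered features.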

Algorithm
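Algorithm 1 presents how the CEFS procedure runs.

Algorithm 1. The CEFS procedure.

1. Cluster the feature set S so that $S = \bigcup_{a=1}^{N_{\hat{t}}} \{C_a\}$;
2. Initialize the selected feature subset as empty and set d = 0;
3. Set d = d + 1;
4. Select the candidate feature $F^*$ that maximizes the trace criterion and add it to the selected feature subset;
5. Remove the cluster to which $F^*$ belongs ($S = S \setminus C_a \mid F^* \in C_a$);
6. Measure accuracy as the mean rate $\bar{R}_d$ of well-classified validation images;
7. Go to 3 if $d < d_{max}$ and $S \neq \emptyset$, otherwise:
8. End with the computation of the dimension $\hat{d}$.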

Experimental Results
This section first presents the image databases on which the experiments were carried out. A detailed analysis of the results reached by our approach on one of these databases is then performed. Finally, the results obtained with the five databases are presented, compared, interpreted and discussed.

Datasets
In order to highlight the contribution of our approach, we performed an evaluation on five benchmark color texture databases: Outex, NewBarkTex2, USPtex, STex, and Parquet:

• Outex contains a very large number of surface textures acquired under controlled conditions by a 3-CCD digital color camera and whose size is 746 × 538 pixels. These textures are split up into 29 categories of color texture images such as wood, fabric, wallpaper, sand, tile, etc. [54]. To build the Outex set, 68 color texture images from 12 categories of this database are split up into 20 disjoint sub-images whose size is 128 × 128 pixels (see Figure 5 for a sample of each category), giving rise to 68 different texture classes. Among these 1360 sub-images, 680 are used for the training subset and the remaining 680 are considered as testing images. This dataset is known as the Outex_TC_00013 test suite.
• The BarkTex database includes six tree bark classes, with 68 images per class. To build the NewBarkTex2 dataset, a region of interest, centered on the bark and whose size is 128 × 128 pixels, is first defined [2]. We thus obtain a set of 68 sub-images per class (see Figure 6 for a sample of each class). Half of these images are used for the training and the remaining 204 for the testing stage. Since the sub-images of this dataset come from different original images, the textures of the training and testing subsets are weakly correlated. This decomposition is available at: https://www-lisic.univ-littoral.fr/~porebski/NewBarkTex2.zip (accessed on 20 May 2022).
• USPtex contains 191 natural color textures acquired under an unknown but fixed light source [4]. The images are split up into 128 × 128 disjoint sub-images (see Figure 7 for randomly selected samples of different categories due to the large number of classes). Since the original image size is 512 × 384 pixels, this makes a total of 12 sub-images per texture. For our experiments, this initial image dataset is split up in order to build a training and a testing image subset: six images are considered for the training and the six others are used as testing images. This decomposition is available at: https://www-lisic.univ-littoral.fr/~porebski/USPtex.zip (accessed on 27 January 2016).
• The Parquet database is composed of fourteen varieties of wood for flooring [3]. Each type of wood presents several grades ranging from 2 to 4 which are considered as independent classes, leading to a total of 38 different classes. The main challenge with this database is that, within each type of wood, the grades are very similar to each other. Moreover, the sizes of the acquired images are different and the number of samples per class varies from 6 to 8. As in [13], six samples per class are retained and the images are center-cropped so that the final dimension of the images ranges from 480 × 480 to 1300 × 1300 pixels (see Figure 9 for a sample of different classes belonging to the OAK wood category). For each texture, three images are considered for the training and the other three are used as testing images. This decomposition is available at: https://www-lisic.univ-littoral.fr/~porebski/Parquet.zip (accessed on 10 June 2020).

Table 2 summarizes the color texture datasets used in the experiments of this paper.
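The sub-image counts quoted above follow from tiling each image with disjoint 128 × 128 windows. The short Python sketch below reproduces them. It assumes that tiles are taken from the top-left corner and that incomplete border tiles are discarded; the function name `split_into_tiles` is hypothetical and the actual tile placement used to build the datasets may differ.

```python
import numpy as np

def split_into_tiles(image, tile=128):
    """Split an image into disjoint tile x tile sub-images (borders dropped)."""
    h, w = image.shape[:2]
    return [
        image[y:y + tile, x:x + tile]
        for y in range(0, h - tile + 1, tile)
        for x in range(0, w - tile + 1, tile)
    ]

# Placeholder images with the sizes reported for the two databases.
usptex_image = np.zeros((384, 512, 3), dtype=np.uint8)  # 512 x 384 pixels
outex_image = np.zeros((538, 746, 3), dtype=np.uint8)   # 746 x 538 pixels

usptex_tiles = split_into_tiles(usptex_image)
outex_tiles = split_into_tiles(outex_image)
```

A 512 × 384 image yields 4 × 3 = 12 tiles and a 746 × 538 image yields 5 × 4 = 20 tiles, which matches the 12 sub-images per USPtex texture and the 20 sub-images per Outex image.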

Color Texture Feature Combinations
In order to highlight the contribution of combining several descriptor configurations, we propose to compare the CHMCS descriptor with different color texture representations, each coming from one descriptor:
• SPSC: a single parameter setting computed in a single color space;
• MPSC: multiple parameter settings computed in a single color space;
• SPMC: a single parameter setting computed in multiple color spaces;
• MPMC: multiple parameter settings computed in multiple color spaces.
These various representations aim to determine which parameters really impact the classification results. To show this impact without being influenced by other parameters such as those of the classifier, we use the nearest neighbor classifier because it does not require any parameter to be adjusted. As for the learning stage, this classifier is associated with the L1 distance and applied on the test image subsets. The accuracy is measured as the percentage of well-classified test images. All experiments were performed with the Matlab software using the CALCULCO computing platform supported by SCoSI/ULCO (Service Commun du Système d'Information from the University of Littoral Côte d'Opale) with different CPU and RAM. In addition, the online version of the CATAcOMB (Colour And Texture Analysis Toolbox for Matlab) toolbox, available at https://bitbucket.org/biancovic/catacomb (accessed on 23 May 2022) and first released in February 2019, is used for comparisons with other approaches [13]. Table 3 indicates the number of features used for each of the color texture representations presented above and applied to different descriptors.

When these representations are associated with the dimensionality reduction scheme presented in the previous section, they give rise to other compact color texture descriptors that we compare in the next subsections. During the learning stage of these experiments, $d_{max}$ is set to 100 so that the learning processing time is reduced, and K is set to 3 for the K-fold cross-validation, which is repeated 10 times.

Extensive Analysis on the USPtex Dataset
First, we provide a detailed analysis of the results reached by the proposed approach on the USPtex dataset. Table 4 shows the results obtained for each representation presented in the previous section with several color spaces and the parameter settings of different descriptors taken into account either individually or jointly. This table is organized as follows:
• For a given color space and a given parameter setting of a descriptor (RSCCM or EOCLBP), the accuracy obtained with the corresponding compact SPSC representation is indicated with the feature space dimension between brackets.
• For each parameter setting of a descriptor (RSCCM or EOCLBP) associated with multiple color spaces, the accuracy obtained with the corresponding compact SPMC representation is indicated in the last column with the feature space dimension between brackets. The two preceding columns show, respectively, the mean accuracy and the accuracy interval computed from the five color spaces.
The analysis of this table first allows conclusions to be drawn on the impact of SPMC (multiple color spaces) or MPSC (several parameter settings) representations compared to SPSC ones for each descriptor:
• By comparing the accuracy in the last column (SPMC) for each row (chosen parameter setting) of each descriptor with the maximum accuracy obtained with a single color space, the SPMC representation always outperforms the SPSC ones (+9.36% on average for the two descriptors), at the cost of higher-dimensional feature spaces (63.9 for SPMC and 30.9 for SPSC on average for the two descriptors). Thus, combining multiple color spaces improves the classification results compared to a single color space, the best choice of which is not known beforehand.
• By comparing the accuracy in the last row (MPSC) for each column (chosen color space) of each descriptor with the maximum accuracy obtained with a single parameter setting, there are only two cases (out of 190) where the SPSC representation outperforms the MPSC one. These two cases only appear with the parameter setting (256, 1) of the RSCCM descriptor, underlined in the table. On average, for the two descriptors, an accuracy increase of +10.18% is observed with the MPSC representation for a slight increase in the feature space dimensionality (42.0 for MPSC and 30.9 for SPSC). Compared to a single parameter setting, the best choice of which is not known beforehand, the combination of several parameter settings globally improves the classification results.
For each descriptor considered independently, the compact MPMC representation (95.63% for RSCCM and 95.64% for EOCLBP) also provides higher accuracy than the best SPSC one, which is underlined and written in bold in Table 4 (94.24% for the (256, 1) parameter setting of RSCCM in the YCbCr color space and 92.32% for the (12, 2) parameter setting of EOCLBP in the I1I2I3 color space). On average, MPMC representations provide higher accuracy than the SPMC (93.02% for RSCCM and 93.12% for EOCLBP) and MPSC (90.17% for RSCCM and 91.39% for EOCLBP) ones. However, there are three cases (underlined and written in italic in Table 4) where the SPMC representation, only with EOCLBP, slightly outperforms the MPMC one for this dataset. The combination of several descriptor configurations (multiple color spaces and parameter settings) is thus preferable to representations where a predefined configuration is chosen beforehand (color space, parameter settings of the descriptor or both).
For each of the two descriptors used in this paper, Figures 10 and 11 show, respectively, the distribution of the selected features when an MPMC representation is used. The distribution of color spaces, color component pairs, descriptor parameters, and features extracted from the descriptor are represented by pie charts.

RSCCM
For the RSCCM descriptor (see Figure 10), $\hat{d} = 84$ color texture features are selected from the 9750 available ones with the USPtex dataset. These features are equally derived from the five spaces with the $R_n G_n B_n$ color space as the most selected (29%). The six possible pairs of components are also selected equally but the three within-component pairs represent more than half of the pairs. Among the 25 available combinations of parameters, 15 are exploited by the descriptor with quantization levels and neighborhood distances which are very different. Among the 13 Haralick features, 11 are selected with a dominance of the contrast feature (25%).

EOCLBP
For the EOCLBP descriptor (see Figure 11), $\hat{d} = 89$ color texture features are selected from the 9945 available ones with the USPtex dataset. These features are derived from all color spaces with the RGB color space as the most selected (44%). All possible pairs of components are selected, where the three within-component pairs represent more than half of the pairs. Among the 13 available combinations of parameters, 10 are exploited by the descriptor with different numbers of neighbors and neighborhood distances. The standard 3 × 3 isotropic 8-neighborhood seems to be the most often selected (27%). Among the 17 available statistical features, 10 are selected with a dominance of the homogeneity measure (24%).
This study confirms that, for a given descriptor, there is no unique color space, pair of components, parameter setting or feature extracted which is relevant. Indeed, the selected configuration varies with the considered descriptor. These results justify the approach proposed in this paper.
Finally, Table 4 shows that the proposed CHMCS descriptor (97.70%, boxed and written in bold in this table) outperforms all the other representations. For the USPtex dataset, the two descriptors and all the color spaces are exploited in roughly equal proportions by the CHMCS representation (see Figure 12).

Overall Results
In this subsection, the rest of the results are given for the five color texture datasets. For each of the two descriptors (RSCCM and EOCLBP), Table 5 first highlights the best results obtained with a compact SPSC representation. For each accuracy presented in this table, the corresponding descriptor configuration is given in terms of parameter setting and color space. For a given descriptor, this configuration differs from one dataset to another, which generalizes the conclusion of the previous section. Table 6 shows how the selected features of the CHMCS descriptor are distributed for each dataset. The second column of this table gives the dimension of the selected feature space, which is lower than $d_{max}$ for all datasets. The next two columns count the number of times that each of the two descriptors is selected; in parentheses are the number of different parameter settings of this descriptor (25 settings are candidates for RSCCM and 13 settings are candidates for EOCLBP) and the number of its different color component pairs (6 pairs are candidates for RSCCM and 9 pairs are candidates for EOCLBP). For most datasets, the two descriptors are selected approximately equally often, except for the Parquet dataset where RSCCM is selected more often. For the NewBarkTex2 dataset, EOCLBP is selected slightly more often. This table also shows that various parameter settings are used with the two descriptors. The six available component pairs are used by the RSCCM descriptor for four datasets (NewBarkTex2 only requires five pairs) while, for EOCLBP, the number of used component pairs is between six and nine. The last five columns count the number of times each of the five color spaces is selected. All the color spaces are exploited by the CHMCS descriptor.
Table 7 gives the accuracy obtained on the five datasets with the proposed CHMCS descriptor compared to those obtained with the compact SPSC, MPSC, SPMC, and MPMC representations for each of the two descriptors. For the SPSC, SPMC and MPSC representations, the mean of the classification rates is given as the measure of accuracy. In order to underline the impact of the dimensionality reduction scheme, the accuracy obtained without selection is also given. For each dataset, the highest accuracy is written in bold and the highest accuracy reached by the other compact representations is underlined. For all datasets, the CHMCS descriptor outperforms the other approaches, which demonstrates the benefit of combining different descriptors. When a single descriptor is considered (RSCCM or EOCLBP), compact representations that take into account several descriptor configurations (MPMC) give the highest accuracy. Compared to Table 5, wherein the best results appear with a predefined descriptor parameter setting and color space, the CHMCS descriptor provides a higher accuracy and solves the difficult problem of the prior choice of a well-suited configuration. This is the first key point of our approach. Classification results drastically decrease when no selection is performed. The dimensionality reduction scheme is thus the second key point for the success of the classification. Table 8 indicates the dimensionality of the selected feature space from which the classification rates of Table 7 are computed. For the SPSC, SPMC and MPSC representations, the mean dimensionalities are computed. In all cases, the dimensions are less than $d_{max} = 100$, which considerably reduces the classification time during the decision stage. Finally, we propose to compare the relevance of the proposed CHMCS descriptor with handcrafted color texture descriptors and deep learning approaches (see Table 9).
Deep learning, and more specifically, convolutional neural networks (CNNs) provide impressive performances in computer vision problems such as image classification, object detection or pattern recognition, and have become the benchmark computer vision technique of our time. For a fair comparison, the accuracy is thus evaluated on the five color texture datasets with image classification algorithms based on:
• Four popular pretrained CNN models that are fine-tuned with the training images of the considered dataset: AlexNet, GoogLeNet, ResNet18 and ResNet50 [25][26][27]. Here, the last fully connected layer is modified to match the number of classes in each target dataset;
• Five pretrained generic CNN models that provided the best overall results in [13]: ResNet-50, ResNet-101, ResNet-152, VGG-VD-16 and VGG-VD-19 [27,28]. Here, the order-sensitive output of the last fully connected layer is used to generate the features and an L2 normalization is applied to the resulting feature vector. The dimensionality of this vector is 2048 for the ResNet models and 4096 for the VGG-VD ones;
• Four handcrafted color texture descriptors: OCLBP, IOCLBP, LCVBP, and SWOBP (see Section 2) in addition to the best configuration determined for RSCCM and EOCLBP (see Table 5). OCLBP, IOCLBP, and LCVBP descriptors were used to obtain a multiple resolution feature vector by concatenating the histograms of their rotation-invariant version computed with five 8-neighborhood distances (δ ∈ {1, 2, 3, 5, 10}). The dimensionalities of the OCLBP, IOCLBP and LCVBP descriptors are thus (36 × 5 × 6) = 1080, (71 × 5 × 3 + 72 × 5 × 3) = 2145 and (36 × 5 × 4) = 720, respectively. The SWOBP descriptor is used with the setting recommended by the authors (24 neighbors at a distance δ = 3) so that its dimensionality is 2244 when the uniform pattern version is used.
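The dimensionalities quoted above and the L2 normalization applied to the CNN features can be illustrated by the following short Python sketch. The products simply restate the factors given in the text; the function name `l2_normalize` and the small constant guarding against a null vector are assumptions added for illustration.

```python
import numpy as np

def l2_normalize(features, eps=1e-12):
    """Scale a feature vector to unit Euclidean norm (eps avoids division by 0)."""
    return features / max(float(np.linalg.norm(features)), eps)

# Histogram length x number of distances x number of histograms per distance.
dims = {
    "OCLBP": 36 * 5 * 6,
    "IOCLBP": 71 * 5 * 3 + 72 * 5 * 3,
    "LCVBP": 36 * 5 * 4,
}

# Example on a random vector with the dimensionality of a ResNet feature.
resnet_like = l2_normalize(np.random.default_rng(0).normal(size=2048))
```

After normalization, feature vectors of different magnitudes become comparable under the nearest neighbor classifier used for the comparison.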
Except for the fine-tuned CNN models, where the classification is performed directly by the network, all the results are determined using the nearest neighbor classifier associated with the L1 norm distance. Table 9 shows that our approach based on supervised learning provides results which are consistent with those achieved with deep learning networks and superior to those of traditional handcrafted color texture descriptors. On average, it gives better results than pretrained CNN-based approaches. For two datasets (Outex and Parquet), the proposed texture representation outperforms the CNN-based representation that reaches the highest accuracy (ResNet50), and it provides the same result for the NewBarkTex2 dataset. The smaller the database is, the more the proposed descriptor seems to outperform this network. CNN-based methods fail when the number of training samples is low, as with the Parquet dataset, while our approach gives satisfactory results. Moreover, the CHMCS representation is able to discriminate color textures of different classes which are very similar to each other, as in the Parquet dataset, whereas the pretrained CNNs do not seem suitable for fine-grained texture classification with small inter-class and large intra-class variations, as shown in [13]. Thus, compared to CNNs, our approach remains very stable across all the color texture datasets. The results achieved by the proposed CHMCS representation with only two basic descriptors (RSCCM and EOCLBP) are very encouraging and suggest that they can be improved, even for large datasets, by adding other descriptors which provided notable results in Table 9.
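The decision rule used for every method except the fine-tuned CNNs, a nearest neighbor classifier associated with the L1 norm distance, can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: the function names are hypothetical, ties are broken by keeping the first training sample found, and the toy vectors are not data from the experiments.

```python
def l1_distance(a, b):
    """L1 (city-block) distance between two feature vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))

def nearest_neighbor_label(query, train_vectors, train_labels):
    """Label of the training vector closest to `query` in the L1 sense."""
    best = min(range(len(train_vectors)),
               key=lambda i: l1_distance(query, train_vectors[i]))
    return train_labels[best]

def accuracy(test_vectors, test_labels, train_vectors, train_labels):
    """Fraction of test vectors whose predicted label matches the true one."""
    hits = sum(
        nearest_neighbor_label(q, train_vectors, train_labels) == y
        for q, y in zip(test_vectors, test_labels)
    )
    return hits / len(test_labels)

# Toy two-class example (hypothetical values).
train_vectors = [[0.0, 0.0], [10.0, 10.0]]
train_labels = ["class_a", "class_b"]
predicted = nearest_neighbor_label([1.0, 2.0], train_vectors, train_labels)
rate = accuracy([[1.0, 2.0], [9.0, 8.0]], ["class_a", "class_b"],
                train_vectors, train_labels)
```

Because each query is compared with every training vector, the cost of one decision grows linearly with the feature dimension, which is why a selected space of fewer than 100 features shortens the decision stage.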

Conclusions
In this paper, we proposed a compact, hybrid and multi-color space texture representation based on two key points:
• The combination of texture features extracted from several parameter settings of different descriptors computed from images coded in multiple color spaces. This representation simultaneously takes into account the different color and spatial properties of the textures to be analyzed and overcomes the difficulty of a prior parameter setting.
• The dimensionality reduction of the feature space by a clustering-based sequential forward selection procedure.
The proposed selection procedure uses a feature subset search algorithm associated with a multi-criteria evaluation and is applied to the features automatically grouped into clusters beforehand. Feature clustering is performed from a dependency graph which is constructed using a correlation measure. Based on a distance measure, only one feature per cluster is selected at each iteration step of the sequential forward procedure and the cluster to which this feature belongs is removed from the feature set. Following an embedded model, the dimension of the feature space is finally determined by an accuracy measure computed from a validation image subset based on a repeated K-fold cross-validation.
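The general shape of this procedure can be illustrated with a short Python sketch. It is not the implementation used in the experiments, and several choices are assumptions made for the illustration: the dependency graph links two features when the absolute Pearson correlation reaches a fixed `threshold` (in the proposed approach the threshold is determined automatically), clusters are taken as the connected components of that graph, the multi-criteria evaluation and distance measure are replaced by a user-supplied `score` function, and the repeated K-fold validation is replaced by a user-supplied `validation_accuracy` function. All function names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def cluster_features(columns, threshold):
    """Connected components of the graph linking features with |r| >= threshold."""
    parent = list(range(len(columns)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            if abs(pearson(columns[i], columns[j])) >= threshold:
                parent[find(i)] = find(j)
    clusters = {}
    for i in range(len(columns)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

def clustered_sfs(columns, score, threshold, d_max):
    """Forward selection that removes one whole cluster per selected feature."""
    remaining = cluster_features(columns, threshold)
    selected = []
    while remaining and len(selected) < d_max:
        candidates = [(f, k) for k, cluster in enumerate(remaining) for f in cluster]
        best_feature, best_cluster = max(
            candidates, key=lambda fk: score(selected + [fk[0]]))
        selected.append(best_feature)
        del remaining[best_cluster]  # discard the features correlated with the pick
    return selected

def choose_dimension(order, validation_accuracy):
    """Smallest prefix length of `order` that maximizes the validation accuracy."""
    return max(range(1, len(order) + 1),
               key=lambda d: validation_accuracy(order[:d]))

# Toy data: each inner list holds the values of one feature over four images.
columns = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 0, 1, 0]]
weights = {0: 0.5, 1: 0.9, 2: 0.3}  # hypothetical per-feature relevance
toy_score = lambda subset: sum(weights[f] for f in subset)
order = clustered_sfs(columns, toy_score, threshold=0.9, d_max=100)
print(order)  # [1, 2]
```

In this toy run, features 0 and 1 are perfectly correlated and fall into the same cluster, so selecting feature 1 discards feature 0 and the search continues with feature 2 only. Removing whole clusters in this way is what keeps redundant features out of the selected subset and shortens the search.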
In most traditional approaches, the texture descriptors are used with a predefined setting of the parameters and computed from images coded in a chosen color space. The principal contribution of the proposed approach is to combine the manifold configurations of descriptors including the color spaces in order to take into account the possible low inter-class and high intra-class appearance variations of the color textures to be classified.
Another key point of this approach is the use of a correlation coefficient whose threshold is automatically determined by evaluating the feature clustering quality with a cluster separability and compactness measure, so that no parameter requires adjustment. The results obtained with five benchmark texture databases show that combining the different configurations of a texture descriptor always improves the accuracy compared to approaches that use a predefined configuration. The proposed compact hybrid multi-color space descriptor provides, on average, better results than deep learning approaches. Although the results obtained with the ResNet50 network are better when the amount of data is large, this CNN-based representation fails when the dataset is small.
The proposed method could be extended by adding other descriptors in order to improve performance, even on large datasets. It could also be improved by adding criteria that guarantee the stability of the feature selection procedure when the number of features is high compared to the number of training samples. Classification results could also be enhanced by using more sophisticated classifiers. Nevertheless, our approach provides a high level of dimensionality reduction and competitive classification accuracy with a reasonable processing time and very few parameters to adjust.