An Automated System for Garment Texture Design Class Identification

Abstract: Automatic identification of garment design class might play an important role in the garment and fashion industry. Essential initial work toward this goal already exists in the literature, for example the construction of garment databases, automatic segmentation of garments from real life images, and categorization of garments by type (shirts, jackets, tops, skirts, etc.). What is still needed is a system that identifies the particular design (printed, striped or single color) of a garment product, so that an automated system can recommend garment trends. In this paper, we focus on this specific issue and propose two new descriptors, namely Completed CENTRIST (cCENTRIST) and Ternary CENTRIST (tCENTRIST). To test these descriptors, we used two different publicly available databases. The experimental results on these databases demonstrate that both cCENTRIST and tCENTRIST achieve about 3% higher accuracy than the existing state-of-the-art methods.


Introduction
Garment selling companies as well as fashion industries have created an interesting research area in the field of image processing and pattern recognition. A company always wants to achieve a competitive advantage over its rivals to sustain itself in the industry. Thus, if a company knows the current design trends and the clothing choices of people, it can adopt proper strategies and produce clothes based on those choices. Besides this, online shopping is becoming very popular nowadays. If a retailer knows which type of design for a particular garment is being bought by consumers, they can increase their stock of that design. Thus, automatic identification of design class is necessary. Such automatic identification of garment trends can help different types of people.
Recently, garment related research has become popular [1][2][3][4][5]. Most of the works focus on the segmentation of garments from real life images. There are also a few works that identify the type of the garments, such as which ones are shirts or jackets. Considering that all of these are essential initial works, it is now necessary to develop a system that can also identify the design class of the garments. Taken as a whole, these components will help the industry, a retailer or the owner of an online shop. The overall flow of this work is shown in Figure 1.
To identify the design class of a garment product, it is necessary to analyze the texture of the product. For texture classification, there are several existing well known methods such as CENsus Transform hiSTogram (CENTRIST) [6], Local Binary Pattern (LBP) [7], Gabor [8], Histogram of Oriented Gradient (HOG) [9], GIST [10], etc. Among those, LBP and CENTRIST are very similar and have gained popularity for their computational simplicity and better accuracy. However, it is well known that LBP and CENTRIST are very sensitive in uniform and near uniform regions. Variants of LBP like LTP [11] and Completed Local Binary Pattern (CLBP) [12] can handle this issue to some extent. To capture different orientations of a particular design, the features should be rotation invariant. Thus, CLBP might be a better choice in this regard. However, to categorize garment products based on design class, the global relation among the structural information is necessary. CENTRIST [6] has mostly gained this property by incorporating a Spatial Pyramid (SP) structure.
Thus, to incorporate the structural information, suppress detailed texture information, remain stable in uniform or near uniform regions and achieve rotation invariance, we have proposed two different descriptors: Completed CENTRIST (cCENTRIST) and Ternary CENTRIST (tCENTRIST). The major contributions of this paper are as follows: (i) we have introduced an automated system which can categorize garment products into some specific design classes; (ii) for capturing rotation invariant texture properties, we have proposed cCENTRIST; and (iii) we have proposed tCENTRIST for textures where rotation invariance is not required.
The rest of the paper is organized as follows: Section 2 discusses the background studies; Section 3 describes the proposed method of our garment texture classification; Section 4 presents the experimental results of the approach; and finally, Section 5 concludes the paper and summarizes the work.

Background Studies
In this section, we will discuss the existing garment segmentation and categorization strategies. We will also describe the existing descriptors that are used for texture based classification and which might be used for garment design class identification.

Garment Product Segmentation and Type Identification
Gallagher et al. [1] segmented clothing from an image using the grab cut algorithm [13] for recognizing a person. This method successfully extracted the region of interest (ROI); however, it is limited to the torso region only. Manfredi et al. [14] also proposed an approach for automatic segmentation of garments and classified them into nine different classes (such as shirts, dresses, skirts). For this, they first extracted the shape of a specific garment by using a projection histogram. After that, they divided the whole image into 117 cells (9 × 13 grid) and grouped them into partially overlapped blocks of 3 × 3 cells. The orientations are quantized into nine bins to compute the HOG features [15] in each cell. They trained a multiclass linear support vector machine on the concatenation of the projection histogram and the HOG features. Similar recent work can be found in [5], where the authors used a conditional random field (CRF) for parsing outfits. Bourdev et al. [16] proposed a method for recognizing attributes such as gender, hair style and types of clothes (t-shirts, pants, jeans, shorts, etc.) from an input image. For this, they created a dataset consisting of 8000 people with annotated attributes. Yamaguchi et al. [2] proposed a method for parsing clothes from fashion photographs. They introduced a dataset (Fashionista Dataset), which consisted of 158,235 photographs, and then selected 685 photos that had good visibility of the whole body and covered a variety of clothing items. For training and testing, they used these 685 images and the associated text annotations, with the images labeled by identifying 14 different parts of a body and different clothing regions. To identify the clothes, they first detected the pose of a person from an input image using the method described in [17] and then detected similar types of clothing from the dataset.
Kalantidis et al. [3] proposed an approach which can automatically suggest relevant clothing products. When an input image is given, they estimate the pose of the person using the same method used in [2]. After that, they segment the clothing area, which is followed by the extraction of the clothing class (such as shirt, tops, etc.). Lastly, they apply an image retrieval technique to retrieve the visually similar clothes for each class, which is 50 times faster than [2]. Recently, to find three different visual trends from runway to street fashion, namely floral print, pastel color and neon color, the authors in [4] used five different features: color, texture, shape, parse and style descriptor. They produced inspiring results in several areas such as season, year, brand and the influence of runway collections for three potential visual trends: two colors (neon and pastel) and a design class (floral print). However, it would be more beneficial in this field if more color and design classes could be incorporated.

Texture Based Classification
For texture based classification, there are several existing well known methods such as Wavelet transform [18], Gabor filters [19], Scale-invariant feature transform (SIFT) [20], HOG [9,15] and LBP [7] features. LBP, which was proposed for describing the local structure of an image, is now considered an effective texture classification methodology. LBP and its variants can be uniform and/or rotation invariant [21] and have been extensively exploited in many applications, for instance, facial image analysis, including face detection [22][23][24][25], face recognition and facial expression analysis [26][27][28][29][30][31][32][33][34]; demographic (gender, race, age, etc.) classification [35][36][37][38]; moving object detection [39], etc. The major reasons behind the popularity of LBP based methods are their computational simplicity, robustness against monotonic illumination variation and better performance in several areas. However, LBP is very sensitive in uniform and near uniform regions, which makes its code unstable in most cases [11]. In the last few years, a lot of effort has been invested in LBP based methodology to improve its performance and fit it to different applications, such as derivative-based LBP [40], dominant LBP [41], center-symmetric LBP [42], etc.
Tan and Triggs proposed Local Ternary Patterns (LTP), which use three-value encoding and show tolerance to noise up to a certain level [11]. They assumed that noise in an image varies within a fixed threshold (±5). With this assumption, the authors in [11] made LTP more discriminant and less sensitive to noise in uniform regions. A few other proposals also handle noise in different application areas, such as the methods described in [43][44][45]. Among them, one of the recent texture based face detection proposals is Local Gradient Pattern (LGP), which is a variant of LBP and uses an adaptive threshold for code generation [45]. Completed Local Binary Pattern (CLBP), proposed by Guo et al. [12], goes beyond the sign information used in LBP and incorporates sign, magnitude and center pixel information. It is rotation invariant and capable of handling fluctuations of intensity. Wu and Rehg [6] proposed CENsus Transform hiSTogram (CENTRIST), which is very similar to LBP and was proposed mainly as a visual descriptor for recognizing topological places or scene categories. In order to capture the global structure of an image on larger scales, CENTRIST uses a spatial representation based on the Spatial Pyramid Matching (SPM) scheme [46], which is a collection of orderless feature histograms computed over cells defined by multi-level recursive image decomposition. CENTRIST uses a total of 31 blocks, which helps to avoid the artifacts created by the non-overlapping blocks in the traditional SP. Among them, 25 blocks come from level 2, five blocks from level 1 and one block from level 0. The SPM construction mechanism is shown in Figure 2.
So far, we have discussed several feature extraction methods, which are proposed to be used with different types of classifiers for solving various types of applications. Among the classifiers, different kernels of the Support Vector Machine (SVM) [47] are popular for their better performance. In recent times, Deep Learning [48,49] has become popular for various classification problems such as speech [48,49], digit [49] and object recognition [50]. Unlike conventional shallow artificial neural networks, it uses many levels to represent highly nonlinear and highly varying functions. Usually, Deep Learning requires a large amount of training data to build the model from scratch and a number of training iterations for better performance [51][52][53].
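The block count described above (25 + 5 + 1 = 31) can be reproduced with a short sketch. It assumes that each pyramid level consists of a regular n × n grid plus an (n − 1) × (n − 1) grid shifted by half a cell; this geometry, the function name `pyramid_blocks` and the box format are illustrative assumptions, not details taken from this paper.

```python
def pyramid_blocks(height, width, max_level=2):
    """Return (top, left, bottom, right) boxes for a CENTRIST-style pyramid.

    Assumption: level L uses a regular 2^L x 2^L grid plus a shifted
    (2^L - 1) x (2^L - 1) grid whose cells straddle the regular cell borders.
    """
    blocks = []
    for level in range(max_level, -1, -1):
        n = 2 ** level
        cell_h, cell_w = height // n, width // n
        # Regular, non-overlapping grid.
        for r in range(n):
            for c in range(n):
                blocks.append((r * cell_h, c * cell_w,
                               (r + 1) * cell_h, (c + 1) * cell_w))
        # Shifted grid, offset by half a cell in both directions.
        for r in range(n - 1):
            for c in range(n - 1):
                top = r * cell_h + cell_h // 2
                left = c * cell_w + cell_w // 2
                blocks.append((top, left, top + cell_h, left + cell_w))
    return blocks


blocks = pyramid_blocks(64, 64)
per_level = [16 + 9, 4 + 1, 1]  # level 2, level 1, level 0
```

Under this assumption, a 64 × 64 image yields 31 boxes, and the last box covers the whole image (level 0).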

A Brief Description of Texture Descriptors
Our cCENTRIST and tCENTRIST mainly adopt ideas from CLBP, CENTRIST and LTP. We thus provide a brief description of these three descriptors.

CLBP (Completed Local Binary Pattern)
CLBP considers both the sign (CLBP_S) and the magnitude (CLBP_M) information that come from the differences between a pixel and its neighbors. It also generates a binary code (CLBP_C) for the center pixel by global thresholding. Figure 3 shows the framework of CLBP.

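$$d_p = s_p \cdot m_p, \qquad s_p = \operatorname{sign}(d_p), \qquad m_p = |d_p| \tag{1}$$

Here, $s_p = 1$ when $d_p \ge 0$; otherwise $s_p = -1$. Figure 4c,d shows the result of Equation (1). CLBP_M is converted to a binary number format by Equation (4), and each −1 is replaced by a 0 in CLBP_S. Equations (2)-(4) show the calculation of CLBP_C, CLBP_S and CLBP_M:

$$\mathrm{CLBP\_C}_{P,R} = t(g_c, c_I) \tag{2}$$

$$\mathrm{CLBP\_S}_{P,R} = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^p, \qquad s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases} \tag{3}$$

$$\mathrm{CLBP\_M}_{P,R} = \sum_{p=0}^{P-1} t(m_p, c)\, 2^p, \qquad t(x, c) = \begin{cases} 1, & x \ge c \\ 0, & x < c \end{cases} \tag{4}$$

Here, $g_c$ is the gray level of the center pixel and $g_p$ that of its p-th neighbor. $c$ and $c_I$ are thresholds, which might be calculated as averages over the whole image (of the magnitudes $m_p$ and of the gray levels, respectively). P and R are the number of neighbors and the radius of the LBP code. It is worth mentioning that CLBP considers uniform and rotation invariant codes. Thus, the sizes of the histograms of CLBP_S, CLBP_M and CLBP_C are 10, 10 and 2, respectively. To calculate the feature vector using CLBP, a 3D histogram is constructed using CLBP_C, CLBP_S and CLBP_M. If we consider only CLBP_S and construct a histogram, then it becomes the original LBP.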
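As a minimal sketch of the sign and magnitude decomposition on a single 3 × 3 patch, the following code computes the three CLBP components and the uniform, rotation invariant mapping that gives 10 possible values for eight neighbors. The neighbor ordering (clockwise from the top-left corner), the two threshold arguments and all function names are assumptions made for illustration.

```python
# Neighbor positions of a 3x3 patch, clockwise from the top-left corner
# (the ordering is an assumption; any fixed circular order works).
NEIGHBORS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def clbp_components(patch, magnitude_threshold, image_mean):
    """Decompose local differences into sign and magnitude parts."""
    center = patch[1][1]
    diffs = [patch[r][c] - center for r, c in NEIGHBORS]
    signs = [1 if d >= 0 else -1 for d in diffs]
    magnitudes = [abs(d) for d in diffs]
    return {
        "signs": signs,
        "magnitudes": magnitudes,
        # CLBP_S bits: every -1 is replaced by 0.
        "s_bits": [1 if s == 1 else 0 for s in signs],
        # CLBP_M bits: magnitude compared against a global threshold.
        "m_bits": [1 if m >= magnitude_threshold else 0 for m in magnitudes],
        # CLBP_C bit: center pixel compared against the image average.
        "c_bit": 1 if center >= image_mean else 0,
    }


def riu2(bits):
    """Uniform, rotation invariant code: P + 2 possible values (10 for P = 8)."""
    p = len(bits)
    transitions = sum(bits[i] != bits[(i + 1) % p] for i in range(p))
    return sum(bits) if transitions <= 2 else p + 1


patch = [[9, 12, 34],
         [10, 25, 28],
         [99, 64, 56]]
parts = clbp_components(patch, magnitude_threshold=20, image_mean=30)
s_code, m_code = riu2(parts["s_bits"]), riu2(parts["m_bits"])
```

Multiplying each sign by its magnitude recovers the original difference vector, which is the property expressed by Equation (1).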

CENTRIST
CENTRIST [6] is based on the concept of the Census Transform (CT) proposed by R. Zabih et al. [54]. It is a non-parametric local transform technique that maps a pixel to an eight bit string (the CT value) by comparing its intensity value with those of its eight neighboring pixels. LBP also uses a similar strategy. The only difference is that LBP performs interpolation while considering the corner pixels, whereas CENTRIST considers the corner pixels as they are. An example of the CT calculation is given in Figure 5. CENTRIST uses a histogram of the CT values of image patches to capture both the global and local information of an image. To capture the global structure of an image, it also uses a spatial representation based on the Spatial Pyramid Matching (SPM) scheme, which divides an image into sub regions and integrates the correspondence results in those regions, which improves recognition.
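The Census Transform can be sketched in a few lines. The code below assumes that a bit is set to 1 when the center pixel is greater than or equal to the neighbor and that neighbors are read left to right, top to bottom, with the first neighbor as the most significant bit; the bit convention and the function names are assumptions, and border pixels are simply skipped.

```python
# Eight neighbors, read left to right and top to bottom.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def census_transform(image):
    """Return the CT value (0..255) of every interior pixel of a 2D list."""
    height, width = len(image), len(image[0])
    ct = []
    for r in range(1, height - 1):
        row = []
        for c in range(1, width - 1):
            code = 0
            for dr, dc in OFFSETS:
                bit = 1 if image[r][c] >= image[r + dr][c + dc] else 0
                code = (code << 1) | bit
            row.append(code)
        ct.append(row)
    return ct


def ct_histogram(ct):
    """256-bin histogram of CT values: the descriptor of one image block."""
    hist = [0] * 256
    for row in ct:
        for code in row:
            hist[code] += 1
    return hist


image = [[32, 64, 96],
         [32, 64, 96],
         [32, 32, 96]]
ct = census_transform(image)   # bit string 11010110 for the single interior pixel
hist = ct_histogram(ct)
```

Because only comparisons are used, adding a constant to every pixel or applying any monotonically increasing intensity mapping leaves the CT values unchanged.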

Local Ternary Pattern (LTP)
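Local Ternary Pattern (LTP) [11] follows the same spirit as LBP. LTP introduces a third value to manage fluctuations of intensity. Thus, LTP becomes a ternary code at a pixel c, which is generated by the following Equation (5):

$$\mathrm{LTP}(g_p, g_c, \mu) = \begin{cases} 1, & g_p \ge g_c + \mu \\ 0, & |g_p - g_c| < \mu \\ -1, & g_p \le g_c - \mu \end{cases} \tag{5}$$

Here, $g_c$ is the gray level of pixel c, $g_p$ is the gray level of its p-th neighbor, and µ is a threshold value, set to 5 (a tolerance band of ±5). An LTP code is usually split into two binary codes (upper pattern and lower pattern) to reduce the size of the feature vector. Two histograms are created separately and then concatenated to represent the feature vector of an image.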
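The split into upper and lower patterns can be illustrated on one 3 × 3 patch. This is a sketch only: the neighbor ordering (clockwise from the top-left corner) and the function name `ltp_codes` are assumptions, and the threshold of 5 follows the value quoted in the text.

```python
# Neighbor positions of a 3x3 patch, clockwise from the top-left corner
# (an assumed ordering, used for illustration only).
NEIGHBORS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def ltp_codes(patch, threshold=5):
    """Return the ternary code and its upper and lower binary patterns."""
    center = patch[1][1]
    ternary = []
    for r, c in NEIGHBORS:
        diff = patch[r][c] - center
        if diff >= threshold:
            ternary.append(1)      # clearly brighter than the center
        elif diff <= -threshold:
            ternary.append(-1)     # clearly darker than the center
        else:
            ternary.append(0)      # within the tolerance band
    upper = [1 if t == 1 else 0 for t in ternary]
    lower = [1 if t == -1 else 0 for t in ternary]
    return ternary, upper, lower


patch = [[60, 70, 52],
         [40, 54, 56],
         [57, 12, 48]]
ternary, upper, lower = ltp_codes(patch)
```

Each of the two binary patterns has 2^8 = 256 possible values, so the concatenated histogram has 512 bins, compared with 3^8 = 6561 bins if the ternary code were histogrammed directly.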

Proposed Method
An image can be obtained from a camera installed in a public place or from an online shop. These captured images can be segmented to obtain garment information and garment type by using the method described in [14]. The proposed descriptors are then applied to classify the garments into different design classes. Two new descriptors, namely Completed CENTRIST (cCENTRIST) and Ternary CENTRIST (tCENTRIST), are proposed in this regard. These descriptors are a fusion of Completed Local Binary Pattern (CLBP), Local Ternary Pattern (LTP) and CENsus TRansform hISTogram (CENTRIST). In Section 2, we have described CLBP, CENTRIST and LTP. In this section, we will describe how we have combined them to produce a better result for garment design class categorization. Figure 6 shows the block diagram of our proposed method.

Completed CENTRIST (cCENTRIST)
cCENTRIST is the combination of CLBP and CENTRIST. For generating features (CT values), unlike LBP, we consider the corner pixels as they are. Again, like LBP, we introduce uniform and rotation invariant codes of CT, considering sign, magnitude and center pixel information. A spatial pyramid (SP) structure is adopted; for each block of the SP, a 3D histogram is constructed, and the histograms of all these blocks are concatenated. To reduce the dimension of the feature vector, Principal Component Analysis (PCA) is then applied and the final feature vector is constructed, which is then used for garment design classification. The N blocks are combined to construct a feature vector of length M × N for the input image, where M is the number of features retained per block.
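One way to picture the per-block 3D histogram is as a flattened joint histogram over the sign code, the magnitude code and the center bit. The sketch below assumes the bin counts of 10, 10 and 2 given for the CLBP histograms earlier; the flattening order and the function name are illustrative assumptions, and PCA is left out.

```python
def joint_histogram(s_codes, m_codes, c_bits, n_s=10, n_m=10, n_c=2):
    """Flattened 3D histogram over (sign code, magnitude code, center bit).

    Each argument holds one value per pixel of a block. With the default
    bin counts the histogram has 10 * 10 * 2 = 200 bins.
    """
    hist = [0] * (n_s * n_m * n_c)
    for s, m, c in zip(s_codes, m_codes, c_bits):
        hist[(s * n_m + m) * n_c + c] += 1
    return hist


# Three pixels of a hypothetical block: two share the same (s, m, c) triple.
hist = joint_histogram([5, 5, 9], [3, 3, 0], [0, 0, 1])
bins_per_block = len(hist)            # 200
raw_length = bins_per_block * 31      # 6200 values before PCA, for 31 blocks
```

With these assumed sizes, concatenating the histograms of 31 pyramid blocks gives 6200 values, which is the vector that the dimension reduction step would then shorten.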

Ternary CENTRIST (tCENTRIST)
For generating tCENTRIST, we also adopt the SP. For each block of the SP, we calculate the Local Ternary Pattern (LTP) and construct two histograms: one for the upper code and another for the lower code of LTP. These two histograms are then concatenated to build a single histogram. Finally, the histograms of all blocks are combined and PCA is applied for dimension reduction, which gives us the final feature vector for an image. SVM is also used here for classification. Algorithm 2 describes this process.
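The steps above can be sketched as follows, stopping before PCA and classification. The block boxes are passed in as (top, left, bottom, right) tuples; the box format, the bit ordering and the function names are assumptions for illustration, and the threshold of 5 follows the LTP description in Section 2.

```python
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def ltp_block_histogram(image, box, threshold=5):
    """Concatenated upper and lower LTP histograms (2 x 256 bins) of one block."""
    top, left, bottom, right = box
    upper_hist, lower_hist = [0] * 256, [0] * 256
    # Only pixels with a full 3x3 neighborhood inside the block are coded.
    for r in range(top + 1, bottom - 1):
        for c in range(left + 1, right - 1):
            upper = lower = 0
            for dr, dc in OFFSETS:
                diff = image[r + dr][c + dc] - image[r][c]
                upper = (upper << 1) | (1 if diff >= threshold else 0)
                lower = (lower << 1) | (1 if diff <= -threshold else 0)
            upper_hist[upper] += 1
            lower_hist[lower] += 1
    return upper_hist + lower_hist


def tcentrist_raw(image, boxes, threshold=5):
    """Concatenate the LTP histograms of all blocks (before PCA)."""
    feature = []
    for box in boxes:
        feature.extend(ltp_block_histogram(image, box, threshold))
    return feature


# A flat 8x8 image: every pixel falls into code 0 of both patterns.
flat = [[10] * 8 for _ in range(8)]
feature = tcentrist_raw(flat, [(0, 0, 8, 8), (0, 0, 4, 4)])
```

A uniform image puts every pixel into bin 0 of both patterns, and intensity fluctuations smaller than the threshold leave the result unchanged, which illustrates the stability in near uniform regions that motivates the use of LTP.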

Experiments
In this section, we first describe our datasets followed by the training and testing protocol. Finally, experimental results with necessary discussion will be presented.

Dataset
Firstly, we used a publicly available dataset (http://imagelab.ing.unimore.it/fashion_dataset.asp, Fashion Dataset) which was originally created for Garment Product Recognition [14]. We have manually categorized its images into three familiar design classes, namely "single color" (2440 images), "Print" (1141 images) and "stripe" (636 images). Figure 7 shows example images from this dataset. Besides this, we also used the "Clothing Attribute Dataset" [55] for evaluating our proposed method. The original dataset contains 1856 images, and we used its basic design classes (Floral, Graphics, Plaid, Spotted, Striped and Solid pattern) in our experiments. We manually extracted the garment area of each image and reconstructed the dataset with 1575 images in six different categories (69 floral images, 110 graphics images, 105 plaid images, 100 spotted images, 140 striped images and 1051 solid pattern images). Figure 8 shows example images of the "Clothing Attribute Dataset" used in our work.

Training and Testing Protocol
For "Fashion Dataset," we have divided each garment class into a training and a testing set, where 100 randomly selected images are taken for training from each of the classes, and the rest of the images are considered as testing data. Thus, 300 images are used for training and the remaining 3917 images are used for testing. This process is repeated five times. For "Clothing Attribute Dataset," we take 50 random images from each of the six categories as training data and the rest as testing data. For evaluating the performance, we consider the overall accuracy, Recall, Precision and F_Measure, which are defined in Equations (6)-(9).
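As an illustration of the four measures, the sketch below computes them from a confusion matrix using the standard textbook definitions. Whether these match Equations (6)-(9) term by term is an assumption, and the function name and the matrix layout are chosen for illustration only.

```python
def evaluate(confusion):
    """Overall accuracy plus per-class (precision, recall, F-measure).

    confusion[i][j] is the number of class-i samples predicted as class j.
    Standard definitions are assumed here.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[i][i] for i in range(n)) / total
    per_class = []
    for i in range(n):
        true_positive = confusion[i][i]
        actual = sum(confusion[i])                        # all samples of class i
        predicted = sum(confusion[j][i] for j in range(n))  # all predictions of class i
        recall = true_positive / actual if actual else 0.0
        precision = true_positive / predicted if predicted else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        per_class.append((precision, recall, f_measure))
    return accuracy, per_class


# Hypothetical two-class result: 10 test images per class.
accuracy, per_class = evaluate([[8, 2],
                                [1, 9]])
```

Reporting precision and recall per class matters for these datasets because the class sizes are very unequal, so overall accuracy alone can be dominated by the largest class.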

Experimental Result and Discussion
We have proposed two different methods for feature extraction, namely cCENTRIST and tCENTRIST. Both of these are compared here to the original CENTRIST, HOG, LGP and GIST feature extraction techniques. Table 1 shows the accuracy of the experimental results using the linear kernel of SVM. We have also calculated the results by taking the square root of the feature vector (Power Kernel) before applying the linear kernel. We found that the HOG descriptor shows an average accuracy of 78.61% when combined with the SPM structure (we use the default parameter settings of the source code: http://www.mathworks.com/matlabcentral/fileexchange/28689-hog-descriptor-for-matlab, and as described in [9]). It is worth mentioning that, without using SPM, the accuracy is only 61.22%. We have also compared our results with GIST [10], which is very successful for scene recognition and achieved very competitive results for design class classification.
Table 1 and Figure 9 show the comparison of six different methods, and from these it is clear that the proposed tCENTRIST and cCENTRIST perform better than HOG, LGP, GIST and the original CENTRIST on the Fashion Dataset for most of the categories. Both of the proposed descriptors perform better, and their average accuracies are at least 2.5% better than those of the original proposals. tCENTRIST achieves 88.22% accuracy for identifying the "Print" category, which is better than cCENTRIST, whereas cCENTRIST performs better for identifying the "Single color" category. In every case, the accuracy of the "Stripe" category is much lower than that of the "Print" and "Single color" categories. Overall, cCENTRIST performs better than tCENTRIST, which is logical because the rotation invariance property of CLBP helps to capture differently oriented Print and Stripe designs as the same Print and Stripe class. In contrast, tCENTRIST does not have any rotation invariant capability, which might mislead the classification. If we consider a more granular division of classes, such as vertical, horizontal and diagonal orientations of stripes as three different classes of garments, then tCENTRIST will perform better.
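The Power Kernel step can be sketched as an element-wise square root applied to each feature vector before a linear kernel is evaluated. The signed square root used below is an assumption made so that the code also handles negative values (for example after PCA); for non-negative histogram features it reduces to the plain square root. The function names are illustrative.

```python
import math


def power_normalize(feature):
    """Signed element-wise square root of a feature vector."""
    return [math.copysign(math.sqrt(abs(v)), v) for v in feature]


def linear_kernel(a, b):
    """Plain dot product, as used by a linear SVM."""
    return sum(x * y for x, y in zip(a, b))


# Two hypothetical histogram features with one dominant bin each.
x = [4.0, 9.0, 0.0]
y = [1.0, 16.0, 0.0]
plain = linear_kernel(x, y)                                      # 148.0
powered = linear_kernel(power_normalize(x), power_normalize(y))  # 14.0
```

The square root compresses large histogram counts, so a few heavily populated bins contribute less to the similarity between two images than they would under the plain linear kernel.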
We performed five rounds of validation for generating the results, repeating the random split five times as mentioned before. For each round, we calculate the average accuracy of the three classes, which is shown in Figure 10, where R1, R2, …, R5 represent the results of the five different rounds. The most stable output is obtained for cCENTRIST, whereas the most fluctuating result is obtained for LGP, which also supports the robustness of our proposed method. Analyzing Tables 2-6, it becomes evident that the proposed descriptors clearly outperform the current state-of-the-art method, namely CENTRIST, on the Fashion Dataset. To validate the proposed methods further, we used the "Clothing Attribute Dataset" [55] described in Sections 4.1 and 4.2. Table 7 shows the experimental results on this dataset using different feature extraction methods along with our proposed methods. It is worth pointing out that we used HOG and LGP features along with the SPM structure and used GIST (http://people.csail.mit.edu/torralba/code/spatialenvelope/) as is (without SPM) to meet the spirit of GIST. From Table 7, it is clear that the performance of GIST is close to that of our proposed methods, which is also observed for the previous dataset. The performance of HOG was better on the Fashion Dataset, whereas it showed lower accuracy on the Clothing Attribute Dataset. Again, although LGP uses an adaptive threshold and has shown better performance in face detection, its performance is much lower than ours on the garment datasets.
Instead of handcrafted features like LBP and HOG, deep learning has recently shown inspiring results in many application areas. Even though deep learning requires a large amount of training data to build a model from scratch, it is possible to train with a small amount of data by fine-tuning a pre-trained model (one similar to the target application, such as CaffeNet [56] for garment texture design class) on the target data, as demonstrated in the existing literature [53]. Following a straightforward process (https://gist.github.com/jimgoo/0179e52305ca768a601f; http://caffe.berkeleyvision.org/model_zoo.html), we fine-tuned CaffeNet for the "Clothing Attribute Dataset". In this case, we used 60 images for training (10 images from each class), 60 images for validation and the remaining 1455 images for testing. We applied 50,000 iterations with the default settings (e.g., learning rate and batch size), changed the parameter (num_output: 6) in the layer fc8 and obtained 73.54% accuracy. However, it might be possible to improve the accuracy by changing the layers and other related settings, which might be a good research direction in this regard.

Conclusions
In this paper, we have proposed Completed CENsus Transform hISTogram (cCENTRIST) and Ternary CENsus Transform hISTogram (tCENTRIST) for identifying garment design class. Using two different datasets consisting of three and six classes, we have shown that our proposed tCENTRIST and cCENTRIST perform better than several state-of-the-art methods and that cCENTRIST shows slightly better results than tCENTRIST. These descriptors can also be applied to classify a larger number of design classes. Furthermore, these two descriptors can also be used for other vision based classification tasks such as scene and object recognition.
Like the original CENTRIST, our proposed cCENTRIST uses gray scale images. We believe that incorporating color information will increase the overall accuracy, which we will address in the future.

Figure 2. An example of Spatial Pyramid Representation.

Figure 4 shows an example for calculating the CLBP_S and CLBP_M components. Figure 4a shows the original 3 × 3 image. Figure 4b shows the differences (dP) between each neighboring pixel and the central pixel. These differences are represented by a vector [d0,…,dP−1]. dP is further decomposed into two components following Equation (1).

Figure 6. Proposed method for garment texture classification.

Algorithm 2.
Input: Gray scale image I
Output: Feature vector of I
1. For each image I, calculate the level 2 Spatial Pyramid (SP).
2. For each block of SP:
   a. Calculate LTP.
   b. Construct the histogram of LTP.
   End For
3. Concatenate all histograms and apply PCA to extract M features from each block.
4. Combine N blocks to construct an M × N feature vector for each image.

Figure 7. Example dataset: the first row shows examples of the Print category, the second row Single color and the third row Stripe.

Figure 8. Example dataset: Columns 1 to 6 show examples of Floral, Graphics, Plaid, Solid Color, Spotted and Striped garments, respectively.
Here, the quantities used in Equations (6)-(9) are: the number of correct detections of class i as class i; the total number of detections of class i as class i and j; and the number of detections of class j as class i and j.

Figure 9. Comparison of the existing methods with the proposed methods.
For classification, we have used Support Vector Machine (SVM). Algorithm 1 combines CLBP and CENTRIST to produce the proposed cCENTRIST.

Table 1. Experimental results of different methods for Fashion Dataset.

Table 5. Results of three different descriptors using Power Kernel followed by Linear Kernel of SVM on Fashion Dataset.

Table 6. Results of three different descriptors using only Linear Kernel on Fashion Dataset.

Table 7. Experimental results using the Clothing Attribute Dataset.