# Garment Categorization Using Data Mining Techniques

## Abstract

## 1. Introduction

## 2. Research Background

## 3. Machine Learning Algorithms for Garment Classification

#### 3.1. Naïve Bayes (NB) Classification

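…th feature. Based on these features, the object can be classified into a class $c_i$ in $C = (c_1, c_2, \dots, c_w)$. Therefore, according to Bayes' theorem [51], the posterior probability that the object belongs to $c_i$ is proportional to the prior probability of $c_i$ multiplied by the likelihood of the observed features given $c_i$.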
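
For reference, Bayes' theorem and the naïve conditional-independence assumption can be written as below. The object is written here as a feature vector $X = (x_1, \dots, x_n)$, and $p_{ki}$ denotes the probability that binary attribute $k$ is present in class $c_i$. Both notations are assumed for illustration, because the section's original definition of the object and its features is truncated. Since $P(X)$ is the same for every class, it can be dropped when choosing the most probable class, and the last line gives the Bernoulli form used when attributes are binary.

```latex
\begin{aligned}
P(c_i \mid X) &= \frac{P(X \mid c_i)\,P(c_i)}{P(X)}, \qquad i = 1, \dots, w, \\
P(X \mid c_i) &= \prod_{k=1}^{n} P(x_k \mid c_i) \quad \text{(conditional independence)}, \\
\hat{c} &= \operatorname*{arg\,max}_{c_i \in C} \; P(c_i) \prod_{k=1}^{n} P(x_k \mid c_i), \\
P(x_k \mid c_i) &= p_{ki}^{\,x_k}\,(1 - p_{ki})^{1 - x_k}, \qquad x_k \in \{0, 1\}.
\end{aligned}
```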

#### 3.2. Decision Trees (DT)

#### 3.3. Random Forest (RF)

#### 3.4. Bayesian Forest (BF)

## 4. Research Methodology

#### 4.1. Tools and Dataset

- List of garment sub-categories tagged in the images along with the corresponding garment categories.
- List of 289,222 image names with the corresponding garment category (upper, lower, whole).
- List of garment attributes containing the attribute name (A-line, long-sleeve, zipper, etc.) and the corresponding attribute type.
- List of 289,222 image names, each with 1000 columns (one per garment attribute) whose values (−1, 0, or 1) encode the presence or absence of that attribute in the image.
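
A minimal sketch of reading the per-image attribute list, assuming (hypothetically) a whitespace-separated text layout in which each data row holds an image name followed by one code per attribute. The mapping of 1 to "present" and of both −1 and 0 to "absent" is also an assumption made for illustration, and `parse_attribute_rows` and `to_binary` are hypothetical helpers, not part of the dataset's tooling.

```python
import numpy as np


def parse_attribute_rows(lines, n_attributes):
    """Parse rows of the (assumed) form '<image_name> v_1 ... v_n'.

    Rows that do not contain exactly n_attributes integer codes after the
    image name (e.g. a count line or a header) are skipped.
    Returns the image names and an (n_images, n_attributes) int8 matrix.
    """
    names, rows = [], []
    for line in lines:
        parts = line.split()
        if len(parts) != n_attributes + 1:
            continue  # wrong number of tokens: not a data row
        try:
            codes = [int(v) for v in parts[1:]]
        except ValueError:
            continue  # header row of attribute names, not data
        names.append(parts[0])
        rows.append(codes)
    matrix = np.asarray(rows, dtype=np.int8).reshape(len(rows), n_attributes)
    return names, matrix


def to_binary(codes):
    """Assumed mapping: 1 -> present (1); -1 and 0 -> absent (0)."""
    return (codes == 1).astype(np.uint8)


# Tiny illustrative input with 3 attributes instead of 1000
sample = [
    "3",
    "image_name attribute_labels",
    "img/a.jpg -1 1 0",
    "img/b.jpg 1 1 -1",
    "img/c.jpg 0 -1 1",
]
names, codes = parse_attribute_rows(sample, n_attributes=3)
binary = to_binary(codes)
```

Storing the codes as `int8` keeps the full 289,222 × 1000 matrix at about 289 MB, which is why a compact integer type is used rather than Python lists or 64-bit integers.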

#### 4.2. Data Preprocessing

#### 4.2.1. Data Extraction, Cleaning, and Integration

#### 4.2.2. Feature Selection and Data Reduction

- $p(t) = N_t/N$ is the proportion of samples reaching node $t$.
- $\Delta g$ is the impurity decrease produced by the split, measured with the Gini impurity function; the resulting score is therefore known as the Gini importance or mean decrease Gini.
- $v(s_t)$ is the feature used in the split $s_t$.
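
The terms above are the ingredients of the mean-decrease-impurity (Gini) importance. Assuming the usual form $\mathrm{Imp}(X_m) = \frac{1}{N_T}\sum_{T}\sum_{t \in T:\, v(s_t) = X_m} p(t)\,\Delta g(s_t, t)$, where $N_T$ is the number of trees (this equation and $N_T$ are written here for illustration and are not quoted from the section), the sketch below recomputes the score from a fitted scikit-learn forest and can be checked against its `feature_importances_`. scikit-learn also normalizes each tree's importances to sum to 1 before averaging, so the sketch does the same. The synthetic data and the helper names `tree_mdi` and `forest_mdi` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def tree_mdi(estimator, n_features):
    """Normalized mean decrease in Gini impurity for one fitted sklearn tree."""
    tree = estimator.tree_
    w = tree.weighted_n_node_samples  # (weighted) N_t for every node t
    imp = np.zeros(n_features)
    for t in range(tree.node_count):
        left, right = tree.children_left[t], tree.children_right[t]
        if left == -1:  # leaf: no split s_t, so no contribution
            continue
        # Delta g(s_t, t): parent impurity minus the size-weighted child impurities
        delta_g = tree.impurity[t] - (
            w[left] * tree.impurity[left] + w[right] * tree.impurity[right]
        ) / w[t]
        # p(t) * Delta g, credited to the split feature v(s_t)
        imp[tree.feature[t]] += (w[t] / w[0]) * delta_g
    total = imp.sum()
    return imp / total if total > 0 else imp


def forest_mdi(forest, n_features):
    """Average the per-tree importances over the forest, then renormalize."""
    per_tree = [
        tree_mdi(t, n_features) for t in forest.estimators_ if t.tree_.node_count > 1
    ]
    if not per_tree:
        return np.zeros(n_features)
    mean = np.mean(per_tree, axis=0)
    return mean / mean.sum()


X, y = make_classification(n_samples=300, n_features=8, random_state=0)
rf = RandomForestClassifier(n_estimators=20, min_samples_leaf=3, random_state=1000).fit(X, y)
mdi = forest_mdi(rf, X.shape[1])
```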

#### 4.3. Model Building

#### 4.3.1. Development of Subsystems

#### Model Development

- The dataset was randomly split into k (= 10) equal-size partitions.
- Of the k partitions, one was reserved as the test dataset for the final evaluation of the model, while the remaining k − 1 partitions were used for model training.
- The process was repeated k times for each model and machine learning technique, with each of the k partitions used exactly once as the test data.
- The k results obtained from the test partitions were averaged to produce a single estimate.
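
The procedure above can be sketched with scikit-learn's `KFold`. The shuffling seed, the synthetic data, and the use of Bernoulli Naïve Bayes with accuracy as the only metric are illustrative assumptions; `kfold_evaluate` is a hypothetical helper.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold
from sklearn.naive_bayes import BernoulliNB


def kfold_evaluate(model, X, y, k=10, seed=1000):
    """Return the mean test accuracy over k folds plus per-fold bookkeeping."""
    kf = KFold(n_splits=k, shuffle=True, random_state=seed)  # random partitioning
    fold_scores = []
    times_tested = np.zeros(len(y), dtype=int)
    for train_idx, test_idx in kf.split(X):
        # Fresh copy of the model for every fold, trained on the k - 1 partitions
        fitted = clone(model).fit(X[train_idx], y[train_idx])
        fold_scores.append(accuracy_score(y[test_idx], fitted.predict(X[test_idx])))
        times_tested[test_idx] += 1  # track that each sample is tested exactly once
    return float(np.mean(fold_scores)), fold_scores, times_tested


X, y = make_classification(n_samples=500, n_features=10, random_state=0)
mean_acc, fold_scores, times_tested = kfold_evaluate(BernoulliNB(), X, y)
```

Cloning the estimator inside the loop guarantees that no fitted state leaks from one fold into the next, so the k scores are independent estimates before they are averaged.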

#### Evaluation

#### 4.3.2. Integration of Subsystems

#### Model Development

#### Evaluation

## 5. Experimentation and Results

#### 5.1. Analysis of Subsystems

#### 5.2. Analysis of the Integrated System

## 6. Conclusions

## Supplementary Materials

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

| Dataset | Initial No. of Data Points | Initial No. of Attributes | Final No. of Data Points | Final No. of Attributes |
|---|---|---|---|---|
| A | 289222 | 1000 | 276253 | 1000 |
| U | 137770 | 1000 | 131620 | 430 |
| L | 56037 | 1000 | 55915 | 467 |
| W | 82446 | 1000 | 82202 | 453 |

| S. No. | Data Mining Algorithm | Model Parameters |
|---|---|---|
| 1 | Naïve Bayes | Bernoulli Naïve Bayes: each attribute is modelled as a binary (Bernoulli) variable conditional on the class. |
| 2 | Decision Trees | Minimum number of samples required to be at a leaf node = 3, Seed value = 1000. |
| 3 | Random Forest | Minimum number of samples required to be at a leaf node = 3, Number of trees in the forest = 200, Seed value = 1000. |
| 4 | Bayesian Forest | Minimum number of samples required to be at a leaf node = 3, Number of trees in the forest = 200, Bootstrap = True, Seed value = 1000. |
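
A sketch of how the listed parameters might map onto scikit-learn, assuming the seed value corresponds to `random_state`. scikit-learn has no Bayesian Forest estimator, so `BayesianBootstrapForest` is a hypothetical class following the Bayesian-bootstrap idea: each tree is fitted on all rows with Dirichlet(1, …, 1) weights instead of a resample. Whether this matches the implementation used in this study is an assumption, as is the choice to let every split consider all features.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import BernoulliNB
from sklearn.tree import DecisionTreeClassifier

SEED = 1000  # "Seed value = 1000", assumed to map to random_state

# 1. Naive Bayes: Bernoulli model; the default binarize=0.0 maps values > 0 to 1
nb = BernoulliNB()
# 2. Decision tree with at least 3 samples per leaf
dt = DecisionTreeClassifier(min_samples_leaf=3, random_state=SEED)
# 3. Random forest: 200 trees, at least 3 samples per leaf
rf = RandomForestClassifier(n_estimators=200, min_samples_leaf=3, random_state=SEED)


class BayesianBootstrapForest(ClassifierMixin, BaseEstimator):
    """Hypothetical Bayesian-bootstrap forest (not a scikit-learn class).

    Every tree sees all rows, re-weighted by Dirichlet(1, ..., 1) draws
    scaled to mean 1, instead of a multinomial bootstrap resample.
    """

    def __init__(self, n_estimators=200, min_samples_leaf=3, random_state=SEED):
        self.n_estimators = n_estimators
        self.min_samples_leaf = min_samples_leaf
        self.random_state = random_state

    def fit(self, X, y):
        rng = np.random.default_rng(self.random_state)
        self.classes_ = np.unique(y)
        self.estimators_ = []
        n = len(y)
        for _ in range(self.n_estimators):
            weights = rng.dirichlet(np.ones(n)) * n  # Bayesian bootstrap weights
            tree = DecisionTreeClassifier(
                min_samples_leaf=self.min_samples_leaf,
                random_state=int(rng.integers(2**31 - 1)),
            )
            tree.fit(X, y, sample_weight=weights)
            self.estimators_.append(tree)
        return self

    def predict_proba(self, X):
        # Average the class-probability estimates of all trees
        return np.mean([t.predict_proba(X) for t in self.estimators_], axis=0)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=0)
bf = BayesianBootstrapForest(n_estimators=10).fit(X_demo, y_demo)
```

Because `BernoulliNB` binarizes with a threshold of 0 by default, an attribute coded −1/0/1 would be treated as present only when its code is 1.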

| Algorithm | Dataset | Accuracy | Precision | Recall | F-Score |
|---|---|---|---|---|---|
| Naïve Bayes | A | 0.7513 | 0.7530 | 0.7513 | 0.7502 |
| Naïve Bayes | U | 0.5539 | 0.5444 | 0.5539 | 0.5417 |
| Naïve Bayes | L | 0.6734 | 0.6684 | 0.6734 | 0.6682 |
| Naïve Bayes | W | 0.8242 | 0.7888 | 0.8242 | 0.7975 |
| Decision Trees | A | 0.7957 | 0.7947 | 0.7957 | 0.7940 |
| Decision Trees | U | 0.6130 | 0.6085 | 0.6130 | 0.6064 |
| Decision Trees | L | 0.7429 | 0.7384 | 0.7429 | 0.7341 |
| Decision Trees | W | 0.8577 | 0.8389 | 0.8577 | 0.8388 |
| Random Forest | A | 0.8658 | 0.8656 | 0.8658 | 0.8652 |
| Random Forest | U | 0.7331 | 0.7323 | 0.7331 | 0.7305 |
| Random Forest | L | 0.8232 | 0.8223 | 0.8232 | 0.8206 |
| Random Forest | W | 0.9024 | 0.8966 | 0.9024 | 0.8975 |
| Bayesian Forest | A | 0.7946 | 0.7947 | 0.7946 | 0.7920 |
| Bayesian Forest | U | 0.6113 | 0.6090 | 0.6113 | 0.5963 |
| Bayesian Forest | L | 0.7386 | 0.7396 | 0.7386 | 0.7173 |
| Bayesian Forest | W | 0.8488 | 0.8395 | 0.8488 | 0.8089 |
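
In every row of the results, recall equals accuracy. Support-weighted averaging of per-class recall always yields exactly the accuracy, which suggests (though the section does not state it) that precision, recall and F-score were averaged with class-frequency weights. A minimal sketch with made-up labels, where `summarize` is a hypothetical helper:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def summarize(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F-score."""
    acc = accuracy_score(y_true, y_pred)
    p, r, f, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0
    )
    return acc, p, r, f


# Made-up labels for three classes, for illustration only
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 1])
y_pred = np.array([0, 1, 0, 1, 2, 2, 2, 0, 2, 1])
acc, p, r, f = summarize(y_true, y_pred)
```

Here 7 of the 10 labels are correct, so both `acc` and the weighted recall `r` come out as 0.7: weighting each class's recall $TP_c / n_c$ by its share $n_c / N$ leaves $\sum_c TP_c / N$, which is the accuracy.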

