Multi-Task Learning and Gender-Aware Fashion Recommendation System Using Deep Learning

Naham, Al-Zuhairi; Wang, Jiayang; Raeed, Al-Sabri

doi:10.3390/electronics12163396


by Al-Zuhairi Naham *, Jiayang Wang and Al-Sabri Raeed

School of Computer Science and Engineering, Central South University, Changsha 410083, China

* Author to whom correspondence should be addressed.

Electronics 2023, 12(16), 3396; https://doi.org/10.3390/electronics12163396

Submission received: 20 May 2023 / Revised: 25 July 2023 / Accepted: 25 July 2023 / Published: 10 August 2023

(This article belongs to the Section Computer Science & Engineering)


Abstract:

When people see fashion models on social media or on television, many wonder whether they could achieve a similar look by wearing similar products. In e-commerce, users who click on a fashion model to find similar products often receive unwanted products or products intended for the opposite gender. To address these issues, we built a multi-task learning and gender-aware fashion recommendation system (MLGFRS). The proposed MLGFRS can increase the revenue of the e-commerce fashion market. We also observed that people tend to click on the part of the fashion model image that contains the product they want. Therefore, we split the query image into cropped products so that the user’s click can be matched to a product. The novelty of this paper is twofold. First, we improve efficiency by detecting the gender from the query image, which reduces the retrieval time. Second, we improve the quality of the results by retrieving similar items for each object in the query image, so that the most relevant products are recommended. The MLGFRS consists of four components: gender detection, object detection, similarity generation, and recommendation results. The MLGFRS achieves better performance than the state-of-the-art baselines.

Keywords:

fashion recommendation system; gender detection; object detection; EfficientNetB7; YOLOV8; similarity learning; content-based recommendation system

1. Introduction

Recommender systems are one of the great improvements in Internet technology and e-commerce, and the origins of modern recommender systems date back to the early 1990s when they were mainly applied experimentally to personal email and information filtering [1]. Later, recommender systems went through numerous improvements to facilitate users’ navigation through fashion, videos, books, papers, and especially e-commerce. Recommender systems have grown to be an essential part of all large Internet retailers, driving up to 35% of Amazon sales [2] and over 80% of the content watched on Netflix [3]. E-commerce retailers started implementing fashion recommendation systems in the early 2000s [4]. Recently, e-commerce has been growing, especially from 2013 until today, as the recommendation systems facilitate the navigation for users to find fashion products relevant to their interests, and the recommendation systems have grown to be a hot topic and essential tool for the huge number of Internet retailers. In 2013, the total turnover for e-commerce in Europe expanded by 17% in comparison to the 12 months before, and huge organizations could have hundreds of products, or even more, from which users can select on their websites [5]. Customers and the business enterprise want the client to discover relevant fashion items when they are searching, and this is where recommender systems come into the picture [6]. In 2024, the size of the domestic fashion market will have grown by 8.8% compared to its size in 2020 [7,8,9]. Being one of the hottest study topics currently in progress at both the national and international levels, recommender systems are the ideal solution for suitable fashion discovery in e-commerce, and the recommender systems possess several strong points; nevertheless, recommender systems also possess some weaknesses in the accuracy of their results. 
However, content-based information retrieval (CBIR) has become one of the hotspot topics; it retrieves items based entirely on the image content, which decreases some of the weaknesses, especially in fashion, by computing similarities among items and users through a set of features associated with them [10]. For a fashion item, the proposed features can be a group such as short-sleeve tops, long-sleeve tops, short-sleeve outerwear, long-sleeve outerwear, vests, slings, shorts, trousers, skirts, short-sleeve dresses, long-sleeve dresses, vest dresses, and sling dresses, considering gender. Many of the current recommendation systems have difficulties with full body images, and many customers cannot retrieve what they want because the images showing the full body may contain more than one item, such as a t-shirt with pants, a blouse with shorts, and a shirt with a skirt. Additionally, there are many clients who try to inquire about similar items, but they sometimes retrieve similar results for the opposite gender.

For these reasons, we need to solve different challenges, such as object detection, gender detection, and semantic similarity embeddings, which will help in identifying similar items from the dataset. Based on previous studies, the various research algorithms, and unique experiments, multi-task recommender systems are still the ideal solution to solve future net issues, and multi-task recommender systems are considered an important object of study at all levels, both local and international. For more detail, many of the recommendation systems are coming up with a new model or recommendation algorithm. There are several existing types of research on recommendation systems, and they have given us some inspiration for how recommendation systems work; however, there are some drawbacks, which encouraged us to contribute to the improvement of multi-task recommendation systems. Therefore, we aimed to design a new gender-aware multi-task fashion recommendation method to enhance personal choice by uploading a query image and retrieving the best recommendations based on the provided content. We also aimed to contribute to the improvement of recommendation systems by detecting the gender and the objects from the query image to compare and retrieve the top similarities. Our contributions are as follows:

We propose a multi-task learning and gender-aware framework that improves the performance of fashion recommendation systems by recommending the most relevant products and reducing the time needed to retrieve them.
We propose a new similarity method that detects all the available objects in the query image, and based on gender-aware information, we retrieve the most relevant products for each detected object from the right database.
Compared to the existing recommender systems, the MLGFRS enhances the robustness and the performance of the recommendations, which may suit clients with a more pretentious style.

Section 2 looks at related work from the specialized literature. Section 3 analyses the architecture of the proposed system. Section 4 presents and evaluates the experimental results, and Section 5 offers the conclusion.

2. Related Works

In this section, we briefly discuss related work in four areas that support our approach: (i) fashion recommender systems, (ii) gender-aware detection, (iii) object detection, and (iv) similarity learning.

2.1. Fashion Recommender Systems

P. Bellini worked on fashion retail based on a multi-clustering approach of items and users’ profiles in online and physical stores, which relies on mining techniques, allowing one to predict the purchase behavior of newly acquired customers [10]. S. Chakraborty reviewed the state-of-the-art fashion recommendation systems and the corresponding filtering techniques and explored various potential models that could be implemented to develop fashion recommendations [4]. T.H. Nobile focused on the categories of Design and Production and Culture and Society, which together accounted for approximately 48% of the selected literature [11]. M. Khalid designed and implemented a two-stage deep learning-based model that recommends a clothing fashion style; it extracts various attributes from images of clothes to learn the user’s clothing style and preferences, using a convolutional neural network (CNN) as a visual extractor of image objects [5]. L. Liao developed an EI tree, which organizes the fashion concepts into multiple semantic levels and augments the tree structure with exclusive as well as independent constraints [12]. X. Yang proposed a new solution named the Attribute-based Interpretable Compatibility (AIC) method, which injects interpretability into the compatibility modeling of items. Specifically, given a corpus of matched pairs of items, the method learns the interpretable patterns that lead to good matches, e.g., a white T-shirt matches with black trousers. In the future, this method could be added to MLGFRS as a new feature to suggest additional results to users [13]. Z. Cheng proposed a general feature-level attention method for ICF models. The key to the method is to distinguish the importance of different factors when computing the item similarity for a prediction. They designed a light attention neural network to integrate both item-level and feature-level attention for neural ICF models. 
However, they focus on collaborative filtering to recommend items based on the users’ preferences, and we aim to develop a framework for content based on the query image [14]. F. Liu proposed a novel Interest-aware Message-Passing GCN (IMP-GCN) recommendation model, which performs high-order graph convolution inside subgraphs. The subgraph consists of users with similar interests and their interacted items. They designed an unsupervised subgraph generation module, which can effectively identify users with common interests by exploiting both user features and graph structures. MLGFRS focuses on the content of the query image. MLGFRS recommends similar products for each object in the query image [15]. Taobao and Shein are examples of e-commerce that allow users to browse and purchase a wide range of products, from clothing to accessories, making online shopping convenient and accessible for customers worldwide. In 2006, Taobao sellers tended to maintain stabilized transaction volumes while the market growth slowed. This reflects the effect of the competition in China’s online C2C market on Taobao’s performance [16]. Shein, a Chinese online fast-fashion retailer, is known for its affordably priced apparel [17]. However, Taobao and Shein detect only a single object from the query image, and in many instances, they detect unwanted objects. S. Sharma presented an introduction to geographical location-based recommender systems—a detailed review of various recommender systems based on geographical regions. This can identify the parameters to categorize the users and a proposed model for location-based recommender systems. They focus on geodata to recommend products. In MLGFRS, we aim to make it work globally to focus on the image content without knowing the geographical location [18].

2.2. Gender-Aware Detection

It is easy for humans to distinguish female fashion from male fashion. Many works have been conducted to recognize gender automatically, and machine learning (ML) has facilitated this idea. K. Khan [19] proposed a framework that first segments a face image into face parts and then performs automatic gender classification; the face image is segmented into six different classes: mouth, hair, eyes, nose, skin, and back. Boosted by face detection, the past few decades have witnessed the thriving of gender detection based on facial features [20,21]. Segmenting the face into parts facilitates gender recognition, but it is weak for full-body images of people wearing different clothes. H. Chen [22] attempted to build a list of image attributes to describe clothes in images taken in an unconstrained environment, including gender as an attribute, but only sampled patches from the upper body. However, we found that lower-body clothing provides distinctive cues for gender recognition as well. S. Cai [23] attempted to recognize gender via clothing information when facial information is insufficient. J. Rasheed [24] compared different techniques for detecting gender from images of people wearing masks and found that EfficientNetB performed well, reaching a 97.27% accuracy rate. From previous studies, we found that EfficientNet is the most used technique for gender detection. For our work, we implemented the most recent version of EfficientNetB7 and obtained better results, reaching a 98.5% accuracy rate.

2.3. Object Detection

Detecting objects is one of the most important aspects of computer vision, and detecting objects in query images is important for generating embeddings and recommending similar products. There are various methods for object detection, such as single-stage detectors built on Darknet architectures like YOLO [25], which is one of the most widely used and most accurate tools for real-time object detection; although we only need to apply it to query images, it still counts as the ideal solution. Another example is Mask RCNN [26], which requires substantial computing resources; because we cannot afford to run it in real time in a web application, we use YOLO.

2.4. Similarity Learning

Similarity learning is a hot topic in deep learning. It helps us group items by finding the closest embedding vectors, so that similar items can be retrieved and recommended for the objects detected in query images. F. Schroff presented a system called FaceNet, which directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity [27]; for MLGFRS, we use embedding learning to group the products by similarity and recommend the most relevant products.

3. Proposed System

Our purpose is to develop a multi-task learning and gender-aware fashion recommendation system composed of several modules, each of which carries out a specific task that contributes to the overall goal. We aim to build a system that recommends suitable fashion products to users based on their gender. First, we detect the gender of the query image to determine which database to search for similar products, which simplifies the work and improves the performance. Second, we detect all the fashion products in the query image, focusing on these thirteen categories: short sleeve top, long sleeve top, short sleeve outwear, long sleeve outwear, vest, sling, shorts, trousers, skirt, short sleeve dress, long sleeve dress, vest dress, and sling dress. During object detection, we localize the objects with bounding boxes. Once the gender and the objects have been detected, we crop out each detected object and identify similar products from the right dataset, depending on the gender of the query image and the semantic embeddings. This fashion recommender system helps users to efficiently retrieve new fashion products similar to those in the query image. Figure 1 shows the pipeline and framework of MLGFRS, and Algorithm 1 shows the proposed system algorithm.

Algorithm 1. Proposed System Algorithm

Step 1: Qimg ← Pick Query image
Step 2: Objects ← Detect Objects (Qimg)
Step 3: Gender ← Detect Gender (Qimg)
Step 4: IF Gender is male THEN
    Dataset ← MenDS
ELSE
    Dataset ← WomenDS
Step 5: FOR each object in Objects DO
    FOR each item in object class DO
        Dissimilarity ← Euclidean Distance (object, item)
    Extract top five items with low dissimilarity
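
The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `recommend`, the dictionary layout of the two catalogues, and the toy two-dimensional embeddings are hypothetical, and the gender and object detectors are represented only by their outputs.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recommend(gender, objects, men_ds, women_ds, k=5):
    """Return the k least-dissimilar catalogue items for each detected object.

    gender   -- "male" or "female" (output of a gender detector, assumed given)
    objects  -- list of (class_name, embedding) pairs for the cropped objects
    men_ds, women_ds -- hypothetical catalogues of the form
                        {class_name: [(item_id, embedding), ...]}
    """
    # Step 4: pick the catalogue that matches the detected gender.
    dataset = men_ds if gender == "male" else women_ds
    results = []
    # Step 5: compare each object only with items of the same class.
    for class_name, emb in objects:
        candidates = dataset.get(class_name, [])
        ranked = sorted(candidates, key=lambda item: euclidean(emb, item[1]))
        results.append((class_name, [item_id for item_id, _ in ranked[:k]]))
    return results


# Toy catalogues with made-up embeddings (illustrative only).
men_ds = {"trousers": [("m1", [0.0, 0.0]), ("m2", [1.0, 1.0]), ("m3", [5.0, 5.0])]}
women_ds = {"skirt": [("w1", [0.2, 0.1]), ("w2", [3.0, 3.0])]}

print(recommend("male", [("trousers", [0.9, 1.2])], men_ds, women_ds, k=2))
# [('trousers', ['m2', 'm1'])]
```

Sorting every candidate is the simplest way to obtain the top five; for a large catalogue, a partial selection or an approximate nearest-neighbour index would be a natural replacement.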

3.1. Database and Classification

In our work, we used the DeepFashion2 dataset [28]. Y. Ge built this rich dataset for non-commercial purposes; it is a versatile benchmark of four tasks: clothes detection, pose estimation, segmentation, and retrieval. The dataset contains 801 K clothing items with rich annotations, including a bounding box (x1,y1,x2,y2) for each object in each image, and the objects are categorized into thirteen categories (short sleeve top, long sleeve top, short sleeve outwear, long sleeve outwear, vest, sling, shorts, trousers, skirt, short sleeve dress, long sleeve dress, vest dress, sling dress) [28]. For gender detection, we used the man/woman classification dataset [29] from Kaggle, which is a well-collected and cleaned dataset containing 3354 full-body pictures (jpg) of men (1414 files) and women (1940 files). Figure 2 shows the number of images in each category of the DeepFashion2 dataset [28].

3.2. Gender Detection

To improve the performance of MLGFRS and to recommend the most relevant items, we first need to detect the gender of the query image to determine which dataset to search for similar embeddings. We trained a gender detection model on the Man/Woman classification dataset from Kaggle, which was described in Section 3.1. This dataset contains cleaned full-body images of men and women. For the classification, there are many models, such as ResNet50, initialized with ImageNet weights and configured with two classes [30]. Another example is EfficientNet [31], which reported state-of-the-art accuracy on ImageNet among ConvNets at the time of its publication. Figure 3 shows the data flow of gender detection.

3.3. Object Detection

Object detection is essential in MLGFRS: to recommend similar items from the dataset, we need to detect and crop out all the products in the query image so that an embedding can be generated for each of them. We first need to understand mean average precision (mAP) and how a detection is judged correct using the intersection over union (IoU), which measures the overlap between the predicted bounding box and the ground-truth bounding box. If the IoU is greater than the threshold, the detection is counted as correct; otherwise, it is counted as incorrect. Table 1 shows the formulas and the functions of the object detection model.
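
The IoU test described above can be illustrated with a short Python sketch. It assumes boxes in the (x1, y1, x2, y2) corner format used by the DeepFashion2 annotations; the function names and the default threshold of 0.5 are assumptions chosen for the example, since the threshold used in this work is not stated here.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_correct_detection(pred_box, true_box, threshold=0.5):
    """A detection counts as correct when its IoU exceeds the threshold.

    The value 0.5 is an assumed, commonly used default.
    """
    return iou(pred_box, true_box) > threshold


# Two 10 x 10 boxes overlapping in a 5 x 5 square: IoU = 25 / 175.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
print(is_correct_detection((0, 0, 10, 10), (1, 1, 10, 10)))  # True
```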

Because there are many techniques for object detection and we need MLGFRS to be as robust as possible, we compared the available techniques and found that YOLOv7 achieves state-of-the-art performance [32]. Recently, Jocher developed a new version of the model called YOLOv8, which is a cutting-edge, state-of-the-art (SOTA) model [33]. Algorithm 2 shows the algorithm that was used to train YOLOv8 on the DeepFashion2 dataset.

Algorithm 2. Object Detection Training Pseudocode

Step 1: Dataset ← Load_Dataset()
Step 2: FOR each item in Dataset DO
    Darknet [] ← Create_coco_file()
Step 3: model ← YOLO()
Step 4: FOR five epochs DO
    Best_model ← model.train()
Step 5: evaluate_model(Best_model)
Step 6: test_model(Best_model)

3.4. Similarities Learning

Once the gender and the objects have been detected from the query image in the previous steps, we can move to the next important step: finding items similar to the objects in the query image in order to recommend the most relevant products. We need to compare each detected object with all the products in the same class. To calculate similarities between fashion images, we need to convert images into n-dimensional vectors. There are several ways to obtain similarities for the query image. We compared the most used approaches for finding similarities, such as the Siamese Neural Network (SNN), which was invented in the early 1990s by Bromley and LeCun to solve the problem of signature verification [34,35], contrastive learning, metric learning, SimCLR, and EfficientNet. Siamese networks have recently gained significant attention due to their effectiveness in similarity measurement tasks. Wu et al. (2018) utilized a Siamese architecture with an angular loss to improve the discriminative power of learned embeddings [36]. Sun et al. (2018) employed Siamese networks with refined part pooling for person retrieval, which enhanced the representation of the body parts and improved performance [37]. Zhou et al. (2019) proposed omni-scale feature learning in Siamese networks for person reidentification, achieving superior performance by capturing features at multiple scales [38]. Chen et al. (2020) focused on face verification and introduced the additive margin SoftMax loss within Siamese networks, which enhanced the discriminative power of the learned embeddings [39]. In MLGFRS, we need to find the similarities between the query image and relevant images in the dataset; therefore, the Siamese Neural Network was the best choice, as the SNN takes three inputs (anchor, positive, and negative), where the anchor and the positive should be similar and the negative should be dissimilar. After that, the model outputs the dissimilarity between the images. 
The lower the dissimilarity, the better the recommendation results, and we can calculate the dissimilarity using the contrastive loss function as follows.

\mathrm{ContrastiveLoss} = (1 - Y)\,\frac{1}{2}\,(D_w)^2 + (Y)\,\frac{1}{2}\,\{\max(0, m - D_w)\}^2

Figure 4 shows some samples of SNN comparisons: when the images are similar, the result is 0, and when they are not similar, the result is 1. Algorithm 3 shows the pseudocode for training the Siamese Network.
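
For illustration, the contrastive loss for a single pair of embeddings can be written as the short Python sketch below. It assumes that D_w is the Euclidean distance between the two embeddings, that the label Y is 0 for a similar pair and 1 for a dissimilar pair (matching the convention stated for Figure 4), and that m is the margin; the function name and the margin value of 1.0 are assumptions made for the example, not values reported in this work.

```python
import math


def contrastive_loss(emb_a, emb_b, y, margin=1.0):
    """Contrastive loss for one pair of embeddings.

    y      -- 0 for a similar pair, 1 for a dissimilar pair
    margin -- the margin m (1.0 is an assumed value for this sketch)
    """
    # D_w: Euclidean distance between the two embedding vectors.
    d_w = math.sqrt(sum((a - b) ** 2 for a, b in zip(emb_a, emb_b)))
    # Similar pairs are penalised for being far apart.
    similar_term = (1 - y) * 0.5 * d_w ** 2
    # Dissimilar pairs are penalised only when closer than the margin.
    dissimilar_term = y * 0.5 * max(0.0, margin - d_w) ** 2
    return similar_term + dissimilar_term


print(contrastive_loss([0.0, 0.0], [3.0, 4.0], y=0))  # 12.5
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], y=1))  # 0.0
```

The two printed cases show the intended behaviour: a similar pair at distance 5 is heavily penalised, while a dissimilar pair already farther apart than the margin contributes nothing.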

Algorithm 3. Train Siamese Network Pseudocode

Step 1: Dataset ← Load Dataset
Step 2: Resize images and build the transform tensor
Step 3: Create Siamese Network
Setting Up CNN Layers
Setting Fully Connected Layers
Determine Similarities
Obtaining Vectors
Step 4: Train Siamese Network for 2000 epochs.
Step 5: evaluate Siamese Network
Step 6: test Siamese Network

4. Experiments Results and Evaluation

4.1. Dataset Classification

In our work, we used two datasets of DeepFashion2 to train the object detection and similarity learning models, which obtain image similarities and retrieve the products most relevant to the query image. For gender detection, we used the man/woman classification dataset, which contains precleaned full-body images for both genders. We cleaned and preprocessed the data to make it suitable for MLGFRS. We first restructured the image annotations to facilitate the loading of data. Second, we added the image size for each item in the DeepFashion2 dataset. Finally, we created a darknet dataset by creating a coco file for each product in the DeepFashion2 dataset. Additionally, for each model, we prepared the required data step by step.

4.2. Gender Detection

To enhance the performance of the recommender system, we detected the gender from the query image to retrieve the most relevant products from the right dataset. In MLGFRS, we used the EfficientNetB7 model to achieve the best performance. We trained the model on Google Colab using the TensorFlow library. We first resized the images to 380 to fit the EfficientNetB7 model. Second, we set the number of samples in each of the two classes (women and men) to 1400 in order to balance the data and improve the performance. Third, we split the data into a training set of 2520 images, a validation set of 140 images, and a test set of 140 images. Fourth, we created the EfficientNetB7 model with ImageNet weights. Fifth, we instantiated the custom callback and trained the model for 25 epochs to obtain good results. Additionally, we implemented an adaptive method to monitor and track the results across the epochs. In the first epoch, we obtained a loss of 0.269, an accuracy of 99.12%, a validation loss of 0.757, and a validation accuracy of 79.286%. The training was maintained and reached epoch number 12, and then it stopped; as a result, it was adjusted for three epochs. The results of the last epoch were a loss of 0.253, an accuracy of 99.443%, a validation loss of 0.7469, and a validation accuracy of 80.714%. Figure 5 and Figure 6 show the confusion matrix and the loss and accuracy of the gender detection model.
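
The balancing and splitting steps described above can be sketched as follows. This is only an illustration of the arithmetic (2 × 1400 = 2800 images, split into 2520/140/140): the function name, the file names, and the use of a simple random shuffle are assumptions, and the authors' actual sampling procedure may differ.

```python
import random


def balance_and_split(men_files, women_files, per_class=1400,
                      n_val=140, n_test=140, seed=0):
    """Sample an equal number of files per class, then split the result.

    Returns (train, val, test) lists of (file_name, label) pairs.
    The split is a plain shuffle, so it is not stratified by class.
    """
    rng = random.Random(seed)
    samples = [(f, "men") for f in rng.sample(men_files, per_class)]
    samples += [(f, "women") for f in rng.sample(women_files, per_class)]
    rng.shuffle(samples)
    val = samples[:n_val]
    test = samples[n_val:n_val + n_test]
    train = samples[n_val + n_test:]
    return train, val, test


# Hypothetical file names; the class sizes match the dataset description.
men = [f"men_{i}.jpg" for i in range(1414)]
women = [f"women_{i}.jpg" for i in range(1940)]

train, val, test = balance_and_split(men, women)
print(len(train), len(val), len(test))  # 2520 140 140
```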

4.3. Object Detection

To implement object detection based on YOLOv8, we need a darknet dataset. Therefore, we created a coco file for each image in the DeepFashion2 dataset. Each file contains a line for each object in the image with five columns. “The first column determines the class that object belongs to, and the rest four columns determines the four corners of the bounding box of the object in the image”. We also restructured the DeepFashion2 dataset annotations and added the image size for all the images to make them easier to use in the future. As the dataset is very large, we trained the model for five epochs and evaluated it on 32,153 images. After that, we tested the model on 62,629 different images that were included in neither the training nor the evaluation set. Below, we summarize the confusion matrix, the training results with samples, and the testing results. Figure 7, Figure 8, Figure 9, Figure 10, Figure 11 and Figure 12 show the results, confusion matrix, precision-recall curve, and confidence curves of the object detection.
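
As an illustration of how a five-column label line can be produced from a DeepFashion2 bounding box, the sketch below assumes the widely used YOLO text-label layout, in which the class index is followed by the normalised box centre, width, and height. The function name is hypothetical, and the files shared by the authors may use a different layout or class numbering.

```python
def to_yolo_line(class_id, box, img_w, img_h):
    """Convert an (x1, y1, x2, y2) pixel box into a five-column label line.

    Assumed layout: class x_center y_center width height,
    with the four box values normalised by the image size.
    """
    x1, y1, x2, y2 = box
    x_c = (x1 + x2) / 2 / img_w
    y_c = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


# A box covering the central half of a 400 x 800 image.
print(to_yolo_line(7, (100, 200, 300, 600), img_w=400, img_h=800))
# 7 0.500000 0.500000 0.500000 0.500000
```

This conversion is also why the image size needs to be stored alongside each annotation: the normalisation cannot be computed from the pixel coordinates alone.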

Figure 13 shows samples of testing the model with new data, which was not included in the training data or the evaluation images. Figure 14 shows an example of testing object detection.

4.4. Similarities Learning

To find the images most similar to the query image, we conducted three experiments, training the SNN with different numbers of epochs and different data preparations on the same dataset to obtain the best results and accuracy. We first trained the model on full-body images for 30 epochs, and the results were not good, as shown in Figure 15. After that, we classified the images into categories and retrained the model for 500 epochs; the results were better than in the previous experiment, as shown in Figure 16. Finally, we combined the two models, object detection and SNN: we cropped out all the objects from all the images in the dataset and classified them into their correct categories. We then trained the model for 2000 epochs, and the results were better than in the previous experiments. In Figure 17, we can find similar products clustered together.

Figure 18, Figure 19, Figure 20 and Figure 21 show the output of Siamese Network in measuring the dissimilarity between images.

5. Results

Finally, we combine the previous models to recommend the most relevant products for the query image. The system crops out all the objects in the query image and builds a row of the most relevant products for each detected object to display the recommendations. The following shows how detecting the gender of the query image before comparing it with the dataset items improves the performance of MLGFRS. Without splitting the database into two categories (men and women), the search cost is O(N), as we have to go through all N items to compare them with the objects in the query image. After splitting the database, only about N/2 items need to be compared, assuming the two halves are of similar size; the cost remains linear in N, but the number of comparisons is roughly halved. Algorithm 4 shows the time complexity of the MLGFRS algorithm with gender detection.

Algorithm 4. Proof of the algorithm time complexity

Input: Qimg // Query Image
Result ← Qimg × DS_items → N
DS ← Men + Women.
Gender ←½ DS.
// while we have the gender detected. The result will be as following
Result ← Qimg × DS/2
Result ← N/2.
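
The effect of the gender split on the number of comparisons can be illustrated with a small counting sketch. The catalogue layout and function name are hypothetical, and the catalogue is assumed to be split evenly between the two genders, which is the assumption behind the N/2 figure.

```python
def count_comparisons(n_objects, catalogue, gender=None):
    """Number of object-to-item distance computations for one query.

    catalogue -- hypothetical list of (item_id, gender) pairs
    gender    -- if given, only items of that gender are scanned

    The filter below is for counting only; in practice the two
    partitions would be stored separately ahead of time.
    """
    scanned = [item for item in catalogue if gender is None or item[1] == gender]
    return n_objects * len(scanned)


# 1000 items, half labelled "male" and half "female" (assumed even split).
catalogue = [(i, "male" if i % 2 == 0 else "female") for i in range(1000)]

print(count_comparisons(2, catalogue))          # 2000
print(count_comparisons(2, catalogue, "male"))  # 1000
```

With an uneven catalogue, the saving depends on the share of items belonging to the detected gender rather than being exactly one half.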

Table 2 shows three examples of the recommendations for three different queries. Comparing the results of the gender detection with EfficientNetB7 to the gender detection with the ResNet50 model, we can find that the model we trained shows better accuracy and scored 99.443%, better than 95% with ResNet50. Table 3 displays the gender detection accuracy comparison results.

MLGFRS is compared to previous state-of-the-art models such as AlexNet [40] and VGG16 [41] and to the system designed by M. Khalid [5]. MLGFRS showed much better results and reduced the number of items searched for the most relevant products to about N/2. Without splitting the data, we retrieved the results in 0.826 s; with the data split, we retrieved the results in 0.456 s. Table 4 shows the comparison of MLGFRS to other systems.

6. Conclusions

Our paper presents a development for a multi-task and gender-aware fashion recommendation system (MLGFRS). We accomplish this by implementing different models (gender detection, object detection, similarity learning) and then combining all the models to build MLGFRS, which receives a query image and returns a result of rows of all the detected objects with the recommended products. We used the CNN model to combine all the models and transfer learning to accomplish the task with different robust libraries. We extensively cleaned the data, cropped all the objects, clustered them according to their categories, created the darknet coco files for each object, and preprocessed the data. We contributed to optimizing the DeepFashion2 dataset for the future learning of researchers. In addition, we added the gender detection model to our work to improve the performance of the recommendations by reducing the retrieval time of the relevant products. Moreover, we enhanced the accuracy of the recommendations by detecting all the available objects in the query image and retrieving relevant products for each detected object from the right database based on the detected gender. We also detected each object and its class to retrieve the most relevant recommendations from the same class to save time. MLGFRS shows better results than the existing studies in accuracy and performance.

In the future, we aim to keep the system training continuously on the query images to enhance the accuracy and to keep the system up to date. We also aim to train the detection model to detect more objects by adding sporting clothes, glasses, shoes, accessories, and other classes. Finally, we recommend adding a real-time click-detection model to the web system that locates the user’s click, identifies the exact object wanted, and retrieves the most relevant results.

Author Contributions

Conceptualization, A.-Z.N. and J.W.; methodology, A.-Z.N. and J.W.; software, A.-Z.N.; validation, A.-Z.N., J.W. and A.-S.R.; formal analysis, A.-Z.N. and J.W.; investigation, A.-Z.N.; resources, A.-Z.N. and J.W.; data curation, A.-Z.N. and J.W.; writing—original draft preparation, A.-Z.N.; writing—review and editing, A.-Z.N., J.W. and A.-S.R.; visualization, A.-Z.N.; supervision, J.W.; project administration, A.-Z.N. and J.W.; funding acquisition, A.-Z.N. The authors approved the manuscript and agreed with the submission to the electronics journal. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Our data can be browsed in Google Drive; the following link is to the repository of the DeepFashion2 coco files that we have shared and the new labels https://drive.google.com/drive/folders/1_oWvJHqWZFwe3QUKca3bj26NQK5oSpMY (accessed on 16 September 2022), while the images can be downloaded from Google Drive and the description can be found in GitHub https://github.com/switchablenorms/DeepFashion2 (accessed on 8 September 2022). For any further questions, correspondence is available through the author’s email.

Acknowledgments

I am grateful to my supervisor for the great support and his valuable feedback and for being inspiring and motivating throughout the project. I am also grateful to CSC and CSU for the scholarship that they granted me to let me pursue my Master’s degree. I am also grateful to my wife and all my family for being patient during my study and for supporting me. I cannot forget the great support of Khaled AlHussaini and Soaad Othman.

Conflicts of Interest

The authors declare no conflict of interest.

References

Jannach, D.; Pu, P.; Ricci, F.; Zanker, M. Recommender systems: Past, present, future. AI Mag. 2021, 42, 3–6.
MacKenzie, I.; Meyer, C.; Noble, S. How Retailers Can Keep up with Consumers; McKinsey & Company: Brussels, Belgium, 2013; p. 18.
Chhabra, S. Netflix Says 80 Percent of Watched Content Is Based on Algorithmic Recommendations; Mobile Syrup: Toronto, ON, USA, 2017.
Chakraborty, S.; Hoque, M.S.; Rahman Jeem, N.; Biswas, M.C.; Bardhan, D.; Lobaton, E. Fashion recommendation systems, models and methods: A review. Informatics 2021, 8, 49.
Khalid, M.; Keming, M.; Hussain, T. Design and implementation of clothing fashion style recommendation system using deep learning. Rom. J. Inform. Technol. Autom. Control 2021, 31, 123–136.
Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv 2016, arXiv:1603.04467.
Korea Federation of Textile Industries. Korea Fashion Market Trend 2019 Report; Korea Federation of Textile Industries: Seoul, Republic of Korea, 2019.
Korea Fashion Association. Global Fashion Industry Survey; Korea Fashion Association: Seoul, Republic of Korea, 2019.
Jo, J.; Lee, S.; Lee, C.; Lee, D.; Lim, H. Development of fashion product retrieval and recommendations model based on deep learning. Electronics 2020, 9, 508.
Bellini, P.; Palesi, L.A.I.; Nesi, P.; Pantaleo, G. Multi clustering recommendation system for fashion retail. Multimed. Tools Appl. 2023, 82, 9989–10016.
Nobile, T.H.; Noris, A.; Kalbaska, N.; Cantoni, L. A review of digital fashion research: Before and beyond communication and marketing. Int. J. Fash. Des. Technol. Educ. 2021, 14, 293–301.
Liao, L.; He, X.; Zhao, B.; Ngo, C.W.; Chua, T.S. Interpretable multimodal retrieval for fashion products. In Proceedings of the 26th ACM international conference on Multimedia, Seoul, Republic of Korea, 22–26 October 2018; pp. 1571–1579.
Yang, X.; He, X.; Wang, X.; Ma, Y.; Feng, F.; Wang, M.; Chua, T.S. Interpretable fashion matching with rich attributes. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 21–25 July 2019; pp. 775–784.
Cheng, Z.; Liu, F.; Mei, S.; Guo, Y.; Zhu, L.; Nie, L. Feature-level attentive ICF for recommendation. ACM Trans. Inf. Syst. (TOIS) 2022, 40, 1–24.
Liu, F.; Cheng, Z.; Zhu, L.; Gao, Z.; Nie, L. Interest-aware message-passing gcn for recommendation. In Proceedings of the Web Conference, Ljubljana, Slovenia, 19–23 April 2021; pp. 1296–1305.
Li, D.; Li, J.; Lin, Z. Online consumer-to-consumer market in China—A comparative study of Taobao and eBay. Electron. Commer. Res. Appl. 2008, 7, 55–67.
Shen, J. Analyzing on the Going Global Marketing Strategy–Taking Shein as an Example. In Proceedings of the 2022 International Conference on Creative Industry and Knowledge Economy (CIKE 2022), Hangzhou, China, 31 March–2 April 2023; Atlantis Press: Paris, France, 2023; pp. 225–229.
Sharma, S.; Shakya, H.K.; Marriboyina, V. A location based novel recommender framework of user interest through data categorization. Mater. Today Proc. 2021, 47, 7155–7161.
Khan, K.; Attique, M.; Syed, I.; Gul, A. Automatic gender classification through face segmentation. Symmetry 2019, 11, 770.
Makinen, E.; Raisamo, R. Evaluation of gender classification methods with automatically detected and aligned faces. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 541–547.
Face++ Research Toolkit. Available online: http://www.faceplusplus.com (accessed on 1 February 2023).
Chen, H.; Gallagher, A.; Girod, B. Describing clothing by semantic attributes. In Proceedings of the Computer Vision–ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; Part III 12. Springer: Berlin/Heidelberg, Germany, 2012; pp. 609–623.
Cai, S.; Wang, J.; Quan, L. How fashion talks: Clothing-region-based gender recognition. In Proceedings of the Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications: 19th Iberoamerican Congress, CIARP 2014, Puerto Vallarta, Mexico, 2–5 November 2014; Proceedings 19. Springer: Berlin/Heidelberg, Germany, 2014; pp. 515–523.
Rasheed, J.; Waziry, S.; Alsubai, S.; Abu-Mahfouz, A.M. An Intelligent Gender Classification System in the Era of Pandemic Chaos with Veiled Faces. Processes 2022, 10, 1427.
Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
Schroff, F.; Kalenichenko, D.; Philbin, J. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 815–823.
Ge, Y.; Zhang, R.; Wang, X.; Tang, X.; Luo, P. Deepfashion2: A versatile benchmark for detection, pose estimation, segmentation and re-identification of clothing images. In Proceedings of the IEEE/CVF Conference On Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5337–5345.
Navneet Sajwan Stpete_Ishii, Yuleqing. Men/Women Classification Dataset. 2018. Available online: https://www.kaggle.com/datasets/playlist/men-women-classification (accessed on 5 March 2022).
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. Int. Conf. Mach. Learn. 2019, PMLR, 6105–6114.
Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 7464–7475.
Jocher, G., CA, & QJ. YOLO by Ultralytics (Version 8.0.0). 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 10 August 2022).
Bromley, J.; Guyon, I.; LeCun, Y.; Säckinger, E.; Shah, R. Signature verification using a “siamese” time delay neural network. In Proceedings of the Advances in Neural Information Processing Systems, Denver, CO, USA, 30 November–3 December 1993; Volume 6.
Chopra, S.; Hadsell, R.; LeCun, Y. Learning a similarity metric discriminatively, with application to face verification. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; pp. 539–546.
Wang, J.; Zhou, F.; Wen, S.; Liu, X.; Lin, Y. Deep metric learning with angular loss. In Proceedings of the IEEE international Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2593–2601.
Sun, Y.; Zheng, L.; Yang, Y.; Tian, Q.; Wang, S. Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline). In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 480–496.
Zhou, K.; Yang, Y.; Cavallaro, A.; Xiang, T. Omni-scale feature learning for person re-identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3702–3712.
Wang, F.; Cheng, J.; Liu, W.; Liu, H. Additive margin softmax for face verification. IEEE Signal Process. Lett. 2018, 25, 926–930.
Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.

Figure 1. Pipeline framework model, where [S1, S2, S3, …, Sn] denotes the top items most similar to the detected object.

Figure 2. DeepFashion2 category statistics.

Figure 3. Gender detection model.

Figure 4. Sample similarity comparisons. A label of 0 indicates that the two samples in the column are similar, while a label of 1 indicates that they are dissimilar.

Figure 5. Gender confusion matrix.

Figure 6. Training and validation loss and accuracy.

Figure 7. Precision confidence curve.

Figure 8. Results of object detection.

Figure 9. F1 confidence curve.

Figure 10. Confusion matrix of object detection.

Figure 11. Precision recall curve.

Figure 12. Recall confidence curve.

Figure 13. Examples of training and evaluating the detection model.

Figure 14. Examples of testing the detection model.

Figure 15. Contrastive loss of the Siamese network trained on full-body images for 30 epochs.

Figure 16. Contrastive loss of the Siamese network trained for 500 epochs after classifying the images into their categories.

Figure 17. Contrastive loss of the Siamese network trained for 2000 epochs after cropping all objects out of each image and sorting them into their correct categories.
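
The captions of Figures 15–17 refer to the contrastive loss used to train the Siamese network, and Figure 4 labels similar pairs with 0 and dissimilar pairs with 1. The sketch below shows the standard contrastive loss under that label convention for a single pair of embeddings. It is only an illustration: the function names, the Euclidean distance, and the margin value of 1.0 are assumptions, since this section does not state the exact loss form or hyperparameters used.

```python
import math


def euclidean_distance(emb_a, emb_b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(emb_a, emb_b)))


def contrastive_loss(emb_a, emb_b, label, margin=1.0):
    """Standard contrastive loss for one pair.

    label = 0: similar pair, penalize any distance between the embeddings.
    label = 1: dissimilar pair, penalize only if closer than `margin`.
    The margin of 1.0 is an assumed value, not one taken from the paper.
    """
    d = euclidean_distance(emb_a, emb_b)
    if label == 0:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


# A similar pair that is far apart is penalized heavily.
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], label=0))
# A dissimilar pair that is already beyond the margin costs nothing.
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], label=1))
```

With this convention, training pulls same-product crops together and pushes different products at least `margin` apart, so a smaller distance at query time can be read as a more similar item.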

Figure 18. Dissimilarity result example.

Figure 19. Dissimilarity result example.

Figure 20. Dissimilarity result example.

Figure 21. Dissimilarity result example.

Table 1. Object detection formulas.

Metric	Formula
mAP	$\mathrm{mAP} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{AP}_i$
Precision	$\mathrm{Precision} = \frac{TP}{TP + FP}$
Recall	$\mathrm{Recall} = \frac{TP}{TP + FN}$
IoU	$\mathrm{IoU} = \frac{\text{Area of Intersection}}{\text{Area of Union}}$
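
The four metrics in Table 1 can be sketched in a few lines of plain Python. This is a minimal illustration, not the evaluation code used in the paper: the function names and the `(x1, y1, x2, y2)` corner format for boxes are assumptions, and the per-class AP values passed to the mAP helper are taken as given rather than computed from a precision-recall curve.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP); returns 0.0 when there are no detections."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Recall = TP / (TP + FN); returns 0.0 when there are no ground truths."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_average_precision(ap_per_class):
    """mAP = (1 / N) * sum of AP_i over the N classes."""
    return sum(ap_per_class) / len(ap_per_class)


print(precision(tp=8, fp=2))               # 8 of 10 detections are correct
print(recall(tp=8, fn=8))                  # 8 of 16 ground-truth objects found
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))     # overlap 1, union 7
print(mean_average_precision([0.5, 1.0]))  # average over two classes
```

In detection benchmarks, a prediction usually counts as a true positive only when its IoU with a ground-truth box exceeds a chosen threshold, which is how IoU feeds into the TP, FP, and FN counts above.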

Table 2. Examples of three query images and the top recommendations.

Query Image	Class	S1	S2	S3	S4	S5
	Vest Dress
	Short Sleeve Top
	Shorts
	Short Sleeve
	Trousers

Table 3. Gender detection accuracy comparison.

Model	Accuracy
ResNet50	95%
EfficientNetB7	99.443%

Table 4. Comparison of MLGFRS to others.

Model	Accuracy	Loss
AlexNet	79%	51%
VGG16	83%	48%
Fashion Rec. System [5]	86%	51%
MLGFRS	95.81%	40%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

