Proceeding Paper

StyleVision: AI-Integrated Stylist System with Intelligent Wardrobe Management and Outfit Visualization †
I-Cheng Chang *, Elvio Jonathan, Marcel Johan, Shao Qi Lee and Phoebe Pilota

Department of Computer Science and Information Engineering, National Dong Hwa University, Hualien 974301, Taiwan
*
Author to whom correspondence should be addressed.
Presented at the 8th International Conference on Knowledge Innovation and Invention 2025 (ICKII 2025), Fukuoka, Japan, 22–24 August 2025.
Eng. Proc. 2025, 120(1), 15; https://doi.org/10.3390/engproc2025120015
Published: 2 February 2026
(This article belongs to the Proceedings of 8th International Conference on Knowledge Innovation and Invention)

Abstract

Effective personal wardrobe management remains a significant challenge, often leading to underutilized garments and increased fashion waste. Existing digital wardrobe solutions commonly lack intelligent capabilities such as automated organization, personalized styling support, and immersive visualization. We developed StyleVision, an AI-integrated wardrobe stylist application that addresses these limitations. StyleVision incorporates a comprehensive AI pipeline, beginning with standardized garment image preprocessing. The system comprises deep learning-based hierarchical garment classification, a dedicated model for outfit esthetic evaluation utilizing synthesized 2D images, and advanced 2D and 3D visualization modules that facilitate outfit exploration and spatial assessment through static 3D garment representations. To improve prediction reliability, confidence thresholding mechanisms are applied across all predictive components. An experimental evaluation on a custom dataset demonstrated robust performance, achieving high accuracy in garment classification and yielding solid results in style classification. The 3D visualization module was functionally validated, producing realistic and distinguishable visual outputs. By offering intelligent styling and interactive visualization, StyleVision enhances wardrobe utilization and encourages more sustainable fashion consumption practices.

1. Introduction

The fashion industry’s rapid cycles contribute significantly to clothing underutilization and environmental waste [1]. Many consumers face challenges in effectively managing their wardrobes, leading to a cycle of purchasing new items while existing ones remain unworn [2]. This behavior not only results in unnecessary spending but also exacerbates the environmental impact of fashion production. To address this issue, various digital wardrobe applications [3,4,5] have emerged to assist users in managing their closets. However, these tools often rely on manual categorization and lack the intelligent features needed for automated organization, insightful style recommendations, and compelling outfit visualization. This gap limits their effectiveness in encouraging users to rediscover and creatively reuse their existing garments, thus failing to fully address the core issue of unsustainable consumption.
To bridge this gap between manual inventorying and intelligent, AI-driven personal styling, we introduce StyleVision. It is an integrated application that automates the entire wardrobe management process, from garment analysis to outfit generation and visualization. The system leverages a cohesive AI pipeline to provide a personalized and interactive user experience. In this study, we designed a hierarchical classification model that accurately categorizes diverse user-uploaded garments into main and subcategories and implemented a machine learning framework to evaluate the esthetic style of automatically generated outfit combinations. The developed 2D and 3D visualization pipelines enable users to explore outfit arrangements, enhancing style discovery and spatial awareness. By integrating these AI-based components into a unified platform, StyleVision empowers users to rediscover and creatively utilize their existing wardrobe, thereby encouraging more sustainable fashion practices.

2. Proposed System

The StyleVision system is built upon a cohesive AI pipeline that processes user-provided garment images to deliver classification, styling, and visualization services. The methodology for each core component is detailed below.

2.1. System Architecture and Workflow

StyleVision’s architecture integrates classification models, generation models, and a database, as shown in Figure 1. The classification models handle garment and style identification; the generation models create 2D outfit compositions and 3D garment representations. A PostgreSQL database manages all user and garment data, ensuring persistence and enabling organized retrieval.
The system’s operational workflow, depicted in Figure 2, is defined by two primary user-driven processes: adding a new garment to the wardrobe and browsing the generated outfits.
  • Adding a garment: This is the foundational process for populating the user’s digital wardrobe and pre-computing potential outfits.
    ◦ Input and classification: The user uploads front- and rear-view images of a clothing item (upload garment images). The images are preprocessed, and the frontal view is passed to the Garment Classification Model to determine its hierarchical category (classify into categories).
    ◦ Concurrent generation: Following classification, the system initiates two concurrent data-generation pathways:
      ▪ Three-dimensional garment modeling: The 3D Visualization Model processes both front and rear images to generate an interactive 3D model of the garment with static geometry (generate 3D visualization of the garment). This 3D model, which the user can interactively view, is stored along with its classification labels in the categorized garments database.
      ▪ Two-dimensional outfit synthesis and styling: The system uses the newly added item to synthesize potential 2D outfit combinations with other compatible garments from the user’s wardrobe (generate 2D outfit). Each resulting 2D composite is immediately evaluated by the style classification model (classified into styles), and the successfully styled outfits are stored in the styled outfits database.
  • Browsing and visualizing an outfit: This workflow allows the user to explore the pre-computed outfit combinations from their wardrobe.
    ◦ Style-based retrieval: The user selects a desired style category, such as casual (select an outfit style). The system queries the styled outfits database and retrieves all combinations matching that style (retrieve matching outfit).
    ◦ Selection and 3D assembly: The matching outfits are displayed to the user (display matching outfits). After the user chooses a specific outfit (choose an outfit), the system retrieves the individual 3D models for each garment in that outfit from the categorized garments database (retrieve 3D object). These models are then assembled and rendered to present a complete 3D outfit visualization to the user (display 3D outfit visualization).
This workflow ensures that computationally intensive tasks are performed during the item addition phase, allowing for a responsive and seamless browsing experience.
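The style-based retrieval step of the browsing workflow can be sketched with a simple in-memory stand-in for the styled outfits database; all names and records here are hypothetical illustrations, not the system's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class StyledOutfit:
    outfit_id: int
    style: str                              # e.g. "casual", "formal"
    garment_ids: list = field(default_factory=list)  # garments composing the outfit

def retrieve_matching_outfits(styled_outfits, selected_style):
    """Return all pre-computed outfits whose style matches the user's selection."""
    return [o for o in styled_outfits if o.style == selected_style]

# A tiny in-memory stand-in for the styled-outfits database.
db = [
    StyledOutfit(1, "casual", [10, 21]),
    StyledOutfit(2, "formal", [11, 22]),
    StyledOutfit(3, "casual", [12, 23]),
]
matches = retrieve_matching_outfits(db, "casual")
```

Because all outfits are styled and stored at item-addition time, this query is a plain filter over pre-computed rows, which is what keeps browsing responsive.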

2.2. Garment Classification Model

The garment classification model automates wardrobe organization using a hierarchical approach (Figure 3) that first classifies garments into one of three main categories (Top, Bottom, or Dress) and then into one of nine specific subcategories, such as Shirt, T-Shirt, Pants, or Skirt.

2.2.1. Architecture and Training

The model utilizes a convolutional neural network (CNN) based on the LeNet-5 architecture [6], implemented in PyTorch 2.4.0 with CUDA 12.4 [7] and depicted in Figure 4. Separate CNNs were trained for the main category and for each subcategory branch (tops and bottoms) to specialize feature extraction. Training utilized a custom-aggregated dataset of 32,912 images sourced from public repositories, including Kaggle [8], Roboflow [9], Dress Code [10], and Pinterest [11]. All images were manually reviewed and labeled to ensure data quality. For robust evaluation, the dataset was partitioned into approximately 70% for training, 15% for validation, and 15% for testing. Models were trained with a cross-entropy loss function and the Adam optimizer until convergence was observed on the validation set.
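As a concrete illustration, the training setup above can be sketched in PyTorch as a LeNet-5-style classifier with a cross-entropy loss and Adam optimizer. The layer sizes, 32 × 32 input resolution, and dummy batch below are illustrative placeholders, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class GarmentCNN(nn.Module):
    """A minimal LeNet-5-style classifier (illustrative layer sizes)."""
    def __init__(self, num_classes=3):  # 3 main categories: Top, Bottom, Dress
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 6, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.ReLU(),  # 5x5 feature maps at 32x32 input
            nn.Linear(120, 84), nn.ReLU(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = GarmentCNN()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One illustrative training step on dummy 32x32 RGB inputs.
images = torch.randn(4, 3, 32, 32)
labels = torch.tensor([0, 1, 2, 0])
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```

In the described hierarchy, one such network would be trained per level: one for the main category and one per subcategory branch.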

2.2.2. Output Processing and Confidence Thresholding

To enhance the reliability of automated classifications, a confidence thresholding protocol was implemented. The thresholds were empirically set at the 5th percentile of confidence scores from the test set for each model in the hierarchy (0.90 for the main category, 0.53 for tops, 0.59 for bottoms). If a prediction falls below its respective threshold, it is handled gracefully; for instance, a garment might be correctly labeled as “Top” but assigned an “Others” subcategory if the sub-classification is uncertain. This approach maintains organizational integrity while filtering out predictions of low confidence.
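A minimal sketch of this thresholding protocol, assuming per-garment softmax confidences are available; the helper names and the percentile convention below are hypothetical illustrations of the described approach:

```python
def percentile_threshold(confidences, pct=5):
    """Empirical threshold: the pct-th percentile of test-set confidence scores."""
    ranked = sorted(confidences)
    idx = max(0, int(len(ranked) * pct / 100) - 1)
    return ranked[idx]

def assign_subcategory(label, confidence, threshold):
    """Fall back to 'Others' when the sub-classifier is uncertain."""
    return label if confidence >= threshold else "Others"

# With the paper's reported threshold of 0.53 for tops, an uncertain
# sub-classification keeps its main-category label but is filed under "Others".
subcategory = assign_subcategory("Shirt", 0.48, 0.53)
```

The main-category threshold (0.90) would be applied the same way one level up the hierarchy, before any subcategory prediction is attempted.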

2.3. Two-Dimensional Visualization Model

The developed model serves a dual purpose within the StyleVision pipeline: generating standardized images for algorithmic analysis and creating personalized visualizations for the user. The methodology is split into the following stages:
  • Stage 1: Composite generation for style analysis. To create a consistent input for the downstream style classifier, the model generates 2D composites of every valid outfit combination. It does so by programmatically warping and placing the constituent garment images onto a set of five distinct, static person templates sourced from the Dress Code dataset [10], a process that draws conceptual inspiration from virtual try-on models such as that of [12]. This multi-template approach offers a comprehensive visual representation of the outfit, facilitating a more robust style evaluation.
  • Stage 2: User-centric visualization. Following style classification, the model leverages the same garment warping and placement techniques to render the outfit onto a user’s uploaded full-body photograph. This provides a personalized “virtual try-on” feature, allowing users to see how a generated outfit might look on their own figure.

2.4. Style Classification Model

To assess the esthetic quality of an outfit, the system assigns it one of six contemporary style labels: casual, formal, semi-formal, sporty, bohemian, or streetwear. The model’s input is the set of five 2D composite images generated by Stage 1 of the 2D Visualization Model. It processes all five views and assigns the most frequently predicted style as the final label for the outfit combination.
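The majority-vote aggregation over the five views can be sketched as follows; the function name is illustrative, and since the paper does not specify how ties between styles are broken, this sketch simply takes the first most-common label:

```python
from collections import Counter

def aggregate_style(per_view_predictions):
    """Majority vote over the styles predicted for the five template views."""
    counts = Counter(per_view_predictions)
    return counts.most_common(1)[0][0]

# One predicted style per person template; "casual" wins 3 of 5 views.
views = ["casual", "casual", "streetwear", "casual", "semi-formal"]
final_style = aggregate_style(views)
```

Voting across multiple templates makes the final label less sensitive to any single composite in which the warped garments happen to render poorly.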

2.4.1. Architecture and Preprocessing

The architecture depicted in Figure 5 is an adaptation of the ConvNeXt-T model [13], augmented with a dual attention mechanism (criss-cross and spatial attention) to capture both global and local style-relevant visual cues. Before classification, each composite image undergoes a crucial preprocessing step in which a DeepLabV3 model [14], with a ResNet-101 backbone pre-trained on the COCO dataset [15], segments out and removes the non-garment regions. This ensures that the classifier learns features exclusively from the clothing items themselves.

2.4.2. Custom Dataset and Training

Due to the limitations of existing public datasets in representing contemporary styles, a custom dataset of 9518 images was curated from the Dress Code dataset [10] and Pinterest [11] and meticulously labeled according to the six target styles. The model was trained with PyTorch 2.4.0 and CUDA 12.4 [7], using the AdamW optimizer and a OneCycleLR scheduler to facilitate efficient convergence. A confidence threshold of 0.41 (corresponding to the 5th percentile of confidence scores on the test set) was established to filter out predictions for stylistically ambiguous outfits, ensuring that only reliably classified suggestions are presented to the user.
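The optimizer and scheduler setup can be sketched in PyTorch as follows; the learning rates, step counts, weight decay, and the stand-in linear model are illustrative assumptions rather than the paper's actual hyperparameters:

```python
import torch

model = torch.nn.Linear(10, 6)  # stand-in for the ConvNeXt-T style classifier
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)

# OneCycleLR ramps the learning rate up to max_lr and back down over the
# full run; it is stepped once per batch, not once per epoch.
steps_per_epoch, epochs = 100, 10
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, total_steps=steps_per_epoch * epochs
)

for _ in range(3):  # a few illustrative training steps on dummy data
    optimizer.zero_grad()
    loss = model(torch.randn(8, 10)).sum()
    loss.backward()
    optimizer.step()
    scheduler.step()
```

The one-cycle policy is a common choice for fast convergence on moderately sized custom datasets such as the 9518-image set described here.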

2.5. Three-Dimensional Outfit Visualization Model

The 3D outfit visualization model provides an immersive view by generating 3D models of clothing combinations. The pipeline is adapted from the initial phase of Cloth2Tex [16], a choice that yields recognizable 3D models with coarse textures within a manageable generation time, prioritizing shape and spatial assessment over photorealistic simulation. The generation pipeline, illustrated in Figure 6, uses both front and rear garment views and relies on a library of predefined, parametric 3D template meshes. It involves two main stages:
  • Shape matching: This stage deforms a template mesh to match the input garment’s geometry. Landmark detection is performed using an HRNet-based architecture [17] (pre-trained on DeepFashion2 [18]), adapted to a curated subset of keypoints (6–11 per view) relevant to StyleVision’s garment types. Precise garment silhouettes are extracted using a model adapted from “clothes-virtual-try-on” [19]. Shape fitting is then performed via deformation graph optimization, which aligns the template’s 3D keypoints with the detected 2D landmarks and matches the rendered silhouette to the extracted mask. To ensure plausible geometry and prevent unrealistic shapes, As-Rigid-As-Possible (ARAP) regularization [20] is incorporated into the optimization process.
  • Coarse texture generation: Following shape matching, an image-based optimization generates the garment’s texture. This process utilizes a differentiable renderer (e.g., SoftRas [21]) to iteratively update the deformed mesh’s UV texture map. The optimization minimizes the photometric difference between the rendered model and the original input images. To encourage plausible color filling in occluded regions and reduce visual artifacts, a total variation loss [22] is also included to penalize high-frequency noise in the final texture.
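The total variation term used to suppress high-frequency texture noise can be sketched as follows; this is the standard anisotropic formulation, and the tensor shapes are illustrative rather than taken from the paper:

```python
import torch

def total_variation_loss(texture):
    """Anisotropic total variation for a (C, H, W) UV texture map: the mean
    absolute difference between neighboring texels. Flat regions incur no
    penalty; high-frequency noise is penalized."""
    dh = (texture[:, 1:, :] - texture[:, :-1, :]).abs().mean()  # vertical neighbors
    dw = (texture[:, :, 1:] - texture[:, :, :-1]).abs().mean()  # horizontal neighbors
    return dh + dw

flat = torch.full((3, 8, 8), 0.5)   # uniform texture: zero TV penalty
noisy = torch.rand(3, 8, 8)         # random texture: positive TV penalty
```

In the described pipeline, this term would be added to the photometric loss during the differentiable-rendering optimization, encouraging smooth color filling in regions occluded in both input views.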

2.6. Computational Environment

The system was developed and trained on a machine equipped with an Intel Core i7-12700K CPU, 32 gigabytes of random access memory (RAM), and an NVIDIA GeForce RTX 4090 GPU (24 gigabytes Video RAM). Key software libraries include PyTorch 2.4.0, PyTorch3D, Transformers, and Flask for the backend.

3. Results

The performance of StyleVision’s AI components was rigorously evaluated on their respective dedicated test sets. The quantitative and qualitative outcomes for each model are presented below.

3.1. Garment Classification

The hierarchical garment classification model achieved 89.81% overall accuracy, which increased to 92.20% after applying confidence thresholds. This filtering protocol proved effective in enhancing the reliability of predictions. Table 1 presents the accuracy breakdown for each stage of the hierarchy.
The post-threshold confusion matrix, shown in Figure 7, illustrates the model’s strong performance across all nine subcategories. As expected, the remaining confusion occurred primarily between visually similar items: “PoloShirt” was sometimes confused with “Shirt” in the Top subcategory, and “Pants” with “Skirt” in the Bottom subcategory.

3.2. Two-Dimensional Outfit Visualization Model Performance

The 2D outfit visualization model was evaluated for functional correctness across its two stages, confirming that it fulfilled its design objectives.
  • Composite generation for analysis: The model was confirmed to correctly generate the standardized 2D composite images for every valid outfit combination. These images, depicting outfits on multiple static templates, served as the necessary visual input for the subsequent style analysis pipeline.
  • User-centric visualization: The model’s capability to render outfits onto a user’s uploaded full-body photograph was also functionally validated. This confirmed the successful implementation of the personalized virtual try-on feature.
The components were assessed qualitatively by their ability to produce the required outputs for the other system components, a task they performed reliably.

3.3. Style Classification Performance

On a custom-curated test set, the style classification model achieved an accuracy of 74.86% before thresholding. After applying the 0.41 confidence threshold, the accuracy on the remaining, confidently classified subset rose to 76.76%. This result represents a significant improvement over baseline tests on publicly available but stylistically outdated datasets (e.g., 53.93% on FashionStyle14), validating our custom data strategy. The confusion matrix (Figure 8) reveals that while distinct styles, such as Formal and Sporty, were well-distinguished, there was some predictable overlap between more ambiguous categories, like Casual, Semi-Formal, and Streetwear, reflecting the inherent subjectivity of fashion esthetics.

3.4. Three-Dimensional Visualization Model Performance

The developed 3D visualization model was evaluated for its functional output and computational performance.
  • Functional validation: Operational testing confirmed the model successfully processed front and rear images to generate interactive, rotatable 3D meshes for all supported garment types. Qualitatively, the generated models accurately captured the overall shape, color, and prominent patterns from the 2D input images, producing recognizable representations suitable for outfit exploration.
  • Generation time: On the specified hardware environment, the average time required to generate a single textured 3D garment model was approximately 4 min. This latency is acceptable for the system’s intended workflow, where 3D model generation is a one-time, offline process performed when a user first adds an item to their wardrobe.

4. Conclusions

We developed StyleVision, a functional prototype that integrates a cohesive AI pipeline for automated garment classification, style analysis, and 3D visualization. The system achieved high accuracy in garment categorization and promising results in style classification on custom datasets, while confidence thresholding enhanced prediction reliability and 3D reconstruction proved practical for static outfit visualization. StyleVision thus offers an intelligent platform that promotes more effective wardrobe utilization and more sustainable fashion consumption. Future work will focus on improving classification of visually similar items, enhancing 3D visualization fidelity and interactivity, expanding garment support to new categories such as footwear, and developing a recommendation engine for personalized outfit suggestions.

Author Contributions

Conceptualization, I.-C.C., E.J., M.J., S.Q.L. and P.P.; Methodology, I.-C.C., E.J. and M.J.; Training & Testing, E.J.; Inferencing, M.J.; Formal Analysis, I.-C.C., S.Q.L. and P.P.; Research, E.J., M.J., S.Q.L. and P.P.; Resources, S.Q.L. and P.P.; Data Curation, S.Q.L. and P.P.; Writing—Original Draft Preparation, S.Q.L.; Writing—Review & Editing, I.-C.C., E.J., P.P., and M.J.; UI/UX Design, P.P. and M.J.; Backend Development, E.J. and S.Q.L.; Frontend Development, M.J. and P.P.; Project Management, I.-C.C. and S.Q.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Science and Technology Council (NSTC), Taiwan, grant numbers NSTC 113-2221-E-259-012 and NSTC 114-2221-E-259-005.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Kwon, T.A.; Choo, H.J.; Kim, Y. Why Do We Feel Bored with Our Clothing and Where Does It End Up? Int. J. Consum. Stud. 2020, 44, 1–13. [Google Scholar] [CrossRef]
  2. Shaw, D.; Duffy, K. Save Your Wardrobe: Digitalising Sustainable Clothing Consumption; University of Glasgow: Glasgow, UK, 2019. [Google Scholar] [CrossRef]
  3. Lookastic: Your Personal AI Stylist. Available online: https://lookastic.com/ (accessed on 15 May 2025).
  4. Stylebook Closet App: A Closet and Wardrobe Fashion App for the iPhone and iPad. Available online: https://www.stylebookapp.com/ (accessed on 15 May 2025).
  5. Whering | The Social Wardrobe & Styling App. Available online: https://whering.co.uk/ (accessed on 15 May 2025).
  6. Kayed, M.; Anter, A.; Mohamed, H. Classification of Garments from Fashion MNIST Dataset Using CNN LeNet-5 Architecture. In Proceedings of the 2020 International Conference on Innovative Trends in Communication and Computer Engineering (ITCE), Aswan, Egypt, 12–14 February 2020; pp. 238–243. [Google Scholar] [CrossRef]
  7. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. arXiv 2019. [Google Scholar] [CrossRef]
  8. Kaggle: Your Machine Learning and Data Science Community. Available online: https://www.kaggle.com/ (accessed on 22 February 2025).
  9. Roboflow Universe: Computer Vision Datasets. Available online: https://universe.roboflow.com/ (accessed on 5 March 2025).
  10. Morelli, D.; Fincato, M.; Cornia, M.; Landi, F.; Cesari, F.; Cucchiara, R. Dress Code: High-Resolution Multi-category Virtual Try-On. arXiv 2022. [Google Scholar] [CrossRef]
  11. Pinterest. Available online: https://www.pinterest.com/ (accessed on 3 April 2025).
  12. Choi, Y.; Kwak, S.; Lee, K.; Choi, H.; Shin, J. Improving Diffusion Models for Authentic Virtual Try-on in the Wild. arXiv 2024. [Google Scholar] [CrossRef]
  13. Chen, X.; Deng, Y.; Di, C.; Li, H.; Tang, G.; Cai, H. High-Accuracy Clothing and Style Classification via Multi-Feature Fusion. Appl. Sci. 2022, 12, 10062. [Google Scholar] [CrossRef]
  14. The PyTorch Maintainers and Contributors. TorchVision: PyTorch’s Computer Vision Library. Available online: https://github.com/pytorch/vision (accessed on 13 March 2025).
  15. Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Computer Vision–ECCV 2014; Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., Eds.; Springer: Cham, Switzerland, 2014; pp. 740–755. [Google Scholar] [CrossRef]
  16. Gao, D.; Zhu, Y.; Zhou, Y.; Wu, J.; Li, Y.; Liu, S. Cloth2Tex: A Customized Cloth Texture Generation Pipeline for 3D Virtual Try-On. arXiv 2023. [Google Scholar] [CrossRef]
  17. Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep High-Resolution Representation Learning for Human Pose Estimation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 5686–5696. [Google Scholar] [CrossRef]
  18. Ge, Y.; Zhang, R.; Wang, X.; Tang, X.; Luo, P. DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 5332–5340. [Google Scholar] [CrossRef]
  19. Swayam. SwayamInSync/Clothes-Virtual-Try-On. Available online: https://github.com/SwayamInSync/clothes-virtual-try-on (accessed on 20 April 2025).
  20. Sorkine, O.; Alexa, M. As-Rigid-As-Possible Surface Modeling. In Proceedings of the Fifth Eurographics Symposium on Geometry Processing, Barcelona, Spain, 4–6 July 2007. [Google Scholar]
  21. Liu, S.; Chen, W.; Li, T.; Li, H. Soft Rasterizer: A Differentiable Renderer for Image-Based 3D Reasoning. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 7707–7716. [Google Scholar] [CrossRef]
  22. Rudin, L.I.; Osher, S.; Fatemi, E. Nonlinear Total Variation Based Noise Removal Algorithms. Phys. D Nonlinear Phenom. 1992, 60, 259–268. [Google Scholar] [CrossRef]
Figure 1. StyleVision system architecture.
Figure 2. System workflow.
Figure 3. Garment classification.
Figure 4. Adapted garment classification workflow.
Figure 5. Adapted style classification workflow.
Figure 6. Architecture of 3D visualization model.
Figure 7. Confusion matrix of overall garment classification (post-threshold).
Figure 8. Confusion matrix of style classification (post-threshold).
Table 1. Garment classification accuracy (pre- vs. post-thresholding).

| Class | Accuracy Before Confidence Thresholding | Confidence Threshold | Accuracy After Confidence Thresholding |
|---|---|---|---|
| Main category | 97.97% | 0.90 | 99.34% |
| Top subcategory | 88.99% | 0.53 | 91.54% |
| Bottom subcategory | 92.34% | 0.59 | 94.28% |
| Overall system classification | 89.81% | N/A | 92.20% |

Share and Cite

MDPI and ACS Style

Chang, I.-C.; Jonathan, E.; Johan, M.; Lee, S.Q.; Pilota, P. StyleVision: AI-Integrated Stylist System with Intelligent Wardrobe Management and Outfit Visualization. Eng. Proc. 2025, 120, 15. https://doi.org/10.3390/engproc2025120015
