Using Deep Learning in Real-Time for Clothing Classification with Connected Thermostats

Abstract: Clothing insulation is commonly associated with indoor thermal comfort, which conveys a level of satisfaction with the thermal surroundings. Clothing classification in smart homes could therefore help to save energy when the end-user wears appropriate clothes to obtain thermal comfort. Furthermore, the use of Convolutional Neural Networks (CNNs) for object detection and classification has increased over the last decade. There are real-time clothing garment classifiers, but these are oriented towards single garment recognition for texture, fabric, shape, or style. Consequently, this paper proposes a CNN classification model for the implementation of these classifiers on cameras. First, the Fashion MNIST dataset was analyzed and compared with the VGG16, Inceptionv4, TinyYOLOv3, and ResNet18 classification algorithms to determine the best clo classifier. Then, for real-time analysis, a new dataset with 12,000 images was created and analyzed with YOLOv3 and TinyYOLO. Finally, an Azure Kinect DT was employed to analyze the clo value in real time; the real-time analysis can also be performed with any other webcam. The model recognizes at least three garments of a clothing ensemble, showing that it identifies more than a single clothing garment. Besides, the model has at least 90% accuracy on the test dataset, indicating that it generalizes and is not overfitting.


Introduction
Clothing insulation is commonly associated with indoor thermal comfort. ASHRAE defines clothing insulation as the resistance to sensible heat transfer provided by a clothing ensemble, expressed in units of clo [1]. There are predictive models of clothing insulation that consider outdoor temperature, season, climate, indoor air temperature, indoor operative temperature, and relative humidity [2][3][4]. Rupp et al. [5] evaluated the clothing insulation collected in the ASHRAE database II [6] to predict garment insulation from the indoor air temperature, the season, and the building ventilation type. Moreover, Wang et al. [7] proposed a predictive model of clothing insulation for naturally ventilated buildings using the same ASHRAE database II. Gao et al. [8] considered wind direction, posture, and the reduction of clothing insulation due to airspeed to predict thermal comfort.
Alternatively, object detection and classification have been in rapid development for the last 10 years since the famous AlexNet [9] algorithm won the 2012 ImageNet Large Scale Visual Recognition Challenge and started a Convolutional Neural Networks revolution. Hence, Liu et al. [10] used a Convolutional Neural Network (CNN) to recognize an individual's clothes and activity type by capturing thermal videos as inputs. Kalantidis et al. [11] implemented clothing ensemble recognition from a photograph; however, that proposal was not suitable for a real-time solution due to a slow segmentation classification.
There are real-time classifiers of clothing garments or clothing characteristics, such as the one proposed by Yang and Yu [12]. They used edge detection to separate the relevant information from the background and then applied a technique similar to the model proposed by Chao et al. [13], which uses the Histogram of Oriented Gradients (HOG) and Support Vector Machines (SVM) to obtain classifications. Yamaguchi et al. [14] focused their research on subjects with single garments instead of a complete ensemble. Furthermore, some CNN approaches used a modified version of the VGG16 [15] to orient the garment recognition towards texture, fabric, shape, or style [16][17][18][19]. Nevertheless, those approaches did not produce a complete clothing ensemble classification; they only obtained a single clothing garment classification per image.
Due to the increase in dynamic models, adaptive methods that predict clothing properties must understand how an individual adapts to indoor environments. Matsumoto, Iwai, and Ishiguro [20] used a computer vision system and a combination of HOG and SVM to recognize clothing garments. Bouskill and Havenith [21] used a thermal manikin to determine the relationship between clothing insulation and clothing ventilation with different activities known as metabolic rates. They concluded that clothing insulation has less of an effect than the design and fabric of the clothing garment; thus, they recommended analyzing the clothing garments worn in specific places during specific activities to determine the best outfit that avoids colder or warmer thermal sensations.
Moreover, in [20], the authors used an early piece of computer vision hardware from Omron called OKAO Vision. The SVM classified objects by proposing a boundary that separated two classes and assigning each new prediction according to the side of the boundary on which its features lay. However, the SVM was a binary classifier, which meant it only chose between two classes, making it impossible to use this approach as a real-time clothing insulation calculation method. Additionally, the gradient of each pixel had to be calculated for every video frame and for each class, which increased the computational cost. A real-time implementation of clothing recognition is therefore useful in this field to obtain a real-time clo value.
The idea of using computer vision to detect clothing seems expensive when considering the camera system and the computer needed to process the information and run the solution. However, as cameras are spreading across different uses such as telecare [22][23][24] or are combined with personal assistants such as Alexa [25,26], cameras need to be considered part of the smart home infrastructure. Thus, there would be no need to invest in a camera system; only the processing part of the problem would need to be addressed.
In [25], the authors proposed using Alexa and a camera to track seniors' moods and emotions to prevent social isolation and depression. In [26], the authors considered Alexa for depression pre-diagnosis and suggested using cameras to track householders. Figure 1 displays the smart home structure. Hence, cameras can track garments. For example, through a smart TV, if possible, camera detection can monitor householder reactions or postures and profile end-users' garments. Thus, this picture shows the integration of household appliances that can help to track householders' daily activities and moods. Moreover, in [26], the authors established for the first time the concept of a gamified smart home to help end-users to save energy without feeling compromised. Besides, previous research had been focused on reducing energy consumption through gamified elements [26][27][28][29][30][31]. A smart home uses socially connected products [32][33][34][35][36] to profile end-users based on their personality traits, types of gamified user, and energy users to propose tailored interfaces that help them to understand the benefits of saving energy. Moreover, during this research, the authors suggested considering thermal comfort for energy reductions [26,37].
Energies 2022, 15, 1811
Therefore, a computer vision system integrated into camera recognition is needed to implement a real-time clothing insulation recognition system to obtain real-time feedback on thermal comfort. Integrating this clothing classifier within the thermostat interface may allow real-time feedback and monitoring to help the end-user to understand how their clothes affect thermal comfort. Besides, increasing the setpoint by 1 °C could reduce electricity consumption by 6% [38].
Thus, dynamic interfaces could use gamified elements to engage the householder in enjoyable activities while saving energy. There are intrinsic and extrinsic game elements for energy applications provided in the interfaces to help reduce energy [32,37,39]:

Object Classification Algorithms
A CNN handles multiple dimensions due to the convolutional layers [40]. Hence, there are two types of approaches [41]:


• One-stage: The object detectors produce bounding boxes that contain the detected objects without a region proposal;
• Two-stage: The object detectors carefully review the entire image, leading to a slower process than the one-stage approach but with better accuracy.

Table 1.

CNN | Characteristics | Author
AlexNet [9] | This CNN has eight layers: five convolutional layers connected by max-pooling layers, followed by three fully connected layers. The CNN is thus divided into two stages, with the feature extraction done by the convolutional layers and the classification performed by the fully connected layers. This became the basis for image classifiers. | Alex Krizhevsky
VGG16 [15] | VGG16 consists of convolutional layers stacked on each other. This architecture does not change the size of the kernels in the convolutional layers and keeps it constant at 3 × 3. | Researchers from the University of Oxford
GoogLeNet [42] or Inception | The designers proposed convolutions with a kernel size of 1 × 1 to reduce dimensionality. Therefore, the CNN significantly reduced the number of parameters needed for the training. This architecture produced better results than the algorithms existing at that moment. | Google
ResNet [43] | This algorithm introduced residual blocks, in which skip connections let the input bypass some convolutional layers. Therefore, deeper networks can be implemented without the degradation problem. | Microsoft Research
YOLO [44,45] | YOLO stands for You Only Look Once and is a one-stage algorithm proposed in 2016. This algorithm eliminated the region proposal method of two-stage detectors and instead directly produced bounding boxes together with the probability that the object inside each bounding box belongs to a class. Although this algorithm presents lower accuracy than two-stage object detectors, it can be considered an accurate model. | Joseph Redmon
Tiny YOLO [46] | Tiny YOLO is a modified version of YOLOv3 that keeps the algorithm's speed while making it computationally less expensive. Thus, embedded systems can run the trained model to produce predictions without expensive GPUs. | Joseph Redmon

TensorFlow [47] is an end-to-end open-source platform written in Python and C++ that provides tools and libraries for implementing machine learning applications, covering model creation, training, deployment, and performance analysis [48]. In addition, it provides Application Programming Interfaces (APIs) that help to create a model with a few lines of code. Therefore, the user spends more time on the model implementation and its parameters and less time on the coding part of the implementation.
TensorFlow represents data as tensors, that is, arrays of multiple dimensions that generalize vectors and matrices, and all the operations inside TensorFlow work with these tensors.
Another advantage of the TensorFlow package is that it handles data efficiently. Its tf.data API imports and preprocesses the dataset so that the Graphics Processing Unit (GPU) or Tensor Processing Unit (TPU) does not have to wait for the Central Processing Unit (CPU) to deal with the input data and therefore does not suffer from data starvation.
One of the most important APIs contained within the TensorFlow package is Keras [49]. Keras is an open-source deep learning library that was designed to quickly build and train neural network models. It can build these models using the sequential method, which consists of adding layers one after another, each with the indicated activation function and filter size [48].
Even though there are some object classification models directed towards clothing recognition, most of the proposed algorithms address fashion industry problems or produce single clothing garment classifications. They fail to generalize to other solutions and cannot be implemented in activity recognition or other areas where a real-time clothing ensemble classification may be useful. Hence, this paper proposes a CNN classification model for implementation on real-time devices, such as cameras: the clothing ensemble classifier.
The concrete contributions of this paper are as follows:
• The model recognizes at least three garments of the clothing ensemble, showing that it recognizes more than a single clothing garment;
• The model had at least 90% accuracy on the test dataset, indicating that it can generalize and is not overfitting.
Furthermore, the VGG16, Inception, TinyYOLOv3, and ResNet classification algorithms were selected in this study because they are the most basic architectures for image classification. Besides, the previous approaches found in the state of the art of clothing recognition models took the VGG16 as their base architecture [14,18]. Therefore, the aim of this study was to compare the basic architectures to identify which was the best real-time clothing classifier. Furthermore, as the classifier will be used at home, TinyYOLOv3 has a small architecture that can be implemented on embedded systems such as the Raspberry Pi, FPGA, or NVIDIA Jetson Nano.
Figure 2 displays the methodology used during this research. First, Fashion MNIST was analyzed and compared with the VGG16, Inception, TinyYOLO, and ResNet classification algorithms to determine the object classifier that best suited the clo classification. Then, for the real-time analysis, a new dataset with 12,000 images was created and analyzed with YOLOv3 and TinyYOLO. Since most real-time solutions used the YOLO algorithm, a YOLO model was trained to obtain a real-time clothing garment classifier. Besides, a Tiny YOLO model was tested with the privacy of the users in mind. Research suggested that Tiny YOLO can be used for real-time image detection in constrained environments and implemented in an embedded system. Furthermore, the Tiny YOLO was trained starting from the recommended weights obtained on COCO [50], another large-scale object detection, segmentation, and captioning dataset. Finally, an Azure Kinect DT was employed to analyze the clo values in real time. Moreover, the real-time analysis can be performed with any other webcam.

Materials and Methods
All the tests were performed with a GeForce RTX 2080 Ti GPU and an AMD Ryzen 3950 12 core 3.5 GHz processor to avoid any bias during the time measurements. In addition, a Huawei P30 Lite cellphone's camera was used for static images and real-time videos. The recorded images show an individual in a living room walking off camera, changing a garment, walking, and sitting down. The video lasted 24 s. Furthermore, the current setting did not have more individuals to analyze at the same time; hence, TV series scenes were used to compensate for that lack of individuals and to visualize how the model's predictions changed. Figure 3 depicts the flowchart of the entire process used during this research for training a neural network.

Datasets
Two datasets were analyzed before training the CNN models. The Deep Fashion dataset provided different labeled images grouped into category, texture, fabric, shape, part, and style [16]. The Fashion MNIST dataset [51] provided 70,000 images of clothing garments divided into 60,000 images for training and 10,000 images for testing. The Fashion MNIST dataset was divided into 10 classes: T-shirt/top, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag, and ankle boot.
As a result of the dataset analysis, a new dataset consisting of 12,000 images was created. A total of 2000 images were data augmented to obtain 10,000 additional images, resulting in 12,000 images. These images were divided into sets of 10,800 for training and 1200 for testing. The original images were randomly selected from the internet, with different backgrounds and different clothing garments worn. Due to hardware and time constraints, the classes were chosen to keep the training time at a minimum, and only eight labels were selected; Table 2 presents the eight classes that were considered. Furthermore, dresses were labeled as skirts due to the similarity of the bottom part of the clothing garment. During the study, there was no access to a computer with a Linux operating system, so Google Colab was used instead to train both networks (YOLO and TinyYOLOv3) with the custom dataset. However, Google Colab limited the GPU access time, so the 60+ hours needed to train the network were spread over several weeks.

Label | Description
0 | Highly insulating jacket

The labelImg library was used to label the images because it allows images to be annotated with bounding boxes and their class labels. Thus, this annotation is compatible with the YOLO format. The advantage of YOLO is that it can detect and classify various objects inside an image, which suits a clothing ensemble classification problem; therefore, it was ideal for the research scope. After labeling the 2000 images, the data augmentation was performed using the clodsa library, as the train and test files were created in the YOLO format. This library was used to perform image transformations on the labeled dataset while keeping the bounding boxes in the correct place. Thus, the augmented dataset increased to 12,000 images. Figure 4 depicts the four CNN algorithms used during this research.
The VGG16 and Inceptionv4 used preloaded weights from the Keras applications class [15,48]. The ResNet18 [43,48] and the Keras version of the TinyYOLOv3 [46] had no preloaded weights and were built from scratch. Thus, the ResNet18 allowed a comparison with the Tiny YOLO model built from scratch. Moreover, a second Tiny YOLO model used preloaded weights; it was trained with Linux commands so that it could be compared on the independent images.
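As a minimal sketch of how an augmentation can keep the bounding boxes in the correct place, the snippet below flips a YOLO-format annotation horizontally. It assumes the usual YOLO text format (class index followed by the box center, width, and height, all normalized to the image size). The helper names and the sample label are hypothetical and are not taken from the clodsa library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class YoloBox:
    """One YOLO-format annotation; coordinates are normalized to [0, 1]."""
    class_id: int
    x_center: float
    y_center: float
    width: float
    height: float


def parse_yolo_line(line: str) -> YoloBox:
    """Parse 'class x_center y_center width height' into a YoloBox."""
    class_id, x, y, w, h = line.split()
    return YoloBox(int(class_id), float(x), float(y), float(w), float(h))


def flip_horizontal(box: YoloBox) -> YoloBox:
    """Mirror the box around the vertical axis of the image.

    Only the horizontal center changes; size and class are preserved.
    """
    return YoloBox(box.class_id, 1.0 - box.x_center, box.y_center,
                   box.width, box.height)


def to_yolo_line(box: YoloBox) -> str:
    """Serialize a YoloBox back to the YOLO text format."""
    return (f"{box.class_id} {box.x_center:.6f} {box.y_center:.6f} "
            f"{box.width:.6f} {box.height:.6f}")


# Hypothetical label: class 0 with a box in the left half of the image.
original = parse_yolo_line("0 0.25 0.50 0.20 0.40")
flipped = flip_horizontal(original)
print(to_yolo_line(flipped))
```

Other transformations listed for the dataset, such as hue or contrast changes, leave the box coordinates untouched, so only geometric transformations need this kind of label update.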

Training the Models
The training was conducted for 100 epochs with a batch size of 128, the Adam optimizer, and a constant learning rate of 0.01. A learning rate of 0.05 was also tested, with no significant change. Besides, due to time and hardware constraints, changing the parameters using Google Colab would have taken several weeks. Thus, no further changes were made to this hyperparameter. However, these parameters could be tested in future work.
Furthermore, the validation and training accuracy and the confusion matrices were plotted to compare each CNN. Besides, five images were tested to obtain the clo value in real time. The model detected the eight different classes and displayed the total clo value together with the probability of each detected garment belonging to its predicted class.
The Tiny YOLO used pre-trained weights because the other CNN models also relied on pre-trained feature extractors to obtain their weights. Thus, the Tiny YOLO was tested on the same images and compared with the other CNN models. Furthermore, the comparison made it possible to visualize any difference with respect to the model created from scratch. Finally, the last model was trained in Google Colab with the new custom dataset of 12,000 images. These 12,000 images came from the original dataset of 2000 images gathered from the internet with no particular size. These images were labeled in the YOLO format; then, data augmentation techniques (rotation, hue changes, contrast changes, horizontal flip, Gaussian noise, and gamma color correction) were used to enlarge the dataset and cover more of the conditions in which the camera and lighting settings may affect the effectiveness of the model.
The threshold refers to the minimum confidence, reported by the model, that a detected object belongs to a class. For instance, a threshold value of 0.5 generates bounding boxes only around objects that have over 50% probability of belonging to the predicted classes. This project used an initial threshold of 0.4 and lowered it to 0.2 for the video test. For the images, the threshold was set to 0.1 to study all the classifications the model makes.
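The effect of the three threshold values can be illustrated with a short sketch. The detections and class names below are hypothetical and only show how lowering the threshold lets more candidate garments through; this is not the filtering code of the YOLO implementation itself.

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    label: str         # predicted class name (hypothetical names below)
    confidence: float  # model confidence in [0, 1]


def filter_by_threshold(detections: List[Detection],
                        threshold: float) -> List[Detection]:
    """Keep only detections whose confidence exceeds the threshold."""
    return [d for d in detections if d.confidence > threshold]


# Hypothetical detections for a single frame.
frame_detections = [
    Detection("T-shirt", 0.82),
    Detection("trousers", 0.35),
    Detection("shoes", 0.15),
    Detection("jacket", 0.05),
]

# Thresholds mentioned in the text: initial, video test, and still images.
for threshold in (0.4, 0.2, 0.1):
    kept = filter_by_threshold(frame_detections, threshold)
    print(threshold, [d.label for d in kept])
```

A lower threshold exposes more of the model's weaker classifications, which is useful for studying its behavior on still images but may add false detections in a live video.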

The compile method configured the network, whereas the fit method was used for the model training. Therefore, the training dataset was the input parameter. Besides, the number of batches, epochs, and callbacks were chosen. The batches were divisions of the dataset used to train on a random portion instead of the whole dataset to avoid the failure of the computer due to not having enough Random Access Memory (RAM) to produce the result.
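The role of the batches can be sketched without any deep learning library. The generator below is a hypothetical helper, not part of Keras; it shuffles the sample indices and yields them in groups of the batch size, so that only one group needs to be held in memory at a time. The example assumes that all 60,000 Fashion MNIST training images are used with the batch size of 128 reported above.

```python
import math
import random
from typing import Iterator, List


def iterate_minibatches(num_samples: int, batch_size: int,
                        seed: int = 0) -> Iterator[List[int]]:
    """Yield shuffled index batches that together cover the dataset once."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    for start in range(0, num_samples, batch_size):
        yield indices[start:start + batch_size]


NUM_TRAIN_IMAGES = 60_000  # Fashion MNIST training split
BATCH_SIZE = 128           # batch size reported for the training

batches = list(iterate_minibatches(NUM_TRAIN_IMAGES, BATCH_SIZE))
steps_per_epoch = math.ceil(NUM_TRAIN_IMAGES / BATCH_SIZE)

# Number of batches, expected steps per epoch, and size of the last batch.
print(len(batches), steps_per_epoch, len(batches[-1]))
```

Under these assumptions, one epoch consists of 469 weight updates, the last of which uses a smaller batch because 60,000 is not a multiple of 128.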
Google Colab was used for the YOLO training because a Linux command was needed to create the required environment and train the model. However, the main drawback was the time limit. Google Colab allows access to its GPU for 4 h; then, it is necessary to wait for 18 h to gain access again for another 4 h. Hence, the Tiny YOLO model training took 90 h, of which 60 h corresponded to the training time.
To date, there has been no comparison between Tiny YOLO's feature extraction architecture and the other common architectures. Consequently, during this research, it was compared against the other image classifiers. Furthermore, neither YOLO nor Tiny YOLO had previously been implemented in Keras. Hence, the Tiny YOLO feature extractor was built with the sequential method to compare it with the different CNN models. Moreover, this model was built from scratch using the same activation functions and the same number of filters, sizes, and strides to keep the model as close to the original as possible.
As for the VGG16, Inceptionv4, and ResNet18 models, Keras included a method to call on these models and used them as feature extractors to add a few dense layers and differentiate the classifications. Thus, this method was used to compare the models.


Comparing the Models
The accuracy results were obtained from the trained models and plotted against the epochs to see how the models' performance changed during training, whether any overfitting was present, and whether early stopping should have been used to avoid it. The reported accuracy corresponds to the best accuracy on the validation data and the accuracy on the training data for that same epoch.
Additionally, confusion matrices were built by comparing the network's predictions with the actual values of the test dataset to check whether the accuracy was meaningful. A model that classifies almost everything as negative and gets only a few positive classifications right can have an accuracy of over 90%; nonetheless, this model would be useless. Therefore, the metrics for model evaluation used during this study were the precision, recall, and F1 score.
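The point about misleading accuracy can be made concrete with a small worked example. The confusion matrix counts below are hypothetical (an imbalanced binary problem with 50 positives out of 1000 samples) and are not results from this study; the function uses the standard definitions of accuracy, precision, recall, and F1 score.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 from binary counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical model that predicts "negative" for almost every sample:
# it finds only 5 of the 50 positives and raises 5 false alarms.
metrics = classification_metrics(tp=5, fp=5, fn=45, tn=945)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Here the accuracy is 0.95 even though the recall is only 0.10 and the F1 score is about 0.17, which is why the confusion matrix and the per-class metrics are inspected alongside the accuracy.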

Study Case: Clothing Insulation Real-Time Analysis Applied on Thermostats
Once the CNNs were trained, a real-time clothing recognition approach that can be implemented at home was proposed. This real-time recognition was oriented to infer clothing insulation based on the clo values presented in Table 3. Figure 5 depicts the flow chart of the proposed solution to obtain a real-time implementation of clothing recognition. Thus, the process used in this study was to select the dataset to be implemented in the model training, label the images to add more precision, and provide information about the people wearing the clothes and the background. Clothing affects the factors involved in the heat transfer between the human body and the ambient environment; besides, clothing insulation affects thermal comfort because a difference in the value of this factor can change the perception of the ambient temperature.
For this reason, both human-centered and building-centered thermal comfort calculations consider clothing insulation as a factor for the overall thermal comfort range. However, a thermal calculation method has been overlooked due to the difficulty of detecting and classifying every clothing garment that a user is wearing.
Then, a home located in Concord, California, was modeled in an energy simulation to measure the impact of increasing or decreasing the HVAC setpoint by 1 °C. A change of 1 °C can save 6% of electricity [38]. Other elements fed into the energy model were a weather file from Concord, the construction materials, the home schedule, and loads. The energy model simulation used LadybugTools v1.4.0 from Rhinoceros + Grasshopper. The living room zone was analyzed to obtain the HVAC consumption and calculate the PMV/PPD to determine if the householder was comfortable. The parameters considered were a metabolic rate of 1.0 and a dynamic clo value based on Table 3.
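As a small illustration of the PMV/PPD step, the sketch below converts a PMV value into PPD with the standard Fanger relation used in ISO 7730 and ASHRAE 55. It does not reproduce the LadybugTools workflow; the function names are hypothetical, and the comfort band of |PMV| ≤ 0.5 is an assumption taken from common practice rather than from this study.

```python
import math


def ppd_from_pmv(pmv: float) -> float:
    """Predicted Percentage of Dissatisfied (%) for a given PMV."""
    return 100.0 - 95.0 * math.exp(-0.03353 * pmv**4 - 0.2179 * pmv**2)


def is_comfortable(pmv: float, limit: float = 0.5) -> bool:
    """Assumed comfort criterion: PMV within +/- limit."""
    return abs(pmv) <= limit


for pmv in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(f"PMV={pmv:+.1f}  PPD={ppd_from_pmv(pmv):.1f}%  "
          f"comfortable={is_comfortable(pmv)}")
```

Even at a neutral PMV of 0, the relation predicts that 5% of occupants remain dissatisfied, which is why comfort is usually expressed as a band rather than a single value.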
Then, a dynamic interface was proposed based on the energy model results. This interface was built in MATLAB/Simulink V.R2021a. It determined, in an interactive and ludic manner, how to save energy by modifying the setpoint and suggesting appropriate types of clothes. Figure 6 displays the input values inside a green box (interface, the month, day, and hour); the output values were the hourly consumption in Watts, the outdoor and indoor temperature, the relative humidity, the setpoint, and the expected savings. In the "Did you know?" section, a message was displayed, and based on the possible energy savings and thermal comfort, three actions were displayed:
1. Wear lightweight clothes;
2. Wear the same clothes;
3. Wear warmer clothes.
The "Take a look" button showed the householder the potential savings achieved by performing those actions. The "Reward available" and "Community news" elements belong to a gamification structure. These buttons are displayed this way because gamification theory suggests that promoting intrinsic motivations and extrinsic motivations in real activities can achieve specific goals, such as energy reduction [28,32,37,39].

Results
Although the proposal was to use both datasets to compare results, only Fashion MNIST was considered because the Deep Fashion dataset was encrypted and required a password to decompress the dataset files. Therefore, an e-mail was sent to the authors, but we never received a response or the required password. Thus, a new dataset was created. Figure 7 depicts the dataset observation and the divisions considered for the training, validation, and testing stages with the Fashion MNIST dataset. The image shows that the dataset had 10 different classes with the grayscale format and was printed into the NHWC format. Figure 8 shows that the distribution showed no significant difference for the training data and validation data. That means that the model was not biased.

Datasets Treatments
There were some differences in the number of examples containing people wearing jackets, shirts, and trousers because it was relevant to discern between a highly insulating jacket and a regular jacket. Any shoe that covered the ankle was labeled as a highly insulating shoe. Furthermore, no differentiation for sandals was made. Every dress was labeled as a skirt due to its similarity with the skirt's shape. Besides, the objective was to have fewer classes to have less training time, and thus we prioritized recognizing several parts of a clothing ensemble such as shirts, jackets, shoes, and skirts that are more common and have different clothing insulation values.
In addition, the data augmentation process used the clodsa package. Figure 9 shows an example of these transformations. Each transformation was applied so that the labeled bounding boxes kept their correct positions, leaving the training unaffected. Figure 9a shows the flipping transformations. They cover different postures of the people; the vertical flip considers some individuals that prefer to lie down with their feet up, for instance, to alleviate foot pain. Figure 9b displays the hue and the contrast transformations to cover different lighting environments and possible impediments for a camera. Figure 9c represents the blurring and histogram transformations. They were selected to cover the difference in image resolution between cameras with fewer megapixels or lower resolution and the ones used for training.
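To illustrate how a flip can keep the labels aligned with the garments, the sketch below mirrors a bounding box stored in the normalized YOLO label format (class, x_center, y_center, width, height). It only shows the underlying geometry and is not the clodsa API; the function names and the example box are hypothetical.

```python
def hflip_yolo_box(box):
    """Mirror a normalized YOLO box left-to-right."""
    cls, xc, yc, w, h = box
    # Only the horizontal center moves; the box size is unchanged.
    return (cls, 1.0 - xc, yc, w, h)


def vflip_yolo_box(box):
    """Mirror a normalized YOLO box top-to-bottom."""
    cls, xc, yc, w, h = box
    return (cls, xc, 1.0 - yc, w, h)


# Hypothetical label: class 0 centered in the upper-left part of the image.
box = (0, 0.25, 0.25, 0.5, 0.25)
print(hflip_yolo_box(box))
print(vflip_yolo_box(box))
```

Photometric transformations such as hue, contrast, or blurring leave the coordinates untouched, so only geometric transformations need this kind of label update.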

Object Classifiers Comparison
The validation and training accuracy graphs for all models are shown in the following images, where Figure 10a is from the VGG16 network, Figure 10b is from the Inception model, Figure 10c is from the ResNet34 network, and Figure 10d is from the Tiny YOLO network. Figure 10a shows that the reached accuracy was below 0.8. Although this model used preloaded weights, it was not as accurate as the other models. This architecture was the basis for some of the proposed solutions for clothing recognition found in the literature reviews. Therefore, it was relevant to inspect the performance of this algorithm. Even though there was no difference between the accuracy from training and the validation, it had a low score in the accuracy metric compared with the other models.
The accuracy graph of the Inception model (Figure 10b) shows that the Inception model fares better in the accuracy metrics when compared with the VGG16 model but failed to reach a stable point within the 100 epochs. Therefore, this model required more epochs. Although there was little difference between the validation and training accuracy, the ResNet18 and Tiny YOLOv3 models had better scores in both datasets.
Figure 10c reveals that the ResNet algorithm reached a perfect training accuracy, but the validation accuracy was barely above 90%. Hence, overfitting needs to be considered, and implementations of dropout can improve the model. Moreover, early stopping can be considered because the best accuracy for the validation was in the first epochs. Figure 10d shows that the Tiny YOLO made from scratch had a good accuracy but that there was overfitting since the top accuracy score was reached in the first epochs and the difference between training and validation accuracy was greater than 5%. Therefore, a dropout layer with a 50% drop rate was implemented at the middle of the hidden layers. Figure 11 presents the result of the dropout layer implementation, showing that there was no discernible difference between the results obtained with and without dropout. Therefore, for this implementation, the difference in accuracy scores was not enough to consider dropout, and the test dataset results needed to be analyzed. Besides, this may also indicate the need for a bigger dataset.
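The two overfitting signals described above (an early validation peak and a training-validation gap above 5%) can be checked programmatically. The following is a minimal sketch under stated assumptions: the function name, the use of the final-epoch gap, and the accuracy curves are hypothetical and do not reproduce the curves in Figure 10.

```python
def summarize_history(train_acc, val_acc, gap_limit=0.05):
    """Report the best validation epoch and the final train/val gap."""
    # Epoch (1-indexed) at which validation accuracy peaked.
    best_epoch = max(range(len(val_acc)), key=val_acc.__getitem__) + 1
    final_gap = train_acc[-1] - val_acc[-1]
    return {
        "best_epoch": best_epoch,
        "final_gap": final_gap,
        "overfitting_suspected": final_gap > gap_limit,
    }


# Hypothetical curves: training keeps improving, validation peaks early.
train = [0.80, 0.90, 0.95, 0.98, 1.00]
val = [0.85, 0.91, 0.90, 0.90, 0.90]
print(summarize_history(train, val))
```

A best epoch near the start of training, combined with a growing gap, is the pattern for which early stopping or dropout would typically be considered.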
Figure 11. Tiny YOLO with 50% dropout.
As the accuracy may be a misleading metric, a confusion matrix was employed to make a more complete comparison. Figure 12 depicts the confusion matrices. The model seems to have problems detecting the T-shirt/top, pullover, coat, and shirt classes. The shirt class had more errors because the model misclassified clothing items as a shirt. Hence, this model was sensitive towards the shirt class. Moreover, all the models presented this problem, because it was difficult to separate the shirt class from the T-shirt/top class and some of the coat class examples.
The confusion matrices show that even though ResNet and Tiny YOLO seemed to have better results for the accuracy metric than the Inception model, Inception seemed to perform better in the test dataset. So, to finish this comparison, we consider the numeric values side by side to be able to have a better look at the differences between the models. Table 4 shows that Tiny YOLO and ResNet18 performed better in the training and validation stages than the other models. The testing accuracy was below that of the Inception model, but this model extracted better features, as confirmed by the recall value. Therefore, the Inception model was the best model in terms of recognizing clothing garments, but it used preloaded weights. Hence, for real-world implementation, a trained Tiny YOLO was created using Linux commands and used preloaded weights to make a fair comparison. Nevertheless, these commands did not offer a way to see the accuracy in the different datasets used for training, validation, and accuracy determination. Therefore, the testing images were used to compare the models. These images were considered due to the complexity of the postures or objects in front of the individuals.
A test on five different images that were not part of the datasets was used to test the real implementation of the CNN models. Figure 13a shows an individual in a seated posture with lighter garments, Figure 13b shows an individual in a reclined posture with a jacket, and Figure 13c shows a reading posture with a highly insulated jacket. Figure 13d shows a model in a standing posture with sandals and lighter garments. Figure 13e shows an individual in a writing posture with lighter garments.
Table 5 shows the predicted classes, separating the model's top choice from the other possible classes according to how close their probabilities were to that of the top class, considering a threshold of 10%. The final column gives the time in milliseconds that the model took to produce the classification. Tiny YOLO from scratch (Tiny YOLOs), Tiny YOLO from Linux (Tiny YOLOl), and the Inception model produced more than one classification. Nonetheless, they had problems differentiating between the T-shirt/top, shirt, coat, and pullover classes. Moreover, these three models managed to produce multiple classifications in most of these images. Unfortunately, none of them were consistent, possibly due to the lack of additional information in the Fashion MNIST dataset. Besides, these models can be used for clothing ensemble recognition. Nevertheless, they had problems making the correct classifications; thus, bounding boxes were required.
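One way to read the 10% criterion used for Table 5 is that any class whose probability lies within 0.10 of the top class is reported as an additional candidate. The sketch below implements that interpretation; the interpretation itself, the function name, and the probabilities are assumptions made for illustration.

```python
def top_and_close(probs, margin=0.10):
    """Return the top label and the labels within `margin` of its probability."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    top_label, top_p = ranked[0]
    others = [label for label, p in ranked[1:] if top_p - p <= margin]
    return top_label, others


# Hypothetical softmax output for one test image.
probs = {"Shirt": 0.41, "T-shirt/top": 0.35, "Coat": 0.15, "Pullover": 0.09}
print(top_and_close(probs))
```

With these illustrative values, the shirt class is the top choice and the T-shirt/top class is kept as a close alternative, mirroring the confusion between similar upper-body garments reported above.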
Furthermore, real-time detection algorithms considered the YOLO or Tiny YOLO architecture. The Inception algorithm could present problems for real-time implementation due to its average recognition time. Therefore, the Tiny YOLO model was considered, since it allowed testing the possibility of obtaining a garment ensemble classifier by using a dataset with more information. Figure 14 depicts the labeled dataset with bounding boxes for the YOLO and Tiny YOLO models using a threshold of 0.1; then, the threshold was increased for the YOLO model to 0.4 and 0.5. Figure 14a shows that the sofa was misclassified as a skirt with a 51% probability. Figure 14b shows that the model only classified the shirt and jacket. As these figures show, a threshold of 0.1 led to the sofa and floor being labeled, and thus the threshold was increased to 0.4. Hence, Figure 14c shows that the model recognized the jacket, trousers, and shoes but misclassified the couch. Moreover, Figure 14d shows that the model recognized three garments; thus, another test was made by increasing the threshold to 0.5. As shown in Figure 14e,f, the model then recognized two clothing garments. This threshold was needed because the algorithm proposes both the classification and the bounding boxes for recognized objects. Furthermore, this threshold value was considered to avoid detecting objects that were not contained in any of the classes, such as the sofa or the floor.
The Tiny YOLO model classified the garments with a threshold of 0.1. Figure 14g shows that the model properly classified all the clothes. Figure 14h shows that the model misclassified the trousers. Figure 14i shows that the model failed in classifying the garments except for the trousers. Figure 14j shows that the model misclassified the sofa as a skirt and trousers. Figure 14k shows that the model wrongly classified the laptop as a shirt.
Figure 15 depicts the best and worst video results for the YOLO and Tiny YOLO models with a threshold of 0.4. Screenshots were taken to produce the results and show them in this paper. Figure 15a,b show that the model misclassified the shirt as a skirt because the model understood that this type of shirt seemed more like a skirt. However, Figure 15a shows the best classification for the YOLO model. In Figure 15c, the Tiny YOLO model correctly classified the highly insulated jacket. Nevertheless, Figure 15d shows that the model misclassified the sofa as trousers. Therefore, the threshold was decreased to 0.2 to review whether the model obtained more classifications for multiple garment detections. Since the threshold was lower, there were more resulting images. Figure 15e shows that the model misclassified the sofa as trousers. Figure 15f shows that the model misclassified half of the scene as a highly insulated jacket, some books as shoes, and the shirt as a skirt. Figure 15g shows that the model did not recognize the individual's garments and misclassified the sofa as trousers and highly insulated shoes. Figure 15h shows that the model misclassified the sofa as a shirt and the floor, the shirt, and the shoes as trousers. Figure 15i shows that the model classified the garment as a highly insulated jacket and, due to the shape, also as a skirt. Figure 15j shows that the model correctly classified the shoes and the shirt, and even suggested that it could be a skirt; nevertheless, it misclassified the trousers as highly insulated shoes or a skirt. Figure 15k shows that the model correctly classified the clothing as a highly insulated jacket but also misclassified it as trousers. Figure 15l shows that the model classified the shirt and trousers; however, the model considered the trousers to include the sofa.
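The effect of the confidence threshold in these tests can be sketched as a simple filter over the detections returned for one frame. The dictionary layout, the function name, and the confidence values below are hypothetical and only illustrate the trade-off between false positives at low thresholds and missed garments at high thresholds.

```python
def filter_detections(detections, threshold):
    """Keep only detections whose confidence reaches the threshold."""
    return [d for d in detections if d["confidence"] >= threshold]


# Hypothetical detections for a single frame.
detections = [
    {"label": "shirt", "confidence": 0.83},
    {"label": "trousers", "confidence": 0.45},
    {"label": "skirt", "confidence": 0.22},  # assumed false positive (sofa)
    {"label": "shoes", "confidence": 0.12},
]

for threshold in (0.1, 0.4, 0.5):
    kept = [d["label"] for d in filter_detections(detections, threshold)]
    print(threshold, kept)
```

In this illustrative frame, raising the threshold removes the spurious detection but also discards the low-confidence shoes, which matches the pattern of fewer recognized garments at stricter thresholds.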

Study Case: Clothing Insulation Real-Time Analysis Applied on Thermostats
The clothing insulation values are shown in Table 3. Moreover, since the previous video testing did not produce proper classifications, multiple users, garments, and postures were tested on a TV show. Nevertheless, to avoid any copyright problems, these images are not displayed here. Hence, the results were as follows:
• 0.4 threshold: The YOLO model had problems detecting the garments with darker objects, but with lighter objects, it produced a full clothing classification; the Tiny YOLO model did not detect multiple clothing garments and incorrectly classified hair as a hat, and it did not detect darker objects.
• 0.2 threshold: The YOLO model showed incorrect classifications or multiple classifications for a single object. However, the YOLO model classified multiple clothing garments and produced more correct classifications than the Tiny YOLO model; the Tiny YOLO model made multiple clothing garment classifications, but it misclassified darker objects.
Hence, an Azure Kinect DT was employed to test the clo value in real-time. This test was oriented toward clothing insulation classification. Thus, the bounding boxes had color values depending on the clo, under this assumption: warmer clothing garments were closer to the red color of the bounding box, whereas colder clothing garments were closer to the blue color.
However, the Tiny YOLO model did not provide noteworthy results for multiple clothing garments recognition. Consequently, Figure 16 depicts the YOLO model results. Figure 16a shows that the model correctly classified the garments, giving a clo value of 0.32; nevertheless, it did not recognize the highly insulated jacket. Figure 16b shows that the model considered the highly insulating jacket, giving a clo value of 0.72. Figure 16c shows that the model accurately classified the highly insulated jacket and the trousers, but it did not classify the shirt. Figure 16d shows that the model correctly classified all the garments.
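The step from detected garments to a clo value and a bounding box color can be sketched as follows. This is only an illustration under stated assumptions: the per-garment values in `CLO_BY_CLASS` are placeholders and are not the values of Table 3, the ensemble clo is taken as the plain sum of the garment values, and the linear blue-to-red scale over the 0.04 to 0.74 clo range is one possible color mapping, not necessarily the one used in the implementation.

```python
# Placeholder clo values per detected class (hypothetical, NOT Table 3).
CLO_BY_CLASS = {"shirt": 0.20, "trousers": 0.25, "shoes": 0.05, "jacket": 0.35}


def ensemble_clo(labels):
    """Sum the clo of each distinct detected garment; unknown labels add 0."""
    return sum(CLO_BY_CLASS.get(label, 0.0) for label in set(labels))


def clo_to_rgb(clo, lo=0.04, hi=0.74):
    """Map a clo value to an RGB color: blue for low clo, red for high clo."""
    t = min(max((clo - lo) / (hi - lo), 0.0), 1.0)
    return (round(255 * t), 0, round(255 * (1.0 - t)))


labels = ["shirt", "trousers", "shirt"]  # duplicate detections count once
print(round(ensemble_clo(labels), 2))
print({label: clo_to_rgb(CLO_BY_CLASS[label]) for label in set(labels)})
```

Counting each class once is a design choice that avoids inflating the clo value when the detector returns overlapping boxes for the same garment.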
The total HVAC consumption for the living room zone was 3952 kWh. The cooling setpoint was 24.4 °C, and the heating setpoint was 21.7 °C. After increasing the cooling setpoint by 1 °C and decreasing the heating setpoint by 1 °C, the HVAC consumption was 2923.7 kWh. Figure 17 depicts the monthly chart of HVAC kWh consumption before and after increasing or decreasing the setpoint. There were monthly reductions that went from 18% to 47%. Nevertheless, strategies in the thermostat interface need to engage the householder to reduce energy consumption without losing thermal comfort.
Thus, Figure 18 displays the interface on three different dates and the required actions to reduce energy consumption:
1. 10 July at 4:00 p.m. (Figure 18a): increase the setpoint by 1 °C and wear lightweight clothes to reduce the HVAC consumption;
2. 8 December at 9:00 p.m. (Figure 18b): decrease the setpoint by 1 °C and wear the same clothes;
3. 8 February at 8:00 p.m. (Figure 18c): decrease the setpoint by 1 °C and wear warmer clothes.
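For reference, the overall reduction implied by the two annual HVAC totals reported for the living room zone can be worked out directly. The variable names are illustrative; the two consumption figures are the ones stated in the text.

```python
baseline_kwh = 3952.0   # annual HVAC consumption with the original setpoints
adjusted_kwh = 2923.7   # annual HVAC consumption after the 1 °C adjustments

saved_kwh = baseline_kwh - adjusted_kwh
saved_pct = 100.0 * saved_kwh / baseline_kwh

print(f"Saved {saved_kwh:.1f} kWh ({saved_pct:.1f}% of the baseline)")
```

The annual reduction of about 26% lies inside the reported monthly range of 18% to 47%.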

Discussion
The Fashion MNIST dataset helped as a guideline for the new dataset images. Therefore, the new dataset fitted the models' input parameters. Figure 5 shows that the printed images were correctly labeled and there was no clear bias towards a certain class after the dataset division. Accordingly, the datasets were ready to train all the CNN models.
In terms of the overall behavior, the models presented problems with the shirt, T-shirt/top, and coat classes due to the dataset containing dresses labeled as skirts. This labeling was performed to keep the number of classes to a minimum and make the training process as efficient as possible, because each additional class required 2000 more training iterations. Consequently, more examples are needed to avoid this confusion and improve the classifier.
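The relation between the number of classes and the training length can be sketched with the rule of thumb commonly cited for Darknet-based YOLO training, in which the iteration budget grows by 2000 per class. The lower bounds used below (at least 6000 iterations and at least the number of training images) come from that general guideline and are assumptions here, as are the function name and the example class counts.

```python
def darknet_max_batches(num_classes, num_train_images=0,
                        per_class=2000, floor=6000):
    """Estimate the iteration budget from the number of classes."""
    return max(num_classes * per_class, num_train_images, floor)


# Hypothetical class counts: one extra class adds 2000 iterations.
for classes in (8, 9, 10):
    print(classes, darknet_max_batches(classes))
```

Under this rule, merging dresses into the skirt class shortens training by 2000 iterations, at the cost of the class confusion discussed above.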
The real-time implementations were successful. The real-time test for the YOLO model successfully recognized the clo values for each item of clothing and even managed to produce results in a close-up. However, at certain times, it had difficulties differentiating overlapping garments. Thus, more examples with these considerations are required. The Tiny YOLO model misclassified some garments; hence, more training images are required to make this model more robust.
Another factor to consider is that the Tiny YOLO model seemed to have no problem with computational power, but the YOLO model slowed down the real-time feed of the video. Hence, the model requires certain hardware characteristics to be successfully implemented in real-time.
Besides, real-time feedback, monitoring, and the interaction between the interface, the thermostat, and the householder allow actions to promote energy reductions without losing thermal comfort. Thus, householders can receive suggestions to increase comfort and save energy. Furthermore, to deeply understand thermal comfort and how it affects the environment and householder preferences, it is relevant to understand the type of user that is behind the interface, their preferences, and their location because their behavior will depend on other factors such as gender, age, country, culture, and fashion style, among others.

Conclusions
The results from the model comparison showed that the feature extraction architecture of the Tiny YOLO algorithm was on par with other image classifiers' architectures and can be used as a clothing ensemble classifier since it produced multiple clothing classifications, and it produced accuracy percentages over 90% in all three datasets, which was the objective for this project.
However, it failed to obtain better results in the independent images because the Fashion MNIST dataset had insufficient information to differentiate between the shirt class, the coat class, and the T-shirt/top class. The Tiny YOLO model only achieved 73% of correct classifications on that class, and of the remaining 27%, only 11% was misclassified as the T-shirt/top class. Thus, more information on these classes is needed since this was observed for all models, not only Tiny YOLO.
Hence, the Fashion MNIST dataset was not good enough for use as a clothing ensemble classifier since the models trained with it failed to produce more than two correct classifications in a single testing image or even obtain 90% accuracy in the independent image tests.
The YOLO model classified at least three clothing garments in real-time, but the Tiny YOLO model only produced one clothing classification 5% of the time. Hence, the YOLO model improves upon the state of the art because it outperformed the other models, giving up to four different clothing garment classifications, and consequently resulting in entire clothing ensemble recognition.
For use as a real-time clothing classifier, the YOLO algorithm is ideal as it produced results over 95% of the time in the real-time test with a threshold value of 0.5. This value was the highest obtained in the literature review for real-time implementations. Nevertheless, the Tiny YOLO model required more training examples and a greater variety of images to achieve similar results to the YOLO algorithm.
The results showed that the model trained with the new dataset was more effective and accurate than the one trained with an existing dataset. The images in this new dataset contained different postures to try to cover all possibilities, since the accuracy of the model is hindered when the person is not standing. Furthermore, pictures taken in darker environments need to be considered to avoid incorrect detection and classification.
In addition, transfer learning proved not to be ideal when the weights come from a model trained with a very specific dataset. Finally, the image classifier was used to estimate clothing insulation for the thermal comfort calculations. Currently, the clothing garments range from 0.04 clo to 0.74 clo (Table 3); however, these values can be increased. Moreover, an initial assumption for underwear should be made.
The clothing insulation values are provided throughout the entire video; therefore, any changes that occur in front of the camera can be captured and considered in the thermal comfort calculation. A gap remains, however: the system cannot recognize the underwear the user is wearing, nor any other clothing garment that the camera cannot see. This makes the readings inaccurate, but still better than a constant value.
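A minimal sketch of this per-frame estimate is given below. It assumes that the ensemble insulation is the simple sum of the insulation of the individual detected garments plus a fixed allowance for underwear. The garment labels, the clo values, and the underwear allowance are illustrative placeholders only; the actual values would be taken from Table 3.

```python
from typing import Dict, Iterable, Tuple

# Placeholder garment insulation values in clo (NOT the values of Table 3).
GARMENT_CLO: Dict[str, float] = {
    "t-shirt": 0.08,
    "trousers": 0.24,
    "sweater": 0.36,
}

# Assumed fixed allowance for underwear, which the camera cannot see.
UNDERWEAR_CLO = 0.04


def ensemble_clo(
    detections: Iterable[Tuple[str, float]],
    threshold: float = 0.5,
    underwear_clo: float = UNDERWEAR_CLO,
) -> float:
    """Estimate the clo value of one frame from (label, confidence) pairs.

    Detections below the threshold or with unknown labels are ignored,
    and each garment class is counted once per frame.
    """
    seen = {
        label
        for label, score in detections
        if score >= threshold and label in GARMENT_CLO
    }
    return underwear_clo + sum(GARMENT_CLO[label] for label in seen)


# Made-up detections for one frame: the sweater is below the threshold
# and the T-shirt is detected twice.
frame = [("t-shirt", 0.93), ("trousers", 0.81), ("sweater", 0.42), ("t-shirt", 0.60)]
clo = ensemble_clo(frame)
print(f"Estimated clothing insulation: {clo:.2f} clo")
```

Calling `ensemble_clo` on every frame yields the moving clo value; counting each garment class once per frame is a design choice that avoids inflating the estimate when the detector returns overlapping boxes for the same garment.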
Hence, the clo value can be calculated in real time, and these changing clothing insulation values can be used in a human-machine interface that proposes changes in clothing garments to keep the clothing insulation value constant and keep the user within the thermal comfort range. The proposed garments can be distributed evenly over the entire body so that the user does not feel warm or cold due to an unbalanced distribution of clothing insulation.
Batch normalization eliminated the need for the dropout technique: the Tiny YOLO architecture with dropout performed the same as the one without it, with no change in the accuracy scores on the training and validation datasets.