Accuracy Analysis of DNN-Based Pose-Categorization Model and Activity-Decision Algorithm

Abstract: The objective of this study is to develop (1) a pose-categorization model that classifies the poses of an occupant based on their image in an indoor space and (2) an activity-decision algorithm that identifies the activity being performed by the occupant. For developing an automated intelligent model, a deep neural network is adopted. The model considers the coordinates of the joints of the occupant in the image as input data and returns the pose of the occupant. Datasets composed of indoor images of home and office environments are used for training and testing the model. The training and testing accuracies of the optimized model were 100% for both the home and office environments. A representative activity of an occupant for a certain period has to be decided to control an indoor environment for comfort. The activity-decision algorithm employs a frequency-based method to determine the representative activity type for real-time occupant poses using the pose-categorization model. This study highlights the potential of the developed model and algorithm to determine the activity of occupants to provide an optimal thermal environment corresponding to the individual's metabolic rate.


Introduction
The quality of indoor environments has significant effects on the health and productivity of occupants [1,2], and the desire of occupants for a pleasant indoor environment is continuously increasing. Thus, improvement of indoor environment quality through management is gaining importance. The indoor environment quality is determined by factors such as thermal comfort, indoor air quality, acoustic quality, and light quality. Thermal comfort felt by the occupants considers factors such as the temperature, relative humidity, air velocity, and mean radiant temperature.
The predicted mean vote (PMV) proposed by Fanger [3] represents the human sensations of thermal comfort in an integrated way including both environmental and personal factors; it is one of the most widely known indices. The PMV sets the thermal neutrality to zero and presents the thermal comfort of occupants quantitatively on a seven-stage numerical scale ranging from −3 (cold) to 3 (hot). The satisfaction range of thermal comfort based on the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) standard 55 [4] is −0.5 < PMV < 0.5. The PMV considers six main factors that include both environmental and personal factors. The environmental factors are the air temperature, relative humidity, air velocity, and mean radiant temperature, and the personal factors are the metabolic rate and clothing insulation. Here, 1 met = 58 W/m² and 1 clo = 0.155 m²·K/W.
The metabolic rate is the rate of heat generation from the body surface of an individual due to activities maintaining heat balance. Figure 1 presents the effect of the metabolic rate on the PMV calculation by comparing the PMV for different activities. The environmental factors are set within the general range that satisfies thermal comfort: an air temperature of 25 °C, a relative humidity of 30%, and a mean radiant temperature of 25 °C. In ISO 7730 of the International Organization for Standardization and a number of studies [5–7], the relative air velocity was taken into account for accurate PMV calculations. Therefore, the assumed air velocity of 0.1 m/s is adjusted to the relative air velocity by referring to the equation of d'Ambrosio Alfano [6].
According to ISO 9920 [8] and Havenith et al. [9], clothing insulation is affected by the relative air velocity, for example, that caused by an occupant's movement or wind speed. However, Figure 1 is intended only to identify the impact of the metabolic rate on the PMV calculation. To clarify this effect, fixed values of clothing insulation were applied. Two values, 0.5 clo and 1.0 clo, were used to represent two different clothing conditions of the occupant.
In accordance with the ASHRAE 55 standard, four types of general activities were selected: "Sleeping" (0.7 met), "Seated.quiet" (1.0 met), "Standing.relaxed" (1.2 met), and "Walking about" (1.7 met). As the metabolic rate changed from 0.7 to 1.7 met at 0.5 clo and 1.0 clo, the maximum PMV difference was as large as 3.64 and 2.82, respectively. In addition, although the indoor environment factors such as temperature and humidity satisfied the thermal comfort conditions, the PMV could fall outside the comfort range (−0.5 < PMV < 0.5) depending on the metabolic rate. This indicates that the metabolic rate has a significant influence on the thermal comfort in an actual indoor environment and that an accurate method for measuring the metabolic rate is necessary.
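The sensitivity of the PMV to the metabolic rate can be explored with a short script. The following is a minimal sketch that follows the widely published reference calculation procedure associated with ISO 7730, not the authors' own implementation. It assumes 1 met = 58.15 W/m² (the text rounds this to 58) and, unlike the calculation behind Figure 1, it does not adjust the air velocity to the relative air velocity, so its values are not expected to reproduce the reported PMV differences. The function name `pmv` and its argument names are illustrative.

```python
import math


def pmv(ta, tr, vel, rh, met, clo, wme=0.0):
    """Predicted mean vote from the six PMV inputs.

    ta, tr: air and mean radiant temperature (deg C); vel: air velocity (m/s);
    rh: relative humidity (%); met: metabolic rate (met); clo: clothing (clo);
    wme: external work (met), usually zero.
    """
    pa = rh * 10.0 * math.exp(16.6536 - 4030.183 / (ta + 235.0))  # vapour pressure, Pa
    icl = 0.155 * clo              # clothing insulation, m2.K/W
    m = met * 58.15                # metabolic rate, W/m2 (assumed conversion)
    mw = m - wme * 58.15           # internal heat production, W/m2
    fcl = 1.0 + 1.29 * icl if icl <= 0.078 else 1.05 + 0.645 * icl
    hcf = 12.1 * math.sqrt(vel)    # forced-convection coefficient
    taa, tra = ta + 273.0, tr + 273.0

    # Iterate for the clothing surface temperature (stored divided by 100).
    p1 = icl * fcl
    p2, p3, p4 = p1 * 3.96, p1 * 100.0, p1 * taa
    p5 = 308.7 - 0.028 * mw + p2 * (tra / 100.0) ** 4
    tcla = taa + (35.5 - ta) / (3.5 * icl + 0.1)
    xn, xf = tcla / 100.0, tcla / 50.0
    hc = hcf
    for _ in range(150):
        if abs(xn - xf) <= 0.00015:
            break
        xf = (xf + xn) / 2.0
        hcn = 2.38 * abs(100.0 * xf - taa) ** 0.25  # natural convection
        hc = max(hcf, hcn)
        xn = (p5 + p4 * hc - p2 * xf ** 4) / (100.0 + p3 * hc)
    else:
        raise ArithmeticError("clothing temperature iteration did not converge")
    tcl = 100.0 * xn - 273.0

    # Heat-loss components, W/m2.
    hl1 = 3.05e-3 * (5733.0 - 6.99 * mw - pa)           # skin diffusion
    hl2 = 0.42 * (mw - 58.15) if mw > 58.15 else 0.0    # sweating
    hl3 = 1.7e-5 * m * (5867.0 - pa)                    # latent respiration
    hl4 = 0.0014 * m * (34.0 - ta)                      # dry respiration
    hl5 = 3.96 * fcl * (xn ** 4 - (tra / 100.0) ** 4)   # radiation
    hl6 = fcl * hc * (tcl - ta)                         # convection

    ts = 0.303 * math.exp(-0.036 * m) + 0.028           # sensation coefficient
    return ts * (mw - hl1 - hl2 - hl3 - hl4 - hl5 - hl6)


# Same environment as in the text, without the relative-velocity adjustment.
for activity_met in (0.7, 1.0, 1.2, 1.7):
    print(activity_met, round(pmv(25.0, 25.0, 0.1, 30.0, activity_met, 0.5), 2))
```

With the environment held fixed, the loop shows how the PMV moves as only the metabolic rate changes, which is the effect the paragraph above describes.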
Metabolic rate is measured based on various methods such as activity classification, heart rate, acceleration, and so forth. A number of studies have been conducted to measure human metabolic rates using an accelerometer [10–12]. Particularly, Kozey et al. [11] measured physical activities of 277 participants by an accelerometer according to various physical and gender conditions. However, metabolic rate calculation using an accelerometer is only possible when the occupant is moving; there are limitations in distinguishing sedentary activities such as writing and typing.
International standard ISO 8996 [13] classifies the measurement methods for the metabolic rate into four levels: screening (level 1), observation (level 2), analysis (level 3), and expertise (level 4). The accuracy is given as ±20% for level 2, ±10% for level 3, and ±5% for level 4. Higher levels such as analysis (level 3) and expertise (level 4) correspond to more accurate methods for measuring the metabolic rate. They use information directly measured from the body such as heart rate and oxygen consumption.
Studies have been conducted to develop heart-rate-based methods (level 3) for calculating the metabolic rate [14–16]. Hasan et al. [14] measured heart rate by using a wearable device and compared the measurements with the constant metabolic rate value. In the best case, 80% accuracy was achieved.
In a similar study, Lee et al. [15] compared the calculated metabolic rate based on the heart rate from a sensor that was obtained via a location-based method. Calvaresi et al. [10] calculated the metabolic rate by wearing a chest strap with a multi-parametric device that measures heart rate, breathing rate, vector magnitude units, and acceleration. In spite of the accuracy of the method using the heart rate, individual variations occurred as the intensity of the activity increased [16] and the accuracy of the calculated metabolic rate depended on emotional conditions and stress levels [17].
For more precise metabolic rate calculation, studies on the expertise (level 4) method were conducted using oxygen consumption and doubly labeled water. Usually, the occupant respiration was measured using a wearable device such as a face mask covering the mouth and nose to collect exhaled gas [18]. Ji et al. [19] measured the metabolic rate according to the CO₂ concentration changes of the subject in an airtight chamber and compared the accuracy with that of the heart-rate-based method (level 3). It was confirmed that the CO₂ production method provides easier and more accurate measurements of the metabolic rate than those provided by the heart-rate-based method.
To acquire occupant information for levels 3 and 4, it is essential to attach additional instruments to the human body, which can be inconvenient and cumbersome. For example, devices such as a smart wrist ring (level 3) and a mask (level 4) must always be attached to the body for collecting data. Data for calculating the metabolic rate can be missing owing to improper device usage such as detachment. In particular, level 4, which is the most accurate method, is more applicable to experimental settings than to actual building environments.
For applicability to actual buildings, the metabolic rate of occupants needs to be estimated without complex systems or devices. Observation (level 2) is a simpler method that employs the occupant activity for determining the metabolic rate [20]. Typically, a specific activity type in a space (e.g., "seated with typing" in the office) is determined by observation; then, the tabulated value for the activity is taken as the metabolic rate of the occupant. Despite its simplicity, the observation method has not been successfully applied to metabolic rate estimation. Because the observation relies on expert judgment, it is difficult to automate. Therefore, the activity in a space is assumed to be a specific behavior, and the metabolic rate is fixed at the value corresponding to the assumed activity. However, if an accurate and simple device could recognize the actual activity of the occupants, the observation method could be applied easily.
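The tabulated-value step of the observation method amounts to a lookup. The sketch below uses only the four activities and metabolic rates quoted earlier in this section and the conversion 1 met = 58 W/m² stated in the text; the names `MET_TABLE` and `metabolic_rate` are illustrative and not part of the study.

```python
# Metabolic rates (met) for the four activities quoted in the text.
MET_TABLE = {
    "Sleeping": 0.7,
    "Seated.quiet": 1.0,
    "Standing.relaxed": 1.2,
    "Walking about": 1.7,
}

W_PER_M2_PER_MET = 58.0  # conversion factor as stated in the text


def metabolic_rate(activity):
    """Return (met, W/m2) for an observed activity; raises KeyError if unknown."""
    met = MET_TABLE[activity]
    return met, met * W_PER_M2_PER_MET


for name in MET_TABLE:
    print(name, metabolic_rate(name))
```

The lookup itself is trivial; the difficulty the text points to lies in deciding which key to use without an expert observer.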
Recently, intelligent systems that recognize human faces or gestures by training on images have been developed [21], and the performance of classifiers adopting machine learning techniques such as deep learning has improved [22]. With an intelligent system that can automatically classify the actual occupant activity in real time, the observation method can be a practical and convenient solution. The intelligent system would provide the actual occupant activity, which would then be matched against the tabulated values to determine the metabolic rate.
The objective of the present study is to develop a method that can automatically estimate the actual activity of the occupant for application to the calculation of metabolic rate. For this, a pose-categorization model and an activity-decision algorithm are developed. The pose-categorization model employs a deep neural network (DNN) for classifying the occupant poses using indoor images, and an activity-decision algorithm is designed to determine the representative activity based on the categorized real-time poses. This intelligent and automated method for estimating occupant activity provides a simple and practical solution that can be a basis for determining the metabolic rate of the occupant.

Pose-Categorization Model
The pose-categorization process consists of two steps:
1. A model developed in a previous study [23] is used to produce the coordinates of the 14 major human joints from the occupant image.
2. The pose-categorization model developed in the present study classifies the poses using the coordinates from the preceding model.
The preceding model for extracting 14 human joints from images is presented in Section 2.1.1.

Preceding Model: Articulated Pose Estimation
A preceding model for estimating an articulated pose of a human was developed by Han [18] wherein the coordinates of the major body joints were determined. The model was trained with various human-activity images, and it output 14 major joints that included the forehead, neck, shoulders, elbows, wrists, hips, knees, and ankles.
The structure of the preceding model includes a residual block with a DNN structure for accuracy improvement. The process of estimating the articulated pose is shown in Figure 2. The input images pass through the image feature extraction stage. According to the extracted image features, the 14 pairs of absolute coordinate values for each joint are output through the 14 joint-estimating blocks. The output of the preceding model is used as input data for the pose-categorization model.
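The hand-off between the two models can be pictured as flattening 14 (x, y) pairs into a 28-value vector. The sketch below assumes a particular joint order and a dictionary interface; neither is specified in the text, and `JOINT_NAMES` and `joints_to_features` are hypothetical names used only for illustration.

```python
# The 14 joints named in the text; the left/right split and the order are assumed.
JOINT_NAMES = [
    "forehead", "neck",
    "left_shoulder", "right_shoulder",
    "left_elbow", "right_elbow",
    "left_wrist", "right_wrist",
    "left_hip", "right_hip",
    "left_knee", "right_knee",
    "left_ankle", "right_ankle",
]


def joints_to_features(joints):
    """Flatten {joint_name: (x, y)} into [x0, y0, x1, y1, ...] with 28 values."""
    missing = [name for name in JOINT_NAMES if name not in joints]
    if missing:
        raise ValueError(f"missing joints: {missing}")
    features = []
    for name in JOINT_NAMES:
        x, y = joints[name]
        features.extend([float(x), float(y)])
    return features


# Placeholder coordinates, only to show the shape of the data.
example = {name: (10 * i, 5 * i) for i, name in enumerate(JOINT_NAMES)}
print(len(joints_to_features(example)))
```

Fixing the joint order matters because the downstream classifier sees only positions in the vector, not joint names.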
Energies 2020, 13, 839


Development of Pose-Categorization Model
The pose-categorization model developed in this study classifies poses based on the coordinates of joints of the occupant in the image. For an automated and intelligent system, the DNN, which is widely used in various building fields, is applied [24–26]. The types of poses are selected based on indoor activities that mainly occur at home and in the office according to the ASHRAE 55 standard, Table 5.2.1.2, Metabolic Rates for Typical Tasks [4]. Four home activities are selected: "Sleeping", "Reclining", "Seated.quiet", and "Standing.relaxed". Because of the sedentary environment, the office activities include two sitting activities, "Seated.quiet" and "Typing", and one standing activity, "Standing.relaxed". Images corresponding to home and office activities were collected through direct shooting or from the internet. Thirty images of various subjects were collected for each activity. The images are normalized to 128 × 128 pixels and augmented by reversing and rotating them, which increases the number of images by a factor of 112. Thus, the total number of images is 16,800 (5 activities × 30 × 112). Figure 3 shows sample images for each activity from the collected dataset.
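The dataset size can be checked with a few lines of arithmetic. The sketch assumes that the five activities in the total are the distinct activities across the home and office sets, with the two shared activities counted once; the text does not state this explicitly, but the reading reproduces the reported 16,800 images.

```python
HOME = ["Sleeping", "Reclining", "Seated.quiet", "Standing.relaxed"]
OFFICE = ["Seated.quiet", "Typing", "Standing.relaxed"]

# Assumption: activities shared by both environments are counted once.
distinct_activities = sorted(set(HOME) | set(OFFICE))

IMAGES_PER_ACTIVITY = 30      # collected images per activity (from the text)
AUGMENTATION_FACTOR = 112     # reversing and rotating (from the text)

per_activity = IMAGES_PER_ACTIVITY * AUGMENTATION_FACTOR
total = len(distinct_activities) * per_activity

print(distinct_activities)
print(per_activity, total)
```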

The structure of the pose-categorization model and the applied parameters are as follows. The model adopts an optimized DNN structure that exhibits an optimal performance for classifying 10 classes [27]. As shown in Figure 4, the structure comprises one input layer, four hidden layers, and one output layer. Because the 28 values from the 14 pairs of joint coordinates (x, y) are used as input data, the number of input neurons is 28, and the numbers of hidden neurons are 140-112-84-56. For the training, the cross entropy and the Adam optimizer are used as the error and optimization functions, respectively. The rectified linear unit (ReLU), batch normalization, and the drop-out method are also employed.
The real-time operation of the pose-categorization model in the room is as follows. A single occupant image of the room is collected using a camera sensor. The camera sensor is combined with a small hardware instrument (Raspberry Pi), and the images of the occupant are used only as input data of the model in a closed loop to prevent information leakage and to avoid privacy issues. The joint coordinates of the occupant in the collected images are extracted using the preceding model. Then, the pose-categorization model uses the joint coordinates as input data to classify the poses in real time.
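The network structure described above (28 inputs, hidden layers of 140-112-84-56 neurons, ReLU activations, and a cross-entropy error) can be sketched as a plain forward pass. This is an illustrative sketch only: the weights are random rather than trained, batch normalization and drop-out are omitted for brevity, the softmax output layer is an assumption that commonly accompanies cross entropy, and all function names are hypothetical.

```python
import math
import random

LAYER_SIZES = [28, 140, 112, 84, 56]  # input layer and four hidden layers (from the text)


def init_params(n_classes, seed=0):
    """Random (untrained) weights and zero biases for every layer."""
    rng = random.Random(seed)
    sizes = LAYER_SIZES + [n_classes]
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = math.sqrt(2.0 / n_in)  # He-style scaling, an assumption
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        params.append((weights, [0.0] * n_out))
    return params


def dense(x, weights, biases):
    """Affine layer: one weighted sum per output neuron."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(z):
    """Numerically stable softmax."""
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, params):
    """ReLU on the hidden layers, softmax on the output layer."""
    h = x
    for weights, biases in params[:-1]:
        h = [max(0.0, v) for v in dense(h, weights, biases)]
    weights, biases = params[-1]
    return softmax(dense(h, weights, biases))


def cross_entropy(probs, label):
    """Cross-entropy error for one sample with an integer class label."""
    return -math.log(max(probs[label], 1e-12))


# Four output classes, as for the home activities; inputs are placeholders.
params = init_params(n_classes=4)
rng = random.Random(1)
x = [rng.random() for _ in range(28)]
probs = forward(x, params)
print(probs, cross_entropy(probs, label=2))
```

In practice a deep learning framework would supply the Adam optimizer, batch normalization, and drop-out; the pure-Python form is used here only to make the layer sizes concrete.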
The pose-categorization model is trained and tested for home and office activities. A portion of the training dataset, which consists of the collected images, is set aside as a valid dataset for verification. The valid dataset constitutes 20% of the training dataset; thus, the numbers of images in the training and valid datasets for each activity are 2688 and 672, respectively. To prevent data-sampling bias, five different valid datasets are used, one for each training, as shown in Figure 5. If the five training results are similar, the data are not biased, thereby ensuring the reliability of the training.
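One way to obtain five different valid datasets of 20% each is a five-fold split. The sketch below assumes that the five valid datasets are disjoint folds, which is a common reading of such a scheme but is not stated in the text; `five_fold_splits` is a hypothetical helper.

```python
import random


def five_fold_splits(n_samples, n_folds=5, seed=0):
    """Return n_folds (train_indices, valid_indices) pairs with disjoint valid folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    fold_size = n_samples // n_folds
    splits = []
    for k in range(n_folds):
        valid = indices[k * fold_size:(k + 1) * fold_size]
        train = indices[:k * fold_size] + indices[(k + 1) * fold_size:]
        splits.append((train, valid))
    return splits


# 30 images x 112 augmentations per activity, as given in the text.
splits = five_fold_splits(30 * 112)
for train, valid in splits:
    print(len(train), len(valid))
```

Each split leaves 2688 training and 672 valid images per activity, matching the counts in the text.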

The accuracies for the training and valid datasets are analyzed in Section 3.1. Additionally, a real-time test dataset that is not used in the model training is collected: images are captured every 5 s for 1 h, giving up to 720 images of each activity. The real-time test dataset is analyzed using the confusion matrix, F1 score, and receiver operating characteristic (ROC) curve.
The subjects in the images collected for the dataset consist of both males and females with an age range of 22-26 years (Table 1). The images of each subject are collected for 10 min for each activity.

Activity-Decision Algorithm
The purpose of this study includes providing a solution for applying the determined activity to the metabolic rate. Because the pose-categorization model classifies the activity at a specific moment from a single image, a single categorized pose may not represent the actual activity. Thus, an activity-decision algorithm that can determine the actual activity from the pose results accumulated over a certain period is developed.
The activity-decision algorithm is designed to determine the representative activity from the data accumulated for a certain period of time by the pose-categorization model. Two methods are used to identify the representative activity: (1) the frequency method and (2) the average method. For the real-time dataset, images are collected once every 5 s for 1 h. This dataset is used for evaluating the algorithm with each of the two methods. As shown in Figure 6, the representative value of the poses accumulated over 1 min is calculated using the average and frequency values, and the results are compared to determine which method is more appropriate for the algorithm. For the comparison, the activities derived from the pose-categorization model are replaced with the corresponding metabolic-rate values of each activity from Table 5.2.1.2 of ASHRAE 55 (2017). The method that more closely matches the metabolic rate of the actual activity is selected.
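The two candidate methods can be sketched as follows, assuming one pose every 5 s so that a 1-min window holds 12 poses. The metabolic rates for "Reclining" (0.8 met) and "Typing" (1.1 met) are taken from the ASHRAE 55 table rather than from the text, and the function names are illustrative rather than the authors' implementation.

```python
from collections import Counter
from statistics import mean

# Metabolic rates in met; "Reclining" and "Typing" come from ASHRAE 55, not the text.
MET = {
    "Sleeping": 0.7,
    "Reclining": 0.8,
    "Seated.quiet": 1.0,
    "Typing": 1.1,
    "Standing.relaxed": 1.2,
}

POSES_PER_WINDOW = 12  # one pose every 5 s over a 1-min window


def windows(poses, size=POSES_PER_WINDOW):
    """Split a pose sequence into consecutive full windows."""
    return [poses[i:i + size] for i in range(0, len(poses) - size + 1, size)]


def frequency_method(window):
    """Metabolic rate of the most frequent pose (ties go to the pose seen first)."""
    pose, _count = Counter(window).most_common(1)[0]
    return MET[pose]


def average_method(window):
    """Mean of the per-pose metabolic rates in the window."""
    return mean(MET[pose] for pose in window)


# Hypothetical minute: the occupant is typing, but 3 of 12 frames are misclassified.
minute = ["Typing"] * 9 + ["Seated.quiet"] * 3
print(frequency_method(minute), average_method(minute))
```

In this hypothetical minute the frequency method returns a tabulated value, whereas the average method returns a value between two table entries, which illustrates how misclassified frames affect the two methods differently.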

Accuracy of Pose-Categorization Model
The performance of the pose-categorization model was analyzed according to the results for the training dataset and the real-time test dataset. The cost, F1 score, and ROC curve were used for the analysis.
Figure 7 shows the training-cost results of the model with five different valid datasets. Figure 7a,b correspond to the models trained for home and office activities, respectively. There was a difference in the speed at which the training converged, but after approximately 2000 steps, all five training cases indicated a cost close to zero. This suggests that all the models were trained well, without overfitting. On the basis of the training results, it was necessary to verify the performance of the model using the new real-time images. To evaluate the performance of the model for the real-time dataset, the F1 score and ROC curve, which are commonly used to evaluate classification models, were employed.
Tables 2 and 3 present the confusion matrices and the F1 scores of the results for the home- and office-activity datasets, respectively. As shown in the confusion matrices of Tables 2a and 3a, the accuracies of the model for home and office activities were 98.9% and 88.2%, respectively. In home activities, although errors occurred for "Standing.relaxed", other activities tended to be classified correctly. However, there were classification errors for the "Standing.relaxed" and "Typing" office activities; in particular, "Typing" was misclassified as "Seated.quiet", as both featured the sitting posture.
The F1 score is used for the comprehensive evaluation of precision and recall. A higher F1 score indicates that the precision and recall are similar and high. The average F1 scores for the home and office activities were 0.99 and 0.89 (close to 1.0), respectively; this is shown in Tables 2b and 3b, respectively. Both the accuracy and F1 score were higher for the home activities than for the office activities. For the office activities, the average F1 score was 0.89 and the lowest F1 score was 0.83 (for "Typing"). The low F1 score for "Typing" resulted from its relatively low recall, that is, the fraction of actual "Typing" samples that were classified correctly. This is because many "Typing" samples were classified as "Seated.quiet". "Typing" also involves a seated pose, and errors occurred because of the inability to recognize objects such as computers. Therefore, if objects can be distinguished, the classification accuracy of the model can be improved.
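The relation between a confusion matrix and the per-class precision, recall, and F1 score can be sketched as follows. The matrix below contains hypothetical counts chosen only to illustrate a "Typing" class with low recall; it is not the data of Tables 2 and 3. Rows are assumed to be actual classes and columns predicted classes.

```python
def accuracy(cm):
    """Share of samples on the diagonal of the confusion matrix."""
    return sum(cm[k][k] for k in range(len(cm))) / sum(sum(row) for row in cm)


def per_class_scores(cm):
    """Return a (precision, recall, f1) tuple for each class."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # actual k, predicted otherwise
        fp = sum(cm[r][k] for r in range(n)) - tp   # predicted k, actually otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores.append((precision, recall, f1))
    return scores


LABELS = ["Seated.quiet", "Typing", "Standing.relaxed"]
# Hypothetical counts: rows = actual, columns = predicted.
cm = [
    [700, 20, 0],
    [150, 570, 0],
    [10, 0, 710],
]

print("accuracy", accuracy(cm))
for label, (p, r, f1) in zip(LABELS, per_class_scores(cm)):
    print(label, round(p, 3), round(r, 3), round(f1, 3))
```

In this made-up example the "Typing" row loses samples to "Seated.quiet", so its recall, and with it its F1 score, drops while its precision stays high, which is the pattern the paragraph above describes.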
The ROC curve plots the true-positive rate (TPR) against the false-positive rate (FPR), which are also known as the recall and 1 − specificity, respectively. A random classifier produces the line y = x as its ROC curve, and better classification performance corresponds to a curve that tends toward the top left. The performance is summarized by the area under the curve (AUC); an AUC close to 1.0 indicates high performance. Figure 8 shows the ROC curves for each home and office activity. The performance of binary classification was verified for each activity. As shown in Figure 8a, the AUC for each of the four home activities was close to 1.00, which is consistent with the high accuracy and F1 score. For office activities (Figure 8b), the smallest AUC was observed for "Typing" (0.86), and the AUCs for the other two activities were greater than 0.9.
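As an illustration of how a per-activity ROC curve and its AUC can be obtained, the sketch below treats one activity as the positive class and all others as negative, sweeps the decision threshold over the classifier scores, and integrates the curve with the trapezoidal rule. The function names and the score values are hypothetical; the source does not state how its curves were computed.

```python
def roc_points(scores, is_positive):
    """Return the ROC curve as a list of (FPR, TPR) points.

    scores      -- classifier score for the positive class, one per sample
    is_positive -- 1 if the sample truly belongs to the positive class, else 0
    """
    n_pos = sum(is_positive)
    n_neg = len(is_positive) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")

    ranked = sorted(zip(scores, is_positive), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with identical scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores for a "Typing vs. rest" classifier (illustration only).
typing_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
is_typing = [1, 1, 0, 1, 0, 0]

typing_auc = auc(roc_points(typing_scores, is_typing))
print(f"AUC = {typing_auc:.3f}")
```

In this toy example, one negative sample is ranked above one positive sample, so 8 of the 9 positive-negative pairs are ordered correctly and the AUC is 8/9. A classifier that cannot separate the classes at all (identical scores for every sample) gives the diagonal and an AUC of 0.5.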
The F1 score and ROC curves indicated similar performance results for all the activities. Accordingly, the activity-decision algorithm using the real-time data was further developed and evaluated.

Performance of Activity-Decision Algorithm
The activity-decision algorithm determines a representative activity for a certain period of time. The period was set to 1 min in this study. The representative activity is determined using either the average method or the frequency method, whichever performs better. To compare the two methods, the activities classified by the model were replaced with the corresponding metabolic rates of the ASHRAE standard.
To identify the better method for determining the representative activity, the metabolic rates obtained every minute using the two methods (frequency and average) were compared for 1 h. Figures 9 and 10 show the errors of the two methods and the pose-categorization model for home and office activities, respectively. Here, the values of the pose-categorization model represent the results of the model without manipulation such as value-averaging or representative-value selection. Table 4 compares the accuracies of the average and frequency methods.
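The two candidate methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the met values are those commonly tabulated in ASHRAE Standard 55 for these activities, the function names are hypothetical, the average method is assumed to return the mean met value of the window, and the 12 frames per 1-min window are an assumed sampling rate.

```python
from collections import Counter
from statistics import mean

# Metabolic rates in met, assumed from the ASHRAE Standard 55 activity table.
MET = {
    "Sleeping": 0.7,
    "Reclining": 0.8,
    "Seated.quiet": 1.0,
    "Typing": 1.1,
    "Standing.relaxed": 1.2,
}


def frequency_method(poses):
    """Representative met = met of the most frequent pose in the window."""
    most_common_pose, _count = Counter(poses).most_common(1)[0]
    return MET[most_common_pose]


def average_method(poses):
    """Representative met = mean of the per-frame met values in the window."""
    return mean(MET[pose] for pose in poses)


# Hypothetical 1-min window: the occupant is typing, but the model
# mislabels 3 of the 12 frames as "Seated.quiet".
window = ["Typing"] * 9 + ["Seated.quiet"] * 3

true_met = MET["Typing"]
freq_met = frequency_method(window)
avg_met = average_method(window)

print(f"frequency: {freq_met:.3f} met, error {abs(freq_met - true_met):.3f}")
print(f"average:   {avg_met:.3f} met, error {abs(avg_met - true_met):.3f}")
```

In this example, the frequency method discards the minority of misclassified frames and returns 1.1 met, whereas the average method is pulled toward the misclassified value and returns 1.075 met. The frequency method fails only when the wrong pose becomes the most frequent one within the window.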
As shown in Figure 9, for the home activities "Sleeping" and "Seated.quiet," both the average and frequency methods had zero error because the pose-categorization model classified these activities perfectly, as shown in Table 2a. For "Reclining" and "Standing.relaxed," where metabolic-rate errors occurred in the pose-categorization model, the accuracy of the frequency method was higher than that of the average method. The average accuracy of the estimated metabolic rate for home activities (Table 4a) was 99.58% for the frequency method, which was higher than that for the average method.
The results for office activities are presented in Figure 10 and Table 4b. Because the accuracy of the pose-categorization model for the "Seated.quiet" activity was 100% (Table 2b), there were no errors for either method. The errors caused by the pose-categorization model for "Standing.relaxed" and "Typing" (Figure 10) were reduced by the frequency method. The average accuracy of the frequency method for all the office activities was 88.89%, which is 7% higher than that of the average method.
Based on these results, the frequency method was selected for the activity-decision algorithm to determine the representative activity from the real-time data during each period. The errors ultimately depend on the performance of the pose-categorization model: if an error occurs in the model, the probability of an error occurring in the activity-decision algorithm increases. Therefore, the performance of the pose-categorization model must be enhanced to improve the accuracy of the activity-decision algorithm.
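The dependence of the algorithm on the per-frame accuracy of the model can be illustrated with a simple calculation. The sketch below is an idealized model, not a result of the study: it assumes that the frames in a window are classified independently, that only two labels compete (the correct one and one wrong one), and that a window holds 12 frames; the function name is hypothetical.

```python
from math import comb


def majority_correct_probability(p_frame, n_frames):
    """Probability that strictly more than half of the frames are correct.

    Assumes independent frames, each classified correctly with
    probability p_frame, and a single competing wrong label.
    """
    k_min = n_frames // 2 + 1
    return sum(
        comb(n_frames, k) * p_frame**k * (1 - p_frame) ** (n_frames - k)
        for k in range(k_min, n_frames + 1)
    )


# Assumed per-frame accuracies and window size, for illustration only.
for p_frame in (0.60, 0.75, 0.90):
    p_window = majority_correct_probability(p_frame, 12)
    print(f"per-frame accuracy {p_frame:.2f} -> window accuracy {p_window:.4f}")
```

Under these assumptions, the window-level accuracy rises and falls with the per-frame accuracy, and it drops below the per-frame accuracy once the model is no better than chance. Real consecutive frames are correlated, so the gain from the frequency method is expected to be smaller than this idealized calculation suggests.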

Conclusions
An intelligent system that can automatically classify the activity of an occupant in real time was developed. The system includes a pose-categorization model for classifying occupant poses and an activity-decision algorithm for determining the representative activity based on the categorized real-time poses. In contrast to conventional methods that rely on complicated systems or devices, this is a simple and convenient system that can be applied to an actual building. The conclusions of this study are as follows:
1. The pose-categorization model was trained with indoor images of home and office environments. The optimized structure of the DNN comprised one input layer, four hidden layers, and one output layer. The trained pose-categorization model had 100% accuracy for the training and validation datasets.
2. A real-time dataset consisting of 720 images for each activity was used for testing the pose-categorization model. For home and office activities, respectively, the pose-categorization model exhibited classification accuracies of 98.9% and 88.2% and average F1 scores of 0.99 and 0.89. The average AUC of the ROC curve was close to 1 for both environments.
3. The activity-decision algorithm is designed to determine the representative activity for each 1-min period. The accuracy of the representative-activity decision was compared between the frequency and average methods, based on the real-time poses output by the pose-categorization model. The frequency method determined the representative activity more accurately than the average method, by 4.58% for home and 7.22% for office activities, and was therefore adopted for the activity-decision algorithm.
Thus, the developed model and algorithm confirmed that it is possible to identify the activities of occupants using indoor images. In addition, the activity-decision algorithm is expected to yield more accurate metabolic rates by considering the actual activity of indoor occupants, compared with a constant metabolic rate obtained by assuming a specific activity.
The developed intelligent model does not require the direct intervention of the occupants, and the experiments confirmed that the real-time metabolic rate can be measured automatically when the model is applied to an actual building. In an actual environment, it is possible to develop an adaptive model customized to the occupant by continuously training on the occupant's data. In addition, if a model for measuring clothing insulation is developed in a further study, it can be combined with the model and algorithm developed in this study to calculate an accurate PMV.
To increase the accuracy of the activity estimation, the accuracy of the pose-categorization model must be enhanced. Although the pose-categorization model and the activity-decision algorithm cover some of the major indoor activities that occur in home and office environments, it is necessary to expand the scope to various activities and multiple occupants. Additionally, there were errors due to the absence of object detection in the images (e.g., "Typing" was classified as "Seated.quiet"). Therefore, to expand the range of activities and to classify various indoor poses, an indoor-object-detection model must be developed. In the future, to determine individual metabolic rates according to the activity, models for estimating other personal factors such as age, sex, and body mass index must be developed. Furthermore, studies must be conducted to compare the performance of the proposed approach with that of heart-rate-based and oxygen-consumption-based methods.

Figure 1. Effects of the metabolic rate on the thermal comfort.

Figure 2. Structure of the preceding model.

Figure 3. Image samples from the collected dataset.

Figure 4. Structure of the pose-categorization model.

Figure 5. Five sets of valid data in the training dataset.

Figure 6. Methods for the development of the activity-decision algorithm.

Figure 7. Training results of valid dataset.

Figure 9. Errors of the two methods and the model in the case of home activities.

Figure 10. Errors of the two methods and the model in the case of office activities.

Table 1. Details regarding the subjects.

Table 2. Results for the real-time home-activity dataset. (a) Confusion matrix.

Table 3. Results for the real-time office-activity dataset.

Table 4. Accuracy of the representative metabolic rate.