Article

Investigating the Effects of Parameter Tuning on Machine Learning for Occupant Behavior Analysis in Japanese Residential Buildings

1 Yamashita Sekkei, Inc., Tokyo 103-8542, Japan
2 Faculty of Engineering, Department of Architecture, Shinshu University, Nagano 390-8621, Japan
* Author to whom correspondence should be addressed.
Buildings 2023, 13(7), 1879; https://doi.org/10.3390/buildings13071879
Submission received: 16 May 2023 / Revised: 8 July 2023 / Accepted: 21 July 2023 / Published: 24 July 2023
(This article belongs to the Special Issue Synergy between Mitigation and Adaptation in Buildings)

Abstract

Global warming is progressing worldwide, and it is important to control greenhouse gas emissions from the perspectives of both adaptation and mitigation. Occupant behavior is highly individualized and must be analyzed to accurately determine a building’s energy consumption. However, most of the occupant behavior models in existing studies are based on statistical methods, and the effect of parameter tuning on prediction accuracy has not been examined. In this study, the accuracy of heating behavior prediction was examined using three different methods: logistic regression, support vector machine (SVM), and deep neural network (DNN). The generalization ability of the SVM and the DNN was improved by parameter tuning. The parameter tuning of the SVM showed that the values of C and gamma affected the prediction accuracy. The prediction accuracy improved by approximately 11.9%, confirming the effectiveness of parameter tuning on the SVM. The parameter tuning of the DNN showed that the numbers of layers and neurons affected the prediction accuracy. Although parameter tuning also improved the prediction accuracy of the DNN, the rate of increase was lower than that of the SVM.

1. Introduction

The building sector is currently responsible for more than 40% of global energy consumption and more than 30% of greenhouse gas (GHG) emissions [1]. The Intergovernmental Panel on Climate Change (IPCC)’s Sixth Assessment Report [2] states that anthropogenic GHG emissions increased approximately 1.6 times between 1990 and 2019, and the global average temperature increased by approximately 1.1 °C between 1850 and 2020. Achieving carbon neutrality is necessary to achieve a sustainable society, and it is imperative to address mitigation measures to reduce GHG emissions and adaptation measures to limit the damage that cannot be avoided by implementing mitigation measures alone [3].
Mitigation measures in the building sector include the use of more efficient buildings and renewable energy. To implement mitigation measures, understanding the indoor thermal environment and energy use by combining empirical measurements and computational simulations holds significant importance. However, the predictions obtained from current simulations differ from actual measurements [4,5]. One reason for the difference between the predicted and measured values is that the simulations do not reflect occupants’ behaviors in adapting to the indoor environment. Occupant behaviors include opening and closing windows, using heating and cooling equipment, and adjusting clothing. It is known that occupants’ behaviors change with thermal stress and can be considered in the simulations as a schedule model. For this purpose, it is necessary to develop a predictive model for occupant behavior based on observable variables. The analysis of occupant behavior is an adaptation measure to climate change.
In recent years, advances in hardware and information technology (IT) have brought machine learning to the forefront of data prediction. Machine learning is a technique for developing algorithms that make predictions by having the machine read data and learn iteratively to find hidden patterns [6]. Machine learning has been applied in various fields, including medicine, finance, agriculture, and commerce. It has also been used in the building sector, mainly to predict the energy consumption of buildings. However, few studies have used machine learning to predict occupant behavior. Machine learning techniques are expected to improve prediction accuracy in the area of occupant behavior as well.
The effect of occupant behavior on energy consumption has been studied in a variety of buildings. Nicol and Humphreys [7] studied occupant behavior in Pakistan and across Europe, including the United Kingdom, and presented a probabilistic approach to thermal comfort. Clevenger et al. [8] showed that occupant behavior affects annual energy consumption in residential and commercial buildings. Ioannou and Itard [9] showed that occupant behavior factors have a greater impact on heating energy consumption than building factors. Sun and Hong [10] demonstrated that five occupant behavior measures, covering lighting, plug load, comfort, HVAC control, and window control, can reduce energy consumption by up to 22.9% individually and 41.0% in combination. Zhuang et al. [11] presented a data-driven predictive control method with time-series forecasting (TSF) and reinforcement learning (RL) to examine various sensor metadata for HVAC system optimization. The optimal TSF models were integrated with a Soft Actor-Critic RL agent to analyze sensor metadata and optimize HVAC operations, achieving 17.4% energy savings and 16.9% thermal comfort improvement in the surrogate environment. Zou et al. [12] presented WinLight, a novel occupancy-driven lighting control system that aims to reduce energy consumption while preserving the lighting comfort of occupants. The experimental results demonstrated that WinLight achieved 93.09% and 80.27% energy savings compared to static-scheduling lighting control schemes and PIR sensor-based lighting control schemes, respectively, while guaranteeing the personalized lighting comfort of each occupant. Duygu Tekler et al. [13] proposed Plug-Mate, a novel IoT-based occupancy-driven plug-load management system that reduces plug-load energy consumption and user burden through intelligent plug-load automation. Duygu Tekler et al. [14] proposed a hybrid active learning framework to reduce data collection costs for developing data-efficient and robust personal comfort models that can predict users’ thermal comfort and air-movement preferences. Andre et al. [15] reviewed recent publications on personal comfort systems (PCS) to understand what has not yet been discussed on this topic. Wang and Greenberg [16] studied occupants’ window-opening and -closing behavior and showed that in summer months in different climates, mixed-mode ventilation can reduce heating and cooling energy consumption by 17–47%. Thus, there is a relationship between occupant behavior and energy consumption, and improving the accuracy of occupant-behavior prediction models is important for controlling building energy consumption.
In recent years, machine learning has been used to predict the energy consumption of buildings. Wei et al. [17] used a variety of machine learning techniques, such as artificial neural networks (ANNs) and support vector machines (SVMs), to predict energy consumption. Robinson et al. [18] used machine learning models, such as SVM and random forest (RF), as well as statistical methods such as linear regression, to estimate energy consumption in commercial buildings. Machine learning models are expected to be highly accurate in predicting occupant behavior and energy consumption.
Predictive models for occupant behavior have been studied using statistical methods in a variety of buildings, including residential buildings, office buildings, and schools. Rijal et al. [19] conducted a study on window opening and closing in Japanese houses and condominiums and presented a probability model for window opening and closing using logistic regression. Shi et al. [20,21] determined the probabilities of opening windows in apartments in Beijing and Nanjing using multivariate logistic regression and showed that air pollution is a factor that influences the window-opening and -closing behavior of residents. Jeong et al. [22] analyzed the relationship between occupant behavior and window control in an apartment complex and found clear differences in window control behavior compared to office building occupants. Jones et al. [23] conducted a study in a United Kingdom residence and used multivariate logistic regression to build a probability model for window opening and closing in a master bedroom. Fabi et al. [24] used a Bernoulli distribution to study the predictive accuracy of window-opening and -closing models in Japanese, Swiss, and Danish homes. Lai et al. [25] measured the thermal environment in 58 apartments in five different climates in China and built a prediction model for occupants’ opening and closing of windows. Zhang et al. [26] studied the window-opening and -closing behavior of occupants in a United Kingdom office building and presented a probability model for window opening and closing using regression and probit analysis. Rijal et al. [27] developed a probabilistic model of the window-opening and -closing behavior of occupants in a United Kingdom office building and applied it to a building simulation plan model. Herkel et al. [28] conducted a field study of manual window controls in 21 individual offices located in a German institute and presented a schedule model for window controls. Yun et al. [29] used a probabilistic model to show differences in occupant behavior in private offices with and without night ventilation during the summer months. Haldi et al. [30] performed a logistic regression analysis in a Swiss office building during the summer to predict the probability of behavioral adaptations of both personal characteristics, such as clothing insulation and metabolic rate, and environmental controls, such as windows, doors, and fans. Belafi et al. [31] studied the window-opening and -closing behavior in a Hungarian school and used regression analysis to build a model for the window-opening and -closing behavior in classrooms. Research on predictive models for occupant behavior has thus been conducted worldwide. However, the predictive models in previous studies have mostly been based on statistical methods, and studies using machine learning have been limited. The authors of [32] conducted a study using machine learning to analyze the natural ventilation behavior of occupants in houses during summer. Ten machine learning models were compared to identify the most suitable model for predicting occupant behavior. They also examined the features that influence natural ventilation behavior from multiple perspectives. The analysis found that logistic regression, support vector machine, and deep neural network were the three most suitable algorithms for predicting occupant behavior. However, the models in that study were not parameter-tuned, and further improvements in prediction accuracy are expected.
The purpose of this study is to improve the prediction accuracy of occupant behavior in a residential building in Gifu City. In this study, machine learning is used to analyze the accuracy of resident behavior predictions. The following are the objectives of this study:
(1) Analyzing the factors affecting heating use behavior using the features obtained from the thermal environment survey.
(2) Performing parameter tuning in the machine learning model to study the accuracy.
This study will address the aforementioned issues through the following research flow. In Section 1, we present the background of this study, existing research and issues, and the purpose of this study. The effectiveness of machine learning compared to statistical methods as a method for improving the prediction accuracy of resident behavior is described. Section 2 describes the basic theory of machine learning. Section 3 describes the methodology of this study. Section 4 describes the results and discussion of the analysis, where Section 4.1 describes the basic tabulation of the survey results, Section 4.2 analyzes machine learning in the initial conditions, Section 4.3 analyzes machine learning in feature selection, Section 4.4 analyzes parameter tuning in SVM, Section 4.5 analyzes parameter tuning in DNN, and Section 4.6 analyzes changes in resident behavior over time. Section 5 presents the conclusions.

2. Theory

2.1. Overview of Machine Learning

Machine learning is a method of analyzing data in which a machine automatically learns from the data to discover the rules and patterns behind it [6]. Conventionally, machine learning is described as automatic learning by a machine based on data, whereas statistics is described as the probabilistic determination of rules and patterns in data. However, it has become difficult to draw a clear dividing line today, as the use of computers has become commonplace, including in the world of statistics. Statistics and machine learning are the same in that both find rules and patterns in data and build models; the difference lies not in the method of data analysis but in the purpose. In statistics, the focus is on whether the rules behind the data can be better explained. In machine learning, the focus is on whether the data can be better predicted. In statistics, most models consist of explanatory variables that can be intuitively understood to some degree. In machine learning, however, explanatory variables that cannot be intuitively understood are also taken into account, so higher accuracy can be expected.

2.2. Overview of Cross-Validation

Cross-validation is a method used in machine learning to prevent overfitting and improve generalization ability [33]. Overfitting occurs when the learning process adapts to the training data to an excessive degree, resulting in poor estimation performance for unknown data. Generalization ability refers to the estimation performance for unknown data. Cross-validation is performed to prevent models from being built with high predictive accuracy for learned data but low predictive accuracy for unknown data.
Cross-validation is usually performed as K-fold cross-validation, in which the sample group is divided into K folds. One of the K folds is used as test data and the remaining K − 1 folds are used as training data to determine the prediction accuracy. Training is repeated K times so that each fold serves as the test data exactly once, and the average of the K prediction accuracies is used as the overall estimate. An example with K = 5 is shown in Figure 1.
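The K-fold procedure can be illustrated with a minimal sketch in plain Python. The function names `k_fold_indices` and `cross_validate`, and the callback `score_fn`, are hypothetical names introduced only for this illustration; in practice `score_fn` would train a model on the training indices and return its accuracy on the test indices.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(n_samples, k, score_fn):
    """Average score_fn(train_idx, test_idx) over the k train/test splits."""
    folds = k_fold_indices(n_samples, k)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(score_fn(train_idx, test_idx))
    return sum(scores) / k


# With K = 5 and 10 samples, each fold holds 2 samples.
print(k_fold_indices(10, 5))
```

Real data are usually shuffled before splitting; the contiguous split here is only meant to make the fold structure easy to see.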

2.3. Feature Selection Overview

In machine learning, it is important to have a large number of features. However, interactions among the features may lead to poor prediction accuracy. Therefore, feature selection is important to improve prediction accuracy. Feature selection shortens the training time, simplifies model interpretation, improves model accuracy, and reduces overfitting [34].
Feature selection methods can be broadly classified into three categories: the Filter Method, Wrapper Method, and Embedded Method. The Filter Method does not use a machine learning model; features are evaluated using only the dataset itself. This has the advantage of lower computational cost because it relies only on the properties of the data. However, the disadvantage is that only one feature is considered at a time, so the interaction of multiple features is not taken into account. The Wrapper Method uses machine learning models to evaluate feature combinations. It can find relationships between features that are not found using the Filter Method and find the optimal combination of features for each model. The Embedded Method performs feature selection simultaneously with the training of a machine learning model. Typical examples are lasso and ridge regressions. The Wrapper Method is widely used in machine learning for feature selection. The two primary Wrapper Methods are Forward Selection and Backward Elimination. Forward Selection is a method in which all features are first removed from the training data, and then the features are added one at a time. Features are added starting with the one that gives the greatest improvement in accuracy, and the process is repeated until there is no further improvement in accuracy. Backward Elimination is a method that starts with all the features included in the training data and then removes features one by one. It starts by removing the feature whose removal yields the greatest improvement in accuracy, and the process is repeated until the accuracy no longer improves.
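Forward Selection as described above can be sketched as a greedy loop. This is only an illustration under stated assumptions: `forward_selection` and `toy_accuracy` are hypothetical names, the feature names are made up, and `toy_accuracy` is a stand-in for the cross-validated accuracy of a real model.

```python
def forward_selection(features, score_fn):
    """Greedy Wrapper Method: repeatedly add the feature that most improves the score."""
    selected = []
    best_score = float("-inf")
    remaining = list(features)
    while remaining:
        # Score every candidate subset obtained by adding one more feature.
        scored = [(score_fn(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(scored)
        if top_score <= best_score:
            break  # no candidate improves the score, so stop
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


def toy_accuracy(subset):
    # Hypothetical stand-in for cross-validated accuracy (not real data):
    # each feature adds a fixed gain, and every extra feature costs 0.01.
    gains = {"indoor_temp": 0.20, "outdoor_temp": 0.10, "humidity": 0.0}
    return 0.5 + sum(gains[f] for f in subset) - 0.01 * len(subset)


print(forward_selection(["indoor_temp", "outdoor_temp", "humidity"], toy_accuracy))
```

Backward Elimination follows the same pattern in reverse: it starts from the full feature list and, at each step, drops the feature whose removal gives the best score.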

2.4. Evaluation Index

To determine the effectiveness of a machine learning model, an evaluation index must be established. The evaluation indexes described in this section are used to evaluate the predictions made by machine learning models when solving classification problems with machine learning. The prediction of classification problems cannot be evaluated using a single criterion; different evaluation indexes must be used for different purposes.

2.4.1. Confusion Matrix

A confusion matrix is a tabulation of the actual and predicted classifications in a binary classification problem. Table 1 shows the confusion matrix.
The machine learning results are classified into the four categories of the confusion matrix. The four categories are true positive (TP), false negative (FN), false positive (FP), and true negative (TN). TP is when the measured value is positive and also predicted to be positive by machine learning; FN is when the measured value is positive and predicted to be negative by machine learning; FP is when the measured value is negative and predicted to be positive by machine learning; and TN is when the measured value is negative and predicted to be negative by machine learning. In other words, TP and TN are classifications whose predictions are correct, whereas FN and FP are classifications whose predictions are incorrect.

2.4.2. Theory of Evaluation Index

There are many evaluation indices in machine learning based on the confusion matrix described in the previous section. Payet et al. [35] employed the metrics accuracy, precision, recall, and F1-score to assess the performance of various classification models on imbalanced datasets. These indicators are most commonly used in the field of machine learning, and accuracy, precision, recall, and F-measure indicators are also used for the analysis in this study. The four evaluation indices are described as follows:
Accuracy: Accuracy, also referred to as the percentage of correct responses, is a measure of how well the overall predictions match the actual measurements. Equation (1) shows the formula for accuracy.
Accuracy = (TP + TN)/(TP + FN + FP + TN)
Precision: Precision, also called the goodness-of-fit ratio, is an indicator of the percentage of data that are actually positive compared to those predicted to be positive. On the other hand, precision is an indicator that ignores data that are incorrectly predicted to be negative and therefore is not very effective when FN is a problem. Equation (2) shows the formula for precision.
Precision = TP/(TP + FP)
Recall: Recall, also known as sensitivity, is an index of the percentage of data that are correctly predicted as positive compared to the data that are actually positive. This is useful when a positive reading should not be incorrectly predicted as negative. On the other hand, recall is an indicator that ignores data that are falsely predicted to be positive and is, therefore, not very effective when FP is a problem. Equation (3) shows the formula for recall.
Recall = TP/(TP + FN)
F-measure: The F-measure is the harmonic mean of precision and recall, which have contrasting characteristics. This is an effective evaluation index for unbalanced datasets. Equation (4) shows the formula for the F-measure.
F-measure = (2 × Precision × Recall)/(Precision + Recall)
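The four indices can be computed directly from the confusion matrix counts. The following is a minimal sketch; the function names and the example labels are illustrative assumptions and are not taken from the survey data.

```python
def confusion_counts(actual, predicted):
    """Return (TP, FN, FP, TN) for binary labels, where 1 = positive and 0 = negative."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fn, fp, tn


def evaluation_indices(tp, fn, fp, tn):
    """Accuracy, precision, recall, and F-measure as in Equations (1)-(4)."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f_measure


# Made-up example: 1 = heating on, 0 = heating off.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
print(evaluation_indices(*confusion_counts(actual, predicted)))
```

The zero-denominator guards are a design choice of this sketch; they return 0.0 when an index is undefined, for instance when no sample is predicted as positive.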

2.5. Parameter Tuning

The parameters in machine learning mainly refer to the weights that the model optimizes during the learning process. These parameters are usually tuned automatically by the machine learning model. However, there are also parameters that need to be set manually before machine learning can be performed, which are called hyperparameters. Parameter tuning is necessary to control the behavior of each machine learning algorithm. Parameter tuning in machine learning is about balancing the nonlinearity and generalization capability of the model. The purpose of parameter tuning is to improve the prediction (generalization ability) of unknown data.
The parameter tuning method used in this study is a grid search, which is commonly used in parameter tuning. A grid search uses brute force to evaluate every combination of prespecified parameter values on a grid and selects the best one. A grid search has the advantage of being highly interpretable. However, the disadvantage is that the number of combinations increases exponentially with the number of parameters, making the computational cost very high. In this study, parameter tuning is performed on two machine learning models: SVM and deep neural network (DNN).
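The brute-force nature of a grid search can be seen in a short sketch. The names `grid_search` and `toy_accuracy`, the candidate values, and the scoring function itself are assumptions made for illustration; in an actual analysis the score would be the cross-validated accuracy of the model trained with each parameter combination.

```python
import math
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    n_trials = 0
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        n_trials += 1
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, n_trials


def toy_accuracy(C, gamma):
    # Hypothetical score surface with its peak at C = 10, gamma = 0.01.
    return -((math.log10(C) - 1) ** 2 + (math.log10(gamma) + 2) ** 2)


grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}
best_params, best_score, n_trials = grid_search(grid, toy_accuracy)
print(best_params, n_trials)
```

With four candidate values for each of two parameters, 4 × 4 = 16 combinations are evaluated; adding a third parameter with four values would raise this to 64, which is the exponential growth mentioned above.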

2.5.1. Overview of Parameter Tuning in SVM

The binary classification problem in machine learning is solved by drawing a line or plane that serves as a boundary between two classes. The line or plane that serves as the boundary is called the decision boundary. SVM aims to find the decision boundary with the largest margin between the two classes. In addition, two hyperparameters of SVM, C and gamma, are known to have a significant impact on the decision boundaries. The basic theories of SVM, C, and gamma are described below [36].
The margin is the distance between the decision boundary and the data closest to the decision boundary. The data closest to the decision boundary are called support vectors. In this case, the margin of the support vector is the same for each class. A conceptual diagram of the margin and support vectors is shown in Figure 2.
To achieve high identification performance, margin maximization should be considered. Although there are myriad possible boundaries for classifying binary values, setting a boundary at the extreme edge of one or both class values increases the likelihood of misclassification for slightly misaligned data. Therefore, setting a boundary with high generalization capability is the reason for maximizing the margin. The decision boundary of SVM is a straight line when the features are two-dimensional. The concept of SVM in two dimensions is shown in Equations (5) and (6).
The equation of a straight line in two dimensions is as follows:
$$a x + b y + c = 0$$
Based on the equation for the distance between a point $(x_i, y_i)$ and a straight line, finding the combination of a, b, and c that maximizes the margin for all training data (i = 1, 2, …, n) is the learning of SVM in two dimensions.
$$\mathrm{margin} = \frac{|a x_i + b y_i + c|}{\sqrt{a^2 + b^2}}$$
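As a small numerical illustration of the two-dimensional case, the margin of a candidate line can be taken as the smallest point-to-line distance over the training points. The function names and the two sample points below are assumptions chosen only to make the arithmetic easy to follow.

```python
import math


def distance_to_line(a, b, c, x, y):
    """Distance between the point (x, y) and the line a*x + b*y + c = 0."""
    return abs(a * x + b * y + c) / math.hypot(a, b)


def margin(a, b, c, points):
    """Margin of a candidate line: the distance to the closest training point."""
    return min(distance_to_line(a, b, c, x, y) for x, y in points)


points = [(0.0, 0.0), (3.0, 3.0)]          # one point from each class (made up)
print(margin(1.0, 1.0, -2.0, points))      # line x + y - 2 = 0
print(margin(1.0, 1.0, -3.0, points))      # line x + y - 3 = 0, midway between the points
```

The second line lies midway between the two points and therefore gives the larger margin, which is the boundary an SVM would prefer among these two candidates.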
In general, features in machine learning are rarely two-dimensional and often have three or more dimensions. When an SVM is generalized to three or more dimensions, the decision boundary is a hyperplane rather than a straight line. The concept of SVM in three or more dimensions is shown in Equations (7)–(16).
The equation for an n-dimensional hyperplane is as follows:
$$w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_n x_n + w_0 = 0$$
Convert to vector notation.
$$W^{T} X_i + w_0 = 0$$
Let K1 be positive data and K2 be negative data in the binary classification. Thus, the following equations are satisfied:
$$W^{T} X_i + w_0 > 0 \quad (X_i \in K_1)$$
$$W^{T} X_i + w_0 < 0 \quad (X_i \in K_2)$$
Using the variable t, with $t_i = 1$ if the i-th data $X_i$ belongs to K1 and $t_i = -1$ if it belongs to K2, the conditional expression can be expressed as follows:
$$t_i \left( W^{T} X_i + w_0 \right) > 0 \quad (i = 1, 2, 3, \ldots, N)$$
The distance between this hyperplane and point $X_i$ is also shown below.
$$d = \frac{|w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_n x_n + w_0|}{\sqrt{w_1^2 + w_2^2 + w_3^2 + \cdots + w_n^2}} = \frac{|W^{T} X_i + w_0|}{\|W\|}$$
From Equations (11) and (12), the condition for maximizing the margin M can be expressed by the following equation:
$$\max_{W, w_0} M, \quad \frac{t_i \left( W^{T} X_i + w_0 \right)}{\|W\|} \ge M \quad (i = 1, 2, 3, \ldots, N)$$
Dividing both sides by M and fitting W′ and w0′ so that $\frac{W}{M \|W\|} = W'$ and $\frac{w_0}{M \|W\|} = w_0'$, the conditional expression becomes
$$t_i \left( {W'}^{T} X_i + w_0' \right) \ge 1 \quad (i = 1, 2, 3, \ldots, N)$$
M′, which is a simplified version of the margin M, can be expressed by the following equation:
$$M' = \frac{t_i \left( {W'}^{T} X_i + w_0' \right)}{\|W'\|} = \frac{1}{\|W'\|} \quad (i = 1, 2, 3, \ldots, N)$$
In other words, maximizing the margin $M' = \frac{1}{\|W'\|}$ is the goal of SVM.
To simplify the calculation, transforming the equation into a normalized space yields the following equation:
$$\min_{W', w_0'} f(W') = \frac{1}{2} \|W'\|^2, \quad t_i \left( {W'}^{T} X_i + w_0' \right) \ge 1 \quad (i = 1, 2, 3, \ldots, N)$$
Therefore, when $t_i \left( {W'}^{T} X_i + w_0' \right) \ge 1$, it is possible to maximize the margin by minimizing $\frac{1}{2} \|W'\|^2$.
C: The above theory is called the hard margin, which assumes that the two classes can be completely separated. However, the hard margin has the disadvantage that no boundary can be found for data that cannot be completely separated by a straight line or hyperplane, and forcing complete separation reduces the generalization ability. Therefore, the theory of soft margins is used, which allows some degree of misclassification.
$$t_i \left( {W'}^{T} X_i + w_0' \right) \ge 1 - \xi_i \quad (i = 1, 2, 3, \ldots, N)$$
In this case, $\xi_i$ is called a slack variable, and the margin constraint can be relaxed by introducing it. The larger the value of $\xi_i$, the greater the degree of misclassification. Therefore, this is a variable that should be small from the perspective of avoiding misclassification. By defining a function for Equation (16) that adds the sum of the slack variables multiplied by the coefficient C, a balance between margin maximization and misclassification tolerance can be achieved in learning.
$$f(W') = \frac{1}{2} \|W'\|^2 + C \sum_{i=1}^{N} \xi_i$$
C is a parameter that indicates the degree to which misclassification is tolerated. If C is small, the penalty term remains small even if the number of misclassifications is large, so some misclassification is allowed. In contrast, if C is large, the penalty term grows rapidly with the number of misclassifications, so few misclassifications are allowed. Figure 3 shows a conceptual diagram of the decision boundary for a change in C.
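The role of C can be checked numerically by evaluating the soft-margin objective for a fixed boundary. The sketch below assumes, as is standard, that each slack variable is non-negative and takes the smallest value satisfying its constraint, that is, max(0, 1 − t_i(W·X_i + w_0)). The function name, the weights, and the four data points are made up for illustration.

```python
def soft_margin_objective(w, w0, X, t, C):
    """Return (0.5 * ||w||^2 + C * sum(slacks), slacks) for a fixed boundary."""
    slacks = []
    for x_i, t_i in zip(X, t):
        score = t_i * (sum(wj * xj for wj, xj in zip(w, x_i)) + w0)
        # Smallest non-negative slack that satisfies score >= 1 - slack.
        slacks.append(max(0.0, 1.0 - score))
    norm_sq = sum(wj * wj for wj in w)
    return 0.5 * norm_sq + C * sum(slacks), slacks


X = [[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0], [0.5, 1.0]]
t = [1, 1, -1, -1]   # the last point lies on the wrong side of the boundary
w, w0 = [1.0, 0.0], 0.0

for C in (0.1, 10.0):
    value, slacks = soft_margin_objective(w, w0, X, t, C)
    print(C, value, slacks)
```

For the same boundary and the same slack values, the objective is dominated by the margin term when C is small and by the misclassification penalty when C is large, so a large C pushes the optimizer toward boundaries with fewer violations.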
Gamma: Besides relaxing the conditions through soft margins, another countermeasure to the limitations of the hard margin is to define a nonlinear boundary using the kernel trick. The kernel trick is equivalent to adding a z-axis when linear separation is difficult in the xy coordinate system, which allows linear separation in the extended space; when this separating plane is transformed back to the original xy coordinate system, it becomes a nonlinear decision boundary. Thus, by applying a transformation φ to a higher-dimensional coordinate system that combines the original features, classification with nonlinear decision boundaries becomes possible even when linear separation is not possible. This transformation is usually performed by defining a kernel function, and the transformation method that uses kernel functions is called the kernel trick. The kernel function is defined as follows:
$$K(X_i, X_j) = \phi(X_i)^{T} \phi(X_j)$$
The most commonly used kernel function is the radial basis function (RBF) kernel, where γ (gamma) in the equation is a hyperparameter. The RBF kernel is given in Equation (20).
$$K(X_i, X_j) = \exp\left( -\frac{\|X_i - X_j\|^2}{2 \sigma^2} \right) = \exp\left( -\gamma \|X_i - X_j\|^2 \right)$$
Gamma is a parameter that controls the distance over which one point in the training data affects the decision boundary. The larger the gamma, the smaller the area of influence of a single point of the training data, resulting in a decision boundary with a large curvature. The smaller the gamma, the larger the area of influence of a single point of the training data, resulting in a decision boundary with a small curvature. Figure 4 shows a conceptual diagram of the decision boundary for a change in gamma.
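The effect of gamma on the area of influence can be seen by evaluating the RBF kernel for two points a fixed distance apart. This is a minimal sketch; the function name `rbf_kernel` and the sample points are assumptions made for illustration.

```python
import math


def rbf_kernel(x_i, x_j, gamma):
    """RBF kernel exp(-gamma * ||x_i - x_j||^2), where gamma = 1 / (2 * sigma^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * sq_dist)


x_i, x_j = [0.0, 0.0], [1.0, 0.0]   # two points at distance 1
for gamma in (0.1, 1.0, 10.0):
    print(gamma, rbf_kernel(x_i, x_j, gamma))
```

The kernel value is 1 for identical points and decays with distance. With a large gamma it is already close to zero at distance 1, so each training point influences only its immediate neighborhood; with a small gamma the value stays close to 1, so the influence reaches much further.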

2.5.2. Overview of Parameter Tuning in a DNN

A DNN is a mathematical model that simulates a collection of neural networks in the brain. The number of hidden layers and the number of neurons in each hidden layer are parameters that affect the prediction accuracy and computational cost [37]. An overview of the DNN is shown in Figure 5.
Neuron: A neuron is a function that takes multiple inputs, performs a calculation on them, and produces the result as a single output. Part of this calculation is the activation function, which is used to convert unbounded inputs into a predictable range. The number of neurons can be adjusted as desired, depending on the complexity of the problem that needs to be solved. However, the larger the number of neurons, the more time is required for computation, resulting in higher processing costs. Moreover, because the learned internals of a network are effectively a black box, it is difficult to express what the network has learned as a simple, interpretable equation.
Layer: Layer refers to the number of hidden layers. As with neurons, the larger the number of layers, the more complex the problem that can be represented and the higher the computational cost. Increasing the number of layers differs from increasing the number of neurons in that the results processed by the first layer are passed on to the second and third layers, allowing for more complex expressions.
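How the numbers of layers and neurons drive the computational cost can be illustrated by counting the weights and biases of a fully connected network. The sketch below is an assumption-laden illustration: the function names, the sigmoid activation, the input size of 10 features, and the layer sizes are all hypothetical and are not the configuration used in this study.

```python
import math


def neuron(inputs, weights, bias):
    """Single neuron: weighted sum of the inputs followed by a sigmoid activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))   # maps any real z into the range (0, 1)


def count_parameters(n_inputs, hidden_layers, n_outputs=1):
    """Number of weights and biases in a fully connected network."""
    sizes = [n_inputs] + list(hidden_layers) + [n_outputs]
    # Each neuron has one weight per input from the previous layer plus one bias.
    return sum((fan_in + 1) * fan_out for fan_in, fan_out in zip(sizes, sizes[1:]))


print(count_parameters(10, [16]))        # one hidden layer of 16 neurons
print(count_parameters(10, [16, 16]))    # two hidden layers of 16 neurons
print(count_parameters(10, [64, 64]))    # two hidden layers of 64 neurons
```

Both adding layers and adding neurons increase the number of parameters to be learned, which is why a grid over these two settings quickly becomes expensive to evaluate.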

3. Methods

3.1. Survey Description

In this study, the indoor thermal environment was measured, and a subjective survey was conducted on the thermal comfort of the occupants of detached wooden houses in Gifu, Japan. The annual mean outdoor temperature in Gifu City is 16.2 °C, and the mean annual precipitation is 1860.7 mm. Therefore, the city falls under the warm and humid climate (Cfa) in the Köppen climate classification. The houses studied were detached wooden houses with one or two floors.
The participants were provided with information regarding the survey’s content in advance, and their consent was obtained prior to the commencement of the survey. Votes were obtained by asking the residents to turn in their records four times a day during a specified period. It was confirmed that younger and older respondents accurately understood the content of the survey. Participants with medical conditions and young children who had difficulty understanding the survey were excluded. Any requests to temporarily discontinue the survey during the specified time period were promptly accommodated. To protect residents’ privacy, no data on square footage, specifications, or photographs of individual homes were collected due to lack of consent.
The survey was conducted from 1 December 2010 to 28 February 2011. During this period, 3821 votes with no missing features were collected. A total of 65 participants from 30 houses took part in the survey. Among the participants, there were 32 males and 33 females. The age of the participants ranged from 7 to 79 years, with an average age of 41.1 years. Table 2 shows the age and gender distribution of the survey participants in the winter analysis.
During the measurement period of the indoor thermal environment, questionnaires regarding occupant behavior were administered four times a day. Each of the four surveys required a report for a distinct time period: from waking until 12:00, 12:00 to 16:00, 16:00 to 20:00, and 20:00 until bedtime. However, if it was difficult to respond within the allotted time, respondents were allowed to respond at any interval of at least one hour. The survey was administered to individuals who agreed to participate after being informed of the survey’s content in advance. Anonymity was maintained during the study to avoid any personal exposure of participants. The questionnaire used in this study was in Japanese because the participants were Japanese.

3.2. Thermal Environmental Data Collection

The air temperature, relative humidity, and globe temperature were measured in the thermal environment of a room. The survey asked about opening/closing windows, heating, opening/closing interior doors, opening/closing curtains, and kotatsu use. To capture the indoor environment, measurements of the indoor air temperature, relative humidity, and globe temperature were taken at a height of 600 mm, as the living room was designated as the seating area at floor level.
To measure the temperature difference between the height of the floor seat and the feet, the foot temperature was measured at a height of 100 mm, which is the height of the ankles. The measuring instruments were installed in a location that was not affected by solar radiation or heat generation and did not interfere with daily life. In the case of indoor wind speed, measurements were conducted for a duration of 5 min following the initiation of the survey, and the resulting average value was utilized as the representative measure. No anemometers were installed for continuous indoor measurement because the number of anemometers was limited. Due to equipment limitations, the subjects were divided into three groups, with each group being measured approximately 10 days per month. Photographs of the measurement equipment used to measure the indoor air temperature, globe temperature, and indoor relative humidity are shown in Figure 6, and an overview of the measurement equipment is presented in Table 3.
Two variables associated with the human body were assessed: clothing insulation and the rate of metabolism. The measurement of anthropometric factors was completed by the participants themselves when they answered the subjective report. Metabolic rates were estimated based on work intensity prior to voting. Clothing insulation was estimated using the Hanada weight method [38] by asking respondents to enter the total weight of the clothing they were wearing.
Publicly available data from the Japan Meteorological Agency [39] were used to determine the outdoor thermal environment. The outdoor temperature, outdoor relative humidity, outdoor wind speed, barometric pressure, cloud cover, and precipitation were tabulated. The observation point was Gifu City, Gifu Prefecture, which is located at the center of the area containing the studied residences.

3.3. Thermal Indices

Thermal indices were used as features in this study. The thermal indices were the operative temperature (Top), mean radiant temperature (MRT), dew point temperature (Td), new effective temperature (ET*), standard new effective temperature (SET*), wet-bulb globe temperature (WBGT), neutral temperature (Tn), and the difference between the operative temperature and the neutral temperature (Tdiff). The difference between the operative temperature and 18 °C (Top-18) was also used as a feature to analyze the limits of adaptation to cold environments.
The operative temperature is an index for evaluating the thermal environment experienced by the human body. As this study was conducted indoors under calm airflow conditions, it was calculated from the average room air temperature and the mean radiant temperature according to ASHRAE Standard 55 [40].
$$T_{op} = A\,t_a + (1 - A)\,\bar{t}_r$$
  • Top = operative temperature (°C)
  • ta = average air temperature (°C)
  • tr = mean radiant temperature (°C)
  • A = constant as a function of air velocity (0.5)
The mean radiant temperature (MRT) was calculated using Benton’s formula [41]:
$$T_r = \left[\frac{6.32\,D^{-0.4}\,v^{0.5}}{\sigma\,\varepsilon}\,(T_g - T_a) + T_g^{4}\right]^{0.25}$$
  • D = diameter of globe thermometer (m)
  • ε = emissivity of globe thermometer (0.95)
  • σ = Stefan–Boltzmann constant (5.67 × 10⁻⁸ W/(m²·K⁴))
  • Ta = air temperature (K)
  • Tg = globe temperature (K)
  • Tr = MRT (K)
  • v = air velocity (m/s)
The globe thermometer used in this study was a Vernon type (0.15 m diameter).
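As a minimal sketch of the two formulas above, the following Python functions compute the MRT from globe-thermometer readings and then the operative temperature. The function and argument names are illustrative assumptions, not the authors' implementation; the constants (ε = 0.95, D = 0.15 m, A = 0.5) are those stated in the text, and the exponent on D is taken as −0.4.

```python
SIGMA = 5.67e-8         # Stefan-Boltzmann constant, W/(m^2 K^4)
EMISSIVITY = 0.95       # emissivity of the globe thermometer (from the text)
GLOBE_DIAMETER = 0.15   # globe diameter in m (from the text)


def mean_radiant_temperature(t_globe_c, t_air_c, velocity,
                             diameter=GLOBE_DIAMETER, emissivity=EMISSIVITY):
    """Return the MRT in deg C from globe and air temperature (deg C) and air velocity (m/s)."""
    tg = t_globe_c + 273.15  # the formula works in kelvin
    ta = t_air_c + 273.15
    convective = 6.32 * diameter ** -0.4 * velocity ** 0.5
    tr = (convective / (SIGMA * emissivity) * (tg - ta) + tg ** 4) ** 0.25
    return tr - 273.15


def operative_temperature(t_air_c, t_mrt_c, a=0.5):
    """Weighted mean of air temperature and MRT; a = 0.5 for calm indoor air."""
    return a * t_air_c + (1 - a) * t_mrt_c
```

When the globe and air temperatures are equal, the convective term vanishes and the MRT equals the globe temperature, which is a convenient sanity check for measured data.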
The dew point temperature was calculated using Tetens’ formula [42]:
$$T_d = \frac{237.3\,\log_{10}(6.1078/e)}{\log_{10}(e/6.1078) - 7.5}$$
  • Td = dew point temperature (°C)
  • e = water vapor pressure in the air (hPa)
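The dew point calculation can be sketched in Python as follows. The `dew_point_c` function uses an algebraically equivalent form of the expression above (numerator and denominator both multiplied by −1). The helper `vapor_pressure_hpa`, which derives e from air temperature and relative humidity with the forward Tetens relation, is an assumption added for illustration; the text does not state how e was obtained.

```python
import math


def vapor_pressure_hpa(t_air_c, rh_percent):
    """Assumed helper: water vapor pressure (hPa) from air temperature and relative humidity."""
    e_sat = 6.1078 * 10 ** (7.5 * t_air_c / (t_air_c + 237.3))  # saturation pressure
    return e_sat * rh_percent / 100.0


def dew_point_c(e_hpa):
    """Dew point temperature (deg C) from water vapor pressure (hPa), inverse Tetens form."""
    ratio = math.log10(e_hpa / 6.1078)
    return 237.3 * ratio / (7.5 - ratio)
```

At 100% relative humidity the dew point returned by this sketch coincides with the air temperature, as expected.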
The new effective temperature is an evaluation index based on a thermal equilibrium equation. It can comprehensively evaluate the air temperature, humidity, airflow, radiation, clothing insulation, and metabolic rate, and takes into account the thermoregulatory function of the body through sweating using a two-node model. The standard new effective temperature was specified as a standard environment with an air velocity of 0.135 (m/s), metabolic rate M (met), and standard clothing insulation to allow the comparison of thermal environments under different conditions. The new effective temperatures were obtained from the ASHRAE thermal comfort tool, and the standard new effective temperatures were obtained from ASHRAE Standard 55 [40]. Atmospheric pressure data for Gifu City, Gifu Prefecture, Japan, were obtained from the Japan Meteorological Agency [39]. Body weight data were obtained from the Ministry of Health, Labor, and Welfare [43], and body surface area was calculated using the Kurazumi formula [44]. The Kurazumi formula is as follows:
$$S = 72.18\,W^{0.425}\,H^{0.725}$$
  • S = body surface area (cm²)
  • W = weight (kg)
  • H = height (cm)
The WBGT was proposed in the United States in 1954 to prevent heat stroke in U.S. military personnel. The WBGT is an index that focuses on the heat exchange between the human body and the outside air and takes into account humidity and solar radiation, which have a great influence on the heat balance of the human body. The calculation formula is as follows:
$$WBGT = 0.7\,T_w + 0.3\,T_g$$
  • Tw = wet-bulb temperature (°C)
  • Tg = globe temperature (°C)
The neutral temperature is the indoor operative temperature at which occupants are comfortable, as determined from their thermal sensation votes. There are two methods for calculating the neutral temperature: linear regression and the Griffiths method [45]. The Griffiths method is generally used because linear regression is susceptible to highly biased data. The neutral temperature is calculated using the Griffiths method as follows:
$$T_n = T_i + \frac{0 - TSV}{a}$$
  • Tn = neutral temperature (°C)
  • Ti = room air temperature (°C)
  • TSV = thermal sensation vote (-)
  • a = sensitivity constant (0.5)
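A short Python sketch of this calculation is given below. It applies the formula to each vote and averages the results; the averaging step and the function names are illustrative assumptions, since the text only gives the per-vote formula.

```python
def griffiths_neutral_temperature(t_indoor_c, tsv, sensitivity=0.5):
    """Neutral temperature for one vote: shift the room temperature toward TSV = 0."""
    return t_indoor_c + (0 - tsv) / sensitivity


def mean_neutral_temperature(votes):
    """Average neutral temperature over (room temperature, TSV) pairs (assumed aggregation)."""
    values = [griffiths_neutral_temperature(t, tsv) for t, tsv in votes]
    return sum(values) / len(values)
```

For example, a vote of −1 at 18 °C implies a neutral temperature of 20 °C with a sensitivity constant of 0.5.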

4. Results and Discussion

4.1. Basic Aggregation

The results of the survey in this study were tabulated. A total of 3821 votes were collected. Figure 7 shows the distribution of the gender and age of residents in the received votes. The survey involved the participation of approximately an equal number of males (1833) and females (1988), resulting in a near 1:1 ratio. The age distribution of the survey participants was primarily characterized by individuals in their 50s, followed by those in their 20s.
Figure 8 shows the outdoor air temperature trends in Gifu City during the study period. Table 4 also provides a statistical summary of the indoor and outdoor thermal environments and thermal comfort index. The minimum and maximum indoor air temperatures were −0.5 °C and 28.5 °C, respectively, with an average value of 15.7 °C. The minimum and maximum indoor relative humidity values were 19% and 89%, respectively, with an average value of 53.5%. The indoor thermal environment was similar to that of a typical Japanese house during winter. The minimum outdoor air temperature was −3.1 °C, the maximum was 18.5 °C, and the mean was 4.7 °C. The minimum and maximum outdoor relative humidity values were 15% and 91%, respectively, with an average value of 66.0%. The outdoor thermal conditions mirrored the overall winter climate of Gifu City.
Table 5 lists the indoor environmental conditions reported in the votes during this period. Heating was in use in approximately 64% of the votes. In addition, few votes reported open windows or open interior doors in the living room during the winter months. Curtains were open in approximately 33% of the votes, and the kotatsu was in use in approximately 26.5% of the votes. Most openings were kept closed during the winter months to improve the air-tightness of the rooms. The percentage of respondents who used a heater was higher than that of those who used a kotatsu, indicating that the use of a heater is common in homes.
Table 6 provides a statistical summary of the subjective votes. In tabulating the data, a reclassification was made in terms of thermal sensation and affective assessment. For thermal sensation, “very cold”, “cold”, and “cold” were classified as cold, with scale values from −4 to −2; “slightly cold”, “neither hot nor cold”, and “slightly warm” were classified as neutral, with scale values from −1 to +1; and “warm”, “hot”, and “very hot” were defined as hot, with scale values from +2 to +4. When rating the affective assessment, “very uncomfortable”, “extremely uncomfortable”, and “unpleasant” were classified as unpleasant, with scale values from +1 to +3, and “somewhat uncomfortable” and “comfortable” were considered comfortable, with scale values from +4 to +5. In the winter indoor environment, there were only votes for “cold” or “neutral” and no votes for “hot”.
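The reclassification described above can be expressed as a simple mapping from scale values to classes. The sketch below assumes integer votes on the stated scales (−4 to +4 for thermal sensation, +1 to +5 for affective assessment); the function names are hypothetical.

```python
def classify_thermal_sensation(vote):
    """Map a thermal sensation vote (-4..+4) to 'cold', 'neutral', or 'hot'."""
    if not -4 <= vote <= 4:
        raise ValueError("thermal sensation vote must lie between -4 and +4")
    if vote <= -2:
        return "cold"
    if vote <= 1:
        return "neutral"
    return "hot"


def classify_affective_assessment(vote):
    """Map an affective assessment vote (+1..+5) to 'unpleasant' or 'comfortable'."""
    if not 1 <= vote <= 5:
        raise ValueError("affective assessment vote must lie between +1 and +5")
    return "unpleasant" if vote <= 3 else "comfortable"
```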

4.2. Analysis According to Initial Conditions

Machine learning has been shown to be effective for many predictions due to its high accuracy and ease of use in analyzing training data [46]. For this reason, a number of studies have been conducted using machine learning techniques to predict energy consumption and analyze the impact of energy conservation measures such as renewable energy technologies [47,48,49]. Although machine learning has been used extensively in predicting building energy consumption, few analyses have been conducted on predicting occupant behavior using machine learning. Machine learning varies in its ease of use, ability to build predictive models with interpretable structures, and computational cost, depending on the method [50]. Therefore, it is imperative to explore the utilization of machine learning models in order to ascertain the optimal approach for predicting the behavior of residents when it comes to opening and closing windows.
First, the analysis was performed under the initial conditions using three machine learning models: logistic regression (LR), SVM, and DNN. The initial conditions were analyzed using all features and no parameter tuning. A total of 37 features were used in the analysis. The features used in the analysis are listed in Table 7.
Under the initial conditions, the default parameter values of each machine learning model were used. For LR, no parameter tuning was performed because no parameters could be set. For the SVM, gamma and C are tunable parameters; the values used for the initial conditions were gamma = 0 and C = 0. For the DNN, the number of layers and the number of neurons are tunable parameters; the values used for the initial conditions were layer = 2 and neuron = (50, 50). Table 8 shows a comparison of the accuracies of the machine learning models under the initial conditions.
Under the initial conditions, the accuracies were 0.783, 0.770, and 0.827 for LR, SVM, and DNN, respectively. When comparing the three machine learning models, DNN showed the best accuracy, but no significant differences were found when compared to LR or SVM. In addition, the fact that extremely low values for precision, recall, and F-measure were not obtained indicates that the machine learning models’ predictions were not biased toward either “heating on” or “heating off”. Thus, the validity of the machine learning models was confirmed in this study. The extent to which the accuracy can be improved through feature selection and parameter tuning remains to be investigated.
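For reference, the four evaluation metrics mentioned above can be computed from the counts of true and false predictions as in the following sketch. It treats “heating on” as the positive class (coded as 1), which is an assumption made for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F-measure for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }
```

A model that always predicts “heating on” would obtain a recall of 1 but a low precision, which is the kind of bias that the precision, recall, and F-measure values are used to rule out.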

4.3. Analysis by Feature Selection

4.3.1. Machine Learning Features

Feature selection was performed to analyze the features that affect the use of winter heating. Feature selection was performed in two ways, forward selection (FS) and backward elimination (BE), which are widely used in machine learning. FS is a method that starts with no features and increases the number of features individually to find the most accurate combination. BE is a method that starts with all the features included and decreases the number of features one by one to find the most accurate combination. The objective was to investigate the difference in accuracy between the two methods and the combination of common features. Table 9 shows a comparison of the accuracy by feature selection.
There was no difference in the prediction accuracy between FS and BE in the LR, SVM, and DNN models. Compared to the initial accuracy, LR’s accuracy increased by approximately 1.5%, SVM’s increased by approximately 2.9% to 4.2%, and DNN’s increased by approximately 1.9% to 2.4%. Although feature selection improved the accuracy, it did not produce the expected results in predicting resident behavior.
Table 10 lists the features selected by FS and BE. With the exception of relative humidity, no indoor thermal environment features were selected for the SVM. In the DNN, however, indoor thermal environment features were selected, except for the globe temperature in FS. Thermal indices were selected infrequently for the SVM and irregularly for LR and the DNN. Occupant behavior features were selected in all cases except curtains in LR. The DNN selected a relatively large number of features for both FS and BE, whereas the SVM tended to select fewer features than the DNN. There was no regularity in the features selected for the LR, SVM, or DNN models. There was also no systematic difference in the selected features between FS and BE.
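The two selection procedures can be sketched as greedy searches over feature subsets. In the Python sketch below, `score` stands for any function that trains a model on a subset of features and returns its accuracy; `toy_score` and the feature names in `FEATURES` are hypothetical stand-ins used only to make the example runnable, and the stopping rules are one common choice rather than the authors' exact settings.

```python
def forward_selection(features, score):
    """Start with no features and add the best one at a time while accuracy improves."""
    selected, best_score = [], float("-inf")
    remaining = list(features)
    while remaining:
        trial_score, trial = max((score(selected + [f]), f) for f in remaining)
        if trial_score <= best_score:
            break  # no candidate improves the accuracy
        selected.append(trial)
        remaining.remove(trial)
        best_score = trial_score
    return selected, best_score


def backward_elimination(features, score):
    """Start with all features and drop one at a time while accuracy does not fall."""
    selected = list(features)
    best_score = score(selected)
    while len(selected) > 1:
        trial_score, trial = max(
            (score([g for g in selected if g != f]), f) for f in selected
        )
        if trial_score < best_score:
            break  # every removal lowers the accuracy
        selected.remove(trial)
        best_score = trial_score
    return selected, best_score


# Hypothetical stand-in for "train a model and return its accuracy".
FEATURES = ["indoor_air_temp", "cloud_cover", "kotatsu", "precipitation"]
_GAIN = {"indoor_air_temp": 0.05, "kotatsu": 0.03}


def toy_score(subset):
    noise = sum(1 for f in subset if f not in _GAIN)
    return 0.75 + sum(_GAIN.get(f, 0.0) for f in subset) - 0.01 * noise
```

Because both searches are greedy, they can end at different subsets on real data, which is why the features chosen by the two methods are compared.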

4.3.2. Examination of Features through Linear Regression

In the previous section, machine learning was used to select features. However, no regularity was found in the selected features. In this section, linear regression is performed on the features used in machine learning to analyze the relationships between the features and the features that affect heating usage. Table 11 shows the values obtained using linear regression.
The coefficient of determination, standard error, t-value, and p-value were used as the indices for linear regression. The coefficient of determination indicates the degree of fit of the estimated regression equation; the closer it is to 1, the stronger the explanatory power for the target variable. The standard error is the standard deviation of the estimator and represents the variability of the estimator obtained from the sample. The t-value and p-value are indicators of the statistical significance or dominance of the coefficient of determination for a feature. For a feature to reach the 5% significance level, the absolute value of the t-value must be greater than 2 or the p-value must be less than 0.05.
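The indices described above can be illustrated with a single-feature least-squares fit. The sketch below is an assumption about the general procedure (one feature regressed against the target at a time), not the authors' code; it reports the coefficient of determination, the standard error of the slope, and the t-value, and applies the |t| > 2 rule of thumb stated in the text instead of computing an exact p-value.

```python
import math


def simple_linear_regression(x, y):
    """Least-squares fit y = intercept + slope * x with basic significance indices."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    std_error = math.sqrt(ss_res / (n - 2) / sxx)  # standard error of the slope
    t_value = slope / std_error
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": 1 - ss_res / ss_tot,
        "std_error": std_error,
        "t_value": t_value,
        "significant": abs(t_value) > 2,  # rule of thumb for the 5% level
    }
```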
No linear regression indices could be determined for indoor air velocity because, owing to limited equipment, only representative values from 5 min of measurements were available. For acceptance and preference, precipitation, comfort, barometric pressure, cloud cover, indoor awareness, and outdoor relative humidity, the absolute t-values were less than 2 and the p-values were greater than 0.05. Therefore, these features did not appear to have a statistically significant effect on heating use.
The highest coefficient of determination was 0.362 for indoor air temperature. In addition, features such as thermal tolerance, curtains, and globe temperature were observed to have a positive influence on heating use. Clothing insulation and interior doors also had negative coefficients of determination, which may have resulted in a negative influence on heating consumption. The highest absolute value of the coefficient of determination was obtained for indoor air temperature, suggesting that indoor air temperature is the feature that has the greatest influence on heating use among the features used in this study.
Linear regression showed the features that influenced heating use. In addition, feature selection using machine learning revealed the feature combinations that yielded the highest accuracy. While the summer analysis showed that the trade-off features had a significant influence on the objective variable, the winter analysis did not identify any features with a particularly large effect.

4.4. Analysis by Parameter Tuning of SVM

Because no significant improvement in prediction accuracy was observed with feature selection, parameter tuning was performed. The purpose of parameter tuning is to maximize the estimation performance on unknown data by using the parameters to balance the nonlinearity and generalization ability of the machine learning model. The SVM has two parameters: C and gamma. Because both parameters can take an unbounded range of values, it is difficult to find the point where the prediction accuracy is maximized. For this analysis, the parameters were set on a logarithmic scale, and a grid search was performed to find the maximum prediction accuracy over a wide range. Eleven values of C were set (10^N, N = −1 to 9), from a minimum of 10^−1 to a maximum of 10^9, and 11 values of gamma were set (10^N, N = −6 to 4), from a minimum of 10^−6 to a maximum of 10^4. In the analysis of the initial conditions, the precision, recall, and F-measure values did not show any problems with data imbalance. Therefore, in the parameter tuning analysis, accuracy was used to evaluate the prediction performance. Figure 9 shows the variation in accuracy across combinations of C and gamma, and the same variation represented as a response surface. The relationship between C and accuracy at a fixed gamma is shown in Figure 10, and the relationship between gamma and accuracy at a fixed C is shown in Figure 11.
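The logarithmic grid search can be sketched as follows. The grids reproduce the ranges stated above; `evaluate` stands for any function that trains an SVM with the given C and gamma and returns its accuracy on held-out data. The `toy_accuracy` function is a hypothetical stand-in with a single peak, included only so that the sketch runs without a dataset.

```python
import math


def log_grid(low_exp, high_exp):
    """Return 10^N for N = low_exp, ..., high_exp."""
    return [10.0 ** n for n in range(low_exp, high_exp + 1)]


C_GRID = log_grid(-1, 9)      # 11 values of C
GAMMA_GRID = log_grid(-6, 4)  # 11 values of gamma


def grid_search(c_values, gamma_values, evaluate):
    """Evaluate every (C, gamma) pair and return (best accuracy, C, gamma)."""
    best = None
    for c in c_values:
        for gamma in gamma_values:
            accuracy = evaluate(c, gamma)
            if best is None or accuracy > best[0]:
                best = (accuracy, c, gamma)
    return best


def toy_accuracy(c, gamma):
    """Hypothetical accuracy surface peaking at C = 10, gamma = 0.01."""
    return 0.85 - 0.01 * abs(math.log10(c) - 1) - 0.02 * abs(math.log10(gamma) + 2)
```

A logarithmic grid covers many orders of magnitude with few evaluations (here 11 × 11 = 121), after which a finer linear sweep can be run around the best cell.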
The parameter tuning results showed that accuracy was highest at C = 10^1 and gamma = 10^−2. The accuracy at gamma = 10^−2 was above 0.82 for all values of C except C = 10^−1, suggesting that larger values of C do not significantly affect the prediction accuracy. In contrast, the accuracy varied more with gamma than with C, suggesting that gamma may significantly affect the prediction accuracy.
The response surfaces were created and optimized based on the accuracy obtained by tuning the SVM parameters. The response surface is a model that approximates the relationship between the predictor variables and the predicted response. The computational optimization time can be significantly reduced by using the response surface method. Evolutionary design, an approximation method, was used for the response surface technique. Evolutionary design is a method that uses genetic algorithms to search for optimal combinations of elementary functions. The predictor variables were C and gamma, which are SVM parameters, and accuracy was used for the response. Ten values of C were set on a linear scale, from a minimum of 1 to a maximum of 100. Three hundred gamma values were set on a linear scale, from a minimum of 0.001 to a maximum of 0.3. The optimum obtained from the response surface lay in the range from gamma = 0.0289 to 0.0346, with an accuracy of 0.8486. The value of C did not affect the accuracy.
When tuning the parameters of the SVM, the value of gamma was found to have a greater influence on the prediction accuracy than the value of C. Therefore, we fixed C at 10, where the best value was obtained, and tuned gamma over a narrower range. Because the maximum with respect to gamma was found near 10^−2, 300 gamma values were set on a linear scale, from a minimum of 0.001 to a maximum of 0.30. Figure 12 shows the relationship between gamma and accuracy at C = 10.
The accuracy was highest in the range from gamma = 0.01 to gamma = 0.05, which is indicated by the blue line in Figure 12. Within this range, the highest accuracy was 0.862 at gamma = 0.028, and all points had accuracies greater than 0.84, indicating high prediction accuracy. For gamma above 0.05, the accuracy tended to decrease with increasing gamma. A local maximum in the accuracy occurred around gamma = 0.2, but it was not the global maximum.
The prediction accuracy obtained through SVM parameter tuning had a global maximum value of 0.862. The prediction accuracy of the SVM initial conditions was 0.770, which means that parameter tuning improved the prediction accuracy by approximately 11.9%. Parameter tuning may be effective in improving the fit to unknown data in occupant heating behavior.
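The reported figure appears to be a relative increase over the initial accuracy rather than a difference in percentage points; this reading is inferred from the two accuracies quoted above:

$$\frac{0.862 - 0.770}{0.770} = \frac{0.092}{0.770} \approx 0.119 = 11.9\%$$

In absolute terms, the same change corresponds to 9.2 percentage points of accuracy.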

4.5. Analysis through Parameter Tuning of the DNN

Similar to the parameter tuning for the SVM, parameter tuning was also performed for the DNN to investigate the prediction accuracy. The DNN has two parameters: the number of layers and the number of neurons. The larger these values, the more complex the interior of the hidden layers becomes. Therefore, it is necessary to consider the values of the layers and neurons that yield the highest prediction accuracy. In addition, because the DNN is a black-box model, the computational process of the hidden layers is not revealed. In exchange, it is expected to achieve higher prediction accuracy than white-box models. Figure 13 shows the change in accuracy for different numbers of layers and neurons. Figure 14 shows the relationship between the number of layers and accuracy when the number of neurons is fixed, and Figure 15 shows the relationship between the number of neurons and accuracy when the number of layers is fixed.
When the number of neurons = 1, the accuracy decreased as the number of layers increased. When the number of neurons = 100 to 600, there was no significant change in accuracy as the number of layers changed. When the number of neurons = 700 to 1000, the accuracy was stable until the number of layers was about six but became unstable as the number of layers increased beyond seven.
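To make the two DNN parameters concrete, the sketch below enumerates candidate architectures as tuples with one entry per hidden layer, so that the initial condition of 2 layers with 50 neurons each is written (50, 50). It assumes that every hidden layer has the same number of neurons; the function names are hypothetical, and the candidate values are supplied by the caller.

```python
from itertools import product


def hidden_layer_sizes(layers, neurons):
    """Return one entry per hidden layer, e.g. (50, 50) for 2 layers of 50 neurons."""
    return (neurons,) * layers


def architecture_grid(layer_values, neuron_values):
    """Enumerate every combination of layer count and neurons per layer."""
    return [hidden_layer_sizes(l, n) for l, n in product(layer_values, neuron_values)]
```

For instance, `architecture_grid(range(1, 11), [100, 200, 500])` yields 30 candidate architectures (illustrative values only), each of which would be trained and scored in the same way as a cell of the SVM grid.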
The DNN showed excellent prediction accuracy when the number of layers = 2–6 and the number of neurons = 200–500, indicating that too small or too large values for the layer and neuron parameters can have a negative effect on the prediction accuracy. The highest accuracy achieved by parameter tuning was 0.847, with the number of layers = 5 and the number of neurons = 200, whereas the accuracy under the initial conditions of the DNN was 0.827; parameter tuning therefore improved the prediction accuracy by approximately 2.4%. Because the accuracy under the initial conditions of the DNN was already higher than that of the other machine learning models, parameter tuning did not produce the expected improvement.
Compared to SVM parameter tuning, DNN parameter tuning resulted in a lower rate of increase in accuracy. DNNs provide relatively high prediction accuracy without parameter tuning, which should allow for easy verification of accuracy in future analyses. In contrast, the SVM outperformed the DNN in terms of accuracy after parameter tuning, confirming the importance of parameter tuning in SVM. In future studies of prediction accuracy in resident behavior, the tuning of SVM parameters may help improve prediction accuracy.
The accuracy obtained by tuning the DNN parameters was used to create and optimize the response surface. As with the SVM, evolutionary design, an approximation method, was used for the response surface method. The DNN parameters (neuron and layer) were used as the predictor variables, and accuracy was used as the response. Ten neuron values were set on a linear scale, from a minimum of 1 to a maximum of 1000. Ten layer values were set on a linear scale, from a minimum of 1 to a maximum of 10.
The response surface of the DNN, obtained using the evolutionary design, is shown in Figure 13. The DNN showed more stable values for prediction accuracy over a wider range compared to the SVM. The neurons achieved high prediction accuracy mainly in the range of 200 to 600, whereas the layers achieved high prediction accuracy in layers 5 and 6. For both the neurons and layers, the response surface showed that the prediction accuracy decreased above a certain value.

4.6. Time-Series Changes in Forecast Accuracy Due to Parameter Tuning

To investigate the relationship between the time series and forecast accuracy during the study period, the daily and weekly forecast accuracies were determined. The machine learning model used was SVM, and the parameters were C = 10 and gamma = 0.028, which showed the best accuracy. Figure 16 shows the change in the forecast accuracy over time on a daily basis, and Figure 17 shows the amount of data on a daily basis. Figure 18 shows the change in the forecast accuracy over time from week to week, and Figure 19 shows the amount of data from week to week. Although there were daily variations in the accuracy values, there was no significant variation in the forecast accuracy of the time series, with values averaging close to 0.8. The first week of December and the first and second weeks of January also showed high forecast accuracy, with values above 0.90. While the accuracy of the forecast for the first week of January may have been affected by the small amount of data, the accuracy for the first week of December was high, despite the large amount of data. This suggests that seasonal changes in early December may improve the accuracy of the heating consumption forecasts.
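The daily and weekly accuracies can be obtained by grouping the predictions by date before computing the share of correct predictions. The sketch below assumes records of the form (date, observed heating state, predicted heating state) and groups weeks by ISO calendar week; the text does not specify how weeks were delimited, so that choice is an assumption.

```python
from collections import defaultdict
from datetime import date


def accuracy_by_period(records, period="day"):
    """Return {period key: (accuracy, number of votes)} for daily or weekly grouping."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for day, observed, predicted in records:
        # Weekly keys are (ISO year, ISO week number); daily keys are the date itself.
        key = day if period == "day" else tuple(day.isocalendar()[:2])
        totals[key] += 1
        hits[key] += int(observed == predicted)
    return {key: (hits[key] / totals[key], totals[key]) for key in totals}
```

Returning the number of votes alongside each accuracy makes it easier to judge whether a high value, such as that in early January, rests on only a small amount of data.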

5. Conclusions

This study investigated the accuracy of predicting occupant behavior using machine learning, with training data on the thermal environment measured in houses in Gifu City.
The heating behavior of the occupants in the winter months was predicted. We analyzed the factors affecting heating behavior and performed parameter tuning in machine learning models to examine their accuracy. For feature selection, FS and BE were performed using machine learning. Compared to the baseline, the prediction accuracy improved, but only by 1.5% to 4.2% for LR, SVM, and DNN. Linear regression analysis was also performed to analyze the effects of the features on heating use. Indoor air temperature was the feature that most strongly influenced heating use, with a coefficient of determination of 0.362, but no features were found to have a particularly large effect on heating use.
Parameter tuning of the SVM showed that the values of C and gamma affected the prediction accuracy. The value of gamma was found to have a greater influence on the prediction accuracy than the value of C. Accuracy was highest at 0.862 when C = 10 and gamma = 0.028. Compared to the baseline condition, the prediction accuracy improved by approximately 11.9%, confirming the effectiveness of using parameter tuning in SVM.
Parameter tuning of the DNN showed that the values of the layers and neurons affected the prediction accuracy. Excellent prediction accuracy was observed for layers 2–6 and neurons 200–500. The highest accuracy value was 0.847 when the number of layers = 5 and the number of neurons = 200. Although parameter tuning also improved the prediction accuracy of the DNN, the rate of increase was lower than that of the SVM.
The time-series change in the forecast accuracy after parameter tuning showed high accuracy in the first week of December, the first week of January, and the second week of January. In early January, the small amount of data could have affected the accuracy of the forecast, whereas in early December, the forecast accuracy was high despite the relatively large amount of data. This suggests that the seasonal change in early December may improve the accuracy of predicting occupant heating use.
Future issues that need to be addressed include the following.
Improved accuracy of forecasting models:
Machine learning models that have been widely used in previous studies were used, but there are other models besides those used in this study. In addition, parameter tuning was performed on only two models: SVM and DNN. Therefore, considering the machine learning model and parameter tuning methods used may contribute to further improvements in forecast accuracy. Feature selection also affects prediction accuracy. It is possible that the features not measured in this study have a significant impact on occupant behavior. Although a large amount of data should be collected through new surveys to improve forecasting accuracy, it is also necessary to study the features before the survey.
Automatic schedule generation based on lifestyle considering time history:
Because this study was based on point data at the time of the poll, time history was not taken into account, and we could not get to the point where the daily schedule could be clarified. Accurate surveys of the living environment and analyses using line data from continuous measurements are needed.

Author Contributions

Conceptualization, K.F. and T.N.; methodology, K.F. and T.N.; software, K.F.; validation, K.F. and T.N.; formal analysis, K.F.; investigation, T.N.; resources, T.N.; data curation, K.F.; writing—original draft preparation, K.F.; writing—review and editing, K.F.; visualization, K.F.; supervision, T.N.; project administration, K.F. and T.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research was performed with support from the Environment Research and Technology Development Fund JPMEERF20222M01 of the Environmental Restoration and Conservation Agency of Japan.

Data Availability Statement

Not applicable.

Acknowledgments

The authors appreciate the help and cooperation of all the participants in this study.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Huovila, P.; Ala-Juusela, M.; Melchert, L.; Pouffary, S.; Cheng, C.C.; Ürge-Vorsatz, D.; Koeppel, S.; Svenningsen, N.; Graham, P. Buildings and Climate Change: Summary for Decision Makers; Sustainable United Nations, United Nations Environment Programme: Nairobi, Kenya, 2009.
  2. IPCC. Summary for Policymakers: Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change; Masson-Delmotte, V., Zhai, P., Pirani, A., Connors, S.L., Péan, C., Berger, S., Caud, N., Chen, Y., Goldfarb, L., Gomis, M.I., et al., Eds.; Cambridge University Press: Cambridge, UK, 2023.
  3. Congedo, P.M.; Baglivo, C.; Seyhan, A.K.; Marchetti, R. Worldwide dynamic predictive analysis of building performance under long-term climate change conditions. J. Build. Eng. 2021, 42, 103057.
  4. Carlucci, S.; Causone, F.; Biandrate, S.; Ferrando, M.; Moazami, A.; Erba, S. On the impact of stochastic modeling of occupant behavior on the energy use of office buildings. Energy Build. 2021, 246, 111049.
  5. Du, J.; Pan, W. Diverse occupant behaviors and energy conservation opportunities for university student residences in Hong Kong. Build. Environ. 2021, 195, 107730.
  6. Fathi, S.; Srinivasan, R.; Fenner, A.; Fathi, S. Machine learning applications in urban building energy performance forecasting: A systematic review. Renew. Sustain. Energy Rev. 2020, 133, 110287.
  7. Nicol, J.F.; Humphreys, M.A. A Stochastic Approach to Thermal Comfort—Occupant Behavior and Energy Use in Buildings. ASHRAE Trans. 2004, 110, 554–568.
  8. Clevenger, C.; Haymaker, J.; Jalili, M. Demonstrating the impact of the occupant on building performance. J. Comput. Civ. Eng. 2014, 28, 99–102.
  9. Ioannou, A.; Itard, L.C.M. Energy performance and comfort in residential buildings: Sensitivity for building parameters and occupancy. Energy Build. 2015, 92, 216–233.
  10. Sun, K.; Hong, T. A simulation approach to estimate energy savings potential of occupant behavior measures. Energy Build. 2017, 136, 43–62.
  11. Zhuang, D.; Gan, J.L.V.; Duygu Tekler, Z.; Chong, A.; Tian, S.; Shi, X. Data-driven predictive control for smart HVAC system in IoT-integrated buildings with time-series forecasting and reinforcement learning. Appl. Energy 2023, 338, 120936.
  12. Zou, H.; Zhou, Y.; Jiang, H.; Chien, S.; Xie, L.; Spanos, C. WinLight: A WiFi-based occupancy-driven lighting control system for smart building. Energy Build. 2018, 158, 924–938.
  13. Duygu Tekler, Z.; Low, R.; Yuen, C.; Blessing, L. Plug-Mate: An IoT-based occupancy-driven plug load management system in smart buildings. Build. Environ. 2022, 223, 109472.
  14. Duygu Tekler, Z.; Lei, Y.; Peng, Y.; Miller, C.; Chong, A. A hybrid active learning framework for personal thermal comfort models. Build. Environ. 2023, 234, 110148.
  15. Andre, M.; Vecchi, R.; Lamberts, R. User-centered environmental control: A review of current findings on personal conditioning systems and personal comfort models. Energy Build. 2020, 222, 110011.
  16. Wang, L.; Greenberg, S. Window operation and impacts on building energy consumption. Energy Build. 2015, 92, 313–321.
  17. Wei, Y.; Zhang, X.; Shi, Y.; Xia, L.; Pan, S.; Wu, J.; Han, M.; Zhao, X. A review of data-driven approaches for prediction and classification of building energy consumption. Renew. Sustain. Energy Rev. 2018, 82, 1027–1047.
  18. Robinson, C.; Dilkina, B.; Hubbs, J.; Zhang, W.; Guhathakurta, S.; Brown, M.A.; Pendyala, R.M. Machine learning approaches for estimating commercial building energy consumption. Appl. Energy 2017, 208, 889–904.
  19. Rijal, H.B.; Humphreys, M.A.; Nicol, J.F. Development of a window opening algorithm based on adaptive thermal comfort to predict occupant behavior in Japanese dwellings. Jpn. Archit. Rev. 2018, 1, 310–321.
  20. Shi, S.; Zhao, B. Occupants’ interactions with windows in 8 residential apartments in Beijing and Nanjing, China. Build. Simul. 2016, 9, 221–231. [Google Scholar] [CrossRef]
  21. Shi, S.; Li, H.; Ding, X.; Gao, X. Effects of household features on residential window opening behaviors: A multilevel logistic regression study. Build. Environ. 2020, 170, 106610. [Google Scholar] [CrossRef]
  22. Jeong, B.; Jeong, J.-W.; Park, J.S. Occupant behavior regarding the manual control of windows in residential buildings. Energy Build. 2016, 127, 206–216. [Google Scholar] [CrossRef]
  23. Jones, R.V.; Fuertes, A.; Gregori, E.; Giretti, A. Stochastic behavioural models of occupants’ main bedroom window operation for UK residential buildings. Build. Environ. 2017, 118, 144–158. [Google Scholar] [CrossRef]
  24. Fabi, V.; Andersen, R.K.; Corgnati, S. Verification of stochastic behavioural models of occupants’ interactions with windows in residential buildings. Build. Environ. 2015, 94, 371–383. [Google Scholar] [CrossRef]
  25. Lai, D.; Jia, S.; Qi, Y.; Liu, J. Window-opening behavior in Chinese residential buildings across different climate zones. Build. Environ. 2018, 142, 234–243. [Google Scholar] [CrossRef]
  26. Zhang, Y.; Barrett, P. Factors influencing the occupants’ window opening behaviour in a naturally ventilated office building. Build. Environ. 2012, 50, 125–134. [Google Scholar] [CrossRef]
  27. Rijal, H.B.; Tuohy, P.; Humphreys, M.A.; Nicol, J.F.; Samuel, A.; Clarke, J. Using results from field surveys to predict the effect of open windows on thermal comfort and energy use in buildings. Energy Build. 2007, 39, 823–836. [Google Scholar] [CrossRef] [Green Version]
  28. Herkel, S.; Knapp, U.; Pfafferott, J. Towards a model of user behaviour regarding the manual control of windows in office buildings. Build. Environ. 2008, 43, 588–600. [Google Scholar] [CrossRef]
  29. Yun, G.Y.; Steemers, K. Time-dependent occupant behaviour models of window control in summer. Build. Environ. 2008, 43, 1471–1482. [Google Scholar] [CrossRef]
  30. Haldi, F.; Robinson, D. On the behaviour and adaptation of office occupants. Build. Environ. 2008, 43, 2163–2177. [Google Scholar] [CrossRef]
  31. Deme Belafi, Z.; Naspi, F.; Arnesano, M.; Reith, A.; Revel, G.M. Investigation on window opening and closing behavior in schools through measurements and surveys: A case study in Budapest. Build. Environ. 2018, 143, 523–531. [Google Scholar] [CrossRef]
  32. Kaito, F.; Takashi, N.; Yoshihiro, M. Prediction of Occupant Behavior toward Natural Ventilation in Japanese Dwellings: Machine Learning Models and Feature Selection. Energies 2022, 15, 5993. [Google Scholar] [CrossRef]
  33. Cheung, F.K.T.; Skitmore, M. Application of cross validation techniques for modelling construction costs during the very early design stage. Build. Environ. 2006, 41, 1973–1990. [Google Scholar] [CrossRef] [Green Version]
  34. Abdou, N.; Mghouchi, Y.; Jraida, K.; Hamdaoui, S.; Hajou, A.; Mouqallid, M. Prediction and optimization of heating and cooling loads for low energy buildings in Morocco: An application of hybrid machine learning methods. J. Build. Eng. 2022, 61, 105332. [Google Scholar] [CrossRef]
  35. Payet, M.; David, M.; Lauret, P.; Amayri, M.; Ploix, S.; Garde, F. Modelling of occupant behaviour in non-residential mixed-mode buildings: The distinctive features of tropical climates. Energy Build. 2022, 259, 111895. [Google Scholar] [CrossRef]
  36. Vapnik, V. The Nature of Statistical Learning Theory; Springer: Berlin/Heidelberg, Germany, 1995. [Google Scholar] [CrossRef]
  37. LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
  38. Kayoko, H. Thermal insulation of clothing. J. Text. Mach. Soc. Jpn. 1982, 35, P358–P364. [Google Scholar] [CrossRef]
  39. Japan Meteorological Agency. Available online: https://www.data.jma.go.jp/obd/stats/etrn/ (accessed on 17 January 2023).
  40. American National Standards Institute; American Society of Heating, Refrigerating and Air-Conditioning Engineers. ANSI/ASHRAE Standard 55-2020: Thermal Environmental Conditions for Human Occupancy; American Society of Heating, Refrigerating and Air Conditioning Engineers, Inc.: Atlanta, GA, USA, 2020. [Google Scholar]
  41. Benton, C.; Bauman, F.; Fountain, M. A field measurement system for the study of thermal comfort. ASHRAE Trans. 1990, 96 Pt 1, 623–633. [Google Scholar]
  42. Tetens, O. Uber einige meteorologische begriffe. Z. Geophys. 1930, 6, 297–309. [Google Scholar]
  43. Ministry of Health. Labour and Welfare Part 2 Health and Hygiene Chapter 1 Health. Available online: https://www.mhlw.go.jp/toukei/youran/indexyk_2_1.html (accessed on 17 January 2023).
  44. Yoshihito, K.; Tetsumi, H.; Tadahiro, T.; Naoki, M. Research on Body Surface Area of the Japanese. Jpn. Soc. Biometeorol. 1994, 31, p5–p29. [Google Scholar]
  45. Griffiths, I.; Thermal Comfort Studies in Buildings with Passive Solar Features, Field Studies. Report to the Commission of the European Community, ENS35 090 UK. 1990. Available online: http://ci.nii.ac.jp/naid/10010793725/en/ (accessed on 15 May 2023).
  46. Wang, Z.; Srinivasan, R. A review of artificial intelligence-based building energy use prediction: Contrasting the capabilities of single and ensemble prediction models. Renew. Sustain. Energy Rev. 2016, 75, 796–808. [Google Scholar] [CrossRef]
  47. Tardioli, G.; Kerrigan, R.; Oates, M.; O’Donnell, J.; Finn, D. Data driven approaches for prediction of building energy consumption at urban level. Energy Procedia 2015, 78, 3378–3383. [Google Scholar] [CrossRef] [Green Version]
  48. Pham, A.-D.; Ngo, N.-T.; Ha Truong, T.T.; Huynh, N.-T.; Truong, N.-S. Predicting energy consumption in multiple buildings using machine learning for improving energy efficiency and sustainability. J. Clean. Prod. 2020, 260, 121082. [Google Scholar] [CrossRef]
  49. Li, K.; Xie, X.; Xue, W.; Dai, X.; Chen, X.; Yang, X. A hybrid teaching-learning artificial neural network for building electrical energy consumption prediction. Energy Build. 2018, 174, 323–334. [Google Scholar] [CrossRef]
  50. Olu-Ajayi, R.; Alaka, H.; Sulaimon, I.; Sunmola, F.; Ajayi, S. Building energy consumption prediction for residential buildings using deep learning and other machine learning techniques. J. Build. Eng. 2022, 45, 103406. [Google Scholar] [CrossRef]
Figure 1. Summary of cross-validation with K = 5.
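The K = 5 scheme summarized in Figure 1 can be sketched in a few lines of plain Python. This is only an illustrative split generator, not the authors' code: the function name `k_fold_indices` and the sample count of 10 are assumptions made for the example, and in practice the data would normally be shuffled before splitting.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for K-fold cross-validation."""
    indices = list(range(n_samples))
    # Spread any remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Hypothetical example: 10 samples split into 5 folds.
folds = list(k_fold_indices(n_samples=10, k=5))
for fold_no, (train_idx, test_idx) in enumerate(folds, start=1):
    print(f"fold {fold_no}: test={test_idx}, train size={len(train_idx)}")
```

Each sample is used for testing exactly once and for training in the other four folds, so the reported score is typically the mean over the five held-out folds.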
Figure 2. Conceptual diagram of margins and support vectors.
Figure 3. Conceptual diagram of the decision boundary for a change in C.
Figure 4. Conceptual diagram of the decision boundary for a change in gamma.
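To make the role of gamma in Figure 4 concrete, the sketch below evaluates a radial basis function (RBF) kernel for one pair of points at several gamma values. It assumes the SVM uses an RBF kernel, which this section does not state explicitly; the function name `rbf_kernel` and the sample points are illustrative only.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel: k(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Two hypothetical points whose squared distance is 2.
x, z = (0.0, 0.0), (1.0, 1.0)
for gamma in (0.01, 0.1, 1.0, 10.0):
    print(f"gamma={gamma}: similarity={rbf_kernel(x, z, gamma):.6f}")
```

As gamma grows, the similarity between distant points falls toward zero, so each training point influences only its immediate neighborhood and the decision boundary becomes more intricate. C acts separately, as the penalty weight on misclassified training points.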
Figure 5. Overview of the DNN.
Figure 6. Photographs of the measuring equipment.
Figure 7. Distribution of occupants by gender and age in the winter poll.
Figure 8. Outdoor air temperature trends in Gifu City from 1 December 2010 to 28 February 2011.
Figure 9. Variation in accuracy in the difference between C and gamma, and variation in accuracy represented by the response surface.
Figure 10. Relationship between C and accuracy at fixed gamma.
Figure 11. Relationship between gamma and accuracy at fixed C.
Figure 12. Relationship between gamma and accuracy at C = 10.
Figure 13. Variation in accuracy in the difference between the number of layers and neurons, and variation in accuracy represented by the response surface.
Figure 14. Relationship between the number of layers and accuracy when the number of neurons is fixed.
Figure 15. Relationship between the number of neurons and accuracy when the number of layers is fixed.
Figure 16. Time-series variation in forecast accuracy from day to day.
Figure 17. Number of data obtained per day.
Figure 18. Time-series changes in forecast accuracy from week to week.
Figure 19. Number of data obtained per week.
Table 1. Confusion matrix.

| Actual class | Predicted positive | Predicted negative |
|---|---|---|
| Positive | True Positive (TP) | False Negative (FN) |
| Negative | False Positive (FP) | True Negative (TN) |
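The four evaluation indices reported later (accuracy, precision, recall, F-measure) follow from the confusion-matrix cells by their standard definitions. The sketch below assumes those standard definitions, with the F-measure taken as the harmonic mean of precision and recall. The counts passed in are hypothetical, not values from the study.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one predicted
    positive and at least one actual positive).
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total          # share of all predictions that are correct
    precision = tp / (tp + fp)            # share of predicted positives that are correct
    recall = tp / (tp + fn)               # share of actual positives that are found
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=80, fn=20, fp=10, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```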
Table 2. Gender and age distribution of survey participants.
GenderAge Group
MaleFemale0–910–1920–2930–3940–4950–5960–6970–7980+
3233231312823220
Table 3. Summary of the measurement equipment.

| Parameter | Instrument | Resolution | Accuracy | Manufacturer |
|---|---|---|---|---|
| Air temperature | Thermo Recorder TR-71 | 0.1 °C | ±0.3 °C | T&D Corporation |
| Air temperature | Thermo Recorder TR-72 | 0.1 °C | ±0.3 °C | T&D Corporation |
| Relative humidity | Thermo Recorder TR-72 | 1% | ±5% | T&D Corporation |
| Globe temperature | Globe Thermometer 150 mm φ | | | SIBATA |
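Table 4. Statistical summary of thermal environment data and thermal comfort indices.

| Category | Feature | Mean | Max. | Min. | Median | S.D. |
|---|---|---|---|---|---|---|
| Indoor | Air temperature (°C) | 15.7 | 28.5 | −0.5 | 16.5 | 4.65 |
| Indoor | Relative humidity (%) | 53.5 | 89 | 19 | 53 | 10.56 |
| Indoor | Air velocity (m/s) | 0.1 | 0.1 | 0.1 | 0.1 | 0.00 |
| Indoor | Globe temperature (°C) | 15.1 | 32.1 | −1.6 | 15.8 | 4.55 |
| Indoor | Wet-bulb temperature (°C) | 10.6 | 19.9 | −1.6 | 11.1 | 3.75 |
| Indoor | Foot temperature (°C) | 13.3 | 23.2 | −0.5 | 13.8 | 3.84 |
| Outdoor | Air temperature (°C) | 4.7 | 18.5 | −3.1 | 4 | 4.10 |
| Outdoor | Relative humidity (%) | 66.0 | 91 | 15 | 68 | 15.64 |
| Outdoor | Air velocity (m/s) | 2.4 | 10.9 | 0 | 1.9 | 1.61 |
| Outdoor | Atmospheric pressure (hPa) | 1014.1 | 1025.2 | 997.9 | 1015.4 | 6.05 |
| Outdoor | Cloud cover (-) | 6.9 | 10 | 0 | 9 | 3.79 |
| Outdoor | Precipitation (mm) | 0.1 | 11.5 | 0 | 0 | 0.63 |
| Thermal index | Operative temperature (°C) | 15.4 | 29.1 | −0.6 | 16.2 | 4.58 |
| Thermal index | MRT (°C) | 15.1 | 32.1 | −1.6 | 15.8 | 4.55 |
| Thermal index | Dew point temperature (°C) | 6.0 | 18.1 | −6.7 | 6.2 | 4.39 |
| Thermal index | WBGT (°C) | 11.9 | 21.5 | −1.3 | 12.6 | 3.93 |
| Thermal index | ET* (°C) | 16.1 | 32.8 | −0.5 | 16.6 | 5.00 |
| Thermal index | SET* (°C) | 17.8 | 36.9 | −3.9 | 17.9 | 6.69 |
| Thermal index | Neutral temperature (°C) | 17.7 | 29.1 | 1.4 | 18.4 | 4.37 |
| Thermal index | Tdiff (°C) | −2.3 | 2 | −8 | −2 | 1.94 |
| Thermal index | Top-18 (°C) | −2.6 | 11.1 | −18.6 | −1.8 | 4.59 |
| Human factor | Metabolic rate (met) | 1.3 | 2 | 0.8 | 1.2 | 0.39 |
| Human factor | Clothing insulation (clo) | 0.8 | 2.7 | 0.3 | 0.7 | 0.32 |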
MRT, mean radiant temperature; WBGT, wet-bulb globe temperature; ET*, new effective temperature; SET*, standard new effective temperature; Tdiff, operative temperature minus neutral temperature; Top-18, operative temperature (Top) minus 18 °C.
Table 5. Indoor environmental conditions in the poll during the period.

| Occupant Behavior | Number | On (Open) | Off (Close) |
|---|---|---|---|
| Heating | 3821 | 2442 | 1379 |
| Window | 3821 | 112 | 3709 |
| Door | 3821 | 611 | 3210 |
| Curtain | 3821 | 1263 | 2558 |
| Kotatsu | 3821 | 1019 | 2802 |
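The counts in Table 5 also give a useful reference point for the reported accuracies: a trivial classifier that always predicts the majority state. The sketch below computes the "on" share and that majority-class baseline for each behavior from the table's counts. The variable names are illustrative, and the baseline itself is a standard sanity check rather than something the authors report.

```python
# (on/open, off/closed) counts transcribed from Table 5.
counts = {
    "Heating": (2442, 1379),
    "Window": (112, 3709),
    "Door": (611, 3210),
    "Curtain": (1263, 2558),
    "Kotatsu": (1019, 2802),
}

shares = {}
for behavior, (on, off) in counts.items():
    total = on + off
    on_share = on / total
    # Accuracy obtained by always predicting the more frequent state.
    majority_baseline = max(on, off) / total
    shares[behavior] = (on_share, majority_baseline)
    print(f"{behavior}: on={on_share:.3f}, majority baseline={majority_baseline:.3f}")
```

For heating, the behavior predicted in this study, always answering "on" would be correct for about 63.9% of the 3821 records, which is the floor against which the model accuracies in Table 8 can be read.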
Table 6. Statistical summary of subjective votes in the winter analysis.

| Subjective Vote | Response | Proportion (%) |
|---|---|---|
| Thermal sensation | Cool | 32.0 |
| Thermal sensation | Neutral | 68.0 |
| Thermal sensation | Warm | 0.0 |
| Thermal conscious | Unconscious | 37.6 |
| Thermal conscious | Conscious | 62.4 |
| Thermal acceptability | Unacceptable | 12.3 |
| Thermal acceptability | Acceptable | 87.7 |
| Thermal tolerance | Intolerable | 9.7 |
| Thermal tolerance | Tolerable | 90.3 |
| Affective assessment | Uncomfortable | 18.9 |
| Affective assessment | Comfortable | 81.1 |
| Thermal preference | Cooler | 0.2 |
| Thermal preference | No change | 34.3 |
| Thermal preference | Warmer | 65.5 |
Table 7. Features used in the winter analysis.

| Category | Features |
|---|---|
| Thermal environmental data (indoor) | Indoor air temperature, indoor relative humidity, indoor air velocity, globe temperature, wet-bulb temperature, foot temperature |
| Thermal environmental data (outdoor) | Outdoor air temperature, outdoor relative humidity, outdoor air velocity, atmospheric pressure, cloud cover, precipitation |
| Thermal comfort indices | Operative temperature, MRT, dew point temperature, WBGT, ET*, SET*, neutral temperature, Tdiff, Top-18 |
| Subjective vote | Thermal sensation, thermal conscious, thermal acceptability, thermal tolerance, affective assessment, thermal preference |
| Human factor | Gender, age, metabolic rate, clothing insulation, posture |
| Occupant behavior | Window, door, curtain, kotatsu |
| Other | Date/time |
Table 8. Comparison of accuracies of machine learning models as a function of initial conditions.

| Model | Accuracy | Precision | Recall | F-Measure |
|---|---|---|---|---|
| LR | 0.783 | 0.862 | 0.787 | 0.823 |
| SVM | 0.770 | 0.829 | 0.807 | 0.818 |
| DNN | 0.827 | 0.882 | 0.843 | 0.862 |
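As a consistency check on Table 8, the F-measure column can be recomputed from the precision and recall columns. This assumes the F-measure is the usual harmonic mean of precision and recall (F1). Because the inputs are rounded to three decimals, agreement is only expected to within about 0.001.

```python
# (precision, recall, reported F-measure) transcribed from Table 8.
table8 = {
    "LR": (0.862, 0.787, 0.823),
    "SVM": (0.829, 0.807, 0.818),
    "DNN": (0.882, 0.843, 0.862),
}


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


recomputed = {model: f_measure(p, r) for model, (p, r, _) in table8.items()}
for model, (_, _, reported) in table8.items():
    print(f"{model}: recomputed={recomputed[model]:.3f}, reported={reported:.3f}")
```

All three recomputed values round to the reported figures, so the table is internally consistent under the F1 assumption.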
Table 9. Comparison of the evaluation indices by feature selection.

| Model | Feature Selection | Accuracy | Precision | Recall | F-Measure |
|---|---|---|---|---|---|
| LR | FS | 0.795 | 0.872 | 0.796 | 0.832 |
| LR | BE | 0.795 | 0.871 | 0.798 | 0.833 |
| SVM | FS | 0.793 | 0.810 | 0.884 | 0.845 |
| SVM | BE | 0.802 | 0.813 | 0.898 | 0.853 |
| DNN | FS | 0.843 | 0.891 | 0.860 | 0.875 |
| DNN | BE | 0.847 | 0.896 | 0.861 | 0.878 |
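Table 10. Features selected for forward selection and backward elimination.

| Category | Feature | LR FS | LR BE | SVM FS | SVM BE | DNN FS | DNN BE |
|---|---|---|---|---|---|---|---|
| Thermal environmental data (indoor) | Air temperature | 0 | 0 | 0 | 0 | 1 | 1 |
| Thermal environmental data (indoor) | Relative humidity | 1 | 0 | 1 | 1 | 1 | 1 |
| Thermal environmental data (indoor) | Air velocity | 1 | 1 | 0 | 0 | 1 | 1 |
| Thermal environmental data (indoor) | Globe temperature | 0 | 1 | 0 | 0 | 0 | 1 |
| Thermal environmental data (indoor) | Wet-bulb temperature | 1 | 0 | 0 | 0 | 1 | 1 |
| Thermal environmental data (indoor) | Foot temperature | 1 | 1 | 0 | 0 | 1 | 1 |
| Thermal environmental data (outdoor) | Air temperature | 1 | 1 | 1 | 1 | 1 | 1 |
| Thermal environmental data (outdoor) | Relative humidity | 1 | 0 | 1 | 0 | 0 | 0 |
| Thermal environmental data (outdoor) | Air velocity | 0 | 0 | 0 | 0 | 1 | 0 |
| Thermal environmental data (outdoor) | Atmospheric pressure | 1 | 0 | 1 | 0 | 1 | 1 |
| Thermal environmental data (outdoor) | Cloud cover | 0 | 1 | 0 | 0 | 0 | 0 |
| Thermal environmental data (outdoor) | Precipitation | 0 | 0 | 0 | 0 | 1 | 1 |
| Thermal comfort indices | Operative temperature | 1 | 0 | 0 | 0 | 1 | 0 |
| Thermal comfort indices | MRT | 1 | 0 | 0 | 0 | 1 | 1 |
| Thermal comfort indices | Dew point temperature | 0 | 1 | 0 | 1 | 0 | 1 |
| Thermal comfort indices | WBGT | 1 | 0 | 0 | 0 | 1 | 1 |
| Thermal comfort indices | ET* | 1 | 1 | 0 | 0 | 0 | 1 |
| Thermal comfort indices | SET* | 0 | 1 | 0 | 0 | 0 | 1 |
| Thermal comfort indices | Neutral temperature | 1 | 0 | 0 | 0 | 1 | 0 |
| Thermal comfort indices | Tdiff | 0 | 1 | 0 | 0 | 0 | 1 |
| Thermal comfort indices | Top-18 | 0 | 0 | 1 | 0 | 1 | 1 |
| Subjective vote | Thermal sensation | 1 | 1 | 1 | 1 | 1 | 1 |
| Subjective vote | Thermal conscious | 0 | 1 | 1 | 0 | 1 | 1 |
| Subjective vote | Thermal acceptability | 1 | 1 | 1 | 0 | 1 | 1 |
| Subjective vote | Thermal tolerance | 1 | 0 | 1 | 1 | 1 | 1 |
| Subjective vote | Affective assessment | 0 | 0 | 1 | 1 | 1 | 1 |
| Subjective vote | Thermal preference | 1 | 0 | 1 | 0 | 1 | 1 |
| Human factor | Gender | 0 | 0 | 1 | 0 | 1 | 1 |
| Human factor | Age | 1 | 1 | 1 | 1 | 1 | 1 |
| Human factor | Metabolic rate | 0 | 0 | 1 | 0 | 1 | 1 |
| Human factor | Clothing insulation | 0 | 0 | 0 | 0 | 0 | 1 |
| Human factor | Posture | 1 | 1 | 1 | 1 | 1 | 1 |
| Occupant behavior | Window | 1 | 1 | 1 | 1 | 1 | 1 |
| Occupant behavior | Door | 1 | 1 | 1 | 1 | 1 | 1 |
| Occupant behavior | Curtain | 0 | 0 | 1 | 1 | 1 | 1 |
| Occupant behavior | Kotatsu | 1 | 1 | 1 | 1 | 1 | 1 |
| Other | Date/Time | 1 | 1 | 1 | 1 | 1 | 1 |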
FS, forward selection; BE, backward elimination. A 1 indicates that the feature was selected by the given method for the given model; a 0 indicates that it was not.
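For readers unfamiliar with the procedure behind Table 10, the following is a minimal sketch of greedy forward selection. It is not the authors' implementation: the scoring metric and stopping rule used in the study are not given in this section, and the feature names, the `toy_score` function, and its gain values are entirely hypothetical. In real use, `score` would typically be a cross-validated accuracy of the model trained on the candidate subset.

```python
def forward_selection(features, score):
    """Greedily add the feature that most improves score(subset); stop when none helps."""
    selected = []
    best_score = score(selected)
    while True:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        # Evaluate each remaining feature when added to the current subset.
        top_score, top_feature = max((score(selected + [f]), f) for f in candidates)
        if top_score <= best_score:
            break  # no candidate improves the score
        selected.append(top_feature)
        best_score = top_score
    return selected, best_score


# Hypothetical additive scorer: made-up gains, not results from the study.
GAINS = {"feature_a": 0.05, "feature_b": 0.03, "feature_c": -0.01}


def toy_score(subset):
    return 0.70 + sum(GAINS[f] for f in subset)


selected, best = forward_selection(list(GAINS), toy_score)
print(selected, round(best, 2))
```

Backward elimination mirrors this loop: it starts from the full feature set and repeatedly removes the feature whose removal most improves (or least harms) the score.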
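Table 11. Comparison of indicators by linear regression.

| Category | Feature | Coefficient | Std. Error | t-Stat | p-Value |
|---|---|---|---|---|---|
| Thermal environmental data (indoor) | Air temperature | 0.362 | 0.002 | 235.1 | 0.000 |
| Thermal environmental data (indoor) | Relative humidity | 0.012 | 0.001 | 20.7 | 0.000 |
| Thermal environmental data (indoor) | Air velocity | −0.001 | NaN | NaN | NaN |
| Thermal environmental data (indoor) | Globe temperature | 0.083 | 0.002 | 54.8 | 0.000 |
| Thermal environmental data (indoor) | Wet-bulb temperature | −0.093 | 0.002 | −48.6 | 0.000 |
| Thermal environmental data (indoor) | Foot temperature | −0.043 | 0.002 | −25.7 | 0.000 |
| Thermal environmental data (outdoor) | Air temperature | −0.036 | 0.002 | −23.8 | 0.000 |
| Thermal environmental data (outdoor) | Relative humidity | −0.001 | 0.000 | −1.8 | 0.072 |
| Thermal environmental data (outdoor) | Air velocity | 0.008 | 0.004 | 2.1 | 0.038 |
| Thermal environmental data (outdoor) | Atmospheric pressure | 0.000 | 0.001 | 0.0 | 0.996 |
| Thermal environmental data (outdoor) | Cloud cover | −0.001 | 0.002 | −0.4 | 0.703 |
| Thermal environmental data (outdoor) | Precipitation | 0.013 | 0.010 | 1.4 | 0.175 |
| Thermal comfort indices | Operative temperature | −0.143 | 0.002 | −92.7 | 0.000 |
| Thermal comfort indices | MRT | 0.083 | 0.002 | 54.4 | 0.000 |
| Thermal comfort indices | Dew point temperature | 0.030 | 0.002 | 18.8 | 0.000 |
| Thermal comfort indices | WBGT | −0.036 | 0.002 | −19.8 | 0.000 |
| Thermal comfort indices | ET* | −0.013 | 0.001 | −9.1 | 0.000 |
| Thermal comfort indices | SET* | 0.027 | 0.001 | 26.1 | 0.000 |
| Thermal comfort indices | Neutral temperature | −0.101 | 0.002 | −67.2 | 0.000 |
| Thermal comfort indices | Tdiff | −0.044 | 0.003 | −12.7 | 0.000 |
| Thermal comfort indices | Top-18 | −0.149 | 0.002 | −96.7 | 0.000 |
| Subjective vote | Thermal sensation | −0.022 | 0.007 | −3.2 | 0.001 |
| Subjective vote | Thermal conscious | −0.007 | 0.013 | −0.6 | 0.560 |
| Subjective vote | Thermal acceptability | 0.033 | 0.019 | 1.7 | 0.087 |
| Subjective vote | Thermal tolerance | 0.118 | 0.022 | 5.4 | 0.000 |
| Subjective vote | Affective assessment | 0.008 | 0.008 | 1.0 | 0.305 |
| Subjective vote | Thermal preference | 0.022 | 0.013 | 1.6 | 0.100 |
| Human factor | Gender | 0.024 | 0.012 | 2.0 | 0.049 |
| Human factor | Age | −0.003 | 0.000 | −6.4 | 0.000 |
| Human factor | Metabolic rate | −0.115 | 0.016 | −7.2 | 0.000 |
| Human factor | Clothing insulation | −0.255 | 0.019 | −13.2 | 0.000 |
| Human factor | Posture | −0.014 | 0.002 | −6.4 | 0.000 |
| Occupant behavior | Window | −0.269 | 0.037 | −7.3 | 0.000 |
| Occupant behavior | Door | −0.204 | 0.017 | −11.9 | 0.000 |
| Occupant behavior | Curtain | 0.098 | 0.013 | 7.5 | 0.000 |
| Occupant behavior | Kotatsu | 0.071 | 0.014 | 5.1 | 0.000 |
| | Intercept | −2.892 | Infinity | 0.0 | 1.000 |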
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Furuhashi, K.; Nakaya, T. Investigating the Effects of Parameter Tuning on Machine Learning for Occupant Behavior Analysis in Japanese Residential Buildings. Buildings 2023, 13, 1879. https://doi.org/10.3390/buildings13071879