1. Introduction
The problem of energy and thermal comfort has become a global issue. Many studies have tried to solve energy problems by predicting thermal comfort for buildings of different types and shapes, since thermal comfort predictions can reduce energy wastage in buildings [1]. Thermal comfort cannot be separated from the thermal sensation scale, the model, the climate, and personal variables. Several studies have used a variety of thermal sensation scales. A thermal comfort model is the result of an analysis of thermal comfort variables and is used to predict human thermal comfort [2]. Climatic variables associated with the thermal sensation of residents in naturally ventilated nursing homes can influence the mathematical model. Regions with different climates also produce different responses, so researchers must consider whether a study area is hot or cold [3]. A user's thermal perception is one of the main variables in thermal comfort research. Field tests in thermal comfort research can use a questionnaire based on the thermal perception scale of ASHRAE 55, and regression analysis of the responses produces a mathematical model, or equation, of thermal comfort [4]. Airflow is one of the factors of thermal comfort, and human perception of airflow is also essential to achieving thermal comfort. Using a laboratory as an experimental room yields reliable results, and regression analysis can be applied to airflow research in such an experimental room [5].
Thermal comfort models aim to predict the thermal comfort of building occupants. The prediction models generated by research are used as standards for building design. Current prediction models use an adaptive thermal comfort approach. Research verifying the adaptive thermal comfort model has been conducted in four Brazilian cities. That study used the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) method and found that the variation in the verified adaptive thermal comfort model was more than 90% for the four cities [6]. Predictive modeling of personal thermal comfort has become a trending topic for improving indoor human comfort. Thermal comfort is closely related to the design and performance of building systems, especially in sustainable and intelligent buildings [7]. There are many approaches to thermal comfort modeling; one that is currently developing is the use of stochastic algorithms and variables [8].
The accuracy of a model in predicting occupants' thermal comfort is an important aspect that must be continuously improved, because only accurate models produce convincing predictions [9]. Linear regression is also used in outdoor thermal comfort studies. Thermal perception and other variables, such as air temperature, wind, and sun exposure, have been analyzed using linear regression; the results yielded a predictive model of thermal comfort as well as equations relating air temperature to wind and sun exposure [10]. Sun exposure receives more attention in outdoor thermal comfort research than in indoor research, because solar radiation affects radiant temperature much less indoors than outdoors. Nevertheless, indoor thermal comfort models still include the mean radiant temperature as one of the established thermal comfort factors [11].
Methods in thermal comfort research include software simulation, laboratory modeling, and field testing, of which field tests are the most widely used. Thermal comfort variables are sometimes measured together with acoustic and visual comfort variables, in which case a study can show the relationships between thermal, acoustic, and visual comfort [12]. The experimental method is also used in thermal comfort research. Some studies use an experimental space to test a model, and developing technology makes experimental spaces increasingly varied [13]. Simulation methods are often used to validate a model found in research, and software simulation is widely used for model validation. One of the programs used for simulation is ENVI-met. Simulations are often combined with field measurements in the same study to obtain more valid results [14]. Field measurements coupled with simulations can use building information modeling (BIM), in which computational fluid dynamics (CFD) analysis is often one of the analytical methods [15]. Several studies have combined the two methods, but few studies have compared them.
The development of databases has led to increasingly sophisticated data analysis tools. Machine learning algorithms are a new approach to analyzing thermal or visual comfort data and are widely used as analytical tools in research on human comfort in buildings [16]. Machine learning data analysis methods continue to be developed and applied to various types of buildings; numerical computing is one of their strengths [17]. Machine learning can also be used in urban heat island research. Such research can produce effective strategies to reduce the urban heat island effect: avoiding overcrowding in infrastructure development; increasing plantations, water bodies, and roof gardens; and using white roofs in construction. These findings enable urban planners, policymakers, and local governments to achieve environmentally friendly outcomes [18]. Machine learning (ML)-based building models have gained popularity in model predictive control (MPC) for building energy management applications. However, ML-based building models are usually nonlinear in capturing building dynamics, which imposes a high computational load on MPC and prevents its application in real-time building control [19]. Machine learning can also be used to model hot water usage with respect to occupants' thermal comfort, and the resulting model provides control performance that adapts to occupants [20].
With the growth of databases and automated calculation, developing comfort models using machine learning is inevitable. Choosing the analytical method is essential to finding an accurate thermal comfort model. The purpose of this study was to compare multiple linear regression analysis and naïve Bayes for building an accurate thermal comfort model.
Thermal comfort can be measured objectively and subjectively. Objective measurements use thermal measuring instruments that record air temperature, mean radiant temperature, air velocity, and humidity. Subjective measurements consist of respondents filling out an ASHRAE thermal sensation questionnaire after sitting for 15 min in the research space [21]. Gender is an important factor in thermal comfort, and the selection of respondents needs to consider it, because men and women may report different thermal sensations [22]. Thermal comfort is commonly assessed with the PMV (predicted mean vote) and PPD (predicted percentage of dissatisfied) indicators, which are key terms in the analysis of thermal comfort [23].
Thermal comfort prediction with machine learning is still being developed, and such predictions can advance the science of building design. Few studies have predicted thermal comfort using machine learning [24]. Machine learning methods take various forms, often involving mathematical calculations and computational fluid dynamics (CFD), and the variety of methods leads to varied research results [17].
Thermal comfort prediction using neural networks has also been carried out. A neural network has been used to predict the air temperature, PMV, and PPD of a thermoelectric air duct (TE-AD) cooling system, and the resulting model predicted thermal comfort accurately [25]. Artificial neural networks for machine learning are still being developed to find the right predictions [26], as is machine learning for predicting indoor air quality [27]. Thermal comfort prediction can also combine several methods with thermal comfort variables [28].
Regression analysis is widely used in thermal comfort research to find predictive models, and many existing thermal comfort models are based on it. Many comparisons of machine learning methods for predicting thermal comfort have also been carried out. One machine learning method is naïve Bayes, which has been compared with other methods [29]. Naïve Bayes can be an alternative machine learning method that does not require complicated calculations. Research comparing regression with naïve Bayes in forming a predictive model of thermal comfort is needed to determine which method is simpler and produces the better predictive model.
Clothing is one of the variables that affects thermal comfort. Most students in Wonosobo, Indonesia, wear closed (body-covering) clothing during learning activities at school. Currently, there are few predictive models based on respondents who wear closed clothing in a religious culture. The novelty of this study is therefore a predictive thermal comfort model that uses closed clothing variables in a cold area.
This research can contribute to computer science by identifying predictive models that are simple and accurate, and to architecture by providing a basis for designing naturally ventilated buildings through thermal comfort prediction.
Some of the related works are shown in Table 1 below.
2. Materials and Methods
This research compares two data analysis methods: regression analysis and naïve Bayes. The data come from field measurements of thermal comfort. Respondents were students at two private high schools in Kejajar District, Wonosobo Regency. The variables used are gender, age, height, weight, air temperature, globe temperature, relative humidity, air velocity, and thermal sensation vote (TSV). The survey was conducted on two measurement days, in the morning, afternoon, and evening. Air temperature and humidity were measured using an Extech instrument, and globe temperature was measured using a black copper globe with a diameter of 15 cm.
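A 15 cm black-globe reading is commonly converted to mean radiant temperature with the forced-convection equation of ISO 7726. The sketch below shows that conversion under stated assumptions: the globe emissivity of 0.95 and the use of the forced-convection form (rather than the natural-convection form often preferred in nearly still air) are our assumptions, the function name is hypothetical, and the text does not state whether or how such a conversion was performed.

```python
def mean_radiant_temperature(t_globe: float, t_air: float, v_air: float,
                             diameter: float = 0.15,
                             emissivity: float = 0.95) -> float:
    """Mean radiant temperature (°C) from a black-globe reading.

    Assumption: ISO 7726 forced-convection equation,
        t_r = [(t_g + 273)^4 + 1.1e8 * v^0.6 / (eps * D^0.4) * (t_g - t_a)]^(1/4) - 273
    with temperatures in °C, v_air in m/s and the globe diameter D in metres.
    The emissivity of 0.95 for a matt-black globe is also an assumption.
    """
    inner = (t_globe + 273.0) ** 4 + (
        1.1e8 * v_air ** 0.6 / (emissivity * diameter ** 0.4)
    ) * (t_globe - t_air)
    if inner <= 0:
        raise ValueError("inputs outside the range of the correlation")
    return inner ** 0.25 - 273.0


# Classroom means from the Results (air 23.75 °C, globe 24.6 °C) with a
# hypothetical air speed of 0.1 m/s:
print(round(mean_radiant_temperature(24.6, 23.75, 0.1), 2))
```

When the globe and air temperatures are equal, or the air is still (v = 0), this form returns the globe temperature itself; the radiant correction grows with air speed and with the globe-air temperature difference.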
Respondents were asked to wait for 15 min before the initial measurements were taken. Because the respondents were students at these schools, they had already adapted to the rooms. Both male and female respondents were involved. In thermal comfort research, thermal sensation may differ between men and women, although some studies report few differences between them [32]. No sampling was used: responses were collected from all students at the schools studied, and all 252 high school students were used as research subjects.
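Regression analysis was performed with SPSS Statistics 25, and naïve Bayes analysis with Weka 3.8.6. The accuracy of the two analyses was then compared to determine which analytical tool is more accurate. Regression analysis was performed by fitting a multiple linear regression model of the form

$$Y = a + b_1X_1 + b_2X_2 + b_3X_3 + b_4X_4 + b_5X_5 + b_6X_6 + b_7X_7 + b_8X_8$$

where $Y$ is the thermal sensation vote (TSV), $a$ is the constant, $b_1, \dots, b_8$ are the regression coefficients, and $X_1$: gender, $X_2$: age, $X_3$: height, $X_4$: weight, $X_5$: temperature, $X_6$: globe_temperature, $X_7$: relative_humidity, and $X_8$: velocity.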
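To illustrate how the constant $a$ and the coefficients $b_1, \dots, b_8$ are estimated, the following is a minimal ordinary-least-squares sketch in plain Python that solves the normal equations. It is not SPSS's implementation; the function names are hypothetical, and it assumes every predictor, including gender, is already coded numerically (a 0/1 gender coding is an assumption). In practice a library routine such as `numpy.linalg.lstsq` would normally be used.

```python
from typing import List, Sequence


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares fit of y = a + b1*x1 + ... + bk*xk; returns [a, b1, ..., bk]."""
    # Design matrix with a leading column of ones for the constant a.
    A = [[1.0, *map(float, row)] for row in X]
    p = len(A[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    M = [
        [sum(r[i] * r[j] for r in A) for j in range(p)]
        + [sum(r[i] * float(yi) for r, yi in zip(A, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations (collinear predictors?)")
        M[col], M[pivot] = M[pivot], M[col]
        scale = M[col][col]
        M[col] = [v / scale for v in M[col]]
        for r in range(p):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][p] for i in range(p)]


def predict(beta: Sequence[float], x: Sequence[float]) -> float:
    """Continuous TSV prediction Y = a + sum(b_i * x_i)."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
```

For example, `fit_ols([[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]], [1, 3, 0.5, 2.5, 3.5])` recovers approximately `[1.0, 2.0, -0.5]`, because those toy data were generated from y = 1 + 2x₁ − 0.5x₂.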
Naïve Bayes analysis requires a fairly long process, starting with determining the training data, which in this study also serve as the test data. The calculation involves the prior probability of each class, P(Y); the conditional probability of each criterion given the class, P(X_i | Y); and the final probability of each class, which is proportional to the prior multiplied by the conditional probabilities. Naïve Bayes analysis produces the predicted Y, that is, the predicted TSV. The resulting output can predict the TSV of building occupants from a given set of independent variable values (Figure 1).
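The three steps described above (prior P(Y), per-criterion conditional probabilities P(X_i | Y), and the final score) can be sketched as a Gaussian naïve Bayes classifier. Modelling each numeric criterion with a per-class normal distribution is an assumption here (similar in spirit to Weka's default NaiveBayes for numeric attributes, but not identical), as is the small variance floor; the function names are hypothetical. Scores are kept as logarithms to avoid underflow; exponentiating them gives unnormalised values of the same kind as the final naïve Bayes scores reported in the Results.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# class label -> (prior, [(mean, variance) for each feature])
Model = Dict[int, Tuple[float, List[Tuple[float, float]]]]


def fit_gaussian_nb(X: Sequence[Sequence[float]], y: Sequence[int],
                    min_var: float = 1e-6) -> Model:
    """Estimate P(Y=c) and a per-class normal distribution for each feature."""
    groups: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model: Model = {}
    for label, rows in groups.items():
        stats = []
        for j in range(len(rows[0])):
            col = [float(r[j]) for r in rows]
            mu = sum(col) / len(col)
            var = sum((v - mu) ** 2 for v in col) / len(col)
            stats.append((mu, max(var, min_var)))  # variance floor (assumption)
        model[label] = (len(rows) / len(y), stats)
    return model


def log_scores(model: Model, x: Sequence[float]) -> Dict[int, float]:
    """log[P(Y=c) * prod_i N(x_i; mu_ci, var_ci)] for every class c."""
    out = {}
    for label, (prior, stats) in model.items():
        s = math.log(prior)
        for xi, (mu, var) in zip(x, stats):
            s += -0.5 * math.log(2 * math.pi * var) - (xi - mu) ** 2 / (2 * var)
        out[label] = s
    return out


def predict_nb(model: Model, x: Sequence[float]) -> int:
    """Class with the largest (log) score."""
    scores = log_scores(model, x)
    return max(scores, key=scores.get)
```

With a toy training set of (air temperature, humidity) pairs labelled −2 for cooler rows and 0 for warmer rows, a new record is assigned to whichever class's fitted distributions make it most probable after weighting by the class prior.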
3. Results
The data comprised 252 records from 252 high school students. Female respondents wear hijab school uniforms, while male respondents do not wear head coverings. Female students wear long skirts and long-sleeved tops, and male students wear shirts and trousers. All students wear shoes and socks. Their activities are seated writing and seated listening during school hours (7–16 h). With eight independent variables and one dependent variable, the dataset contains 252 × 9 = 2268 values. Respondents consisted of 44% men and 56% women. Respondent ages ranged from 14 to 19 years, with an average of 16.3 years. Heights ranged from 140 to 177 cm, with an average of 156 cm. Body weights ranged from 30 to 82 kg, with an average of 47.9 kg. Classroom air temperature ranged from 22 to 25.5 °C, with an average of 23.75 °C. Globe temperature in the classrooms ranged from 23 to 26.5 °C, with an average of 24.6 °C. Relative humidity ranged from 60% to 80%, with an average of 68.63%. There was little air movement, so most velocity readings were zero; the highest velocity was 1 m/s. Thermal sensation votes ranged from −3 (very cold) to +3 (very hot), with an average of −0.89 (close to cool). Measurements were taken close to the respondents by placing the measuring instrument on the classroom table (Figure 2).
Multiple linear regression requires the data to pass validity and reliability tests. In addition, classical assumption tests must be carried out to ensure that the data can be used for multiple linear regression analysis. The SPSS analysis showed that a large share of the data passed the validity test. The normality test, performed as part of the regression analysis, showed that the model meets the normality assumption (Figure 3).
Multiple linear regression in SPSS produced unstandardized coefficients that serve as the coefficients of the prediction model. Some of these coefficients were not statistically significant, indicating that the corresponding independent variables have a weaker influence on the dependent variable (Table 2). These coefficients can still be used to predict thermal comfort, because models in several other studies obtained similar results.
Based on the constant and regression coefficients obtained from the regression analysis, the multiple linear regression equation (Equation (1)) is as follows:
Several variables have high significance (p) values, indicating that these independent variables did not strongly influence the dependent variable. This may be due to the type of clothing worn by the respondents.
The comfortable air temperature was calculated by setting Y = 0 (neutral thermal sensation), which gave a comfortable air temperature of 33.19 °C. This is higher than the average comfortable air temperature in the tropics, which is 27 °C. This may be because of the closed clothing worn by the respondents: hijab for women and trousers for men.
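The comfort temperature follows from setting Y = 0 in the regression equation and solving for the air-temperature term: T_c = −(a + Σ_{i≠5} b_i X_i) / b_5. The sketch below does this for an arbitrary set of coefficients. Because the fitted coefficients of Equation (1) are not reproduced here, the values in the usage line are purely hypothetical, and the values at which the other seven variables are held (for example, sample means) are an assumption, since the text does not state them. The function name is also hypothetical.

```python
from typing import Sequence


def comfort_temperature(a: float, b: Sequence[float], x: Sequence[float],
                        temp_index: int = 4) -> float:
    """Air temperature at which a + sum(b_i * x_i) = 0, i.e. predicted TSV = 0.

    b and x list the coefficients and values for X1..X8 in order; the entry at
    temp_index (X5, air temperature, 0-based index 4) is solved for, so its
    value in x is ignored. The other seven values in x must be chosen by the
    user (e.g. sample means), which is an assumption.
    """
    b5 = b[temp_index]
    if b5 == 0:
        raise ValueError("temperature coefficient is zero: no neutral temperature")
    rest = a + sum(bi * xi for i, (bi, xi) in enumerate(zip(b, x))
                   if i != temp_index)
    return -rest / b5


# Hypothetical coefficients for illustration only (not those of Equation (1)).
print(comfort_temperature(-6.0, [0, 0, 0, 0, 0.25, 0, 0, 0], [0] * 8))  # 24.0
```

The sign of b_5 matters: a positive temperature coefficient means warmer air raises the predicted TSV, so the neutral point lies above the current temperature when the other terms sum to a negative value.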
Based on the generated Equation (1), thermal comfort can be predicted for the sample data in Table 3 as follows:
Naïve Bayes analysis begins with determining the training data, which consist of all 252 records from the measurements. The variables used to predict thermal comfort (TSV) are the same as in the regression analysis: gender, age, height, weight, temperature, globe temperature, relative humidity, and velocity. The training data are also used as the test data, which form the basis of our calculations.
The training data contain 7 TSV classes: class −3 with 3 records, class −2 with 87, class −1 with 81, class 0 with 50, class 1 with 24, class 2 with 6, and class 3 with 1. The TSV class distribution in the Weka software is shown in Figure 4.
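The class probabilities P(Y) follow directly from these counts. The snippet computes the unsmoothed maximum-likelihood priors n_c / 252; a tool may apply smoothing, so its values can differ slightly. The constant and function names are hypothetical.

```python
# TSV class counts among the 252 training records, as listed in the text.
CLASS_COUNTS = {-3: 3, -2: 87, -1: 81, 0: 50, 1: 24, 2: 6, 3: 1}
TOTAL = sum(CLASS_COUNTS.values())  # 252


def class_priors(counts: dict) -> dict:
    """Unsmoothed maximum-likelihood priors P(Y = c) = n_c / N."""
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


if __name__ == "__main__":
    for c, p in sorted(class_priors(CLASS_COUNTS).items()):
        print(f"P(TSV = {c:+d}) = {CLASS_COUNTS[c]}/{TOTAL} = {p:.4f}")
```

Class −2 has the largest prior (87/252), which favours it whenever the conditional probabilities of two classes are similar.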
The probability value for each criterion is shown in Appendix A.
As an example, a thermal comfort (TSV) prediction is made for the following test record: gender: 1, age: 18, height: 160, weight: 46, temperature: 22, globe temperature: 24, relative humidity: 62, and velocity: 0.
Based on the test data in Table 3, a prediction calculation can be made for the data above.
Based on the naïve Bayes calculations in Table 4, the highest value, 0.000026, belongs to class −2. Thus, the test record is predicted as class −2.
The accuracy of the predictions can be calculated from the confusion matrix for all 252 records. In class A (−3), 0 records were correctly classified and 3 were misclassified. In class B (−2), 72 records were correctly classified and 15 were misclassified. In class C (−1), 49 were correctly classified and 32 were misclassified. In class D (0), 29 were correctly classified and 21 were misclassified. In class E (1), 16 were correctly classified and 8 were misclassified. In class F (2), 3 were correctly classified and 3 were misclassified. In class G (3), 0 records were correctly classified and 1 was misclassified.
In total, 169 records (67.06%) were correctly classified, while 83 records (32.94%) were incorrectly classified.
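The overall accuracy is the number of correctly classified records across all classes divided by the total. The snippet reproduces the 169/252 figure from the per-class counts listed above and also computes per-class recall (correct records divided by class size), which shows that the rare classes −3 and 3 were never classified correctly. Names are hypothetical.

```python
# (correctly classified, misclassified) records per TSV class, from the text above.
PER_CLASS = {
    -3: (0, 3), -2: (72, 15), -1: (49, 32), 0: (29, 21),
    1: (16, 8), 2: (3, 3), 3: (0, 1),
}


def overall_accuracy(per_class: dict) -> tuple:
    """Return (correct, total, accuracy in %) from per-class counts."""
    correct = sum(ok for ok, _ in per_class.values())
    total = sum(ok + bad for ok, bad in per_class.values())
    return correct, total, 100.0 * correct / total


def per_class_recall(per_class: dict) -> dict:
    """Share of each class's records that were classified correctly."""
    return {c: ok / (ok + bad) for c, (ok, bad) in per_class.items()}


correct, total, acc = overall_accuracy(PER_CLASS)
print(f"{correct}/{total} correctly classified = {acc:.2f}%")  # 169/252 correctly classified = 67.06%
```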
Comparing the predicted TSV distributions, regression analysis most frequently predicted the cool category (−1), with 147 records, whereas naïve Bayes most frequently predicted the cold category (−2), with 104 records. This difference in the most frequent TSV shows that the two methods produce different results (Figure 5).
The naïve Bayes results showed greater variation and covered all TSV categories, while the linear regression predictions were clustered between 0 and −2. When both sets of predictions are compared with the actual field data, naïve Bayes is more accurate, because its results are closer to the actual field data.
The values generated by naïve Bayes are close to the actual field data, whereas the values from regression analysis simply increase steadily (Figure 6).
Comparing the predictions with the initial data gives the accuracy of both methods. Linear regression predicted 84 of the 252 records correctly, and naïve Bayes predicted 169 of the 252 records correctly. The accuracy of linear regression is therefore 33%, and that of naïve Bayes is 67%, so naïve Bayes is more accurate than multiple linear regression (Table 5).
4. Discussion
Age-based thermal comfort data remain relevant as a basis for formulating a thermal comfort model. Age is an essential factor in thermal comfort, and other studies have analyzed elderly respondents; thermal comfort models built on data from different age groups produce different findings. Research in Tibet in winter and summer with elderly respondents found differences in the acceptance of thermal comfort [33]. Individual thermal comfort responses are inseparable from the microclimate of each region. Solar radiation influences individual thermal comfort responses through its effect on the mean radiant temperature, and the air content also influences the thermal comfort response of individuals in an area [34]. Thermal comfort modeling more often uses regression analysis, but the use of machine learning has grown, so the methods applied are now more varied [35].
Regression analysis is still used to model thermal comfort in outdoor spaces. Regression remains accurate for evaluating outdoor thermal comfort when physiological parameters are included, and the resulting model can provide a design basis for thermally comfortable open spaces in urban parks [36]. Methods for modeling thermal comfort using the thermal sensation vote (TSV) have been compared, but further work is still needed to find the most accurate method. TSV is one of the appropriate variables for predicting thermal comfort, with an accuracy of 95.8%. Bayesian optimization is considered an accurate technique for building prediction models, and its algorithms can predict individual thermal comfort [31]. Other studies show that linear discriminant analysis (LDA) performs better than linear regression (LR), and that several algorithms perform differently in different cases. These findings can contribute to the study of subjective and objective perceptions of indoor thermal comfort in public buildings, thereby guiding architectural design, the intelligent control of ventilation systems, and the realization of human–building interaction interfaces [37].
In this study, naïve Bayes performed better than regression analysis, with a calculation accuracy of 67%. Another study compared naïve Bayes with artificial neural network (ANN), fuzzy logic (FL), and PMV-based algorithms and reported a naïve Bayes prediction accuracy of 73% [30], a difference of 6 percentage points from this study. Another study comparing several machine learning methods for predicting urban thermal comfort found that naïve Bayes achieved an accuracy of 40.43% [38]. That result differs considerably from ours, possibly because urban thermal comfort data differ from indoor data. Research on energy consumption savings comparing naïve Bayes and regression found results similar to those of this thermal comfort study: regression achieved an accuracy of 41.43% and naïve Bayes 73% [39]. The regression accuracy in that study differs from ours by 41.43 − 33 = 8.43 percentage points, and the naïve Bayes accuracy by 73 − 67 = 6 percentage points.
The predictions obtained from linear regression and naïve Bayes are not exact class values but are based on the closest value [40]. In linear regression, the predicted class is obtained by rounding the final value to the nearest class, whereas in naïve Bayes it is the class with the largest final score.
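The two decision rules just described can be written as small functions. Rounding halves upward (so −1.5 maps to −1) and clipping to the −3 to +3 scale are our assumptions about how ties and out-of-range regression outputs are handled; the text does not specify either. Names are hypothetical.

```python
import math
from typing import Dict

TSV_MIN, TSV_MAX = -3, 3


def regression_to_class(y_hat: float) -> int:
    """Nearest TSV class for a continuous regression output.

    Assumptions: halves round upward (math.floor(y + 0.5)), unlike Python's
    round(), which rounds halves to even; results are clipped to [-3, 3].
    """
    nearest = math.floor(y_hat + 0.5)
    return max(TSV_MIN, min(TSV_MAX, nearest))


def nb_to_class(scores: Dict[int, float]) -> int:
    """Class with the largest naive Bayes score."""
    return max(scores, key=scores.get)
```

For instance, the mean vote of −0.89 maps to class −1 under the rounding rule, while the naïve Bayes rule simply returns the key of the largest score.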
PMV (predicted mean vote) and PPD (predicted percentage of dissatisfied) values were obtained using the CBE Thermal Comfort Tool at https://comfort.cbe.berkeley.edu/ (accessed on 3 October 2022). Thermal variable data, namely air temperature, mean radiant temperature (globe temperature), air velocity, humidity, metabolic rate, and respondent activity, were entered into the tool to obtain the PMV and PPD values of all 252 respondents. Most PMV values fell in the range of −0.5 to −1, indicating that respondents felt close to cool (score: −1). Some respondents had a PMV of 0.5, indicating that they felt close to warm (score: 1). Overall, the PMV results in Figure 7 show that the respondents were neither too cold nor too hot.
The highest PPD value among the respondents reached 19%, the minimum was 5%, and the average was 9%; few respondents reached 19%. The PPD values generated using the software from https://comfort.cbe.berkeley.edu/software (accessed on 3 October 2022) indicate that respondents are predicted to accept the existing thermal conditions. The PPD values remain below 25%, which means that respondents are relatively comfortable with the existing thermal conditions (Figure 8).
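The PPD values above follow from PMV through the standard relation used in ISO 7730 and ASHRAE 55, PPD = 100 − 95·exp(−0.03353·PMV⁴ − 0.2179·PMV²). The sketch below evaluates it; PMV = 0 gives the 5% floor, which matches the minimum PPD reported above. This covers only the PMV-to-PPD step: computing PMV itself requires the full Fanger model, as the CBE tool does. The function name is hypothetical.

```python
import math


def ppd_from_pmv(pmv: float) -> float:
    """Predicted percentage of dissatisfied (%) from PMV (ISO 7730 / ASHRAE 55 relation)."""
    return 100.0 - 95.0 * math.exp(-0.03353 * pmv ** 4 - 0.2179 * pmv ** 2)


for pmv in (0.0, -0.5, -1.0, 0.5):
    print(f"PMV {pmv:+.1f} -> PPD {ppd_from_pmv(pmv):.1f}%")
```

The relation is symmetric in PMV, so a vote of −0.5 and one of +0.5 give the same PPD.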