Personalized Human Thermal Sensation Prediction Based on Bayesian-Optimized Random Forest

Yang, Hao; Ran, Maoyu

doi:10.3390/buildings15142539

by Hao Yang ^1,2,3 and Maoyu Ran ^1,2,*

¹ School of Architecture, Huaqiao University, Xiamen 361021, China

² Xiamen Key Laboratory of Ecological Building Construction, Xiamen 361021, China

³ Faculty of Forestry, University of British Columbia, Vancouver, BC V6T 1Z4, Canada

^* Author to whom correspondence should be addressed.

Buildings 2025, 15(14), 2539; https://doi.org/10.3390/buildings15142539

Submission received: 5 June 2025 / Revised: 29 June 2025 / Accepted: 17 July 2025 / Published: 19 July 2025

(This article belongs to the Special Issue Building Energy Performance and Thermal Comfort: Synergies and Challenges)

Abstract

Establishing a predictive model for human thermal sensation serves as the fundamental theoretical basis for intelligent control of building HVAC systems based on thermal comfort. The traditional Predicted Mean Vote (PMV) model exhibits low accuracy in predicting human thermal sensation and is not well suited for practical applications. In this study, real thermal sensation survey data were collected and used to first analyze the discrepancy between PMV model predictions and actual human thermal sensation. Subsequently, a simple thermal sensation prediction model was developed using multiple linear regression. More accurate personalized thermal sensation prediction models were then constructed using various machine learning algorithms, followed by a comparative analysis of their performance. Finally, the best-performing model was further optimized using Bayesian methods to enhance hyperparameter tuning efficiency and improve the accuracy of personalized human thermal sensation prediction.

Keywords: Bayesian optimization; Random Forest (RF); human thermal comfort; machine learning; PMV

1. Introduction

Indoor environments in buildings are the primary living spaces for humans. Studies have shown that people spend approximately 70% to 90% of their lives indoors [1]. The thermal conditions of indoor environments are not only closely related to human comfort and physiological health, but also have significant implications for energy consumption and environmental protection. Therefore, accurately assessing occupants’ thermal comfort is essential for maintaining a comfortable thermal environment while promoting energy efficiency. For example, in building air-conditioning system control, using occupants’ thermal sensation as feedback can improve both indoor thermal comfort and energy utilization efficiency.

Developing predictive models of human thermal sensation forms the fundamental theoretical basis for regulating indoor thermal environments based on human thermal comfort. The Predicted Mean Vote (PMV) model, proposed by Fanger in the 1970s [2], is one of the most widely used models for predicting thermal comfort. PMV is a thermal sensation prediction model based on mathematical and statistical analysis [3]. Over the years, it has been extensively applied by researchers worldwide to assess and predict indoor thermal comfort and human thermal perception [4,5,6], as well as in the control and operation of building air-conditioning systems [7,8,9,10]. However, traditional thermal comfort models such as PMV are based on steady-state environmental assumptions and therefore struggle to accurately reflect human thermal sensation in dynamic conditions. Numerous empirical studies have shown significant discrepancies between PMV predictions and actual measured thermal sensations, revealing the limitations of the model in real-world applications [11,12]. Firstly, the PMV model is designed to predict the average thermal sensation of a large population [13]. However, due to individual differences, people’s perceptions of thermal environments can vary significantly [14] and are influenced by factors such as gender [15,16,17] and body mass index (BMI) [18]. As a result, PMV predictions may not accurately represent the thermal sensation of smaller or specific subgroups [19,20]. Secondly, the PMV model requires six specific input variables: air temperature, relative humidity, mean radiant temperature, clothing insulation, air velocity, and metabolic rate. Among these, variables such as clothing insulation and metabolic rate are often difficult to measure accurately in practical settings. Therefore, they are frequently assumed or simplified [21], which can lead to significant deviations between the model’s predictions and actual human thermal sensation.
Thirdly, in addition to the six aforementioned parameters, human thermal sensation is also influenced by complex and nonlinear relationships with other environmental factors, such as outdoor humidity, wind speed, and radiant heat [22]. These additional factors are equally important in shaping thermal perception [23,24]. Consequently, the traditional PMV model often lacks sufficient predictive accuracy for real-world applications [25].

To enhance the accuracy of the Predicted Mean Vote (PMV) in assessing real thermal comfort, Toftum [26] developed the extended PMV (ePMV) model by introducing an adjustment factor “e” to account for expectancy effects. Later, Yao et al. [27] improved the PMV model by integrating an adaptive coefficient “λ,” leading to the adaptive PMV (aPMV) model for better thermal sensation prediction. While both ePMV and aPMV outperform the traditional PMV model, their application has limitations. The expectancy factor “e” varies depending on local climate and air-conditioning usage, while the adaptive coefficient “λ” necessitates extensive thermal comfort data from diverse climatic zones [28].

In recent years, with the rapid advancement of artificial intelligence technologies, machine learning methods have opened new possibilities for predicting human thermal sensation. Machine learning-based approaches are capable of capturing the complex and nonlinear relationships between thermal sensation and its influencing factors [29,30], making them an emerging focus in the field of thermal comfort research [31,32]. Compared with traditional thermal comfort models, machine learning models have demonstrated significantly higher prediction accuracy [33]. Previous studies have employed a variety of machine learning algorithms to develop thermal sensation prediction models, including Support Vector Regression (SVR) [34], Artificial Neural Networks (ANNs) [35], Random Forest (RF) [36,37], and K-Nearest Neighbors (KNN) [38], to improve model performance. The input features used across different studies vary, though most include parameters consistent with the PMV model, such as indoor air temperature, relative humidity, air velocity, mean radiant temperature, metabolic rate, and clothing insulation. Some studies have also considered physiological attributes of the subjects, such as gender [36], age [29], or body mass index (BMI) [30]. Recent studies in machine learning-based thermal comfort prediction are summarized in Table 1.

In the application of machine learning for predicting human thermal sensation, the choice of algorithm is crucial [36]. Different algorithms are based on varying principles and computational logic, which can lead to divergent prediction results when modeling the relationship between various parameters and thermal sensation [39]. In addition to algorithm selection, hyperparameters also play a significant role in influencing model performance [49]. Hyperparameter optimization methods are generally categorized into manual and automated search strategies. Manual search relies on the researcher’s experience and intuition to adjust hyperparameter combinations. However, this approach is often inefficient and may fail to achieve optimal results, especially for complex problems.

To address the limitations of manual tuning, automated optimization methods such as grid search [50] and random search [51] have been proposed. Grid search uses an exhaustive strategy to evaluate all possible hyperparameter combinations [52]. Although it is straightforward, the method becomes computationally expensive and inefficient in high-dimensional parameter spaces. In contrast, random search improves optimization efficiency by sampling hyperparameter combinations randomly, often identifying near-optimal solutions in a shorter time [53].

Nevertheless, both grid and random search are limited to local optimization. To overcome this, Bayesian optimization has been introduced as a more effective approach [54]. Extensive experimental evidence suggests that Bayesian optimization outperforms traditional methods in both global search efficiency and prediction accuracy, particularly when handling complex nonlinear problems [55]. However, its application in the context of human thermal sensation prediction remains relatively rare.

In summary, the traditional PMV model for predicting thermal sensation is limited by substantial computational errors, making it difficult to apply effectively in real-world engineering contexts. In contrast, machine learning algorithms offer a more efficient and flexible approach to predicting human thermal comfort. However, the differences in prediction accuracy among various algorithms require comprehensive comparative analysis. Furthermore, the performance of these models can be significantly enhanced through hyperparameter optimization, for which efficient methods such as Bayesian optimization are particularly well suited.

Therefore, based on collected real-world thermal sensation questionnaire data, this study first analyzes the discrepancy between the PMV model’s predicted values and actual human thermal sensation. Secondly, a simple predictive model is developed using multiple linear regression. Subsequently, several machine learning algorithms are employed to construct more accurate personalized thermal sensation prediction models, and a comparative analysis of their performance is conducted. Finally, the best-performing model is further optimized using Bayesian optimization to enhance hyperparameter tuning efficiency, thereby improving the predictive accuracy of personalized thermal sensation assessment.

2. Methodology

2.1. Data Collection

2.1.1. Measurement Equipment

The data collection site for this paper was located in a public office building. Indoor and outdoor environmental parameters were measured using various sensors. The types of parameters and corresponding sensor specifications are summarized in Table 2. The measured variables included indoor air temperature, indoor relative humidity, outdoor air temperature, outdoor relative humidity, outdoor wind speed, and outdoor solar radiation intensity. All sensors were manufactured by Campbell Scientific, Inc. (Logan, UT, USA).

To facilitate the fitting of the prediction model and to expand the variability of indoor thermal environment parameters as well as the range of occupants’ thermal sensations, the indoor air-conditioning system was turned off during questionnaire surveys conducted on 10 June, 8 July, 8 September, 25 November, and 2 December 2024.

It should be noted that although outdoor parameters (such as wind speed and solar radiation) do not directly determine thermal sensation, they can indirectly influence the indoor thermal environment through building envelope performance and HVAC dynamics (e.g., solar radiation increases indoor temperature; outdoor wind affects building heat loss). By incorporating these features, our data-driven model can empirically learn these underlying relationships, which is also a common practice in machine learning-based thermal comfort prediction studies.

2.1.2. Thermal Sensation Data Collection

This study involved a total of 7 participants. Before data collection, all participants were informed about the experimental content and procedures, and their informed consent was obtained. Although there are existing computer vision-based techniques for automatically extracting occupants’ clothing insulation, posture, and facial cues [56,57], considering the high cost and complexity of deploying these models in practical engineering applications, participants were instructed to adopt adaptive behaviors to adjust their clothing and activities during the experiment.

The thermal sensation questionnaires were collected on tablet computers using randomly generated participant IDs. No personal information such as names, contact details, device identifiers, or IP addresses was collected. All surveys were conducted in naturally formed indoor public environments without any artificial interventions (e.g., extreme high or low temperatures). No physiological or psychological risks were introduced.

Only subjective thermal sensation votes (TSV) were collected, following the ASHRAE standard 7-point scale (−3 to +3): very cold (−3), cold (−2), slightly cool (−1), neutral (0), slightly warm (+1), warm (+2), and very hot (+3). From June to December 2024, thermal sensation questionnaire data were continuously collected, resulting in a total of 2710 thermal sensation vote records.

2.2. Data Preprocessing

2.2.1. Data Standardization

Data standardization is a crucial step in data analysis that rescales each input variable to have a mean of 0 and a standard deviation of 1. The rescaling changes only the location and scale of a variable, not the shape of its distribution. This process eliminates the potential bias caused by differences in data scales and units, which could otherwise weaken the analysis.

Many machine learning algorithms are sensitive to the numerical range of features. If certain features have significantly larger value ranges, the model may disproportionately prioritize them, leading to degraded performance. For instance, temperature values may fluctuate between 0 and 40 °C, while humidity varies from 0–100%. Therefore, standardizing such continuous numerical variables removes the influence of their varying ranges and units.

In this study, the Z-score standardization method [58] is employed, calculated as follows:

z = \frac{x - μ}{σ}

(1)

where x = original feature value, μ = mean of the feature, σ = standard deviation of the feature, and z = the standardized values.
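As a minimal illustration of Equation (1), the sketch below standardizes a single feature using only the Python standard library. The temperature values are hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, since the text does not specify which one was used.

```python
import statistics

def z_score(values):
    """Standardize a list of numbers to zero mean and unit standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(x - mu) / sigma for x in values]

# Hypothetical indoor air temperatures in degrees Celsius (illustrative only)
temps = [22.0, 24.0, 26.0, 28.0, 30.0]
z = z_score(temps)
```

After the transformation, the feature has mean 0 and standard deviation 1 regardless of its original unit, so temperature and humidity become directly comparable in scale.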

2.2.2. Data Imbalance Handling

In this study, the target variable of thermal sensation employs a 7-point scale (−3 to +3) to characterize human subjective perception of thermal environments. Statistical analysis of the raw data reveals a significant imbalance in sample distribution, with the majority of ratings concentrated in the neutral range (−2 to +2) while extreme ranges (±3) are underrepresented. Directly training regression models with such imbalanced data may lead to overfitting for intermediate thermal sensation values and poor predictive performance for extreme states (e.g., “very hot” or “very cold”), ultimately compromising the model’s applicability and robustness in real-world thermal environment control.

To address this issue, this study employs the SMOGN algorithm (Synthetic Minority Over-Sampling Technique for Regression with Gaussian Noise) [59] for training data augmentation. SMOGN extends the traditional SMOTE algorithm to regression problems. It identifies “rare-value intervals” in the target variable and generates new synthetic samples within these intervals by interpolating between minority samples and their neighbors, while applying Gaussian noise to the target values. In this way, the data volume in sparse regions is expanded while the characteristics of the original data distribution are preserved.

For a given rare sample x_{i} and its neighbor x_{j} in feature space, the synthetic sample is generated as:

\tilde{x} = x_{i} + λ (x_{j} - x_{i}), λ ~ U (0, 1)

(2)

where \tilde{x} denotes the newly generated feature vector, λ is a random number uniformly distributed in [0, 1], and x_{i}, x_{j} represent the minority sample and its neighbor, respectively.

The corresponding target variable (thermal sensation value) is interpolated with added Gaussian noise:

\tilde{y} = y_{i} + λ (y_{j} - y_{i}) + ϵ, ϵ ~ N (0, σ^{2})

(3)

where \tilde{y} denotes the newly generated target value; y_{i}, y_{j} are the thermal sensation values of the original sample and its neighbor; and ϵ is a Gaussian noise term following a normal distribution with mean 0 and variance σ^{2}.

In this study, thermal sensation values are treated as continuous target variables, a relevance threshold is set to identify extreme-value intervals (±3 and adjacent ranges), and the default of five nearest neighbors (k = 5) is used for sample synthesis. This method significantly increases the sample size for extreme thermal sensation states and enhances the model’s ability to learn thermal discomfort states, which lays a data foundation for improving the accuracy of thermal environment prediction and optimizing subsequent energy-saving control strategies.
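The interpolation step in Equations (2) and (3) can be sketched as follows. This is only an illustration of that single step, not the full SMOGN procedure (rare-interval detection and neighbor search are omitted), and the function name, the noise level `sigma`, and the sample values are hypothetical.

```python
import random

def synthesize(x_i, y_i, x_j, y_j, sigma=0.05, rng=random):
    """Create one synthetic (features, target) pair from a rare sample and a neighbor."""
    lam = rng.random()                                       # lambda ~ U(0, 1)
    x_new = [a + lam * (b - a) for a, b in zip(x_i, x_j)]   # Equation (2)
    eps = rng.gauss(0.0, sigma)                              # epsilon ~ N(0, sigma^2)
    y_new = y_i + lam * (y_j - y_i) + eps                    # Equation (3)
    return x_new, y_new

# Hypothetical rare sample and neighbor: [indoor temperature, indoor humidity] -> vote
rng = random.Random(0)
x_syn, y_syn = synthesize([30.0, 60.0], 3.0, [32.0, 70.0], 2.0, sigma=0.0, rng=rng)
```

With `sigma=0.0` the noise term vanishes, so the synthetic point lies exactly on the segment between the two original samples; a positive `sigma` perturbs only the target value.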

2.3. Data Analysis

2.3.1. Multiple Linear Regression

As a core statistical method, multiple linear regression (MLR) quantifies the combined effect of several explanatory variables on a response variable by fitting a linear equation to observed data [60]. In this study, indoor and outdoor environmental parameters were first treated as independent variables, while thermal sensation was considered the dependent variable. MLR was employed to establish a mathematical–statistical relationship between them. The mathematical formulation of the MLR model can be expressed as:

Y = β_{0} + β_{1} X_{1} + β_{2} X_{2} + \dots + β_{k} X_{k}

(4)

where X₁, X₂, …, X_k represent the independent variables, Y denotes the dependent variable, β₁, β₂, …, β_k are the regression coefficients, and β₀ is the intercept (constant term). The regression coefficients were estimated using the least squares method.
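To make the least squares estimation concrete, the following sketch fits Equation (4) by solving the normal equations in plain Python. The helper names and the small data set are hypothetical, and solving the normal equations directly is chosen here for brevity; it is not necessarily the numerical procedure used in the study.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_mlr(X, y):
    """Return [beta_0, beta_1, ..., beta_k] minimizing the sum of squared errors."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    XtX = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    Xty = [sum(r[a] * t for r, t in zip(rows, y)) for a in range(k)]
    return solve(XtX, Xty)

# Hypothetical data: [temperature, humidity] with an exactly linear response
X = [[20.0, 40.0], [24.0, 50.0], [28.0, 45.0], [32.0, 60.0], [26.0, 55.0]]
y = [-5.0 + 0.2 * t + 0.01 * rh for t, rh in X]
beta = fit_mlr(X, y)
```

Because the toy response is exactly linear, the estimated coefficients recover the intercept and slopes used to generate it, up to floating-point error.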

2.3.2. Machine Learning Algorithms

(1): The Backpropagation Neural Network

The Backpropagation Neural Network (BPNN) is an Artificial Neural Network algorithm that employs the backpropagation algorithm to adjust synaptic weights and thresholds, enabling the network to fit training data with minimal error. A typical BPNN architecture consists of an input layer, one or more hidden layers, and an output layer, and is widely applied to both classification and regression tasks.

During training, the BPNN iteratively computes prediction errors via backpropagation and updates weights and thresholds to optimize the model’s fit to the training data. The algorithm operates in two distinct phases: (1) Forward propagation: Input signals are processed layer-by-layer to generate outputs. (2) Backward propagation: The output error is propagated from the output layer back toward the input layer, and the weights and thresholds of each layer are adjusted according to their contribution to that error.

(2): Random Forest

Random Forest (RF) is an ensemble learning method. Its core idea is to randomly select samples from the training data with replacement and choose a subset of features randomly to construct each decision tree. Since each decision tree is independent of others, Random Forest supports parallel computing, thereby significantly improving training efficiency. The randomness of Random Forest is reflected in two aspects: the randomness of data selection and the randomness of feature selection. This not only enhances the model’s generalization ability but also effectively prevents overfitting.

In Random Forest models, feature importance can be quantified using the Gini Importance method (also known as Mean Decrease in Impurity). This approach evaluates a feature’s contribution to model predictions based on the reduction in Gini impurity during node splitting in decision trees.

For the class distribution of samples at node t, the Gini impurity G(t) is defined as:

G (t) = 1 - \sum_{i = 1}^{C} {(p_{i})}^{2}

(5)

where C is the number of classes (the output dimension) and p_{i} is the proportion of samples belonging to class i at node t, that is, the number of class-i samples divided by T, the total number of samples at node t.

In Random Forests, the importance I(f) of feature f is calculated by summing the Gini impurity reductions across all trees where the feature was used for splitting:

I (f) = \frac{1}{N_{T}} \sum_{T} \sum_{t ∈ T, \mathrm{split}(t) = f} ΔG (t)

(6)

where N_{T} is the total number of decision trees and ΔG(t) represents the decrease in Gini impurity after splitting node t.

The final importance scores are normalized by the sum across all features and expressed as percentages.
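The quantities behind this importance measure can be illustrated with a short sketch. It uses the usual definition of Gini impurity, one minus the sum of squared class proportions, and assumes that ΔG(t) is the sample-weighted impurity decrease of a split, which is the common convention but is not spelled out in the text. The labels are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_decrease(parent, left, right):
    """Impurity decrease when a parent node is split into left and right children."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children

# Hypothetical node with two thermal sensation classes, split perfectly by one feature
parent = ["cool", "cool", "warm", "warm"]
decrease = gini_decrease(parent, ["cool", "cool"], ["warm", "warm"])
```

Summing such decreases over every node where a feature is used, and averaging over trees, gives the unnormalized importance of Equation (6).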

(3): Support Vector Regression

Support Vector Regression (SVR) is suitable for nonlinear, high-dimensional and small-sample regression modeling. The core concept involves: (1) utilizing kernel functions (Kernel Trick) to map input space to high-dimensional feature space; (2) constructing an optimal hyperplane in this space where predicted values deviate from actual values by no more than a preset tolerance; (3) maximizing model generalization capability. The method maintains SVM’s advantages while adapting to regression tasks through these specialized mechanisms.

(4): K-Nearest Neighbors

K-Nearest Neighbors Regression (KNN) is an instance-based nonparametric regression method suitable for predicting continuous target variables. Being nonparametric, it makes no assumptions about the functional form of the data and derives its predictions directly from the stored training samples. The size of the neighborhood is defined by the parameter K. Its core principle is as follows: given a test sample, the algorithm finds the K nearest neighbors in the feature space and uses the weighted average of these neighbors’ target values to predict the sample’s output. The predictive performance of KNN regression depends strongly on the distance metric and the choice of K. Data normalization is typically required so that distances are not dominated by features with large numerical ranges.
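A minimal sketch of this prediction rule is given below. It assumes Euclidean distance and inverse-distance weights; both are common choices but are assumptions here, since the text does not state which metric or weighting was used. The training data are hypothetical.

```python
import math

def knn_predict(train_X, train_y, query, k=5):
    """Predict a continuous target as the distance-weighted mean of the k nearest samples."""
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    nearest = ranked[:k]
    # If the query coincides with training samples, average those samples directly
    exact = [y for d, y in nearest if d == 0.0]
    if exact:
        return sum(exact) / len(exact)
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

# Hypothetical one-feature training set (already normalized) and thermal sensation votes
train_X = [[0.0], [1.0], [2.0], [10.0]]
train_y = [-1.0, 0.0, 1.0, 3.0]
pred_between = knn_predict(train_X, train_y, [1.5], k=2)
pred_exact = knn_predict(train_X, train_y, [1.0], k=2)
```

The query at 1.5 is equidistant from the samples at 1.0 and 2.0, so its prediction is the plain average of their votes.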

2.4. The Bayesian Optimization Method

Bayesian Optimization (BO) [61] is a sequential optimization approach based on probabilistic models, particularly suitable for black-box function optimization (where the objective function is expensive to evaluate, lacks an analytical expression, or contains noise). Its core concept involves modeling the distribution of the objective function using Gaussian Processes (GPs) and balancing exploration-exploitation through an acquisition function to progressively approach the global optimum. The key steps of the Bayesian optimization algorithm are as follows:

(1): Initialization: Randomly select a small number of initial points (x₁, x₂,…, x_n) and compute their objective function values y = f(x).
(2): Gaussian Process Modeling: Assume f(x) follows a Gaussian Process: f(x)∼GP(μ(x), k(x, x′)), where μ(x) is the mean function and k(x, x′) is the covariance kernel function (e.g., RBF kernel). Update the posterior distribution based on observed data to predict the mean and variance at new points x_new.
(3): Acquisition Function Optimization: Design an acquisition function α(x) (e.g., Expected Improvement EI, Upper Confidence Bound UCB) based on the posterior distribution to select the next evaluation point:

x_{next} = \underset{x}{\arg\max} α (x)

(7)
(4): Iterative Update: Compute f(x_next), add the new data to the observation set, update the Gaussian Process model, and repeat steps 2–3 until convergence.
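The four steps above can be sketched for a one-dimensional problem using only the Python standard library. This is a simplified illustration under several assumptions that are not taken from the study: a unit-variance RBF kernel with fixed length scale, a zero prior mean applied to centered observations, Expected Improvement as the acquisition function, a fixed candidate grid instead of a continuous search for the maximizer in Equation (7), a fixed iteration budget instead of a convergence test, and a toy objective standing in for a cross-validated model score.

```python
import math
import random

def rbf(a, b, length=1.0):
    """RBF covariance kernel k(x, x') for scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular L with L L^T = A, for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def backward_sub(L, b):
    """Solve L^T x = b for lower-triangular L."""
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_fit(xs, ys, noise=1e-4):
    """Step 2: condition a zero-mean GP on the centered observations."""
    y_mean = sum(ys) / len(ys)
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = backward_sub(L, forward_sub(L, [y - y_mean for y in ys]))
    return L, alpha, y_mean

def gp_predict(xs, model, x_new):
    """Posterior mean and standard deviation at x_new."""
    L, alpha, y_mean = model
    k_star = [rbf(x, x_new) for x in xs]
    mean = y_mean + sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(L, k_star)
    var = max(rbf(x_new, x_new) - sum(t * t for t in v), 1e-12)
    return mean, math.sqrt(var)

def expected_improvement(mean, std, best):
    """Expected Improvement acquisition value for a maximization problem."""
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best) * cdf + std * pdf

def bayes_opt(f, low, high, n_init=3, n_iter=10, seed=0):
    rng = random.Random(seed)
    xs = [rng.uniform(low, high) for _ in range(n_init)]    # step 1: initialization
    ys = [f(x) for x in xs]
    grid = [low + (high - low) * i / 200 for i in range(201)]
    for _ in range(n_iter):
        model = gp_fit(xs, ys)                               # step 2: GP posterior
        best = max(ys)
        x_next = max(grid, key=lambda x: expected_improvement(
            *gp_predict(xs, model, x), best))                # step 3: Equation (7)
        xs.append(x_next)                                    # step 4: update data
        ys.append(f(x_next))
    i_best = max(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]

def toy_objective(x):
    """Hypothetical stand-in for a model score, maximized at x = 2."""
    return -(x - 2.0) ** 2

x_best, y_best = bayes_opt(toy_objective, 0.0, 5.0)
```

In a hyperparameter tuning setting, `toy_objective` would be replaced by a function that trains the model with a given hyperparameter value and returns its validation score, which is why each evaluation is treated as expensive.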

2.5. Model Performance Evaluation Metrics

This study employs the coefficient of determination (R²) and root mean square error (RMSE) as two common metrics for evaluating the goodness of fit of regression models. These indicators have distinct emphases and applications. Both metrics are used in this research to assess the accuracy of machine learning-based thermal sensation prediction models. The calculation formulas are as follows:

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}

(8)

RMSE = \sqrt{\frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{n}}

(9)

where y_i = actual value, ŷ_i = predicted value, ȳ = mean of actual values, and n = sample size.

R² measures the model’s ability to explain data variability, indicating the goodness of fit between predicted values and actual data. Its value typically ranges between 0 and 1, with values closer to 1 indicating better model performance; it can become negative when a model predicts worse than simply using the mean of the actual values.

As a metric of prediction error, RMSE is derived by squaring the residuals, averaging them, and taking the square root. A smaller RMSE indicates less model error and more accurate predictions, and an RMSE approaching zero suggests excellent model fitting. Notably, RMSE shares the same unit as the predicted variable, making it highly interpretable.
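Equations (8) and (9) translate directly into code. The sketch below is a plain-Python illustration with hypothetical vote and prediction values.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error, Equation (9)."""
    squared = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination, Equation (8)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical thermal sensation votes and model predictions
y_true = [-1.0, 0.0, 1.0, 2.0]
y_pred = [-0.5, 0.0, 1.0, 1.5]
score_r2 = r2(y_true, y_pred)
score_rmse = rmse(y_true, y_pred)
```

For these values the residual sum of squares is 0.5 and the total sum of squares is 5, giving an R² of 0.9 and an RMSE of about 0.354 on the thermal sensation scale.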

3. Results and Analysis

3.1. Human Thermal Sensation Prediction Based on the PMV Model

By matching the timestamps of the recorded data, the correspondence between thermal environment parameters and thermal sensation questionnaire votes was established. Figure 1 and Figure 2 show the relationship between different thermal sensation votes and indoor thermal environment parameters.

From the figures, it can be seen that as the thermal sensation votes change from very cold (−3) to very hot (+3), the indoor air temperature gradually increases, which is consistent with expectations. However, for relative humidity, the distribution across different thermal sensation votes is relatively scattered, showing no significant increasing or decreasing trend.

The indoor thermal environment parameters collected by indoor sensors were used to predict human thermal sensation using the conventional PMV model. In the calculation process, according to the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) Standard [62], the default values were adopted as follows: indoor air velocity of 0.1 m/s, metabolic rate of 1.0 met (sedentary office work), clothing insulation of 0.6 clo in summer and 1.0 clo in winter. The radiant temperature was set equal to the indoor air temperature.

Figure 3 shows the comparison between actual human thermal sensation and predicted values from the conventional PMV model. It can be observed that there is a significant discrepancy between PMV predicted values and actual human thermal sensation, with the linear regression R² value being only 0.020 (red line in the figure), indicating very low accuracy.

Since PMV is an indicator calculated based on standardized formulas, it cannot directly reflect the actual thermal sensations of the subjects in this study. Additionally, thermal sensations vary among different subjects due to factors such as gender and body weight.

3.2. Human Thermal Sensation Prediction Model Based on Multiple Linear Regression

This section establishes a multiple linear regression prediction model for human thermal sensation based on parameters measured by indoor and outdoor sensors. The multivariate linear expression of the thermal sensation prediction model is shown in Equation (10):

TS = β_{0} + β_{1} T_{in} + β_{2} {RH}_{in} + β_{3} T_{out} + β_{4} {RH}_{out} + β_{5} {Tr}_{out} + β_{6} V_{out} + β_{7} {Ra}_{out}

(10)

where TS is the actual human thermal sensation, T_in is the indoor air temperature (°C), RH_in is the indoor relative humidity (%), T_out is the outdoor air temperature (°C), RH_out is the outdoor relative humidity (%), Tr_out is the outdoor radiant temperature (°C), V_out is the outdoor wind speed (m/s), Ra_out is the outdoor solar radiation, β₀ is the intercept, and β₁, β₂, …, β₇ are the regression coefficients. The regression analysis results are presented in Table 3, where B represents the regression coefficient (β value) for each variable.

Figure 4 shows the comparison between predicted values (calculated using the overall thermal sensation linear model without considering individual differences among subjects) and actual measured values, yielding an R² of 0.423. This relatively low prediction accuracy primarily results from significant individualized variations in thermal sensation among different subjects. Consequently, this population-level prediction model proves unsuitable for predicting individual thermal sensations, necessitating the development of personalized thermal sensation prediction models for individual subjects.

Figure 5 presents a comparison between predicted results and actual observed values for personalized thermal sensation models of four representative subjects, developed using the multiple linear regression method. For each subject, the aforementioned multiple regression modeling procedure was independently repeated to establish their corresponding personalized thermal sensation prediction model. Compared with the overall thermal sensation linear model in Figure 4, the predicted values of personalized models are closer to the actual values, with significantly improved model fitting.

The average adjusted R² value of all subjects’ personalized thermal sensation prediction models was 0.823, which is 40.01 percentage points higher than the R² of the overall prediction model (0.423) that did not consider personalized thermal sensation. This indicates that accounting for individual differences yields higher prediction accuracy and better reflects subjects’ personalized thermal sensations.

Since human thermal sensation is a fuzzy concept, prediction results can generally be considered accurate when the deviation between predicted and actual values is less than 0.5. However, the average RMSE value of the personalized thermal sensation model based on multiple linear regression is 1.27, indicating relatively large errors. Therefore, more complex and precise models are required for predicting and analyzing human thermal sensation.

3.3. Personalized Human Thermal Sensation Prediction Models Based on Machine Learning

3.3.1. Inputs and Outputs of the Model

This section employs machine learning algorithms that are more complex and accurate than linear regression methods to predict human thermal sensation. In addition to the environmental parameters obtained from indoor and outdoor sensors, subject information is also included as input to the prediction model. Therefore, the input and output variables of the machine learning-based human thermal sensation prediction model are summarized in Table 4.

To account for individual differences among users, this study incorporates user ID as one of the input features. Since user ID represents a categorical variable, we employ One-Hot Encoding to transform it into a seven-dimensional dummy variable vector, thereby avoiding the risk of introducing artificial ordinal relationships that may arise from categorical variables. This approach effectively preserves inter-individual variation information and facilitates the model’s learning of user-specific thermal sensation response patterns.
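As a brief illustration of this encoding, the sketch below maps a participant ID to a seven-dimensional indicator vector and appends it to the environmental features. The ID strings and feature values are hypothetical.

```python
def one_hot(user_id, all_ids):
    """Return an indicator vector with a single 1 at the position of user_id."""
    if user_id not in all_ids:
        raise ValueError(f"unknown user id: {user_id}")
    return [1 if user_id == u else 0 for u in all_ids]

# Hypothetical anonymized IDs for the seven participants
all_ids = ["P1", "P2", "P3", "P4", "P5", "P6", "P7"]

# Hypothetical environmental readings followed by the encoded participant ID
env_features = [26.5, 55.0]  # indoor air temperature (°C), indoor relative humidity (%)
encoded = one_hot("P3", all_ids)
model_input = env_features + encoded
```

Because each participant occupies a separate dimension, no ordering such as P1 < P2 is implied, which would be the case if the IDs were passed to the model as the integers 1 to 7.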

3.3.2. Hyperparameter Optimization for Machine Learning Models

This section employs four machine learning algorithms: Random Forest, Support Vector Regression, the Backpropagation Neural Network, and K-Nearest Neighbors. Hyperparameter optimization was performed using grid search combined with 5-fold cross-validation. The search ranges for each parameter and the resulting optimal parameters are summarized in Table 5.
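The combination of grid search and 5-fold cross-validation can be sketched in plain Python as follows. The sketch is illustrative only: the fold assignment, the one-feature KNN regressor, the parameter grid, and the data are hypothetical and do not reproduce the search ranges in Table 5.

```python
import itertools
import math

def k_fold_indices(n, k=5):
    """Assign sample indices to k interleaved folds (no shuffling, for reproducibility)."""
    return [list(range(i, n, k)) for i in range(k)]

def cv_rmse(fit_predict, X, y, params, k=5):
    """Mean RMSE over k folds for one hyperparameter combination."""
    scores = []
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in fold], **params)
        mse = sum((y[i] - p) ** 2 for i, p in zip(fold, preds)) / len(fold)
        scores.append(math.sqrt(mse))
    return sum(scores) / len(scores)

def grid_search(fit_predict, X, y, grid, k=5):
    """Evaluate every combination in the grid and return the best (params, score)."""
    names = list(grid)
    best = None
    for combo in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = cv_rmse(fit_predict, X, y, params, k)
        if best is None or score < best[1]:
            best = (params, score)
    return best

def knn_fit_predict(train_X, train_y, test_X, n_neighbors=3):
    """Toy one-feature KNN regressor using the unweighted mean of the neighbors."""
    preds = []
    for q in test_X:
        nearest = sorted(zip((abs(x - q) for x in train_X), train_y))[:n_neighbors]
        preds.append(sum(t for _, t in nearest) / len(nearest))
    return preds

# Hypothetical data: one feature with a linear response
X = [float(i) for i in range(20)]
y = [0.1 * x for x in X]
best_params, best_score = grid_search(knn_fit_predict, X, y,
                                      {"n_neighbors": [1, 2, 5, 15]})
```

The number of model fits grows as the product of the grid sizes times the number of folds, which is the cost that motivates more sample-efficient search strategies when several hyperparameters are tuned together.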

The visualization of grid search results for each machine learning model is presented in Figure 6. It should be noted that although the optimization process involves multiple parameters, the three-dimensional visualization can only demonstrate the variation process of two parameters.

3.3.3. Prediction Results and Comparison of Machine Learning Models

Using the optimal hyperparameters obtained through the grid search method described in the previous section, we established thermal sensation prediction models based on various machine learning algorithms. The prediction accuracy is presented in Table 6.

The Random Forest algorithm outperformed the other algorithms in thermal sensation prediction, achieving an RMSE of 0.595 and an R² of 0.916; its R² is 3.51 percentage points higher than that of the least accurate model, Support Vector Regression (SVR). Figure 7 compares the predicted and actual values of personalized thermal sensation for the Random Forest algorithm.

Although the Random Forest-based thermal sensation prediction model tuned by grid search is relatively accurate, its RMSE of 0.595 exceeds 0.5. Because adjacent levels of the thermal sensation scale are spaced 1 apart, a prediction error greater than 0.5 places the prediction closer to a neighboring level than to the actual vote, which amounts to a misjudgment of thermal sensation. Further optimization of the Random Forest algorithm is therefore needed to improve its prediction accuracy.
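The 0.5 threshold can be illustrated with a short sketch. It assumes a seven-point scale running from −3 to +3 (the text states only that levels are spaced 1 apart), and the helper names and example values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between actual and predicted votes."""
    return math.sqrt(sum((t - q) ** 2 for t, q in zip(y_true, y_pred)) / len(y_true))


def nearest_vote(prediction, low=-3, high=3):
    """Round a continuous prediction to the nearest integer vote, clipped to the scale.

    The -3..+3 range is an assumption made for this illustration.
    """
    return max(low, min(high, math.floor(prediction + 0.5)))


actual = 1
print(nearest_vote(actual + 0.4))  # error below 0.5: still maps to the actual vote
print(nearest_vote(actual + 0.6))  # error above 0.5: maps to the next level up
```

An RMSE is an aggregate over all samples, so an RMSE below 0.5 does not guarantee that every individual prediction rounds to the correct level; it only indicates that the typical error is smaller than half a scale step.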

3.4. Personalized Human Thermal Sensation Prediction Model Based on Bayesian-Optimized Random Forest

In the previous section, manual grid search was used to tune the hyperparameters of the Random Forest algorithm. It examined the influence of two parameters (the number of trees and the maximum depth) on model accuracy and thereby identified an initial, locally optimal solution. However, grid search exhaustively evaluates a predefined set of parameter values, so its cost grows rapidly with the number of hyperparameters and it cannot test values that lie between grid points. These limitations make it difficult to identify the optimal parameter combination both comprehensively and efficiently.

To address these limitations, this section adopts Bayesian optimization to tune the Random Forest hyperparameters, aiming to enhance both optimization efficiency and prediction accuracy.

The Random Forest model's hyperparameters include not only the number of trees and the maximum depth, but also the minimum number of samples required to split an internal node, the minimum number of samples required at a leaf node, the maximum number of features considered when splitting a node, and the criterion for measuring split quality. These six parameters correspond, respectively, to the following parameters in the scikit-learn library (version 1.0.2, Python 3.8): n_estimators, max_depth, min_samples_split, min_samples_leaf, max_features, and criterion.

This study employs 5-fold cross-validation for model evaluation to mitigate uncertainty caused by random data splitting and enhance the robustness of hyperparameter optimization results. During the Bayesian optimization process, each candidate hyperparameter combination undergoes 5-fold cross-validation on the training data, with the average R² serving as the optimization objective.

The search space encompasses all six Random Forest hyperparameters (see Table 7). The optimization uses Gaussian Process Regression as the surrogate model and GP-UCB (Gaussian Process Upper Confidence Bound) as the acquisition function, with the maximum number of iterations set to 50. The β parameter is set to 2.576, a commonly used value corresponding to a 99% confidence interval.
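The role of β can be illustrated with a small sketch. It assumes the acquisition function has the common form μ(x) + β·σ(x), where μ and σ are the surrogate's predicted mean and standard deviation (some formulations use √β instead, and the text does not specify which). The candidate names and surrogate values below are invented.

```python
from statistics import NormalDist

BETA = 2.576  # value reported in the text


def ucb(mu, sigma, beta=BETA):
    """Upper confidence bound: surrogate mean plus beta surrogate standard deviations."""
    return mu + beta * sigma


# The two-sided 99% interval of a standard normal ends at its 99.5th percentile.
z_99 = NormalDist().inv_cdf(0.995)
print(round(z_99, 3))

# Hypothetical surrogate predictions (mean R², standard deviation) for three candidates.
candidates = {"A": (0.930, 0.002), "B": (0.925, 0.010), "C": (0.900, 0.015)}
next_point = max(candidates, key=lambda name: ucb(*candidates[name]))
print(next_point)
```

Candidate A has the highest predicted mean, yet the bound favors B because its larger uncertainty leaves more room for improvement. A larger β pushes the search toward exploration of uncertain regions; a smaller β favors exploitation of regions already known to perform well.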

Table 7 presents both the optimization ranges and final optimal values for the hyperparameters of the Random Forest thermal sensation prediction model.

Figure 8 visualizes the optimization process. Although six parameters were explored in total, a three-dimensional plot can display the variation of only two of them: the number of trees and the maximum depth.

Because Bayesian optimization samples parameters at irregular points (as opposed to the regularly spaced values used in grid search), the visualization cannot form a three-dimensional surface. Instead, scatter points represent the exploration process, with each point corresponding to one hyperparameter combination.

After completing Bayesian optimization, this study applied the optimized hyperparameter combination to the Random Forest model and evaluated its performance using 5-fold cross-validation on the full dataset. In each fold, both the pre-optimization and post-optimization models were trained, and their R² and RMSE values on the test subsets were recorded.

To ensure fair comparison, the pre-optimization model also used the optimal hyperparameter combination determined by grid search, and was evaluated under the same data splits using 5-fold cross-validation. Each fold maintained identical training-test splits, with both models’ R² and RMSE values recorded on each test fold, as shown in Table 8.

The Bayesian-optimized model achieved an average R² of 0.945, which is 2.89 percentage points higher than that of the pre-optimization model (Table 8); Figure 9 compares its predicted and actual values. This indicates that Bayesian optimization can further improve prediction accuracy relative to grid search. The optimized thermal sensation prediction model yielded an average RMSE of 0.393, which is below 0.5, indicating that it reflects actual human thermal sensation well.
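The reported figure of 2.89 matches the absolute difference between the two mean R² values in Table 8. The short calculation below, using only those table means, shows the absolute and relative changes side by side; the variable names are illustrative.

```python
# Mean values taken from Table 8.
r2_grid, r2_bo = 0.9162, 0.9451
rmse_grid, rmse_bo = 0.595, 0.393

r2_absolute_gain = (r2_bo - r2_grid) * 100            # in percentage points
r2_relative_gain = (r2_bo - r2_grid) / r2_grid * 100  # in percent of the Grid-RF value
rmse_relative_drop = (rmse_grid - rmse_bo) / rmse_grid * 100

print(round(r2_absolute_gain, 2))    # 2.89
print(round(r2_relative_gain, 2))    # 3.15
print(round(rmse_relative_drop, 2))  # 33.95
```

Because R² is already close to 1 for both models, the change in RMSE (roughly a one-third reduction) conveys the practical size of the improvement more clearly than the change in R².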

To evaluate the statistical significance of the optimization strategy’s impact on model performance, this study employed paired difference tests to compare the R² and RMSE values between pre- and post-optimization models across the 5-fold cross-validation. Given the small sample size (n = 5), the Wilcoxon signed-rank test was adopted to ensure robust analysis. Both R² and RMSE comparisons yielded statistically significant results (p < 0.05), confirming that the Bayesian optimization strategy significantly improved model performance, with the optimized model demonstrating consistently superior predictive accuracy across all folds.
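For a sample this small, the exact null distribution of the Wilcoxon signed-rank statistic can be enumerated directly. The sketch below is a hand-written illustration (not a library call, and not necessarily the procedure the authors ran) applied to the per-fold RMSE values in Table 8. The RMSE values are used because their paired differences contain no ties; the function name is hypothetical.

```python
from itertools import product


def wilcoxon_exact(x, y):
    """Exact Wilcoxon signed-rank test for paired samples without zero or tied differences.

    Returns (w_plus, p_one_sided, p_two_sided); the one-sided alternative is
    that values in x tend to be larger than the paired values in y.
    """
    diffs = [a - b for a, b in zip(x, y)]
    abs_sorted = sorted(abs(d) for d in diffs)
    if 0 in abs_sorted or len(set(abs_sorted)) != len(abs_sorted):
        raise ValueError("zero or tied differences need a tie-aware method")
    ranks = {value: i + 1 for i, value in enumerate(abs_sorted)}
    w_plus = sum(ranks[abs(d)] for d in diffs if d > 0)
    n = len(diffs)
    # Under the null hypothesis every sign pattern is equally likely: enumerate all 2**n.
    null = [
        sum(rank for rank, positive in zip(range(1, n + 1), signs) if positive)
        for signs in product((0, 1), repeat=n)
    ]
    p_greater = sum(w >= w_plus for w in null) / len(null)
    p_less = sum(w <= w_plus for w in null) / len(null)
    return w_plus, p_greater, min(1.0, 2 * min(p_greater, p_less))


rmse_grid = [0.613, 0.584, 0.601, 0.591, 0.578]  # Grid-RF, Table 8
rmse_bo = [0.392, 0.395, 0.389, 0.394, 0.395]    # BO-RF, Table 8
w_plus, p_one_sided, p_two_sided = wilcoxon_exact(rmse_grid, rmse_bo)
print(w_plus, p_one_sided, p_two_sided)
```

With five pairs that all favor the same model, the statistic takes its extreme value of 15, giving an exact one-sided p-value of 1/32 ≈ 0.031 and a two-sided p-value of 1/16 = 0.0625. These are the smallest values attainable with n = 5, so a result of p < 0.05 is consistent with a one-sided test; the text does not state which alternative was used.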

3.5. Feature Importance Analysis

To further investigate the contribution of input variables to thermal sensation prediction, this study calculated the feature importance of each environmental parameter based on the trained Random Forest model (see Figure 10). The results indicate that indoor air temperature is the most influential factor affecting human thermal sensation, followed by outdoor solar radiation and outdoor air temperature. This suggests that external environmental conditions (e.g., solar load and temperature) indirectly influence indoor thermal environments through building envelopes and heat exchange processes, thereby affecting human thermal perception.

Additionally, indoor relative humidity exhibits a moderate impact on thermal sensation, whereas outdoor wind speed and humidity show relatively lower importance. This aligns with real-world building physics: although outdoor parameters such as wind speed and humidity do not directly affect human occupants, they influence building thermal balance and HVAC system loads, which in turn affect thermal comfort to some extent.

Notably, while this study did not incorporate physiological variables (e.g., metabolic rate and clothing insulation from PMV models), the inclusion of multidimensional environmental parameters—particularly outdoor solar radiation—enables the model to effectively capture the combined effects of indoor and outdoor thermal conditions, thereby indirectly achieving high-accuracy thermal sensation modeling. This further demonstrates the practical significance of incorporating external environmental data in thermal sensation prediction, especially in building control scenarios where physiological parameters are difficult to obtain, highlighting its strong engineering applicability and potential for widespread adoption.
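The text does not state which importance measure underlies Figure 10. As a model-agnostic illustration, the sketch below uses permutation importance, which is a swapped-in technique and not necessarily the one used in the study: each feature column is shuffled in turn, and the resulting increase in prediction error is taken as that feature's importance. The stand-in model, data, and function names are all hypothetical.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - q) ** 2 for t, q in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Stand-in for a trained model (hypothetical): depends on indoor temperature only."""
    t_in, rh_out = row
    return 0.35 * (t_in - 25.0)


data_rng = random.Random(42)
X = [[data_rng.uniform(18, 32), data_rng.uniform(30, 90)] for _ in range(40)]
y = [toy_model(row) for row in X]

imp_t_in, imp_rh_out = permutation_importance(toy_model, X, y)
print(imp_t_in, imp_rh_out)
```

In this toy setting the feature the model ignores receives an importance of exactly zero, while shuffling the temperature column degrades the predictions. With a real trained model, correlated inputs (for example outdoor temperature and solar radiation) can share or mask importance, so rankings should be read with that in mind.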

4. Discussion

This study aims to improve the prediction accuracy of human thermal sensation in built environments by systematically comparing traditional models with machine learning methods and then further optimizing the best-performing model. The discussion covers model validity, input feature selection, practical application value, and limitations.

(1): Experimental results indicate that the traditional PMV model struggles to accurately reflect individuals’ actual thermal sensations in most scenarios, particularly under conditions of significant individual variability or intense environmental fluctuations, where its predictive deviation is substantial. This aligns with existing research conclusions that the PMV model is effective for standardized populations but lacks adaptability to individuals.
(2): In contrast, the multiple linear regression model improves fitting performance to some extent, but it still exhibits errors in scenarios with complex nonlinear relationships or prominent feature interactions. Nevertheless, the results show that personalized regression models predict thermal sensation more accurately than a single aggregated regression model.
(3): The introduction of machine learning models significantly improved thermal sensation prediction accuracy, demonstrating the advantages of data-driven approaches for this task. Machine learning models can capture complex nonlinear feature mapping relationships and generalized better than the linear models considered here. This study aims to explore a low-complexity, engineering-feasible personalized thermal sensation prediction model for practical deployment in smart building control systems. The four machine learning models employed (Random Forest, Support Vector Regression, Backpropagation Neural Network, and K-Nearest Neighbors) span ensemble, kernel-based, neural network, and distance-based learning approaches, and their performance differs clearly. Overall prediction accuracy meets practical application requirements. Given the low-dimensional input features (only six indoor and outdoor environmental variables) and the relatively limited sample size, data-intensive deep learning architectures such as LSTM or Bi-LSTM were not adopted. Moreover, this study prioritizes model interpretability and engineering feasibility for real-world deployment over the use of complex black-box neural networks.
(4): Incorporating Bayesian optimization into the hyperparameter tuning process further enhanced the predictive performance of the Random Forest model. Compared to traditional grid search, Bayesian optimization identified a superior hyperparameter combination with fewer evaluations, ultimately improving model accuracy. Wilcoxon signed-rank test results confirmed statistically significant optimization effects (p < 0.05), indicating the strategy’s effectiveness and efficiency in model fine-tuning.
(5): This study used real-world building environment data and selected six environmental parameters as input variables: indoor air temperature, indoor relative humidity, outdoor air temperature, outdoor relative humidity, outdoor wind speed, and outdoor solar radiation. Feature importance analysis via the Random Forest model revealed that indoor air temperature remains the primary factor influencing thermal sensation, while outdoor solar radiation and temperature also play non-negligible roles. Although these external parameters do not directly affect the human body, they indirectly contribute to thermal perception by influencing the indoor thermal environment.
(6): However, this study has certain limitations: First, the limited sample size necessitates further validation of model generalizability across larger-scale buildings in different climatic zones. Second, while real-world survey data were used, individual characteristics (e.g., age, gender, and physique) were not incorporated into the model. Future research could build upon this foundation to develop a more comprehensive personalized modeling framework.

5. Conclusions

This study used actual human thermal sensation survey data to establish both a linear regression model and machine learning models for predicting human thermal sensation. After the predictive accuracy of the candidate algorithms was compared, the Random Forest algorithm, which achieved the highest accuracy, was selected for the personalized human thermal sensation prediction model. Bayesian optimization was then employed to further tune the hyperparameters of the Random Forest algorithm, yielding a Bayesian-optimized Random Forest model for personalized human thermal sensation prediction. The main conclusions are as follows:

(1): The predictions of the traditional PMV model differ significantly from actual human thermal sensation votes, with a linear-fit R² of only 0.02, indicating that the traditional PMV model represents actual human thermal sensation poorly.
(2): The personalized thermal sensation linear prediction model for participants achieved an average R² value of 0.823, representing a 40.01% improvement compared to the generalized prediction model that does not account for personalized thermal sensation.
(3): The machine learning models predict more accurately than the simple linear regression models. With hyperparameters determined by grid search, the four algorithms rank, in descending order of predictive accuracy, as follows: Random Forest, BPNN, K-Nearest Neighbors, and Support Vector Regression. Random Forest performed best in human thermal sensation prediction, achieving an R² of 0.916.
(4): Applying Bayesian optimization to the hyperparameters of the Random Forest personalized thermal sensation prediction model increased its R² to 0.945, which is 2.89 percentage points higher than that of the grid-search-tuned model, and reduced its RMSE to 0.393. The Wilcoxon signed-rank test indicated that the improvement is statistically significant (p < 0.05).
(5): Feature importance analysis revealed that indoor air temperature is the primary factor influencing thermal sensation, followed by outdoor solar radiation, outdoor air temperature, indoor relative humidity, outdoor wind speed, and outdoor relative humidity. Although outdoor environmental parameters do not directly affect the human body, they indirectly contribute to thermal perception by influencing the indoor thermal environment.

Author Contributions

Conceptualization, H.Y.; formal analysis, H.Y.; investigation, H.Y.; methodology, H.Y.; resources, M.R.; software, H.Y.; supervision, M.R.; validation, M.R.; visualization, H.Y.; writing—original draft, H.Y.; writing—review and editing, M.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (51678254), the 14th Five-Year Plan National Key Research and Development Program (2024YFC3809304) and the Natural Science Foundation of Xiamen City (3502Z202473052).

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy reasons.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] Liu, M.; Ooka, R.; Choi, W.; Ikeda, S. Experimental and numerical investigation of energy saving potential of centralized and decentralized pumping systems. Appl. Energy 2019, 251, 113359.
[2] Fanger, P. Thermal Comfort: Analysis and Applications in Environmental Engineering; Danish Technical Press: København, Denmark, 1970.
[3] ASHRAE Standard 55; Thermal Environmental Conditions for Human Occupancy. ASHRAE: Atlanta, GA, USA, 2013.
[4] Yan, H.; Mao, Y.; Yang, L. Thermal adaptive models in the residential buildings in different climate zones of Eastern China. Energy Build. 2017, 141, 28–38.
[5] Bermejo, P.; Redondo, L.; de la Ossa, L.; Rodríguez, D.; Flores, J.; Urea, C.; Gámez, J.A.; Puerta, J.M. Design and simulation of a thermal comfort adaptive system based on fuzzy logic and on-line learning. Energy Build. 2012, 49, 367–379.
[6] Hussain, S.; Gabbar, H.A.; Bondarenko, D.; Musharavati, F.; Pokharel, S. Comfort-based fuzzy control optimization for energy conservation in HVAC systems. Control. Eng. Pract. 2014, 32, 172–182.
[7] Ku, K.; Liaw, J.-S.; Tsai, M.; Liu, T. Automatic control system for thermal comfort based on predicted mean vote and energy saving. IEEE Trans. Autom. Sci. Eng. 2014, 12, 378–383.
[8] Klaučo, M.; Kvasnica, M. Explicit MPC approach to PMV-based thermal comfort control. In Proceedings of the 53rd IEEE Conference on Decision and Control, Los Angeles, CA, USA, 15–17 December 2014; IEEE: Piscataway, NJ, USA, 2015; pp. 4856–4861.
[9] Ascione, F.; Bianco, N.; De Stasio, C.; Mauro, G.M.; Vanoli, G.P. Simulation-based model predictive control by the multi-objective optimization of building energy performance and thermal comfort. Energy Build. 2016, 111, 131–144.
[10] Nowak, M.; Urbaniak, A. Application of predictive control algorithms for thermal comfort and energy saving in the classroom. In Proceedings of the 2016 17th International Carpathian Control Conference (ICCC), High Tatras, Slovakia, 29 May–1 June 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 527–532.
[11] Du, H.; Lian, Z.; Lai, D.; Duanmu, L.; Zhai, Y.; Cao, B.; Zhang, Y.; Zhou, X.; Wang, Z.; Zhang, X. Evaluation of the accuracy of PMV and its several revised models using the Chinese thermal comfort Database. Energy Build. 2022, 271, 112334.
[12] Lan, H.; Hou, H.C.; Gou, Z. A machine learning led investigation to understand individual difference and the human-environment interactive effect on classroom thermal comfort. Build. Environ. 2023, 236, 110259.
[13] Gao, S.; Ooka, R.; Oh, W. Experimental investigation of the effect of clothing insulation on thermal comfort indices. Build. Environ. 2021, 187, 107393.
[14] Liu, G.; Luo, X.; Yu, J.; Sun, Y.; Zhang, B. Prediction model for personalized thermal comfort of indoor office workers based on non-skin contact wearable device. Build. Environ. 2025, 272, 112686.
[15] Karjalainen, S. Thermal comfort and gender: A literature review. Indoor Air 2012, 22, 96–109.
[16] Maykot, J.K.; Rupp, R.F.; Ghisi, E. A field study about gender and thermal comfort temperatures in office buildings. Energy Build. 2018, 178, 254–264.
[17] Haselsteiner, E. Gender matters! Thermal comfort and individual perception of indoor environmental quality: A literature review. In Rethinking Sustainability Towards a Regenerative Economy; Springer: Cham, Switzerland, 2021; pp. 169–200.
[18] Thapa, S. Insights into the thermal comfort of different naturally ventilated buildings of Darjeeling, India–Effect of gender, age and BMI. Energy Build. 2019, 193, 267–288.
[19] Luo, M.; Wang, Z.; Ke, K.; Cao, B.; Zhai, Y.; Zhou, X. Human metabolic rate and thermal comfort in buildings: The problem and challenge. Build. Environ. 2018, 131, 44–52.
[20] Ji, W.; Luo, M.; Cao, B.; Zhu, Y.; Geng, Y.; Lin, B. A new method to study human metabolic rate changes and thermal comfort in physical exercise by CO₂ measurement in an airtight chamber. Energy Build. 2018, 177, 402–412.
[21] Chaudhuri, T.; Soh, Y.C.; Bose, S.; Xie, L.; Li, H. On assuming Mean Radiant Temperature equal to air temperature during PMV-based thermal comfort study in air-conditioned buildings. In Proceedings of the IECON 2016 - 42nd Annual Conference of the IEEE Industrial Electronics Society, Florence, Italy, 23–26 October 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 7065–7070.
[22] Li, J.; Liu, N. The perception, optimization strategies and prospects of outdoor thermal comfort in China: A review. Build. Environ. 2020, 170, 106614.
[23] Yin, J.; Zheng, Y.; Wu, R.; Tan, J.; Ye, D.; Wang, W. An analysis of influential factors on outdoor thermal comfort in summer. Int. J. Biometeorol. 2012, 56, 941–948.
[24] Sangkertadi, S.; Syafriny, R. Pair influence of wind speed and mean radiant temperature on outdoor thermal comfort of humid tropical environment. J. Urban Environ. Eng. 2016, 10, 177–185.
[25] Zhang, Y.; Zhou, X.; Zheng, Z.; Oladokun, M.O.; Fang, Z. Experimental investigation into the effects of different metabolic rates of body movement on thermal comfort. Build. Environ. 2020, 168, 106489.
[26] Fanger, P.O.; Toftum, J. Extension of the PMV model to non-air-conditioned buildings in warm climates. Energy Build. 2002, 34, 533–536.
[27] Yao, R.; Li, B.; Liu, J. A theoretical adaptive model of thermal comfort - Adaptive Predicted Mean Vote (aPMV). Build. Environ. 2009, 44, 2089–2096.
[28] Li, N.; Yu, W.; Li, B. Assessing adaptive thermal comfort using artificial neural networks in naturally-ventilated buildings. Int. J. Vent. 2012, 11, 205–218.
[29] Gao, N.; Shao, W.; Rahaman, M.S.; Zhai, J.; David, K.; Salim, F.D. Transfer learning for thermal comfort prediction in multiple cities. Build. Environ. 2021, 195, 107725.
[30] Wang, Z.; Yu, H.; Luo, M.; Wang, Z.; Zhang, H.; Jiao, Y. Predicting older people’s thermal sensation in building environment through a machine learning approach: Modelling, interpretation, and application. Build. Environ. 2019, 161, 106231.
[31] Zhang, W.; Wen, Y.; Tseng, K.J.; Jin, G. Demystifying thermal comfort in smart buildings: An interpretable machine learning approach. IEEE Internet Things J. 2020, 8, 8021–8031.
[32] Gao, G.; Li, J.; Wen, Y. DeepComfort: Energy-efficient thermal comfort control in buildings via reinforcement learning. IEEE Internet Things J. 2020, 7, 8472–8484.
[33] Megri, A.C.; El Naqa, I. Prediction of the thermal comfort indices using improved support vector machine classifiers and nonlinear kernel functions. Indoor Built Environ. 2016, 25, 6–16.
[34] Jiang, L.; Yao, R.J.B. Modelling personal thermal sensations using C-Support Vector Classification (C-SVC) algorithm. Build. Environ. 2016, 99, 98–106.
[35] von Grabe, J. Potential of artificial neural networks to predict thermal sensation votes. Appl. Energy 2016, 161, 412–424.
[36] Han, X.; Hu, Z.; Li, C.; Wu, J.; Li, C.; Sun, B. Prediction of human thermal comfort preference based on supervised learning. J. Therm. Biol. 2023, 112, 103484.
[37] Huang, K.; Lu, S.; Li, X.; Chen, W. Using random forests to predict passengers’ thermal comfort in underground train carriages. Indoor Built Environ. 2023, 32, 343–354.
[38] Shan, X.; Yang, E.-H. Supervised machine learning of thermal comfort under different indoor temperatures using EEG measurements. Energy Build. 2020, 225, 110305.
[39] Luo, M.; Xie, J.; Yan, Y.; Ke, Z.; Yu, P.; Wang, Z.; Zhang, J. Comparing machine learning algorithms in predicting thermal sensation using ASHRAE Comfort Database II. Energy Build. 2020, 210, 109776.
[40] Zhou, X.; Xu, L.; Zhang, J.; Niu, B.; Luo, M.; Zhou, G.; Zhang, X. Data-driven thermal comfort model via support vector machine algorithms: Insights from ASHRAE RP-884 database. Energy Build. 2020, 211, 109795.
[41] Feng, X.; Zainudin, E.B.; Wong, H.W.; Tseng, K.J. A hybrid ensemble learning approach for indoor thermal comfort predictions utilizing the ASHRAE RP-884 database. Energy Build. 2023, 290, 113083.
[42] Xiong, L.; Yao, Y. Study on an adaptive thermal comfort model with K-nearest-neighbors (KNN) algorithm. Build. Environ. 2021, 202, 108026.
[43] Ma, Z.; Wang, J.; Ye, S.; Wang, R.; Dong, F.; Feng, Y. Real-time indoor thermal comfort prediction in campus buildings driven by deep learning algorithms. J. Build. Eng. 2023, 78, 107603.
[44] Kumar, T.S.; Kurian, C.P. Real-time data based thermal comfort prediction leading to temperature setpoint control. J. Ambient. Intell. Humaniz. Comput. 2023, 14, 12049–12060.
[45] Li, X.; Xu, C.; Wang, K.; Yang, X.; Li, Y. Data-driven adaptive GM(1,1) time series prediction model for thermal comfort. Int. J. Biometeorol. 2023, 67, 1335–1344.
[46] Pantavou, K.; Delibasis, K.K.; Nikolopoulos, G.K. Machine learning and features for the prediction of thermal sensation and comfort using data from field surveys in Cyprus. Int. J. Biometeorol. 2022, 66, 1973–1984.
[47] Guo, R.; Yang, B.; Guo, Y.; Li, H.; Li, Z.; Zhou, B.; Hong, B.; Wang, F. Machine learning-based prediction of outdoor thermal comfort: Combining Bayesian optimization and the SHAP model. Build. Environ. 2024, 254, 111301.
[48] Wu, Z.; Li, N.; Peng, J.; Cui, H.; Liu, P.; Li, H.; Li, X. Using an ensemble machine learning methodology-Bagging to predict occupants’ thermal comfort in buildings. Energy Build. 2018, 173, 117–127.
[49] Wu, J.; Chen, X.-Y.; Zhang, H.; Xiong, L.-D.; Lei, H.; Deng, S.-H. Hyperparameter optimization for machine learning models based on Bayesian optimization. J. Electron. Sci. Technol. Health Care 2019, 17, 26–40.
[50] Sun, Y.; Ding, S.; Zhang, Z.; Jia, W. An improved grid search algorithm to optimize SVR for prediction. Soft Comput. 2021, 25, 5633–5644.
[51] Bergstra, J.; Bengio, Y. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 2012, 13, 281–305.
[52] Syarif, I.; Prugel-Bennett, A.; Wills, G. SVM parameter optimization using grid search and genetic algorithm to improve classification performance. TELKOMNIKA 2016, 14, 1502–1509.
[53] Mantovani, R.G.; Rossi, A.L.; Vanschoren, J.; Bischl, B.; De Carvalho, A.C. Effectiveness of random search in SVM hyper-parameter tuning. In Proceedings of the 2015 International Joint Conference on Neural Networks (IJCNN), Killarney, Ireland, 12–17 July 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 1–8.
[54] Wang, X.; Jin, Y.; Schmitt, S.; Olhofer, M. Recent advances in Bayesian optimization. ACM Comput. Surv. 2023, 55, 1–36.
[55] Jones, D.R. A taxonomy of global optimization methods based on response surfaces. J. Glob. Optim. 2001, 21, 345–383.
[56] Song, Z.; Zou, S.; Zhou, W.; Huang, Y.; Shao, L.; Yuan, J.; Gou, X.; Jin, W.; Wang, Z.; Chen, X. Clinically applicable histopathological diagnosis system for gastric cancer detection using deep learning. Nat. Commun. 2020, 11, 4294.
[57] Kabir, H.; Wu, J.; Dahal, S.; Joo, T.; Garg, N. Automated estimation of cementitious sorptivity via computer vision. Nat. Commun. 2024, 15, 9935.
[58] Cheadle, C.; Vawter, M.P.; Freed, W.J.; Becker, K.G. Analysis of microarray data using Z score transformation. J. Mol. Diagn. 2003, 5, 73–81.
[59] Branco, P.; Torgo, L.; Ribeiro, R.P. SMOGN: A pre-processing approach for imbalanced regression. In Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications, Skopje, North Macedonia, 22 September 2017; pp. 36–50.
[60] Reddy, T.A. Applied Data Analysis and Modeling for Energy Engineers and Scientists; Springer Science & Business Media: New York, NY, USA, 2011.
[61] Joy, T.T.; Rana, S.; Gupta, S.; Venkatesh, S. Hyperparameter tuning for big data using Bayesian optimization. In Proceedings of the 2016 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4–8 December 2016; IEEE: Piscataway, NJ, USA, 2017; pp. 2574–2579.
[62] De Dear, R.J.; Brager, G.S. Thermal comfort in naturally ventilated buildings: Revisions to ASHRAE Standard 55. Energy Build. 2002, 34, 549–561.

Figure 1. Relationship between different thermal sensation votes and indoor air temperature.

Figure 2. Relationship between different thermal sensation votes and relative humidity.

Figure 3. Actual human thermal sensation vs. PMV model predicted values.

Figure 4. Predicted values (calculated by the overall thermal sensation linear model) vs. actual values.

Figure 5. Personalized thermal sensation prediction models for four representative subjects.

Figure 6. Visualization of hyperparameter optimization process for machine learning models (The redder the color, the greater the value).

Figure 7. Actual vs. predicted values using the Random Forest model with grid search optimization.

Figure 8. Schematic representation of the Bayesian optimization process.

Figure 9. Actual vs. predicted values of Bayesian-optimized Random Forest (BO-RF).

Figure 10. Importance of environmental parameters.

Table 1. Literature on Machine Learning Models for Thermal Comfort Prediction.

Reference	Model	Input Parameters
[39]	RF, SVR, ANN, GBM, NB, KNN	T_in, Tr_in, RH_in, V_in, M, R_ci
[40]	SVR	T_in, Tr_in, RH_in, V_in, M, R_ci
[41]	ELM + SVR + RF, SCN + SVR + RF	T_in, M, R_ci, V_in, RH_in, T_out
[42]	KNN	T_in, Tr_in, RH_in, V_in, M, R_ci
[43]	BI-LSTM	T_in, Tr_in, RH_in, V_in, M, R_ci
[44]	ANN	T_in, RH_in, R_ci, V_in
[45]	GM	T_in, RH_in, V_in, Tr_in, T_out
[46]	ANN, RF, SVR	T_in, Tr_in, RH_in, V_in, R_ci
[47]	RF, SVR, ANN, GBM, NB, KNN	T_in, Tr_in, RH_in, V_in, M, R_ci
[48]	SVR	T_in, Tr_in, RH_in, V_in, M, R_ci

T_in: indoor air temperature, Tr_in: indoor mean radiant temperature, RH_in: indoor relative humidity, T_out: outdoor air temperature, V_in: indoor wind speed, M: metabolic equivalent, and R_ci: clothing insulation.

Table 2. Sensor Measurement Ranges and Accuracies.

Parameter	Measurement Range	Measurement Accuracy
Indoor Air Temperature	−40~50 °C	±0.3 °C
Indoor Relative Humidity	0~100%	±3%
Outdoor Air Temperature	−50~80 °C	±0.4 °C
Outdoor Relative Humidity	0~100%	±3%
Outdoor Wind Speed	0~70 m/s	±0.3 m/s
Outdoor Solar Radiation Intensity	0~2000 W/m²	<5%

Table 3. Results of Multiple Linear Regression Analysis.

	B	Std	t	p-Value
$\beta_{0}$	−3.249	0.388	−8.381	0.000 **
$T_{in}$	0.354	0.145	2.434	0.015 *
$RH_{in}$	−0.001	0.004	−0.189	0.850
$T_{out}$	−0.042	0.008	−5.481	0.000 **
$Ra_{out}$	−0.148	0.147	−1.008	0.314
$RH_{out}$	−0.008	0.003	−2.791	0.005 **
$V_{out}$	−0.243	0.087	−2.793	0.005 **

Note: ** indicates p < 0.01 and * indicates p < 0.05.

Table 4. Input and Output Variables of the Machine Learning-Based Thermal Sensation Prediction Model.

Model Input		Model Output
Indoor Environmental Parameters	Indoor air temperature; Indoor relative humidity	Human thermal sensation
Outdoor Environmental Parameters	Outdoor air temperature; Outdoor relative humidity; Outdoor wind speed; Outdoor solar radiation intensity	
Subject Information	Subject ID	

Table 5. Parameters Optimization Result and Performance of Different Models.

Model	Parameters	Search Range	Optimal Values
BPNN	Hidden layers	1~10	4
	Hidden nodes	1~100	50
	Activation function	Tanh, Relu, Sigmoid	Tanh
	Learning rate	10⁻⁴~1	10⁻⁴
SVR	C	0.1~100	200
SVR	Gamma	10⁻³~1	10⁻¹
RF	Min samples leaf	1~15	2
	Max depth	1~100	17
	Number of trees	1~100	15
KNN	K	1~5	9
	P	1, 2, 3	1
	Weights	Uniform, Distance	Distance

Table 6. Prediction Accuracy of Machine Learning Models.

	BPNN	SVR	RF	KNN
R²	0.906	0.881	0.916	0.896
RMSE	0.595	0.743	0.526	0.738

Table 7. Bayesian Optimization Results for Random Forest Model Hyperparameters.

Parameters	Search Range	Optimal Values
n_estimators	1~100	87
max_depth	1~100	20
min_samples_split	1~20	2
min_samples_leaf	1~20	2
max_features	auto, sqrt, log2	log2
criterion	Gini, entropy	Gini

Table 8. Comparison of Random Forest Model Performance Optimized by Grid Search (Grid-RF) and Bayesian Optimization (BO-RF).

	Model	1st Fold	2nd Fold	3rd Fold	4th Fold	5th Fold	Mean Value
R²	Grid-RF	0.908	0.915	0.910	0.913	0.915	0.9162
R²	BO-RF	0.940	0.948	0.942	0.946	0.942	0.9451
RMSE	Grid-RF	0.613	0.584	0.601	0.591	0.578	0.595
RMSE	BO-RF	0.392	0.395	0.389	0.394	0.395	0.393
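
The gap between the two tuning strategies is easier to compare as a relative change. The sketch below uses only the figures in the "Mean Value" column of Table 8. The percentages it prints are derived here by simple arithmetic and are not quoted from the paper; the helper name `relative_change` is hypothetical.

```python
# Mean values copied from the "Mean Value" column of Table 8.
reported_means = {
    "R2": {"Grid-RF": 0.9162, "BO-RF": 0.9451},
    "RMSE": {"Grid-RF": 0.595, "BO-RF": 0.393},
}


def relative_change(baseline: float, new: float) -> float:
    """Return the change from baseline to new as a percentage of baseline."""
    return (new - baseline) / baseline * 100.0


# Positive means BO-RF is higher than Grid-RF, negative means lower.
r2_change_pct = relative_change(
    reported_means["R2"]["Grid-RF"], reported_means["R2"]["BO-RF"]
)
rmse_change_pct = relative_change(
    reported_means["RMSE"]["Grid-RF"], reported_means["RMSE"]["BO-RF"]
)

print(f"R2 change:   {r2_change_pct:+.2f}%")
print(f"RMSE change: {rmse_change_pct:+.2f}%")
```

On the reported means, this gives an increase in R² of about 3.2% and a reduction in RMSE of about 34% for BO-RF relative to Grid-RF.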

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
