An Explainable Evaluation Model for Building Thermal Comfort in China

Abstract: The concentration of atmospheric greenhouse gases is being amplified by human activity. Building energy consumption, particularly for heating and cooling purposes, constitutes a significant proportion of overall energy demand. This research aims to establish a smart evaluation model to understand the thermal requirements of building occupants based on an open-access dataset. This model is beneficial for making reasonable adjustments to building thermal management, based on factors such as different regions and building user characteristics. Employing Bayesian-optimized LightGBM and SHAP (SHapley Additive exPlanations) methods, an explainable machine learning model was developed to evaluate the thermal comfort design of buildings in different areas and with different purposes. Our developed LightGBM model exhibited superior evaluation performance on the test set, outperforming other machine learning models, such as XGBoost and SVR (Support Vector Regression). The SHAP method further helps us to understand the internal evaluation mechanism of the model and the interactive effects among input features. An accurate thermal comfort design for buildings based on the evaluation model can benefit the carbon-neutral strategy.


Introduction
The rapid increase in global temperature and its associated detrimental impacts have made climate change one of the most pressing challenges of the 21st century [1]. A central aspect of this escalation in global temperatures is the increasing concentration of atmospheric greenhouse gases, notably amplified by human activities [2]. As per recent studies, urban regions are major contributors to greenhouse gas emissions, predominantly due to activities such as transportation, industrial operations, energy production and consumption, waste management, and the functioning of residential and commercial buildings [3–7]. Consequently, addressing urban carbon emissions has become imperative in the fight against global climate change [8]. Among the various factors contributing to urban carbon emissions, building energy consumption, especially for heating and cooling purposes, plays a predominant role [9]. It is estimated that buildings account for nearly 40% of global energy consumption [10], with a significant fraction of this energy being expended for maintaining thermal comfort [11,12]. Thermal comfort, a state of mind expressing satisfaction with the surrounding thermal environment, is crucial for ensuring the health, productivity, and well-being of building occupants [13–15]. Thermal comfort is a field of study that has garnered considerable attention, with research standards playing a pivotal role in establishing uniform testing protocols. Pioneering standards, such as ASHRAE Standard 55 [16] and ISO 7730 [17], provide comprehensive methodologies for assessing thermal comfort in various environments. These standards define the thermal environmental conditions for human occupancy and prescribe a range of factors, including temperature, humidity, airflow, and clothing insulation, which contribute to individual thermal satisfaction [18,19]. The quantification of comfort parameters has been further refined through the Predicted Mean Vote (PMV) and Predicted Percentage Dissatisfied (PPD) indices, which are now widely accepted benchmarks for evaluating thermal environments in relation to human satisfaction [18]. Such standards not only guide experimental design but also facilitate the comparison of findings across different studies, ensuring that assessments of thermal comfort are both reliable and replicable. However, achieving optimal thermal comfort in a manner that is both energy-efficient and aligned with occupants' preferences is a formidable challenge [20,21].
The challenge is compounded by the diversity in regional climates, building designs, and occupant preferences [22]. Different regions, influenced by their geographical positioning and topographical attributes, experience different temperature ranges and climatic conditions. Similarly, buildings, based on their design, materials used, and purpose (whether commercial, residential, or industrial), have varying energy needs and thermal characteristics [23]. Additionally, the preference for thermal comfort can differ significantly among occupants, influenced by factors such as age, health, clothing, and activities. This diversity necessitates a detailed, data-driven understanding of the thermal requirements of buildings and their occupants. With the advent of the digital age, vast amounts of data are being generated and made available through open-access datasets, providing an unparalleled opportunity to harness this information for understanding and addressing the thermal comfort needs of building occupants. Employing machine learning methodologies, researchers can now model complex relationships between multiple variables, offering insights that were previously elusive [24,25].
Recent advances in the interpretability of machine learning models have emphasized the importance of explanation techniques that provide insight into model predictions. Global explanation methods, like permutation feature importance, offer an overall perspective on feature relevance across the entire dataset, but they do not account for the complex interactions between features within individual predictions. In contrast, local explanation techniques, such as LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations), provide granular insights into the contribution of each feature to individual predictions, reflecting the conditional interaction effects within the model [26]. SHAP, in particular, employs a game-theoretic approach to attribute the prediction output to its input features, thereby offering a cohesive and theoretically grounded method for local explanations [27]. The SHAP technique not only elucidates feature contributions but also enhances transparency and trust in complex models, a crucial aspect in fields like energy management where model decisions have significant impacts [28]. By incorporating local explanation methods, researchers can more effectively communicate model behavior, providing stakeholders with understandable and actionable insights into model predictions [29,30].
This paper delves into the above context, with the primary aim of establishing a smart evaluation model that leverages an open-access dataset [31] to understand the thermal requirements of building occupants. The emphasis is on creating a model that not only accurately predicts thermal comfort needs but also offers explanations for its predictions. The latter is particularly important as it offers architects, urban planners, and policy makers actionable insights into the factors influencing thermal comfort, facilitating informed decision-making. Furthermore, with China being one of the world's most populous countries and undergoing rapid urbanization, the focus of this research on China offers timely insights. China's urban areas, characterized by their diverse climates ranging from the cold northeast to the hot and humid southeast, present a unique challenge and opportunity. An effective and efficient approach to ensuring thermal comfort in Chinese buildings can significantly contribute to the country's carbon-neutral strategy, echoing its commitments to global climate change mitigation efforts.
The organization of this paper is as follows. Section 2 describes the data processing methodologies employed, from filtering the raw data to encoding categorical variables. Section 3 details the establishment of the LightGBM model, along with the hyperparameter optimization and training processes. Section 4 presents the results and critically analyzes the model's performance. Section 5 interprets the evaluation model's prediction mechanism through the SHAP (SHapley Additive exPlanations) method. Finally, Section 6 concludes the research, highlighting its contributions and implications.
In essence, this research sits at the nexus of urban development, thermal comfort, and sustainable energy consumption, providing a roadmap for future urban planning efforts aimed at achieving carbon neutrality while ensuring the well-being of occupants.

Dataset Filtering
The open-access Chinese thermal comfort dataset [31], spearheaded by Xi'an University of Architecture and Technology in collaboration with seven other institutions, encompasses 41,977 data entries gathered from 49 cities spanning five climatic zones in China over the last two decades. Rigorous quality control measures were implemented on the raw data, involving systematic organization to guarantee its dependability. Each data entry encompasses environmental parameters, occupants' subjective feedback, building specifications, and individual details. In the raw dataset, certain non-essential features have a substantial amount of missing data. We first deleted these features, and subsequently removed samples with incomplete data to derive a filtered dataset. The features in the filtered dataset are shown in Table 1. A total of 11,899 samples are retained after dataset filtering. The details of the subjective thermal comfort indicators are delineated below:
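The two-step filtering described above (drop sparsely populated features, then drop incomplete samples) might be sketched as follows. This is only an illustration: the function name `filter_dataset`, the 50% missing-value threshold, and the toy rows are assumptions, since the text does not state the cutoff that was actually used.

```python
def filter_dataset(rows, max_missing_fraction=0.5):
    """Drop sparse features first, then drop rows that are still incomplete.

    `max_missing_fraction` is a hypothetical threshold, not a value from the paper.
    """
    features = sorted({key for row in rows for key in row})
    n_rows = len(rows)
    # Step 1: keep only features whose share of missing values is acceptable.
    kept = [
        f for f in features
        if sum(row.get(f) is None for row in rows) / n_rows <= max_missing_fraction
    ]
    # Step 2: keep only rows that are complete on the remaining features.
    filtered = [
        {f: row[f] for f in kept}
        for row in rows
        if all(row.get(f) is not None for f in kept)
    ]
    return kept, filtered


# Toy records (made up for illustration); None marks a missing value.
rows = [
    {"air_temp": 24.1, "humidity": 55.0, "notes": None},
    {"air_temp": 26.3, "humidity": None, "notes": None},
    {"air_temp": 22.8, "humidity": 48.0, "notes": None},
    {"air_temp": 25.0, "humidity": 60.0, "notes": "window open"},
]
kept, filtered = filter_dataset(rows)
print(kept, len(filtered))
```

Removing sparse features before removing incomplete rows matters: doing it in the opposite order would discard almost every sample because of a few rarely recorded columns.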

•
Clothing Insulation: Respondents were prompted to select the clothing type that matched their attire at the time of taking the survey. In instances where their specific clothing type was not listed, they were guided to choose the closest alternative. The insulation value for individual clothing items was determined based on ASHRAE 55-2020 [32]. For outfits composed of multiple garments, the total insulation value was computed by aggregating the insulation values of each individual piece.

•
Metabolic Rate: The dataset features metabolic rate values for the Chinese population across various activity states. These values were ascertained in [33] using indirect calorimetry. The participants' activities at the time of completing the questionnaire were documented and subsequently translated into metabolic rate values. The corresponding values are sitting (0.9 met), sitting while typing (1.0 met), sitting with document filing (1.2 met), standing in an office setting (1.1 met), standing with document filing (1.3 met), and walking at a pace of 2 km/h (2.1 met).
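As a small illustration of the two indicators above, the activity-to-metabolic-rate translation can be written as a lookup table and the outfit insulation as a sum over garments. The metabolic rates are the ones listed in the text; the dictionary keys, function names, and the garment values in the example are placeholders and are not taken from ASHRAE 55-2020.

```python
# Metabolic rates (met) per recorded activity, as listed in the text above.
METABOLIC_RATE_MET = {
    "sitting": 0.9,
    "sitting, typing": 1.0,
    "sitting, document filing": 1.2,
    "standing, office": 1.1,
    "standing, document filing": 1.3,
    "walking, 2 km/h": 2.1,
}


def metabolic_rate(activity):
    """Translate a recorded activity into its metabolic rate in met."""
    return METABOLIC_RATE_MET[activity]


def total_clothing_insulation(garment_clo_values):
    """Aggregate per-garment insulation values (clo) into an outfit total."""
    return sum(garment_clo_values)


# Placeholder garment values for illustration only (not from ASHRAE 55-2020).
outfit = [0.10, 0.25, 0.05]
print(total_clothing_insulation(outfit), metabolic_rate("sitting, typing"))
```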

Feature Selection
Feature selection is a critical step in the development of a robust and efficient model. Properly selecting the right features not only enhances the model's performance but also provides insights into the underlying processes governing the system. With the growing dimensions of data, especially in the age of big data, pruning irrelevant or redundant features becomes imperative to prevent models from becoming overly complex and to reduce the computational overhead associated with training. In this study, we employ a two-pronged criterion for feature selection, aiming to streamline the input dataset while retaining the most informative predictors.

•
Exclusion of Irrelevant Features: The primary objective of any modelling endeavor is to capture the underlying patterns in the data that are pertinent to the prediction or classification task at hand. Hence, the first step in our feature selection process is to remove any feature that does not have a direct or meaningful relationship with the evaluation indicators. Features that do not contribute significant information or might introduce noise into the system are systematically identified and excluded. This ensures that our model remains focused on pertinent information and is not swayed by irrelevant data.

•
Addressing Feature Collinearity: The presence of highly correlated or collinear features can introduce instability in certain models and can also make the model's interpretations more challenging. When two or more features convey similar information, they are, in essence, redundant, and the inclusion of all these features does not necessarily improve the predictive power of the model but certainly increases the computational burden. In our methodology, if a set of features exhibits high collinearity (i.e., they are highly related), we adopt a conservative approach by retaining only a few representative features from that set and discarding the rest. This approach ensures that our model remains efficient without a compromise in its predictive capability.
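The collinearity criterion can be illustrated with a minimal sketch that computes pairwise Pearson correlations within a group of related features and keeps the one that correlates most strongly, on average, with the others. The helper names and the toy readings are assumptions made for illustration; the selection rule is one plausible reading of "retaining a representative feature", not necessarily the exact procedure used in the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def pick_representative(columns):
    """Return the column whose mean |r| against the other columns is highest."""
    def mean_abs_corr(name):
        others = [c for c in columns if c != name]
        return sum(abs(pearson(columns[name], columns[o])) for o in others) / len(others)
    return max(columns, key=mean_abs_corr)


# Toy readings of one parameter taken at three sensor heights (made-up values).
air_temp = {
    "0.1 m": [20.0, 21.5, 23.1, 24.0, 26.2],
    "0.6 m": [20.6, 22.0, 23.5, 24.6, 26.5],
    "1.1 m": [21.0, 22.8, 23.7, 25.3, 26.6],
}
representative = pick_representative(air_temp)
print(representative)
```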
Upon examining the data, we observe that indoor physical parameters were measured at three distinct heights above the floor: 0.1 m, 0.6 m, and 1.1 m. Figure 1 illustrates the significant correlation between these parameters across the three levels, as evidenced by their Pearson correlation coefficients. Guided by the principle of "Addressing Feature Collinearity", it is judicious to select a single set of indoor physical parameters from one specific height, given the strong interrelation between measurements from different heights. We have chosen the parameters measured at 0.6 m above the floor, as this height consistently exhibits the strongest correlation with the other two levels. Subsequently, the Spearman correlation coefficients (SCC) for the remaining features are shown in Figure 2. SCC is a rank correlation coefficient, and its calculation is based on the ranking of the sample values of two variables in the data. SCC is agnostic to the numerical type and distribution of variables, and thus has a broad scope of applicability. The formula for SCC is expressed as follows:

$$\mathrm{SCC} = 1 - \frac{6 \sum_{i=1}^{n} \left( R(x_i) - R(y_i) \right)^2}{n \left( n^2 - 1 \right)} \quad (1)$$

where x and y are the variables to be studied, R(x_i) is the rank of sample x_i, R(y_i) is the rank of sample y_i, and n is the total number of samples. The value of SCC ranges from −1 to +1, and a greater absolute value indicates a stronger correlation between the two studied variables. In the analysis presented within Figure 2, we focus solely on the absolute value of the SCC, emphasizing the strength of correlations between variables. Given the intricate internal dynamics observed in large sample sets, a mere reliance on significance might lead to misconceptions. Thus, the magnitude of the SCC holds primary importance in our approach. For the scope of this study, we designate thermal sensation (TSV), thermal comfort (TCV), and thermal acceptability (TAV) as evaluation outputs. Adhering to the principle of "Exclusion of Irrelevant Features", any feature demonstrating an SCC below 0.1 with these evaluation criteria is excluded from the modelling process. As tree-based learning models inherently yield a single output, separate models are required for TSV, TCV, and TAV. Consequently, the most pertinent input features are selected separately for each model. The feature selection results are:

•
For the TSV evaluation model, the related input features are building type, building function, thermal operation mode, clothing insulation, metabolic rate, and indoor air temperature.

•
For the TCV evaluation model, the related input features are seasons, city, building type, building function, thermal operation mode, clothing insulation, metabolic rate, indoor air temperature, indoor relative humidity, and indoor air velocity.

•
For the TAV evaluation model, the related input features are city, climate zone, weight, clothing insulation, metabolic rate, indoor air temperature, and indoor air velocity.
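The relevance filter described above (keep a feature only if its absolute SCC with the evaluation output reaches 0.1) can be sketched in plain Python. The sketch computes the SCC as the Pearson correlation of average ranks, which handles tied values and reduces to the usual closed-form expression when there are no ties. The function names, the feature names, and the toy data are assumptions for illustration only.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


def select_features(features, target, threshold=0.1):
    """Keep the features whose |SCC| with the target reaches the threshold."""
    return [name for name, column in features.items()
            if abs(spearman(column, target)) >= threshold]


# Toy data (made up): one informative feature and one nearly unrelated feature.
target = [1, 2, 3, 4, 5, 6]
features = {
    "indoor_air_temperature": [18, 20, 22, 24, 26, 28],
    "unrelated_feature": [3, 5, 1, 6, 2, 4],
}
selected = select_features(features, target)
print(selected)
```

In this toy case the unrelated feature has an SCC of about 0.03 with the target and is therefore dropped, while the temperature feature is kept.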
In pursuit of broader applicability, this research seeks to formulate a versatile evaluation model that can also serve cities not encompassed within the current dataset. As such, the "city" variable is excluded from the previously identified factors. All the selected features are shown in Table 2. In conclusion, the feature selection process adopted in this study is rigorous and is designed to produce a streamlined, informative, and non-redundant set of predictors. This not only facilitates efficient model training but also aids in deriving meaningful and interpretable results from the model. After feature selection, the data distribution of each feature (including the evaluation results TSV, TCV, and TAV) is shown in Figure 3.
It should be noted that, in the context of many traditional machine learning algorithms, preprocessing steps, like data normalization for numerical features and one-hot encoding for categorical variables, are essential to ensure optimal model performance. However, when working with the LightGBM model, such transformations are not required. This is due to the inherent design and mechanism of LightGBM, which can naturally handle different scales of numeric data and internally manages categorical variables through its histogram-based algorithm. Specifically, LightGBM applies a binning process to sort numerical values into discrete bins and utilizes a special algorithmic approach for categorical attributes, negating the necessity for manual one-hot encoding. This not only simplifies the preprocessing pipeline but also often results in faster training times and reduced memory usage without compromising model accuracy. However, in this study, the categorical variables are unordered, and thus it is preferable to employ one-hot encoding rather than label encoding. The encoding function can be expressed as follows:

$$\text{one-hot}(X) = [e_1, e_2, \ldots, e_i, \ldots, e_n] \quad (2)$$

where n denotes the total number of categories of variable X, and e_i is the element of the one-hot vector, which equals 1 only in the position corresponding to the category that variable X indicates and equals 0 in all other positions.
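A minimal sketch of the one-hot encoding function in Equation (2) is given below. The category list is a hypothetical example, not the actual category set of the dataset.

```python
def one_hot(value, categories):
    """Return [e_1, ..., e_n]: 1 at the position of `value`, 0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if category == value else 0 for category in categories]


# Hypothetical categories for an unordered variable such as building type.
BUILDING_TYPES = ["residential", "office", "educational"]
encoded = one_hot("office", BUILDING_TYPES)
print(encoded)
```

Unlike label encoding (for example residential = 0, office = 1, educational = 2), this representation does not impose an artificial order on the categories.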

LightGBM Model
LightGBM serves as an enhancement of the XGBoost and Gradient Boosting Decision Tree (GBDT) models [34]. It integrates the Exclusive Feature Bundling (EFB) and Gradient-based One-Side Sampling (GOSS) algorithms, positioning LightGBM as a leading model for tabular data prediction, boasting rapid training speeds and elevated prediction accuracy [35,36]. Typically, tabular data, characterized by rows representing samples and columns denoting features, often contain sparse categorical features abundant in zero elements, particularly when subjected to the one-hot encoding method. Such feature sparsity can impede the efficacy of machine learning models. Addressing this, LightGBM leverages the EFB algorithm to amalgamate specific sparse features. Given that many sparse features frequently display mutual exclusivity, preventing them from being concurrently non-zero, the EFB algorithm consolidates these features into a singular new feature, thereby curtailing the feature dimension [34]. This approach efficiently mitigates training complexities while retaining commendable accuracy. Moreover, as an ensemble model of the Classification and Regression Tree (CART), LightGBM encapsulates the decision manifold inherent in the Decision Tree (DT), ensuring it remains impervious to discrepancies in value-type and distribution. Consequently, LightGBM emerges as an apt choice for evaluating building thermal comfort based on the selected tabular features.
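The core idea behind EFB, merging features that are never non-zero at the same time into a single feature, can be illustrated with a toy sketch. This is not LightGBM's implementation: the real algorithm works on histogram bins, bundles features greedily, and tolerates a small number of conflicts. The function name and data below are assumptions for illustration, restricted to binary columns.

```python
def bundle_exclusive_features(columns):
    """Merge mutually exclusive binary columns into one integer-coded column.

    Row value 0 means no column is active; value j + 1 means column j is active.
    """
    n_rows = len(columns[0])
    bundled = []
    for row in range(n_rows):
        active = [j for j, column in enumerate(columns) if column[row] != 0]
        if len(active) > 1:
            raise ValueError("columns are not mutually exclusive")
        bundled.append(active[0] + 1 if active else 0)
    return bundled


# Three one-hot columns of the same categorical variable (toy data).
one_hot_columns = [
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
bundled = bundle_exclusive_features(one_hot_columns)
print(bundled)
```

Three sparse columns collapse into one dense column without losing information, which is why one-hot encoded inputs do not inflate the effective feature dimension as much as their column count suggests.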

Bayesian-Optimized Hyperparameters
A total of 20% of the entire dataset was randomly allocated as the test set, providing a basis for evaluating the performance of the model. The remaining 80% of the data was designated for hyperparameter tuning and model training processes. Hyperparameter optimization was undertaken using 5-fold cross-validation, i.e., this 80% portion was divided into five equal parts, with each part used in turn as a validation set while the remaining four parts were combined to form a training set, to ensure comprehensive evaluation. Subsequently, for model training, the same 80% of the data was partitioned into a training set and a validation set in a 4:1 ratio, facilitating the iterative refinement of the model parameters. It is imperative that the test set remains completely separate from and uninvolved in the model establishment process, encompassing both hyperparameter optimization and model training phases, to preclude any potential for data leakage and ensure the integrity of the model's evaluation.
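The partitioning scheme above can be sketched with index lists only. The function name, the fixed seed, and the toy sample count are assumptions for illustration; the point is that the held-out test indices never appear in any cross-validation fold.

```python
import random


def split_indices(n_samples, test_fraction=0.2, n_folds=5, seed=0):
    """Hold out a random test set, then build k-fold splits on the remainder."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    test, development = indices[:n_test], indices[n_test:]
    # Strided slices give folds of (near-)equal size.
    folds = [development[i::n_folds] for i in range(n_folds)]
    cv_splits = []
    for i in range(n_folds):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        cv_splits.append((training, validation))
    return test, development, cv_splits


test_idx, dev_idx, cv_splits = split_indices(100)
print(len(test_idx), len(dev_idx), [len(val) for _, val in cv_splits])
```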
Bayesian optimization is a probabilistic model-based approach for global optimization of black-box functions that are expensive to evaluate. It operates by constructing a posterior distribution over the objective function and then selects points to evaluate by balancing exploration and exploitation. The method is particularly well suited for optimization of hyperparameters in machine learning algorithms. In this research, Bayesian optimization is employed to fine-tune the hyperparameters of a LightGBM regression model. Here, f(x) represents the cross-validated root mean squared error (RMSE) of the model predictions, with x denoting the vector of hyperparameters: x = [x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8], where each x_i represents a hyperparameter in LightGBM (shown in Table 3). The goal is to find the hyperparameter vector x* that minimizes f(x), which in this scenario corresponds to the optimal model performance. A Gaussian process is often used to model the distribution over functions p(f | D), where D represents the set of points (x, f(x)) already evaluated. Acquisition functions, such as Expected Improvement (EI) in this research, are then used to select the next query point by maximizing the expected utility. The Gaussian process posterior is updated with the new observations, and this process is repeated for a predefined number of iterations or until convergence criteria are met. This iterative process allows for the adaptive refinement of the search space, leading to more efficient optimization when compared to traditional grid or random search methods. This research initiates the optimization with 50 starting points and continues for an additional 500 iterations, progressively refining the model's hyperparameters towards the optimal configuration. The boosting type was set as "GBDT", and all of the other parameters of LightGBM, such as the learning rate, were left at their default values. Since we need to build three different evaluation models for TSV, TCV, and TAV, respectively, the above optimization process is conducted independently for each model. The search space and optimal value of each hyperparameter are shown in Table 3.
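To make the role of the acquisition function concrete, the sketch below evaluates the standard closed-form Expected Improvement for a minimization problem, EI = (f_best − μ)Φ(z) + σφ(z) with z = (f_best − μ)/σ, given a Gaussian posterior mean μ and standard deviation σ at a candidate point. The Gaussian process itself is omitted, and the candidate names and posterior values are made up for illustration.

```python
from math import erf, exp, pi, sqrt


def normal_pdf(z):
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)


def normal_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def expected_improvement(mu, sigma, best_rmse):
    """Expected Improvement when minimizing (lower cross-validated RMSE is better)."""
    if sigma <= 0.0:
        return max(best_rmse - mu, 0.0)
    z = (best_rmse - mu) / sigma
    return (best_rmse - mu) * normal_cdf(z) + sigma * normal_pdf(z)


# Hypothetical posterior (mean, std) of the RMSE at two candidate settings.
candidates = {
    "A": (0.80, 0.01),  # slightly better than the incumbent, low uncertainty
    "B": (0.85, 0.10),  # worse on average, but highly uncertain
}
best_rmse = 0.82
scores = {name: expected_improvement(mu, sigma, best_rmse)
          for name, (mu, sigma) in candidates.items()}
next_point = max(scores, key=scores.get)
print(scores, next_point)
```

With these numbers the uncertain candidate B scores higher than the safe candidate A, which illustrates how EI trades off exploration against exploitation.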

Model Training
The training progression of our LightGBM-based evaluation models is captured in Figure 4, which delineates the RMSE as the chosen loss metric over successive iterations for both the training and validation datasets. The loss curves of the Thermal Sensation Vote (TSV), Thermal Comfort Vote (TCV), and Thermal Acceptability Vote (TAV) models, as shown in Figure 4a-c, respectively, demonstrate a sharp decline in training RMSE. This illustrates the models' rapid learning and their ability to quickly assimilate the patterns within the training data. Concomitantly, the validation RMSE for each model converges to a low value, indicating effective generalization to the validation data, which is pivotal in preventing overfitting, a phenomenon where a model exhibits high accuracy on training data yet fails to predict accurately on unseen data. Remarkably, the models achieve their best validation performance within the first 50 iterations, suggesting a swift convergence indicative of the efficiency of the LightGBM algorithm. The ongoing reduction in training RMSE post-convergence points to the potential for additional fine-tuning, should it be necessary. The depicted validation loss curves reinforce the balance attained by the models, which encapsulate sufficient complexity to learn from the training data while maintaining the ability to generalize to new datasets. This balance is vital, affirming the models' robustness and ensuring their applicability to a broader range of data, consistent with the objectives of the validation phase.

Model Performance
To quantitatively evaluate the accuracy of our developed model for evaluating building thermal comfort, we employed three widely accepted evaluation metrics: Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Error. Generally, for these metrics, lower values signify superior model performance. The definitions for these evaluation metrics are presented as follows: where y_i and ŷ_i denote the true value and the predicted value, and ȳ and the mean of ŷ denote the averages of the true values and the predicted values.
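For reference, MAE and RMSE in their standard form (the mean of |y_i − ŷ_i| and the square root of the mean of (y_i − ŷ_i)^2, respectively) can be computed as in the sketch below. The function names and the toy votes are assumptions for illustration; the third metric is not reproduced here.

```python
from math import sqrt


def mean_absolute_error(y_true, y_pred):
    """MAE: mean of the absolute differences between truth and prediction."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """RMSE: square root of the mean squared difference."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy thermal sensation votes and predictions (made-up values).
y_true = [0, 1, -1, 2]
y_pred = [0.5, 1, -2, 1]
mae = mean_absolute_error(y_true, y_pred)
rmse = root_mean_squared_error(y_true, y_pred)
print(mae, rmse)
```

Because RMSE squares the residuals before averaging, it penalizes large individual errors more heavily than MAE, so the two metrics together give a fuller picture of the error distribution.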
Figure 5 provides a visual representation of the performance metrics for the evaluation models (TSV, TCV, and TAV) when applied to the testing set. The depicted box charts summarize the error distributions for each model, with the interquartile range (IQR) capturing the middle 50% of the data, delineated by the box's extent from the 25th to the 75th percentile. The central tendency of the models' errors is indicated by the median line and the mean symbol within the boxes, offering a dual perspective on the models' predictive accuracy. Notably, the span of the whiskers, extending to 1.5 times the IQR, illustrates the variability within the majority of the predictions, with the outliers marked as diamonds highlighting instances of significant deviation from the typical error range. Such graphical analysis aids in the comparative evaluation of model robustness and error consistency. The TSV model exhibits a slightly wider interquartile range, suggesting more variability in predictions compared to the TCV and TAV models. The latter models demonstrate a more compressed IQR, indicative of a tighter clustering of errors and, potentially, a more consistent predictive performance.
In the TSV evaluation model, outliers are symmetrically distributed beyond the whiskers, whereas for the TCV model outliers are predominantly found below the lower whisker, and for the TAV model outliers are primarily above the upper whisker. This indicates that the TCV and TAV models tend to produce anomalously low and high results, respectively. It is imperative to consider the range span of TSV, TCV, and TAV, as a broader span implies a more challenging prediction task. Specifically, the spans for TSV, TCV, and TAV are 6, 5, and 2, respectively. In this context, as illustrated in Figure 6, the TCV model outperforms the others in terms of prediction across its range, which is also deemed the most crucial metric for assessing thermal comfort in buildings.
In Figure 7, we present a comparative analysis of the LightGBM-based evaluation models against a suite of established machine learning algorithms, namely KNN (k-Nearest Neighbor), RF (Random Forest), XGBoost, GBDT (Gradient Boosting Decision Tree), and SVR (Support Vector Regression). The KNN, RF, GBDT, and SVR models were constructed using the Scikit-learn library, while XGBoost was implemented via its dedicated library. All models, including LightGBM, were established with default parameter settings to exclude the impact of hyperparameter optimization. The comparative outcomes suggest that RF, XGBoost, and GBDT exhibit comparable levels of accuracy, likely attributable to their shared foundation in tree-based methodologies. Conversely, SVR and KNN appear less adept at managing the tabular dataset's large-scale nonlinearity, as evidenced by their respective error metrics. Although LightGBM demonstrates a marginal superiority in assessing TSV and TAV, it is distinctly more proficient in evaluating TCV. The consistent performance across multiple evaluation metrics shows the robustness of LightGBM, confirming its potential as a reliable tool for thermal comfort evaluation. Instead of the development of a novel model attuned to extensive thermal datasets, a core objective of this study is to elucidate the relative impact weights, marginal effects, and interplay among all pertinent factors, which we discuss comprehensively in Section 5.
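The box-chart conventions described for Figure 5 (quartiles, whiskers at 1.5 times the IQR, outliers beyond them) and the idea of weighing errors against the span of each indicator can be illustrated with a minimal sketch. The function names, the error values, and the choice of dividing the mean absolute error by the span are assumptions made here for illustration; the paper does not state the exact normalization behind Figure 6.

```python
import statistics

def box_summary(errors):
    """Summarize errors using the 1.5 x IQR whisker convention."""
    q1, median, q3 = statistics.quantiles(errors, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [e for e in errors if low_fence <= e <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "mean": statistics.fmean(errors),
        # Whiskers stop at the most extreme points still inside the fences.
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": [e for e in errors if e < low_fence or e > high_fence],
    }

def range_normalized_mae(y_true, y_pred, span):
    """Mean absolute error divided by the indicator span (assumed metric)."""
    abs_errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return statistics.fmean(abs_errors) / span

# Synthetic prediction errors, not taken from the paper.
errors = [-0.4, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 2.5]
summary = box_summary(errors)

# Synthetic votes on a scale with span 5 (the span reported for TCV).
normalized = range_normalized_mae([0, 1, 2, 3], [0.5, 1.0, 1.5, 3.0], span=5)
```

In this toy sample the single value 2.5 falls above the upper fence and would be drawn as an outlier, while the upper whisker ends at 0.3.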

Model Interpretation
The interpretive analysis of the established LightGBM model was conducted using the SHAP (SHapley Additive exPlanations) method, which is grounded in game theory and relies on conditional expectations to elucidate the model's decision-making process [26,37-42]. The SHAP approach delineates the marginal contribution of each input feature to the predictive outcomes and helps to understand the model's operational tendencies when evaluating thermal comfort. This interpretative process exclusively employed the test dataset to reveal the model's explanatory insights. Particular attention was devoted to the TCV (Thermal Comfort Vote) evaluation model, owing to its exceptional predictive accuracy and its acknowledged importance in gauging thermal comfort within building environments. By scrutinizing the TCV model, we discerned the influence weights, marginal effects, and interactive mechanisms of its contributing factors. This detailed examination enables a deeper comprehension of the factors that predominantly affect thermal comfort evaluations, guiding both the design of intelligent thermal regulation systems and the formulation of strategies for enhancing occupants' comfort and well-being.
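To make the game-theoretic idea behind SHAP concrete, the following is a minimal sketch of exact Shapley values for a tiny model. It is not the tree-specific algorithm applied to LightGBM in this study: as a simplifying assumption, absent features are replaced by rows of a small background sample (a marginal rather than a conditional expectation), and the toy model `toy_tcv`, its coefficients, and all data values are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, background):
    """Exact Shapley values of predict at x; returns (values, base value)."""
    n = len(x)

    def value(subset):
        # Expected prediction when only the features in `subset` are fixed
        # to x and the others are drawn from the background rows.
        total = 0.0
        for row in background:
            mixed = [x[i] if i in subset else row[i] for i in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi, value(set())

def toy_tcv(v):
    """Hypothetical stand-in model: discomfort grows away from 24 degrees."""
    temperature, metabolic_rate = v
    return 0.2 * abs(temperature - 24) + 0.5 * metabolic_rate

background = [[20, 1.0], [24, 1.2], [28, 1.4]]  # synthetic reference rows
x = [30, 1.6]                                   # synthetic sample to explain
phi, base = shapley_values(toy_tcv, x, background)
```

The values satisfy the additivity property that makes SHAP plots readable: the base value plus the sum of the per-feature contributions equals the model output for `x`.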

Influence Weights
Figure 8 delineates the influence weights of various factors on the Thermal Comfort Vote (TCV) model through mean absolute SHAP values. It should be noted that the categorical features, including building type, building function, thermal operation mode, and season, have been one-hot encoded; therefore, each categorical feature's mean absolute SHAP value is the sum of its one-hot features' mean absolute SHAP values. SHAP values provide a profound understanding of feature contributions by assigning each feature an importance value for a particular prediction. A higher mean absolute SHAP value signifies a greater impact of the feature on the model's output. The bar chart reveals that 'Indoor Temperature' possesses the most significant influence on TCV, as evidenced by its highest mean absolute SHAP value. This suggests that variations in indoor temperature are the most substantial predictor of thermal comfort levels perceived by occupants. 'Building Type' also demonstrates a notable impact, implying that the structural and architectural characteristics encapsulated by this factor are critical in determining thermal comfort. 'Metabolic Rate' and 'Building Function' follow closely, indicating their substantial roles in influencing the thermal comfort outcomes, likely due to their direct relationship with human thermal regulation and the activities conducted within the building space. Conversely, 'Clothing Insulation', 'Indoor Humidity', 'Indoor Velocity', and 'Thermal Mode' display comparatively lower influence weights. Nonetheless, their contributions are non-negligible, suggesting a complex interplay of environmental conditions and personal factors that collectively shape the thermal comfort experience. It is noteworthy that 'Season' is the factor with the lowest mean absolute SHAP value, playing an inconsequential role in the evaluation model. The presence of multiple factors with varied influence weights reinforces the multifaceted nature of thermal comfort, which cannot be attributed to a singular environmental or personal characteristic. Instead, it emerges as an aggregate outcome of multiple interacting variables. The quantification of influence weights via SHAP values facilitates a nuanced understanding of the TCV model, allowing practitioners to prioritize interventions based on the factors most predictive of thermal comfort. Such insights can drive informed decisions in the design and management of building environments, optimizing occupant comfort while potentially enhancing energy efficiency.
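The aggregation rule stated for the one-hot encoded categorical features can be sketched as follows: compute the mean absolute SHAP value per encoded column, then sum the columns that belong to the same original feature. The column names, the grouping dictionary, and the SHAP values below are hypothetical and synthetic; they only illustrate the bookkeeping and do not reproduce Figure 8.

```python
def mean_abs_by_column(shap_rows):
    """Mean absolute SHAP value of each column over all samples."""
    n_rows = len(shap_rows)
    n_cols = len(shap_rows[0])
    return [sum(abs(row[j]) for row in shap_rows) / n_rows for j in range(n_cols)]

def aggregate_importance(shap_rows, columns, groups):
    """Sum one-hot columns into their parent feature, sorted by importance."""
    per_column = dict(zip(columns, mean_abs_by_column(shap_rows)))
    aggregated = {}
    for column, importance in per_column.items():
        feature = groups.get(column, column)  # ungrouped columns keep their name
        aggregated[feature] = aggregated.get(feature, 0.0) + importance
    return dict(sorted(aggregated.items(), key=lambda kv: kv[1], reverse=True))

# Hypothetical encoded columns and synthetic SHAP values (one row per sample).
columns = ["indoor_temperature", "season_winter", "season_summer", "season_transition"]
groups = {
    "season_winter": "season",
    "season_summer": "season",
    "season_transition": "season",
}
shap_rows = [
    [0.4, -0.1, 0.0, 0.0],
    [-0.2, 0.0, 0.05, 0.0],
    [0.3, 0.0, 0.0, 0.05],
]
importance = aggregate_importance(shap_rows, columns, groups)
```

Summing after taking the mean absolute value, rather than before, keeps positive and negative contributions of different categories from cancelling each other.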

Marginal Effects
Figure 9 illustrates the marginal impacts of various factors of the TCV model. Each dot within the figure symbolizes an independent data point. The hue of each dot corresponds to the specific factor's value for that data point. The SHAP value associated with each dot quantifies the marginal influence of the data point on the outcome, namely, the Thermal Comfort Vote (TCV) assessment. A positive SHAP value suggests that the respective feature value of the data point contributes to an increase in the output. The TCV scale ranges from 0, denoting 'very comfortable', to 5, indicating 'very uncomfortable'. Consequently, an elevated SHAP value denotes that the feature value of the data point adversely affects thermal comfort. The SHAP scatter plots depicted in the figure offer a granular view of the feature importance and impact on the predictive model.

The distribution of SHAP values for 'Indoor Temperature' underlines the critical balance required in maintaining temperatures within a range that maximizes comfort while minimizing energy consumption for cooling systems. Conversely, the 'Metabolic Rate' feature is characterized by a diverse spread of SHAP values, reflecting its complex relationship with thermal comfort. Notably, higher metabolic rates, indicated by red dots, contribute to a higher TCV, which in this context translates to a reduction in comfort levels. This is in line with the understanding that increased activity levels lead to higher internal heat production, which, if not offset by the thermal environment, can cause discomfort. This finding emphasizes the importance of designing building environments that are adaptable to the varying activity levels of occupants, suggesting that spaces should be versatile enough to accommodate different metabolic rates while still ensuring comfort. The insights derived from analyzing 'Indoor Temperature' and 'Metabolic Rate' highlight the interplay between environmental conditions and occupant activities in the context of thermal comfort. Effective thermal comfort design must therefore account for these factors, aiming to create an adaptive environment that can respond to both the dynamic nature of indoor temperatures and the diverse metabolic rates of occupants. This approach not only enhances occupant comfort but also promotes energy efficiency by aligning the building's climate control strategies with the actual needs of its users. The rest of the numerical features, such as clothing insulation and indoor humidity, did not show a clear pattern in influencing the output; such patterns might be revealed through interactive influence analysis.
Within the categorical variables assessed, particular attention is given to each category's relative impact on thermal comfort. For the variable 'Building Type', the category 'Residential' exhibits a pronounced detrimental influence on thermal comfort. In contrast, other categories, such as 'Dormitory', 'Office', and 'Educational', appear to exert negligible effects. This observation suggests that occupants may have less stringent thermal comfort expectations within public edifices, or that these structures may inherently possess superior thermal regulation capabilities compared to private dwellings. In addition, this disparity may be attributed to the economic aspects of thermal consumption costs and payment responsibility. Specifically, the cost of thermal energy in public spaces, which is not borne directly by individuals, potentially reduces thermal comfort concerns among users of these buildings. As for 'Building Function', spaces with public utility, including offices and dormitories, demonstrate no significant impact on thermal comfort, whereas private spaces such as 'Bedroom' and 'Living Room' are associated with the poorest thermal comfort levels. The influence patterns for 'Building Function' align with those observed in 'Building Type'. Regarding 'Thermal Mode', 'Radiator Heating' emerges as the most conducive to thermal comfort. Alternative modes, such as 'Natural Ventilation' and 'Convection Cooling', tend to negatively affect comfort margins. Seasonally, individuals report optimal thermal comfort in the winter, with the 'Transition Season' being the least comfortable period. Summer does not display a clear trend in thermal comfort preferences.

Interactive Mechanism
The interactive mechanism shows the combined effects of two features on building thermal comfort. In this part, we took the most relevant indicator, indoor temperature, as the basic index and conducted four groups of interactive analysis, as shown in Figure 10. In the SHAP dependence graphs, the scales of color bars do not include the outliers. Each point in these graphs represents how the interaction of the two features at that specific data point influences the TCV score, offering insights into the complex interplay of environmental factors on thermal comfort. To help better understand the SHAP dependence graph, we summarised two essential aspects of it: (a) Feature value-prediction impact relationship: The horizontal axis usually represents the value of a specific feature (i.e., indoor temperature in Figure 10), while the vertical axis shows the SHAP value, indicating the impact of that feature value on the model's prediction (i.e., TCV value); (b) Color of data points: Data points can be colored to represent the values of other features (i.e., clothing insulation, metabolic rate, indoor humidity, and indoor velocity in Figure 10a-d, respectively), revealing the interaction effects between different features.
Figure 10a indicates a nonlinear relationship between indoor temperature and the SHAP values for this temperature, with a color gradient representing clothing insulation levels. As indoor temperature increases, SHAP values initially decline and then rise, suggesting a U-shaped relationship between indoor temperature and its SHAP value (equivalently, an inverted U-shaped relationship with comfort). Lower SHAP values, indicating higher thermal comfort, are predominant at moderate temperatures, while extreme temperatures, both low and high, correspond to higher SHAP values, reflecting reduced thermal comfort. At lower temperatures, increased clothing insulation (as indicated by a gradient from blue to red) seems to mitigate the discomfort to some extent, as evidenced by the cluster of points with higher insulation levels associated with lower SHAP values. However, as the temperature rises beyond a certain threshold, even higher levels of clothing insulation cannot counteract the discomfort caused by high temperatures. In the mid-range of temperatures, there is a spread of SHAP values at varying levels of clothing insulation, implying a more complex interaction, where factors other than clothing and temperature may play a significant role in thermal comfort. This could include individual metabolic rates, the presence of direct sunlight, or other environmental factors not captured in this two-dimensional graph. At higher temperatures, the trend of increasing SHAP values regardless of clothing insulation suggests a limit to the compensatory role of clothing in managing thermal comfort. In these conditions, the physiological limits of heat dissipation might be reached and the discomfort becomes more pronounced, regardless of clothing insulation. Overall, the SHAP dependence graph reveals that, while clothing insulation can moderate the impact of indoor temperature on thermal comfort, this effect is bounded by the limits of physiological adaptation to temperature extremes. This suggests the importance of maintaining indoor temperatures within a moderate range to optimize thermal comfort, particularly in environments where the clothing insulation cannot be easily adjusted [43].
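One simple way to read the shape of a dependence graph such as Figure 10a is to average the SHAP values within temperature bins and look for the bin with the lowest mean. The sketch below does this on synthetic data; the function name, the bin width, and the V-shaped synthetic SHAP values centered on 24 °C are assumptions for illustration and are not results from the study.

```python
def binned_mean_shap(feature_values, shap_values, bin_width):
    """Mean SHAP value per feature bin, keyed by the left edge of each bin."""
    bins = {}
    for value, shap_value in zip(feature_values, shap_values):
        left_edge = (value // bin_width) * bin_width
        bins.setdefault(left_edge, []).append(shap_value)
    return {edge: sum(vals) / len(vals) for edge, vals in sorted(bins.items())}

# Synthetic indoor temperatures (degrees Celsius) and synthetic SHAP values
# that fall towards 24 degrees and rise again on either side.
temperatures = [16, 18, 20, 22, 24, 26, 28, 30, 32]
shap_values = [0.1 * abs(t - 24) - 0.3 for t in temperatures]

profile = binned_mean_shap(temperatures, shap_values, bin_width=4)
most_comfortable_bin = min(profile, key=profile.get)
```

On this synthetic input the lowest mean SHAP value falls in the bin starting at 24 °C, with higher means in the colder and warmer bins, which is the pattern described for moderate versus extreme temperatures. Splitting each bin by the coloring feature (for example, high versus low clothing insulation) would extend the same idea to the interaction effect.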
Figure 10b suggests that individuals with a higher metabolic rate (represented by red dots) tend to achieve thermal comfort more easily at lower indoor temperatures. This observation implies that the inherent heat generation from a higher metabolic rate may compensate for the lower ambient temperatures, thus aligning with the body's thermoregulatory needs to maintain a sensation of comfort. This phenomenon can be attributed to the body's endogenous thermal regulation system, where metabolic heat production plays a critical role. At lower temperatures, a higher metabolic rate can help maintain core body temperature, reducing the need for external heating sources and potentially leading to a more energy-efficient state of comfort. The concentration of red dots at the lower end of the indoor temperature spectrum on the SHAP graph indicates that, as the ambient temperature decreases, the thermal contribution of metabolic heat becomes increasingly significant. This aligns with thermoregulatory principles, where the human body's metabolic heat generation helps to offset the heat loss to the environment. The SHAP dependence graph indicates that the influence of metabolic rate on thermal comfort is attenuated at temperatures exceeding 27 °C, beyond which thermal comfort significantly declines with further increases in temperature, regardless of the metabolic rate. The implications of individual metabolic differences on thermal comfort are profound. They indicate that personalized comfort models could be beneficial in designing HVAC (Heating, Ventilation, and Air Conditioning) systems and in developing building energy management strategies that take into account the metabolic diversity of occupants. Adaptive thermal regulation systems that respond to individual metabolic rates can optimize energy consumption by reducing the reliance on artificial heating or cooling when the occupants' metabolic heat production is sufficient to achieve comfort.
Figure 10c delineates the interaction between indoor temperature and humidity, elucidating their collective effects on thermal comfort. At lower indoor temperatures, the contribution of humidity to the Thermal Comfort Vote (TCV) appears to be ambiguous; conversely, at elevated indoor temperatures, humidity levels predominantly register as high, hinting at a homogeneity within the filtered dataset. This homogeneity notably underscores the dearth of observations from hot and arid climates [44], thereby limiting the model's capacity to accurately reflect the variations in comfort perceptions associated with such conditions. To foster the creation of comprehensive thermal comfort models, it is imperative to procure a dataset that is both diverse and representative, spanning the full gamut of climatic scenarios.
Figure 10d illustrates the relationship between indoor air velocity and thermal comfort across various temperature ranges. It can be observed that higher air velocities, which are predominantly prevalent during the summer months, correspond to lower TCV values, suggesting an increase in thermal comfort. This phenomenon is likely attributable to the prevalent cooling and ventilation strategies employed during these warmer periods. Conversely, during winter, instances of high indoor air velocity are comparatively scarce, thereby rendering the impact of air movement on thermal comfort less discernible. The lack of significant data points under cold conditions suggests that ventilation strategies may be less aggressive, possibly due to the heating requirements and the desire to minimize energy loss. This analysis underscores the importance of considering the seasonal context when evaluating the influence of air velocity on thermal comfort. Airflow, often a crucial factor in thermal comfort during hot conditions, might play a nuanced role in colder climates. Such insights are vital for the design of HVAC systems that are responsive to the thermal needs of occupants while balancing energy efficiency across seasonal variations.

Conclusions
This research presents an innovative approach for evaluating building thermal comfort in China, utilizing a smart evaluation model underpinned by an open-access dataset. Through the integration of Bayesian-optimized LightGBM and SHAP methodologies, we have developed an explainable machine learning model that accurately predicts thermal comfort requirements across different regions and building types. The following key insights have been distilled from our study:
(1) Our model has demonstrated commendable accuracy in evaluating thermal comfort, with SHAP analysis providing granular insights into the model's internal workings. The ability of the model to generalize across the test set with high precision suggests its potential for widespread application in smart building management systems.
(2) The study underscores the paramount influence of indoor temperature on thermal comfort voting, reiterating the necessity for precise temperature control in the pursuit of occupant comfort. The notable impacts of building type and metabolic rate highlight the significance of architectural design and human physiological activity in thermal comfort perception.
(3) The insights gleaned from our analysis have significant policy implications. They can inform the development of energy-efficient thermal comfort standards and regulations that are sensitive to regional climatic diversity and personalized occupant needs. Accurate predictions of thermal comfort can aid substantially in the optimization of energy usage, aligning with the objectives of sustainable development and carbon neutrality. The model's ability to delineate the influence of distinct factors enables the design of energy-efficient and occupant-centric thermal environments.
(4) The research paves the way for future studies to incorporate additional variables, such as clothing adaptability, occupant behaviour, and building occupancy patterns. Such expansions could yield a holistic thermal comfort model that is both predictive and prescriptive, aiding stakeholders in creating energy-efficient, comfortable, and health-promoting built environments.
In conclusion, our study contributes a sophisticated, data-driven evaluation model to the field of building thermal comfort. This model not only serves as a tool for optimizing thermal comfort but also acts as a guide for sustainable building design and operation, ultimately supporting the global endeavor to mitigate climate change through improved energy stewardship in the building sector. With its capacity to elucidate complex relationships within large datasets, our research exemplifies the potential of machine learning to revolutionize building science and urban planning.

Figure 1. Pearson correlation coefficients between indoor physical parameters at different measured heights.

Figure 2. Spearman correlation coefficients for all features.

Figure 3. The distribution of each feature after feature selection.

Figure 4. Loss curves in the training process.

Figure 5. Performance of evaluation models on testing set.

Figure 6. Performance of evaluation models on testing set considering the indicator ranges.

Figure 7. Overall performance of all machine learning models on testing set.

Figure 8. Mean absolute SHAP values for factors of TCV model.

Figure 10. SHAP dependence graph to show the interactive effects of different factors.

Table 1. The features in the filtered dataset for building thermal comfort.

Table 2. The selected features in the dataset for the evaluation model of building thermal comfort.