Article

Adaptive Thermostat Setpoint Prediction Using IoT and Machine Learning in Smart Buildings

by Fatemeh Mosleh 1,*, Ali A. Hamidi 2, Hamidreza Abootalebi Jahromi 1 and Md Atiqur Rahman Ahad 1,*

1 Department of Engineering & Computing, School of Architecture Computing and Engineering, University of East London, London E16 2RD, UK
2 Hanatech IoT Inc., Halifax, NS B3M 2L4, Canada
* Authors to whom correspondence should be addressed.
Automation 2026, 7(1), 29; https://doi.org/10.3390/automation7010029
Submission received: 23 December 2025 / Revised: 29 January 2026 / Accepted: 30 January 2026 / Published: 5 February 2026

Abstract

Increasing global energy consumption raises operational costs in the energy sector and contributes to environmental deterioration. This study evaluates the effectiveness of integrating Internet of Things (IoT) sensors and machine learning techniques to predict adaptive thermostat setpoints in support of behavior-aware Heating, Ventilation, and Air Conditioning (HVAC) operation in residential buildings. The dataset was collected over two years from 2080 IoT devices installed in 370 zones in two buildings in Halifax, Canada, and comprises real-time measurements including indoor and outdoor temperature, humidity, thermostat setpoints, and window/door status. Data preprocessing included retrieving the data from a MySQL database and converting it into an analytical format suitable for visualization and processing. In the machine learning phase, a deep learning (DL) model was employed to predict the adaptive threshold settings (“from” and “to”) of the thermostats, and gradient boosted trees (GBT) were used to predict the heating and cooling thresholds. Standard metrics (RMSE, MAE, and R²) were used to evaluate prediction accuracy, and the GBT “from” and “to” models were compared with the DL model. Deep learning achieved the highest performance, reducing the MAE by about 9% relative to the strongest GBT model (1.12 vs. 1.23) and reaching an R² of up to 0.60, indicating improved predictive accuracy under real-world building conditions. The results indicate that IoT-driven setpoint prediction provides a practical foundation for behavior-aware thermostat modeling and future adaptive HVAC control strategies in smart buildings. This study focuses on setpoint prediction under real operational conditions and does not evaluate automated HVAC control or assess actual energy savings.

1. Introduction

In today’s world, as urbanization accelerates, efficient energy management systems play an increasingly critical role in buildings [1]. Buildings offer substantial potential for smart technologies to optimize energy usage while improving environmental sustainability. Smart buildings provide a flexible infrastructure in which smart sensors enable machine learning technologies. This artificial intelligence architecture supports data-driven HVAC operation while enhancing residents’ quality of life [2]. A viable solution to energy challenges requires multiple technologies working together. Applying machine learning algorithms to IoT sensor data can improve how building systems respond to changing conditions [3]. This large-scale digital integration serves as a foundation for future smart cities, bringing resilience, adaptability, and long-term sustainability [4]. Such flexibility in deploying smart technologies is therefore highly important. Furthermore, previous research [5] has shown that extending this architecture to surrounding houses can enhance system performance.
Accurate modeling of indoor thermal behavior is essential for understanding how heating and cooling systems respond to environmental parameters such as weather conditions and construction-related factors. Precisely measuring regional building energy consumption remains a major challenge, which limits insight into residential energy use [6]. To bridge this gap, researchers have explored diverse computational approaches that provide critical insights for optimizing energy management in buildings. Recent work has increasingly evaluated models such as deep learning and gradient boosted trees, because machine learning can learn jointly from tenant behavior and sensor data, which may help optimize HVAC systems [7]. This comparative approach supports continuous improvement of prediction and moves the field toward more practical, data-driven energy solutions. Despite the significant potential of smart technologies, numerous obstacles persist. Information on energy consumption is the main factor informing sustainable construction decisions [8], yet addressing major barriers such as costly installation processes and data privacy issues requires considerable expertise in managing complex systems. These challenges stimulate innovation in the smart building sector and create opportunities for energy-efficient designs that balance sustainability with tenant comfort.
This research aimed to explore how the integration of IoT sensor technology and machine learning models can support behavior-aware thermostat setpoint prediction in residential buildings. To demonstrate this approach in a real-world context, this method was applied to a two-year IoT dataset collected from two multi-unit buildings in Halifax, Canada. By combining IoT sensor data and machine learning prediction, this study offers insight into occupant-driven thermostat behavior and its representation through data-driven models in residential environments. This work evaluates behavior-aware thermostat setpoint prediction from operational IoT data; it does not implement a control loop, simulate HVAC actuation, or quantify energy savings or comfort outcomes.
To provide a clear outline of the study, the remainder of this paper is organized as follows. Section 2 presents the materials and methods, including the dataset description, preprocessing, feature engineering, and model development. Section 3 reports the results of the proxy energy-demand analysis and the predictive performance of the proposed models. Section 4 discusses the findings, compares the deep learning and gradient boosted trees approaches, and highlights practical implications and limitations. Finally, Section 5 concludes the paper and suggests directions for future research.
Many smart building studies have used machine learning to predict energy use or optimize HVAC systems. They usually look at system performance rather than how occupants choose thermostat setpoints in daily life. This study takes a different approach by focusing on occupant-selected thermostat setpoints (“from” and “to”) as the main evidence of user behavior. The following subsections review related work on IoT systems, machine learning-based optimization, and smart thermostats and explain why these methods do not fully capture long-term, zone-level setpoint behaviors in real homes.

1.1. IoT Infrastructure and Energy Efficiency in Buildings

The rapid expansion of connected devices, known as the Internet of Things (IoT), has the potential to reshape both the environment and the economy. Although individual sensors require little power during operation, the total power consumption of trillions of such devices is substantial, making energy-efficient IoT infrastructure a priority. Gunalan et al. [9] focused on sustainable IoT development through energy-efficient wireless sensor networks, low-power RFID systems, and hardware components designed to reduce energy consumption. These elements are required to support long-term environmental and economic sustainability.
Similarly, Priyadarshi [10] showed that node deployment and network optimization play key roles in improving the lifetime and energy efficiency of wireless sensor networks (WSNs), and highlighted the importance of hardware-layer optimization in heterogeneous IoT deployments. A 2024 systematic review [11] supported these findings, showing that IoT-based sensors, networked control, and integrated data analysis significantly reduce energy consumption. The review also noted persistent challenges, such as limited adoption of high-tech systems, data privacy concerns, and lifecycle management issues after system deployment. Moreover, most infrastructure-level IoT studies focus on data collection and networking rather than on predicting occupant-selected thermostat setpoints.

1.2. Machine Learning and Intelligent Control for Energy Optimization

Machine learning (ML) has emerged as a powerful approach for optimizing building operations, particularly in HVAC control. Eltawil et al. [12] demonstrated that intelligent HVAC systems employing model predictive control (MPC) and ML techniques can achieve 4–32% energy savings depending on the building type and climate. Their model combined random forest, support vector regression, and artificial neural networks to predict thermal comfort while maintaining energy efficiency.
Energy consumption in buildings is also influenced by thermostat control and temperature fluctuations. Ruliyanta et al. [13] showed that dynamic regulation of indoor conditions through IoT-enabled smart thermostats enhances both efficiency and comfort. Chinthala [14] investigated energy optimization in smart homes using IoT and big data analytics, applying linear regression models in RStudio 1.4.1717 to assess the impact of configuration, lighting, and solar integration. Together, these studies underline the value of ML and real-time data for adaptive control, leading to more intelligent and responsive energy management. Unlike setpoint prediction studies, however, most ML-based HVAC optimization studies aim to control energy use or comfort and assume fixed user preferences. As a result, they provide limited insight into how occupants choose thermostat setpoints in real residential settings.

1.3. Integration of IoT and AI in Smart Building Ecosystems

1.3.1. Smart Thermostat Analytics and Occupancy

Recent developments in artificial intelligence have enabled more effective integration of IoT data within intelligent building analytics frameworks. Chaudhari et al. proposed a hybrid Transformer–Convolutional architecture with adaptive gating for occupancy detection in smart buildings, reporting that deep learning models can effectively capture spatiotemporal occupancy patterns from sensor data [15]. Occupancy-related information has also been identified as a driver for energy modeling and HVAC optimization. A comprehensive survey by Chaudhari et al. reviewed IoT-based occupancy detection methods, including sensing technologies, feature extraction, and machine learning algorithms commonly used in smart building applications [16].
Recent studies have increasingly examined the role of tenant behavior and occupant-centric approaches in residential building performance evaluation. A comprehensive review by Mylonas et al. [17] highlighted the diversity and variability of occupant behavior and its influence on building operation, while Soleimanijavid et al. [18] reviewed occupant-centric control management and discussed challenges associated with its practical implementation in real buildings. These studies emphasize that occupant behavior is a key factor in building operation; however, it is very difficult to model under realistic conditions.
In parallel, recent studies have made use of data collected by smart thermostats and sensors to analyze how occupants interact with residential building systems. Doma et al. [19] demonstrated that smart thermostat measurements can be used to identify residential occupancy schedules, and Li et al. [20] reviewed large-scale smart thermostat datasets, showing their applications and limitations in real deployments. Similarly, Bouyakhsaine et al. [21] applied machine learning techniques to combined sensor and usage data to predict residential occupancy patterns. While these studies show the potential of smart thermostat data for understanding occupant presence and system use, they generally focus on occupancy patterns or overall system behavior rather than on how occupants select thermostat setpoints under long-term residential operation.
Other research has focused on deploying IoT-based monitoring systems and data-driven models during day-to-day building operation. Karjou et al. [22] presented a case study on the design and implementation of IoT-based occupancy monitoring systems, highlighting practical challenges related to sensor reliability and operational variability.

1.3.2. IoT Platforms Used in Real Deployments and Operational Analytics

Recent research has focused on combining IoT infrastructure with artificial intelligence to enable intelligent, adaptive environments. Ntafalias et al. [23] presented the PHOENIX IoT platform, which integrates sensors, ML modules, and adaptive control mechanisms in legacy buildings. Large-scale deployments achieved energy savings of up to 86% in residential and 20% in commercial sites, demonstrating that retrofitting existing infrastructure with intelligent systems can yield significant performance gains.
In addition to platform-based deployments, recent applied research has addressed building energy consumption prediction in real residential environments. Moulla et al. showed that machine learning-based models can capture nonlinear relationships between environmental variables and energy demand using operational residential data [24]. Furthermore, Craciun et al. showed that hybrid machine learning approaches can improve robustness and predictive performance in IoT–smart building environments [25]. These recent studies illustrate a growing shift toward hybrid and deployment-scale AI solutions for smart buildings, while also underscoring the limited availability of long-term, zone-level residential studies that explicitly capture occupant-driven thermostat behavior.
At a larger scale, Ali et al. [26] applied data-driven modeling approaches to predict urban building energy performance using real operational data. Such studies demonstrate the importance of evaluating models under long-term, real-world conditions, where behavior varies and systems change over time.
At the same time, a large number of studies focus on energy performance and system efficiency. For example, Muñoz-Rodríguez et al. [27] presented improved performance indicators for monitored photovoltaic systems and Li et al. [28] analyzed thermal materials for multifunctional building applications. Although such studies contribute valuable insights into building energy performance, in most cases, occupant behavior is considered indirectly, while thermostat setpoint selection is not addressed.
Recent research has increasingly focused on data-driven and occupant-centric approaches in building analysis. Despite recent progress, few studies have analyzed how occupants select thermostat setpoints using long-term, zone-level residential data. This study focuses on addressing this gap using multi-year residential IoT data.

1.3.3. Related Targets: Temperature Behavior and Comfort Analysis

Li et al. [29] applied deep transfer learning to thermal dynamics modeling using large-scale thermostat data, while Boutahri and Tilioua [30] developed an ML-based predictive model for thermal comfort and energy optimization. Together, these studies demonstrate that AI-based analytics can make practical use of diverse sensor data, improving predictive accuracy and energy responsiveness.
Predicting thermostat setpoint behavior is challenging due to irregular occupant actions, behavioral variability across zones, and changes in preferences over time. Real residential IoT data commonly contain noise, missing values, and changing usage patterns, unlike simulated or carefully prepared datasets. These factors limit achievable predictive accuracy but reflect realistic conditions under which behavior-aware models must operate.

1.4. Research Gaps and Contribution of This Study

Several end-to-end IoT frameworks have already been introduced in the literature, such as the work by Ntafalias et al. [23], and transfer learning has been applied to large-scale thermostat modeling. By contrast, fewer studies focus on long-term residential deployments with zone-level operational telemetry and clearly document preprocessing and evaluation.
While prior studies confirmed the importance of IoT, AI, and ML for improving building energy performance, they typically addressed component-level approaches, such as HVAC systems, sensor networks, or thermal comfort, rather than an integrated data-driven pipeline. Moreover, the existing literature often prioritizes algorithmic accuracy over holistic analysis of system-level performance and occupant comfort.
To address these limitations, the present study makes the following contributions:
  • Reports results from a large, real residential IoT deployment, based on two years of data collected from 2080 devices across 370 zones, extending earlier work that mainly relies on simulations or smaller datasets.
  • Describes the collection and preparation of IoT data for machine learning analysis, including data cleaning, feature engineering, and time-aware splitting in a realistic smart-building context.
  • Evaluates behavior-aware thermostat setpoint prediction using gradient boosted trees and deep learning models, and discusses how these predictions can support future work on automated HVAC control and energy management.
This approach addresses a practical gap in deployment-scale evidence by moving from isolated modeling toward data-driven, operational system-level evidence of energy efficiency in smart buildings.

2. Materials and Methods

2.1. Research Design and Framework

This research is structured as a predictive, data-driven study for evaluating adaptive thermostat setpoints under real operational conditions in smart residential buildings. The methodology relies on real-world IoT sensor data recorded over an extended period and applies machine learning techniques to model thermal behavior under realistic conditions. The research follows a systematic process of data collection, transformation, variable definition, model building, and model assessment.
Two residential buildings, Royal Tower 1 and Darya, located in Halifax, Canada, were selected for the study. An IoT platform with contact sensors and thermostats was installed in both buildings. Contact sensors were positioned near windows and doors to continuously report open/close status changes. Thermostats provided real-time information on operating mode (heat, cool, on, and off), humidity, temperature, heat setpoints, and cool setpoints. Analysis of the collected data provides concise insight into the monitoring and dynamic control of building climate conditions. Sensors and thermostats were distributed across zones to capture environmental conditions and HVAC behavior. Data visualizations were developed to depict how the devices were integrated into the building infrastructure, enabling evaluation of sensor placement relative to windows, doors, and HVAC units (zones and floors).
Figure 1 illustrates the primary workflow of the research. The process began with data collection from two buildings over a two-year period, followed by a preprocessing stage to clean and merge the datasets. The subsequent phase involved calculating a proxy energy demand indicator to determine and analyze relative trends. Finally, machine learning algorithms were applied to predict thermostat thresholds by analyzing user behavior, and their results were compared to evaluate the approach.

2.2. Data Collection and Dataset Description

Data were collected from two buildings over two years, covering 370 zones and 2080 IoT devices and yielding 1,501,558 telemetry data points. Although the analysis was based on only two buildings, the two years of operation and high-frequency measurements from hundreds of zones provided roughly 1.5 million observations for model training and evaluation. Contact sensors reported whether windows and doors were open or closed. Thermostats reported the HVAC mode, with four statuses (heat, cool, on, and off), as well as the current temperature, humidity, and heat/cool setpoints. The devices transmitted data whenever a change occurred and at scheduled intervals. The data were received on an AWS server through an MQTT-based IoT network, where updates from Z-Wave sensors were organized by each sensor’s topic, zone, and location within the building. The data were then stored in a secured internal database managed by Hanatech. Because this database is proprietary and not publicly accessible for confidentiality reasons, retrieval and preprocessing were performed internally, and all analyses were performed on de-identified records. The data were backed up and imported into an analytical framework for feature generation. During cleaning, patterns of missing values were identified, irrelevant columns were removed, and new features were created to enhance analytical value. Charts of the cleaned dataset were then produced to provide insight into the results of the study.
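The cleaning steps described above can be sketched with pandas as follows. This is a minimal illustration under assumed conditions: the long-format layout (`ts`, `zone_id`, `metric`, `value`), the metric names, and the `firmware_version` field standing in for an irrelevant column are all hypothetical and do not reflect the proprietary Hanatech schema.

```python
import pandas as pd

# Hypothetical long-format telemetry: one row per device report.
# Column names, metric names, and values are illustrative only.
raw = pd.DataFrame({
    "ts": pd.to_datetime([
        "2024-01-01 08:00", "2024-01-01 08:00", "2024-01-01 08:00",
        "2024-01-01 09:00", "2024-01-01 09:00",
    ]),
    "zone_id": ["Z1"] * 5,
    "metric": ["temperature", "humidity", "heat_setpoint",
               "temperature", "firmware_version"],
    "value": [21.5, 38.0, 21.0, 21.8, 3.0],
})

# 1. Remove fields that carry no information for setpoint prediction.
IRRELEVANT = {"firmware_version"}
raw = raw[~raw["metric"].isin(IRRELEVANT)]

# 2. Reshape to one row per (zone, timestamp) and one column per metric;
#    if a device reports twice at the same timestamp, keep the last value.
wide = (
    raw.pivot_table(index=["zone_id", "ts"], columns="metric",
                    values="value", aggfunc="last")
    .sort_index()
)

# 3. Profile missing values per column before choosing how to handle them.
null_fraction = wide.isna().mean()
print(null_fraction)
```

Profiling the missing-value fraction per column first makes it possible to choose between forward-filling slowly changing values (such as setpoints) and dropping sparse columns, rather than applying one rule everywhere.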

2.3. Model Development and Data Analysis Tool

For the present study, machine learning was selected as the model development tool because it integrates artificial intelligence and statistical techniques. Among statistical techniques, Pearson’s correlation was used to refine the feature set and support accurate predictions. The role of a robust ML model was to identify the conditions under which comfort was maintained and energy efficiency was improved, on the assumption that such a model plays a crucial role in managing indoor temperature variations for building residents. Among various ML algorithms, deep learning was chosen to predict the adaptive thermostat temperature thresholds (“from” and “to”) for each zone. Gradient boosted trees (GBT) were selected because they handle nonlinear relationships, scale to large datasets, and serve as a strong benchmark model.
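As a rough sketch of correlation-based feature refinement, the function below ranks candidate features by the absolute Pearson correlation with a target and drops weak ones. The helper name `pearson_screen`, the 0.1 cut-off, and the synthetic data are assumptions for illustration; the study does not report its exact selection rule.

```python
import numpy as np

def pearson_screen(X, y, names, min_abs_r=0.1):
    """Rank features by |Pearson r| with the target and drop weak ones.

    min_abs_r is a hypothetical cut-off; the study does not state one.
    Pearson's r only measures linear association, so a nonlinear driver
    can be missed by this filter.
    """
    kept = []
    for j, name in enumerate(names):
        r = np.corrcoef(X[:, j], y)[0, 1]
        if np.isfinite(r) and abs(r) >= min_abs_r:
            kept.append((name, float(r)))
    # Strongest association first.
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)

# Synthetic example (not study data): the target follows indoor temperature.
rng = np.random.default_rng(0)
n = 500
indoor_temp = rng.normal(21.0, 1.0, n)
unrelated = rng.normal(0.0, 1.0, n)
target = 0.8 * indoor_temp + rng.normal(0.0, 0.3, n)

X = np.column_stack([indoor_temp, unrelated])
ranked = pearson_screen(X, target, ["indoor_temp", "unrelated"])
print(ranked)
```

Because tree ensembles and neural networks can exploit nonlinear effects, a correlation filter of this kind is best treated as a coarse first pass rather than a final feature selection.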
The dataset was split chronologically into training (70%), validation (15%), and testing (15%) subsets using a time-based split to keep temporal order and prevent data leakage. Model performance is reported on the held-out test subset, which represents later operational periods. This approach evaluated model performance under realistic usage changes over time, instead of focusing only on short-term behavior. The model was assessed using both cross-validation and a hold-out building approach (training on one building and testing on the other) to assess generalizability. Because both buildings were in the same climate region, this evaluation was intended to test generalization across different buildings and occupant behaviors under consistent climatic conditions rather than cross-climate transfer. Table 1 summarizes the main features used in model development, together with the corresponding preprocessing steps. It outlines key variable definitions, measurement units, data sources, and applied transformations such as encoding, scaling, and outlier handling. To capture periodic patterns, temporal variables such as hour-of-day and day-of-week were decomposed into sine and cosine components. Categorical variables (for example, HVAC operating mode) were encoded using a one-hot approach, and numerical features were normalized using z-score standardization. While the system did not include direct sensing of tenants, solar radiation, detailed building properties, or appliance loads, their influence was indirectly represented through thermostat setpoints, window and door states, temporal features, and indoor–outdoor temperature relationships.
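A minimal sketch of the chronological split and the encodings described above, assuming a pandas frame with hypothetical columns `ts`, `hvac_mode`, and `indoor_temp`; the helper names are illustrative. Normalization statistics are estimated on the training split only, so no information from later periods leaks into training.

```python
import numpy as np
import pandas as pd

MODES = ("heat", "cool", "on", "off")  # HVAC modes reported by the thermostats

def chronological_split(df, time_col="ts", train=0.70, val=0.15):
    """70/15/15 split in time order; later periods end up in the test set."""
    df = df.sort_values(time_col).reset_index(drop=True)
    n = len(df)
    i_train, i_val = round(n * train), round(n * (train + val))
    return df.iloc[:i_train], df.iloc[i_train:i_val], df.iloc[i_val:]

def encode(df, numeric_cols, mu, sigma, time_col="ts", mode_col="hvac_mode"):
    """Cyclical time features, one-hot HVAC mode, and z-scored numerics."""
    out = pd.DataFrame(index=df.index)
    hour = df[time_col].dt.hour
    dow = df[time_col].dt.dayofweek
    out["hour_sin"] = np.sin(2 * np.pi * hour / 24)
    out["hour_cos"] = np.cos(2 * np.pi * hour / 24)
    out["dow_sin"] = np.sin(2 * np.pi * dow / 7)
    out["dow_cos"] = np.cos(2 * np.pi * dow / 7)
    # A fixed category list keeps the same columns in every split.
    for mode in MODES:
        out[f"mode_{mode}"] = (df[mode_col] == mode).astype(float)
    # z-score with statistics estimated on the training split only.
    for col in numeric_cols:
        out[col] = (df[col] - mu[col]) / sigma[col]
    return out

# Synthetic hourly rows, shuffled to show that the split restores time order.
ts = pd.date_range("2024-01-01", periods=100, freq="60min")
df = pd.DataFrame({
    "ts": ts,
    "hvac_mode": [MODES[i % 4] for i in range(100)],
    "indoor_temp": 20.0 + np.sin(np.arange(100) / 5.0),
}).sample(frac=1.0, random_state=0)

train_df, val_df, test_df = chronological_split(df)
mu = train_df[["indoor_temp"]].mean()
sigma = train_df[["indoor_temp"]].std(ddof=0)
X_train = encode(train_df, ["indoor_temp"], mu, sigma)
X_test = encode(test_df, ["indoor_temp"], mu, sigma)
```

The sine/cosine pair places 23:00 next to 00:00 (and Sunday next to Monday), which a single integer hour or day index would not.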

2.4. Deep Learning (DL)

A feed-forward neural network (FNN) was developed to predict adaptive thermostat temperature thresholds (“from” and “to”) for each building zone. The input layer matched the size of the engineered feature vector, incorporating environmental, temporal, and operational variables. The network consisted of two fully connected hidden layers with 128 and 64 neurons, respectively, using the ReLU activation function to capture nonlinear relationships in the data. The output layer consisted of two linear neurons, one for each temperature threshold (“from” and “to”), reflecting the continuous nature of the prediction task. The model was trained using the Adam optimizer with a learning rate of 0.001 and Mean Squared Error (MSE) as the loss function, while Mean Absolute Error (MAE) and the Coefficient of Determination (R²) were used as evaluation metrics. Training was configured for up to 50 epochs with a batch size of 32, and early stopping halted training at epoch 47 to prevent overfitting. The validation MSE decreased sharply within the first ten epochs. By the end of training, the model showed a small generalization gap (0.07 MSE), suggesting that it had learned effectively without overfitting. The trained model was evaluated on the test set using the MSE, MAE, RMSE, and R² metrics. Additionally, the predicted and actual temperature thresholds were compared visually, and the residuals were examined to confirm that the model’s performance was consistent with the expected outcomes. The problem was framed as supervised regression on engineered temporal and contextual features rather than on raw time-series data. A standard deep learning model was used as a reference model to examine behavior-aware thermostat setpoint prediction under real operating conditions.
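The architecture described above can be approximated with scikit-learn’s `MLPRegressor`, which supports multi-output regression with linear (identity) outputs. This is a sketch under stated assumptions, not the authors’ implementation: the synthetic data, `alpha=0.0`, and the patience value are hypothetical, and scikit-learn’s built-in early stopping holds out a random fraction of the training rows and monitors validation R² rather than the chronological validation split and MSE described in the text. Reproducing that behavior exactly would call for a framework such as Keras or PyTorch.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in for the engineered feature matrix (not study data).
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 8))
y_from = 20.0 + 0.5 * X[:, 0] + rng.normal(0.0, 0.2, 1000)
y_to = y_from + 2.0 + 0.3 * X[:, 1]
Y = np.column_stack([y_from, y_to])          # two targets: "from" and "to"
X_train, X_test = X[:700], X[700:]           # rows assumed in time order
Y_train, Y_test = Y[:700], Y[700:]

fnn = MLPRegressor(
    hidden_layer_sizes=(128, 64),  # two hidden layers, as described
    activation="relu",
    solver="adam",
    learning_rate_init=0.001,
    batch_size=32,
    max_iter=50,                   # upper bound on epochs for adam
    alpha=0.0,                     # squared-error loss without an L2 term
    early_stopping=True,           # random 15% of training rows held out
    validation_fraction=0.15,
    n_iter_no_change=5,            # hypothetical patience
    random_state=0,
)
fnn.fit(X_train, Y_train)
Y_pred = fnn.predict(X_test)       # shape (n_samples, 2), linear outputs
```

Fitting both thresholds in one network lets the hidden layers share structure between “from” and “to”, which is one reason a joint two-output head is a natural choice here.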

2.5. Gradient Boosted Trees (GBT)

To establish a benchmark for the deep learning (DL) results, a gradient boosted trees (GBT) regressor was implemented using identical targets and the same feature set. GBT is an ensemble method that builds decision trees sequentially, with each new tree fitted to correct the errors of the trees before it, gradually improving the overall prediction accuracy.
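To make the sequential error-correction idea concrete, the sketch below implements squared-error gradient boosting from scratch with one-split trees (stumps) on a single feature. It is a didactic simplification and not necessarily how the study’s GBT was implemented: each stump is fitted to the current residuals, which are the negative gradient of the squared-error loss, and its output is added with a shrinkage factor (learning rate). The toy target is synthetic.

```python
import numpy as np

def fit_stump(x, r):
    """Least-squares single-split tree (a 'stump') on one feature."""
    order = np.argsort(x)
    xs, rs = x[order], r[order]
    best_sse, best = np.inf, (xs[-1], rs.mean(), rs.mean())
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # cannot split between equal values
        left, right = rs[:i], rs[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse = sse
            best = ((xs[i - 1] + xs[i]) / 2, left.mean(), right.mean())
    return best  # (threshold, value if x <= threshold, value otherwise)

def gradient_boost(x, y, n_trees=50, learning_rate=0.1):
    """Squared-error boosting: each stump is fitted to the current residuals."""
    f0 = y.mean()
    pred = np.full(len(y), f0)
    stumps, train_mse = [], [np.mean((y - pred) ** 2)]
    for _ in range(n_trees):
        residual = y - pred  # negative gradient of 0.5 * squared error
        thr, left_val, right_val = fit_stump(x, residual)
        pred = pred + learning_rate * np.where(x <= thr, left_val, right_val)
        stumps.append((thr, left_val, right_val))
        train_mse.append(np.mean((y - pred) ** 2))
    return f0, stumps, train_mse

def predict(x, f0, stumps, learning_rate=0.1):
    """Sum the initial constant and every shrunken stump output."""
    out = np.full(len(x), f0)
    for thr, left_val, right_val in stumps:
        out = out + learning_rate * np.where(x <= thr, left_val, right_val)
    return out

# Toy target: a setpoint that steps from 19 to 22 degrees plus a small wiggle.
x = np.linspace(0.0, 10.0, 200)
y = np.where(x < 5.0, 19.0, 22.0) + 0.2 * np.sin(x)
f0, stumps, train_mse = gradient_boost(x, y)
```

Because each stump is a least-squares fit to the residuals, the training error cannot increase from one round to the next for learning rates between 0 and 2; practical libraries add deeper trees, many features, and validation-based stopping on top of this loop.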

2.6. Implementation and Consistency

The GBT model used the same feature-engineered dataset and data splits as the DL model to ensure direct comparison. While GBT is generally less sensitive to feature scaling, the input features were nonetheless standardized to ensure consistency in methodology across all models. Due to the dual-output nature of the task, separate GBT models were trained independently for predicting the “from” and “to” temperature thresholds.
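A sketch of this per-threshold setup using scikit-learn’s `GradientBoostingRegressor`, which predicts a single target, so one model is trained for “from” and another for “to”. The hyperparameters and synthetic data are hypothetical; the study does not report its GBT settings in this section.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (not study data); rows assumed in time order.
rng = np.random.default_rng(1)
n, n_train = 600, 420                      # 70% of rows for training
X = rng.normal(size=(n, 5))
y_from = 20.0 + 0.6 * X[:, 0] + rng.normal(0.0, 0.2, n)
y_to = y_from + 2.0 + 0.4 * X[:, 1]
targets = {"from": y_from, "to": y_to}

# Same standardization as the DL pipeline, fitted on training rows only.
scaler = StandardScaler().fit(X[:n_train])
X_train, X_test = scaler.transform(X[:n_train]), scaler.transform(X[n_train:])

# One independent single-output GBT model per threshold.
models, predictions = {}, {}
for name, y in targets.items():
    model = GradientBoostingRegressor(
        n_estimators=200, learning_rate=0.05, max_depth=3,  # hypothetical
        random_state=0,
    )
    model.fit(X_train, y[:n_train])
    models[name] = model
    predictions[name] = model.predict(X_test)
```

Training the two models independently means they cannot share structure, unlike the two-output neural network; scikit-learn’s `MultiOutputRegressor` wrapper would automate the same one-model-per-target pattern.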

2.7. Evaluation and Interpretability

The performance of the GBT models was quantified using the same metrics as the DL model: Mean Squared Error (MSE), Mean Absolute Error (MAE), and R². This enabled a direct comparison of predictive performance between the two methods. Feature importance values calculated from the trained ensemble were analyzed to identify the input variables, such as indoor temperature, humidity, and the indoor–outdoor temperature difference, that most strongly influenced the predicted temperature thresholds. To provide additional insight, the most influential features of the best-performing GBT model were analyzed with respect to their role in setpoint prediction. Temporal factors such as hour-of-day and day-of-week, together with indoor–outdoor temperature conditions, had the strongest influence, which aligns with how tenants typically adjust thermostat settings in residential buildings. This feature ranking provided a clear view of which inputs contributed most to the model’s predictions. Feature importance is reported only for the GBT models; a more detailed interpretability analysis of the deep learning model (e.g., SHAP or LIME) is not included in this study and is left for future work.
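The evaluation metrics can be computed directly from their definitions, as in the NumPy sketch below; the toy values are illustrative only and are not study results. The last lines reproduce the relative MAE reduction quoted in the abstract from the reported values 1.12 and 1.23. (Fitted scikit-learn tree ensembles expose impurity-based importances through the `feature_importances_` attribute, which is one common source for a feature ranking.)

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE, and R^2 for one target (e.g. the "from" threshold)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "MSE": mse,
        "RMSE": mse ** 0.5,
        "MAE": float(np.mean(np.abs(err))),
        "R2": 1.0 - ss_res / ss_tot,   # 1 minus unexplained/total variance
    }

# Toy values in degrees, not study results:
# MSE 0.125, MAE 0.25, R2 0.9 for these inputs.
m = regression_metrics([20.0, 21.0, 22.0, 23.0], [20.5, 21.0, 21.5, 23.0])

# Relative MAE reduction reported in the abstract (1.12 vs. 1.23).
mae_reduction = (1.23 - 1.12) / 1.23
print(f"{mae_reduction:.1%}")  # 8.9%
```

An R² of 0.60, as reported for the DL model, means the model explains 60% of the variance of the setpoints around their mean; the remaining 40% reflects occupant actions not captured by the features.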

2.8. Energy Consumption Calculation

Energy consumption was estimated conceptually as a function of the temperature gradient (T_out-T_in) over time (Equation (1)), assuming constant thermal transmittance (K) for the building envelope. Although this relationship provides only an indicative measure, it supports an analysis of the connection between temperature dynamics and energy demand patterns. This indicator reflects relative energy demand patterns and should not be interpreted as a measure of metered energy consumption.
$$E = K \cdot \left(T_{\mathrm{out}} - T_{\mathrm{in}}\right) \cdot \Delta t \tag{1}$$
where $E$ represents the total energy consumption, $K$ is a constant that accounts for the building’s thermal properties and efficiency, $T_{\mathrm{out}}$ is the outdoor temperature, $T_{\mathrm{in}}$ is the indoor temperature, and $\Delta t$ is the time interval over which the energy is measured. In this study, the energy indicator is normalized and scaled to kWh units for interpretability and visualization purposes only. The scaling constant does not represent calibrated building energy parameters, and the resulting values should be interpreted as relative energy demand indicators rather than absolute metered energy consumption.
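A minimal sketch of the proxy indicator in Equation (1) for irregularly spaced readings. The left-point rule (each interval uses the temperatures at its start), Δt in hours, and `k=1.0` are assumptions; as stated above, the result is a relative indicator rather than metered energy. With the sign convention of Equation (1), readings where the outdoor temperature is below the indoor temperature give negative values, whose magnitude grows with the indoor–outdoor gap and the interval length.

```python
from datetime import datetime

def proxy_energy(readings, k=1.0):
    """Relative energy-demand indicator: sum of K * (T_out - T_in) * dt.

    readings: time-sorted (timestamp, t_out, t_in) tuples for one zone.
    Each interval uses the temperatures at its start (left-point rule, an
    assumption). dt is in hours; k is an uncalibrated placeholder.
    """
    total = 0.0
    for (t0, t_out, t_in), (t1, _, _) in zip(readings, readings[1:]):
        dt_hours = (t1 - t0).total_seconds() / 3600.0
        total += k * (t_out - t_in) * dt_hours
    return total

readings = [
    (datetime(2024, 1, 15, 0, 0), -5.0, 21.0),
    (datetime(2024, 1, 15, 0, 30), -4.0, 21.0),
    (datetime(2024, 1, 15, 2, 0), -3.0, 22.0),
]
e = proxy_energy(readings)
print(e)  # -50.5: 0.5 h * (-26) + 1.5 h * (-25)
```

Weighting each term by its actual interval length matters here because event-driven telemetry is not evenly spaced; summing raw temperature differences would over-count periods with frequent reports.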

2.9. Sensors Architecture

The IoT architecture integrated Z-Way, MQTT, and the ThingsBoard platform to enable seamless data acquisition, communication, and analysis within the smart building environment. Sensor data from thermostats and Z-Wave devices were transmitted via a local gateway to the cloud through the MQTT protocol, where they were securely stored and processed for intelligent control. This architecture supported real-time monitoring, adaptive response, and remote management, improving operational efficiency and enabling behavior-aware HVAC operation. The overall data flow of this integrated system is illustrated in Figure 2, demonstrating its contribution to sustainable and data-driven building management. The architecture is presented to provide context for data acquisition and end-to-end data flow in the deployed system; a detailed evaluation of communication performance metrics such as latency, packet loss, and fault tolerance is outside the scope of this study. The focus of this work was on analyzing operational telemetry for behavior-aware thermostat setpoint prediction rather than benchmarking the underlying communication infrastructure.
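For illustration, the sketch below parses one MQTT message into a zone-level reading. The topic layout `<building>/<floor>/<zone>/<device>/<attribute>`, the JSON payload `{"value": ...}`, and the example topic string are hypothetical; the deployed system’s topic structure is not documented here. In practice, the function would be called from the message callback of an MQTT client library, which is omitted to keep the example self-contained.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class Reading:
    building: str
    floor: str
    zone: str
    device: str
    attribute: str
    value: float

def parse_message(topic: str, payload: bytes) -> Reading:
    """Map one MQTT message onto a zone-level reading.

    Assumes a hypothetical topic layout
    <building>/<floor>/<zone>/<device>/<attribute> and a JSON payload
    of the form {"value": <number>}.
    """
    parts = topic.split("/")
    if len(parts) != 5:
        raise ValueError(f"unexpected topic layout: {topic!r}")
    building, floor, zone, device, attribute = parts
    body = json.loads(payload.decode("utf-8"))
    return Reading(building, floor, zone, device, attribute, float(body["value"]))

reading = parse_message(
    "royal-tower-1/floor-03/zone-112/thermostat-01/heat_setpoint",
    b'{"value": 21.5}',
)
print(reading.zone, reading.attribute, reading.value)
```

Encoding building, floor, and zone in the topic lets a subscriber use MQTT wildcard subscriptions to select, for example, all thermostats on one floor without inspecting payloads.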

2.10. Sensor Placement Rationale and Deployment Policy

Sensor placement in the buildings followed practical installation guidelines commonly used in residential smart building systems rather than a formally optimized sensing layout. Contact sensors were installed on windows and doors to capture occupant-driven opening events that can significantly influence thermal conditions within a zone. Thermostats were installed within each zone to measure indoor temperature and to manage local HVAC operation based on occupant-defined setpoints.
Sensors were installed to capture typical thermal conditions in each zone, with care taken to avoid locations that might affect readings, such as direct sunlight or nearby heat sources. Practical aspects, including deployment practicality, maintenance access, and occupant use, were also considered. The following principles were used consistently across all zones:
  • Each zone includes one thermostat configured to represent the primary occupied space of that zone.
  • Contact sensors are installed on windows and door openings where opening events are expected to affect thermal exchange.
  • Devices are installed in accessible locations to support long-term operation and maintenance.
In this study, each monitored zone provided temperature measurements and window or door state information, which defined the zone-level sensing coverage. Because the work analyzed data from an existing deployment, sensor placement optimization and detailed coverage analysis are outside the scope of this paper.

2.11. Internet of Things and Machine Learning Implementation

In this research, the Internet of Things (IoT) refers to the deployed network of sensing and control devices used to monitor residential building operation at the zone level. Each zone was equipped with a smart thermostat and contact sensors installed on primary windows and doors. The thermostats recorded indoor temperature and occupant-selected setpoints, while contact sensors captured opening and closing events that influenced thermal exchange. Outdoor environmental data were also collected to provide contextual information. Sensor data were transmitted through a Z-Wave gateway, communicated using the MQTT protocol, and stored in a cloud-based platform (ThingsBoard) for processing and analysis.
Machine learning was applied to model and predict occupant-driven thermostat setpoint behavior using historical IoT telemetry. This study evaluated two supervised regression models, gradient boosted trees and deep learning. The inputs included environmental variables, temporal features, and operational indicators derived from sensor data, and the models predicted the lower and upper thermostat setpoints chosen by occupants. Training used a time-aware data split to maintain temporal order, and performance was evaluated using MAE, RMSE, and R2.
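A time-aware split and the three evaluation metrics can be sketched as follows. The paper does not state the split proportions, so the 80/20 chronological cut below is an assumption, and the records are toy `(timestamp, setpoint)` pairs; the metric formulas are the standard definitions of MAE, RMSE, and R2.

```python
import math


def chronological_split(records, train_frac=0.8):
    """Split (timestamp, target) records without shuffling, so every test
    sample is later than every training sample. train_frac is an assumed value."""
    ordered = sorted(records, key=lambda r: r[0])  # r[0] is the timestamp
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]


def regression_metrics(y_true, y_pred):
    """Standard MAE, RMSE and R2 for paired observed/predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)  # must be non-zero
    return {"MAE": mae, "RMSE": math.sqrt(ss_res / n), "R2": 1.0 - ss_res / ss_tot}


# Toy records with shuffled timestamps; the split restores temporal order.
records = [(t, 20.0 + (t % 3)) for t in [5, 1, 4, 2, 3, 9, 7, 8, 6, 10]]
train, test = chronological_split(records, 0.8)
metrics = regression_metrics([21.0, 22.0], [21.5, 21.5])
print(metrics)
```

Keeping the split chronological prevents the model from being evaluated on periods that precede data it was trained on, which would otherwise inflate the reported scores.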

3. Results

The aim of this evaluation was to determine whether occupant-selected thermostat setpoints can be predicted from operational IoT data under realistic residential conditions rather than to optimize HVAC control or energy savings.
The proxy energy indicator was analyzed in two complementary ways. First, a detailed analysis was conducted to examine indicator use by time-of-day and indoor–outdoor temperature pairs, revealing fluctuations and identifying periods of higher or lower demand. Second, a daily aggregated analysis summarized the data into daily intervals to highlight broader trends and overall patterns, allowing for easier comparison across days and building zones.
This dual presentation balanced detailed insights with a high-level perspective, aiding in the analysis of relative thermal demand patterns and their relationship with temperature variations.
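The two views can be produced from the same time-stamped series by grouping on hour of day and on calendar date. The sketch below averages a proxy indicator within each group; whether the paper's daily view uses a mean or a daily total is not stated, so averaging is an assumption, and the sample values are toy numbers rather than data from the study.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_proxy(samples):
    """Average a proxy energy indicator by hour of day and by calendar day.

    samples: iterable of (datetime, value) pairs. Using the mean for the
    daily view is an assumption; a daily sum would also be plausible.
    """
    by_hour = defaultdict(list)
    by_day = defaultdict(list)
    for ts, value in samples:
        by_hour[ts.hour].append(value)
        by_day[ts.date()].append(value)
    hourly = {h: sum(v) / len(v) for h, v in sorted(by_hour.items())}
    daily = {d: sum(v) / len(v) for d, v in sorted(by_day.items())}
    return hourly, daily


# Toy values only, not measurements from the buildings.
samples = [
    (datetime(2024, 4, 1, 0), 4.0),
    (datetime(2024, 4, 1, 8), 3.0),
    (datetime(2024, 4, 2, 0), 4.4),
    (datetime(2024, 4, 2, 8), 3.2),
]
hourly, daily = aggregate_proxy(samples)
print(hourly)
print(daily)
```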
Figure 3 shows the daily proxy energy indicator from the beginning of April to the end of June 2024. From April to May, a steady reduction in the proxy energy indicator was observed, reflecting changes in relative thermal demand patterns compared to previous months.
Figure 4 shows the hourly proxy energy indicator from April to May. The indicator starts at around 4.2 (scaled kWh) at midnight, declines during the early morning hours to a minimum of approximately 3.0 (scaled kWh) at 08:00, and then increases sharply as building activity and heating/cooling demand rise. The pattern shows clear daily cycles, with variation consistent with time-of-day effects and outdoor temperature changes.
The comparison between predicted and observed thermostat thresholds is depicted in Figure 5a,b.
Figure 5a,b show that predicted and actual values are aligned for both "from" and "to." The "from" predictions cluster more tightly along the diagonal line, reflecting a meaningful predictive signal, while the "to" predictions display slightly greater variance. Overall, the model performs more consistently for "from" predictions.
In Figure 6a,b, the histograms compare the error distributions for "from" and "to." In both cases, errors are concentrated around zero, while the "to" distribution shows fewer large deviations. This indicates that the "to" predictions are less prone to large deviations under the evaluated operating conditions.
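The properties read off Figure 6 (errors centered near zero, fewer large deviations) can be quantified from signed residuals. In this sketch, the 2.0 threshold for a "large" error and the 1.0 bin width are hypothetical choices (units assumed to be °C, matching the temperature features), and the inputs are toy values.

```python
import math
from collections import Counter


def error_summary(y_true, y_pred, large=2.0, bin_width=1.0):
    """Summarize signed errors (predicted minus observed).

    Returns the mean error, the share of errors whose magnitude exceeds
    `large` (a hypothetical threshold), and histogram counts keyed by the
    left edge of each bin.
    """
    errors = [p - t for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mean_error = sum(errors) / n
    share_large = sum(1 for e in errors if abs(e) > large) / n
    histogram = Counter(math.floor(e / bin_width) * bin_width for e in errors)
    return mean_error, share_large, histogram


# Toy observed/predicted setpoints, not values from the study.
mean_error, share_large, histogram = error_summary(
    [20.0, 21.0, 22.0, 23.0], [20.5, 20.5, 22.0, 26.0]
)
print(mean_error, share_large, dict(histogram))
```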
Table 2 compares model performance with two related studies using MAE (Mean Absolute Error), where lower values are better, and R-squared (R2), where values closer to 1.0 indicate better performance. The models from refs. [29,30] report lower MAE values (0.259 and 0.083); however, these studies address different prediction targets and datasets, so their values are shown for context only and do not constitute a direct benchmark for this setpoint prediction task. The R2 values obtained here (0.55–0.60) reflect the challenges of working with long-term operational data in which occupant behavior varies, measurements are noisy, and usage patterns change over time. Higher R2 values reported in earlier work are often based on simulations or carefully controlled datasets, whereas this study focuses on prediction under real residential conditions.
Table 3 summarizes the prediction performance of the GBT regressor for the two thresholds, "from" and "to". In addition to MAE and R-squared, it reports RMSE and MSE to give a broader picture of model performance. The "to" threshold yields smaller errors (RMSE = 1.53, MAE = 1.22) than the "from" threshold (RMSE = 2.49, MAE = 1.79), indicating more consistent predictions. Although its R2 value remains moderate (0.54), the model consistently captures the structure of the thermostat setpoint signal under operational variability.
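To make the size of this gap concrete, the relative error reduction of the "to" threshold over the "from" threshold can be computed directly from the values in Table 3 (a worked calculation on the reported figures, not an additional result):

```latex
\Delta_{\mathrm{MAE}} = \frac{1.79 - 1.22}{1.79} \approx 0.318,
\qquad
\Delta_{\mathrm{RMSE}} = \frac{2.49 - 1.53}{2.49} \approx 0.386
```

That is, the "to" model's MAE is roughly 32% lower and its RMSE roughly 39% lower than those of the "from" model.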

4. Discussion

Research on thermostat optimization and HVAC systems has generated a wide range of methods and insights, particularly through simulation-based studies and controlled datasets. When these approaches are applied in real residential settings, missing values, device-level variation, irregular tenant behavior, and changing usage patterns become unavoidable. While these factors make prediction more challenging and can lower accuracy scores, they also reflect the realities of everyday smart building management. In this context, the moderate R2 values reflect the variability inherent in long-term residential operation and provide a reference point for future improvements under comparable real-world conditions. In practical terms, this study demonstrates how an end-to-end IoT-to-ML workflow can be applied to long-term operational data from a real residential deployment and shows that behavior-aware thermostat setpoint patterns can be identified across hundreds of zones. This helps inform future research on automated HVAC control and the evaluation of energy or comfort outcomes under real-world conditions. Because the dataset comes from two buildings in Halifax, the results should be interpreted within a single climate context, and broader validation across different climates is left for future work. In addition, the absence of direct occupancy sensors, solar radiation data, detailed envelope characteristics, and appliance-level loads limits the explanatory depth of the models; these factors are represented only indirectly through behavioral and environmental indicators. The models learn patterns in occupant setpoint selection and are not intended to optimize thermostat settings or to demonstrate energy or comfort improvements.
Because the study was focused on forecasting, we did not include automated control performance or run HVAC actuation simulations, and we therefore do not report measured energy savings or comfort improvements. The results should be considered evidence that tenant-driven setpoint patterns can be extracted from real operational telemetry and used as a prediction layer. The next step is to integrate this prediction layer into a control or simulation framework to compare energy and comfort outcomes against current operation.
The present study explored the performance of two machine learning models, gradient boosted trees (GBT) and deep learning (DL), for predicting thermostat setpoints in smart buildings and assessed the effectiveness of IoT- and ML-based approaches in this setting. The DL model was trained for up to 50 epochs, and early stopping retained the checkpoint with the lowest validation MSE (epoch 47). Table 4 presents the key metrics at epoch 47. More advanced temporal models, such as LSTM, GRU, or Transformer-based approaches, could better capture longer-term thermal patterns and are an important direction for future work.
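Selecting the checkpoint with the lowest validation MSE can be sketched as a scan over the per-epoch validation curve. Deep learning frameworks offer this directly (for example, Keras's `EarlyStopping` callback with `restore_best_weights=True`), but the paper does not state the framework or any patience value, so the `patience` argument and the validation curve below are hypothetical. The last line reproduces the generalization gap from the Table 4 MSE values.

```python
def select_best_epoch(val_mse, patience=None):
    """Return (best_epoch, best_mse) with 1-based epoch numbers.

    With patience=None the whole curve is scanned and the global minimum is
    kept (best-checkpoint restoration). With an integer patience, training is
    treated as stopped once that many consecutive epochs fail to improve.
    """
    best_epoch, best_mse, since_best = 0, float("inf"), 0
    for epoch, mse in enumerate(val_mse, start=1):
        if mse < best_mse:
            best_epoch, best_mse, since_best = epoch, mse, 0
        else:
            since_best += 1
            if patience is not None and since_best >= patience:
                break
    return best_epoch, best_mse


# Hypothetical validation-MSE curve (not the paper's training log).
curve = [5.0, 3.0, 2.6, 2.55, 2.6, 2.52, 2.7, 2.8]
print(select_best_epoch(curve))              # global minimum of the curve
print(select_best_epoch(curve, patience=1))  # stops after one non-improving epoch

# Generalization gap from Table 4: validation MSE minus training MSE.
gap = round(2.524 - 2.455, 3)
print(gap)
```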
Early stopping at epoch 47 was selected to prevent overfitting, and, as shown in Table 4, the deep learning model predicted building thermostat setpoints with stable accuracy. The validation MSE decreased sharply during the first 10 epochs, indicating that the model rapidly captured the main patterns in the IoT dataset and that behavioral and environmental dynamics could be learned from them. Table 4 indicates a small generalization gap (0.07 MSE), suggesting a good model fit with limited overfitting. The validation MSE decreased by 62.7% from the initial epoch to epoch 47, with later epochs yielding only marginal further improvements. These results suggest that the DL model is robust and reliable on this dataset and a suitable candidate for real-time setpoint prediction, with control applications left for future evaluation. The GBT models were evaluated on a sampled test set for predicting the thermostat thresholds ("from" and "to").
The gradient boosted trees (GBT) model was used to predict the "from" and "to" thermostat thresholds. As presented in Table 5, the R2 value for the "from" threshold was 0.498, indicating moderate accuracy and explaining roughly half of the variance. For the "to" threshold, accuracy increased to R2 = 0.545, and the error metrics were lower (RMSE = 1.538, MAE = 1.227), indicating more precise forecasting of the upper thermostat setpoint. Table 5 suggests that the model captured the patterns of heating and cooling. The difference between "from" and "to" may result from occupant behavior, for example, opening and closing windows or sudden changes in occupancy. Table 5 also shows that "to" setpoints followed a more stable pattern, which the GBT model learned more effectively.
Table 6 compares the gradient boosted trees (GBT) and deep learning (DL) models on key measures for predicting adaptive setpoints in smart buildings. The GBT "from" model, with an RMSE value of 2.50, an MAE value of 1.79, and an R2 value of 0.50, explained about half of the variation in the smart building dataset. The GBT "to" model showed better performance, with lower errors and a higher R2 value. The deep learning model achieved slightly better accuracy than both GBT models, with the lowest MAE (1.12) and R2 values of 0.55–0.60, although its RMSE (1.61) was slightly higher than that of the GBT "to" model (1.54). Compared with the two GBT models, the DL model may capture more complex patterns and relationships in the data, especially those related to changes in indoor temperature, humidity, and occupant behavior.
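The comparison in Table 6 can be made concrete by computing relative changes with respect to the strongest GBT model. This short sketch uses only the values printed in Table 6; the helper name `relative_change` is hypothetical.

```python
# RMSE and MAE values copied from Table 6.
results = {
    "GBT from": {"RMSE": 2.50, "MAE": 1.79},
    "GBT to": {"RMSE": 1.54, "MAE": 1.23},
    "DL": {"RMSE": 1.61, "MAE": 1.12},
}


def relative_change(baseline, candidate):
    """Fractional change of candidate vs. baseline (negative means lower error)."""
    return (candidate - baseline) / baseline


mae_change = relative_change(results["GBT to"]["MAE"], results["DL"]["MAE"])
rmse_change = relative_change(results["GBT to"]["RMSE"], results["DL"]["RMSE"])
print(f"DL vs GBT to, MAE change:  {mae_change:+.1%}")
print(f"DL vs GBT to, RMSE change: {rmse_change:+.1%}")
```

With these values, the DL model's MAE is about 8.9% lower than that of the GBT "to" model, in line with the roughly 9% reduction stated in the abstract, while its RMSE is about 4.5% higher. Because RMSE weights large errors more heavily than MAE, this divergence is consistent with the DL model making smaller typical errors but occasional larger ones; per-sample errors would be needed to confirm this.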

Results in the Context of Existing Literature

To provide context for these results, we briefly discuss relevant recent work in the field. Several studies have used machine learning to analyze building behavior, but they often focus on different targets and data conditions.
Li et al. [29] reported high predictive accuracy when modeling building thermal behavior using deep learning and smart thermostat data. Boutahri and Tilioua [30] reported low error values when estimating thermal comfort in controlled smart building environments. Although these studies highlight the potential of advanced learning methods, they address different aspects from the present work. Here, the focus was on occupant-driven thermostat setpoint behavior using long-term IoT data from residential buildings. Under these conditions, the results demonstrated that meaningful setpoint patterns can be learned across a large dataset, with the deep learning model achieving a lower MAE than the gradient boosted trees baselines (Table 6). This highlights the contribution of the work: deployment-scale, behavior-aware setpoint prediction under real-world conditions rather than in controlled or simulated environments. To make the comparison clearer, Table 7 summarizes differences in prediction focus, dataset properties, and deployment scale across recent studies.
As shown in Table 7, the present study addresses occupant-selected thermostat setpoints using long-term, zone-level residential IoT data, offering a different perspective from much of the existing literature.
The temporal patterns in the proxy energy indicator shown in Figures 3 and 4 reflect weather, occupant behavior, and, more generally, changes in smart building systems. A sharp drop in the proxy energy indicator was observed starting in mid-June, which coincided with the installation of new adaptive thermostat controls or a change in occupancy schedules. The fact that this drop was recorded shows that the monitoring system captured real-world changes in temperature and building dynamics. This observation is consistent with ref. [23], which reported a similar decline in energy consumption after implementing an IoT platform. This result suggests that the data-driven approaches used in this study can capture meaningful changes in building thermal dynamics under real operational conditions. These findings support the role of IoT- and ML-based methods in recording real-life changes in building dynamics for operational decision-making, which can inform future evaluations of energy savings and system efficiency.
From a practical perspective, the results of this study suggest that IoT- and machine learning-based techniques are a promising means of analyzing energy-related behavior and sustainability in smart buildings. The performance of the ML models is likely attributable to their ability to capture complex, nonlinear relationships within building energy systems. When combined with validated energy measurements, these technologies could support future evaluations of operational cost reduction and environmental impact.

5. Conclusions

This study developed predictive models for adaptive thermostat setpoints using IoT and machine learning methods by analyzing real-time occupant-related data collected over two years. The models captured indoor thermal dynamics and occupant-related patterns that are relevant for behavior-aware HVAC operation, while energy trends were assessed using a proxy indicator rather than metered consumption. Using long-term operational IoT data, the study showed that occupant-driven thermostat setpoint patterns can be learned at the deployment level, supporting both applied research and practical smart building development. Such models are expected to support automated and intelligent control of HVAC systems in future work. For the GBT models, the "to" threshold was predicted more accurately than the "from" threshold, while the DL model achieved the lowest MAE of all evaluated models. Implementing such a system in smart buildings assumes that occupants' comfort ranges remain stable. From a scalability perspective, the proposed data pipeline and modeling approach are built on standard IoT infrastructure and widely available sensor data, which makes them suitable for use in other residential buildings with similar management.
This study has several limitations. It focused on two buildings within the same climatic region that used similar construction materials and building systems, and the models were not validated against actual energy consumption data. Because thermal needs and tenant habits vary considerably by region, the current results are best understood within their specific climate context. Expanding the analysis to a larger number of buildings, with greater variation in climate, materials, and systems, would enable more comprehensive, reliable, and generalizable results; in particular, validation against data from cooling-dominant regions is a key next step. Future work should also integrate smart control experiments or energy simulations to quantify real energy savings and extend the evaluation across multiple climate zones, making it possible to study how behavior-aware setpoint prediction performs under different thermal conditions and occupant behaviors and to evaluate the scalability of the proposed approach across diverse building contexts.

Author Contributions

F.M. was responsible for the conceptualization, methodology design, software development, data curation, formal analysis, and preparation of the original manuscript. F.M. also contributed to model validation and visualization of results. H.A.J. contributed to the scientific review, comparative analysis with related studies, interpretation of results, and critical review and editing of the manuscript. A.A.H. supervised the overall project, provided access to the IoT infrastructure and datasets, contributed to system architecture validation, and supported refinement of the methodology. M.A.R.A. contributed to supervision, project administration, validation of results, and critical review and editing of the manuscript, and serves as the corresponding author. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data used in this study are not publicly available due to privacy and confidentiality restrictions, as they originate from a proprietary IoT platform. Aggregated and anonymized data may be made available from the corresponding author upon reasonable request.

Acknowledgments

The authors would like to thank Hanatech, Halifax, Canada, for providing access to the IoT data infrastructure used in this study. In this work, artificial intelligence and machine learning techniques were employed for data analysis and predictive modeling of thermostat setpoint thresholds based on IoT sensor data.

Conflicts of Interest

Author Ali A. Hamidi was employed by the company Hanatech IoT Inc. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
IoT: Internet of Things
HVAC: Heating, Ventilation, and Air Conditioning
DL: Deep Learning
MAE: Mean Absolute Error
RMSE: Root Mean Squared Error
R2: Coefficient of Determination
GBT: Gradient Boosted Trees

References

  1. Wang, W. Building Energy Consumption and Urban Energy Planning. Buildings 2022, 13, 6.
  2. Bijlani, V. Smart Buildings for Sustainable Smart Cities. In Proceedings of the 2023 1st International Conference on Advanced Innovations in Smart Cities (ICAISC), Jeddah, Saudi Arabia, 23–25 January 2023; pp. 1–6.
  3. Jagadeesan, S.; Ravi, C.N.; Sujatha, M.; Southry, S.S.; Sundararajan, J.; Reddy, C.V.K. Machine Learning and IoT based Performance Improvement of Energy Efficiency in Smart Buildings. In Proceedings of the 2023 International Conference on Sustainable Computing and Data Communication Systems (ICSCDS), Erode, India, 23–25 March 2023; pp. 375–380.
  4. Bibri, S.E.; Alexandre, A.; Sharifi, A.; Krogstie, J. Environmentally sustainable smart cities and their converging AI, IoT, and big data technologies and solutions: An integrated approach to an extensive literature review. Energy Inform. 2023, 6, 9.
  5. Zamanidou, A.; Magliozzi, A.; Fokaides, P. From Buildings to Neighborhoods: Upscaling Smartness Assessment for Enhanced Sustainability. In Proceedings of the 2024 9th International Conference on Smart and Sustainable Technologies (SpliTech), Split and Bol, Croatia, 25–28 June 2024; pp. 1–5.
  6. Qiuhong, Z.; Zhan, Z.; Jiayi, W. Application of Regional Building Energy Consumption Prediction Model in Building Construction. In Proceedings of the 2020 International Conference on Intelligent Transportation, Big Data & Smart City (ICITBS), Vientiane, Laos, 11–12 January 2020; pp. 92–94.
  7. Mshragi, M.; Petri, I. Fast machine learning for building management systems. Artif. Intell. Rev. 2025, 58, 211.
  8. Ma, Z.; Yan, Z.; He, M.; Zhao, H.; Song, J. A review of the influencing factors of building energy consumption and the prediction and optimization of energy consumption. AIMS Energy 2025, 13, 35–85.
  9. Gunalan, K.; Dakshana, M.; Sangeetha, S.; Anandan, P.; Saveetha, R.; Abirami, S. An In-Depth Survey on Environmental Sustainability: Mitigating Energy Footprints for an Advanced Future Outlook. In Proceedings of the 2023 Third International Conference on Smart Technologies, Communication and Robotics (STCR), Sathyamangalam, India, 9–10 December 2023; Volume 1, pp. 1–6.
  10. Priyadarshi, R. Efficient node deployment for enhancing coverage and connectivity in Wireless Sensor Networks. Sci. Rep. 2025, 15, 29052.
  11. Poyyamozhi, M.; Murugesan, B.; Rajamanickam, N.; Shorfuzzaman, M.; Aboelmagd, Y. IoT—A Promising Solution to Energy Management in Smart Buildings: A Systematic Review, Applications, Barriers, and Future Scope. Buildings 2024, 14, 3446.
  12. Eltawil, M.A.; Mohammed, M.; Alqahtani, N.M. Developing Machine Learning-Based Intelligent Control System for Performance Optimization of Solar PV-Powered Refrigerators. Sustainability 2023, 15, 6911.
  13. Ruliyanta, R.; Suwodjo Kusumoputro, R.A.; Nugroho, R.; Nugroho, E.R. A Novel Green Building Energy Consumption Intensity: Study in Inalum Green Building. In Proceedings of the 2022 IEEE Region 10 Symposium (TENSYMP), Mumbai, India, 1–3 July 2022; pp. 1–6.
  14. Chinthala, P. Energy Consumption Optimization Using IoT and Big Data and Energy Efficiency in Smart Homes. Int. Res. J. Mod. Eng. Technol. Sci. 2024, 6, 3232–3238.
  15. Chaudhari, P.; Xiao, Y.; Li, T. Translution: A Hybrid Transformer–Convolutional Architecture with Adaptive Gating for Occupancy Detection in Smart Buildings. Electronics 2025, 14, 3323.
  16. Chaudhari, P.; Xiao, Y.; Cheng, M.M.-C.; Li, T. Fundamentals, Algorithms, and Technologies of Occupancy Detection for Smart Buildings Using IoT Sensors. Sensors 2024, 24, 2123.
  17. Mylonas, A.; Tsangrassoulis, A.; Pascual, J. Modelling occupant behaviour in residential buildings: A systematic literature review. Build. Environ. 2024, 265, 111959.
  18. Soleimanijavid, A.; Konstantzos, I.; Liu, X. Challenges and opportunities of occupant-centric building controls in real-world implementation: A critical review. Energy Build. 2024, 308, 113958.
  19. Doma, A.; Prajapati, S.N.; Ouf, M.M. Developing a residential occupancy schedule generator based on smart thermostat data. Build. Environ. 2024, 261, 111713.
  20. Li, H.; O’Brien, W.; Loftness, V.; Cochran Hameen, E.; Hong, T. A critical review of use cases and insights from a large dataset of smart thermostats. Adv. Appl. Energy 2025, 19, 100236.
  21. Bouyakhsaine, K.; Brakez, A.; Draou, M. Prediction of residential building occupancy using Machine learning with integrated sensor and survey Data: Insights from a living lab in Morocco. Energy Build. 2024, 319, 114519.
  22. Karjou, P.F.; Saryazdi, S.K.; Stoffel, P.; Müller, D. Practical design and implementation of IoT-based occupancy monitoring systems for office buildings: A case study. Energy Build. 2024, 323, 114852.
  23. Ntafalias, A.; Papadopoulos, P.; Ramallo-González, A.P.; Skarmeta-Gómez, A.F.; Sánchez-Valverde, J.; Vlachou, M.C.; Marín-Pérez, R.; Quesada-Sánchez, A.; Purcell, F.; Wright, S. Smart buildings with legacy equipment: A case study on energy savings and cost reduction through an IoT platform in Ireland and Greece. Results Eng. 2024, 22, 102095.
  24. Moulla, D.K.; Attipoe, D.; Mnkandla, E.; Abran, A. Predictive Model of Energy Consumption Using Machine Learning: A Case Study of Residential Buildings in South Africa. Sustainability 2024, 16, 4365.
  25. Craciun, R.-A.; Caramihai, S.I.; Mocanu, Ș.; Pietraru, R.N.; Moisescu, M.A. Hybrid Machine Learning for IoT-Enabled Smart Buildings. Informatics 2025, 12, 17.
  26. Ali, U.; Bano, S.; Shamsi, M.H.; Sood, D.; Hoare, C.; Zuo, W.; Hewitt, N.; O’Donnell, J. Urban building energy performance prediction and retrofit analysis using data-driven machine learning approach. Energy Build. 2024, 303, 113768.
  27. Muñoz-Rodríguez, F.J.; Snytko, A.; de la Casa Hernández, J.; Rus-Casas, C.; Jiménez-Castillo, G. Rooftop photovoltaic systems. New parameters for the performance analysis from monitored data based on IEC 61724. Energy Build. 2023, 295, 113280.
  28. Li, Y.; Yu, B.; Li, N. The performance analysis of a novel manganese oxide solar low-temperature thermal-catalyst in building multifunctional applications. Energy Build. 2023, 297, 113477.
  29. Li, H.; Pinto, G.; Piscitelli, M.S.; Capozzoli, A.; Hong, T. Building thermal dynamics modeling with deep transfer learning using a large residential smart thermostat dataset. Eng. Appl. Artif. Intell. 2024, 130, 107701.
  30. Boutahri, Y.; Tilioua, A. Machine learning-based predictive model for thermal comfort and energy optimization in smart buildings. Results Eng. 2024, 22, 102148.
Figure 1. Overview of the research workflow: data acquisition, preprocessing, model training, and evaluation.
Figure 2. Smart building–IoT architecture integrating Z-Way, MQTT, and ThingsBoard for secure ingestion, monitoring, and control.
Figure 3. Daily proxy energy indicator (scaled kWh, relative values) from April to June 2024: temporal trend and variability.
Figure 4. Hourly proxy energy indicator (scaled kWh, relative values) from April to May 2024: diurnal pattern aligned with occupancy and weather.
Figure 5. Predicted vs. actual thermostat thresholds: (a) “from”; (b) “to”. The dashed red line indicates the ideal 1:1 relationship (perfect prediction), and the scatter points represent individual observations.
Figure 6. Distribution of prediction errors for “from” and “to”: (a) “from”; (b) “to”.
Table 1. Summary of main features and preprocessing procedure.
Feature Type | Feature Name | Description | Transformation
Numerical | Indoor Temperature | Room temperature (°C) | Standardized
Numerical | Humidity | Indoor relative humidity (%) | Standardized
Numerical | Outdoor Temperature | Weather data (°C) | Standardized
Categorical | Thermostat State | Heat/Cool/Off | One-Hot Encoded
Categorical | Overall State | Window (Open/Close) | One-Hot Encoded
Temporal | Hour-of-Day | Hour extracted from timestamp | Sine/Cosine encoding
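The three transformations listed in Table 1 can be sketched with the standard library as below. This is an illustration of the named techniques, not the authors' preprocessing code; the category list and sample values are assumptions. In a time-aware setup, the standardization mean and standard deviation would normally be computed on the training split only and then reused for the test split.

```python
import math


def standardize(values):
    """z-score: subtract the mean and divide by the population std.
    Assumes the values are not all identical (std > 0)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


def one_hot(value, categories):
    """One-hot vector for a categorical state such as Heat/Cool/Off."""
    return [1 if value == c else 0 for c in categories]


def encode_hour(hour):
    """Map hour of day onto the unit circle so 23:00 and 00:00 are close."""
    angle = 2 * math.pi * hour / 24
    return math.sin(angle), math.cos(angle)


z = standardize([18.0, 20.0, 22.0])                # toy indoor temperatures (°C)
state = one_hot("Cool", ["Heat", "Cool", "Off"])   # assumed category order
print(z, state, encode_hour(6))
```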
Table 2. Contextual comparison with two related studies (different datasets and targets; values not directly comparable).
Paper | Target Predicted | Data Setting | MAE | R2
Deep Transfer Learning Model [29] | Thermal dynamics | Semi-controlled curated | 0.259 | 0.83
ML-Based Predictive Model for Thermal Comfort [30] | Comfort index | Controlled | 0.083 | 0.801
Gradient Boosted Trees (GBT, this study) | Setpoint ("to") | Real-world residential | 1.23 | 0.55
Deep Learning (DL, this study) | Setpoint ("from" + "to") | Real-world residential | 1.12 | 0.58
Table 3. Key regression metrics of the GBT regressor for the "from" and "to" thresholds.
Metric | GBT Regressor (from) | GBT Regressor (to)
RMSE | 2.49 | 1.53
MSE | 6.21 | 2.36
MAE | 1.79 | 1.22
R-squared | 0.49 | 0.54
Table 4. Key performance metrics of the deep learning model at epoch 47.
Metric | Value
Validation MSE | 2.524
Validation MAE | 1.095
Validation RMSE | 1.589
Train MSE | 2.455
Train MAE | 1.074
Train RMSE | 1.567
Generalization Gap (Val–Train MSE) | 0.07
Table 5. GBT model “from” values vs. “to” values.
Threshold | RMSE | MSE | MAE | R2
From | 2.497 | 6.235 | 1.795 | 0.498
To | 1.538 | 2.366 | 1.227 | 0.545
Table 6. Comparison of machine learning models GBT and DL.
Model | RMSE | MAE | R2
GBT from | 2.50 | 1.79 | 0.50
GBT to | 1.54 | 1.23 | 0.55
DL | 1.61 | 1.12 | 0.55–0.60
Table 7. Overview of prediction focus and data attributes in recent studies.
Study | Prediction Focus | Data Source | Duration | Scale | Behavior Representation | Main Contribution
Li et al. (2024) [29] | Thermal dynamics | Smart thermostat logs | Short–medium term | Large | Implicit | Accurate thermal response modeling
Boutahri & Tilioua (2024) [30] | Thermal comfort index | Smart building sensors | Short term | Small | Limited | Comfort-oriented ML prediction
Ntafalias et al. (2024) [23] | Energy trends | IoT platform | Medium term | Building-level | Indirect | Energy impact of IoT deployment
This study | Thermostat setpoints ("from"/"to") | Operational residential IoT | 2 years | 370 zones | Explicit (occupant-driven) | Deployment-scale behavior-aware setpoint prediction
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Mosleh, F.; Hamidi, A.A.; Jahromi, H.A.; Ahad, M.A.R. Adaptive Thermostat Setpoint Prediction Using IoT and Machine Learning in Smart Buildings. Automation 2026, 7, 29. https://doi.org/10.3390/automation7010029

AMA Style

Mosleh F, Hamidi AA, Jahromi HA, Ahad MAR. Adaptive Thermostat Setpoint Prediction Using IoT and Machine Learning in Smart Buildings. Automation. 2026; 7(1):29. https://doi.org/10.3390/automation7010029

Chicago/Turabian Style

Mosleh, Fatemeh, Ali A. Hamidi, Hamidreza Abootalebi Jahromi, and Md Atiqur Rahman Ahad. 2026. "Adaptive Thermostat Setpoint Prediction Using IoT and Machine Learning in Smart Buildings" Automation 7, no. 1: 29. https://doi.org/10.3390/automation7010029

APA Style

Mosleh, F., Hamidi, A. A., Jahromi, H. A., & Ahad, M. A. R. (2026). Adaptive Thermostat Setpoint Prediction Using IoT and Machine Learning in Smart Buildings. Automation, 7(1), 29. https://doi.org/10.3390/automation7010029
