A Novel Learning Algorithm Based on Bayesian Statistics: Modelling Thermostat Adjustments for Heating and Cooling in Buildings

Abstract: The temperature of indoor spaces is at the core of highly relevant topics such as comfort, productivity and health. In conditioned spaces, this temperature is determined by thermostat preferences, but there is a lack of understanding of this phenomenon as a time-dependent magnitude. In addition, there is scientific evidence that the mental models with which users understand the operation of the billions of air-conditioning machines around the world are incorrect, which leads users to 'compensate' for outside temperatures by adjusting the thermostat, causing erratic changes in set-points over the day. This paper presents the first model of set-point temperature as a time-dependent variable. Additionally, a new mathematical algorithm, the live Bayesian inference of transition matrices, was developed to complement these models and make their identification possible on the go. Data from a total of 75 + 35 real thermostats in two buildings, collected over more than a year, were used to validate the model. The method was shown to be highly accurate, fast, and computationally trivial in terms of time and memory, representing a change of paradigm for smart thermostats.


Introduction
For the population around the globe that occupies air-conditioned spaces, having control over their heating, ventilation and air conditioning (HVAC) systems improves their satisfaction and determines their energy consumption [1,2]. However, the manual control of conditioning systems can have negative effects, since the inadequate control of thermostats leads to uncomfortable temperatures and lowers productivity by 3.1% to 16% (the latter figure reported in call centres) [3,4]. The incorrect use of thermostats can, therefore, represent a substantial cost for companies; when used correctly, however, thermostats can produce savings by cutting energy use by up to 20% [4,5].
Despite the importance of understanding set-point temperatures in buildings, there are few studies considering them as a time-dependent, human-driven variable. A full understanding of the temperatures that building users are likely to choose is key to understanding energy consumption and comfort, and previous studies have not had full behavioural data available as a time series (one of the most complete, but still limited, sets of thermostat data can be found in [5]). In this regard, it is worth mentioning the work in [6], whose authors developed a framework for multi-occupancy environments to optimise comfort considering feedback from physiological data; this is automatic and bypasses the behaviour of changing the thermostat. Although this method can be fully automatised, it requires extra equipment for capturing the physiological data. If smart thermostats become the norm, the association of these mechanisms with the modelling of human interactions with thermostats, as we show in this paper, could increase the accuracy of both methods. Another attempt at modelling user requirements is HVACLearn [7]; this work presents an interesting intelligent algorithm that learns from indoor air temperature, occupancy and thermal votes to anticipate occupants' preferences. This work, although rather comprehensive, leaves room for ours, which integrates a specific model of the historical changes of the thermostat values. In addition, the new paradigm of the IoT brings the potential of interconnected thermostats that optimise conditions in room-level spaces while also allowing the operation of central units to be optimised [8]. Considering how important heating and cooling preferences are, and how much is gained from giving control to users, the full automation of thermostats does not seem to be the ideal solution.
Instead, recent intelligent information and communication technologies (ICTs) are able to extract insights from the weather and the thermodynamic responses of buildings and at the same time read users' behaviours and preferences with respect to thermostat adjustments (the so-called smart thermostats). These systems can provide occupants with personalised advice, and with that, users can be made active agents in creating comfortable, productive and healthy spaces and in saving energy (for example [9]). In this new paradigm, there is still one part of the puzzle that is missing, which is being able to model and forecast the thermostat preferences of users.
Both short- and long-term energy consumption predictions are essential for building and grid design and operation. The literature has shown that this forecasting is only accurate when users' behaviours are included, but no models have had enough detail to model the thermostat values. A variety of black-box (sometimes called data-driven) models have been developed that represent energy consumption as a whole, without disaggregating the effect that thermostat value selection may have. In the literature, one can find studies that used artificial neural networks (ANNs) and long short-term memory (LSTM) neural networks, support vector machines (SVM), random forest (RF), and/or statistical algorithms such as ARIMA and ordinary least squares (OLS) [10] as ways of modelling energy use. Many studies compared the effectiveness of several algorithms, for example, SVM and ANN [11]; MLP, SVM, Gaussian processes (GP), SVR, RF, and eXtreme gradient boosting (XGB) [12]; and ANN with case-based reasoning (CBR) [13]. However, it has been shown that data-driven models may be limited in providing an understanding of the analysis [14], since as black-box models they do not allow for learning about the underlying phenomena, among which is the behaviour of users. Hybrid or grey-box modelling approaches combine physical and data-driven prediction models and provide a more interpretable solution. In [15], some performance improvement was seen when using a hybrid model to predict the total and non-AC energy consumption of residential buildings, but no behaviour was modelled. In short, the literature shows that the modelling of energy consumption has an embedded non-physical component coming from the effect of occupants' behaviour, of which the highest impact is the thermostat [16], and this has not been modelled yet.
This is inconvenient as it is known that large numbers of people's adjustments of thermostats is inadequate because people have incorrect mental models of how HVAC systems work [17]. Incorrect operation could be corrected if it were understood (with models), but if it is left uncorrected, it will lead to energy waste and discomfort.
Thermostat adjustment is the means by which users regulate air-conditioning machines. These days, this adjustment is made digitally in most cases: the user selects a given temperature that the machine tries to provide. This makes thermostat values over time a discrete series. In the new paradigm of the Internet of Things, thermostats are becoming intelligent and connected to the internet, which allows for incorporating intelligence to optimise occupants' comfort. This paper shows the creation of a model for the human action of modifying the thermostat value, which will allow for equipping smart thermostats with the capacity to anticipate what users want. In addition, smart services such as demand-response events are likely to proliferate in the near future. Actuation over the thermostat is one of the most promising demand-response mechanisms to minimise peak loads. Being able to model the occupants' preferences will allow for creating more adequate demand-response programs, with more user acceptance and more effective peak reduction.
The importance of understanding users' preferences for air conditioning can be seen in the developments appearing in this field, such as the proliferation of the so-called smart thermostats. Smart thermostats are devices capable of connecting to the internet and capturing data to optimise the operation of air-conditioning machines, minimising energy use and maximising comfort. To be complete, smart thermostats require intelligent algorithms that predict occupants' habits [18], and technological advances have been seen on this front, such as the Neurothermostat (based on [19]), the Smart Thermostat [6] and others [20][21][22][23][24][25]. In addition, commercially available solutions are now present in the markets, such as NEST, tado, and EcoBee's Smart-Si (nest.com, tado.com, ecobee.com/solutions/home/Smart-si, accessed on 10 June 2022).

Study Design and Cohort under Study
The study described in this paper includes initial data from air-conditioning devices installed in an Internet of Things environment. The experimental set-up involved the orchestration of IoT devices within a platform for data acquisition. From each one of the devices, a time series that provided the set-point temperature and the operation (on/off) was obtained for 21 months between March 2016 and November 2017 (included). The data had a sampling period of 6 min, and the data were stored in a secure database at the use-case premises. Additionally, the data came from two different buildings, which served to cross-validate the method proposed in this paper.
The air-conditioning machines were variable refrigerant flow consoles connected to central units on the roof. This implies that the cooling or heating potential provided by the machine originates in the central units, but the users can control how much they get from them by setting the thermostat temperature. This is a common installation of air conditioning in commercial and office buildings. The consoles are installed in independent spaces, and the offices are used by independent participants, so each time series represents the behaviour of a person. A few of them are located in multiple-use rooms such as libraries, meeting rooms, or seminar rooms, but these were excluded from this study and the evaluation of the algorithms.

Building Characteristics
The experiment was carried out in two university buildings. The first is located in an elevated suburban area of a medium-sized city and is used mainly for offices and laboratories (the latter were taken out of the experiment, as they have dedicated ventilation systems and could have distorted the results). It was constructed in the 1970s and has single glazing in openable windows. The rooms' areas range from 10 to 20 m², but the majority of them are 10 m². The offices are distributed across the four floors of the building. The second building is also located in suburbia, but it was built in the 2000s. As in the previous case, the devices chosen were those from individual offices.
The consoles in each room are controlled with a remote controller with which the user can turn the machine on or off and adjust the temperature set-point. The temperature can be set to any value from 16 to 30 °C in intervals of one degree. The system does not have any resetting action, so the machines are not automatically returned to a given reference. The users were not given instructions on how to operate the machines or what temperatures to set, and they were not informed that the study focused specifically on the thermostat values they would choose.

Platform and Sensors
The IoT platform used in this work is described in detail in (Terroso et al., 2018), but an overview is presented here for context. The sensorisation layer is in charge of connecting the physical devices or actuators that provide data to the platform. For the A/C machines, the connection was made through a gateway, which allowed for obtaining data from each independent machine.

Modelling the Thermostat Adjustment Phenomena and Testing Approaches
To the authors' knowledge, we present for the first time a model of thermostat values as a time series. The literature shows that when modelling series in the field of energy consumption conditioned by users' preferences, Markov chains (MCs) are the most accurate and accepted option [26][27][28]; however, no modelling of the thermostat value in this form has been attempted before. The technique explained in the following sections is based on Markov chains, and it allows for modelling the behaviour of thermostat adjustment as a time-dependent variable, as was done for the energy series in the previous literature. This enables the development of behavioural models and the characterisation of users according to how they use conditioning systems.

Markov Chains
Markov chains (MCs) have long been used to model energy-related behaviours. In this study, each time series corresponds to a single A/C machine and therefore to a single participant. Because the series of thermostat values is discrete, it can be represented with a Markov chain without hidden states. To identify the transition matrix, one has to calculate the probability of moving from one state to another; the transition matrix is then formed as shown in Equation (1) [29]:

P_ij = P(X_{n+1} = s_j | X_n = s_i), ∀n, ∀i, j ∈ {1, …, k} (1)

where P_ij is the component i, j of the transition matrix, i.e., the probability of reaching state s_j conditioned on the previous step being state s_i. As the data represent a finite realisation of the MC, these probabilities have to be estimated. An algorithm based on maximum likelihood [30] is used for this, as it provides confidence intervals for the estimate of the matrix, which will be needed in the rest of the study. With this method, the maximum likelihood estimate of each component of the transition matrix is given by Equation (2):

p̂_ij = n_ij / Σ_j n_ij (2)

where p̂_ij is the i, j component of the estimated transition matrix and n_ij is the number of transitions from i to j in the data. With this, the standard error of the estimate can be calculated with Equation (3) [31]:

SE_ij = sqrt( p̂_ij (1 − p̂_ij) / Σ_j n_ij ) (3)

where SE_ij is the standard error of element i, j of the transition matrix. An example of a transition matrix of a time series used for this work can be seen in Figure 1.
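As a minimal illustration (not the study's code), the frequentist estimation of Equations (2) and (3) can be sketched as follows; the function name, the toy set-point series and the state set are assumptions for the example:

```python
import numpy as np

def estimate_transition_matrix(series, states):
    """Maximum-likelihood estimate of a transition matrix (Equation (2))
    and its standard errors (Equation (3)) from a discrete series."""
    k = len(states)
    index = {s: i for i, s in enumerate(states)}
    counts = np.zeros((k, k))
    # Count every observed transition i -> j in the series.
    for a, b in zip(series[:-1], series[1:]):
        counts[index[a], index[b]] += 1.0
    totals = counts.sum(axis=1, keepdims=True)
    safe = np.where(totals > 0, totals, 1.0)   # avoid division by zero
    P = counts / safe                          # p_ij = n_ij / sum_j n_ij
    SE = np.sqrt(P * (1.0 - P) / safe)         # binomial standard error
    return P, SE

# Toy series of set-point temperatures (degrees C), one sample per step:
series = [21, 21, 22, 22, 22, 21, 21, 23, 22, 22, 21]
P, SE = estimate_transition_matrix(series, [21, 22, 23])
```

Each row of `P` is a probability distribution over the next state, so visited rows sum to one.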
Some data were pre-processed to analyse the series under study; in some cases, the set-point temperature stayed fixed for a large period of time, which leads to components on the diagonal of the transition matrix being significantly large. Because of this, a threshold was put in place to exclude series in which the largest component of the diagonal of the transition matrix was larger than 0.997. The reason we selected this criterion is that we saw that this zone was empty and therefore was not relevant for an algorithm aiming at representing user behaviour.

Live Bayesian Inference of Transition Matrices (LBITM)
The method created for this work is, to the authors' knowledge, new and has been given the name live Bayesian inference of transition matrices (LBITM). It consists of using Bayesian learning to infer the transition matrix in real time. The rationale is that a live transition matrix can be updated as the data keep coming in, and in the real world it can act as an identifier of the behaviour of the user for that period. Each row of the transition matrix represents a discrete probability distribution indicating the probability of switching to another state from a given state. These probabilities can be considered a set of parameters θ, with variances representing the confidence in each component of the matrix. As the method is Bayesian, the variance of the matrix is calculated at the same time as the matrix itself is obtained. This provides a metric of the certainty about its parameters.
Considering this, one can then define a prior set of parameters and a prior set of variances. With these priors, Bayesian learning can be implemented by adapting the equations from [32] into Equation (4), which was created for the method:

S_{n+1}^{-1} = S_n^{-1} + β I; m_{n+1} = S_{n+1} (S_n^{-1} m_n + β t) (4)

where S_n is the posterior distribution of the variance, β is a noise precision, m_n is the posterior estimate of the parameters and t is the target, considered here to be the data point at that time step. The fact that the LBITM uses the variance of the matrix is ideal for the problem at hand, as the variance of the transition matrix is key for evaluating the certainty of transition matrices of Markov chains, as shown in Equation (3).
Each new data point representing a change of state from i to j is represented in the method by a vector with a 1 in position j and 0 in the remaining positions; this is the vector t in Equation (4).

As shown in Equation (4), the Bayesian learning does not require complex operations in each time step; the method can update the transition matrix with only a few operations. The results will show the advantage that this represents in terms of computational cost. The strength of Bayesian inference comes from the use of prior information on the initial model parameters and the initial variances of those prior parameters.
The prior can be used to provide higher-level information about the parameters of the model and their variance. Following this, one can define the initial prior as in Equation (5). To distribute the variance equally across all the probabilities, the same value was used for all components of the matrix:

S_0 = σ_0² · 1 (5)

where S_0 is the initial variance, 1 is a matrix with all components equal to one and dimensions equal to those of the transition matrices, and σ_0 is a confidence factor (given the value of 2 in our case, as the parameters are by definition bounded by [0, 1]). The reason we chose a value of 2 is that it helps the algorithm converge while validating S as a good indicator of the confidence in the parameters (if it becomes smaller, the algorithm has found a good estimate with the data available).
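Setting up such a prior is straightforward; a minimal sketch, in which the number of states, the variable names and the uniform prior mean (standing in for an informed prior such as the endemic one) are illustrative assumptions:

```python
import numpy as np

K = 15           # set-point states from 16 to 30 degrees C, one per degree
sigma_0 = 2.0    # confidence factor sigma_0 = 2

# Prior mean m_0: a uniform matrix stands in here for an informed prior
# (e.g. the average of other devices' transition matrices).
m_0 = np.full((K, K), 1.0 / K)

# Equation (5): equal initial variance on every component of the matrix.
S_0 = sigma_0**2 * np.ones((K, K))
```

Every component starts with the same variance of 4, so the data, not the prior, drive the confidence as updates accumulate.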
To execute the algorithm, each data point is transformed into the corresponding target vector t, and the matrix of parameters and the variances are updated using Equation (4). The results show that this newly developed LBITM converges to the actual transition matrix calculated with the frequentist approach, as should be the case for a correct Bayesian inference scheme. This means that with the LBITM, we obtain the same result as with the frequentist method but in a fraction of the time and with considerably fewer resources.
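A sketch of how such a sequential update could look, assuming an independent Bayesian update per component with noise precision β = 1; `lbitm_update`, the uniform prior and the toy transition stream are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def lbitm_update(m, S, i, j, beta=1.0):
    """One LBITM-style step after observing a transition from state i to j:
    updates only row i of the parameter matrix m and the variance matrix S,
    applying the sequential Bayesian update component by component."""
    k = m.shape[0]
    t = np.zeros(k)
    t[j] = 1.0                                # one-hot target vector
    S_new = 1.0 / (1.0 / S[i] + beta)         # posterior variance of row i
    m[i] = S_new * (m[i] / S[i] + beta * t)   # posterior mean of row i
    S[i] = S_new
    return m, S

# Toy run with k = 3 states: three transitions 0->0 and one 0->1.
k, sigma_0 = 3, 2.0
m = np.full((k, k), 1.0 / k)                  # uniform prior mean
S = sigma_0**2 * np.ones((k, k))              # prior variance (Equation (5))
for i, j in [(0, 0), (0, 1), (0, 0), (0, 0)]:
    m, S = lbitm_update(m, S, i, j)
```

After the four observations, row 0 leans towards the empirically most frequent transition (0 → 0), its variance has shrunk below the prior value, and the row still sums to one; rows never observed keep the prior variance.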
The model created in this work to characterise thermostat preferences was developed under the rationale of a new paradigm of smart controls, more precisely, of smart thermostats. The model of thermostat adjustment with Markov chains can be seen in Section 3.1.1. As part of this research, it is also desirable for the model to be trainable with data obtained on the go. For this reason, we also present in this paper the live Bayesian inference of transition matrices (LBITM), a new algorithm capable of estimating transition matrices using Bayesian inference and an informed prior. The LBITM consists of a mathematical formulation capable of obtaining, in real time, the transition matrices of the Markov chains that model each individual behaviour corresponding to an individual person, which could be of great value for a personal smart thermostat.
To evaluate the quality of the newly developed model and algorithm, several performance indicators were tested. The evaluation started with accuracy comparisons: the model and the LBITM were compared with the batch estimation of transition matrices. In this way, it was possible to evaluate the method against the state of the art, which obtains transition matrices from batched data (as, for example, in [33]). Then, LBITM was tested when the information from the first test was used as a prior in a different set-up (building B), under a transfer-learning philosophy. Afterwards, the computational intensity of the method was tested, and finally, the differences in predicted energy use between state-of-the-art methods and the LBITM were evaluated.

Validation of the LBITM Method
For the evaluation of the proposed method, 75 devices, each used by a unique user, were selected after a cleaning stage from more than 200 devices. In this test, the accuracy of LBITM, which uses online estimation, was compared with batch estimation algorithms. For this test, the prior for the LBITM was the transition matrix obtained by averaging the transition matrices of all the devices. With this, the LBITM method was tested only as an online method, removing potential errors that could have been introduced by a prior unrepresentative of the set (this will be tested in the next section as a transfer-learning application). The prior obtained as the average of the actual transition matrices of all the devices in the building, and used in this test, has been called the endemic prior.
To ensure that the evaluation was meaningful and comparable between the live and the batch methods, the errors between the actual matrices and those estimated with LBITM and with the batch operation were normalised for each device in the following way. For each device, the actual transition matrix was calculated with the whole series, and at the same time, a confidence interval on the actual matrix was obtained. Then, for this device, a parameter epsilon was defined as the Euclidean distance between the actual matrix and the upper bound of this confidence interval. This epsilon defines the radius of a hypersphere within which no transition matrix is statistically different from the actual transition matrix, considering the data available. For the normalisation, the errors from the batch and from the LBITM estimations were divided, for each device, by its corresponding epsilon. This made the comparison between series possible regardless of the certainty in the given actual transition matrix (which depends on the series). With this normalisation, an estimate with an error of 1 is at the same Euclidean distance from the actual matrix as the upper bound of the confidence interval is. The result of this first test, corresponding to the evaluation of the LBITM compared with the batch method with the endemic prior, is shown in Figure 2.
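The normalisation above can be sketched as follows; the 2-state matrices and the uniform confidence half-width are hypothetical values for illustration only:

```python
import numpy as np

def normalised_error(P_actual, P_estimate, P_upper):
    """Error of an estimated transition matrix divided by epsilon, the
    Euclidean (Frobenius) distance between the actual matrix and the upper
    bound of its confidence interval. A value of 1 means the estimate is
    exactly as far from the actual matrix as that upper bound is."""
    epsilon = np.linalg.norm(P_actual - P_upper)
    return np.linalg.norm(P_actual - P_estimate) / epsilon

# Hypothetical 2-state example with a uniform confidence half-width of 0.05:
P_actual = np.array([[0.9, 0.1], [0.2, 0.8]])
P_upper = P_actual + 0.05
P_estimate = np.array([[0.85, 0.15], [0.25, 0.75]])
error = normalised_error(P_actual, P_estimate, P_upper)
```

Here the estimate deviates by exactly the confidence half-width in every component, so the normalised error comes out as 1.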
After this experiment, it was possible to see that when using LBITM, the error was in all cases smaller than twice the confidence interval of the actual transition matrix, and in all cases smaller than that of the batch algorithm unless the two were within the confidence interval of the estimation (i.e., not significantly different) (Figure 2). The advantage of the LBITM method lies in how it compares with the batch method: in the first 320 h of operation, the model created using batches of data can be as distant as four times the confidence interval, whereas with LBITM, the error is constrained to half that value. Although 320 h of operation (13.5 days) may sound small for buildings that have lifetimes of decades (if not centuries), it is relevant for this kind of application, as smart thermostats will have to capture the preferences of users as soon as possible, and the lifetimes of these devices and the tariffs they may offer are much shorter.

Figure 2. The evolution of the estimation, representing every 80 h of operation, for the batch estimation and for the LBITM for building A with an endemic prior. The graph has been normalised considering an estimation error equal to 1. The error was considered the Euclidean distance between the matrices. The series from building A were used to obtain the endemic prior. The boxes represent 50% of the data and the whiskers 99.3%.

Robustness of the Method and Transfer Learning
The LBITM performed better than the batch estimation with an endemic prior for all tests with different hours of operation. To fully validate the method, it was necessary to test the algorithm with data series independent of those of the building studied above. To do this, a test was performed on a different building with a similar use, i.e., with an exogenous prior. In this test, 35 series from building B were evaluated as in the previous case, considering a normalisation based on the Euclidean norm between the actual transition matrices and the upper bounds of their confidence intervals. The results of this test are shown in Figure 3.

Figure 3. The graph was normalised considering an estimation error equal to 1. The error was considered the Euclidean distance between the matrices. Building A was used, in this case, to obtain an informed prior for a transition matrix representing the temperature switching of the thermostat.
In the previous experiment, the prior obtained in building A was informative of the behaviour shown in that building. However, the results of this test also suggest that the prior reflects an intrinsic core of thermostat behaviour, since it also provided a good starting point for estimating the transition matrices of each device. The test with participants from building B with an exogenous prior (Figure 3) gave results nearly as good as those obtained in the previous test with an endemic one.
The results show that the LBITM with an exogenous prior is not as accurate as that with an endemic prior in the first iterations. However, the estimation of the matrices with the LBITM still outperforms the batch estimations in the first batches. Additionally, in all cases, the estimations with the LBITM never exceed three times the confidence intervals. As in the previous case, the LBITM may have slightly larger errors at intermediate times of the estimation (around 720 h); however, the p-values demonstrate no statistical significance between these errors.

Computational Time
For an algorithm to work on a smart device under the IoT paradigm, it must require limited computational resources, because these devices are normally small and battery powered. In terms of computational time, the LBITM showed outstanding performance, as the test reported in this section demonstrates. Calculating the transition matrix in batch mode required a great deal of computational resources, so a clock mechanism was put in place in the testing framework to measure the time needed to obtain the transition matrices in each case. The need to store the data and the long computational time of each recalculation of the transition matrix were the main disadvantages of the batch estimation. Hence, the computational needs of the LBITM were evaluated to test its benefits in this respect.
The computational effort of the LBITM is trivial, as it only requires a few operations. For each data point, only the row of the Markov chain containing the current transition needs to be modified, and for that row, a few algebraic operations are performed to update its values. The computational times were rather constant across the realisations of the testing framework, so one instance (one series) is shown for demonstration in Figure 4.
The computational times of the two methods are substantially different. The time to update the transition matrix for each device in each time step is always close to 3 × 10⁻³ s; the oscillations around this value were noise caused by internal processes of the computer. In the case of the batch estimation, the computational time increased rapidly (Figure 4). It should be noted that the batch computational time was two orders of magnitude higher even for the calculation of the transition matrix with the smallest sample (480 points). This suggests that a device required to perform this calculation would need a level of computational power that renders the modelling unfeasible in many applications (small, battery-powered IoT devices). We consider this very short computational time one of the main advantages of the LBITM. In the new paradigm of distributed computing and the IoT, an algorithm that delivers excellent results in such a short time is highly beneficial.
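The per-data-point update described above can be sketched as a conjugate Dirichlet count update: each row of the matrix is a categorical distribution with a Dirichlet prior expressed as pseudo-counts, and observing a transition i → j only touches row i, making the cost per step O(number of states). This is an illustrative reconstruction under those assumptions, not the authors' exact implementation (the class and method names are ours):

```python
import numpy as np

class LBITMSketch:
    """Illustrative live Bayesian update of a Markov transition matrix.

    Each row is treated as a categorical distribution with a Dirichlet
    prior expressed as pseudo-counts, e.g. transition counts observed in
    another building (an exogenous prior).
    """

    def __init__(self, prior_counts):
        # prior_counts: (n, n) array of Dirichlet pseudo-counts.
        self.counts = np.asarray(prior_counts, dtype=float).copy()

    def update(self, i, j):
        # One observed transition from state i to state j: only row i changes.
        self.counts[i, j] += 1.0

    def transition_matrix(self):
        # Posterior-mean estimate: normalise each row of the count matrix.
        return self.counts / self.counts.sum(axis=1, keepdims=True)
```

A batch estimator must instead store the whole series and recount every transition at each recalculation, which is consistent with the growing computational time observed in Figure 4.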

Implications on Conditioning Machine Operation
The most recent publications point towards a new scenario in which thermostats will be connected to the internet and equipped with embedded operating systems. This will allow them to be 'smart' in the sense that they will be capable of making predictions based on data collected on the go, on prior information derived from other datasets (transfer learning), and on other relevant data (weather forecasts, location, demographics of users, etc.). The method shown here could be the basis of these new thermostats, but for that, it must also accurately predict the operation of the air-conditioning machine.
To evaluate this, 10 series of thermostats from Building A were used to estimate their transition matrices with the batch estimation and with the LBITM using 4320 data points at a resolution of 6 min (approximately one day of operation) that were re-sampled taking one every 48, so the series has relatively few redundant points. After this, the empirical cumulative probability distributions (ECPD) of the thermostat values forecasted with that series over a month were obtained for the batch algorithm and for the LBITM. The ECPD can be visualised as curves of power over time when the thermostat temperature is transformed into power for a given scenario of external temperature and heat transfer coefficient. Three of them could be generated: one for the actual transition matrix, one for the matrix obtained with the batch estimation and one for that obtained with the LBITM. The error between each estimated ECPD and that of the actual transition matrix has units of kelvin × hour. Considering a typical building with 100 m² of floor area and a heat transfer coefficient of 2 W/(m²·K), one can estimate the error of the predicted energy operation of the two methods in kWh for a given month (Figure 5).
Figure 5. Errors in the prediction of operation for the batch estimation and the LBITM. The transition matrices were generated with 48 data points of each series, representing approximately a day of operation, and the obtained transition matrix was used to generate the forecasted operation in the two cases. The p-value when comparing both distributions with a t-test is <0.0001.
When the operation of the machines is viewed as the number of hours that the system will work at each given temperature over a future period, it is possible to see that the error in the prediction of operation with the batch estimation is rather high. This comes as no surprise, as the batch method has little information on the first hours of operation of a machine. However, using the LBITM, the error in the predicted operation was rather small for all 10 series. This is because the LBITM is an informed method that captures the intrinsic nature of thermostat operation from the outset. The test showed that, in all cases, the error in the operation prediction is smaller using the LBITM than the batch estimation.
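The conversion behind Figure 5 follows a simple steady-state heat balance: with 100 m² of floor area and a heat transfer coefficient of 2 W/(m²·K), the building has UA = 200 W/K, so a set-point error expressed in kelvin × hour maps directly to energy. A hedged sketch of that conversion (the function names and interface are illustrative, not from the paper):

```python
def setpoint_power_w(setpoint_k, outdoor_k, area_m2=100.0, u_value=2.0):
    """Steady-state conditioning power (W) from a simple heat balance.

    Power = U * A * (indoor set-point - outdoor temperature).
    """
    return u_value * area_m2 * (setpoint_k - outdoor_k)

def ecpd_error_kwh(error_kelvin_hours, area_m2=100.0, u_value=2.0):
    """Map an ECPD error in kelvin x hour to an energy error in kWh via UA."""
    return u_value * area_m2 * error_kelvin_hours / 1000.0
```

For example, a 10 K set-point/outdoor difference corresponds to 2 kW of conditioning power under these assumptions, and an ECPD error of 50 K·h corresponds to a 10 kWh error in the predicted monthly energy.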

Discussion
This work presents a study of data coming from IoT devices serving as the thermostats of individual air-conditioning machines. The modelling proved to be accurate, and we believe it is a fundamental step towards efficient and comfortable smart buildings. It was seen that Markov chains are capable of accurately representing the temperatures chosen by the users. With this, it has been possible to bridge a gap in the discipline of building physics: the full characterisation of occupants' behaviour with respect to thermostat preferences, which was lacking until now.
The LBITM method was created for this research after observing a lack of methods for the live modelling of thermostat actuation behaviour. The method was evaluated with real data, and the results were rather successful. Further investigation into Bayesian methods for obtaining the transition matrices of Markov chains is suggested, as the authors did not find much on this topic in the available literature. It is expected that this method could be used in other disciplines that rely on Markov chains and that the characterisation of these chains as informed priors could be beneficial.
LBITM proved to be highly accurate and time efficient for the problem at hand. The method was based on using prior information that captures how people use thermostats, and it was derived from real data. The algorithm is capable of reaching a solution that is within the confidence intervals of the actual parameters in very short times, and it does not lead to mistaken estimations even at the start of the run when few data points are available.
Overall, the LBITM is a powerful modelling tool for the phenomenon studied here, and it allows the on-line modelling of this phenomenon at virtually zero computational cost. This method, in conjunction with Markov-chain-based modelling, is a long-expected and much-needed tool for smart heating and cooling controls and will have a substantial impact on the scientific and professional communities of smart technologies and the Internet of Things.
The work presented here provided good results and could be complemented with demographic studies of thermostat users. It is believed that the findings obtained here represent certain demographic profiles (mainly warm climates), and a demographic study could add value to this research.

Conclusions
After the research shown in this paper, we suggest that the human behaviour of changing thermostat values can be modelled using Markov chains, which represent the resulting energy consumption and the transitions from state to state rather well.
In the current situation, in which the IoT extends over all domains, a "live" approach to modelling users' thermostat behaviour is beneficial. This live characterisation of the behaviour was much faster when using Bayesian principles, which had not been applied to this problem in the past. This will eventually lead to the definition of more acceptable demand-response policies for intelligent buildings. To this end, we propose the live Bayesian inference of transition matrices (LBITM) method for estimating the transition matrix of a user's temperatures. This method is based on Bayesian learning, does not require complex operations and is therefore suitable for IoT devices. The present paper gives insight into the suitability of applying Bayesian-based approaches to estimate user behaviour for thermostat values. The paper also demonstrates the possibility of transferring its potentially promising results to other settings, at least for certain hours of operation.
This method allows smart thermostats to incorporate an extra feature that consists of a digital twin of the operator, making them capable of understanding the type of users they are serving, anticipating their decisions and suggesting better ways of operation to optimise comfort, minimise energy use and increase the life span of the equipment.
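At its simplest, such a digital twin can anticipate decisions by sampling future set-point trajectories from the learned transition matrix. A minimal, hypothetical sketch of this idea (the function name and interface are ours, not from the paper):

```python
import numpy as np

def simulate_setpoints(transition_matrix, states, start_state, n_steps, seed=None):
    """Sample a trajectory of set-point temperatures from a Markov chain.

    transition_matrix: (n, n) row-stochastic matrix, e.g. an LBITM estimate.
    states: list of set-point values corresponding to the matrix rows.
    """
    rng = np.random.default_rng(seed)
    P = np.asarray(transition_matrix)
    i = states.index(start_state)
    trajectory = [start_state]
    for _ in range(n_steps):
        # Draw the next state from the row of the current state.
        i = rng.choice(len(states), p=P[i])
        trajectory.append(states[i])
    return trajectory
```

Repeating such simulations yields the distribution of future operating hours per temperature, which is exactly the quantity compared through the ECPDs in the results above.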
Although more work is needed in the field of smart thermostat modelling, we believe that this is an important step in the field and that the methodology we offer is innovative, covers a current need and is sound enough for adoption and improvement.

Institutional Review Board Statement:
This research was performed using data from machines and no personal information from subjects. Furthermore, the research did not involve experimentation with humans or animals.

Informed Consent Statement:
No informed consent form was needed for this study as there was no sensitive data.

Conflicts of Interest:
The authors declare no conflict of interest.