Computational Intelligence Technologies for Occupancy Estimation and Comfort Control in Buildings

: This paper focuses on the development of a multi agent control system (MACS), combined with a stochastic based approach for occupancy estimation. The control framework aims to maintain the comfort levels of a building in high levels and reduce the overall energy consumption. Three independent agents, each dedicated to the thermal comfort, the visual comfort, and the indoor air quality, are deployed. A stochastic model describing the CO 2 concentration has been studied, focused on the occupancy estimation problem. A probabilistic approach, as well as an evolutionary algorithm, are used to provide insights on the stochastic model. Moreover, in order to induce uncertainty, parameters are treated in a fuzzy modelling framework and the results on the occupancy estimation are investigated. In the control framework, to cope with the continuous state-action space, the three agents utilise Fuzzy Q -learning. Simulation results highlight the precision of parameter and occupancy estimation, as well as the high capabilities of the control framework, when taking into account the occupancy state, as energy consumption is reduced by 55.9%, while the overall comfort index is kept in high levels, with values close to one.


Introduction
The problem of occupancy estimation and prediction, in the context of comfort control in buildings, is of crucial importance and has been studied by many authors [1]. Our motivation for this work is to combine an algorithm that estimates the number of occupants inside a room, given the CO 2 state concentration, with a multi agent control system that utilises the available information and controls the comfort indexes: thermal comfort, visual comfort, and indoor air quality. Most heating, ventilation, and air conditioning systems (HVAC) function on the assumption of maximum occupancy, not taking into account the variations of the occupancy level. There are various factors to be studied, in order for an estimation algorithm to be developed, that computes the actual number of occupants that entered a room during a time period.
In [2], Dong et al. study the control of HVAC systems using three approaches for the occupancy prediction: the expectation maximisation algorithm, a finite state automata method and introduces a method based on uncertain basis functions. A method based on a hybrid neural network is proposed in [3], where an extreme learning machine tuned with differential evolution is proposed. Models of occupancy in a single person office are studied in [4] using a non-homogeneous Poisson process to simulate the occupancy sequence. In [5] Feng et al. study the simulation of occupancy in buildings in four levels: the number of occupants in a building, the occupancy status in a space, the number of occupants in a space and the space location of an occupant. The correlation between occupancy and energy performance of a building is presented in [6], together with the impact on the quality of the indoor environment. A study that takes into account the stochastic nature of occupants behaviour can be found in [7], where a Markov chain is proposed. The Markov process is also studied in the context of occupancy models in [8]. Occupancy prediction based on the carbon dioxide concentration is studied in [9], where Jiang et al. study a Bayesian filtering approach. In [10], authors study models for CO 2 concentration prediction: A deterministic mass balance differential equation is firstly used and, moreover, a linear model is constructed using system identification techniques. The importance of occupancy for building control is highlighted in [11], where a model predictive controller utilises the occupancy information.
Occupancy estimation models can be classified into three categories. (a) statistical (black-box) models, (b) physical/deterministic (white-box) models, and (c) grey-box models. The first category includes hidden Markov models, machine learning methods, and artificial neural networks [2][3][4]. The second category, the physical models, are most often based on a CO 2 mass balance equation [9,12,13]. The benefits of the first and second category, i.e., black and white box models, are incorporated in the category of grey-box models. The latter models synthesise physical knowledge with statistical and data-driven parameter estimation. A grey-box model can describe variations in a room's carbon dioxide level and estimate room occupancy [3,[14][15][16][17].
Multi agent control systems are a group of agents which, in order to solve a problem, interact with the environment and with each other. MACS have been proposed for achieving high levels of thermal comfort inside a building and reducing the consumption of the heating/cooling system by efficiently controlling the indoor temperature. In [18], a MACS for a building with two rooms is proposed in order to control the thermal comfort of the two rooms through a central HVAC system. Intelligent control of HVAC systems [19] is also studied in [20] using a fuzzy based approach for personal comfort satisfaction. In [21] a MACS with multiple layers and a central agent for coordination is proposed. The system aims to reduce the overall efficiency of the building and simultaneously to satisfy the thermal needs of the occupants. Additionally, MACS have been proposed for achieving air quality, and thermal and visual comfort in the interior of the buildings with minimisation of energy consumption [22]. The MACS controls the actuators regarding the CO 2 concentration, the illumination, and the temperature in conjunction with reducing the energy consumption. These approaches are based on multi-layer MACS with local agents and supervisors which can lead to failure if the central agents fail. Moreover, they are based on the expert knowledge or on offline optimisation algorithms for adjusting parameters and they do not use any learning mechanism [23]. Only recently, few methods using reinforcement learning have been proposed but these methods are only limited to achieve thermal comfort [24]. The main drawback of this method is that it cannot be applied for continuous state-action space. Fuzzy logic systems can be used in order to overcome this drawback. Moreover, none of the above mentioned papers, combine occupancy estimation models with a multi agent control system, in order to reduce energy waste. In the proposed system, a grey-box model consists of stochastic differential equation (SDE) and fuzzy sets is able to quantify uncertainties, which improves the occupancy estimation. The contribution of this study is to: • Provide a fully decentralised control architecture; • Provide a control architecture based on reinforcement learning; • Incorporate a fuzzy logic system with reinforcement learning; • Combine a multi agent control system with a grey-box modelling; • Combine a multi agent control system with occupancy estimation; • Combine fuzzy modelling with occupancy estimation.
The paper is organised as follows: The problem under study is stated in Section 2, in addition to a discussion on general notions of stochastic processes and the analysis of the CO 2 adopted stochastic model. Stochastic differential equations' parameter estimation and differential evolution are also analysed. The use of the occupancy estimation algorithm, proposed in [14] is being studied for the estimated and the fuzzy parameters of the stochastic model, using two different occupancy profiles. Furthermore, a thorough analysis of Fuzzy Q-learning and the MACS is presented, together with the mathematical modelling of the building under study. Simulation results are demonstrated in Section 3 for each of the methods used and conclusions are discussed.

Problem Statement
The problem studied in the present work focuses on the development of a multi agent control system for the control of the artificial lighting, ventilation, and heating/cooling subsystems, in order for the comfort indexes to be maintained in the desired levels. The MACS should utilise the occupancy information, estimated using carbon dioxide concentration data. The carbon dioxide generation is considered to be given by the solution of a stochastic differential equation. By numerically solving the stochastic differential equation the training data are generated, representing the CO 2 levels in the room under study. Following that, an occupancy estimation algorithm is implemented to give information on the number of occupants. Two main assumptions have been made: • The training set consists of 1440 state measurements, corresponding to one working day, 24 h, times 60 min per hour; • The state, i.e., the CO 2 concentration level, is fully observable.
The work flow of the proposed method, graphically illustrated in Figure 1, is the following: Extending the CO 2 mass balance equation to a stochastic differential equation. The stochastic mass balance equation is numerically solved, to generate data of CO 2 concentration. Assuming that the solution time-series represents a stochastic process with unknown parameters, the parameters are estimated by maximising the likelihood function of the data with a differential evolution algorithm. Given the data and the estimated parameters, the occupancy estimation algorithm is implemented to compute the unknown number of occupants, that generated the data. Since the method of maximum likelihood provides only point estimates of the unknown parameters, fuzzy modelling is employed to study the impact of the parameters' uncertainty on the occupancy estimation. The estimated profile of the building occupancy is going to be used as input in the MACS system. The MACS system consists of three decentralised local agents and controls the actuators, each responsible for the indoor temperature, the indoor horizontal illuminance, and the carbon dioxide concentration. Occupancy estimation defines whether the building is unoccupied and whether the actuators need not to operate, in order to reduce energy waste in time periods where there are no occupants.

Stochastic Processes and CO 2
The generation and decay of CO 2 is governed by the mass balance ordinary differential equation [25,26] given by: where V denotes the volume of the room, x CO 2 (t) the carbon dioxide concentration, x CO 2 out is the outdoor carbon dioxide concentration, q is a term associated with the ventilation and infiltration and window air exchange. G CO 2 is a term associated with the generation of carbon dioxide in the room. The latter contains two terms: the carbon dioxide generation per occupant c CO 2 occ multiplied by a positive integer valued number n occ (t), denoting the number of persons, coming from the fact that the more the persons the higher the concentration in the room. If a closed window and mechanical ventilation free room are considered, the q term in Equation (1) is characterised only by the infiltration rate q in f . In order to adopt a more realistic model for the carbon dioxide concentration in the room we must treat the CO 2 time series as a stochastic process. By adopting a stochastic model, e.g., considering the CO 2 concentration data as a stochastic process, uncertain factors and other random unmodelled phenomena are taken into account [26]. The stochastic nature emerges not only from known factors, e.g., varying exhalation rates of the occupants in the room, the air exchange through doors, the non-homogeneous nature of the room's air, as well as unknown factors that cannot be described in mathematical terms. As referred to in [26], the CO 2 differential equation contains difficult to exact modelling factors, even in the deterministic case, such as surface area on which indoor pollutants are deposited and the deposition velocity of the pollutant, among others. Such terms can be considered as unmodelled, since their exact mathematical expression in the equation may lack accuracy. The Brownian motion, which will be introduced to the mass balance differential equation, can model both known but uncertain, and unknown or unmodelled phenomena, that govern the indoor carbon dioxide concentration.
A formal definition of a stochastic process is given below: 27]). Let the probability space (Ω,F ,P), the ordered set T and the measurable space (E,G). A stochastic process is a collection of random variables X = {X t ; t ∈ T}, where for each fixed t ∈ T, X t is a random variable in (Ω,F ,P) to (E,G). Ω denotes the sample space, and E denotes the state space of the stochastic process X t .
Stochastic processes can be viewed as a function of both t ∈ T and ω ∈ Ω, e.g., for fixed t we have a random variable, X t , and for fixed ω in the sample space Ω we find the function X t (ω) : T → E, which is the realisation of the process. One of the most important stochastic processes is the Brownian motion [28].
Therefore, considering the CO 2 generation as a stochastic process the concentration is associated to a random variable, due to the unknown influencing random effects. Since a deterministic differential equation model has been used, Equation (1), the randomness must be encapsulated in the model. By adding a noise term, the ordinary differential equation can be extended to a stochastic differential equation. Stochastic differential equations are used to model dynamics which exhibit random behaviour. It is now important to state the connection of stochastic differential equations and stochastic processes, in order to justify why adopting such an approach makes perfect sense.
Our purpose is to solve the following stochastic differential equation of the general form Let the filtration F t generated by the Brownian motion B t , i.e., F t := σ({B s : s ∈ [0, t]}), for every t ≥ 0. Under mild conditions (Similarly to Lipschitz continuity and linear growth) on the functions µ(t, X(t); θ) and g(t, X(t); θ) the solution of the stochastic differential equation exists and it is a unique process, X(t), t ≥ 0 satisfying: • It has continuous paths; • It is F t -adapted; • Almost surely, it holds that The solutions of stochastic differential equations are stochastic processes called diffusion processes. The mathematical treatment of stochastic differential equations is quite technical, however in the case of linear one-dimensional equations, closed form solutions could be derived. Moreover, as discussed in the next section, the solutions of stochastic differential equations are Markov processes [29], a property that turns out to be crucial in practical implementations.

The Hull-White Model as Mass Balance Equation
Since, we wish to treat the unknown process of carbon dioxide indoor generation and decay as a stochastic process, it is useful to model the mass balance equation with a stochastic differential equation. A model which originates from the financial mathematics literature, called the Hull-White Model (HWM) [29,30], has the suitable mathematical form in order for Equation (1) to be extended into a stochastic framework. This model has been introduced by John Hull and Alan White in 1994, as an extension to the Vasicek model for interest rates modelling. The Hull-White model is given by the following equation: where R(t) denotes the interest rate, Φ(t) denotes a time varying function, and α is the rate at which the rate is pushed towards the long term level. Borrowing the mathematical structure of the HWM, a stochastic differential equation that is a generalisation of the Ornstein-Uhlenbeck model, can be written: where X CO 2 (t) is the carbon dioxide concentration in ppm (Appendix A), q in f denotes the infiltration rate in the room, i.e., the air change per hour, σ denotes the constant volatility of changes in CO 2 , due to the Gaussian white noise (Brownian motion's increments are N distributed.) dB t , X CO 2 out is the outdoor CO 2 concentration in ppm and c CO 2 occ is the CO 2 generation per person in ppm/h. Equation (5) is the stochastic mass balance differential equation.
We are dealing with a first order linear non autonomous stochastic differential equation, thus an analytical solution can be derived. The solution of Equation (5) can be found by applying Itô's lemma on the function e −q in f X CO 2 (t) and then integrating to get: therefore, the expectation, mean solution, conditioned on the σ-algebra F s is given by: since, E σ t s e −q in f (t−τ) dB τ |F s = 0. F s is the filtration generated by the process up to time s. Moreover, the variance of the process is The mean solution of the stochastic differential equation, describing X CO 2 (t), is composed by an exponential term which forces the state to decay and an integral term which is due to the non autonomous nature of the differential equation: this term reflects the presence of occupants in the room at each time t, i.e., a piecewise constant function, such that n occ : [0, ∞) → Z + . The integral is fully tractable, thus a closed form solution is derived. The presence of occupants inside the room should, from a physics perspective, increase the levels of the carbon dioxide concentration, a fact that is reflected naturally by the above mathematical modelling.
The computation of the mean and variance describing the behaviour of the stochastic process X CO 2 = {X CO 2 (t); t ≥ 0} allows to write analytical expressions of the Likelihood function as the product of the transition densities from a state at time s to the state at time t, with t ≥ s.
The original linear stochastic differential equation is equivalent [31] to the following discrete time system: where the coefficients f k , u k and Σ k reads as: where ψ k (t k+1 , t k ) = e −q in f ∆t , and ∆t = t k+1 − t k These expressions can be analytically evaluated, to give the mean and variance of the original stochastic differential equation. As discussed earlier, our purpose is to model the carbon dioxide concentration by a stochastic process. This is performed through the assumption that the generation and decay of carbon dioxide is governed by a linear non autonomous stochastic differential equation. The solution of the latter is a stochastic process and can be obtained either by using the discrete representation Equation (9), or by solving the original stochastic differential Equation (5) using a numerical integration scheme. To generate the training data, on which the parameters and occupancy estimation algorithm is based, the differential equation is solved using the method of Euler-Maruyama (EM), see Algorithm 1 [31].
Algorithm 1 Euler Maruyama 1: Given a stochastic differential equation of the form dX(t) = µ(t, X(t); θ)dt + g(t, X(t); θ)dB t , with known parameters θ, an initial condition X(t 0 ) and a time interval [t 0 , t]. 2: Divide the time interval [t 0 , t] into N steps of ∆t length. 3: for Every time step k do 4: Draw a random variable such as ∆B k ∼ (0, ∆t)

5:
Compute the approximation: In this paper, CO 2 data are generated by simulating Equation (5), using the EM method. In order to be able to perform this, the parameters θ, are considered to be known. In Figure 2, the evolution of the stochastic differential equation's solution, with a single occupant, is depicted: Ten solution paths, each associated with a realisation of the Brownian motion, the mean solution and the 95% quantiles. In the following section, the method for the estimation of true parameters will be analysed, if an assumption of unknown parameters lies.

Estimation of the Stochastic Process Parameters
Since in real world applications the data of the carbon dioxide concentration may come in a time series, i.e., a set of available data, one might ask what the parameters of the process would be. Thus, an algorithm must be implemented in order to be able to estimate the true parameters of the problem. Henceforth, it is assumed that the solution time series of the carbon dioxide concentration is given, and the unknown parameters θ of the stochastic process are to be estimated.
We, therefore, have a problem of parameter estimation on stochastic differential equations: Given the stochastic differential Equation (5), the task is to estimate the parameters vector θ from a finite sample of available observations of the stochastic process X CO 2 (t) = {X CO 2 (t), where t = t 0 , . . . , t N }. The model is parametrised by the following quantities: the infiltration rate q in f , the outdoor carbon dioxide concentration X CO 2 out , the generation rate of carbon dioxide c CO 2 occ by a single occupant and the volatility of the noise σ. Parameter estimation problem in probabilistic models has been investigated by many authors and several methods have been proposed. For our task, the well established method of maximum likelihood in the context of stochastic differential equations, is adopted. The likelihood function must be maximised, with respect to the unknown parameter vector θ.
In light of the Markov property of the stochastic differential equation's solution, an analytical expression of the likelihood of the observed data given the parameters is written by: where, in general, p(X(t)|X(s), θ) is the transition density, describing the probability of the stochastic process to be from X(s), to the state X(t) in time t − s. Since the solution of the stochastic differential equation is a linear transformation of the Brownian motion, which is a Gaussian Process, it follows that the solution will also be Gaussian, thus, the transition densities are given by: where the Markov property is explicitly used, and the expressions for the distribution's mean and variance are given by Equations (7) and (8), respectively. The parameter vectorθ = θ 1 ,θ 2 ,θ 3 ,θ 4 = q in f ,X CO 2 out ,ĉ CO 2 occ ,σ must be computed, such that the likelihood to be maximised. In simple terms, the parameter space Θ, is searched, for those parameters that generate the actual data, which in our case are the data from the numerical solution of the stochastic mass balance differential equation. For practical reasons, the parameters are determined in order for the negative log-likelihood function to be minimised, i.e., where l θ i , u θ i denote the lower and upper bounds of the feasible domain for each of the four parameters. L(.) denotes the negative log-likelihood function given by: In this paper a differential evolution algorithm is proposed, in order to numerically minimise the negative log-likelihood, without the need of derivative information.
Differential Evolution (DE) [32], belongs to the class of evolutionary algorithms (EAs), that stochastically explore the space of the optimisation problem's decision variables. DE uses a structure of a population of candidate solutions and guides the search procedure through a number of generations, until termination criteria have been met. The algorithm is parametrised by a minimum number of hyper-parameters: the mutation scaling F, the crossover probability p cr and the population size n p .
The mutation of each individual comes in terms of perturbation of a target vector from the population with two weighted differential, described by: for i = 1, . . . , n p , where r 1 to r 5 ∈ {1, 2, . . . , n p } and i = r 1 = r 2 = r 3 = r 4 = r 5 . The mutation operator produces a population of perturbed model parameter vectors, called mutant vectors. Crossover is described by: where U i denotes the uniform distribution in integers {1, . . . , 4} and U the uniform distribution in (0, 1). The candidate solutions will be represented by a population of individuals. For every generation g the population of the parameters is described: The proposed method is summarised in Algorithm 2.
Algorithm 2 Parameter estimation using likelihood function and differential evolution 1: Generate Data by numerically integrating the stochastic mass balance differential Equation (6) using Euler-Maruyama method. D = {(t k , X CO 2 (t k ))} k=1,...,N . 5: Initialise Differential evolution variant DE/Rand/2/Bin, where Rand describes the mutation operator, 2 indicates the number of differentials, and Bin the crossover scheme. 6: Set Mutation parameter F, crossover probability p cr , population size n p and a maximum number of generations: MaxGen. 7: Generate Initial population of candidate solutions 8: for Every Generation g do 10: for all individuals in P g do 11: Mutate 12: P g+1 mutation ← Mutation(P g , F) 13: if Bound constraints [l θ i , u θ i ] are violated then 14: if L(P g ) < L(P g+1 trial ) then P g+1 ← P g 20: end if 22: end for 23: In the latter algorithm, it is assumed that the data are available to us. To achieve this, we synthetically generate them, by solving the stochastic differential equation using the Euler-Maruyama method. The function Resampling implements the resampling of the DE solutions into the feasible domain, since the mutation operation may produce solutions that are outside of the parameter's bound constraints. The maximum number of generations, MaxGen is set to 100, the population size n p = 80, the crossover probability p cr = 0.2 and a mutation scaling F sampled uniformly in [0. 2,2], in every generation.
The justification for the choice of the hyper parameters is based on [32]: Empirical studies on the simulations performed, show that the convergence of the algorithm occurs at around 70th generation, hence a maximum number of 100 generations would suffice. The probability of crossover, p cr is chosen to be small in order to avoid premature convergence and moreover, to increase the search robustness. The mutation scaling is chosen to be given by a uniform distribution in the range [0. 2,2], in order to achieve a trade-off between exploring tight valleys and search diversity. As far as the population size is concerned, the choice of 80 achieves a reasonably fast convergence, and it is chosen on a trial and error basis.

Fuzzy Occupancy Estimation
It should be clearly stated, that up to this point the unknown (to be estimated) parameters were considered to be: the infiltration rate q in f , the outdoor carbon dioxide concentration X CO 2 out , the carbon dioxide generation per occupant c CO 2 occ and the volatility of the noise term σ. Since the number of occupants in Equation (5), can be considered as a parameter too, the problem is addressed by likewise asking the same question as in the parameter estimation context: which occupancy vector generates the observed data.
Hence, if the model is re-parametrised with a single unknown parameter n occ (t) and the estimated parametersθ are considered to generate the true data in high accuracy, then a likelihood approach can be used to determine the occupancy vector. We followed and implemented the occupancy estimation algorithm proposed in [14] and estimated the occupancy vector at every time instance. The occupancy vector n occ is very high dimensional, thus an optimisation based approach even in terms of evolutionary algorithms has, to our knowledge, not been studied. Wolf et al. propose a greedy algorithm, based on maximum likelihood's principle, which initially considers the occupancy vector as a zero vector, representing an empty room, and iteratively adds integers values to the vector points, i.e., adds an occupant in each time step, where the likelihood function is maximised. For completeness, an implementation of the algorithm is given in Algorithm 3.
The reason that the estimated parametersθ serve as inputs in the algorithm, is that in a real world setting the data observation will be given as a time-series, where the model's parameters will be unknown. In the case when the estimated parameters are close to the ground truth, then the estimated model will generate very similar data and the only parameter that will be to be determined is the occupancy vector. The performance of the algorithm has been tested under two different occupancy profiles, with very accurate results. It is crucial to note that the method of parameter estimation, in our case the differential evolution algorithm, must result in close values of the true parameters θ otherwise, as explored next, the occupancy estimation algorithm will not output the correct estimation vector.
Algorithm 3 [14] Occupancy estimation algorithm 1: Inputs: Observed Data X CO 2 (t), Estimated Parametersθ, Likelihood Function L(n occ (t)) 2: Initialisê n occ ← 0 1×Number of Observations A ← I Number of Observations×Number of Observations Compute the value of the negative log-likelihood function: ,θ,n occ ) 3: for a number of iterations do 4: for every time step, i = 1 . . . N do 5: n i occ ←n occ + A(i, :) 6: The methodology of parameter θ estimation, via the maximum likelihood approach, results in point estimates. Since the true parameters that generated the data of carbon dioxide concentration are unknown, a question of how certain about these values we could be, emerges. As earlier discussed there are two kinds of uncertainty: the uncertainty that stems from random effects and unmodelled dynamics and the structural uncertainty, which corresponds to possible fluctuations of the model parameters. Stochastic differential equations can be considered to model the first case. Pertaining to the second type of uncertainty, a solution can be the use of fuzzy modelling [33]. The subject of stochastic differential equations with fuzzy parameters, leading to Fuzzy stochastic differential equations, have been studied, from a solution point of view, in [34,35].
In this paper, the effect of this structural parametric uncertainty onto the occupancy estimation algorithm, is investigated. The estimated parameters are mapped onto the fuzzy space using triangular fuzzy numbers, thus an uncertainty is induced on the model's parametric space. Fuzzy numbers of the following form have been used: where x denotes a point in the domain X of the parameter to be fuzzified, [x L , x R ] denotes the support of the fuzzy number and x C denotes the core of the fuzzy number. Note that µ here symbolises the membership function of the fuzzy set, not to be confused with the drift term in Equation (2).
The assumption of structural uncertainty has been made for two of the estimated parameters: the infiltration rate q in f and the carbon dioxide generation per occupants c CO 2 occ . This reflects the fact that in a real scenario the infiltration may have small deviations from the true value [36]. Similarly, the CO 2 generation per occupant in the room, c CO 2 occ , in any case cannot be a crisp value, since it strongly depends on factors, such as the type of work that an occupant performs during its presence in the room, the age of the occupant, the gender of the occupant, and various physical characteristics [37,38]: a young child's carbon dioxide generation differs from an adult heavy worker's.
Smaller uncertainty bounds are induced on the infiltration parameter, and larger on the CO 2 generation per occupant. The fuzzification of the latter parameters is implemented by a symmetric triangular fuzzy number, given by Equation (20), where the supports have been calculated byθ 1 ± 10% andθ 3 ± 30% for the infiltration parameter and the carbon dioxide generation, respectively. The estimated parametersθ 1 andθ 3 are the cores of the fuzzy sets that describe the uncertainty of the infiltration and CO 2 generation. Figure 3 depicts the fuzzy sets, their support and the crisp uncertainty set, corresponding to the 0.5-level. To study the result of the fuzzy parameters in the occupancy estimation algorithm, the a-cuts A a have been computed using the definition: A a = {x ∈ X , m(x) ≥ a}. A 0.5-level is considered, meaning that a trust region is given, in the values of the fuzzified parameters with membership µ ≥ 0.5.
The limit points of the a − cuts are given by: The implementation of the occupancy estimation algorithm, results in different occupancy vectors n occ (t), reflecting the fact that when fuzzy parameters are considered, the estimation gives an upper and lower estimation of the number of occupants in the room at each time.

Fuzzy Q-Learning and Multi Agent Control System
Reinforcement learning refers to a group of algorithms that have been inspired from the learning mechanism of humans or other leaving beings. Reinforcement learning algorithms aim to extract a policy via exploring the possible state-action pairs. The term policy refers to the mapping between states and actions. All reinforcement learning algorithms need to utilise a signal called reward, which defines a measure of how well an agent performs at a given state.
A tabular form of reinforcement learning, is Q-learning [39]. In the Q-learning framework, a Q-function, that tries to estimate the future discounted rewards for actions in given states, is gradually built by the agent. The Q-function output for a given state x and an action a is defined as Q(x, a) and the values of the Q-function are stored in a Q-table. Q-learning's main drawback is that it cannot be applied for continuous state-action space. Fuzzy Q-learning [40] can be used to overcome this difficulty and it is summarised in the following steps:
Output selection, for each fired fuzzy rule, based on the exploration/exploitation algorithm;

3.
Global output's α(x) and corresponding value's of Q(x, α) computation by: where M is the number of the fired fuzzy rules, α i (x) is the fired degree for each fuzzy rule, α i is the action that is selected for each fuzzy rule and q[i, i † ] is the corresponding Q value for the pair of the fuzzy rule i and the action i † . The action corresponds to the action that the exploration/exploitation algorithm selects.

4.
The global action α(x) is applied and the next state x is observed;
Q-values are updated according to the formula: where ∆Q = R + γV(x ) − Q(x, α) and and q[i, i * ] is the corresponding Q value for the pair of the fuzzy rule i and the action i * . γ denotes the discount factor and η the learning rate. The action i * corresponds to the action that has the maximum corresponding Q value for the fuzzy rule i.
For solving complex or physically distributed problems, a multi agent control system can be used. Q-learning and consequently fuzzy Q-learning can be applied to MACS with the most common approach being independent learners [41], where each agent ignores the existence of other agents and interacts independently with the environment.

Heating and Cooling System
The space is a building office with a surface of 50 m 2 and volume 150 m 3 (5 m × 10 m × 3 m) and a 3 m 2 (3 m × 1 m) window which is located in the north side of the office. The heating/cooling system of the building introduce either hot or cold air into the space [42,43]. The heat flow into the space can be computed by the next equation: where the left hand side denotes the heat flow rate into the space. Theat cool denotes the temperature of the hot and the cold air 40 • C or 16 • C, respectively. T room is the temperature of the room's air,Ṁ is the mass air flow of the air from the heating/cooling system (2600 kg/h) and c is the air heat capacity at constant pressure 1005.4 J/kg • C. HG o denotes the heating gain from the occupants (115 W/occupant) [44], n occ (t) is the number of occupants and HG E is the heating gain from the electrical loads (5.38 W/m 2 × 50 m 2 = 269 W) [36], HG L is the heating gain from the artificial lighting (0.5 × 800 = 400 W) and HG S is the heating gain from the solar irradiance through the window [45] which is computed as: where A w denotes the area of window, r w denotes the transparency of the window, D = 1 denotes the window shadowing factor, I w is the diffused solar irradiance that reaches the window, g w is the glass conductance (2 W/m 2 • C) and T out is the temperature of the outdoor air. The thermal model of the building does not take into consideration the effect of the solar irradiance through the building opaque surfaces. The derivative of the space temperature over time, is expressed as: where M a is the air mass inside the building (183.75 kg) and: where R eq is the equivalent thermal resistance of the building envelope (1.0306 × 10 −6 • C/W). The equivalent thermal resistance is computed by the following set of equations [42].
where A w is the area of the windows as given in Equation (30), Th wi is the thickness of the windows (0.01 m), k wi is the thermal conductivity of the windows (0.78 W/m • C), A wα is the area of the walls (187 m 2 ), Th wα is the thickness of the walls (0.2 m), k wα is the thermal conductivity of the walls (0.038 W/m • C). The heat loss/gain through the building envelope (walls area, windows area, and type of insulation) is embedded inside the equivalent thermal resistance [42]. The positive or negative sign on Equations (29) and (31), refers to the process of heating or cooling, respectively.

Lighting System
In order to compute the diffused illuminance provided by the sky inside the building the following equation is used. The horizontal illuminance in a point p [46] which is located in the interior of the building is computed as: where E p,d is the horizontal diffused illuminance of the point p, L s is the illuminance from the sky, z is the horizontal distance between the point p and the window (10 m close to the south wall of the building which is amongst the most shading points inside the building). It is assumed that the horizontal diffused illuminance, provided by the sky, is distributed uniformly and equals to the value of the point p, and consequently the control decisions are taken according to this value. Fluorescence lamps are used for the artificial lighting and the ratio between the horizontal luminance and the consumed power is 100 lm/W [47]. The electro-chromic window [48], can change its transparency by providing a small amount of voltage. The nominal power of the artificial lighting is 400 W which can provide a maximum illuminance of 800 lux in a surface of 50 m 2 . The relation between the lux and the consumption is assumed to be linear. Moreover, it is assumed that the horizontal illuminance provided by the lamps is distributed uniformly all over the surface of the office.

Ventilation System
The CO 2 mass balance with no noise consideration and with a given ventilation rate is given by: where V is the volume of the space (150 × 10 3 L), and q vent is the ventilation rate, which is now introduced by the MACS. According to [36,44] for a building office, the air change rate must be between 2 and 8. For a space of 150 × 10 3 L the ventilation rate of the ventilation system can be 600 × 10 3 L/h. The installed power for such a system is approximately 165 W by assuming a relationship of 0.5 W per flow rate (L/s). As we will discuss in the simulation section, the generation rate per person has been estimated to 26.9427 L/h, the outdoor carbon dioxide concentration has been estimated to 359.4813 ppm and the infiltration rate has been estimated to 0.1167 air change rate which equals to 17.535 L/h. At this point, it should be noted that mechanical ventilation affects the indoor temperature, however nowadays mechanical ventilation systems incorporate heat recovery that can reach efficiency of almost 90%. Based on that fact, the effect of the ventilation flow rate to the indoor temperature is neglected from the building's thermal model.

Multi Agent Control System
The MACS consists of a group of three agents: A = {AG T , AG L , AG CD }, where AG T is the temperature agent, AG L is the illuminance agent, and AG GD the CO 2 agent. The states of the agents are defined by a total number of six fuzzy state variables X i , i.e., two variables for each agent. For each positive values input, five triangular membership functions (MFs) have been used. In addition to that, for each both positive and negative values input, seven triangular MFs have been used (Figure 4). Membership functions are denoted as linguistic variables: PVB, PB, PM, PS, Z, NS, NM, and NB, e.g., positive very big, positive big, positive medium, positive small, zero, negative small, and negative medium, respectively. AG T has two variable inputs: the outdoor temperature which is in the range [0, 1], and the error e T between the set point and the indoor temperature, normalised in the range [−1, 1]. The total states are 35, which are represented by an equal number of fuzzy rules. The output vector of the temperature agent has five fuzzy singleton sets The global action is the control signal, that defines the percentage of the air flow rate by the heating/cooling system, according to its maximum capacity. Positive signal actuate the heating system while negative signal actuate the cooling system. The reward R A T of this agent is defined as follows: where P HC is the power consumption of the heating/cooling system. AG L has two variable inputs, the indoor horizontal illuminance, normalised in the range [0, 1] and the error e L between the set point and the indoor illuminance normalised in the range [−1, 1]. The total states are 35, which are represented by an equal number of rules. The output vector of the AG L has five fuzzy singleton sets, The global action defines the percentage of the power change to be consumed by the artificial lighting system according to its nominal operating power for positive signal, while negative signal change the transparency of the electro-chromic window. The reward R A L of this agent is defined as follows: where P AL is the power consumption of the artificial lighting system. AG CD has two variable inputs, the number of the occupants normalised in the range [0, 1] and the error e CO between the set point and the indoor CO 2 concentration, normalised in the range [0, 1]. The total states (Five membership functions for each input. See plot (a) on Figure 4) are 25, which are represented by an equal number of rules. The output vector of the agent has five fuzzy singleton sets, The global action is the control signal which represents the percentage of the power, change to be consumed by the ventilation system according to its maximum power. The reward R A CD of this agent is defined as follows: where P V is the power consumption of the ventilation system. All three agents use the same exploration/exploitation algorithm. When a new state is visited, the agent performs exploration for 500 rounds. Following the 500 rounds, the agent checks and performs the actions that has not been performed at all. By this way it is assured that all possible actions are explored. Consequently, the agent performs exploitation for 99% and exploration for 1% for a given state.
Additionally, AG T and AG L operate only when occupancy is detected inside the building space. For periods, where the occupancy vector is zero, i.e.,n occ (t) = 0, the agent's control signal is set to zero and the learning mechanism stops functioning. Thus, when the building is empty a waste of energy is avoided.

Parameters and Occupancy Estimation
The proposed methodology has been implemented to estimate the parameters θ, as well as the occupancy vector for a set of given data, that stem from the EM solution of the stochastic differential Equation (5). All mentioned algorithms have been programmed in MATLAB. Figure 5 illustrates ten solution paths of the stochastic differential equation, where the parameters are chosen as: q in f = 0.12 h −1 [36], X CO 2 out = 400 (ppm) [10], c CO 2 occ = 180 (ppm/h), [14], and σ = 2 [14]. A profile n occ (t) materialises a simple scenario of presence in a room. In order to replicate a realistic problem, a single realisation of the solution is considered, and the time series is used to estimate the parameters using the method of evolutionary maximisation of the likelihood function. In the last figure, it is clear that the solution of the stochastic mass balance equation reflects the fact that when occupants are present in the room, e.g., approximately at 13:40 to almost 21:00, the CO 2 increases as expected. When n occ (t) vanishes, then the solution has a mean reverting behaviour (Ornstein-Uhlenbeck Process.), analogous to the infiltration rate.
The estimation algorithm has been implemented using a maximum number of generations MaxGen = 100, a population of n p = 80, a crossover probability p cr = 0.2 and a mutation scaling F sampled uniformly from [0. 2,2]. The estimated parameters of the proposed method are given in Table 1.  Figure 6, illustrates the solution of the stochastic model, i.e., the training data, and the model's solution corresponding to the estimated parameters. The solution of the differential equation using the estimated parametersθ is in very close vicinity to the true solution.
The occupancy estimation algorithm, has been implemented in the latter case and the estimationn occ (t) is identical to the ground truth, n occ (t). The result is illustrated in Figure 7.
The performance of both the parameter estimation method, as well as the occupancy estimation algorithm is excellent. For the implementation of the occupancy algorithm, 1440 points of CO 2 concentration are used, after having generated them from the numerical solution of the mass balance equation; using the method of Euler-Maruyama. Furthermore, the estimated parameterθ has been used, as described in the section of occupancy estimation algorithm. The number of iterations for the algorithm has been set to 1000. To study the result of parametric uncertainty on the occupancy estimation algorithm, the estimated infiltration rate and the estimated CO 2 generation per occupant, are modelled using fuzzy numbers. The parameters of the fuzzy case are computed using the a-cut method, as previously described, and are illustrated in Figure 3. The stochastic differential equation solutions using the estimated parameters, as well as the fuzzy estimated parameters (Table 2) are illustrated in Figure 8. It is clear that a parametric uncertainty bound is introduced in the model, since fuzzy parameters are considered.  The occupancy estimation algorithm is implemented in the case of bothθ andθ parameters. The results are depicted in Figure 9. The estimation result produced by the algorithm differs from the real number of occupants, that generate the CO 2 data. By considering the infiltration and the carbon dioxide generation by the occupants as fuzzy, a further uncertainty in the model other than the noise, is induced. The occupancy vector in the case ofθ is, in some time points, overestimated. This reflects the fact that since CO 2 generation is much lower than the estimated value, then the algorithm, based on the maximum likelihood principle, tries to determine what n occ (t) should be in order for the training data to be generated. Therefore, it computes an estimation vectorn occ (t), so that the solution, if considered parametersθ, would match the true data. In the case ofθ the estimated occupancy vector is, in some time points, underestimated reflecting the fact that in some time points the number of occupants should be smaller than 4, in order for the true data to be produced. For the MACS development, the proposed methodology is used, for the estimation of an occupancy profile that resembles a working office: an increasing number of occupants are present in the room's interior until late afternoon. Following to that, a single occupant is present for two hours, until midnight. The occupancy profile and the associated CO 2 concentration, are illustrated in Figure 10.
The true parameters used in the simulation, as well as their estimation, using differential evolution, are given in Table 3.  The algorithm of occupancy estimation, performs very accurately and the true occupant vector n occ (t) is identical to the estimated,n occ (t). The results are given in Figure 11. Similarly to the previous case, the parametric uncertainty on the parameters is addressed, by assuming that the infiltration and the CO 2 generation per occupant are modelled by triangular fuzzy numbers, with core given byθ 1 for q in f andθ 3 for c CO 2 occ . Applying the a-cut methodology the left and right parameters were computed to be (Table 4). The application of the occupancy algorithm in the case of fuzzy parameters is shown in Figure 12. The analysis follows the same philosophy as previously described. It is quite interesting, that fuzzy parameters increase the uncertainty in the model, and the occupancy estimation algorithm results in overestimated or underestimated vectors. However, it should be clearly stated that this is not a weak point of the algorithm, but a direct result from the maximum likelihood nature of it.

Multi Agent Control Framework
As far as the MACS is concerned, the simulation time has been set to one year with a simulation step time of 0.001 h. The data, regarding the ambient temperature, the horizontal diffused illuminance, and the solar irradiance are acquired from the database of the Institute for Environmental Research and Sustainable Development (IERSD), National Observatory of Athens, and they are associated with the year 2019, for Athens, Greece with a sample time of 1 h. For the outdoor CO 2 concentration, the constant value of 359.4813 ppm that has been estimated, is used. The set point for the heating/cooling system has been set to 23 • C, when the outdoor temperature falls under 21 • C, and 26 • C when the outdoor temperature exceeds 29 • C. The set point for the indoor illuminance has been set to 700 lux and the set point for the CO 2 concentration has been set to 380 ppm.
All the sub-models lighting, thermal, CO 2 concentration models and the algorithms of the MACS have been implemented in MATLAB/Simulink, each sub-model and each agent is an embedded MATLAB function with multiple inputs and outputs. The relation between inputs and outputs are described by the corresponding algorithms and equations. For example, the CO 2 concentration sub-model is a two input (number of occupants and ventilation rate) and one output system (CO 2 concentration).
The following figures illustrate the outdoor temperature, the indoor temperature, the control signal of the heating/cooling system and the occupancy of the building. After the extensive exploration, the indoor temperature remains almost constant at 23 • C for the winter and at 26 • C for the summer, during the hours when occupants are present inside the building. Figures 13 and 14 are associated with one random day of winter and one random day of summer, respectively. When the occupancy level of the building is zero, the control signal of the agent is zero and consequently the indoor temperature is lower from the set point for the winter and above the set point for the summer. When occupants are entering the building the agent starts to operate again and the indoor temperature is reaching the set point with small oscillations.  A similar philosophy follows AG L . At first, the corresponding agent visits new states and mainly explores the state-action space. Following the exploration, the indoor illuminance remains almost constant at 700 lux, at the hours of existing occupancy. Figure 15 depicts the outdoor horizontal diffused illuminance, the total indoor illuminance, the control signal, the transparency of the electro-chromic window, the provided illuminance by the artificial lighting system and the number of people inside the building, for a random day. When the occupancy of the building is zero, the control signal of the agent is zero, and consequently the indoor illuminance is proportional to the outdoor horizontal illuminance. When occupants enter the building, the agent starts to operate again and the indoor hori-zontal illuminance is reaching the set point with small oscillations, by either changing the transparency of the window or by changing the illuminance of the artificial lighting. Figure 16 illustrate the occupancy of the building, the control signal of the agent and the indoor CO 2 concentration, for one random day. The agent explores in the beginning and after the exploration, the indoor concentration of the CO 2 remains lower than the value of 800 ppm. The ventilation system operates when there are people inside the building, and either reduces or stop its operation, when the occupancy level vanishes, and consequently the concentration of the CO 2 is reduced.
The overall energy consumption of the three systems: heating/cooling (H/C) system, artificial lighting (A/L) system and ventilation (V) system for one year period is computed as 1663 kWh in relation to the much larger value 3770 kWh, which is the overall consumption of these systems for the same year without taking into account the occupancy of the building: the heating/cooling system, as well as the lighting system, operate only for hours wheren occ = 0. This concludes an overall reduction of 55.9%.  The consumptions of each system, with and without regarding the occupancy of the building, are presented in Table 5 and graphically illustrated in Figure 17.  The overall comfort index C i arises from three indexes; the thermal comfort index T i , the visual comfort index L i , and the indoor air quality index A i . The trapezoidal membership functions, that calculate the respecting indexes according to the indoor temperature, the indoor horizontal illuminance and the CO 2 concentration, are depicted in Figure 18.
The overall comfort index is computed by the following equation: Figure 19 illustrates C i and the number of occupants for a random day. C i is zero when no occupants are inside the building and equals, for most of the time, to one when there are occupants inside the building.  The MACS performance, pertaining to the case of fuzzy parameters, is presented in the following figures. Figures 20 and 21, illustrate the outdoor temperature, the indoor temperature, the control signal, and the occupancy level estimated when usingθ andθ, respectively.   Figure 22 depicts the MACS results, associated with the occupancy estimation, pertaining toθ. The outdoor illuminance, the total indoor illuminance, the control signal, the transparency of the window, the artificial lighting and the occupancy level, are illustrated. The results when using theθ-based estimated occupancy vector, are given in Figure 23. Figures 24 and 25, illustrate the occupancy of the building, the control signal of the agent and the indoor CO 2 concentration, for one random day for the different occupancy vectors.
The total consumption, as well as the consumptions of the heating/cooling system, the artificial lighting system and the ventilation system for two different occupancy profiles, are presented in Table 6. In the first profile, the overall consumption is 1533 kWh, meaning a reduction of 130 kWh (Comparison with the first column of Table 5). This reduction is a result of the ventilation system. The agent of the ventilation system learns a policy that reduces the consumption of the actuator but, as it is illustrated in Figure 24, the indoor air of the building, is of lower quality, as the CO 2 concentration reaches values up to 900 ppm. Small reduction is also observed to the heating/cooling system (Figure 20), but it is justified by the increased number of the occupants: More occupants are heating the thermal space, due to the occupant's heating gains.
In the second profile, the overall consumptions is 1528 kWh, meaning a reduction of 135 kWh (Comparison with the first column of Table 5). In this case, the reduction is a result from the heating/cooling system ( Figure 21). The agent of this system learns a better policy that reduces the consumption of the system, without particularly reducing the thermal comfort. A small reduction of 2 kWh is observed to the ventilation system ( Figure 25), which is justified by the lower number of the occupants. The consumption of the lighting system, as it is illustrated in Figures 22 and 23, remains the same, since the lighting system is not affected by the occupant number.

Conclusions
A MACS with a parameter and occupancy estimation algorithm has been studied, with respect to the control performance and maintenance of the building's comfort levels. A stochastic process models the CO 2 generation and decay, by including the unmodelled dynamics. Given CO 2 concentration data, the parameters of the model, as well as the occupancy level, are estimated. Maximum likelihood methodology follows the estimation. In order to induce further uncertainty in the assumed model, the infiltration and carbon dioxide concentration are treated using fuzzy numbers, and the effect of the occupancy estimation algorithm in the new model is studied. By using two boundary cases of the α-cut method, for the fuzzy parameters, two occupancy estimations are produced. These occupancy estimated profiles can be interpreted to describe an uncertainty bound on the occupancy level which stems from the parametric uncertainty on the infiltration rate and the carbon dioxide generation rate per occupant.
The simulation results highlight the successful application of the independent learners approach of the MACS. This control framework succeeds stability over the desired set points for all the control parameters, in a fully decentralised framework. Each agent acts independently, thus a possible failure in one agent will not lead to a failure of the total control system. The enhanced approach, which takes into account the occupancy of the building, leads to even better performance because not only the set points of the parameters are achieved, but also an energy reduction of about 55.9% is observed. This reduction is a result of the integration of the estimated occupancy vector, which dictates which time periods the thermal and lighting agents will function. Therefore, a waste of energy is avoided for the time periods where there are no present occupants in the building. This approach is also tested for different occupancy profiles arising by the fuzzy parameters with similar successful results.
In conventional building control techniques the control algorithm receives immediate feedback and generates a control action. In the proposed MACS for building control, the control algorithm receives a delayed feedback, which results each agent to incorporate previous knowledge. Thus, an agent applies the optimum policy depending on the long term reward.
The over/under estimated occupancy level, when considering fuzzy parameters, indicate that further research should focus on the field of parametric uncertainty and on fuzzy stochastic differential equations. The incorporation of uncertainty bounds on occupancy estimation, could be included in the MACS control design to further reduce the energy consumption. Future research directions will also focus on the occupancy prediction problem, as well as the development of evolutionary based algorithms for occupancy estimation based on CO 2 concentration data.
A quiet interesting topic that we surely plan to investigate involves more complex approaches of cooperative MAS, which combine the indoor environmental quality parameters which, in combination with artificial general intelligence (Deep AI), may lead to further improvement of the proposed system. Furthermore, the impact of the ventilation system to the indoor temperature will be introduced into the thermal model of the building for providing more accurate results. Finally, the effect of ventilation systems, with and without heat recovery, on thermal comfort and energy consumption, will be studied and compared.

Acknowledgments:
The authors wish to express their sincere gratitude, for the availability of the temperature and irradiance data, to the Institute for Environmental Research and Sustainable Development (IERSD), National Observatory of Athens.

Conflicts of Interest:
The authors declare no conflicts of interest.