Towards Energy Efficiency: Forecasting Indoor Temperature via Multivariate Analysis

The Small Medium Large System (SMLsystem) is a house built at the Universidad CEU Cardenal Herrera (CEU-UCH) for participation in the Solar Decathlon 2012 competition. Several technologies have been integrated to reduce power consumption. One of these is a forecasting system based on artificial neural networks (ANNs), which predicts indoor temperature in the near future using data captured by a complex monitoring system as input. This paper presents a study of the impact of different covariate combinations on forecasting performance, together with a comparison of ANNs with standard statistical forecasting methods. The research focuses on forecasting the indoor temperature of a house, as it is directly related to the consumption of the heating, ventilation and air conditioning (HVAC) system. HVAC systems at the SMLsystem house represent 53.89% of the overall power consumption. The energy used to maintain temperature was measured to be 30%-38.9% of the energy needed to lower it. Hence, these forecasts allow the house to adapt itself to future temperature conditions by using home automation in an energy-efficient manner. Experimental results show high forecasting accuracy; therefore, the forecasts might be used to efficiently control an HVAC system.


Introduction
Nowadays, as the Spanish Institute for Diversification and Saving of Energy (IDAE) [1] of the Spanish Government says, energy is becoming a precious asset of incalculable value, which, converted into electricity, heat or fuel, makes the everyday life of people easier and more comfortable. Moreover, it is also a key factor in making the progress of industry and business feasible.
Spanish households consume 30% of the total energy expenditure of the country [1]. In the European Union (EU), primary energy consumption in buildings represents about 40% of the total [2]. Worldwide, recent studies report that buildings also account for about 40% of the total consumed energy, of which more than half is used by heating, ventilation and air conditioning (HVAC) systems [3].
Energy is a scarce resource in nature, which has an important cost, is finite and must be shared. Hence, there is a need to design and implement new systems at home that are able to produce and use energy efficiently and wisely, reaching a balance between consumption and comfort. People can carry out their activities much more easily if their comfort is ensured and there are no negative factors (e.g., cold, heat, low light, noise, low air quality, etc.) to disturb them. With the evolution of technology, new parameters have become more controllable, and the requirements for people's comfort level have increased.
Systems that let us monitor and control such aspects make it necessary to refer to what reference [4] calls "Ambient Intelligence" (AmI). This refers to the set of user-centered applications that integrate ubiquitous and transparent technology to implement intelligent environments with natural interaction. The result is a system that shows an active (intelligent) behavior, anticipating possible solutions adapted to the context in which such a system is located. The term home automation can be defined, as mentioned in reference [5], as the set of services provided by integrated technology systems to meet the basic needs of security, communication, energy management and comfort of a person and their immediate environment. Thus, home automation can be understood as the discipline that studies the development of intelligent infrastructures and information technologies in buildings. In this paper, the concept of smart buildings is used in this way, as constructions that involve this kind of solution.
In this sense, the School of Technical Sciences at the University CEU-UCH has built a solar-powered house, known as the Small Medium Large System (SMLsystem), which integrates a whole range of different technologies to improve energy efficiency, allowing it to be a near-zero energy house. The house has been constructed to participate in the 2012 Solar Decathlon Europe competition. Solar Decathlon Europe [6] is an international competition among universities, which promotes research in the development of energy-efficient houses. The objective of the participating teams is to design and build houses that consume as few natural resources as possible and produce minimum waste products during their lifecycle. Special emphasis is placed on reducing energy consumption and on obtaining all the needed energy from the sun. The SMLsystem house includes a Computer-Aided Energy Saving System (CAES). The CAES is the system that has been developed for the contest, which aims to improve energy efficiency using home automation devices. This system has different intelligent modules in order to make predictions about energy consumption and production.
To implement such intelligent systems, forecasting techniques in the area of artificial intelligence can be applied. Soft computing is widely used in real-life applications [7,8]. In fact, artificial neural networks (ANNs) have been widely used for a range of applications in the area of energy systems modeling [2,9-11]. The literature demonstrates their advantages over other conventional methods for time series and regression tasks in non-linear process modeling, such as energy consumption in buildings. Of special interest to this area is the use of ANNs for forecasting the room air temperature as a function of forecasted weather parameters (mainly solar radiation and air temperature) and the actuator (heating, ventilating, cooling) state or manipulated variables, and the subsequent use of these mid-/long-range prediction models for a more efficient temperature control, both in terms of regulation and energy consumption, as can be read in reference [10].
Depending on the type of building, location and other factors, HVAC systems may represent up to 40% of the total energy consumption of a building [2,3]. The activation/deactivation of such systems depends on the comfort parameters that have been established, one of the most important being indoor temperature, which is directly related to the notion of comfort. Several authors have been working on this idea; an excellent review of the state of the art can be found in reference [2]. This is why the development of an ANN to predict such values could help to improve overall energy consumption, balanced with the minimum affordable comfort of a home, provided that these values are anticipated well enough to define efficient energy control actions.
This paper is focused on the development of an ANN module to predict the behavior of indoor temperature, in order to use its prediction to reduce the energy consumption of an HVAC system. The architecture of the overall system and the variables being monitored and controlled are presented. Next, the approach to time series forecasting of the indoor temperature is described. Finally, the ANN experimental results are presented and compared to standard statistical techniques. Indoor temperature forecasting is an interesting problem that has been widely studied in the literature, for example, in [2,3,12-14]. We focus this work on multivariate forecasting using different weather indicators as input features. In addition, two combinations of forecast models have been compared.
In the conclusion, it is studied how the predicted results are integrated with the energy consumption parameters and comfort levels of the SMLsystem.

SMLhouse and SMLsystem Environment Setup
The Small Medium Large House (SMLhouse) and SMLsystem solar houses (more info about both projects can be found here: http://sdeurope.uch.ceu.es/) were built to participate in the Solar Decathlon 2010 and 2012 [6], respectively, and aim to serve as prototypes for improving energy efficiency. The competition focuses on reproducing the normal behavior of the inhabitants of a house, requiring competitors to maintain comfortable conditions inside the house (keeping temperature, CO2 and humidity within a range) while performing common tasks such as cooking with the oven, watching television (TV) or showering, and using as little electrical power as possible.
As stated in reference [15], due to thermal inertia, it is more efficient to maintain the temperature of a room or building than to cool or heat it. Therefore, predicting indoor temperature in the SMLsystem could reduce HVAC system consumption: future values of temperature can be used to decide whether or not to activate the heat pump to maintain the current temperature, regardless of its present value. To build an indoor temperature prediction module, a minimum of several weeks of sensing data is needed.
Hence, the prediction module was trained using historical sensing data from the SMLhouse (2010), in order to be applied in the SMLsystem.
The SMLhouse monitoring database is large enough to estimate forecasting models; therefore, it has been used to tune and analyze forecasting methods for indoor temperature, and to show how they could be improved using different sensing data as covariates for the models. This training data was used for the SMLsystem prediction module.
The SMLsystem is a modular house built basically using wood. It was designed to be an energy self-sufficient house, using passive strategies and water heating systems to reduce the amount of electrical power needed to operate the house.
The energy supply of the SMLsystem is divided into solar power generation and a domestic hot water (DHW) system. The photovoltaic solar system is responsible for generating electric power by using twenty-one solar panels. These panels are installed on the roof and at the east and west facades. The energy generated by this system is managed by a device to inject energy into the house, or in case there is an excess of power, to the grid or a battery system. The thermal power generation is performed using a solar panel that produces DHW for electric energy savings.
The energy demand of the SMLsystem house is divided into three main groups: HVAC, house appliances, and lighting and home electronics (HE). The HVAC system consists of a heat pump, which is capable of heating or cooling water, in addition to a rejector fan. Water pipes are installed inside the house, and a fan coil system distributes the heat/cold using ventilation. As shown in reference [16], the HVAC system is the main contributor to residential energy consumption, using 43% of total power in U.S. households or 70% of total power in European residential buildings. In the SMLsystem, the HVAC had a peak consumption of up to 3.6 kW when the heat pump was activated and, as shown in Table 1, it was the highest power consumption element of the SMLsystem in the contest, with 53.89% of total consumption. This is consistent with data from the studies mentioned, as the competition was held in Madrid (Spain) at the end of September. The house has several energy-efficient appliances that were used during the competition. Among them are a washing machine, a refrigerator with freezer, an induction hob/vitroceramic and a conventional oven. The consumption of the washing machine and dishwasher is reduced by the DHW system, lowering the SMLsystem energy demand: the DHW system is capable of heating water to high temperatures, so when water enters these appliances, the resistor must be activated only for a short time to reach the desired temperature. The last energy-demanding group consists of several electrical outlets (e.g., TV, computer, Internet router and others). Although the energy consumption of the house could be improved, the installed systems let the SMLsystem house be a near-zero energy building, producing almost all the energy at the time the inhabitants need it. This performance won second place in the energy balance contest of the Solar Decathlon competition.
The classification of the Energy Balance contest can be found here: http://monitoring.sdeurope.org/index.php?action=scoring&scoring=S4 .
The sensor and control framework shown in Figure 1 has been used in the SMLsystem. It is operated by a Master Control Server (MCS), and the European home automation standard protocol known as Konnex (KNX) (neither KNX nor Konnex is an acronym: http://ask.aboutknx.com/questions/430/abbreviationknx) has been chosen for monitoring and sensing. KNX modules are grouped by functionality: analog or binary inputs/outputs, gateways between transmission media, weather stations, CO2 detectors, etc.
The whole system provides 88 sensor values and 49 actuators. In the proposed system, immediate-execution actions, such as controlling ventilation, the HVAC system and the DHW system, were programmed to operate without the involvement of the MCS. Beyond this basic level, the MCS can read the status of sensors and actuators at any time and can perform actions on them via an Ethernet gateway. Monitoring and control software was developed following a three-layered scheme. In the first layer, data is acquired from the KNX bus using a KNX-IP (Internet Protocol) bridge device. The Open Home Automation Bus (openHAB) [17] software performs the communication between KNX and our software. The second layer is a data persistence module that was developed to collect the values offered by openHAB with a sampling period of 60 s. Finally, the third layer is composed of different software applications that are able to intercommunicate: a mobile application has been developed to let the user watch and control the current state of domotic devices, and different intelligence modules are also being developed, for instance, the ANN-based indoor temperature forecasting module.
The power generation systems described previously are monitored by a software controller. It includes multiple measurement sensors, including the voltage and current measurements of the photovoltaic panels and batteries. Furthermore, the current, voltage and power of the grid are available. The power consumption of the house is measured by sensors for each group element.
The climate system has power consumption sensors for the whole system, and specifically for the heat pump. The HVAC system is composed of several actuators and sensors used for operation. Among them are the inlet and outlet temperatures of the heat rejector and the inlet and outlet temperatures of the HVAC water in the SMLsystem. In addition, there are fourteen switches for internal function valves, for the fan coil system, for the heat pump and the heat rejector. The DHW system uses a valve and a pump to control water temperature. Some appliances have temperature sensors which are also monitored. The lighting system has sixteen binary actuators that can be operated manually by using the wall-mounted switches or by the MCS. The SMLsystem has indoor sensors for temperature, humidity and CO2. Outdoor sensors are also available for lighting measurements, wind speed, rain, irradiance and temperature.

Time Series Forecasting
Forecasting techniques are useful in terms of energy efficiency, because they help to develop predictive control systems. This section introduces the formal aspects and the forecasting modeling done for this work. Time series are data series with trend and pattern repetition through time. They can be formalized as a sequence of scalars from a variable x, obtained as the output of the observed process:
$$s(x) = s_0(x), s_1(x), \ldots, s_{i-1}(x), s_i(x), s_{i+1}(x), \ldots$$
A fragment beginning at position i and ending at position j will be denoted by $s_i^j(x)$. Time series forecasting can be grouped into univariate forecasting, when the system forecasts variable x using only past values of x, and multivariate forecasting, when the system forecasts variable x using past values of x plus additional values of other variables. Multivariate approaches could perform better than univariate ones when additional variables cause variations in the predicted variable x, as is shown in the experimental section.
Forecasting models are estimated given different parameters: the number of past values, the size of the future window, and the position in the future of the prediction (future horizon). Depending on the size of the future window and how it is produced [18], forecasting approaches are denoted as: single-step-ahead forecasting if the model forecasts only the next time step; multi-step-ahead iterative forecasting if the model forecasts only the next time step, producing longer windows by an iterative process; and multi-step-ahead direct forecasting [19] if the model forecasts in one step a large future window of size Z. Following this last approach, two different major model types exist:
• Pure direct, which uses Z forecasting models, one for each possible future horizon.
• Multiple input multiple output (MIMO), which uses one model to compute the full Z future window. This approach has several advantages due to the joint learning of inputs and outputs, which allows the model to learn the stochastic dependency between predicted values.
Discriminative models, such as ANNs, profit greatly from this input/output mapping. Additionally, ANNs are able to learn non-linear dependencies.
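The difference between the iterative and MIMO strategies can be illustrated with a minimal Python sketch. The function names and the toy one-step model are hypothetical and chosen only for illustration; they are not the implementation used in this work.

```python
def make_mimo_patterns(series, past_size, future_size):
    """Slice a series into (input window, output window) pairs.

    Each pair holds `past_size` past values and the `future_size` values
    that follow them, so a single model can be trained to emit the whole
    future window at once (the MIMO scheme).
    """
    patterns = []
    last_start = len(series) - past_size - future_size
    for start in range(last_start + 1):
        split = start + past_size
        inputs = series[start:split]
        targets = series[split:split + future_size]
        patterns.append((inputs, targets))
    return patterns


def iterative_forecast(one_step_model, history, past_size, future_size):
    """Multi-step-ahead iterative forecasting with a one-step model.

    `one_step_model` is any callable mapping a list of `past_size` values
    to the next value; each prediction is fed back as an input.
    """
    window = list(history[-past_size:])
    forecast = []
    for _ in range(future_size):
        next_value = one_step_model(window)
        forecast.append(next_value)
        window = window[1:] + [next_value]
    return forecast


def linear_trend(window):
    """Toy one-step model (an assumption): extend the last difference."""
    return window[-1] + (window[-1] - window[-2])


toy_series = [20.0, 20.5, 21.0, 21.5, 22.0, 22.5]
patterns = make_mimo_patterns(toy_series, past_size=3, future_size=2)
forecast = iterative_forecast(linear_trend, toy_series, past_size=3, future_size=2)
```

In the iterative case, errors in early predictions propagate into later inputs, whereas the MIMO patterns let one model learn all Z outputs jointly.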

Forecast Model Formalization
A forecast model can be formalized as a function F, which receives as inputs the variable of interest ($x_0$) with its past values until the current time t and a number C of covariates ($x_1, x_2, \ldots, x_C$), also with their past values until the current time t, and produces a future window of size Z for the given $x_0$ variable:
$$\langle \hat{s}_{t+1}(x_0), \hat{s}_{t+2}(x_0), \ldots, \hat{s}_{t+Z}(x_0) \rangle = F(\Omega(x_0), \Omega(x_1), \ldots, \Omega(x_C)),$$
where $\Omega(x) = s_{t-I(x)+1}^{t}(x)$ denotes the $I(x)$ past values of variable or covariate x.
The number of past values I(x) is important to ensure good performance of the model; however, it is not easy to estimate this number exactly. In this work, it is proposed to estimate models for several values of I(x) and use the model that achieves the best performance, denoted as BEST.
It is known in the machine learning community that ensemble methods achieve better generalization [20-22]. Several possibilities can be found in the literature, such as vote combination, linear combination (for which a special case is the uniform or mean combination) or, in a more complicated way, modular neural networks [23]. Hence, it is also proposed to combine the outputs of all estimated models for each different value of I(x), following a linear combination scheme (also known as ensemble averaging), which is a simple but effective method of combination, widely used in the machine learning community. Its major benefit is the reduction of overfitting problems, and therefore, it could achieve better performance than a single ANN. The quality of the combination depends on the correlation of the ANNs: theoretically, the more decorrelated the models are, the better the combination is. For this reason, ANNs with different input sizes I(x) were combined, with the expectation that they would be less correlated with each other than models obtained by other kinds of variation, such as modifying the hidden layer size or other hyper-parameters.
A linear combination of forecast models, given a set $F_{\theta_1}, F_{\theta_2}, \ldots, F_{\theta_M}$ of M forecast models with the same future window size (Z), follows this equation:
$$\langle \hat{s}_{t+1}(x_0), \hat{s}_{t+2}(x_0), \ldots, \hat{s}_{t+Z}(x_0) \rangle = \sum_{i=1}^{M} \alpha_i F_{\theta_i}(\Omega_i(x_0), \Omega_i(x_1), \ldots, \Omega_i(x_C)),$$
where $\alpha_i$ is the combination weight given to the model $\theta_i$, and $\Omega_i(x)$ is its corresponding Ω function, as described in Section 3.1. The weights are constrained to sum to one, $\sum_{i=1}^{M} \alpha_i = 1$. This formulation allows one to combine forecast models with different input window sizes for each covariate, but all of them using the same covariate inputs. Each weight $\alpha_i$ will be estimated following two approaches:
• Uniform linear combination: $\alpha_i = 1/M$ for $1 \le i \le M$. Models following this approach will be denoted as COMB-EQ.
• Exponential linear combination (softmax): $\alpha_i = \exp(L^{-1}(\theta_i, D)) / \sum_{j=1}^{M} \exp(L^{-1}(\theta_j, D))$ for $1 \le i \le M$, $L^{-1}(\theta_i, D)$ being an inverted loss-function (error function) value for the model $\theta_i$, given the dataset D. It will be computed using a validation dataset. In this paper, the loss-function will be the mean absolute error (MAE), defined in Section 3.2, because it is more robust to outlier errors than quadratic error measures. This approach will be denoted as COMB-EXP.
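The three strategies (BEST, COMB-EQ and COMB-EXP) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the inverted loss is taken here as 1/loss, and the validation MAE values are hypothetical.

```python
import math


def uniform_weights(num_models):
    """COMB-EQ: every model gets the same weight 1/M."""
    return [1.0 / num_models] * num_models


def softmax_inverse_loss_weights(validation_losses):
    """COMB-EXP: softmax over the inverted validation loss of each model.

    Assumption: the inverted loss is computed as 1/loss.
    """
    inverted = [1.0 / loss for loss in validation_losses]
    shift = max(inverted)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(value - shift) for value in inverted]
    total = sum(exps)
    return [value / total for value in exps]


def combine_forecasts(forecasts, weights):
    """Weighted sum of M forecasts, each a list with Z future values."""
    future_size = len(forecasts[0])
    return [
        sum(w * f[z] for w, f in zip(weights, forecasts))
        for z in range(future_size)
    ]


def best_model_index(validation_losses):
    """BEST: index of the single model with the lowest validation loss."""
    return min(range(len(validation_losses)), key=validation_losses.__getitem__)


# Hypothetical validation MAE values for three models.
example_losses = [0.20, 0.10, 0.40]
exp_weights = softmax_inverse_loss_weights(example_losses)
eq_weights = uniform_weights(len(example_losses))
```

Both weight vectors sum to one, as the formulation requires; the softmax variant concentrates the weight on the models with lower validation error.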

Evaluation Measures
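The performance of forecasting methods over one time series can be assessed by several different evaluation functions, which measure the empirical error of the model. In this work, for a deep analysis of the results, three different error functions are used: MAE, root mean square error (RMSE) and symmetric mean absolute percentage of error (SMAPE). The error is computed by comparing the target values of the time series, $s_{t+1}, s_{t+2}, \ldots, s_{t+Z}$, with the corresponding predictions, $\hat{s}_{t+1}, \hat{s}_{t+2}, \ldots, \hat{s}_{t+Z}$, produced by the model θ:
$$\mathrm{MAE}(t) = \frac{1}{Z} \sum_{z=1}^{Z} |\hat{s}_{t+z} - s_{t+z}|$$
$$\mathrm{RMSE}(t) = \sqrt{\frac{1}{Z} \sum_{z=1}^{Z} (\hat{s}_{t+z} - s_{t+z})^2}$$
$$\mathrm{SMAPE}(t) = \frac{1}{Z} \sum_{z=1}^{Z} \frac{|\hat{s}_{t+z} - s_{t+z}|}{(|\hat{s}_{t+z}| + |s_{t+z}|)/2} \times 100$$
The results can be measured over all time series in a given dataset D as:
$$L^{\star}(D) = \frac{1}{|D|} \sum_{t=1}^{|D|} L(t),$$
|D| being the size of the dataset and $L \in \{\mathrm{MAE}, \mathrm{RMSE}, \mathrm{SMAPE}\}$ the loss-function, defining $\mathrm{MAE}^{\star}$, $\mathrm{RMSE}^{\star}$ and $\mathrm{SMAPE}^{\star}$.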
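As an illustration, the three error measures and their average over a dataset could be computed as in the following Python sketch. The SMAPE variant is an assumption: it is expressed in percent and uses the mean of the absolute target and prediction as denominator.

```python
import math


def mae(targets, predictions):
    """Mean absolute error over one future window."""
    return sum(abs(p - t) for t, p in zip(targets, predictions)) / len(targets)


def rmse(targets, predictions):
    """Root mean square error over one future window."""
    squared = sum((p - t) ** 2 for t, p in zip(targets, predictions))
    return math.sqrt(squared / len(targets))


def smape(targets, predictions):
    """Symmetric mean absolute percentage error, in percent.

    Assumes target and prediction are never both zero.
    """
    terms = [
        abs(p - t) / ((abs(p) + abs(t)) / 2.0)
        for t, p in zip(targets, predictions)
    ]
    return 100.0 * sum(terms) / len(terms)


def dataset_error(loss, windows):
    """Average a loss over all (targets, predictions) windows of a dataset."""
    return sum(loss(t, p) for t, p in windows) / len(windows)


example_windows = [([20.0, 22.0], [21.0, 21.0]), ([20.0, 20.0], [20.0, 20.0])]
mean_mae = dataset_error(mae, example_windows)
```

Because indoor temperatures stay far from zero, SMAPE here behaves like a relative version of MAE, which is why both measures rank the models similarly.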

Forecasting Data Description
One aim of this work is to compare different statistical methods to forecast indoor temperature given previous indoor temperature values. The correlation between different weather signals and indoor temperature will also be analyzed.
In our database, time series are measured with a sampling period of 1 min. However, in order to compute better forecasting models, each time series is sub-sampled with a period of T = 15 min, computing the mean of the last T values (for each hour, this mean is computed at 0 min, 15 min, 30 min and 45 min). The output of this preprocessing is the data series $s'(x)$, where:
$$s'_i(x) = \frac{1}{T} \sum_{j=(i-1)T+1}^{iT} s_j(x)$$
One time feature and five sensor signals were taken into consideration:
• Indoor temperature in degrees Celsius, denoted by variable x = d. This is the forecasted variable of interest.
• Hour feature in Universal Time Coordinated (UTC), extracted from the time-stamp of each pattern, denoted by variable x = h. The hour of the day is important for estimating the Sun's position.
• Sun irradiance in W/m2, denoted by variable x = W. It is correlated with temperature, because more irradiance will mean more heat.
• Indoor relative humidity, denoted by variable x = H.
• Air quality, denoted by variable x = Q.
• Rain, denoted by variable x = R.
To evaluate the forecasting models' performance, three partitions of our dataset were prepared:
• a training partition composed of 2017 time series over 21 days; the model parameters are estimated to reduce the error in this data;
• a validation partition composed of 672 time series over seven days; this is needed to avoid over-fitting during training, and also to compare and study the models between themselves;
• a test partition composed of 672 time series over seven days in June 2011; the forecasting error in this partition is reported at the end, evaluating the generalization ability of this methodology.
Training and validation were performed in March 2011. The validation partition is sequential with the training partition. The test partition is one week ahead of the last validation point.
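The sub-sampling step described above can be sketched as follows. This is a minimal illustration, assuming that samples arrive as a plain list aligned to the quarter-hour and that incomplete trailing blocks are simply dropped.

```python
def subsample_mean(samples, period):
    """Average each consecutive block of `period` samples.

    Assumption: incomplete trailing blocks are dropped.
    """
    num_blocks = len(samples) // period
    return [
        sum(samples[i * period:(i + 1) * period]) / period
        for i in range(num_blocks)
    ]


# One hour of hypothetical 1 min readings becomes four 15 min means.
one_hour = [float(minute) for minute in range(60)]
quarter_hour_means = subsample_mean(one_hour, period=15)
```

At this rate, one day yields 96 values, so seven days give 672 values, which matches the size of the validation and test partitions.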

Standard Statistical Methods
Exponential smoothing and auto-regressive integrated moving average (ARIMA) models are the two most widely used methods for time series forecasting. These methods provide complementary approaches to the time series forecasting problem: exponential smoothing models are based on a description of trend and seasonality in the data, while ARIMA models aim to describe its autocorrelations. Their results have been taken as a reference against which to compare the ANN results.
On the one hand, exponential smoothing methods are applied for forecasting. These methods were originally classified by [24] according to their taxonomy. This was later extended by [25], modified by [26] and extended by [27], giving a total of fifteen methods. To select the best-fitting models within this framework, each possible model was estimated for the training partition, and the two best models were selected. To carry out this selection, Akaike's Information Criterion (AIC) was used as suggested by some works in the literature [28,29]. The selected models were: the first model with multiplicative error, multiplicative damped trend and without the seasonal component (MMdN model), and the second model with additive error, additive damped trend and without the seasonal component (AAdN model). The MMdN model was chosen for the validation partition in order to minimize the MSE.
On the other hand, ARIMA models were estimated. The widely known ARIMA approach was first introduced by Box and Jenkins [30] and provides a comprehensive set of tools for univariate time series modeling and forecasting. These models were estimated for our data with and without covariates. The last value of the hour variable (x = h), codified as a factor using 24 categories (0 to 23), and the hour as a continuous variable were used as covariates.
Both linear and quadratic forms of the continuous hour were tried, but the linear form performed worse. Therefore, three model groups are used: ARIMA without covariates (ARIMA), with covariate x = h as a factor (ARIMAF) and with covariate x = h as a quadratic form (ARIMAQ). The best models for each group were estimated for the training partition, and in all cases, the non-seasonal ARIMA(2,1,0) model was selected for the ARIMA part of each model using AIC. The best results, in terms of MSE, were obtained by the models with the hour covariate as a factor and as a quadratic form.
The forecast library in the statistical package R [31] was used for these analyses.

ANNs
Estimation of ANN forecast models needs data preprocessing and normalization of input/output values in order to ensure better performance results.

Preprocessing of Time Series for ANNs
The indoor temperature variable (x = d) is the forecasted variable of interest. In order to increase model generalization, this variable is differentiated, and a new signal sequence $s''(x = d)$ is obtained from the sub-sampled sequence $s'(x = d)$ following this equation:
$$s''_i(x = d) = s'_i(x = d) - s'_{i-1}(x = d)$$
Differentiating the indoor temperature proved important for achieving good generalization results; this choice is based on previous work in which undifferentiated data was used [11].
The time series corresponding to sun irradiance (x = W), indoor relative humidity (x = H), air quality (x = Q) and rain (x = R) are normalized by subtracting the mean and dividing by the standard deviation, computing new signal sequences $s''(x)$ for $x \in \{W, H, Q, R\}$:
$$s''_i(x) = \frac{s'_i(x) - E[s'(x)]}{\sigma(s'(x))}$$
where $E[s'(x)]$ is the mean value of the sequence $s'(x)$, and $\sigma(s'(x))$ is its standard deviation. These two parameters are computed over the training dataset. For the hour component (x = h), a different approach is followed. It is represented as a locally-encoded category, which consists of using a vector with 24 components, where 23 components are set to 0, and the component that indicates the hour value is set to 1. This kind of encoding avoids the big jump between 23 and 0 at midnight, but forces the model to learn the relationship between adjacent hours. Other approaches for hour encoding could be explored in future work.
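The three preprocessing steps (differencing, standardization and local encoding of the hour) can be sketched in Python as below. The function names are hypothetical, and the use of the population standard deviation is an assumption.

```python
import math


def difference(series):
    """First-order differencing: each value minus the previous one."""
    return [cur - prev for prev, cur in zip(series, series[1:])]


def fit_standardizer(training_series):
    """Estimate mean and standard deviation on the training data only."""
    count = len(training_series)
    mean = sum(training_series) / count
    variance = sum((v - mean) ** 2 for v in training_series) / count
    return mean, math.sqrt(variance)


def standardize(series, mean, std):
    """Subtract the mean and divide by the standard deviation."""
    return [(v - mean) / std for v in series]


def encode_hour(hour):
    """Locally-encoded (one-hot) vector with 24 components."""
    vector = [0.0] * 24
    vector[hour] = 1.0
    return vector


temperature_steps = difference([20.0, 20.5, 20.25])
mean, std = fit_standardizer([1.0, 3.0])
normalized = standardize([1.0, 3.0], mean, std)
midnight = encode_hour(0)
```

Estimating the mean and standard deviation on the training partition and reusing them for validation and test data keeps information from the later partitions out of the model inputs.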

ANN Description
ANNs have an impressive ability to learn complex mapping functions, as they are universal function approximators [32], and are widely used in forecasting [10,22,33,34].
ANNs are formed by one input layer, one output layer and a small number of hidden layers. Figure 2 is a schematic representation of an ANN with two hidden layers for time series forecasting. The inputs of the ANN are past values of covariates, and the output layer is formed by the Z future window predicted values, following the MIMO approach described in Section 3, which has obtained better accuracy in previous experimentation [11]. The well-known error-backpropagation (BP) algorithm [35] has been used in its on-line version to estimate the ANN weights, adding a momentum term and an L2 regularization term (weight decay). Although more advanced algorithms than BP exist nowadays, BP is easier to implement in practice, and a correct adjustment of momentum and weight decay helps to avoid bad local minima. BP minimizes the mean square error (MSE) function with the addition of the weight decay regularization term, denoted by $\epsilon$, which is useful for avoiding over-fitting and improving generalization:
$$E = \frac{1}{2Z} \sum_{z=1}^{Z} \left( \hat{s}_{t+z}(x_0) - s_{t+z}(x_0) \right)^2 + \frac{\epsilon}{2} \sum_{w_i \in \theta} w_i^2$$
where θ is the set of all weights of the ANN (without the bias), and $w_i$ is the value of the i-th weight.
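The regularized objective and one on-line update with momentum can be sketched as follows. This is a minimal illustration, not the training code used in this work: the 1/2 scaling factors, the function names and the exact form of the momentum update are assumptions.

```python
def regularized_mse(targets, predictions, weights, weight_decay):
    """Halved MSE over the Z outputs plus an L2 penalty on the weights.

    Assumption: both terms carry a 1/2 factor, a common convention.
    """
    squared = sum((p - t) ** 2 for t, p in zip(targets, predictions))
    data_term = squared / (2.0 * len(targets))
    penalty = 0.5 * weight_decay * sum(w * w for w in weights)
    return data_term + penalty


def momentum_step(weights, gradients, velocities,
                  learning_rate, momentum, weight_decay):
    """One on-line gradient step with momentum and weight decay.

    `gradients` holds the derivatives of the data term only; the L2
    penalty adds `weight_decay * w` to the gradient of each weight.
    """
    new_velocities = [
        momentum * v - learning_rate * (g + weight_decay * w)
        for w, g, v in zip(weights, gradients, velocities)
    ]
    new_weights = [w + v for w, v in zip(weights, new_velocities)]
    return new_weights, new_velocities


loss = regularized_mse([1.0, 2.0], [1.0, 4.0], [1.0, -1.0], weight_decay=0.1)
updated, velocity = momentum_step([1.0], [0.5], [0.0], 0.1, 0.9, 0.0)
```

In the on-line version, such a step is applied after every training pattern rather than after a full pass over the training partition.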

Experimental Results
Using the data acquired during the normal functioning of the house, experiments were performed to obtain the best forecasting model for indoor temperature. First, an exhaustive search of model hyper-parameters was done for each covariate combination. Second, different models were trained for different values of past size for indoor temperature (x = d), and a comparison among different covariate combinations and ANN vs. standard statistical methods has been performed. A comparison of a combination of forecasting models has also been performed. In all cases, the future window size Z was set to 12, corresponding to a three-hour forecast.
A grid search exploration was done to set the best hyper-parameters of the system and ANN topology, fixing covariates x ∈ {d, W, H, Q, R} to a past size I(x) = 5 and I(x = h) = 1, and searching combinations of:
• different covariates of the model input;
• different values for ANN hidden layer sizes;
• learning rate, momentum term and weight decay values.
Table 2 shows the best model parameters found by this grid search. For illustrative purposes, Figures 3 and 4 show box-and-whisker plots of the hyper-parameter grid search performed to optimize the ANN model d + h. They show big differences between one- and two-hidden-layer ANNs, two-layered ANNs being more difficult to train for this particular model. The learning rate has a big impact on performance, while momentum and weight decay seem to be less important. This grid search was repeated for all the tested covariate combinations, and the hyper-parameters that optimize MAE were used in the rest of the paper.
Table 2. Training parameters depending on the input covariates combination (η is the learning rate, µ is the momentum term, and ε is the weight decay).
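A grid search of this kind can be sketched with the Python standard library. The candidate values below are hypothetical (the grid actually explored is not listed here), and the evaluation function is a stand-in for training an ANN and measuring its MAE on the validation partition.

```python
import itertools


def grid_search(grid, evaluate):
    """Try every combination in `grid` and keep the lowest validation error.

    `grid` maps a hyper-parameter name to its candidate values, and
    `evaluate` maps one configuration (a dict) to a validation error.
    """
    names = sorted(grid)
    best_config, best_error = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        error = evaluate(config)
        if error < best_error:
            best_config, best_error = config, error
    return best_config, best_error


# Hypothetical candidate values, for illustration only.
example_grid = {
    "hidden_sizes": [(8,), (16,), (8, 8)],
    "learning_rate": [0.001, 0.01, 0.1],
    "momentum": [0.0, 0.01],
    "weight_decay": [0.0, 1e-5],
}


def toy_validation_mae(config):
    """Stand-in for training an ANN and computing its validation MAE."""
    layers = len(config["hidden_sizes"])
    return abs(config["learning_rate"] - 0.01) + 0.001 * layers


best_config, best_error = grid_search(example_grid, toy_validation_mae)
```

The number of configurations grows multiplicatively with each hyper-parameter (36 in this toy grid), which is why the past sizes were fixed during the search.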

Covariate Analysis and Comparison between Different Forecasting Strategies
For each covariate combination, and using the best model parameters obtained previously, several model comparisons were performed. Note that the input past size of covariates is set to I(x ∈ {W, H, Q, R}) = 5 time steps, that is, 60 min, and to I(x = h) = 1. For the forecasted variable x = d, models with sizes I(x = d) ∈ {1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21} were trained.
A comparison between the BEST, COMB-EQ and COMB-EXP approaches was performed and is shown in Table 3. Figure 5 plots the same results for a better confidence interval comparison. Table 4 shows the COMB-EXP weights used in experimentation, obtained following the softmax rule of Equation 4 and using MAE as the loss-function. From all these results, the superiority of ANNs over standard statistical methods is clear, with statistical significance at a confidence greater than 99%. Different covariate combinations for ANN models show that the indoor temperature correlates well with the hour (d+h) and sun irradiance (d+W), and the combination of these two covariates (d+h+W) improves the model in a significant way (99% confidence) compared with input d+W. The addition of more covariates is slightly better in two cases (d+h+W+R and d+h+W+Q), but the differences are not important. With only the hour and sun irradiance, the ANN model has enough information to perform good forecasting. Regarding the combination of models, the COMB-EXP approach tends to obtain better results than COMB-EQ and BEST, but the differences are not important.
A deeper analysis can be done by comparing the SMAPE values for each possible future horizon, as Figure 6 shows. A clear trend exists: error increases with the enlargement of the future horizon. Furthermore, the confidence interval widens as the future horizon grows. In all cases, ANN models outperform statistical methods. For shorter horizons (less than or equal to 90 min), the differences between all ANN models are insignificant. For longer horizons (greater than 90 min), the combination of covariates d+h+W achieves a significant improvement (for a confidence of 99%) compared with the d+W combination. As these results show, the addition of covariates is useful when the future horizon increases, probably because the impact of covariates on indoor temperature becomes stronger over time.
Finally, to compare the generalization abilities of the proposed best models, the error measures for the test partition are shown in Table 5 and Figure 7. All error measures show better performance in the test partition, even though this partition is two weeks ahead of training and contains hotter days than the training and validation partitions. The reason for this better performance might be that the test series has increasing/decreasing temperature cycles that are more similar to the training partition than the cycles in the validation partition. The differences between models are similar, and the most significant combination of covariates is hour and sun irradiance (d+h+W) following the COMB-EXP strategy, achieving a SMAPE ≈ 0.45%, MAE ≈ 0.11, and RMSE ≈ 0.13.
Table 3. Symmetric mean absolute percentage of error (SMAPE), MAE and root mean square error (RMSE) results on the validation partition comparing different models, input features and combination schemes with the 99% confidence interval. BEST refers to the best past size ANN, CEQ refers to COMB-EQ ANNs, and CEXP refers to COMB-EXP ANNs. Boldface numbers are the best results, and the gray marked row is the most significant combination of covariates. ARIMA: auto-regressive integrated moving average models; ARIMAQ: ARIMA with covariate x = h as a quadratic form; ARIMAF: ARIMA with covariate x = h as a factor.
Figure 5. SMAPE error plot with the 99% confidence interval for the models of Table 3.
Figure 7. SMAPE error plot with the 99% confidence interval for the models of Table 5.
In order to perform a better evaluation, the conclusions above are compared with mutual information (MI), shown in Table 6. Probability densities have been estimated with histograms, making the assumption of independence between time points, which is not true for time series [36], but is enough for our contrasting purpose. The behavior of the ANNs is consistent with the MI study.
Sun irradiance (W) covariates show high MI with indoor temperature (d), which is consistent with our results. Humidity (H) and air quality (Q) show higher MI with indoor temperature (d) than sun irradiance does, which seems to contradict our expectations. However, if we compute MI only during the day (removing the night data points), the sun irradiance shows higher MI with indoor temperature than other covariates.
Regarding the hour covariate, it shows lower MI than expected, probably due to the cyclical shape of the hour, which breaks abruptly with the jump between 23 and 0, affecting the computation of histograms.
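The histogram-based MI estimate described above can be sketched as follows. This is a minimal illustration assuming a fixed number of equal-width bins, which is a free choice not specified in the text, and it treats time points as independent samples, as the study does. The function name mutual_information is hypothetical.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram estimate of MI between two series, in bits.

    Time points are treated as independent samples; the number of
    bins is an assumed parameter.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()            # joint probability table
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    nz = pxy > 0                         # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))
```

A day-only estimate, as discussed for sun irradiance, would apply the same function to the subset of samples selected by a boolean daytime mask.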

Conclusions
An overview of the monitoring and sensing system developed for the SMLsystem solar powered house has been presented. This system was employed during the participation at the Solar Decathlon Europe 2012 competition. The research in this paper has been focused on how to predict the indoor temperature of a house, as this is directly related to HVAC system consumption. HVAC systems represent 53.89% of the overall power consumption of the SMLsystem house. Furthermore, a preliminary exploration of the SMLsystem competition data showed that the energy used to maintain temperature was 30%-38.9% of the energy needed to lower it. Therefore, accurate forecasting of indoor temperature could yield energy-efficient control.
An analysis of time series forecasting methods for prediction of indoor temperature has been performed. A multivariate approach was followed, showing encouraging results by using ANN models. Several combinations of covariates, forecasting model combinations, a comparison with standard statistical methods and a study of covariate MI have been performed. Significant improvements were found by combining indoor temperature with the hour categorical variable and sun irradiance, achieving a MAE ≈ 0.11 degrees Celsius (SMAPE ≈ 0.45%). The addition of covariates other than hour and sun irradiance only slightly improves the results. The MI study shows that humidity and air quality share important information with indoor temperature, but the addition of these covariates probably does not contribute information beyond that already provided by hour and sun irradiance. The combination of ANN models following the softmax approach (COMB-EXP) produces consistently better forecasts, but the differences are not important. The data available for this study were restricted to one month and a week from a Southern Europe house. It might be interesting to perform experiments using several months of data in other houses, as weather conditions may vary among seasons and locations.
As future work, different techniques for the combination of forecasting models could be explored. A deeper MI study to better understand the relationship between covariates would also be interesting. The use of second order methods to train the ANN needs to be studied. In this work, for the ANN models, the hour covariate is encoded using 24 neurons; other encoding methods will be studied, for example, using splines, sinusoidal functions or a neuron with values between 0 and 23.
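Three of the hour encodings mentioned above can be contrasted in a short sketch. This is an illustration only: the function names are hypothetical, and the sinusoidal variant assumes a sine and cosine pair over a 24 h period, which is one common choice rather than a design stated in the text.

```python
import numpy as np

def hour_one_hot(h):
    """24-element one-hot vector, one input neuron per hour."""
    v = np.zeros(24)
    v[int(h) % 24] = 1.0
    return v

def hour_sin_cos(h):
    """Sine and cosine pair (assumed form): hours 23 and 0 stay adjacent."""
    angle = 2.0 * np.pi * (h % 24) / 24.0
    return np.array([np.sin(angle), np.cos(angle)])

def hour_scalar(h):
    """Single input with values between 0 and 23."""
    return float(h % 24)
```

The scalar encoding places hours 23 and 0 at opposite ends of its range, whereas the sine and cosine pair keeps them as close as any other two consecutive hours.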
Following these results, we intend to design a predictive control based on the forecasts produced by ANNs, such as the one presented here for indoor temperature, and to extrapolate this methodology to other energy subsystems that can be found in a home.