Design of Ensemble Forecasting Models for Home Energy Management Systems

: The increasing levels of energy consumption worldwide is raising issues with respect to surpassing supply limits, causing severe effects on the environment, and the exhaustion of energy resources. Buildings are one of the most relevant sectors in terms of energy consumption; as such, efﬁcient Home or Building Management Systems are an important topic of research. This study discusses the use of ensemble techniques in order to improve the performance of artiﬁcial neural networks models used for energy forecasting in residential houses. The case study is a residential house, located in Portugal, that is equipped with PV generation and battery storage and controlled by a Home Energy Management System (HEMS). It has been shown that the ensemble forecasting results are superior to single selected models, which were already excellent. A simple procedure was proposed for selecting the models to be used in the ensemble, together with a heuristic to determine the number of models.


Introduction
The increasing levels of energy consumption worldwide is raising issues with respect to surpassing supply limits, the severe effects on the environment (Earth-wide temperature increase, depletion of ozone layer, climate problems, and others), and the exhaustion of energy resources [1].
Academia and industry have spent the last decades proposing and discussing new sustainable energy systems and methods for integrating fluctuating renewable energy sources/storage into the electric power grid.
The impact of new approaches to design, develop, and manage energy systems while maintaining sustainability throughout their operation lifetime has been a significant challenge. Keywords such as "smart grids" and "nearly zero energy buildings" rose and are usually employed within the boundaries of the subsectors of energy systems, although they should be analyzed in the context of the overall energy system [2]. An example of this is the concept of intelligent energy systems, generally referred as "Smart Energy Systems" (see [2] for a comprehensive discussion on the synergies between different energy systems having sustainability as its focus).
The full potential of control management through the use of computational resources in energy systems has yet to be reached. It is still a topic that must be the focus of research. Individual advances on sub-energy systems are clearly essential, but the integration of all the sub-energy systems in a global approach is also required. A study of the influence of one sub-system operation on another one may be found in [2]. energy demand enable a deeper understanding of consumption patterns for building owners and decision-making entities [16] motivated by social and scientific sustainable development and equitable energy use.
It is crucial to consider the connection of distributed renewable energy, storage, and energy management [17] in buildings. Several challenges are associated with the intermittence of renewable energy and its storage in a single building or small community scenario. A key issue is the need to accurately develop an energy management system (EMS) that will balance electricity supply and demand, aiming at the minimization of associated costs, the attainment of a positive impact on the grid [17], and contributing to the decarbonisation of energy related to the building sector [16]. As part of the global approach to achieve sustainability in energy systems, increasing sustainability from a demand-side point of view has vast economic implications [16], and it requires the evaluation of energy efficiency from diverse design options and operational planning strategies [1,16,18]. These approaches require models for demand simulation and forecasting that are simultaneously accurate and computationally fast [18]. Forecasting electricity demand of a building is an integral component of smart grids with respect to improving EMS efficiency concerning sustainability goals and cost reduction [1,17]. The accurate forecasting of electricity demand, meaningful for demand response, is explored in [16], where the importance of the availability of high-resolution smart metering infrastructure is also discussed.
Indeed, the larger availability of data at a single building level, considering that an increasing number of buildings possess smart meters nowadays, magnifies the potential of EMS to cause a positive impact at a big picture level with respect to the energy sector.
For the use of the acquired data in the forecasting models, the system must be continuously monitored and managed with respect to the energy time series and the factors that have more impact on the building's energy performance (the exogenous variables of the time series). The challenges of load forecasting are significantly based on the intrinsic nonlinearity, volatility, and stochastic nature of the real-time load profile and load profile dependence of the occupancy pattern, especially in the residential case [16,19]. In recent years, the acceleration of Big Data platforms has increased the use of machine learning methods for delivering more accurate and fast electricity demand forecasts for residential EMS [17,19,20]. Machine learning (ML) facilitates an adequate mimicry of Building Performance Simulation (BPS) algorithms based on engineering methods, while being considerably faster in generating results than compared to BPS [18,20]. Further details on ML methods will be provided in Section 2.

Objectives, Contributions, and Work Organization
In this context, the objective of this work is to explore the use of ensemble techniques and to improve the forecasting performance of artificial neural networks models originally designed by multi-objective genetic algorithms. These models will be used for energy forecasting in residential houses, with PV generation and battery storage, controlled by a Home Energy Management System (HEMS). The case study is a residential house located in Portugal.
The main contributions of this work are listed below: (i) A detailed review of ML techniques in energy forecasting in buildings and HEMS; (ii) A simple scheme to design ensemble models for forecasting the energy produced and consumed in residences with PV generation and battery storage. Notice that as PV generation forecasting also implies the forecast of solar irradiance and atmospheric temperature, four different forecasting models are needed for a HEMS.
This study is organised into five main sections. Section 1 describes the context of the work, objectives, and contributions. Section 2 presents the literature review, covering the most used ML algorithms for energy systems and focusing on the topics of energy forecasting in buildings and forecasting applications in Home Energy Management Systems. Section 3 presents the design methodology used in this study, and Section 4 describes the case study employed. Section 5 discusses the results achieved with a single model and an ensemble of models, comparing also the results obtained with other techniques. Section 6 concludes the paper and points out future research directions.

Literature Review
The literature review section is segmented into three subsections. The first aims to deliver an overview of the most used ML-based prediction methods for energy systems. The second aims to present a review of the related publications on the topic of energy forecasting in the building sector. The third will discuss applications of these prediction methods on Home Energy Management Systems (HEMS). The Clarivate Analytics Web-of-Science was the source of information used to develop the literature review.

Machine Learning (ML)-Based Prediction Methods for Energy Systems
The prediction of energy is essential due to many factors, as highlighted in the introduction of the scope of this study. As in the case of ML methods, it is crucial to have access to extensive energy data in a time series context. The analysis of these time-series data has the potential to assess it in a meaningful statistical manner while predicting future values by using previous ones. Theoretical explanations on time-series analysis are extensively reported in the literature, such as, for instance, in [21][22][23][24]. A time series can be described as an ordered sequence of values sampled at equal time intervals, and its analysis is composed of two parts. First, the structure and underlying patterns of the observed data should be obtained. Second, in order to support future predictions, a model should be fitted to the sampled data.
This study does not aim to elaborate on the fundamentals of the techniques since they are widely reported in the literature, as in [25][26][27]. ML approaches may be employed in a supervised and unsupervised context. Four main steps should be used in a ML-based approach: data collection, data pre-processing, model training, and model testing. MLbased prediction may use single, ensemble, and hybrid models, and they are extensively described in [1]. The first employs one learning algorithm, the second comprises multiple prediction models, and the third combines two or more ML techniques. Figure 1 presents a general structure of single, ensemble, and hybrid models for forecasting time series. This work focuses on ensemble models. Section 3 presents the design methodology used in this study, and Section 4 describes the case study employed. Section 5 discusses the results achieved with a single model and an ensemble of models, comparing also the results obtained with other techniques. Section 6 concludes the paper and points out future research directions.

Literature Review
The literature review section is segmented into three subsections. The first aims to deliver an overview of the most used ML-based prediction methods for energy systems. The second aims to present a review of the related publications on the topic of energy forecasting in the building sector. The third will discuss applications of these prediction methods on Home Energy Management Systems (HEMS). The Clarivate Analytics Webof-Science was the source of information used to develop the literature review.

Machine Learning (ML)-Based Prediction Methods for Energy Systems
The prediction of energy is essential due to many factors, as highlighted in the introduction of the scope of this study. As in the case of ML methods, it is crucial to have access to extensive energy data in a time series context. The analysis of these time-series data has the potential to assess it in a meaningful statistical manner while predicting future values by using previous ones. Theoretical explanations on time-series analysis are extensively reported in the literature, such as, for instance, in [21][22][23][24]. A time series can be described as an ordered sequence of values sampled at equal time intervals, and its analysis is composed of two parts. First, the structure and underlying patterns of the observed data should be obtained. Second, in order to support future predictions, a model should be fitted to the sampled data.
This study does not aim to elaborate on the fundamentals of the techniques since they are widely reported in the literature, as in [25][26][27]. ML approaches may be employed in a supervised and unsupervised context. Four main steps should be used in a ML-based approach: data collection, data pre-processing, model training, and model testing. ML-based prediction may use single, ensemble, and hybrid models, and they are extensively described in [1]. The first employs one learning algorithm, the second comprises multiple prediction models, and the third combines two or more ML techniques. Figure 1 presents a general structure of single, ensemble, and hybrid models for forecasting time series. This work focuses on ensemble models. Examples of single prediction model techniques include Artificial Neural Networks (ANN), Support Vector Regression (SVR), Linear Regression (LR), and autoregressive integrated moving average (ARIMA). These methods are extensively described in [1]. Studies that used ANN as an energy prediction method for buildings energy systems may be found in [28][29][30][31][32][33][34]. Applications employing SVR as an energy prediction method for buildings may be found in [35][36][37]. Nowadays, linear regression is often used as a comparative  Examples of single prediction model techniques include Artificial Neural Networks (ANN), Support Vector Regression (SVR), Linear Regression (LR), and autoregressive integrated moving average (ARIMA). These methods are extensively described in [1]. Studies that used ANN as an energy prediction method for buildings energy systems may be found in [28][29][30][31][32][33][34]. Applications employing SVR as an energy prediction method for buildings may be found in [35][36][37]. Nowadays, linear regression is often used as a comparative method to evaluate the performance of more elaborated machine learning methods. Studies that used ARIMA as an energy prediction method for buildings may be found in [38][39][40]. Ensemble methods have gained substantial attention in recent years and are extensively used nowadays because of their favourable forecasting predictive performance, and the combination of models may contribute in avoiding overfitting, which can occur by the selection of the best model in a single model scenario [1]. Studies that used ensemble methods as energy demand prediction methods for buildings may be found in [41,42].
Hybrid models combine ML techniques between themselves or are associated with optimisation algorithms. They can be created with one or more phases, corresponding to different problem-solving goals in order to overcome individual weaknesses, and can deal with complex components. A hybrid approach for forecasting energy consumption is proposed in [1].
As noted by different authors, ML models are very well-suited energy systems for forecasting, and they are described in early statistics literature [43,44]. In [45], the authors provide a substantial review on the four main ML approaches identified: ANN, Support Vector Machines (SVM), Gaussian-based regressions, and clustering, which have commonly been applied to improve building energy forecasting performance. The authors in [46] reviewed state-of-the art ML models used in the general application of energy consumption-the most relevant literature (to the date of their article) published in the field is classified according to ML modelling technique, energy type, perdition type, and application area. Another comprehensive review may be found in [47].

Forecasting of Energy Consumption in Buildings
This subsection aims to present a sample of related studies to the topic approached by this article. There are some interesting reviews in this topic, such as [48].
In [20], the study assesses entire building designs and design components based on a ML component-based approach. Test cases show that high prediction quality may be achieved, resulting in errors of 3.7% for cooling and 3.9% for heating.
In [19], the authors propose an innovative deep neural network-based energy prediction algorithm for forecasting the day-ahead hourly energy consumption profile by considering occupancy rate. In [16], another deep neural network model is designed by optimising the hyperparameters in order to enhance neural networks' performances in a residential building based on the occupancy rate. Among the comparisons of different methods, the authors highlighted that some can take hours or days to process the data and to create a prediction model, which is an inconvenience even when it reaches very good performance. In [18], ML architectures are presented, and their suitability for space exploration in building design is evaluated. Compared to traditional ANN, deep learning has the potential to increase the performance of the forecasting models; an example of this is Multi-Task Learning, which can achieve more efficiency in the component development process.
Manuscript [22] performs feature selection for different energy systems in a residential building context, comparing it by using more than five different methods. The findings of this paper help select proper models, sensors, and inputs for model-predictive control systems during the heating and cooling seasons. In [49], the Gaussian Kernel regression model with random feature expansion and non-parametric based k-NN models were assessed against many different criteria based on feature significance for scenarios that vary the time intervals between samples.
In [50], the authors developed a forecasting system that optimises linear time series (using linear time-series model, a Seasonal Auto-Regressive Integrated Moving Average) with non-linear ML models (least squares support vector regression model) in order to identify the historical pattern of energy consumption and to predict multi-step ahead energy consumption. Optimisation algorithms were investigated using high-dimension mathematical benchmark functions, and computational time and input needs were assessed.
In [14], the work focuses on predicting the energy consumption of low energy buildings in two different scenarios (employing the entire data or only relevant data). As expected, the results showed that the relevant data modelling approach that relies on small representative data selection has higher accuracy (R 2 = 0.98; RMSE = 3.4) than all data modelling approaches (R 2 = 0.93; RMSE = 7.1).
In [51], the Holt-Winters (HW) method and Extreme Learning Machine (ELM) network were used in a hybrid model fashion for ultra-short-term predictions in a residential context, with a time scale of 15 min. The proposed model commonly demonstrated lower error compared to HW, ELM, and long-short-term memory networks when predicting residential electricity consumption. Substantial reductions on the RMSE were obtained (87.98%, 64.89%, and 53.39%, respectively).
In [52], by combining physical and data-driven approaches, a hybrid approach is applicable for modelling the building stock's heating and cooling energy consumption, including residential and non-residential buildings. In this study, several models based on machine learning were assessed. Among them, the polynomial kernel support vector regression showed the best accuracy at the level of a single building, and the Gaussian radial basis function kernel support vector regression performed the best at the stock level. Another study that compares many machine learning-based models may be found in [53], where these models were validated against energy certifications (within the German regulation) for residential buildings; this data-driven approach is more accurate by almost 50% in comparison to the first approach.
In [54], the authors present a hybrid technique (Convolutional Neural Network and a Multi-layer Bi-directional Long-short-term memory model) and tested it in different scenarios. They showed that better results may be obtained using a 10-fold cross validation and a hold-out method.
In [15], 380 buildings from the end of the last century were employed for a comparative analysis of the predictive modelling of heating energy consumption. The authors selected different groups of variables and assessed the methods for obtaining data against the quality of the forecasting results. Six ML methods were used.
In [55], the authors performed an evaluation of three learning algorithms in an ensemble fashion by considering their performance. The algorithms are extremely randomised trees (extra-trees), random forests, and gradient boosted regression trees. Among them, gradient boosting improved prediction accuracy by an average of 14% and 65% for heating and cooling loads, in comparison with other literature proposed algorithm.
In [56], the goal was to evaluate the energy demand rate for building heating, and the authors combined the BORUTA feature selection algorithm and the rough set theory models, which proved result in good prediction quality while limiting the number of input variables. In [57], the paper presents a recurrent ANN for medium-to-long term predictions of electricity consumption profiles in buildings (one-hour resolution). The proposed method achieves lower relative errors compared to the conventional multi-layered perceptron neural network but presents differences when comparing residential and commercial contexts. In [58], the study's objective was to obtain an accurate prediction of heat demand by using hybrid models, looking one hour ahead. The optimization of the feature set was obtained by using Pearson and least absolute shrinkage and selection operator methods, and the final results were compared with ANN and SVR traditional models.
In [47], six decomposition-based evolutionary ANNs for city and building scale energy forecasting were examined. Several measures were used to improve performance, and the results show that they can obtain high fitting accuracy and low error rates for different prediction horizons of forecasting/planning tasks. In [59], the authors designed a probabilistic data-driven predictive model for predicting electricity demand. The model is based on the Bayesian network framework. Scenarios considering different temporal granularities and spatial resolutions were assessed. They concluded that the Bayesian network framework is efficient for highlighting the dependencies between variables in the considered scenarios.
In [60], ML methods were used to derive data-driven appliance models and usage patterns to predict energy demand, aiming for increased accuracy of predictions of comfort needs, energy costs, environmental impacts, and grid service availability. Seven point six Energies 2021, 14, 7664 7 of 37 percent of energy savings were achieved without requiring substantial behavioural changes. The algorithms responded with 10% or lower errors when in a demand response event.
In [61], a method of coupling simulation with ML to predict indoor conditions and electricity demand in response to schedules and other factors was assessed. Potential spikes were identified based on predicted values. Coupling simulation techniques with ML reduced the requirement for costly and intrusive data collection methods. In [62], the authors applied extreme gradient boosting to predict and analyse electricity, gas, and water consumption and used SHapley Additive exPlanation to interpret the results. The methods were used in three different models: electricity, gas, and water consumption, in which a non-linear relationship was found between gas consumption and building intensity-due to an apparent relationship to the technology itself. Building type also significantly impacted interrelationships, especially between electricity and water.
Principal component analysis was performed for dimensionality reduction and for finding hidden patterns to provide data in clusters in [63]. The clusters were associated with climatic variables to forecast power consumption using regression-based ML models. In [64], the study aimed to develop an improved SVM (which applies Gaussian radial basis function optimised by a genetic algorithm as the kernel function) model to predict electricity demand under multiple scenario' strategies. An average reduction of 12.1% in monthly electricity demand was achieved, compared with conventional behavioural intervention.
In [65], the study aimed to identify the best data-driven method for quantifying the impacts of climatic and socioeconomic changes on electricity consumption in buildings. A timeframe of four decades of data was used to train and validate the models. Monthly electricity consumption is predicted to decrease by 89.40% in the residential and commercial sectors, respectively, compared with 2018 levels.
The authors proposed in [33,34] the use of Radial-Basis Function (RBF) networks, designed by a Multi-Objective Genetic Algorithm (MOGA) framework, for multi-step forecasting of residential load demand. They used three years of data collected in Honda Smart Home US [66]. A single chosen model was compared with an ensemble of models, the latter obtaining the best results.

Applications of ML-Based Energy Systems Forecasting in HEMS
Many studies are available where the applications of ML-based energy systems forecasting in the HEMS context are assessed. Contribution [67] addresses the problem of residential load scheduling by using optimisation techniques in a receding horizon approach in a seven days-ahead prediction horizon. The proposed approach was compared with receding horizon and day-ahead scheduling techniques, and the obtained results were considered valid compared to the existing state-of-the-art approach. In [68], the study proposed a ML platform on a smart-gateway-based smart-grid in residential buildings, analysing occupant behaviours on a short-term load forecasting scheme. Based on the occupant behaviour profile and energy demand prediction, the proposed EMS can achieve up to 19.66% more peak load reduction and 26.41% more cost savings than compared to the SVM approach. In [31] the current authors developed short-term multi-step PV power forecasts to be used in model-based predictive control for HEMS. MOGA-designed RBFs were employed. In [69], also for short-term prediction, the authors looked at load power forecasting for HEMS by considering a smart community. Solar power systems are also the object of study in [70], where the forecasting is developed by using a long-short-term memory (LSTM) model, using different time scales (e.g., 15 min, 30 min ahead, and one day ahead). In [71], the authors addressed forecasting and HEMS optimization from the microgrid perspective, presenting a fully developed and implemented control scheme.
In [72], machine learning methods were used to predict the flexibility of a HEMS. User-behavior prediction and its impact on HEMS are assessed in [73]. In [74], the authors approach forecasting techniques in HEMS from a prosumer perspective, indicating that different renewables' availability highly influences optimal demand allocation, renewablesbased energy allocation, and the charging-discharging cycle of energy storage and electric vehicles. In contribution [75], machine learning methods are used to improve load prediction in a HEMS context based on human behavior patterns recognition. In [76], the authors implemented a self-learning HEMS based on Internet of Things, focusing on price forecasting, price clustering, and power alert system in order to enhance its functions. In [77], forecasting using deep learning is developed, aiming at improvement of automation efficiency in HEMS. In [78], the effect of electric vehicle movement schedule in a system composed of a photovoltaic generator, home energy storage, and HEMS control was analyzed.
Notice also that the use of popular open-source solutions such as Prophet [79] or AtsPy [80] can be used to obtain time-series forecasts.

Future Applications for Schedulable and Non-Schedulable Appliance Consumption Forecasting Using NILM
Due to the limitations on the practical implementation of in-depth and expensive monitoring systems, non-intrusive load monitoring (NILM) is becoming a hot topic [81]. One of the most important key points of a home energy management system (HEMS) is monitoring specific appliances. It aims to provide detailed information about the operating states and power consumption of specific devices in the house. Furthermore, it will allow HEMS systems to schedule energy-consuming appliances in order to establish energysaving strategies, such as reprogramming high-power appliances to operate during off-peak hours [82].
In fact, electric appliances can be classified as schedulable (deferrable) and nonschedulable (non-deferrable) [83]. Devices such as washing machines, dryers, and water pumps can have their operations deferred and can be inoperative during peak energy demand hours. Moreover, this class of appliances includes also thermostatic devices such as heating/cooling systems and electric water heaters, representing a significant fraction of overall household electricity usage [84]. The non-schedulable devices, on the other hand, are made up of devices such as lighting, refrigerators, or cooking, where their electrical energy needs cannot be postponed.
The purpose of appliance monitoring is to identify the operating states and electrical consumption of schedulable and non-schedulable devices in real time. This can be performed by installing one or more sensors in each load of interest. This process is known as intrusive load monitoring. Due to its intrusive nature, which involves specific privacy concerns, the difficulty involved in installing and configuring several sensors, and its expensive cost, non-intrusive alternatives are preferred [83].
Non-intrusive load monitoring (NILM) aims to detect individual device usage from the aggregate total consumption collected by a smart meter at the building's entrance. In general, the NILM process includes data collection, feature extraction, event detection, and load identification [83,85]. In [81], the authors present an overview of the state-of-the-art residential electrical demand monitoring. Unlike previous reviews, the applications of load monitoring are addressed based on technical challenges faced by the residential systems available. Contribution [86] proposed non-intrusive load monitoring based on the time window for HEMS application. The authors examined three machine learning algorithms (Decision Tree, k-Nearest Neighbor (kNN), and Random Forest). They obtained good performance by using a low-frequency public dataset.
An enhanced HEMS in residential power scheduling using a non-intrusive load monitoring approach and an automated nondominated sorting genetic algorithm-II (NSGA-II) was presented in [87]. The authors showed that the proposed advanced HEMS with the NILM approach is practicable and feasible in a real-world context. An analysis of device flexibility in the context of user behavior in a home energy management system using smart plugs was presented in [88]. They demonstrated that by including consumer behavior-related features, the suggested approach obtained outstanding performance. The identification of the operation, as well as energy consumption, of the set of non-schedulable appliances and for each schedulable device is not the end of the story. For an efficient HEMS, the consumption of each group of appliances should also be forecasted by using Energies 2021, 14, 7664 9 of 37 algorithms described before. An example of the forecasting of these two groups of devices can be observed in [34].

Design Methodology
In this Section, the models and the methods used for predictive model design are briefly introduced. The reader is encouraged to inspect the referenced papers for a deeper description.

The Models
The models used are RBF-ANN models. Typically, a Gaussian radial type of function is employed by hidden neurons, and their outputs are linearly combined afterwards. The model output is given by (1): is the j th input at k, w is the vector of linear weights, C(j) is the vector (extracted from a C matrix) of the centres associated with hidden neuron j, σ j represents its spread, and 2 the Euclidean distance.
As the use of the model here is for prediction, a dynamic model is needed. This is achieved by employing external feedback, assuming that (1) can be observed as follows.
The use of delayed versions of the measured output in i[k] of (1) allows the interpretation of RBF as a Nonlinear Auto Regressive (NAR) model.
Employing delayed versions of external (eXogeneous) inputs, (3) is changed to a NARX model:ŷ where, for the sake of simplicity, only one external input, v, was used. As the evolution of the forecasts over a prediction horizon (PH) is the objective, (4) is iterated over that horizon. For k + 1, we have the following.
Measured values for one or more terms in the argument of (5) may not be available depending on the indexes selected for the delays. Thus, these values must be obtained by using previous predictions. In this manner, the computation of the predictions over a prediction horizon PH may require PH executions of the model (5), representing a multi-step predictive model.

Model Design
A data-based model design is usually achieved by using the following three steps: (i) Using the available data, training, generalization or testing, and validation sets should be constructed. This phase is known as data selection. (ii) Once datasets have been built, the structure of the models, as well as their inputs, should be determined. This phase is known as structure selection. (iii) For each model determined in the previous step, its parameters should be estimated. This is the estimation step.
The training set is used to estimate the parameters of each model designed; the testing set is used to compare models during the model design phase or to terminate parameter estimation; both sets are employed in the two last phases. The validation set, which is not used in the model design cycle, is employed to compare the performance of different designed models. In this application, the design data consists of samples, each one using the current value of the modelled data as a target, and delayed values of the modelled variable, as well as delayed values of every exogenous variable (if existent), as inputs.

Data Selection
We employ an approximate, stochastic convex-hull algorithm for data selection. This algorithm, denoted as ApproxHull, was proposed in [89]. It determines the convex hull (CH) of the data, treating memory and time complexity in an efficient manner. These CH vertices are compulsorily introduced in the training set, allowing the model to be designed with elements covering the entire operational range. The remaining samples included in the training set, testing set, and validation set are randomly extracted from the available data, without considering CH samples. For further details on the ApproxHull incremental algorithm, please consult [89]. Approxhull has been applied to different design problems [60,90,91] and also for online model adaptation [92].

Structure Selection
For feature and topology selection, this study uses a Multi-Objective Genetic Algorithm (MOGA). Within scope of this study, the model is considered as an multi-objective optimization, and restrictions and priorities are possibly assigned to each defined objective. The evolutionary algorithm searches the admissible space of the number of neurons and the number of inputs (lags for the modelled and exogenous variables) for the RBF models. For a detailed explanation of MOGA operation, please consult [93].
In this case, different objectives must be defined prior to MOGA execution. The minimization objectives used in this work are the RMSEs of the training set (ε T r ) and of the testing set (ε T e ), the model complexity (O M ), and the forecasting performance ε PH . This last criterion is obtained by the sum of the RMSEs along with PH (6), where D is a time series, with p data points, and E is an error matrix (7).
In this manner, we compute the RMSE for each step-ahead prediction and sum up those values. In order to compute this criterion, a forecasting period must be used. To differentiate between the forecasting period used in MOGA design and the forecasting period employed in model validation, two notations will be introduced: ε PH MOGA and ε PH VAL . Notice that the latter is computed in a time series that is not employed for model design.

Parameter Estimation
Each model in the current population is a specific RBF, for which its parameters must be estimated. A modified version of the Levenberg-Marquardt [94,95] algorithm, which exploits linear-nonlinear parameter separation, is employed. Briefly, (1) can be expressed as follows: where X represents the input matrix, v the nonlinear parameters (the centres C and the spreads σ), and u represents the linear parameters (linear weights w). By using this parameter decomposition, the optimal value of the linear parameters can be obtained as follows (9):û where the symbol + denotes a pseudo-inverse operation. By incorporating this value in the usual training criterion (sum of the square of the errors), a new criterion is obtained: which is independent of the value of the linear parameters. In order to minimize (10) by using the LM algorithm, we need the Jacobian of the model, which can be obtained [96] as follows.
In other words, this is the Jacobian matrix computed in with the usually method of using linear parameters as the optimal ones. Finally, the LM update, s[k], is computed as the solution of the following equation: where e is the error vector, and λ is the regularization parameter.
As the model is nonlinear, different initial values for the nonlinear parameters can result in different final results. For this reason, good initial values are important. The MOGA framework allows employing random values or a clustering method and the Optimal Adaptative K-Means (OAKM) algorithm [97]. A user-specified number of trials can be specified for each model in the population, each one with its initial values. As a multi-objective formulation is used, the best model can be determined in several ways, which are also user-specified. Finally, each parameter estimation procedure stops whether a user-specified number of iterations is reached or an early stopping technique is employed with the use of the testing set.

Model Ensemble
As MOGA uses a multi-objective formulation, its result is not a single solution but a set of non-dominated or Pareto solutions. Therefore, the user must select, among these nondominated models, one that presents a good trade-off performance among the different objectives, ε T r , ε T e , O M , and ε PH MOGA , as well as in the validation data, which is not used for model design. Typically, ε V and ε PH VAL are employed.
Since the number of non-dominated or preferable (that meet user-specified goals) solutions is typically large, the choice of this "best compromise" model is not a trivial process. On the other hand, non-dominated or preferable solutions are typically very good models, obtained with a huge computational effort in a computer cluster, and it seems to be a waste when only using one of them.
For this reason, the use of some of these models for ensemble averaging was proposed in [34]. The idea is to use, as output, the median value of the outputs of the models in the ensemble. The median value is preferable to the arithmetic mean because, as the forecasting performance is typically used as a minimizing criterion and not as a restriction, even preferable models can sometimes obtain large forecasting values.
The selection of the models to be used in the ensemble was not addressed in [34]. It will be discussed in this paper together with a deeper analysis of the ensemble results.

Case Study Description
The residence employed in this study is located in Montenegro, Faro, Algarve, Portugal (37 • 0 55 N, 7 • 56 6 W). The residence employed has two floors with 20 different spaces. A detailed description of the case study may be found in [32].
A Schneider panel consisting of 16 monophasic circuit breakers, plus a triphasic one, is used as the electric panel. It has a solar photovoltaic system composed of 20 Sharp NU-AK panels [99], each with a maximum power of 300 W. The inverter is a Kostal Plenticore Plus converter [100] that also controls a BYD Battery Box HV H11.5 (capacity of 11.5 kWh) [101]. An intelligent weather station is used to acquire the relevant weather data, as well as to compute their evolution within a user-specified PH [102]. Wi-Fi power plugs [103] are used to monitor and control specific devices. The house also has a few Self-Powered Wireless Sensors (SPWS) available for measuring room climate variables and occupant activity [104]. A data acquisition system is implemented in order to monitor the variables related to electricity consumption. The data that will be used for load demand are supplied by a Carlo Gavazzi (EM340) 3 phase energy meter [105]. The devices making measurements for additional electricity-related variables used for NILM purposes include Circutor Wibeees (WBs) [106], which are plug-and-play wireless devices for acquiring electric consumption values. One hundred and ninety-eight variables are sampled by the WBs every second.
Many gateways and a technical wireless network are responsible for data transmission from/to the devices that are the objects of measurements. A diagram of the data acquisition system is shown in Figure 2. For a description of the system implemented, as mentioned, the reader is invited to consult [32].

Results
Six variables are used for the current study: total electric power demand (PD), PV DC power generated (PG), atmospheric air temperature (T), global solar irradiance (R), occupation (Occ), and day encoding (DE). The first four variables are measured by the data acquisition system. Occ represents the number of occupants present in the house each day. Day encoding, presented in Table 1, characterises each day of the week and the occurrence and severity of holidays based on the day they occur, as may be consulted in [107,108]. The regular day column shows the coding for the days of the week when these are not a holiday. The following column presents the values encoded when there is a holiday, and finally, the special column shows the values that substitute the regular day value in two special cases: for Mondays when Tuesday is a holiday and for Fridays when Thursday is a holiday.

Results
Six variables are used for the current study: total electric power demand (P D ), PV DC power generated (P G ), atmospheric air temperature (T), global solar irradiance (R), occupation (O cc ), and day encoding (D E ). The first four variables are measured by the data acquisition system. O cc represents the number of occupants present in the house each day. Day encoding, presented in Table 1, characterises each day of the week and the occurrence and severity of holidays based on the day they occur, as may be consulted in [107,108]. The regular day column shows the coding for the days of the week when these are not a holiday. The following column presents the values encoded when there is a holiday, and finally, the special column shows the values that substitute the regular day value in two special cases: for Mondays when Tuesday is a holiday and for Fridays when Thursday is a holiday. Sixteen months of data were used, ranging from 1 May 2020 00:07:30 to 31 August 2021 23:52:30. The first four variables were averaged in 15 min steps, while a constant value was used for all samples within the same day for O cc and D E . The use of a 15 min time step is due to a previous study [109], where it was proved that this time step, together with a prediction horizon of 28 steps, enabled excellent HEMS performances.
Regarding P D , its maximum, mean, and minimum values are 7.0, 1.1, and 0.0 kW, respectively. Figure 3 shows the daily energy consumption in kWh. We can note that the maximum consumption occurs in winter. Figure 4 illustrates the evolution of P G , for some winter days. The maximum value of P G is 6.2 kW, occurring on 28 April 2021 12:22:30.   Regarding R, its maximum value is 1.18 kW/m 2 . The peak sunshine hours (or daily sunshine insolation) obtained for this location is represented in Figure 5. Its maximum value is 8.5 h, occurring on 20 June 2020. The maximum, mean, and minimum values of T are, respectively, 40.3, 17.1, and −0.6. Figure 6 shows the average daily temperature.   Regarding R, its maximum value is 1.18 kW/m 2 . The peak sunshine hours (or daily sunshine insolation) obtained for this location is represented in Figure 5. Its maximum value is 8.5 h, occurring on 20 June 2020. The maximum, mean, and minimum values of T are, respectively, 40.3, 17.1, and −0.6. Figure 6 shows the average daily temperature. Regarding R, its maximum value is 1.18 kW/m 2 . The peak sunshine hours (or daily sunshine insolation) obtained for this location is represented in Figure 5. Its maximum value is 8.5 h, occurring on 20 June 2020. The maximum, mean, and minimum values of T are, respectively, 40.3, 17.1, and −0.6. Figure 6 shows the average daily temperature. Regarding R, its maximum value is 1.18 kW/m 2 . The peak sunshine hours (or daily sunshine insolation) obtained for this location is represented in Figure 5. Its maximum value is 8.5 h, occurring on 20 June 2020. The maximum, mean, and minimum values of T are, respectively, 40.3, 17.1, and −0.6. Figure 6 shows the average daily temperature.    The next figure (Figure 7) shows the values of O cc throughout the entire period, while Figure 8 illustrates a snapshot of D E for the first two months.
As observed in Figures 1-4, there are some gaps within the acquired data. This does not constitute a major problem when a static model is being designed, but, for a dynamic model, the entire range of lags (d max ) must be be presented at every sample of data considered for the design. Moreover, if the goal is to design a predictive model to forecast the evolution of a certain variable over a Prediction Horizon (PH), for each instant of time you need, in addition to d max past values, PH posterior values. As one should assess the prediction performance over a time series with, say, n pred values, d max + n pred + PH consecutive values must be available for the modelled variable, as well as for every exogenous model input.     As observed in Figures 1-4, there are some gaps within the acquired data. This does not constitute a major problem when a static model is being designed, but, for a dynamic model, the entire range of lags (dmax) must be be presented at every sample of data considered for the design. Moreover, if the goal is to design a predictive model to forecast the evolution of a certain variable over a Prediction Horizon (PH), for each instant of time you need, in addition to dmax past values, PH posterior values. As one should assess the prediction performance over a time series with, say, npred values, dmax + npred + PH consecutive

Data Sets Description
As four different forecasting models are needed in a HEMS, a decision was made with respect to using the same forecasting time series for the design and validation of all models. The period that will be used for forecasting during MOGA design will be from 7 July 2020 13:07:30 to 27 July 2020 03:07:30 (1881 samples); for validation the period between 04 July 2021 05:37:30 and 24 July 2021 03:22:30, 1962 samples will be employed.
Four models will be designed in this study, and they are designated by M 1 to M 4 . The first model will forecast power demand (P D ). It is a NARX model for which its exogeneous variables are T, O cc , and D E .
The use of these exogeneous variables was discussed and justified in previous publications of the authors (please see [32,33,110]. In (13), the dash superscript denotes a set of delayed values of the corresponding variable. Typically, the power demand at any instant within a day is correlated to corresponding values one day before and, to a lesser extent, values observed one week ago. For this reason, lags of the modelled and the exogenous variables will be collected from three periods: immediately before the sample, centred at the corresponding instant one day ago, and centred at the corresponding instant one week ago. For this particular model, we shall use [20,9,9] for P D , [20 9 0] for AT, [1 0 0] for O cc , and [1 0 0] for D E . This means that for P D we shall consider the first 20 lags before the current samples: nine centered 24 h ago and nine centered one week ago. For AT, the same number of lags before the current sample and centered one day ago will be used (but not from the third period), and for the other two variables only the first lag will be allowed. This means that the total number of lags (d max ) that MOGA will consider is (20 + 9 + 9) + (20 + 9) + (1) + (1), i.e., 69 lags.
As data averaged in 15 min intervals are used, one week of data consists of 4 × 24 × 7 = 672 samples. With the additional four lags before one week of data, we have the largest delay index, l ind = 676 samples. As a PH of 28 steps ahead is used, for this model, we obtain the following: n pred_MOGA = 1881 − 676 − 28 = 1161 samples (around 12 days of data in July 2020) and n pred_VAL = 1962 − 676 − 28 = 1242 samples (around 13 days of data in July 2021).
The second model, M 2 , will forecast solar irradiance, R. It is a NAR model (with no exogenous inputs).
This model uses [20 9 9] lags, which means that d max = 38, and l ind , n pred_MOGA and n pred_VAL are the same with M 1 .
Finally, the fourth model is used to predict the electric power generated by the photovoltaic systems. It is a NARX model, for which its exogeneous inputs are R and T (for the choice of the exogeneous variables, please see [31]).
This model uses [20 9 9] lags for P PV , [20 9 9] lags for R, and [20 9 0] lags for T, which means that d max = 105, and l ind , n pred_MOGA , and n pred_VAL are the same as M 1 and M 2 .

Approxhull Results
The total number of samples supplied to Approxhull is 17,098. They will be divided into Training (T r ), Testing (T e ), and Validation (V) sets. Their dimensions are 10,258, 3419, and 3421 samples, respectively. Notice, however, that the sample indexes for these sets are not identical in the four models, as the number and indexes of Convex Hull points (CH), which will be mandatorily integrated into each T r , changes for each model. The numbers of CH points are 3630, 1835, 112, and 1306 for M 1 , M 2 , M 3 , and M 4 , respectively.

MOGA Results
For all problems, MOGA was parameterized with the following values: For all models, two MOGA executions were performed. In the first, the following objectives were minimized: ε T r , ε T e O M and ε PH MOGA .
By analyzing the results of the first MOGA iteration in the second MOGA iteration, some of these objectives will be recast as restrictions by using a heuristic for that purpose. Notice that, unless stated otherwise, the results use scaled data within the range of [−1 +1].

Single Solution Model 1-Power Demand
In the first MOGA execution 311 non-dominated models were found. The histogram of the usage of the lags of these models, for P D and T, can be found in Figure 9a,b, respectively.
Lags from the different periods are selected for these variables. Occupation and day encoding were less used. Among the 311 non-dominated models, only 16 used O cc , and 42 used D E .
By using the results obtained in the first execution, a second MOGA run was executed; this time, it had the the following objectives: Lags from the different periods are selected for these variables. Occupation and day encoding were less used. Among the 311 non-dominated models, only 16 used Occ, and 42 used DE.
By using the results obtained in the first execution, a second MOGA run was executed; this time, it had the the following objectives:  (Table   2) for the two executions.
As it can be observed that PD lags from the three periods are employed, while for T only lags from the first period were selected. The other exogenous variables were not employed. The model has seven neurons. In terms of prediction performance, MOGA PH  for the first forecasting period was 4.55. Its value for the second forecasting period was 4.79. The prediction performance for this second period was slightly worse than for the first one, In this execution, 341 non-dominated solutions and 204 preferable solutions were obtained.
The minimum values of ε T r , ε T e , ε V and ε PH VAL are shown in the next table (Table 2) for the two executions. From Table 2 we can observe that slightly better results were obtained for ε PH VAL .One model has been selected from the preferable solutions, which offers a good compromise between the different criteria. As it can be observed that P D lags from the three periods are employed, while for T only lags from the first period were selected. The other exogenous variables were not employed. The model has seven neurons. In terms of prediction performance, ε PH MOGA for the first forecasting period was 4.55. Its value for the second forecasting period was 4.79. The prediction performance for this second period was slightly worse than for the first one, but this can happen. Firstly, the models have been designed for minimizing the prediction performance for the first period and not the second one; secondly, no data from the second period were involved in the design. Thirdly, the results obtained for ε PH VAL , whenever needed, forecast T, which was not the case for the first period where only measured data were employed. Figure 10 shows the evolution of RMSE over the prediction horizon, and Figure 11 shows the one-step-ahead prediction; this time it is demonstrated at the original scale. Both graphs are related to the validation period. period were involved in the design. Thirdly, the results obtained for VAL PH  , whenever needed, forecast T, which was not the case for the first period where only measured data were employed. Figure 10 shows the evolution of RMSE over the prediction horizon, and Figure 11 shows the one-step-ahead prediction; this time it is demonstrated at the original scale. Both graphs are related to the validation period.

Model 2-Solar Irradiance
The first MOGA execution obtained 216 non-dominated solutions. Figure 12 illustrates the lags' usage. period were involved in the design. Thirdly, the results obtained for VAL PH  , whenever needed, forecast T, which was not the case for the first period where only measured data were employed. Figure 10 shows the evolution of RMSE over the prediction horizon, and Figure 11 shows the one-step-ahead prediction; this time it is demonstrated at the original scale. Both graphs are related to the validation period.

Model 2-Solar Irradiance
The first MOGA execution obtained 216 non-dominated solutions. Figure 12 illustrates the lags' usage.

Model 2-Solar Irradiance
The first MOGA execution obtained 216 non-dominated solutions. Figure 12 illustrates the lags' usage.  By using the results obtained in the first execution, a second MOGA run was executed, this time with the following objectives: In this execution, 341 non-dominated solutions and 204 preferable solutions were obtained.
The minimum values of ε T r , ε T e , ε V and ε PH VAL are shown in Table 3:  Table 3 demonstrates that the same results were obtained in the second MOGA iteration. One model has been selected from the preferable solutions, which offers a good compromise between the different criteria.
As observed, lags from the three periods are employed. The model has seven neurons, which indicates that a simple model can be used for forecasting R. In terms of prediction performance, ε PH MOGA was 3.59, while ε PH VAL was 2.58. The prediction performance for this second period was much better than for the former simply because the first period included cloudy days, which did not occur for the latter (please see Figure 13).

Model 3-Atmospheric Temperature
In the first MOGA execution, 175 non-dominated models were designed for M3. The next figure (Figure 15) shows the histogram of lags usage for M3.

Model 3-Atmospheric Temperature
In the first MOGA execution, 175 non-dominated models were designed for M3. The next figure (Figure 15) shows the histogram of lags usage for M3. By using the results obtained in the first execution, a second MOGA run was executed, this time with the following objectives: In this execution, 341 non-dominated solutions and 204 preferable solutions were obtained.
The minimum values of ε T r , ε T e , ε V and ε PH VAL are shown in Table 4 for two iterations. The second MOGA execution obtained slightly better forecasting results. Model (19) was selected from the preferable solutions, employing lags from the two periods.
The model has nine neurons, which means that it has 90 nonlinear parameters. In terms of the prediction performance, ε PH MOGA was 2.84. Its value for the validation period was 2.70. Figure 16 demonstrates the evolution of the RMSE over the prediction horizon, and Figure 17 demonstrates the one-step-ahead prediction in the original scale. Both graphs are related to the second forecasting period. The model has nine neurons, which means that it has 90 nonlinear parameters. In terms of the prediction performance, MOGA PH  was 2.84. Its value for the validation period was 2.70. Figure 16 demonstrates the evolution of the RMSE over the prediction horizon, and Figure 17 demonstrates the one-step-ahead prediction in the original scale. Both graphs are related to the second forecasting period.

Model 4-Power Generated
Three hundred fifty-six non-dominated models were obtained in the first MOGA generation. The histogram of usage of the lags of these models for PG, R, and T can be found in Figure 18a-c, respectively.

Model 4-Power Generated
Three hundred fifty-six non-dominated models were obtained in the first MOGA generation. The histogram of usage of the lags of these models for P G , R, and T can be found in Figure 18a-c, respectively.
By analysing Figure 18, we can observe that lags from the different periods are selected for these variables.
By using the results obtained in the first execution, a second MOGA run was executed, this time with the following objectives: In this execution, 295 non-dominated solutions and 186 preferable solutions were obtained.
The minimum values of ε T r , ε T e , ε V and ε PH VAL are shown in Table 5 for the two executions. Figure 17. Measured (blue) and one-step-ahead predicted (red) air temperature.

Model 4-Power Generated
Three hundred fifty-six non-dominated models were obtained in the first MOGA generation. The histogram of usage of the lags of these models for PG, R, and T can be found in Figure 18a-c, respectively. By analysing Figure 18, we can observe that lags from the different periods are selected for these variables.
By using the results obtained in the first execution, a second MOGA run was executed, this time with the following objectives:  Slightly better results were obtained for ε PH VAL in the second excutions. The following model has been selected from the preferable solutions.
Model (20) employs lags from the first two periods, while for R, lags from the three periods were chosen. Two lags are employed for T from the two periods. In terms of prediction performance, ε PH MOGA was 1.54, while ε PH VAL was 1.67. The prediction performance for this second period was slightly worse than for the first one. Figure 19 demonstrates the evolution of RMSE over the prediction horizon, and Figure 20 demonstrates the one-step-ahead prediction. As observed, excellent results were obtained.
Energies 2021, 14, 7664 23 of 37 periods were chosen. Two lags are employed for T from the two periods. In terms of prediction performance, MOGA PH  was 1.54, while VAL PH  was 1.67. The prediction performance for this second period was slightly worse than for the first one. Figure 19 demonstrates the evolution of RMSE over the prediction horizon, and Figure 20 demonstrates the one-step-ahead prediction. As observed, excellent results were obtained. Figure 19. Evolution of RMSE over the prediction horizon-M4.

Ensemble Averaging
The number of models in the ensemble and the question of how to select them will be discussed in this Section. At the end of the second MOGA execution, we have the results of the preferable models available in terms of their performance on the training, testing and validation datasets, and on the first forecasting dataset. Our goal is to select an ensemble of models such that their forecasting performance within the entire time series is better than using a single model. In order to assess this, we proceed to analyse forecasting performance in the second forecasting period, which is not used for model design. We shall compare VAL PH  , which is the forecasting performance of the models chosen previously, with the one obtained by using the median of the outputs of the models in ensemble 50% VAL PH  . In order to assess the dispersion of the results of using this specific ensemble, we shall also assess the first and third quartile performances, . Please note that we used these two measures as they are robust to outliers. If the models generalize well, it can be argued that the forecasting performance in the first forecasting dataset would be translated with a similar performance throughout the time series. Using this assumption, we shall select the models within the ensemble with

Ensemble Averaging
The number of models in the ensemble and the question of how to select them will be discussed in this Section. At the end of the second MOGA execution, we have the results of the preferable models available in terms of their performance on the training, testing and validation datasets, and on the first forecasting dataset. Our goal is to select an ensemble of models such that their forecasting performance within the entire time series is better than using a single model. In order to assess this, we proceed to analyse forecasting performance in the second forecasting period, which is not used for model design. We shall compare ε PH VAL , which is the forecasting performance of the models chosen previously, with the one obtained by using the median of the outputs of the models in ensemble ε PH 50% VAL . In order to assess the dispersion of the results of using this specific ensemble, we shall also assess the first and third quartile performances, ε PH 25% VAL and ε PH 75% VAL , as well as the interquartile value,

VAL
. Please note that we used these two measures as they are robust to outliers. If the models generalize well, it can be argued that the forecasting performance in the first forecasting dataset would be translated with a similar performance throughout the time series. Using this assumption, we shall select the models within the ensemble with the smallest 10, 25, and 50 values in ε PH MOGA . We shall denote this set of models as M ε PH MOGA (n) * , where * denotes the type of model, and n assumes the values 10, 25, and 50.
Another criterion that can be used for model selection is the 2-norm of the linear weight vector, w 2 , of each model. A large norm indicates that the output will change significantly with small changes in the basic functions outputs. Using this line of thought, we will select the models with the smallest 10, 25, and 50 values of w 2 and denote these sets as M The ensembles forecasting performances are shown in Table 6. Notice that ε PH VAL = 4.79. Table 6. Forecasting Performances for M 1 . By analysing the results shown in Table 6, we can conclude that the use of the median of the different ensembles always obtains better results than the single selected model. By comparing the two different selection criteria, better results are always obtained for , which also has a small dispersion of results. Figure 21 Figure 22 shows the evolution of the ε PH VAL , in red, and for the models belonging to the selected ensemble, ε PH VAL . The evolution of the median results is shown in black, and it can be observed that, for all prediction steps, the scaled RMSE is always inferior to the single model. By analysing the results shown in Table 6, we can conclude that the use of the median of the different ensembles always obtains better results than the single selected model. By comparing the two different selection criteria, better results are always obtained for

Model 2-Solar Irradiance
The ensembles forecasting performances for R are shown in Table 7. Notice that for VAL PH  , the forecasting performance of single model was 2.58. The performance of the ensemble models is always superior to the single model. In the same manner as in M1, the forecasting selection criterion is better than 2-norm. For this criterion, in contrast with the previous case, dispersion increases with the number of elements in the ensemble, and the smallest ensemble obtained the best results.
As observed by analysing Figures 23 and 24, the 1-step-ahead approximations ob-

Model 2-Solar Irradiance
The ensembles forecasting performances for R are shown in Table 7. Notice that for ε PH VAL , the forecasting performance of single model was 2.58.  The performance of the ensemble models is always superior to the single model. In the same manner as in M 1 , the forecasting selection criterion is better than 2-norm. For this criterion, in contrast with the previous case, dispersion increases with the number of elements in the ensemble, and the smallest ensemble obtained the best results.
As observed by analysing Figures 23 and 24

Model 3-Atmospheric Temperature
The ensembles forecasting performances are shown in

Model 3-Atmospheric Temperature
The ensembles forecasting performances are shown in

Model 3-Atmospheric Temperature
The ensembles forecasting performances are shown in Table 8. Notice that the ε PH VAL = 2.70.  In the same manner as in the previous models, the median forecasting performance of the ensemble models is always better than the single model. Ensembles for which their models have been selected with the forecasting criterion are better than compared to using the weight norm. The dispersion of the results decreases with the dimension of models. The approximation is very accurate; consequently, the dispersion of the ensemble values is very small. As observed in Figure 26, the RMSE of the single model is always superior to the ensemble model median. In the same manner as in the previous models, the median forecasting performance of the ensemble models is always better than the single model. Ensembles for which their models have been selected with the forecasting criterion are better than compared to using the weight norm. The dispersion of the results decreases with the dimension of the ensemble, but the one with 25 elements obtains the best median value. The results are shown in Figures 25 and 26. Figure

Model 4-Power Generated
The ensembles forecasting performances for PG are shown in Table 9. Notice that 1.67 In the same manner as in the previous models, the median forecasting performance of the ensemble models is always better than the single model. Ensembles for which their models have been selected with the forecasting criterion are better than compared to using the weight norm. The dispersion of the results decreases with the dimension of the ensemble, but the one with 25 elements obtains the best median value. The results are shown in Figures 25 and 26. Figure  els. The approximation is very accurate; consequently, the dispersion of the ensemble values is very small. As observed in Figure 26, the RMSE of the single model is always superior to the ensemble model median.

Model 4-Power Generated
The ensembles forecasting performances for PG are shown in Table 9. Notice that 1.67

Model 4-Power Generated
The ensembles forecasting performances for P G are shown in Table 9. Notice that ε PH VAL = 1.67.
In the same manner as in the previous models, the performance of all ensemble models was better than the single model; this time, it was significantly better. The forecasting criterion also produced better results than compared to using the weight norm. In contrast with the other models, the ensemble with 50 models achieved better median forecasting performance this time, albeit with a larger dispersion. For this reason, we chose the  In the same manner as in the previous models, the performance of all ensemble models was better than the single model; this time, it was significantly better. The forecasting criterion also produced better results than compared to using the weight norm. In contrast with the other models, the ensemble with 50 models achieved better median forecasting performance this time, albeit with a larger dispersion. For this reason, we chose the     Figure 28 illustrates the evolution of RMSE evolution over the prediction horizon. Apart from the result for the first step-ahead, all other values were significantly better than the selected model.

Discussion of the Results
As observed from analysing the results presented in Section 5.3.1, the forecasting results obtained with the MOGA single solution are very good and are among the best results presented in the literature. Despite that, MOGA ensembles with averaging solutions,

Discussion of the Results
As observed from analysing the results presented in Section 5.3.1, the forecasting results obtained with the MOGA single solution are very good and are among the best results presented in the literature. Despite that, MOGA ensembles with averaging solutions, discussed in Section 5.3.2, significantly improved the prediction results for all models considered.
Two different criteria were introduced in order to select the elements in the ensemble: using ε PH MOGA and w 2 . For all ensembles generated, the use of the forecasting criterion obtained better ε PH 50% VAL results. For this reason, it should be the criterion used for model selection.
The number of models in the ensemble was also discussed for all model types. Among the three possibilities (10, 25, and 50), 25 models were chosen. Notice that this value is approximately 10% of the preferable solutions chosen by MOGA, and this may constitute a rule of thumb to be employed. Additionally, in terms of computing time, this number is not translated in a large overhead relative to the single solution.

Comparison of Results
In this section, we shall compare the forecasting performance of the technique proposed in this paper with similar studies found in the literature. We shall start with previous publications of the authors; subsequently, we shall address other authors' works. Only two out of the four models will be considered: M 1 , for load demand, and M 4 , for power generation, as the other two are only necessary for these two models.
In [110], data from the same house were employed to generate load demand predictive models. Data from January to July 2020 were employed for model design, and forecasting validation used two weeks of data, from 10 to 24 July 2020. The same time resolution, 15 min, was employed.
Although only 7 months of data were employed in that study, compared to the 15 months employed here, the minimum values of the RMSEs in the different datasets were comparable to the ones presented in Table 2. Forecasting performance was assessed with a PH of 48 steps. The best RMSE result obtained, considering only the first 28 prediction steps employed here, was 4.84. This should be compared with ε PH VAL = 4.79 and the best ensemble results, 4.61. It should be stressed that although the validation periods used in the two papers are different, single solution results are comparable, and both are worse than the ensemble solution.
In publication [66], data collected in Honda Smart Home US were employed. Please note that the energy consumption of the two houses is similar. Two different time steps were used, 15 min and 1 h. Three years of data were used for model design and validation. In order to analyse prediction results, one week worth of data, between 25 February to 1 March 2017, was employed. By using the first-time resolution, the ε PH VAL obtained was 9.24.
The forecasting performance of R, T, and P G models was discussed in [31]. Data from the same house were employed, with the design period covering the days between 19 May and 31 July 2020 (two months and a half), and the forecasting performance was assessed during the 14 June to 12 July period. The minima RMSEs for the three datasets and for the three models were equivalent to the values presented in Tables 3-5.
In terms of forecasting performance, in [31], a PH of 48 steps was also applied. By using only the first 28 prediction steps for the sum, a value of 2.60 was obtained for R. 1.41 here. Again, the ensemble forecasting values were significantly better than the ones obtained in [31].
Other authors have proposed different techniques for load demand and power generation. As criteria other than the RMSE were employed in their studies, other performance criteria are defined below.
In the previous equations, n is the number of samples, y t is the measured tth value,ŷ t is the predicted value, y is the mean value, and r is the range of the measured variable.
Although there are several studies available in the field of load demand prediction, the majority considers buildings or sets of households. The scale we are considering is very important, as an aggregation of load demands or energy consumptions is usually much smoother than individual ones.
As an extreme example, Figure 29 illustrates Portuguese electricity demand for two consecutive weeks some years ago [111]. The daily profile should be compared with the one shown in Figure 11, demonstrating that the profile of the aggregation of a (very large) number of consumers is much smoother than an individual one. The interested reader can inspect the forecasting performance of Portuguese load demand in the previous publication and in [107,108,112].
In this manner, we will only consider publications that address forecasting energy consumption with respect to single households. This is the case of [113], where SVR is applied for hourly and daily energy forecasting of 15 houses in Ontario, Canada, from 2014 to 2016. Focusing on the hourly resolution, the one-step MAPE of energy consumption varies from 23.3% to 67.96%, depending on the accuracy category (good hourly and daily accuracy down to poor hourly and daily accuracy). As in our approach, we forecast the power demand in steps of 15 min, and we used the first four forecasts for calculating the energy. Notice that as we use four predictions to calculate one single result, the comparison is not fair for our approach. Having said that, our MAPE hourly consumption is 16.5%, which is much better that the results obtained in [113].
Wen and co-workers [114] employed data from residential buildings (single-family homes, town home, and apartments) from the Dataport website [115]. They produced hourly and daily consumption forecasts for aggregated consumptions by using Deep Recursive Neural Networks (RNN) and Long Short-Term Memory (LSTM) models. The hourly MAPE for a single house ranged between 11.3% and 17.97%, depending on the Deep model used. However, as the authors stated, the best Deep RNN model has the limitation of needing future values with respect to weather, which were not forecasted in their work in contrast to what happens in this paper.
In relation to PV power generated, Rana and co-workers [116] employed multi-step forecasts, as we did here. Forecasts between 5 min and 1 h obtained MREs between 4.2 and 9.3%. Our approach obtains MRE values between 2.35% and 2.45% for forecasts between 15 min and 1 h. Figure 29. Snapshot of electricity consumption for Portugal for two consecutive weeks [111].
In this manner, we will only consider publications that address forecasting energy consumption with respect to single households. This is the case of [113], where SVR is applied for hourly and daily energy forecasting of 15 houses in Ontario, Canada, from 2014 to 2016. Focusing on the hourly resolution, the one-step MAPE of energy consumption varies from 23.3% to 67.96%, depending on the accuracy category (good hourly and daily accuracy down to poor hourly and daily accuracy). As in our approach, we forecast the power demand in steps of 15 min, and we used the first four forecasts for calculating the energy. Notice that as we use four predictions to calculate one single result, the comparison is not fair for our approach. Having said that, our MAPE hourly consumption is 16.5%, which is much better that the results obtained in [113].
Wen and co-workers [114] employed data from residential buildings (single-family homes, town home, and apartments) from the Dataport website [115]. They produced hourly and daily consumption forecasts for aggregated consumptions by using Deep Recursive Neural Networks (RNN) and Long Short-Term Memory (LSTM) models. The hourly MAPE for a single house ranged between 11.3% and 17.97%, depending on the Deep model used. However, as the authors stated, the best Deep RNN model has the limitation of needing future values with respect to weather, which were not forecasted in their work in contrast to what happens in this paper.
In relation to PV power generated, Rana and co-workers [116] employed multi-step forecasts, as we did here. Forecasts between 5 min and 1 h obtained MREs between 4.2 and 9.3%. Our approach obtains MRE values between 2.35% and 2.45% for forecasts between 15 min and 1 h.
Subsequent studies employed only supplied one-step-ahead forecasts. A comparison of the results obtained with the multi-step-ahead forecast is not fair to our approach, especially when one-step-ahead forecasts correspond to many steps-ahead of the multi-step forecasts. In [117], forecasts of up to 1 h are produced, with MAPE values between 24.7% and 37.8%. In the approach proposed in this paper we obtain a MAPE of 15.6% for a 1 h forecast. Hossain and Mahmood [118] also employed the MAPE criterion in their work. For summer months and for a 6 h prediction step, a MAPE value of 28.6% was obtained. In our approach, for a 24 steps-ahead prediction step (6 h ahead), a MAPE value of 16% was obtained. Publication [119] employed the R 2 criterion. For summer, which is the case considered here, the coefficient of determination ranges from 0.99 to 0.96, between the 15 min and 180 min prediction horizon. In our application, for the same prediction PHs, R 2 ranged from 0.99 to 0.98. Subsequent studies employed only supplied one-step-ahead forecasts. A comparison of the results obtained with the multi-step-ahead forecast is not fair to our approach, especially when one-step-ahead forecasts correspond to many steps-ahead of the multistep forecasts. In [117], forecasts of up to 1 h are produced, with MAPE values between 24.7% and 37.8%. In the approach proposed in this paper we obtain a MAPE of 15.6% for a 1 h forecast. Hossain and Mahmood [118] also employed the MAPE criterion in their work. For summer months and for a 6 h prediction step, a MAPE value of 28.6% was obtained. In our approach, for a 24 steps-ahead prediction step (6 h ahead), a MAPE value of 16% was obtained. Publication [119] employed the R 2 criterion. For summer, which is the case considered here, the coefficient of determination ranges from 0.99 to 0.96, between the 15 min and 180 min prediction horizon. In our application, for the same prediction PHs, R 2 ranged from 0.99 to 0.98.
The current methodology can be applied to any EMS, such as the one recently presented in [109] by the authors. The prediction of the different variables feeds the modelbased predictive control algorithm. Indeed, most parts of intelligent EMS, regardless of how they are designed or implemented, need to be fed with forecasted inputs that will impact its control.
As mentioned by different authors cited in the literature review, one of the drawbacks of designing models with very high prediction accuracy is the large computational time it takes to design the model. With some models, due to their complexity, the execution time is also high. With this study, model design is very costly, but the execution time is extremely fast due to the fact that the models are very simple.
The data collected by the project in which the present study is developed will be made publicly available soon.

Conclusions
In this paper, an ensemble of RBF models, designed by MOGA, for forecasting variables used in HEMS systems has been proposed. It has been shown that ensemble forecasting results are always superior to single selected models, which were already excellent. A simple procedure was proposed for selecting the models that are used in the ensemble, together with a heuristic to determine the number of models.
The models designed in this paper will be used in the HEMS algorithm proposed in contribution [109], which employed measured data in order to simulate forecasting. This HEMS employed the Branch and Bound (BAB) algorithm in order to implement a Model-Based Predictive Control scheme. Its execution did not consider any uncertainty in the data.