Modeling Energy Demand—A Systematic Literature Review

In this article, a systematic literature review of 419 articles on energy demand modeling, published between 2015 and 2020, is presented. This provides researchers with an exhaustive overview of the examined literature and a classification of techniques for energy demand modeling. Unlike existing literature reviews, this comprehensive study analyzes all of the following aspects of energy demand models: techniques, prediction accuracy, inputs, energy carrier, sector, temporal horizon, and spatial granularity. Readers benefit from easy access to a broad literature base and find decision support when choosing suitable data-model combinations for their projects. Results have been compiled in comprehensive figures and tables, providing a structured summary of the literature and containing direct references to the analyzed articles. Drawbacks of techniques are discussed as well as countermeasures. The results show that, among the articles, machine learning (ML) techniques are used the most; they are mainly applied to short-term electricity forecasting on a regional level and rely on historic load as their main data source. Engineering-based models are less dependent on historic load data and cover appliance consumption on long temporal horizons. Metaheuristic and uncertainty techniques are often used in hybrid models. Statistical techniques are frequently used for energy demand modeling as well and often serve as benchmarks for other techniques. Among the articles, the accuracy measured by mean absolute percentage error (MAPE) proved to be on similar levels for all techniques. This review eases the reader into the subject matter by presenting the emphases that have been made in the current literature, suggesting future research directions, and providing the basis for quantitative testing of hypotheses regarding applicability and dominance of specific methods for sub-categories of demand modeling.


Introduction
The transformation of our energy system towards a more reliable, eco-friendly, and cost-effective one is a central goal of today's energy policy. Energy system models are an integral part of the planning processes across different infrastructures. As the scope of such models expands across multiple infrastructures and energy carriers [1], they become increasingly detailed and complex [2]. Hence, well-founded information on future energy demand with high temporal and spatial resolution is one of the most crucial inputs for such models, having a direct impact on associated decision-making processes [3] that affect real-time grid operation as well as long-term infrastructure extension planning. Accordingly, there is a strong need for reliable models predicting and simulating energy demand (in this article, all methods for the mathematical representation of energy demand or consumption are summarized under the term "energy demand modeling"; therefore, the terms energy consumption and energy demand are to be understood synonymously). Finally, energy demand modeling is the essential basis for all quantifications of demand flexibility.
There is an entire field of research revolving around the question of how energy demand can be modeled using a variety of approaches on different scales ranging from a global level down to a single appliance [4,5]. In 2009, there were 60 English articles indexed on "Web of Science" that had "energy demand" (the query also included "energy consumption" as a synonym for "energy demand") and "model" in their title. By 2020, this number had increased to 641. Energy demand models have a wide range of applications. As shown by Bhattacharyya and Timilsina [6], they range from short-term energy consumption forecasting in energy grids and markets, through the simulation of heat and electricity loads in buildings and industrial processes, to econometric long-term projections of national energy demand. In this article, the term modeling covers both future-oriented forecasting and operational simulation of energy demand in technical systems.
Several reviews have been published capturing the variety of approaches and describing the developments in energy demand modeling literature. Twenty-eight recent reviews were analyzed for this article. An overview of their characteristics can be found in Table A1 in Appendix A. Seven of these 28 stood out in terms of their systematic procedure, which ensures transparency, replicability, and reduced bias by following the conduct of a systematic literature review as described in [5,7,8]. These seven studies are briefly presented in the following.
Kuster et al. [9] present a review on electric load forecasting techniques. Forty-one papers are reviewed regarding applied techniques, input data, pre-processing routines, geographic extent, temporal resolution, and horizon. While this review covers a variety of criteria, the number of reviewed articles could be extended across other energy carriers and sectors. In [10], 63 articles are reviewed which focus on energy consumption in buildings, mainly applying ML techniques. The authors analyze the reviewed articles regarding techniques, types of feature, pre-processing, temporal granularity, data size, type of building, type of energy end-use, and performance measures. In [11], an analysis of the viability of various model inputs for residential energy consumption is given, focusing on socio-demographic, psychological, and contextual factors. In [4], Debnath and Mourshed present a review on forecasting techniques for supply and demand in energy planning models across all energy carriers. The authors present 483 models from articles published between 1985 and 2017. They discuss geographical extent, time frames, and performance measures, as well as specific criteria for techniques, such as the number of neurons in layers for artificial neural networks (ANN). While this review provides a wide-ranging analysis, data-related aspects are not included and a distinction between sectors is missing. Riva et al. [12] provide an analysis of 130 peer-reviewed studies on long-term rural energy planning, covering the electricity, oil, and heating sector on the demand and supply side. The reviewed studies are classified according to spatial coverage, planning horizon, energy carrier, mathematical models, and energy use. Šebalj et al. [13] review 39 articles on predicting natural gas consumption in the residential and commercial sector, published between 2003 and 2017. Articles are categorized regarding technique, input variables, spatial scope, and temporal horizon. Wei et al. [14] compiled a literature study on conventional and artificial intelligence-based models in energy consumption forecasting. A total of 116 publications are described with respect to purpose, temporal horizons, data properties, applied areas, pre-processing, and forecasting techniques. Additionally, forecasting accuracy is evaluated considering the MAPE. Table 1 shows which aspects have been covered by recent systematic reviews. It reveals that none of the existing reviews provides comprehensive coverage regarding all of the aspects analyzed in the article at hand.
Table 1. Overview of recent systematic literature by content. In each line, black squares (■) indicate topics covered in the given review. Most reviews cover several sectors or energy carriers and analyze model inputs and spatio-temporal features. Few reviews analyze model accuracies and only the present article covers all the aspects.

Methodology
The literature review follows a systematic procedure as recommended in [4,7,9]. The step-by-step procedure is shown in Figure 1. This review provides a comprehensive description and well-structured presentation of the content of recent international literature on energy demand modeling. To this end, a systematic and replicable analysis of a large number of articles was conducted regarding the utilized techniques as well as associated input data, accuracy, and spatio-temporal resolution across different energy carriers and sectors. This comprehensive and concise literature classification serves as a decision basis for fellow researchers selecting appropriate data-model combinations for their projects. Direct and easy access to articles corresponding to a particular set of criteria is provided through structured tables in Appendix A. Moreover, the advantages and drawbacks of common techniques as well as countermeasures against disadvantages are presented. This review constitutes an exploratory study examining and categorizing a broad and up-to-date literature base regarding an unprecedented number of properties using descriptive statistical methods. Challenges and future research directions are suggested, and the compiled material provides a basis for future hypothesis-based quantitative testing.
The article is organized as follows: In Section 2, the systematic review process is described. In Section 3, a description and classification of techniques are given. In Section 4, the results of the literature analysis are presented, starting with sectors and energy carriers and followed by results on modeling techniques, input data, temporal and spatial characteristics, and accuracy. In Section 5, the most significant results are discussed, and in Section 6, future research directions are suggested. The paper concludes with Section 7.
To capture recent and relevant literature, the search was limited to articles published between 2015 and 2020 in English-language journals related to energy, engineering, modeling and simulation, or computer science. The literature base for this review is the result of a replicable query to Web of Science Core Collection, a database for international journal publications and conference proceedings [15], on 1 May 2021. A search string was derived from a keyword matrix containing keywords from the thematic groups "energy", "demand" and "modeling". The search string and keyword matrix can be found in Appendix A of this review (see Table A2). The search yielded 695 articles, which were then further scrutinized based on their title and abstract, resulting in the exclusion of 276 articles due to non-matching topics or closed access despite institutional logins at the publishers' websites. The final literature collection contains 419 articles.
Articles are analyzed according to the properties listed in Table 2. Given the variety of entries for all the criteria, they have been grouped in the column "possible values" in Table 2. The spatial resolution is defined by the smallest energy-consuming entity, which is modeled in the respective articles. For the temporal horizon, various categorizations exist in the literature [4,16]. The chosen definition is inspired by Wei et al. [14]. The MAPE is defined as the average absolute discrepancy between the predicted value and the actual value, expressed as a percentage of the actual value [17]. It is a unitless performance measure and not dependent on the magnitude of the system, which makes it appropriate for comparing the performance of techniques applied in different contexts [18]. Therefore, it is a widely used accuracy measure in energy demand modeling [5]. For the techniques, a variety of classifications can be found in the literature. The following section provides a clear definition of categories of techniques used for energy demand modeling.
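The MAPE definition above can be illustrated with a minimal sketch. The function name `mape` and the load values are hypothetical and serve only as an illustration; they are not taken from any of the reviewed articles.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    # Average of |actual - predicted| / |actual| over all observations
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical hourly loads in MW (illustrative values only)
actual_load = [100.0, 200.0, 400.0]
predicted_load = [110.0, 180.0, 400.0]
error_percent = mape(actual_load, predicted_load)
```

Because each error is divided by the actual value, the result does not depend on the magnitude of the system, which is the property that makes the measure comparable across contexts. The same division also makes the measure undefined for zero demand, which the sketch rejects explicitly.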
Regression techniques are used to solve an underlying regression problem [23], which consists of finding an approximation of a functional relation between numerical input and output variables. To find a solution for the approximation, different methods are employed, oftentimes minimizing the sum of the squares of errors [24]. For linear relations, this can be done by the ordinary least squares method. For nonlinear relations, steepest-descent methods [24] or kernel functions [25] are used. Typical examples of statistical regression found among the reviewed articles are linear, nonlinear, logistic, quantile, and ridge regression. Outside of statistical techniques, non-parametric regression can be found, where ML techniques such as ANN, kernel regression, or regression trees are employed to derive the functional form and regression parameters from the data [26,27].
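As an illustration of the ordinary least squares case, the following sketch fits a linear relation between outdoor temperature and heat demand. The data values and variable names are assumptions chosen for the example (the demand values follow an exact linear relation so that the fit is easy to check).

```python
import numpy as np

# Hypothetical data: daily mean outdoor temperature (deg C) and heat demand (MWh)
temperature = np.array([-5.0, 0.0, 5.0, 10.0, 15.0])
demand = np.array([50.0, 40.0, 30.0, 20.0, 10.0])

# Design matrix with an intercept column and one regressor
X = np.column_stack([np.ones_like(temperature), temperature])

# Ordinary least squares: minimizes the sum of squared errors ||X @ coef - demand||^2
coef, _, _, _ = np.linalg.lstsq(X, demand, rcond=None)
intercept, slope = coef

# Fitted values for the observed temperatures
fitted = X @ coef
```

With real metering data the residuals would not vanish, and additional regressors (for example calendar variables) would be added as further columns of the design matrix.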
Time series analysis (TSA) techniques derive their predictions from a historic time series, i.e., historic energy consumption data. At their core, many TSA approaches represent regression models since the predicted value is estimated based on one or more previous values [9]. This category includes univariate time series models such as autoregressive moving average (ARMA) models. Other popular techniques are autoregressive integrated moving average (ARIMA) models for non-stationary time series, seasonal autoregressive integrated moving average (SARIMA) models for seasonality, and ARMA models with exogenous variables (ARMAX) [5,28]. A typical multivariate TSA method is vector autoregression as used in [29,30]. As suggested in [4], the category of TSA also includes exponential smoothing models and autoregressive conditional heteroskedasticity (ARCH) techniques.
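The remark that many TSA approaches are regression models at their core can be made concrete with a first-order autoregressive model, fitted here by least squares on lagged values. This is a minimal sketch: the function name `fit_ar1` and the noise-free sample series are assumptions for illustration, and full ARMA or ARIMA estimation involves more than is shown.

```python
import numpy as np

def fit_ar1(series):
    """Fit y[t] = c + phi * y[t-1] by least squares on the lagged series."""
    y = np.asarray(series, dtype=float)
    # Regressors: intercept and the previous value; target: the current value
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    (c, phi), *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return c, phi

# Hypothetical load series generated by y[t] = 10 + 0.5 * y[t-1] without noise
history = [4.0, 12.0, 16.0, 18.0, 19.0, 19.5, 19.75]
c, phi = fit_ar1(history)

# One-step-ahead forecast from the last observed value
next_value = c + phi * history[-1]
```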

Machine Learning Techniques
Techniques from this category find broad application in energy demand modeling and prediction and can be divided into supervised and unsupervised learning approaches.
Supervised learning approaches use labeled training datasets to derive a function describing the relation between inputs and outputs based on examples of input-output pairs [31]. They can be applied to numerical variables in the case of regression problems and to categorical variables in the case of classification problems [32]. Within this sub-category, common techniques are ANN and instance-based algorithms, such as k-nearest neighbor and kernel machines [33]. A common example of the latter is the support vector machine (SVM), which can convert nonlinear problems in low-dimensional space to linear problems in high-dimensional space [34]. Furthermore, this category contains decision trees, Bayesian algorithms, and ensemble learning approaches, such as gradient boosting machines [5].
Unsupervised learning approaches are often applied to clustering problems. These algorithms deduce structures in an unlabeled input dataset, e.g., through finding similarities [35].
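A minimal sketch of such a clustering step is given below, using a plain k-means loop on unlabeled load profiles. The three-value profiles and the initialization with the first k points are simplifying assumptions made for this example; practical implementations use more careful initialization.

```python
import numpy as np

def kmeans(points, k, n_iter=20):
    """Plain k-means: alternate between assigning points and updating centers."""
    points = np.asarray(points, dtype=float)
    centers = points[:k].copy()  # assumption: first k points as initial centers
    for _ in range(n_iter):
        # Euclidean distance of every point to every center, shape (n_points, k)
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    # Final assignment with the converged centers
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1), centers

# Hypothetical daily profiles (morning, midday, evening load in kW)
profiles = [
    [1.0, 3.0, 2.0],  # midday peak
    [2.0, 1.0, 3.0],  # evening peak
    [1.1, 2.9, 2.1],  # midday peak
    [2.1, 1.1, 2.9],  # evening peak
]
labels, centers = kmeans(profiles, k=2)
```

No labels are supplied to the algorithm; the two groups emerge solely from the similarity of the profiles.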

Metaheuristic Techniques
Metaheuristic techniques oftentimes are used to solve optimization problems and can be incorporated into other techniques to improve performance [23]. The category includes evolutionary algorithms, which mimic mechanisms that are inspired by biological processes, such as reproduction, mutation, recombination, and selection [4]. It includes genetic algorithms, particle swarm optimization, bee colony optimization, firefly algorithms, and more [36]. In combination with ML approaches, they can be employed for parameter optimization in SVM [37] or weight optimization in ANN [22,38], resulting in approaches such as the firefly algorithm neural network [23]. Genetic algorithms are also used for feature selection in ML approaches [39,40].
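The mechanisms of selection, recombination, and mutation can be sketched with a very small evolutionary loop that tunes a single model parameter. Everything in this example is an assumption for illustration: the function `evolve`, its default settings, and the quadratic `validation_error`, which merely stands in for the error of an ML model as a function of one hyperparameter.

```python
import random

def evolve(fitness, bounds, pop_size=20, generations=50, sigma=0.3, seed=0):
    """Minimize `fitness` over one real parameter with a simple evolutionary loop."""
    rng = random.Random(seed)
    low, high = bounds
    population = [rng.uniform(low, high) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the better half (lower fitness is better)
        population.sort(key=fitness)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            child = 0.5 * (mother + father)   # recombination
            child += rng.gauss(0.0, sigma)    # mutation
            children.append(min(max(child, low), high))
        population = parents + children
    return min(population, key=fitness)

def validation_error(x):
    # Hypothetical stand-in for a model's validation error; minimum at x = 3
    return (x - 3.0) ** 2 + 1.0

best_parameter = evolve(validation_error, bounds=(-10.0, 10.0))
```

Since the parents survive into the next generation, the best candidate never gets worse from one generation to the next. In a hybrid model, the fitness function would train and evaluate the underlying SVM or ANN for each candidate.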

Stochastic, Fuzzy and Grey Systems Theory Techniques
The techniques in this category are used to model different types of uncertainty. Zimmerman describes two basic forms of uncertainty: the traditional logic of probability describes randomness regarding the occurrence of an event, whereas fuzziness describes the ambiguity of an event, i.e., to what extent an event occurs [41]. As Hájek et al. [42] point out, probability and fuzzy logic represent different sorts of uncertainty. Probability theory is used to describe stochastic processes [43], in which future states of a system are described by their past states plus a random change.
In energy demand modeling, stochastic processes such as Markov chains are used for forecasting and simulation of load profiles [20,44–46]. Fuzzy logic is employed in the form of fuzzy time series [47], fuzzy regression models [5], fuzzy clustering [48], and adaptive neuro-fuzzy inference systems (ANFIS) [49]. Tien states that fuzzy uncertainty can be analyzed with the grey system theory [50], and Debnath and Mourshed classify grey models as uncertainty methods [4]. The grey theory was proposed by Deng in 1982 and was developed to estimate the behavior of an uncertain system given only a limited amount of data [51]. The fundamental grey model GM(1,1) relies on as few as four recent data points to forecast the future data point [50]. The model uses the least squares method to obtain the parameters for the grey differential equations, which describe the change between time steps [51].
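The GM(1,1) procedure can be sketched in a few lines following the common textbook formulation: accumulate the series, estimate the two parameters by least squares, and difference the fitted accumulated series. The function name `gm11_forecast` and the sample values are hypothetical, and the sketch assumes that the estimated development coefficient is non-zero.

```python
import numpy as np

def gm11_forecast(series, steps=1):
    """Forecast `steps` future values with the basic grey model GM(1,1)."""
    x0 = np.asarray(series, dtype=float)
    if len(x0) < 4:
        raise ValueError("GM(1,1) needs at least four data points")
    x1 = np.cumsum(x0)                    # accumulated generating series
    z1 = 0.5 * (x1[1:] + x1[:-1])         # mean of consecutive accumulated values
    # Least squares estimate of a and b in x0[k] + a * z1[k] = b
    B = np.column_stack([-z1, np.ones_like(z1)])
    (a, b), *_ = np.linalg.lstsq(B, x0[1:], rcond=None)
    n = len(x0)
    k = np.arange(n + steps)
    # Time response of the accumulated series, then differencing back
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a
    x0_hat = np.diff(x1_hat, prepend=0.0)
    return x0_hat[n:]

# Hypothetical annual demand growing by 10% per year (arbitrary units)
history = [100.0, 110.0, 121.0, 133.1]
forecast = gm11_forecast(history, steps=1)
```

For this series, the forecast lies close to the value obtained by continuing the 10% growth (146.41). It does not match exactly, because the discrete approximation in the model introduces a small bias.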

Engineering-Based Techniques
Engineering-based techniques follow a bottom-up approach and use a variety of external and internal parameters to describe an energy-consuming system in high detail [6,23]. Oftentimes, individual loads of end-use appliances are considered to obtain aggregate profiles [19]. Common examples are models on the level of individual dwellings [52] or industrial processes [53] as well as building simulations [54]. Engineering-based models have proven their worth in the planning and design of technical systems, being able to simulate a system's behavior under conditions for which little historic data has been recorded yet [21]. Furthermore, they are promising approaches for including the effects of household composition and the individual behavior of dwellers as well as for testing demand-side management strategies [20]. While techniques from this category are widely deployed in practice, they have been less visible in scientific articles on energy demand modeling and have rarely been included in literature reviews.
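The bottom-up idea of obtaining an aggregate profile from individual appliance loads can be sketched as follows. The appliances, rated powers, and operating hours are hypothetical, and the sketch assumes that each appliance draws its rated power whenever it is switched on.

```python
# Hypothetical appliances: rated power in kW and hours of the day in operation
appliances = {
    "refrigerator": {"rated_kw": 0.1, "hours_on": range(0, 24)},
    "oven": {"rated_kw": 2.0, "hours_on": [18, 19]},
    "lighting": {"rated_kw": 0.2, "hours_on": range(18, 23)},
}

def aggregate_profile(appliances):
    """Sum the appliance loads into an hourly household profile (kW)."""
    profile = [0.0] * 24
    for spec in appliances.values():
        for hour in spec["hours_on"]:
            profile[hour] += spec["rated_kw"]
    return profile

household_profile = aggregate_profile(appliances)

# With hourly steps, the sum of hourly mean power in kW equals energy in kWh
daily_energy_kwh = sum(household_profile)
```

No historic load data enters this calculation, which illustrates why such models can be applied to systems that have not been metered yet, at the cost of requiring detailed technical and usage data.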
Geographic information systems (GIS) are used for referencing data to geographic shapes, which have a distinct location and orientation relative to a reference coordinate system. Most of the time they are used for visualization and mapping of energy consumption as well as for urban [55] and rural [56] planning of infrastructure and building simulation [54,57].

Results
A total of 419 articles originating from 54 different countries was reviewed. The country with the highest output of articles in recent years is China (98), followed by the USA (40) and Turkey (31). The analysis of the publication dates shows a slight increase in articles over the six-year period, from 66 in 2015 to 77 in 2020. Figure 2 shows the energy carriers and economic sectors on which the articles focused. It reveals that in most articles the consumer group is not limited to a single sector but rather comprises all sectors together. This is the case when the energy consumption of an entire (market) region is modeled. However, targeted modeling of residential and commercial energy consumption is also common, while industrial consumers are analyzed rather rarely. A possible interpretation is that less consumption data is publicly available for the industrial sector than for the other sectors. Moreover, industrial energy consumption tends to be less dependent on exogenous influencing factors (e.g., weather) and rather exhibits production-related temporal patterns that are difficult to predict without knowledge of internal processes, which can be company-specific and proprietary.
Figure 2. In most articles, the energy consumption of all sectors is modeled, e.g., of an entire region. Electricity consumption is modeled the most. In the residential and commercial sectors, a significant number of articles focus on heating and cooling demand in buildings.
Figure 2 also shows that most articles focus on electricity demand. However, particularly within the residential and commercial sectors, various articles model the consumption of thermal energy, i.e., demand for heating and cooling. This could be partially explained by the fact that a significant number of articles focus on energy in buildings in the residential and commercial sectors, where thermal energy accounts for the largest share of the energy consumed. A separate analysis of all technical systems showed that most articles focus on power grids, smart grids, and buildings (see Figure A3). Table A3 in Appendix A provides a structured reference list of the analyzed articles by energy carrier and sector. Fellow researchers can use this overview to find articles that have similar objects of research with regard to the energy carriers and the sector.

Techniques and Input Data
A variety of techniques is employed in the field of energy demand modeling. Figure 3 shows the number of appearances (n) of each of the five major categories of techniques (see Section 3) across all articles as well as the proportion of energy carriers whose consumption was modeled.
Figure 3. The relative share of energy carriers within each category of techniques. The relative share of appearances of each energy carrier within each of the five major categories of techniques (see Section 3) is shown along with the total number of appearances (n) of each category. Approaches like ML and statistical techniques are used the most and are largely applied to model electricity consumption. These methods mainly rely on historic consumption data, which is particularly well available for electricity consumption. Engineering-based approaches are used less frequently overall but are suited to model heat/cooling demand, especially in the context of building simulation.
The analysis reveals that ML and statistical techniques are employed in most of the reviewed articles. A possible reason could be that ML and statistical techniques, like TSA, can be applied to a variety of use cases with relatively little effort in terms of model configuration and data preparation. They mainly require historic load data as input, which can be complemented by a limited set of external parameters, such as calendar or weather information [51,58–60]. Conventional regression techniques, like multiple linear regression, are commonly used among the articles as well and serve as benchmarks for other approaches [61–63].
Stochastic, fuzzy and grey techniques can be implemented as stand-alone models [64], e.g., simulating load profiles by sampling from stochastic processes such as Gaussian processes or Markov chains [52,65,66]. However, the representation of uncertain outcomes through stochastic, fuzzy or grey expressions is often combined with other techniques such as statistical regression [67,68] or ANN [69,70], giving results in the form of membership functions, intervals or probability densities.
Metaheuristic techniques are often used as part of hybrid techniques. Typical applications are the optimization of model parameters [34,71] or feature selection [39,49] in ML models. Another example is the application of genetic algorithms to create optimized models as the result of an evolutionary process in which a model configuration is refined over multiple generations [72,73].
In the case of engineering-based techniques, energy demand is derived from the specifications of the system and its technical details [53,74,75]. Hence, an accurate simulation with an engineering-based model can require a large amount of data and effort, which might limit the widespread use of such approaches [76]. However, engineering-based techniques are used to a greater extent to model heating and cooling demands, especially in the context of building simulation.
GIS techniques are used to reference data to geospatial shapes and to visualize it in the form of geographical maps. This has been used in the context of building simulation, where geometry and orientation were considered [54,57,77,78], and in the planning of rural [56] or urban [79,80] grid infrastructure. In other cases, regional loads and trends in spatial energy consumption were analyzed [21,81–83] or modeled using socioeconomic data [84–86].
Figure 4 provides more details on the ML techniques that are used in the articles. ANNs [87,88] show by far the highest number of appearances, followed by instance-based algorithms such as SVMs [89,90]. Clustering algorithms such as the k-means algorithm are used frequently to split datasets into groups of maximum similarity. This can be applied to the formation of consumer groups but also to finding similar days in historic load datasets [91–93]. Ensemble learning algorithms combine multiple ML techniques that are individually trained in order to obtain an improved overall performance. This can be achieved by bootstrap aggregating (bagging) [94,95] and boosting [96,97]. Bayesian algorithms and decision trees are supervised learning algorithms and are therefore typically used for regression and classification problems, e.g., Bayesian networks [98,99] or classification and regression trees [100,101]. By combining multiple decision trees through bagging, where each tree is trained on a different sub-sample of the dataset, random forest ensembles are created [94,102,103].
Overall, in 210 out of 419 articles a combined approach was employed. Figure 5 shows the combinations among the five main categories.
The self-arcs show the number of times the respective technique was used as a stand-alone approach or was combined with a technique from the same category, which is the case for 63 articles within the category of ML, 17 articles with statistical techniques, three articles with metaheuristic techniques, and four articles each with stochastic/fuzzy/grey and engineering-based techniques.
Figure 5. Combination of techniques. An arc connects two categories whenever in an article a combination of techniques from the two categories was used. Self-arcs indicate that a technique was used as a stand-alone approach or was combined with a technique from the same category. The size of the self-arc/arc at its start and endpoint represents the share of stand-alone/combined techniques relative to the total number of articles.
The analysis shows that among the analyzed articles, engineering-based techniques have the highest proportion of stand-alone models and metaheuristic techniques have the highest proportion of hybrid models. ML techniques often form hybrid techniques with themselves as well as with statistical and stochastic/fuzzy/grey techniques.
A typical example of a combined approach is the employment of techniques for clustering or frequency analysis for upstream data preparation. These techniques refine input data, e.g., by signal decomposition through Fourier or wavelet transformation, before the data is fed into a downstream model implemented by TSA [95,104,105] or ML techniques [106–112], where each of the decomposed signals is predicted separately. Another example is the integration of fuzzy mathematics into ANNs resulting in ANFIS [59,113–115] or the incorporation of metaheuristic optimization algorithms into the training stage of an ANN [116,117] or SVM [34,37,71].
In other cases, an overall prediction is given as a weighted average of the results of multiple models, which can stem from different categories. The weights can be determined by various (metaheuristic) optimization techniques, allowing TSA, regression or ML techniques to be combined into one approach [59,113,118–120]. In the case of engineering-based techniques, there are examples of combining simulation results from stochastic processes with bottom-up models, which are predominantly used for predicting energy demand in households [52,121,122].
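A weighted-average combination of two forecasts can be sketched as follows. For simplicity, the weight is chosen here by a plain grid search that minimizes the MAPE, which is a stand-in for the metaheuristic optimization mentioned above; all data values and names are hypothetical.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error in percent."""
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))

# Hypothetical validation data: one model overestimates, the other underestimates
actual = np.array([100.0, 120.0, 140.0, 160.0])
forecast_a = np.array([110.0, 130.0, 150.0, 170.0])  # e.g., a TSA model
forecast_b = np.array([95.0, 115.0, 135.0, 155.0])   # e.g., an ML model

# Grid search over the weight of model A; model B receives the remainder
candidates = np.linspace(0.0, 1.0, 101)
errors = [mape(actual, w * forecast_a + (1.0 - w) * forecast_b) for w in candidates]
best_weight = candidates[int(np.argmin(errors))]

combined = best_weight * forecast_a + (1.0 - best_weight) * forecast_b
```

In this constructed example, the errors of the two models have opposite signs, so the combination performs better than either model alone. With more than two models, the grid search becomes impractical, which is one motivation for using optimization techniques to determine the weights.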
Another aspect of the reviewed articles concerns the respective datasets that serve as inputs for the models. Table 3 gives an overview of examples of different model inputs, reflecting the variety of input data that is used in the field.
Economic indicators: gross domestic product (GDP), gross national income (GNI), level of production, income, import and export level of a region
Demographic indicators: human development indices, population, number of dwellers/buildings/residences, age, sex, education, infant mortality
Technical system data, appliance data: equipment installed, number of appliances, efficiency, material properties, air change ratio, flow rate, outlet/inlet temperatures, rated power of the equipment, impedance
Technical system data, building data: floor space, number of bedrooms, transmission factor, building type, age of the building, efficiency rating, geometry of the building, the status of refurbishment, window area, building material, indoor temperature, indoor humidity
Usage and behavioral data: time-use survey data, building usage (main residency, rented, owned, etc.), occupancy/activity patterns, operation/usage time of a device
Energy prices: electricity and gas prices, tariffs, payment methods
Figure 6 gives an overview of the frequency of usage of different types of input data as well as an indication of whether they are used in combination with others ("Multiple demand drivers (DD)") or as a "single DD". Historic energy demand is used in 85% of the articles, which highlights its significance as a data source. As future demand can be forecast from historic demand by trend extrapolation and pattern recognition, e.g., in the case of ARMA models, historic demand constitutes a stand-alone data source. Weather data is used in 43% of the articles. Its frequent use could be explained by the fact that operational patterns of some end-user devices are correlated with weather phenomena, e.g., heating, cooling, and lighting systems. Calendar information is also widely used (in 29% of the articles) since energy consumption can exhibit daily, weekly or annual patterns. For example, in most cases, the metering data of a company will reflect working hours.
Usage and behavioral data can be used to describe the relationship between energy consumption and user behavior, e.g., through a parameter indicating that a technical device is currently in use. Regional demographic and economic data describe properties of a region, such as household income or economic output, which can serve as predictors for the energy consumption of this region. The relatively rare use of price data might seem surprising, but can be explained by the international literature base largely covering countries with regulated or only recently liberalized energy markets, such as China, where price signals have not been transmitted to end consumers in the past [123]. Furthermore, energy demand has been shown to be price inelastic both in the short and long term [124], and therefore energy prices are not considered to be dominant drivers of demand.
When taking a closer look at the usage of input data across the techniques (Figure 7), it becomes apparent that among the examined articles engineering-based techniques do not rely on historic energy demand data as much as the other techniques. Here, external parameters such as information on the technical system or weather data are essential, whereas historic energy demand is rather used for calibration and validation purposes. Tables 4-8 present the results of the analysis on the techniques and inputs used in the examined articles. They allow the reader to directly identify the compatibility of the data spectrum and methodology. The first two columns contain the techniques and associated advantages and disadvantages. The other columns refer to the input data types. In each cell a short assessment of the respective data-technique combination in the context of energy demand modeling is given in the following categories:
• "Contributions" refers to the number of relevant articles.
• "Impact" describes the importance (high, medium, low) of the data type for the respective technique considering different use-cases.
• "Drawbacks" refers to the weaknesses and limitations of the data-technique combination.
• "Outlook" gives a qualitative assessment of the applicability in future research based on the number of contributions and the rating on impact and drawbacks, revealing possible research fields with high or limited potential for application.
All articles relevant to each cell are documented in detail in the structured reference lists in Tables A4-A7 in Appendix A, which enables the reader to find examples of data-model combinations in recent literature.
Table 4. Techniques and input data used (1/5). Compiled results of the analysis on techniques and input data. Each cell provides a short assessment of the respective data-technique combination. The corresponding articles for each cell are documented in detail in the structured reference lists in Tables A4-A7 in Appendix A.

Spatiotemporal Level of Detail
The spatial and temporal properties represent the level of detail applied within the articles and are common decision criteria for model selection, since the level of detail of available input data substantially influences the resolution and scale of the output of a model. Figure 8 summarizes the core findings with regard to the applied techniques among the analyzed articles. The analysis of the temporal horizon (Figure 8, left) reveals an overall higher number of occurrences for short-term (≤ one day) and long-term (≥ one year) projections compared to medium-term projections. Among the articles, ML techniques are applied more frequently for short temporal horizons, while engineering-based and uncertainty techniques are used more often for longer time frames. This is in line with a common view within the scientific literature, whereby engineering-based energy demand models are described as simulation approaches, which are suitable for modeling longer time spans in a realistic and reliable manner [76]. For the other approaches, there seems to be no particular tendency.
The analysis of the temporal resolution (Figure 8, middle) reveals an overall tendency towards hourly time steps or shorter. This could be because hourly time steps represent a reasonable compromise between the level of detail, availability, and data quantity for most projects: an hourly resolution captures most human-driven impacts on consumption, such as daily routines, while the amount of data is still manageable. Additionally, a lot of weather and consumption data is tracked at least in hourly time steps. However, there are differences among the energy carriers: for example, there was no article that modeled natural gas consumption with shorter than hourly time steps. The difference in temporal resolution for electricity and gas most likely reflects system operation requirements and the respective metering infrastructure [125]. Figure 8 also shows that for time steps longer than one day, ML approaches become less frequent while statistical approaches are used more frequently.
The investigation of the spatial resolution (Figure 8, right) shows an overall tendency among the analyzed literature towards modeling energy consumption on the level of single households and buildings or on a regional level, e.g., on a district scale. Comparatively few approaches focus on the level of appliances or national consumption. Engineering-based techniques stand out in having a clear tendency towards the building and appliance level, reflecting the high level of detail which is characteristic of this technique. Scaling an engineering-based model to the national level requires large amounts of detailed technical system data and has only been done in a few cases [126]. In contrast, statistical techniques are used most frequently on large geographic scales, reflecting their traditional role in econometric analyses. Tables A8 and A9 in Appendix A provide a structured list of references by techniques and levels of detail. The tables allow the reader to identify recent articles that share a similar combination of these properties, which enables fellow researchers to quickly find matching articles for their projects.

Prediction Accuracy
Prediction accuracy has a direct relationship with decision quality [113]. Therefore, the pursuit of performance enhancement and higher levels of accuracy is one of the driving factors behind the development of new techniques and combinations of them. Given its importance, researchers might consider forecasting accuracy as a criterion for model selection. According to Hong and Fan, the most used performance measure in the electric power industry is MAPE, due to its simplicity and transparency [5]. Lewis' benchmark [127], which has been mentioned by several authors [18,128], suggests that a MAPE value of 10% or lower indicates high prediction accuracy.
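For orientation, MAPE (mean absolute percentage error) and the 10% threshold can be written down in a few lines. The following is a minimal sketch: the function names and the load values are hypothetical, and rejecting zero actual values is one possible way of handling the undefined case, not a convention taken from the reviewed articles.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        # The percentage error is undefined for an actual value of zero.
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def meets_lewis_benchmark(mape_percent, threshold=10.0):
    """True if the MAPE is at or below the 10% level attributed to Lewis."""
    return mape_percent <= threshold


# Hypothetical hourly loads in MW, for illustration only.
actual_load = [100.0, 120.0, 80.0, 200.0]
forecast = [110.0, 114.0, 84.0, 190.0]
score = mape(actual_load, forecast)
```

Because each error is divided by the actual value, small loads weigh heavily in this measure, which is worth keeping in mind when MAPE values for appliance-level and aggregated loads are compared.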
In several literature reviews, MAPE values of different techniques are compared [4,14,23]. Debnath and Mourshed suggest that ML and hybrid approaches tend to perform more accurately than other techniques [4], and Wei et al. found that MAPE values of long-term projections tend to be better than those of short-term projections [14]. However, other authors are reluctant to give clear recommendations, stating that the different choices of performance measures make it hard to rank the methods from best to worst [4] and that the suitability of models ultimately depends on the dataset [23].
MAPE was the most frequently used accuracy measure among the articles and was provided in 217 of 419 articles. Other measures like root mean square error (RMSE), normalized RMSE (nRMSE), mean absolute error (MAE), and coefficient of determination (R²) were encountered in several cases as well. Figure 9 shows the histogram of the MAPE values collected from all articles. The shape of the distribution as well as the values for skew (3.99) and kurtosis (23.07) give a strong indication that the values are not normally distributed. The same conclusion can be drawn from the histograms of the grouped MAPE values, e.g., by technique, since they show similar characteristics.
Figure 10 shows that a direct comparison does not reveal a universally higher level of accuracy for any technique used among the articles. While engineering-based methods have a slightly higher median (green line), the means (red diamonds) are almost equal among all techniques. This could be due to the great variety of available methods within each category, allowing users to find techniques that are tailored to their use-cases. Furthermore, for every technique, particular measures and sub-routines have been developed to counter drawbacks and increase accuracy (see Section 4.5). An analysis of the MAPE values of techniques grouped by hybrid and stand-alone approaches yielded a similar result.
Figure 10. Boxplot of MAPE values by categories of techniques. The box represents the interquartile range (IQR). Whiskers show the range of data beyond the 1st and 3rd quartile and extend up to 1.5 × IQR on each side, ending at the maximum and minimum data points within that interval. Outliers are not shown. The green line represents the median. The red diamond represents the mean. Among the analyzed articles, accuracy measured by MAPE does not seem to depend necessarily on the chosen technique.
Figure 11 indicates that different levels of spatial resolution could influence the accuracy of prediction. A higher level of detail seems to result in higher MAPE values, especially for articles in which loads of individual appliances were predicted. A possible interpretation is that aggregated loads on the level of countries or regions are easier to predict, since they are smoother and more likely to show seasonal or trend-related patterns than loads of individual appliances. Disaggregated loads depend on the behavior of individual users, which has a higher degree of randomness and naturally leads to lower prediction accuracy. Similar analyses for articles grouped by the temporal horizon and the temporal resolution were conducted but did not show interpretable results (see Figures A1 and A2 in Appendix A).
Figure 11. Boxplot of MAPE values by spatial levels of detail. The same mode of display as in Figure 10. This shows that a higher level of detail results in lower accuracy. This is because aggregated loads on the country and regional level are smoother and have stronger temporal patterns compared to loads of individual appliances or households, which are subject to behavioral patterns and a higher degree of randomness.
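The descriptive statistics behind the histogram and the boxplots can be reproduced with standard-library tools. The sketch below is illustrative only: the sample values are invented, the moment-based skewness and non-excess kurtosis are one common convention (the article does not state which one was used), and the inclusive quartile method is likewise an assumption.

```python
import statistics


def skewness_and_kurtosis(values):
    """Return (skewness, kurtosis) from population moments.

    Kurtosis is the non-excess form, which equals 3 for a normal
    distribution. The convention is an assumption of this sketch.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def boxplot_summary(values):
    """Quartiles, mean, and whiskers ending at the most extreme data
    points that lie within 1.5 * IQR of the box."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "mean": statistics.fmean(values),
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Hypothetical MAPE values in percent, for illustration only.
sample = [1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 5.0, 6.0, 30.0]
skew, kurt = skewness_and_kurtosis(sample)
summary = boxplot_summary(sample)
```

In this invented sample a single large value pulls the mean above the median and produces positive skew, which is the same qualitative pattern as a right-skewed MAPE distribution.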

Measures for Improvement of Accuracy
Techniques have become more flexible over the years to be adapted to the specific contexts and datasets in which they are used and sub-routines have been developed to counter drawbacks and improve predictive performance.
For ML techniques, the following measures to improve predictive performance have been found in the analyzed articles. To reduce overfitting, ensemble learning was employed to create independent predictions of multiple models and to use weighted averages of their results [59,95-97,99,113,115,129-134]. Other measures against overfitting include the usage of incremental learning and dynamic neural networks, where the models are updated step by step during the training phase [88,106,131,135], restrictions on coefficients [136], and the introduction of dropout layers [137]. Adjusting the coefficients of the predictor variables so that the model captures the essential properties of the training data and generalizes better to unseen data points is an important and widespread concept for avoiding overfitting, known as regularization. The idea is to use a regression technique that shrinks or regularizes the estimated coefficients, effectively discouraging the learning of a complex model and hence reducing the risk of overfitting. The most common procedures are the least absolute shrinkage and selection operator (LASSO) [138-142] and ridge regression [96,102,143,144]. Bayesian regularization was used in [145,146].
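The shrinkage effect of regularization can be illustrated for the simplest possible case. The sketch below assumes a single centered predictor without an intercept, for which the ridge and LASSO estimates have closed forms; the function names, the penalty weight `lam`, and the data are illustrative assumptions and are not taken from the reviewed articles.

```python
def ols_coefficient(x, y):
    """Least-squares slope for one centered predictor without intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def ridge_coefficient(x, y, lam):
    """Minimizer of 0.5 * sum((y - b*x)^2) + 0.5 * lam * b^2."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)


def lasso_coefficient(x, y, lam):
    """Minimizer of 0.5 * sum((y - b*x)^2) + lam * |b| (soft thresholding)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    shrunk = max(abs(sxy) - lam, 0.0)
    return (shrunk if sxy >= 0 else -shrunk) / sxx


# Hypothetical centered data, for illustration only.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.1, -1.9, 0.2, 2.1, 3.7]

beta_ols = ols_coefficient(x, y)
beta_ridge = ridge_coefficient(x, y, lam=10.0)
beta_lasso = lasso_coefficient(x, y, lam=5.0)
beta_lasso_strong = lasso_coefficient(x, y, lam=25.0)
```

Both penalties pull the coefficient towards zero, but only the LASSO penalty can set it exactly to zero once the penalty is large enough, which is why LASSO doubles as a variable selection procedure.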
Another popular measure is the pre-processing of data in order to eliminate outliers and noise, as well as to isolate seasonal [113] or temperature-related [174] patterns. For TSA and ML techniques this has been done by using wavelet or Fourier transformation in [39,58,90,108,111,120,175-180]. Particular attention was paid to the prediction of special events and holidays in [118,181-183].
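How a Fourier transformation can isolate a seasonal pattern is sketched below. This is a deliberately simple illustration, not the procedure of any cited article: it uses a plain discrete Fourier transform, keeps only the mean and one harmonic, and assumes that the series length is a multiple of the period. All names and the synthetic load series are hypothetical.

```python
import cmath
import math


def dft(signal):
    """Discrete Fourier transform, O(n^2), adequate for short series."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def seasonal_component(signal, period):
    """Keep only the mean and the harmonic with the given period (in
    samples); everything else is treated as noise or residual."""
    n = len(signal)
    if n % period != 0:
        raise ValueError("series length must be a multiple of the period")
    k = n // period                  # frequency index of that harmonic
    keep = {0, k, n - k}             # mean, harmonic, and its conjugate
    spectrum = dft(signal)
    filtered = [c if i in keep else 0 for i, c in enumerate(spectrum)]
    # Inverse DFT; the imaginary part vanishes up to rounding error.
    return [
        (sum(filtered[j] * cmath.exp(2j * math.pi * j * t / n)
             for j in range(n)) / n).real
        for t in range(n)
    ]


# Synthetic two-day hourly load: daily cycle plus a fast disturbance.
hours = range(48)
clean = [50.0 + 10.0 * math.sin(2 * math.pi * t / 24) for t in hours]
noisy = [c + 2.0 * (-1) ** t for t, c in zip(hours, clean)]
recovered = seasonal_component(noisy, period=24)
```

The recovered series can then serve as a model input in its own right, while the residual is modeled separately or discarded.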
Sometimes model performance suffers from a lack of data. In the case of ML approaches this can be compensated by the creation of virtual data through densification or latent information functions [184,185]. In engineering-based techniques insufficient data can be tackled by prioritization as well as the right choice of representative samples as done in [19,54,186-188].
ML approaches sometimes end up in shallow local minima when optimizing parameters. Solutions can involve alternative (metaheuristic) optimization routines during the training stage, such as the artificial bee colony algorithm for ANNs [116,189], or the Cuckoo search [71] and the wolf pack algorithm [163] for SVMs.

Summary of Results
This section provides a brief summary of the advantages and disadvantages of each category of techniques, as well as countermeasures to cope with the drawbacks, derived from the analyzed literature. Table 9 contains the most important elements. ML techniques are used the most across the articles and showed an increase in usage over the last years. ML has the advantages of being able to handle nonlinear relations and achieve high levels of accuracy with quite low implementation effort [22]. Drawbacks lie in the black-box character [190] and the tendency to overfit and to get stuck in shallow local minima [116,191]. Countermeasures are the use of regularization procedures, the formation of model ensembles as well as feature selection and data pre-processing by decomposition. ML techniques dominate across all temporal and spatial levels, however with a slight tendency towards smaller time steps, horizons and scales.
Statistical modelling techniques have a long history in econometrics and are common in energy demand modelling, too. They are the second most used technique. Multiple linear regression is fast and simple to use and capable of explaining the relationship between independent and dependent variables. However, when independent variables are correlated, these models face difficulties [148]. One way of counteracting is by refining the variable selection process and reducing complexity with the help of PCA or coefficient shrinking with LASSO. TSA techniques are easy to use and efficient in modelling overall trends and seasonal patterns. Limits occur when it comes to forecasting extreme events or outliers. Countermeasures can be found in the transformation and decomposition of data [104].
Techniques for stochastic, fuzzy and grey systems modelling address different types of uncertainty. As discussed by Hong and Fan [5], having elements of uncertainty included in model outputs can be considered to be unsatisfying for decision makers in management positions who expect single point values. However, this branch of energy demand modelling can still be considered as a recent development, which will likely see an increase in popularity over the next years. Fuzzy and grey approaches are able to deal with incomplete or inaccurate data [69,192]. Stochastic simulations might run into long computing times, which can be countered by variable elimination algorithms [98].
Meta-heuristic approaches are mainly used as part of combined models, e.g., by introducing genetic feature selection algorithms [40,112] or by optimizing model parameters through an evolutionary process [72,73,189]. However, the integration of a metaheuristic algorithm into another technique requires additional effort and can suffer from low convergence rates, both of which have to be justified by improved results [193].
Engineering-based techniques derive energy demand from a bottom-up representation, which involves a detailed representation of input-output-relationships based on the laws of physics. This level of detail represents a fundamental difference to other techniques. Engineering-based approaches are commonly used in the context of building simulation [57,194]. However, the requirement of large amounts of parameters as well as the accurate representation of input-output relationships make the initial set-up of such models rather laborious. Once created, however, these models have the potential to predict different scenarios on a long temporal horizon, which makes them particularly relevant in the context of system planning. Furthermore, the forecasting based on historic data, as done by ML and TSA techniques, is unable to depict structural disruptions, such as the consequences of political interventions, technological breakthroughs or a pandemic.

Discussion
Compared to existing literature reviews on energy demand modelling presented in Section 1 ([4,9-14]) and Table A1, this study covers all sectors, energy carriers, categories of techniques, input data types, spatio-temporal characteristics, accuracy as well as advantages, disadvantages and typical countermeasures. A recent publication in this journal by Mosavi et al. [195] focuses on the application of ML models in the energy system comprising separate analyses of multiple studies including accuracy values on different scales. In comparison, the study at hand presents a more detailed analysis of the demand sector, including a wider range of techniques and a more extensive and structured literature base, ensuring comparability of the properties of different articles.
Most articles focus on electricity consumption, which is not surprising since it is the most valuable and expensive form of energy. Moreover, due to service level requirements of grid infrastructure and the expansion of smart meters, vast amounts of detailed and high-quality data are available in this sector [5]. However, the authors expect an increasing number of smart metering devices also in the gas and heat grids, which will improve data availability in the future and facilitate integrated energy system planning and modelling [196,197].
Studies with a focus on buildings represent a significant overall share among analyzed articles and are particularly relevant in the context of modelling heating and cooling demand in the residential and commercial sector. Compared to relevant analyses on the building sector assessing efficiency of wood-based constructions [198], their energy consumption during use [199] or the potential of zero energy buildings [200,201], the review at hand covers all energy demand sectors up to a national scale.
The industry sector is underrepresented in the available articles. Considering the intense efforts regarding efficiency targets and demand side management potentials, there is a large interest in modelling the industry sector including the adoption of new technologies, as explained by Fleiter et al. [76]. The small number of articles can be explained by a lack of publicly available data as well as the reluctance to publish this kind of research; energy consumption of a company is often considered sensitive data since it implicitly contains information about production activity and efficiency [125]. The results of the review at hand indicate that the challenges pointed out by Fleiter et al. regarding data availability and transparency in the industrial sector persist [76].
Accuracy of prediction is one of the most important factors in decision making, not only to enable the right choice of models but also to allow stakeholders to understand the performance of the employed method. Unlike the findings of other authors ([4,14]), a tendency of higher accuracy for ML and hybrid techniques or for longer temporal horizons cannot be confirmed for the analyzed articles. It was shown that, among the examined literature, models with higher spatial detail have a tendency towards lower accuracy. This shows that comparing MAPE values of techniques irrespective of the individual context in which they are applied is of limited explanatory value. In order to robustly compare and discuss the accuracy of different techniques, they have to be applied to the same datasets, as done in forecasting competitions such as GEFCom [27,139,202-204].
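The principle of comparing techniques on identical data can be sketched with two simple baseline forecasters. This is a minimal illustration and not the evaluation protocol of GEFCom or of any reviewed article: the single holdout split, the two baselines, the function names, and the toy series are all assumptions.

```python
def mape(actual, predicted):
    """Mean absolute percentage error in percent."""
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)


def persistence_forecast(history, horizon):
    """Repeat the last observed value over the whole horizon."""
    return [history[-1]] * horizon


def seasonal_naive_forecast(history, horizon, period):
    """Repeat the values observed one period earlier."""
    start = len(history) - period
    return [history[start + (h % period)] for h in range(horizon)]


def rank_models(series, test_size, models):
    """Score every model on the same train/test split, best MAPE first."""
    train, test = series[:-test_size], series[-test_size:]
    scores = {name: mape(test, forecast(train, test_size))
              for name, forecast in models.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical load series with a period of four time steps.
series = [10.0, 20.0, 30.0, 20.0] * 3
models = {
    "persistence": lambda train, h: persistence_forecast(train, h),
    "seasonal_naive": lambda train, h: seasonal_naive_forecast(train, h, period=4),
}
ranking = rank_models(series, test_size=4, models=models)
```

Because both forecasters see exactly the same training and test data, the difference in MAPE can be attributed to the technique rather than to the dataset, which is not possible when values are collected from articles with differing contexts.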

Challenges and Future Research Directions
In the context of the energy transition, the focus of planning and decision-making processes is expanding across infrastructures and sectors. Accordingly, the level of detail and complexity of energy systems and energy demand models is increasing. Looking at the results of the analysis, the research focus is on sector-unspecific electricity demand, i.e., without a focus on a particular consumer group. Given the importance of carbon-free energy carriers, a continuously high output of articles on electricity demand is expected but should be backed up by intensifying research on modeling heat and hydrogen demand. Compared to the residential and commercial sectors, the industrial sector has less frequently been in focus. Given the significant decarbonization challenges this sector faces, intense research efforts are needed and should lead to an increasing number of publicly available studies.
ML-techniques, more precisely ANN, and statistical approaches are the predominant methods and historic energy demand, weather data, and calendar information are the most frequently applied inputs. Based on the analysis of data-technique combinations (Tables 4-8), continuous intensive use of data-driven ML-techniques can be expected, especially in ensembles and combined models integrating key strengths of other models such as Fuzzy expressions or metaheuristic optimization algorithms, using historic energy demand and publicly available input data. At the same time, given the high explanatory value of the technical system and appliance usage data, the authors see a high potential for an intensifying application of these inputs. This development is supported by the expansion of the sensor and metering sector, which will further increase the availability and quality of data, especially on the level of buildings and appliances. In addition, this level of detail is needed in order to accurately model demand flexibility options and their technical and economic constraints in the different sectors. This also enables a broader application of detailed engineering-based models, which are particularly suited for representing the input-output relations.
Future work should make use of the knowledge base provided by this literature review, inspiring hypothesis-driven analyses and quantitative testing, focusing on the applicability and dominance of specific data-method combinations for energy demand modeling.

Conclusions
A comprehensive and up-to-date systematic literature review about energy demand modeling regarding techniques, data, accuracy, energy carriers, sectors, and spatio-temporal level of detail was presented. 419 articles published between 2015 and 2020 were reviewed. References are structured by property and compiled in tables for easy access.
The analysis has shown that energy demand modeling is a research field with continuously high numbers of yearly publications. The analysis of the articles confirmed the current trend of increasing popularity for ML approaches. Statistical models such as regression and TSA are well established, whereas stochastic/fuzzy/grey and metaheuristic techniques are often used as part of combined approaches. Engineering-based techniques stand out as they provide a more detailed representation of the physical properties of the energy-consuming system, resulting in higher external and internal data requirements. A research gap was identified regarding models for industrial energy demand.
The level of accuracy proves to be a difficult criterion for a ranking of techniques since attainable performance depends on the particular context. Among the articles, a higher level of detail, e.g., forecasting on the level of appliances, produced lower levels of accuracy compared to forecasts of aggregated loads on country levels.
The material presented here shows trends with regard to the prevailing combinations of methods and data and allows these trends to be tested on the basis of quantitative methods in the future. In particular, this review provides the basis for further analyses and quantitative testing of hypotheses regarding the applicability and dominance of specific methods for sub-categories of demand modeling.

Funding: This study was carried out as part of the research projects "ENavi" funded by the German Federal Ministry of Education and Research (BMBF) (funding number: 03SFK4T0) and "DemandRegio" funded by the German Federal Ministry of Economics and Energy (BMWi) (funding number: 03ET4040C).

Data Availability Statement: Data is contained within the article and Appendix A.
Acknowledgments: A big thank you goes to Sarah Schöngart from the department of energy and resource management at TU Berlin for her valuable contributions.

Conflicts of Interest: The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Table A1. Characteristics of existing literature reviews. Overview of existing literature reviews by content. In each line, black squares (■) indicate topics covered in the given review. Only seven literature reviews used a systematic approach. Most reviews cover more than one sector or energy carrier and are concerned with analyzing model inputs (demand drivers). Few reviews show the number of articles reviewed. Only the present article covers all aspects.