A Review on Time Series Aggregation Methods for Energy System Models

Due to the high degree of intermittency of renewable energy sources (RES) and the growing interdependences amongst formerly separated energy pathways, the modeling of adequate energy systems is crucial to evaluate existing energy systems and to forecast viable future ones. However, this corresponds to the rising complexity of energy system models (ESMs) and often results in computationally intractable programs. To overcome this problem, time series aggregation (TSA) is frequently used to reduce ESM complexity. As these methods aim at the reduction of input data and preserving the main information about the time series, but are not based on mathematically equivalent transformations, the performance of each method depends on the justifiability of its assumptions. This review systematically categorizes the TSA methods applied in 130 different publications to highlight the underlying assumptions and to evaluate the impact of these on the respective case studies. Moreover, the review analyzes current trends in TSA and formulates subjects for future research. This analysis reveals that the future of TSA is clearly feature-based including clustering and other machine learning techniques which are capable of dealing with the growing amount of input data for ESMs. Further, a growing number of publications focus on bounding the TSA induced error of the ESM optimization result. Thus, this study can be used as both an introduction to the topic and for revealing remaining research gaps.


Drivers of Model Complexity
Due to the climate change caused by anthropogenic CO 2 emissions resulting from the burning of fossil fuels, a major turnaround in the fields of energy supply and consumption is an increasing necessity. Key aspects of addressing this challenge are the integration of renewable energy sources (RES) into existing energy systems, as well as a closer coupling of energy forms and sectors [1].
The evolution of the energy sector has been accompanied by a consistent effort to model and predict its development. Early attempts to forecast future energy demands can be traced back to the 1950s and constitute simple, assumption-based scenarios [2]. Another theoretical foundation for modern energy system models (ESMs) is the principle of peak-load-pricing first described by Boiteux in 1949 [3] (English translation in 1960 [4]) and Steiner in 1957 [5]. This approach distinguishes between capacity and the operating costs of facilities producing non-storable goods. Thus, it applies to

•
The need for security of supply to be provided for the growing demand by governmental quasi-monopolistic institutions throughout the 1970s, as well as the building of a reliable connection between a more competitive energy sector and later public interest [7,8].

•
The progress of computational resources that enabled more complex models [9].

•
The integration of non-dispatchable technologies, such as most of the RES and their impact on energy pricing; an effect that has constantly gained importance and which was first described in 1982 [10].
Ever since the first ESMs, based on optimizations rather than just simulations, were developed during the 1970s and 1980s [9,11], two major options arose for their developers, namely whether to focus on economic mechanisms, sometimes described as a top-down approach, or on the technical dimension, usually known as a bottom-up one [12,13]. Amongst the bottom-up frameworks, this review focuses on a vast number of approaches to modeling the different dimensions of energy systems, including methodologies such as optimizations and simulations [9,12,13]. With respect to the scope of ESMs, two fundamental dimensions can be delineated: namely spatiotemporal and techno-economic. The spatiotemporal dimension comprises the setting of input data that a model is intended to incorporate. The spatial sub-dimension is focused on the number of regions and their connections to each other, as energy systems on a national or even larger scale usually face the challenge of taking energy transmission between different regions into account. The temporal sub-dimension is divided into two aspects, namely temporal resolution (TR), often referred to as time steps, and the overall time horizon [9], which also concerns questions of storage modeling [14][15][16][17][18][19][20] as well as the linking of dynamic processes [21][22][23] and investment dynamics [24,25]. In contrast, the techno-economic dimension deals with how the components are represented in the model and whether their design and/or operation are optimized or, if their operational behavior is simply simulated, how they are mathematically represented and if the impact of supply and demand on energy prices is dynamically modeled or not [13]. Each of the dimensions listed above drives the overall complexity of ESMs, while the spatiotemporal resolution also affects the techno-economical dimension directly, e.g., the TR also limits the (technical) operational exactness of components in the energy system. Figure 1 illustrates the classification into top-down and bottom-up models [13,21,[26][27][28], top-down model types [29,30] and bottom-up model dimensions [13].

Motivation and Scope of the Review
Although Moore's law has held true for approximately 40 years [31,32] and there have been significant advances made in the branch-and-bound algorithms used for solving big mixed integer linear programs (MILPs) such as those used in energy system optimization models [33], a decelerating increase of transistor density could be observed in recent years [34]. On the other hand, liberalization, decentralization, and an increasing volatility in energy generation [35] are leading to more complex applications for ESMs. Therefore, the recent number of publications dealing with aggregation methods in ESMs illustrates the fact that many application cases are too complex to be overcome solely by computational power and mathematically equivalent transformations.
As mentioned above, the temporal sub-dimension in ESMs is crucial for the implementation of storages and the description of system dynamics, which is especially important for ESMs considering a high share of intermittent RES [36][37][38][39][40][41]. This applies for both single-node and multi-node ESMs, and the group of aggregation methods employed to tackle this issue is broad and diverse. Hence, this review addresses the issue of systematically categorizing the methods and their assumptions, as well as recent trends and the general shortcomings in the development of TSA methods. Furthermore, this work is intended to facilitate the development of new methods by combining existing methods and considering the shortcomings present. This endeavor differs from the scope of recent publications focusing on the general purpose of TSA [42]. The temporal sub-dimension that the aggregation methods presented in the following address is highlighted in Figure 1. As the input time series for constrained bottom-up ESM are often not only auto-correlated, i.e., to some extent periodic, but also cross-correlated, an aggregation based on time series can be applied in multiple ways. This review focuses only on the aggregation of time series based on their auto-correlation, i.e., the reduction of the number of time steps, e.g., by representing a whole year of data by a small number of typical days. This is represented by the rotating arrow in Figure 1 and will be defined as TSA in the narrow sense. In contrast, a reduction of the number of regions, technologies, or customer profiles is based on the mutual similarity of the same attributes at different locations or different attributes at the same location. Therefore, spatial or technological aggregation approaches are reducing the number of time series, but not the number of time steps. As mentioned above, the temporal sub-dimension in ESMs is crucial for the implementation of storages and the description of system dynamics, which is especially important for ESMs considering a high share of intermittent RES [36][37][38][39][40][41]. This applies for both single-node and multi-node ESMs, and the group of aggregation methods employed to tackle this issue is broad and diverse. Hence, this review addresses the issue of systematically categorizing the methods and their assumptions, as well as recent trends and the general shortcomings in the development of TSA methods. Furthermore, this work is intended to facilitate the development of new methods by combining existing methods and considering the shortcomings present. This endeavor differs from the scope of recent publications focusing on the general purpose of TSA [42]. The temporal sub-dimension that the aggregation methods presented in the following address is highlighted in Figure 1. As the input time series for constrained bottom-up ESM are often not only auto-correlated, i.e., to some extent periodic, but also cross-correlated, an aggregation based on time series can be applied in multiple ways. This review focuses only on the aggregation of time series based on their auto-correlation, i.e., the reduction of the number of time steps, e.g., by representing a whole year of data by a small number of typical days. This is represented by the rotating arrow in Figure 1 and will be defined as TSA in the narrow sense. In contrast, a reduction of the number of regions, technologies, or customer profiles is based on the mutual similarity of the same attributes at different locations or different attributes at the same location. Therefore, spatial or technological aggregation approaches are reducing the number of time series, but not the number of time steps.

Methodology and Structure of the Review
As highlighted above, this review focuses on the TSA methods in bottom-up energy system optimization models that include generation expansion planning (GEP), as well as unit commitment (UC) and have constantly emerged and evolved since the late 1970s and 1980s [9,11]. Among the early model frameworks, one group focuses on long-term system planning and has usually only one time step per year such as LEAP [43], EFOM [44], and BESOM [45], which are not subject to aggregation techniques and thus neglected in the following. The temporal dimension of the other major group of early bottom-up ESMs such as TIMES [46][47][48][49] and its predecessors MARKAL [50], MESSAGE [12,51], Energies 2020, 13, 641 4 of 61 IKARUS [9], and PERSEUS [52] are based on time slice formulations (in the case of PERSEUS called "time slots"), which are explained in more detail in Section 3.2.1. Although the long-term planning models with only one time step per year were consecutively combined with models with a higher temporal resolution, as is the case for TIMES [46][47][48][49] as a combination of MARKAL [50] and EFOM [44], the time slice approach, which is based on the modeler's experience only, remained unchanged for decades. With the first approaches to classify and group demand curves using unsupervised learning techniques, which the authors traced back to 1999 [53], new techniques for defining the temporal dimension of ESMs arose. To the best of the authors' knowledge, this was first done manually in 2008 [54] and by using a standard clustering algorithm in 2011 [55]. In order to investigate the rapid and manifold development of complex TSA methods based on feature-based grouping in detail, the start year for the literature review was set to 1999 and the literature research was stopped in July 2019. To avoid a bias towards the new methods based on unsupervised learning techniques, publications within the relevant time interval, which are based on long-existing and constantly evolving frameworks such as TIMES, are also considered. Thus, the research objective is narrowly defined and can be exhaustively examined.

Methodology of the Literature Research
With respect to a systematic and keyword-based search for TSA methods, the major challenge was the inconsistent naming of the applied methods. Furthermore, the majority of publications did not explicitly address the comparison of the different aggregation methods. Instead, the TSA methods were often simply applied. Therefore, terms such as TSA, TD, complexity reduction, or clustering, which are crucial for identifying TSA methods, only appear in a minority of publications as keywords. Moreover, a number of terms was found to be inconsistently or redundantly used by different research communities. Examples for this are the terms "representative days" and "typical days". Therefore, a heuristic approach was used as starting point that focused on a search for methods based on citations of earlier works. If no earlier work was cited dealing with TSA, the search was halted. Simultaneously, terms that appeared in multiple publications were considered to be keywords and, to overcome the problem of co-citation clusters [56] with own terms, these newly defined keywords were used for an additional search on the internet. The keywords used for the literature research that arose from this analysis are listed in Table 1 along with their definitions and terms that are synonymously used in the literature.
Building upon the analyzed literature and the basic features of a TSA process introduced by Kotzur et al. [57] and Schütz et al. [58], the table of methods in Appendix A was derived for categorizing and comparing the different methods. Moreover, the methods were also investigated on the basis of their capacity to link all time steps across the original time horizon, which enables seasonal storage, and their premise to approximate the duration curve or the unsorted time series. This ultimately leads to the structure of the following sections. Grouping, (Clustering) "Given a dataset of n time series data D = {F 1 , F 2 , . . . , F n }, the process of unsupervised partitioning of D into C = {C 1 , C 2 , . . . , C k } in such a way that homogeneous time series data are grouped together based on a certain similarity measure" [ A model "the analysis of existing national energy systems, as well as the prediction of potential future scenarios, is usually performed with" [9] Energy System Optimization Model, Energy

Structure of the Review
From the categorization in Appendix A, the methods presented in Section 3 are derived as the basic aggregation methods, as well as miscellaneous methods that cannot be clearly categorized. As aggregation methods commonly suffer from certain drawbacks, a number of methods exist to preserve additional information of the original input time series, which are presented in Section 4. Along with both Sections 3 and 4, the individual trends and possible reasons for them are discussed in Sections 3.5 and 4.3. The major results of the review are concluded in Section 5. Figure 2 illustrates the structure of the following chapters by highlighting comparable ideas with rival methods with the same colors and steps to be taken or decisions to be made for applying a sophisticated aggregation method with blue arrows. The grey backgrounds distinguish the basic aggregation process presented in Section 3 from the preservation of additional information of the original time series presented in Section 4. Along with both Sections 3 and 4, the individual trends and possible reasons for them are discussed in Sections 3.5. and 4.3. The major results of the review are concluded in Section 5. Figure 2 illustrates the structure of the following chapters by highlighting comparable ideas with rival methods with the same colors and steps to be taken or decisions to be made for applying a sophisticated aggregation method with blue arrows. The grey backgrounds distinguish the basic aggregation process presented in Section 3 from the preservation of additional information of the original time series presented in Section 4.  Along with the introduction of a new aggregation method, the impact of this method on potential input data is visualized. For this, a time series for photovoltaic capacity factors is used, which consists of 8760 hourly time steps for one year, and is illustrated in Figure 3. Finally, all of the described equations refer to those in existing publications, but are reformulated for the sake of consistency within this paper.
Energies 2020, 13, x FOR PEER REVIEW 7 of 71 Along with the introduction of a new aggregation method, the impact of this method on potential input data is visualized. For this, a time series for photovoltaic capacity factors is used, which consists of 8760 hourly time steps for one year, and is illustrated in Figure 3. Finally, all of the described equations refer to those in existing publications, but are reformulated for the sake of consistency within this paper.

Time Series Aggregation
The following section deals with the general concept of TSA. For the mathematical examinations of the following section, the nomenclature of Table 2 is used.

Time Series Aggregation
The following section deals with the general concept of TSA. For the mathematical examinations of the following section, the nomenclature of Table 2 is used.
The input data D usually consists of one time series for each attribute, i.e., D = A × S. The set of attributes A describes all types of parameters that are ex-ante known for the energy system, such as the capacity factors of certain technologies at certain locations or demands for heat and electricity that must be satisfied. The set of time steps describes the shape of the time series itself, i.e., sets of discrete values that represent finite time intervals, e.g., 8760 time steps of hourly data to describe a year. For all methods presented in the following, it is crucial that the time series of all attributes have identical lengths and TR. The possible shape of this highly resolved input data is shown in the left upmost field in Figure 4.   One approach for aggregating the input time series is to merge multiple time series of attributes with a similar pattern. However, this can only be performed for attributes describing similar units (e.g., the capacity factors of similar wind turbines) or similar customer profiles (i.e., the electricity demand profiles of residential buildings). As this approach is often chosen to merge spatially distributed but similar technologies, it is not considered as TSA in the narrow sense, but as spatial or technological aggregation, as the number of time steps is not reduced in these cases. This is illustrated in the right upmost field in Figure 4, and some examples from the literature are given in Appendix B.
TSA, as it is understood in this review, is the aggregation of redundant information within each time series, i.e., in the case of discrete time steps, the reduction of the overall number of time steps. This can be done in several ways. One way of reducing the number of time steps, as is shown in the central field of Figure 4, is the merging of adjacent time steps. Here, it needs to be highlighted that the periods shown in this field are for illustrative purposes only: The merging of adjacent time steps can be performed for full-length time series or time periods of time series only. Moreover, the merging of adjacent time steps can either be done in a regular manner, e.g., every two time steps are represented by one larger time step (downsampling) or in an irregular manner according to, e.g., the gradients of the time series (segmentation). A third possible approach is to individually variate the temporal resolution for each attribute, i.e., using multiple time grids, which could also be done in an irregular manner, as pointed out by Renaldi et al. [61]. These three methods directly decrease the temporal resolution and will be presented in Section 3.1.
Another approach for TSA is based on the fact that many time series exhibit a fairly periodic One approach for aggregating the input time series is to merge multiple time series of attributes with a similar pattern. However, this can only be performed for attributes describing similar units (e.g., the capacity factors of similar wind turbines) or similar customer profiles (i.e., the electricity demand profiles of residential buildings). As this approach is often chosen to merge spatially distributed but similar technologies, it is not considered as TSA in the narrow sense, but as spatial or technological aggregation, as the number of time steps is not reduced in these cases. This is illustrated in the right upmost field in Figure 4, and some examples from the literature are given in Appendix B.
TSA, as it is understood in this review, is the aggregation of redundant information within each time series, i.e., in the case of discrete time steps, the reduction of the overall number of time steps. This can be done in several ways. One way of reducing the number of time steps, as is shown in the central field of Figure 4, is the merging of adjacent time steps. Here, it needs to be highlighted that the periods shown in this field are for illustrative purposes only: The merging of adjacent time steps can be performed for full-length time series or time periods of time series only. Moreover, the merging of adjacent time steps can either be done in a regular manner, e.g., every two time steps are represented Energies 2020, 13, 641 8 of 61 by one larger time step (downsampling) or in an irregular manner according to, e.g., the gradients of the time series (segmentation). A third possible approach is to individually variate the temporal resolution for each attribute, i.e., using multiple time grids, which could also be done in an irregular manner, as pointed out by Renaldi et al. [61]. These three methods directly decrease the temporal resolution and will be presented in Section 3.1.
Another approach for TSA is based on the fact that many time series exhibit a fairly periodic pattern, i.e., time series for solar irradiance have a strong daily pattern. In the case of perfect periodicity, a time series could thus be represented by one period and its cardinality without the loss of any information. Based on this idea, time series are often divided into periods as already shown in the middle of Figure 4. As the periods are usually not constant throughout a year (e.g., the solar irradiance is higher in the summer than in the winter), the periods can either be merged based on their position in the calendar (time slices and averaging) or based on their similarity (clustering), as shown at the bottom of Figure 4. These methods will be described in Section 3.2. Moreover, information about the order in which the periods appear in the original time series must be preserved to be able to model temporal linkages such as the states of charge of storage technologies which will be referred to as "intertemporal constraints" in the following. This is discussed in Section 3.2.4. As already mentioned, the TR can also be reduced within the periods. This leads to Table 3, which illustrates the possible combinations of the methods presented above. Here, each method from column one could be combined with each method from column two. Moreover, a small number of methods based on random sampling and miscellaneous methods, but cannot be properly categorized in Figure 4 or Table 3. However, they will be described in Sections 3.3 and 3.4. In this way, Table 3 mirrors the structure of the following section.
In the following, methods that merge time steps or periods in a regular manner, i.e., based on their position in the time series only, will be referred to as time-based methods, whereas aggregation based on the time steps' and periods' values will be called feature-based. In this context, features refer not only to statistical features as defined by Nanopoulos et al. [62], but in a broader sense to information inherent to the time series, regardless of whether the values or the extreme values of the time series themselves or their statistical moments are used [63].

Resolution Variation
The simplest and most intuitive method for reducing the data volume of time series for ESMs is the variation of the TR. Here, three different procedures can be distinguished that have been commonly used in the literature: Downsampling is a straightforward method for reducing the TR by representing a number of consecutive discrete time steps by only one (longer) time step, e.g., a time series for one year of hourly data is sampled down to a time series consisting of 6 h time steps. Thus, the number of time steps that must be considered in the optimization is reduced to one sixth, as demonstrated by Pfenninger et al. [37]. As the averaging of consecutive time steps leads to an underestimation of the intra-time step variability, capacities for RES tend to be underestimated because their intermittency is Energies 2020, 13, 641 9 of 61 especially weakly represented [37]. Figure 5 shows the impact of downsampling the PV profile from hourly resolution to 6-h time steps, resulting in one sixth of the number of time steps. In comparison to the original time series, the underestimation of extreme periods is remarkable. This phenomenon also holds true for sub-hourly time steps [38,64,65] and, for instance, in the case of an ESM containing a PV cell and a battery for a residential building, this not only has an impact on the built capacities, but also on the self-consumption rate [38,65]. For wind, the impact is comparable [64]. As highlighted by Figure 4, downsampling can also be applied to typical periods. To the best of our knowledge, this was initially evaluated by Yokoyama et al. [66] with the result that it could be a crucial step to resolve a highly complex problem, at least close to optimality. The general tendency of downsampling to underestimate the objective function was shown in a subsequent work by Yokoyama et al. [67] and the fact that this is not necessarily the case when combined with other methods in a third publication [68]. Other works that deal with combined approaches will be discussed in Section 3.2.1.3.
Energies 2020, 13, x FOR PEER REVIEW 10 of 71 crucial step to resolve a highly complex problem, at least close to optimality. The general tendency of downsampling to underestimate the objective function was shown in a subsequent work by Yokoyama et al. [67] and the fact that this is not necessarily the case when combined with other methods in a third publication [68]. Other works that deal with combined approaches will be discussed in Section 3.2.1.3.

Segmentation
In contrast to downsampling, segmentation is a feature-based method of decreasing the TR of time series with arbitrary time step lengths. To the best of our knowledge, Mavrotas et al. [54] were the first to present an algorithm for segmenting time series to coarser time steps based on ordering the gradients between time steps and merging the smallest ones. Fazlollahi et al. [69] then introduced a segmentation algorithm based on k-means clustering in which extreme time steps were added in a second step. In both works, the segmentation methods were applied to typical periods, which will be explained in the following chapters. Bungener et al. [70] used evolutionary algorithms to iteratively merge the heat profiles of different units in an industrial cluster and evaluated the different solutions obtained by the algorithm with the preserved variance of the time series and the sum of zero-flow rate time steps, which indicated that a unit was not active. Deml et al. [71] used a similar, but not feature-based approach, as Mavrotas et al. and Fazlollahi et al. [54,69] for the optimization of a dispatch model. In this approach, the TR of the economic dispatch model was more reduced the further time steps lay in the future, following a discretized exponential function. Moreover, they compared the results of this approach to those of a perfect foresight approach for the fully resolved time horizon and a model-predictive control and proved the superiority of the approach, as it preserved the chronology of time steps. This was also pointed out in comparison to a typical periods approach by Pineda et al. [72], who used the centroid-based hierarchical Ward's algorithm [73] with the side condition to only merge adjacent time steps. Bahl et al. [74], meanwhile, introduced a similar algorithm as Fazlollahi et al. [69] inspired by Lloyd's algorithm and the partitioning around medoids algorithm [75,76] with multiple initializations. This approach was also utilized in succeeding publications [77,78]. In contrast to the approach of Bahl et al. [74], Stein et al. [79] did not use a hierarchical approach, but formulated an MILP in which not only extreme periods could be excluded beforehand, but also so that the grouping of too many adjacent time steps with a relatively small but monotone gradient could be avoided. The objective function relies on the minimization of the gradient error, similar to the method of Mavrotas et al. [54]. Recently, Savvidis et al. [80] investigated the effect of increasing the TR at times of the zero-crossing effect, i.e., at times when the energy system switches from the filling of storages to withdrawing and vice versa. This was compared to the opposite approach, which increased resolution at times without zero crossing. They also arrived at the conclusion that the use of irregular time steps is effective for decreasing the computational load without losing substantial information. Figure 6 shows advantages of the hierarchical method proposed by Pineda et al. [72] compared to the simple downsampling in Figure  5. The inter-daily variations of the PV profile are much more accurately preserved choosing 1460

Segmentation
In contrast to downsampling, segmentation is a feature-based method of decreasing the TR of time series with arbitrary time step lengths. To the best of our knowledge, Mavrotas et al. [54] were the first to present an algorithm for segmenting time series to coarser time steps based on ordering the gradients between time steps and merging the smallest ones. Fazlollahi et al. [69] then introduced a segmentation algorithm based on k-means clustering in which extreme time steps were added in a second step. In both works, the segmentation methods were applied to typical periods, which will be explained in the following chapters. Bungener et al. [70] used evolutionary algorithms to iteratively merge the heat profiles of different units in an industrial cluster and evaluated the different solutions obtained by the algorithm with the preserved variance of the time series and the sum of zero-flow rate time steps, which indicated that a unit was not active. Deml et al. [71] used a similar, but not feature-based approach, as Mavrotas et al. and Fazlollahi et al. [54,69] for the optimization of a dispatch model. In this approach, the TR of the economic dispatch model was more reduced the further time steps lay in the future, following a discretized exponential function. Moreover, they compared the results of this approach to those of a perfect foresight approach for the fully resolved time horizon and a model-predictive control and proved the superiority of the approach, as it preserved the chronology of time steps. This was also pointed out in comparison to a typical periods approach by Pineda et al. [72], who used the centroid-based hierarchical Ward's algorithm [73] with the side condition to only merge adjacent time steps. Bahl et al. [74], meanwhile, introduced a similar algorithm as Fazlollahi et al. [69] inspired by Lloyd's algorithm and the partitioning around medoids algorithm [75,76] with multiple initializations. This approach was also utilized in succeeding publications [77,78]. In contrast to the approach of Bahl et al. [74], Stein et al. [79] did not use a hierarchical approach, but formulated an MILP in which not only extreme periods could be excluded beforehand, but also so that the grouping of too many adjacent time steps with a relatively small but monotone gradient could be avoided. The objective function relies on the minimization of the gradient error, similar to the method of Mavrotas et al. [54]. Recently, Savvidis et al. [80] investigated the effect of increasing the TR at times of the zero-crossing effect, i.e., at times when the energy system switches from the filling of storages to withdrawing and vice versa. This was compared to the opposite approach, which increased resolution at times without zero crossing. They also arrived at the conclusion that the use of irregular time steps is effective for decreasing the computational load without losing substantial information. Figure 6 shows advantages of the hierarchical method proposed by Pineda et al. [72] compared to the simple downsampling in Figure 5. The inter-daily variations of the PV profile are much more accurately preserved choosing 1460 irregular time steps compared to simple downsampling with the same number of time steps.

Multiple Time Grids
The idea of using multiple time grids takes into account that different components that link different time steps to each other, such as storage systems, have different time scales on which they operate [14,15,81]. For instance, batteries often exhibit daily storage behavior, whereas hydrogen technologies [14,15] or some thermal storage units [81,82] have seasonal behavior. Because of this, seasonal storage is expected to be accurately modeled with a smaller number of coarser time steps. Renaldi et al. [61] applied this principle to a solar district heating model consisting of a solar thermal collector, a backup heat boiler, and a long-and a short-term thermal storage system to achieve the optimal tradeoff between the computational load and accuracy for modeling the long-term thermal storage with 6 h time steps and the remaining components with hourly time steps. It is important to highlight that the linking of the different time grids was achieved by applying the operational state of the long-term storage to each time step of the other components if they lay within the larger time steps of the long-term storage. This especially reduced the number of binary variables of the long-term storage (because it could not charge and discharge at the same time). However, increasing the step size led to an even further increase in calculation time, as the operational flexibility of the long-term storage became too stiff and the benefit from reducing the number of variables of the long-term storage decreased. Thus, this method requires knowledge about the characteristics of each technology beforehand. Reducing the TR of single components is a highly demanding task and is left to future research.

Typical Periods
The aggregation of time series into typical periods is based on the idea that energy systems behave similarly under similar external conditions, e.g., similar energy demands and capacity factors of RES [83]. Typical periods can consist of single time steps, which are called "system states" [19,[83][84][85][86], "snapshots" [63,87], or "external operation conditions" [88] in the literature, or periods containing more than one time step, e.g., "typical days" (TDs) or "representative days", which were used by the majority of authors. In the context of control engineering, the term "system states" is especially misleading, as the state of a system not only depends on external parameters such as capacity factors and demands to be fulfilled, but also on storage levels and other endogenous state variables. Therefore, the term "system state" in discrete ESMs is only equivalent to time steps if the system is not temporally coupled, i.e., neither state variables, nor intertemporal constraints linking them with each other exist. The following will refer to "typical time steps" (TTSs) if the typical period consists of only one time step. If not stated differently in the following, the authors used TDs. However, longer periods such as typical (also called representative) weeks ([57,89-91] ("typical weeks"), [92][93][94][95] ("representative weeks") also exist. This work only makes further use of the word "representative" in the context of clustering, as the representative of each cluster [96] is then

Multiple Time Grids
The idea of using multiple time grids takes into account that different components that link different time steps to each other, such as storage systems, have different time scales on which they operate [14,15,81]. For instance, batteries often exhibit daily storage behavior, whereas hydrogen technologies [14,15] or some thermal storage units [81,82] have seasonal behavior. Because of this, seasonal storage is expected to be accurately modeled with a smaller number of coarser time steps. Renaldi et al. [61] applied this principle to a solar district heating model consisting of a solar thermal collector, a backup heat boiler, and a long-and a short-term thermal storage system to achieve the optimal tradeoff between the computational load and accuracy for modeling the long-term thermal storage with 6 h time steps and the remaining components with hourly time steps. It is important to highlight that the linking of the different time grids was achieved by applying the operational state of the long-term storage to each time step of the other components if they lay within the larger time steps of the long-term storage. This especially reduced the number of binary variables of the long-term storage (because it could not charge and discharge at the same time). However, increasing the step size led to an even further increase in calculation time, as the operational flexibility of the long-term storage became too stiff and the benefit from reducing the number of variables of the long-term storage decreased. Thus, this method requires knowledge about the characteristics of each technology beforehand. Reducing the TR of single components is a highly demanding task and is left to future research.

Typical Periods
The aggregation of time series into typical periods is based on the idea that energy systems behave similarly under similar external conditions, e.g., similar energy demands and capacity factors of RES [83]. Typical periods can consist of single time steps, which are called "system states" [19,[83][84][85][86], "snapshots" [63,87], or "external operation conditions" [88] in the literature, or periods containing more than one time step, e.g., "typical days" (TDs) or "representative days", which were used by the majority of authors. In the context of control engineering, the term "system states" is especially misleading, as the state of a system not only depends on external parameters such as capacity factors and demands to be fulfilled, but also on storage levels and other endogenous state variables. Therefore, the term "system state" in discrete ESMs is only equivalent to time steps if the system is not temporally coupled, i.e., neither state variables, nor intertemporal constraints linking them with each other exist. The following will refer to "typical time steps" (TTSs) if the typical period consists of only one time step. If not stated differently in the following, the authors used TDs. However, longer periods such as typical (also called representative) weeks ( [57,[89][90][91] ("typical weeks"), [92][93][94][95] ("representative weeks") also exist. This work only makes further use of the word "representative" in the context of clustering, as the representative of each cluster [96] is then interpreted as the new typical period.
Analogous to the previous chapter, a number of time-based and feature-based methods exist that will be explained in the following.

Time-Based Merging
Time-based approaches of selecting typical periods rely on the modeler's knowledge of the model. This means that characteristics are included that are expected to have an impact on the overall design and operation of the ESM. As will be shown in the following, this was most frequently done for TDs, although similar approaches for typical weeks [89] or typical hours (i.e., TTSs) [97] exist. As pointed out by Schütz et al. [58], the time-based selection of typical periods can be divided into month-based and season-based methods, i.e., selecting a number of typical periods from either each month or from each season. However, we divide the time-based methods in consecutive typical periods and non-consecutive typical periods that are repeated as a subset with a fixed order in a pre-defined time interval.

Averaging
The method that is referred to as averaging in the following, as per Kotzur et al. [57], focuses on aggregating consecutive periods into one period. To the best of our knowledge, this idea was first introduced by Marton et al. [98], who also introduced a clustering algorithm that indicated whether a period of consecutive typical periods of Ontario's electricity demand had ended or not. In this way, the method was capable of preserving information about the order of TDs. However, it was not applied to a specific ESM. In contrast to that method, one TD for each month at hourly resolution, resulting in 288 time steps, was used by Mavrotas et al. [54], Lozano et al. [99], Schütz et al. [100], and Harb et al. [101]. Although thermal storage systems have been considered in the literature [99][100][101] (as well as a battery storage by Schütz et al. [100]), they were constrained to the same state of charge at the beginning and end of each day. The same holds true in the work of Kotzur et al. [57]. Here, thermal storage, batteries and hydrogen storage were considered and the evaluation was repeated for different numbers of averaged days. Buoro et al. [89] used one typical week per month to simulate operation cycles on a longer time scale. Kools et al. [102], in turn, clustered eight consecutive weeks in each season to one TD with 10 min resolution, which was then further down-sampled to 1 h time steps. The same was done by Harb et al. [101], who compared twelve TDs of hourly resolution to time series with 10 min. time steps and time series down-sampled to 1 h time steps. This illustrates that both methods, downsampling and averaging, can be combined. Voll et al. [103] aggregated the energy profiles even further with only one time step per month, which can also be interpreted as one TD per month down-sampled to one time step. To account for the significant underestimation of peak loads, the winter and summer peak loads were included as additional time steps. Figure 7 illustrates the impact of representing the original series by twelve monthly averaged consecutive typical days, i.e., 288 time steps instead of 8760.
steps. This illustrates that both methods, downsampling and averaging, can be combined. Voll et al. [103] aggregated the energy profiles even further with only one time step per month, which can also be interpreted as one TD per month down-sampled to one time step. To account for the significant underestimation of peak loads, the winter and summer peak loads were included as additional time steps. Figure 7 illustrates the impact of representing the original series by twelve monthly averaged consecutive typical days, i.e., 288 time steps instead of 8760.

Time Slices
To the best of our knowledge, the idea of time slices (TSs) was first introduced by the MESSAGE model [9,51] and the expression was reused for other models, such as THEA [104], LEAP [105], OSeMOSYS [106], Syn-E-Sys [107], and TIMES [48,49]. The basic idea is comparable to that of averaging, but not based on aggregating consecutive periods. Instead, TSs can be interpreted as the general case of time-based grouping of periods. Given the fact that electricity demand in particular not only depends on the season, but also on the weekday, numerous publications have used the TS method for differentiating between seasons and amongst days. In the following, this approach is referred to as time slicing, although not all of the cited publications explicitly refer to the method thus. Instead, the method is sometimes simply called "representative day" [66,67,[108][109][110][111][112][113], "TD" [54,[114][115][116][117][118][119][120][121], "typical daily profiles" [16,17], "typical segment" [122] "time slot" [52], or "time band" [123]. Accordingly, the term "TS" is used by the majority of authors [36,39,51,[104][105][106][107][124][125][126][127][128]. The most frequent distinction is made between the four seasons [16,17,36,39,[104][105][106][107]115,121,124,[126][127][128] or between summer, winter. and mid-season [40,51,54,66,67,91,[108][109][110]112,117,118,120,123,129,130], but other distinctions such as monthly, bi-monthly, or bi-weekly among others [40,51,54,111,113,114,116,119,122,125] can also be found. Within this macro distinction, a subordinate distinction between weekdays and weekend days [16,17,51,106,107,111,113,116,[121][122][123], weekdays, Saturdays, and Sundays [115,124,126], Wednesdays, Saturdays, and Sundays [104,105], or others, such as seasonal, median, and peak [40] can be found. In contrast to the normal averaging, each TS does not follow the previous one, but is repeated in a certain order a certain number of times (e.g., five spring workdays are followed 13 times by two weekend spring days before the summer periods follow). This is especially important when seasonal storages are modeled [16,17,106], which will be explained in greater depth in Section 3.2.4. As a visual inspection of Figures 7 and 8 shows, the TS method relying on the distinction between weekdays and seasons is not always superior to a monthly distinction. The reason for this is that some input data such as the PV profile from the example have no weekly pattern and spacing the typical periods equidistantly is the better choice in this case if no other input time series (such as, e.g., electricity profiles) must be taken into account. Thus, the choice of the aggregation method should refer to the pattern of the time series considered to be especially important for the ESM. For instance, the differences between week-and weekend days is likely more important to an electricity system based on fossil fuels and without storage technologies, whereas an energy system based on a high share of RES, combined heat and power technologies, and storage units is more affected by seasonality.
the typical periods equidistantly is the better choice in this case if no other input time series (such as, e.g., electricity profiles) must be taken into account. Thus, the choice of the aggregation method should refer to the pattern of the time series considered to be especially important for the ESM. For instance, the differences between week-and weekend days is likely more important to an electricity system based on fossil fuels and without storage technologies, whereas an energy system based on a high share of RES, combined heat and power technologies, and storage units is more affected by seasonality.

Time Slices/Averaging + Downsampling/Segmentation
Like the simple averaging of consecutive time periods that can be further sampled down, e.g., as done by Harb et al. [101], the typical periods in the TS method can also be further sampled down. This can be done, for instance, by downsampling to 2 h TSs [116,118,122], 4 h TSs [40], or a number of

Time Slices/Averaging + Downsampling/Segmentation
Like the simple averaging of consecutive time periods that can be further sampled down, e.g., as done by Harb et al. [101], the typical periods in the TS method can also be further sampled down. This can be done, for instance, by downsampling to 2 h TSs [116,118,122], 4 h TSs [40], or a number of different time step sizes to investigate the downsampling impact [66][67][68]. Moreover, day and night cycles (two diurnal TSs) [36,104,[126][127][128], optionally including the peak hour of the day [36,127,128] or other TSs of irregular length [39,54,106,107,112,123,129], were also used. Mavrotas et al. [54] also implemented an algorithm for segmenting the chosen TDs to coarser TSs based on ordering of the gradients between time steps and merging the smallest ones.
The extreme case of both the downsampling method and averaging/TS method is the representation of the total time series by its mean, which was performed by Merrick et al. [40]. As this approach is unable to consider any dynamic effects, it only served as a benchmark.

Feature-Based Merging
In contrast to representing time series with typical periods based on a time-based method, typical periods can also be chosen on the basis of features. In this section, the clustering procedure is explained both conceptually and mathematically. To the best of our knowledge, one of the first and most frequently cited works by Domínguez-Muñoz et al. [55] used this approach to determine typical demand days for a CHP optimization, i.e., an energy system optimization model with discrete time steps, even though it was not applied to a concrete model in this work. For this purpose, all time series are first normalized to encounter the problem of diverse attribute scales. Then, all time series are split into periods P, which are compared to each other by transforming them for each value x of each attribute a at each time step t within the period to a hyper-dimensional data point. Those data points with low distances to each other are grouped into clusters and represented by a (synthesized or existing) point within that cluster considered to be a "typical" or "representative" period. Additionally, a number of clustering algorithms are not centroid-based, i.e., they do not preserve the average value of the time series [58] which could, e.g., lead to a wrong assumption of the overall energy amount provided by an energy system across a year. To overcome this problem, time series are commonly rescaled in an additional step. The methods for this are presented in Sub-Section 3.2.2.3. This means that time series clustering includes five fundamental aspects: • A normalization (and sometimes a dimensionality reduction).
A method to choose representatives [59]. • A rescaling step in the case of non-centroid based clustering algorithms. As the clustered data are usually relatively sparse, while the number of dimensions increases with the number of attributes, the curse of dimensionality may lead to unintuitive results incorporating distance metrics [131], such as the Euclidean distance [59,[132][133][134]. Therefore, a dimensionality reduction might be used in advance [135][136][137], but is not further investigated in this work for the sake of brevity. In the following, each of the bullet points named above will be explained with respect to their application in TSA for ESMs. Further, the distance metric, clustering method, and the choice of representatives will jointly be presented in Section 3.2.2.2, because the number of clustering methods used for ESMs is small. Figure 9 shows the mandatory steps for time series clustering used for ESMs, which are presented in the following. The grey boxes contain optional methods for maintaining additional information that is important for the system design and which are presented in Section 4. Figure 10 shows the time series of photovoltaic capacity factors represented by 12 typical days (TDs) using k-means clustering and the python package tsam [57].

Preprocessing and Normalization
Clustering normally starts with preprocessing the time series, which includes a normalization step, an optional dimensionality reduction and an alignment step. Because of the diversity of scales and units amongst different attributes, they must be normalized before applying clustering algorithms to them. Otherwise, distance measures used in the clustering algorithm would focus on large-scaled attributes and other attributes would not be properly represented by the cluster centers. For example, capacity factors are defined as having values of between zero and one, whereas electricity demands can easily reach multiple gigawatts. Although a vast number of clustering algorithms exist, the min-max normalization is used in the majority of publications [14,18,39,57,58,69,92,93,135,[138][139][140][141] In cases in which the natural lower limit is zero, such as time series for electricity demands, this is sometimes [37,86,88,94,[142][143][144][145] reduced to:

Preprocessing and Normalization
Clustering normally starts with preprocessing the time series, which includes a normalization step, an optional dimensionality reduction and an alignment step. Because of the diversity of scales and units amongst different attributes, they must be normalized before applying clustering algorithms to them. Otherwise, distance measures used in the clustering algorithm would focus on large-scaled attributes and other attributes would not be properly represented by the cluster centers. For example, capacity factors are defined as having values of between zero and one, whereas electricity demands can easily reach multiple gigawatts. Although a vast number of clustering algorithms exist, the min-max normalization is used in the majority of publications [14,18,39,57,58,69,92,93,135,[138][139][140][141] In cases in which the natural lower limit is zero, such as time series for electricity demands, this is sometimes [37,86,88,94,[142][143][144][145] reduced to:

Preprocessing and Normalization
Clustering normally starts with preprocessing the time series, which includes a normalization step, an optional dimensionality reduction and an alignment step. Because of the diversity of scales and units amongst different attributes, they must be normalized before applying clustering algorithms to them. Otherwise, distance measures used in the clustering algorithm would focus on large-scaled attributes and other attributes would not be properly represented by the cluster centers. For example, capacity factors are defined as having values of between zero and one, whereas electricity demands can easily reach multiple gigawatts. Although a vast number of clustering algorithms exist, the min-max normalization is used in the majority of publications [14,18,39,57,58,69,92,93,135,[138][139][140][141]. For the time series of an attribute a ∈ A = {1, . . . , N a } consisting of s ∈ S = {1, . . . , N s } time steps, the normalization to the values assigned to a in time step s is calculated as follows: In cases in which the natural lower limit is zero, such as time series for electricity demands, this is sometimes [37,86,88,94,[142][143][144][145] reduced to: Another normalization that can be found in the literature [41,97,[146][147][148] is the z-normalization that directly accounts for the standard deviation, rather than for the maximum and minimum outliers, which implies a normal distribution with different spreads amongst different attributes: In Appendix C, the normalization approaches are exemplarily illustrated for a hypothetical short time series.
In the following, the issue of dimensionality reduction will not be considered due to the fact that it is only used in a small number of publications [135][136][137] and transforms the data into eigenvectors to tackle the non-trivial behavior of distance measures used for clustering in hyper-dimensional spaces [133].
A time series can further be divided into a set of periods P and a set of time steps within each period T, i.e., S = P × T. The periods are clustered into non-overlapping subsets P C , which are then represented by a representative period, respectively. A representative period consists of at least one discrete time step and, depending on the number and duration of time steps, it is often referred to as a typical hour, snapshot or system state, typical or representative day, or typical week. The data D = A × P × T can thus be rearranged so that each period is represented by a row vector in which all inter-period time steps of all attributes are concatenated, i.e., The row vectors of D arr are now grouped with respect to their similarity. Finally, yet importantly, it must be highlighted that the inner-period time step values can also be sorted in descending order, which means that, in this case, the duration curves of the periods are clustered as done in other studies [18,140,149,150]. This can reduce the averaging effect of clustering time series without periodic patterns such as wind time series.

Algorithms, Distance Metrics, Representation
Although a vast number of different clustering algorithms exist [96,151] and have been used for time series clustering in general [59], only a relatively small number of regular clustering algorithms have been used for clustering input data for energy system optimization problems, which will be presented in the following. Apart from that, a number of modified clustering methods have been implemented in order to account for certain properties of the time series, which will be part of Section 3.2.3. The goal of all clustering methods is to meaningfully group data based on their similarity, which means minimizing the intra-cluster difference (homogeneity) or maximizing the inter-cluster difference (separability) or a combination of the two [152]. However, this depends on the question of how the differences are defined. To begin with, the clustering algorithms can be separated into partitional and deterministic hierarchical algorithms.

Partitional Clustering
One of the most common partitional clustering algorithms used in energy system optimization is the k-means algorithm, which has been used in a variety of studies [14,15,24,37,57,58,63,69,74,78,[83][84][85][86][87]97,[137][138][139]141,142,[145][146][147][148][153][154][155][156][157][158][159][160][161]. The objective of the k-means algorithm is to minimize the sum of the squared distances between all cluster members of all clusters and the corresponding cluster centers, i.e., The distance metric in this case is the Euclidean distance between the hyperdimensional period vectors with the dimension dim(vec(T × A)) and their cluster centers c k , i.e., where the cluster centers are defined as the centroid of each cluster, i.e.: This NP-hard problem is generally solved by an adopted version [76] of Lloyd's algorithm [75], a greedy algorithm that heuristically converges to a local minimum. As multiple runs are performed in order to improve the local optimum, improved versions (such as k-means++) for setting initial cluster centers have also been proposed in the literature [162].
The only difference regarding the k-medoids algorithm is that the cluster centers are defined as samples from the dataset that minimize the sum of the intra-cluster distances, i.e., that are closest to the clusters' centroids. c k = argmin This clustering algorithm was used by numerous authors [14,19,41,55,57,58,74,78,86,139,141,149,150,159,[163][164][165][166][167], either by using the partitioning around medoids (PAM) introduced by Kaufman et al. [168] or by using an MILP formulation introduced by Vinod et al. [169] and used in several studies [14,41,55,57,139,159,164]. The MILP can be formulated as follows: Subject to: In a number of publications [41,57,58,86,139,141,159], k-medoids clustering was directly compared to k-means clustering. The general observation is that k-medoids clustering is more capable of preserving the intra-period variance, while k-means clustering underestimates extreme events more gravely. Nevertheless, the medoids lead to higher root mean squared errors compared to the original time series. This leads to the phenomenon that k-medoids outperforms k-means in the cases of energy systems sensitive to high variance, as in self-sufficient buildings, e.g., as shown by Kotzur et al. [57] and Schütz et al. [139]. In contrast to that, k-means outperforms k-medoids clustering in the case of smooth demand time series and non-rescaled medoids that do not match the overall annual demand in the case of k-medoids clustering, as shown by Zatti et al. [141] for the energy system of a university campus.

Agglomerative Clustering
In contrast to partitional clustering algorithms that iteratively determine a set consisting of k clusters in each iteration step, agglomerative clustering algorithms such as Ward's hierarchical algorithm [73] stepwise merge clusters aimed at minimizing the increase in intra-cluster variance (13) in each merging step until the data is agglomerated to k clusters. The algorithm is thus deterministic and does not require multiple random starting point initializations. Analogously to k-means and k-medoids, the cluster centers can either be represented by their centroids [41,159] or by their medoids [18,37,41,57,72,86,135,140,143,144,148,159]. The general property that centroids underestimate the intra-period variance more severely due to the averaging effect is equivalent to the findings when using k-means instead of k-medoids.

Rarely Used Clustering Algorithms
Apart from the frequently used clustering algorithms in the literature, two more clustering algorithms were used in the context of determining typical periods based on unsorted time intervals of consistent lengths.
K-medians clustering is another partitional clustering algorithm that is closely related to the k-means algorithm and has been used in other studies [58,139]. Taking into account that the Euclidean distance is only the special case for of the Minkowski distance [170] K-medians generally tries to minimize the sum of the distances of all data points to their cluster center in the Manhattan norm, i.e., for γ = 1 and the objective function [171,172]: For this, the L1 distance is usually used in the assignment step [171] and the median is calculated in each direction to minimize the L1 distance within each cluster [172]. However, Schütz et al. [58,139] used the Euclidean distance (like for k-means) in the assignment step to isolate the influence of using dimension-wise medians instead of dimension-wise means (i.e., centroids). Thus, all values come from the original dataset, but not necessarily from the same candidates [58].

Time Shift-Tolerant Clustering Algorithms
The last group of clustering algorithms applied for TSA in ESMs is time shift-tolerant clustering algorithms. These algorithms not only compare to the values of different time series at single time steps (pointwise), but also compare values along the time axis with those of other time series (pairwise). In the literature [41,159], dynamic time warping (DTW) and the k-shape algorithm are used, both of which are based on distance measures that are not sensitive to phase shifts within a typical period, which is the case for the Euclidean distance. The dynamic time-warping distance is defined as: where w describes the so-called warping path, which is the path of minimal deviations across the matrix of cross-deviations between any entry of x p and any entry of c k [41,174]. The cluster centers c k are determined using DTW Barycenter averaging, which is the centroid of each time series value (within an allowed warping window) assigned to the time step [175]. Moreover, a warping window [41,159] can be determined that limits the assignment of entries across the time steps. Shape-based clustering uses a similar algorithm and tries to maximize the cross-correlation amongst the periods. Here, the distance measure to be minimized is the cross-correlation and the period vectors are uniformly shifted against each other to maximize it [41,159,174,176]. It must be highlighted that both dynamic time warping and shape-based distance, have only been applied on the clustering of electricity prices, i.e., only one attribute [41,159]. Moreover, Liu et al. [148] also applied dynamic time warping to demand, solar, and wind capacity factors simultaneously. However, it is unclear how it was guaranteed that different attributes were not compared to each other within the warping window which remains a field of future research. Furthermore, a band distance, which is also a pairwise rather than a pointwise distance measure, was used in a k-medoids algorithm by Tupper et al. [167], leading to significantly less loss of load when deriving operational decisions for the next day using a stochastic optimization model.

Rescaling
Due to the fact that not all of the methods rely on the representation of each cluster by its centroid (i.e., the mean in each dimension), these typical periods do not meet the overall average value when weighted by their number of appearances and must be rescaled. This also holds true for the consideration of extreme periods, which will be explained in the following chapters. Accordingly, the following section will be referred to if rescaling is considered in the implementation of extreme periods. To the best of our knowledge, the first work that used clustering not based on centroids was that of Domínguez-Muñoz et al. [55], in which the exact k-medoids approach was chosen as per Vinod et al. [169]. Here, each attribute (time series) of each TD was rescaled to the respective cluster's mean, i.e., Furthermore, Domínguez-Muñoz et al. [55] discarded the extreme values that were manually added from the rescaling procedure. A similar procedure, which was applied for each time series, but not for each TD, was introduced by Nahmmacher et al. [143], who used hierarchical clustering based on Ward's algorithm [73] and chose medoids as representatives, which was later used in a number of other studies [14,18,41,57,140,159]. Here, all representative days were rescaled to fit the overall yearly average when multiplied by their cardinality and summed up, but not the average of their respective clusters, i.e., Schütz et al. [58,139], Bahl et al. [74], and Marquant et al. [149,150] refer to the method of Domínguez-Muñoz et al. [55], but some used it time series-wise and not cluster-and time series-wise. Schütz et al. [58,139] were the first to highlight that both approaches are possible. It also needs to be highlighted that these methods are not the only methods, as Zatti et al. [141], for instance, presented a method to choose medoids within the optimization problem without violating a predefined maximum deviation from the original data, but for the sake of simplicity, it focused on the most frequently used post-processing approaches. Additionally, other early publications, such as by Schiefelbein et al. [163], did not use rescaling at all. Finally, yet importantly, the rescaling combined with the min-max normalization could lead to values over one. Accordingly, these values were reset to one so as to not overestimate the maximum values and the rescaling process was re-run in several studies [14,18,57,140,143]. In contrast, Teichgräber et al. [41,159] used the z-normalization with rescaling in accordance with Nahmmacher et al. [143], but did not assure that the original extreme values were not overestimated by rescaling.

Modified Feature-Based Merging
Apart from the methods that are based on the direct clustering of the time series' values or periods, a number of methods exist that group time series in a consecutive manner [53], by means of other features, such as sorted time series (i.e., duration curves) [18,20,[92][93][94][95]140,144,149,150,177] or other statistical features such as the average, variance, minimal and maximal values [63], or predefine the clusters based on additional information [88]. These methods will be presented in the following.
With respect to grouping consecutive typical periods, an early publication by Balachandra et al. [53] started by grouping daily residual load profiles by month, and then applied multiple discriminant analysis to these groups and reclassified the days at the beginning or end of a group (month) to the preceding or subsequent group if they were more similar to it. This resulted in nine consecutive groups represented by their centroids. However, this aggregation was not applied to an energy system optimization.
Furthermore, a number of publications [20,94,95,144] rely on the principle introduced by Poncelet et al. [177]. For this, the normalized duration curves were placed into bins, i.e., how many hours of the year surpass a certain level between zero and the maximum level of the specific attribute. The same was performed for each candidate day. Then, the sum of absolute differences between the hours at which the reference curve surpassed a bin border and the hours at which the curve derived from a linear combination of a given number of candidates surpassed the same bin borders was minimized in an MILP.
Another approach aimed at reproducing a yearly duration curve was introduced by de Sisternes et al. [92,93]. Here, the duration curve of power feed-in by wind and solar at a certain penetration level was calculated and approximated by an exhaustive search for a combination from a subset of typical weeks. As this was a combinatorial problem, the computation time rapidly increased for higher numbers of weeks. In a later publication [92], the variability of the selected weeks was used as an additional metric.
Instead of clustering the original time series, the yearly duration curve was approximated in a number of publications [18,140,149,150]. For this, the candidate days were simply sorted prior to being clustered. This decreased the averaging effect of statistical events, such as wind, as the largest value and second largest, etc. always lay in the first dimension and second dimension, etc.
With respect to the clustering of other statistical features apart from the distribution curve (duration curve), Agapoff et al. [63] applied k-means clustering to snapshots (i.e., TTSs) and used different features for the clustering: either absolute values or the average, minimum, maximum, and standard deviation of all considered regions for either price differences, non-controllable demand and generation, or both. This is an extension with promising results to all thus far used clustering algorithms only applied to normalized absolute values. Finally, yet significantly, Lythcke-Jørgensen et al. [88] introduced a so-called CHOP-method that was based on splitting the range of each attribute, in this case the power price and relative heat demand on a five-year basis, into different intervals based on important values (e.g., zero-price) and even divisions between them. Then, all values (i.e., hours) were transferred in a 2d space in which the intervals for both attributes formed a grid. From each cell, the centroid was subsequently calculated if it contained any candidate hours. As information about the chronology of these TTSs was lost, the design of storage technologies resulted in large deviations from the reference case.
These cases highlight that methods based on well-known approaches are constantly customized for specific ESMs and improved where possible, which illustrates that the development of TSA methods is a dynamic process.

Linking Typical Periods
As mentioned above, some components, such as storage components of ESMs, link consecutive time steps by means of intertemporal constraints. The representation of time series by a few TDs or weeks does not generally take their order across the entire time horizon into account. This means that the modeling of filling levels is normally only possible within these typical periods with a periodic boundary condition for the state of charge. In this case, the order of typical periods no longer plays a role. On the other hand, seasonal storage cannot be sufficiently modeled by this method. Yet, this is especially important for energy systems based on a high share of RES. As per Bauer et al. [81], central solar heating plants with introduced short-term heat storages can typically supply 15-20% of the total residential heating demand. With seasonal heat storages, this fraction can be increased to about 50%. For a long period of time, the only approach to model seasonal storages was to drastically reduce TR, as by Tveit et al. [178], making it impossible to model short-term storages. To overcome this issue, different methods have been developed that take the linking of TDs into account.
As far as we know, the TIMES framework was the first to deal with linking TSs not only consecutively, but also inter-period storages that work on a larger time scale [46][47][48][49]. However, since the inter-period storages are meant to work between different years, e.g., as waste disposal sites [46], they are not linked to the intra-period storages, which only link consecutive TSs (segments) within one typical period, such as weekdays in spring.
Welsch et al. [106] and Samsatli et al. [16] independently developed a non-uniform hierarchical time discretization that is based on the selection of TSs. In two publications [16,17], Samsatli et al. chose two TDs with hourly data for both the week and weekend which was done for each season consisting of 13 weeks. This resulted in 192 time steps. For the modeling of the seasonal storage, the energy surplus across each time scale was determined and added up. As the chosen days appeared in a regular order within each season, the capacity constraints were not postulated for each time step. Instead, they were only defined for the first and last instance of each day type, the first and last week of each season, and the first and last season of each year, if a multiple year approach was chosen. Welsch et al. [106] chose a similar approach that consisted of three TSs for a workday and a weekend day in each season. However, the case study was only run with one TD with an hourly resolution.
Both approaches did not consider a self-discharge rate. The approach of Welsch et al. [106] was later developed by Timmerman et al. [107] to handle self-discharge and re-used by van der Heijde et al. [95]. Since the typical days in these publications [16,17,95,106,107] are aligned in a regular manner, the critical storage levels can only be reached at certain time steps which significantly reduces the number of side constraints. Taking the configuration used by Timmerman et al. [107] as an example, five identical workdays alternate 13 times per season with two identical weekend days. As each week consists of only two day types, of which the first is repeated five times and the second is repeated twice, the intermediate weekdays representing Tuesday, Wednesday, and Thursday cannot include critical states of charge-neither for a rising state of charge across the weekdays (the critical day would be Friday), nor for a decreasing state of charge across the weekdays (the critical day would be Monday). The same holds true for the intermediate weeks in each season. As they are repeated 13 times, either Similarly, but again independently, Spiecker et al. [116] developed a comparable approach that linked workdays and weekend days for every second month in an inter-day manner for pumped storage plants and an inter-month manner for large-scale storage systems in the E2M2s model. Moreover, the TDs were based on a recombining decision tree of 2 h segments and were thus capable of modeling the storage size stochastically.
Gabrielli et al. [15] developed a method to couple TDs using a function that assigns each day of the original time series to the TD it is represented by. This function is used to couple the state of charge of consecutive (typical) days in an additional equation and means that the operation of the components is modeled for a number of TDs, while the state of charge of the storages is modeled for the entire time horizon represented by a sequence of TDs. The approach was tested for a different number of TDs, as well as in a later publication [15,160].
Wogrin et al. [85] earlier proposed the same approach as Gabrielli et al. [15] for TTSs and took the information of the clustering indices, i.e., which original time step was represented by which TTS, to link TTS in order to consider start-up and shut-down costs, which was later re-used by Tejada-Arango et al. [19] for the calculation of storage levels using typical periods (days and weeks). However, in contrast to Gabrielli et al. [15], the storage levels were not constrained for each time step by Tejada-Arango et al. [19], but only at intervals of one week. Additionally, a similar method was applied to avoid unnecessary unit transitions at the border between two consecutive TDs.
Like the idea of Gabrielli et al. [15], Kotzur et al. [14] introduced a similar method of linking TDs in a chronologically correct order. Instead of directly linking each state of charge to the preceding one, the superposition principle was used to distinguish intraday and interday states of charge. Here, the interday state of charge describes the state of charge at the beginning of each day, while the intraday state of charge is defined to be zero at the beginning of each day but is defined for each hour of each TD. The sum of both values, i.e., the intraday state of charge for a given number of TDs, along with the interday state of charge, which was determined by the sum of storage level differences of each TD in the corresponding sequence, was then used to determine the storage levels at each time step. This approach was also used in later publications dealing with seasonal storage [18,140].
Another slight deviation of this method was applied by van der Heijde et al. [20], who also used the superposition principle discussed by Kotzur et al. [14] to couple TDs. However, they did not use clustering algorithms to group similar days and represented these by one TD for each cluster, but instead searched for a linear combination of days that minimized the deviation from the yearly duration curve; a procedure introduced by Poncelet et al. [144]. In contrast to clustering algorithms, this procedure did not directly lead to an assignment of original days to groups represented by single TDs. This meant that this had to be performed in a separate step. For this, a mixed integer quadratic programm (MIQP) problem was formulated that sought to minimize the sum of squared errors of each day of the original time series to the TDs. The outcome of this was a sequence of TDs that represented the original time series, which was crucial for linking the TDs in accordance with the aforementioned approach of Kotzur et al. [14]. Recently, Baumgärtner et al. [77] included the storage formulation of Kotzur et al. [14] in their rigorous synthesis of energy systems using aggregation approaches to define upper and lower bounds for the objective function with full time resolution, which will be explained in detail in Section 4.2.
The fact that a number of methods for linking typical periods were independently developed [14][15][16]85,106] shows the value of giving an overview over all of the methods employed so far.

Random Sampling
Another minor group of publications uses TSA based on random sampling. This means that the time steps or periods are randomly chosen from the original time series and considered to be representative for the entire time series. Most of the methods in the following deal with single time Energies 2020, 13, 641 22 of 61 steps instead of periods, which is an acceptable simplification when the impact of storage capacity or other intertemporal constraints on the system design can be neglected [166]. In contrast to the methods presented above, the time steps or periods are thus neither time-nor feature-based grouped or merged. Methods that are only run once based on random or user-specified selection will be defined as "3.3.1. Unsupervised". However, the majority of random sampling methods presented in the literature are repeated several times in order to determine a set of random samples that best captures the original time series' features. In the following, these methods are termed "3.3.2. Supervised".

Unsupervised
As with supervised random sampling methods, unsupervised random sampling methods can be applied to typical periods or single time steps. However, they appeared earlier than the supervised methods (2011 and 2012).
Ortiga et al. [179] introduced a graphical method for which a number of days from the dataset had to be defined. In a second step, the algorithm minimized the deviation between the duration curve of the original dataset and a duration curve of the chosen periods multiplied by a set of variable factors for the number of appearances of each TD.
With respect to the random sampling of time steps, Van der Weijde et al. [180] sampled 500 out of 8760 h to capture major correlations of the input data for seven regions.
However, in the years since 2012, these methods were substituted by supervised random sampling methods.

Supervised
Munoz et al. [181] applied supervised random sampling for 1 up to 300 daily samples out of a dataset of seven years, which were then benchmarked against the k-means clustering of typical hours. A similar method was used by Frew et al. [182], who took two extreme days and eight random days from the dataset and weighted each day so that the sum of squared errors to the original wind, solar and load distribution was minimized. This procedure was then repeated for ten different sets of different days, with the average of each optimization outcome calculated at the end. With respect to time steps, Härtel et al. [86] either systematically determined samples taking every n th element from the time series or randomly chose 10,000 random samples from the original dataset and selected the one that minimized the deviation to the original dataset with respect to moments (e.g., correlation, mean and standard variation). Another complex algorithm for representing seasonal or monthly wind time series was proposed by Neniškis et al. [51] and tested in the MESSAGE model. This approach took into account both the output distribution (duration curve) for a TD and the inter-daily variance, not to be exceeded by more than a predefined tolerance, while using a random sampling process. However, only the typical days for wind were calculated in this way, whereas the other time series (electricity and heat) were chosen using TS. Recently, Hilbers et al. [166] used the sampling method twice with different numbers of random initial samples drawn from 36 years. From a first run, the 60 most expensive random samples were taken and included in a second run with the same number of samples.
These methods are fairly comparable to the method of clustering TTSs. However, the initial selection of samples is based on random choice.

Miscellaneous Methods
Apart from the random sampling methods that cannot be systematically categorized with the scheme in Figure 4, an even smaller number of publications cannot be grouped in any way with respect to their TSA methods. For the sake of completeness, however, they are presented in the following.
Lee et al. [183] used an improved particle swarm optimization to optimize the UC of a power system with respect to fuel and outage costs. This method was based on an evolutionary algorithm that iteratively determined the "fittest" solutions and thus was quite comparable to supervised random sampling methods. However, the use of an own class of optimization algorithm is a unique feature. A similar approach to solve the UC problem of a grid-connected building with renewable energy sources and a battery was presented by Quang et al. [184]. In their work, a genetic algorithm and a particle swarm algorithm were used for different charge and discharge rates of the battery based on half-hourly time steps. It is worth mentioning that apart from these publications, a number of other works exist which use, among other methods, genetic algorithms or particle swarm algorithms to UC models. A comprehensive review on the methods to address the UC problem was given by Saravanan et al. [185]. However, these approaches are based on a survival of the fittest principle and not on a classic optimization problem so that an aggregation can only be applied by downsampling the time steps used for simulation. Moreover, these approaches are not directly applicable to combined UC and GEP models. Therefore, these methods are not further analyzed within the scope of this paper.
Xiao et al. [186] optimized the capacity of a battery and a diesel generator for an island system by searching for the optimal cut-off frequency at which the running of a diesel generator was more convenient without causing overly high fuel costs, whereas the battery capacity would be too large if it was run on a high frequency band. For this, an analysis based on discrete Fourier transform (DFT) was used, highlighting the different specific cost-dependent time scales on which different technologies operate.
More recently, Pöstges et al. [187] introduced an analytical approach to aggregate the time steps of a demand duration curve for a simple ESM without storage units and with only one energy type. Interestingly, this method led to a simplified problem formulation based on a minimum number of time steps without causing an error in the objective function. In this case, the supply technology costs are based on capacity-and operation-specific costs and the approach was inspired by an earlier work of Sherali et al. [6]. Sherali et al. proved in 1982 that the cost optimal operation of these simple systems can be interpreted as an optimization problem which is closely related to the peak load pricing theory introduced by Boiteux in 1949 [3] (English translation in 1960 [4]) and Steiner in 1957 [5].
To summarize, special methods that cannot be categorized in any way appear in an irregular manner, but can have special implications for the improvement of preexisting methods.

Overview and Trends in Aggregation
Due to the fact that the methods in Table 3 can be combined with each other and are either based on the careful selection of the modeler or on feature-based algorithms, it is an open question whether a clear trend can be observed with respect to the application of the methods.
For this purpose, Figure 11 shows the number of investigated publications containing at least one of the basic aggregation methods presented above. The random sampling and the miscellaneous methods were disregarded due to the small number of publications with no statistical significance. Moreover, the modified feature-based period merging methods were considered to belong to the same group of feature-based merging as the normal clustering methods for typical periods. Moreover, it should be highlighted that the search for literature was ended in July 2019 and that the trends are methodology-driven and not keyword-driven for the reasons given in Section 2.1.
At first sight, a comparison between the straightforward downsampling and feature-based segmentation reveals no trend. However, publications dealing with downsampling mainly address the question what TR is sufficient for a given problem, rather than improving the calculation time of a problem with a given TR without deteriorating the results. Furthermore, downsampling sometimes also only serves as a benchmark [37] that is outperformed by the other existing methods. In contrast to that, the development of slightly variated segmentation methods is ongoing and could even offer the option to iteratively increase the TR at crucial time steps instead of coarsening only.
one of the basic aggregation methods presented above. The random sampling and the miscellaneous methods were disregarded due to the small number of publications with no statistical significance. Moreover, the modified feature-based period merging methods were considered to belong to the same group of feature-based merging as the normal clustering methods for typical periods. Moreover, it should be highlighted that the search for literature was ended in July 2019 and that the trends are methodology-driven and not keyword-driven for the reasons given in Section 2.1. Figure 11. Trends in basic time series aggregation (TSA) methods for energy system models (ESMs) based on the major approaches presented in Section 3.
At first sight, a comparison between the straightforward downsampling and feature-based segmentation reveals no trend. However, publications dealing with downsampling mainly address the question what TR is sufficient for a given problem, rather than improving the calculation time of a problem with a given TR without deteriorating the results. Furthermore, downsampling sometimes also only serves as a benchmark [37] that is outperformed by the other existing methods. With regard to typical periods, the feature-based methods mainly represented by clustering have a rising trend, in contrast to the time-based definition of TSs and "averaging". Interestingly, the number of publications based on TSs kept increasing for some time after the development of the clustering approach in 2011. The reasons for this are twofold: First, the approach was only proposed by Domínguez-Muñoz et al. [55], but its superiority was not proven in an energy system model. Secondly, models such as the TIMES framework [46][47][48][49] have constantly been used ( [105,124,126,127]) since their publication. Accordingly, the method expires no sooner than the framework by which it is used unless the framework itself is updated. This highlights the inertia of new methods and the need for proper validation and benchmarking rather than the simple proposal of a method alone. Additionally, the share of RES is slowly increasing in energy systems and, accordingly, the requirements for models and their TR are changing as well [36][37][38][39][40][41].
Last but not least, the small number of publications that deal with a decrease in the TR, in contrast to the high number of typical period approaches, is notable. This is due to the relatively low potential of decreasing the number of time steps in energy system optimizations if the periodicity of day and night cycles is not exploited. However, the impact of larger time steps can be increased by magnitudes if it is combined with a typical period approach.
All things considered, Figure 11 shows that the future aggregation methods will most likely be feature-based, i.e., either consist of clustering only or rely on both clustering and segmentation. Table 4 sums up the key aspects for this trend towards feature-based merging. The combination of clustering and segmentation in order to compensate their remaining shortcomings named in Table 4 was first applied by Mavrotas et al. [54], later by Fazlollahi et al. [69] and a similar approach was recently used by Bahl et al. [74] and Baumgärtner et al. [77,78]. However, a detailed examination if there is an optimal trade-off between intra-period resolution and the number of periods remains a subject for future research.

Preserving Additional Information
As highlighted in Section 3, TSA methods are based on the representation of discrete time series by less time steps. These approaches are usually approximation methods, i.e., not analytically equivalent transformations, which often also include averaging procedures. From this, two major drawbacks arise:

•
Values of the original time series which could be especially important for the ESM are usually not preserved.

•
A reliable estimation of the deviation of the optimization result based on aggregated time series from the one based on full time series can usually not be given.
In order to address the first problem, Section 4.1 presents the approaches found in literature to keep additional information of the original time series considered to be important for the ESM during the aggregation process. Section 4.2 introduces methods to re-evaluate the quality of the aggregation after solving the aggregated ESM optimization to address the second issue.

A Priori Methods
Apart from the methods presented for TSA, the integration of periods or time steps considered to be "extreme" is a common procedure not only used in heuristic time-based, but also in feature-based approaches such as segmentation and clustering. Most of the methods are based on the assumption that extreme values in the input data lead to a design that is robust for all remaining time steps so that integrating these extreme periods ensures a feasible system design, despite the TSA.
In this section, approaches based on the input data only are presented, i.e., a priori methods. The integration of time series features considered to be extreme can happen in three different ways: by adding extreme periods to the set of typical periods, by the inclusion of extreme periods or time steps into typical periods using replacement, or by directly modifying the corresponding feature-based merging algorithm used for TSA in such a way that it automatically accounts for atypical periods.

Adding Extreme Periods
A straightforward approach to consider extreme values is to directly add them to the aggregated time series. Of course, this depends on the way in which the time series are aggregated. In the case of TTSs, i.e., single time steps that were derived from the original input data, extreme values can simply be taken from the original input data, e.g., Munoz et al. [181] forced the top ten peak demand hours to be individual clusters for the IEEE Reliability Test System [188]. The same holds true for ESMs based on TSs. As Devogelaer et al. [125] pointed out, the TIMES framework generally uses three daily levels as TSs: Day, night and a short peak slice (for electricity demand), which was also cited in other papers [36,127,128]. Additionally, Mallapragada et al. [39] used TSs without a peak TS, but highlighted that the original set-up in the ReEDS model [189], which the method was inspired by, used an additional TS that captured all the peak loads throughout a year. Similarly, Voll et al. [103] added two more time steps for winter and summer peak loads to their monthly-averaged demand profiles.
Extreme periods are usually defined as periods containing an extreme value of at least one attribute. For instance, Domínguez-Muñoz et al. [55] and Ortiga et al. [179] included the days containing the peak heating and peak cooling demands of their building models. The same was done for typical weeks by de Sisternes et al. [92,93] by either adding the week or a separate day containing the peak net-load hour. It was also pointed out that the integration of an additional day affected the approximation of the duration curve less than forcing the algorithm in selecting an entire week. Stadler et al. [113] included one peak demand day per month in their DER-CAM model. Wakui et al. [108][109][110], in turn, included one peak day for winter and one for summer regarding the energy demand of a residential building. Here, it is not clear if this applied for the overall demand of hot water and electricity, or the cumulative sum of energy demand throughout the day, as the peak value for hot water supply demand was smaller than that for the regular summer day. Marquant et al. [149,150] included a peak heating and peak electricity demand day for a district energy supply system, while neglecting the extreme values of possible PV feed-in in the latter publication [150]. Frew et al. [182] not only included maximum days, but also minimum days for each attribute into their POWER model [190]. For this, an extreme day was defined as a day that included the peak or minimum value of one of the three attributes of wind, solar, or e-demand averaged across all eligible regions. Merrick et al. [40] took one peak electricity demand day per month into account while neglecting the days with minimum capacity factors for wind and solar energy sources. Patteeuw et al. [94] added the coldest week, which coincided with the highest e-demand into a system model for a residential building, but again neglected the possible impact of solar thermal units and the PV panel. Heuberger et al. [158] integrated the day containing the peak electricity demand, neglecting the days of minimum potential wind and PV feed-in into a national hybrid GEP and UC model as well. Pfenninger et al. [37] tested various combinations of extreme days and weeks defined by the maximum or minimum wind and solar availability across the UK or the maximum or minimum difference between wind feed-in and electricity-demand.
For typical periods, Kotzur et al. [57] presented two different methods for adding extreme periods to aggregated time series following the clustering process based on TSA to typical periods. The first method simply appended the extreme periods, i.e., a period with a maximum or minimum (average daily or single time step) value was excluded from the cluster it was first assigned to and was separately integrated as a TD appearing only once. The second approach was to reassign all the days within the cluster, which are closer to the extreme day than to the cluster center, i.e., the extreme period became the representative of a new cluster.
Furthermore, the clustering tool tsam introduced by Kotzur et al. [57] can include typical periods with a maximum or minimum average across the period for a chosen attribute, i.e., extreme values with respect to the first momentum. This approach was also employed by Pfenninger et al. [37] for wind and solar time series. Similarly, Poncelet et al. [144] included the days containing the highest and lowest value for electricity demand and those with the highest and lowest average of wind and solar capacity factors for a GEP model to benchmark their own feature-based approach. However, a comprehensive study on whether time series for energy system optimizations can efficiently be clustered by means of their statistical momentums (average, standard variation, etc.) is still an open research question.
Recently, Pöstges et al. [187] showed that, for extremely simple energy systems with supply units with capacity-specific and operation time-specific linear cost functions, as well as only one considered energy commodity, the optimal operation time and necessary capacities can be derived analytically using the segments in the demand duration curve, in which each technology is the most profitable one.

Combinatorial Problem
A major drawback from which all of the methods presented above suffer is the fact that the number of extreme constellations grows exponentially with the number of time series taken into account. Figure 12 illustrates this for a hypothetical demand (D), wind capacity factor (W) and solar capacity factor (S) time series.
Energies 2020, 13, x FOR PEER REVIEW 27 of 71 regions. Merrick et al. [40] took one peak electricity demand day per month into account while neglecting the days with minimum capacity factors for wind and solar energy sources. Patteeuw et al. [94] added the coldest week, which coincided with the highest e-demand into a system model for a residential building, but again neglected the possible impact of solar thermal units and the PV panel. Heuberger et al. [158] integrated the day containing the peak electricity demand, neglecting the days of minimum potential wind and PV feed-in into a national hybrid GEP and UC model as well. Pfenninger et al. [37] tested various combinations of extreme days and weeks defined by the maximum or minimum wind and solar availability across the UK or the maximum or minimum difference between wind feed-in and electricity-demand. For typical periods, Kotzur et al. [57] presented two different methods for adding extreme periods to aggregated time series following the clustering process based on TSA to typical periods. The first method simply appended the extreme periods, i.e., a period with a maximum or minimum (average daily or single time step) value was excluded from the cluster it was first assigned to and was separately integrated as a TD appearing only once. The second approach was to reassign all the days within the cluster, which are closer to the extreme day than to the cluster center, i.e., the extreme period became the representative of a new cluster.
Furthermore, the clustering tool tsam introduced by Kotzur et al. [57] can include typical periods with a maximum or minimum average across the period for a chosen attribute, i.e., extreme values with respect to the first momentum. This approach was also employed by Pfenninger et al. [37] for wind and solar time series. Similarly, Poncelet et al. [144] included the days containing the highest and lowest value for electricity demand and those with the highest and lowest average of wind and solar capacity factors for a GEP model to benchmark their own feature-based approach. However, a comprehensive study on whether time series for energy system optimizations can efficiently be clustered by means of their statistical momentums (average, standard variation, etc.) is still an open research question.
Recently, Pöstges et al. [187] showed that, for extremely simple energy systems with supply units with capacity-specific and operation time-specific linear cost functions, as well as only one considered energy commodity, the optimal operation time and necessary capacities can be derived analytically using the segments in the demand duration curve, in which each technology is the most profitable one.

Combinatorial Problem
A major drawback from which all of the methods presented above suffer is the fact that the number of extreme constellations grows exponentially with the number of time series taken into account. Figure 12 illustrates this for a hypothetical demand (D), wind capacity factor (W) and solar capacity factor (S) time series. Figure 12. Impact of adding "shoulder values" as proposed by Frew et al. [182] as extreme values for a rising number of attributes.
As illustrated, the consideration of the minimum and maximum electricity demand D and D leads to two additional typical periods. Taking into account the extreme periods of an additional attribute leads to four potential extreme constellations, while the integration of three attributes As illustrated, the consideration of the minimum and maximum electricity demand D and D leads to two additional typical periods. Taking into account the extreme periods of an additional attribute leads to four potential extreme constellations, while the integration of three attributes potentially leads to eight extreme constellations, as in the publication of Frew et al. [182]. It is obvious that, for a certain number of locations and technologies, more extreme days (minimums and maximums) are needed than exist in a year (assuming that no period is extreme for more than one attribute). In the case of TDs including "shoulder values", i.e., the corners of the hypercube, this number is reached for only nine different attributes (2 9 = 512 > 365). If the extreme periods are considered for each attribute alone without deriving potential shoulder values, the number of extreme periods grows linearly with the number of time series which refers to the number of corners for the 1D figure, the number of sides of the square and the number of surfaces for the cube. In the case of TDs, including the extreme period or value for just one attribute each, this number is reached for (183 × 2 extreme values = 366 > 365) different attributes. This is the reason why some authors such as Pfenninger et al. [37] only considered the extreme values averaged across all regions. Other approaches aimed at automatically including certain extreme features in the once chosen typical periods [54,154] or searching for atypical days within the dataset with some additional constraints [141] which will be described in detail in the following two sub-sections.

Inclusion of Extreme Values or Additional Features
Given the fact that averaging across different periods or time steps, as is the case in many TSA approaches, leads to an underestimation of the inner-period variance, while manually adding periods considered to be extreme increases the computational load, different algorithms have been implemented on the basis of the inclusion of extreme values or additional features. Mavrotas et al. [54] synthesized seasonal 24 h profiles of heat demand using monthly averages. Of all the monthly averaged samples used for determining the seasonal profile, the overall maximum value was included in it. The adjacent time steps around the maximum were calculated with weighted averages in order to smoothen the profile, i.e., the day including the maximum value was weighted with 100% at the peak time step, with 75% in the neighboring time steps and 50% in the second adjacent time steps. As the cumulative sum of that profile no longer fitted the average cumulative sums of the used monthly profiles, the remaining 19 time steps per day were rescaled.
Green et al. [154] presented an approach for including dominant or common ramps into the profiles obtained by k-means clustering. For the dominant ramp method, the gradients of the centroid profiles were determined and, according to these, the mean gradients of those cluster members with the same gradient direction as the centroid profile were used to construct the ramps of the representative profile. The common ramp approach was based on the same idea of using the mean of gradients of pointing in the same direction. However, the choice which subset of gradients is used was made by the median of all gradients in each time step and not according to the gradient of the mean profile. A drawback of this method was that it could lead to significant offsets between the first and last time step of each period.
Regarding the integration of extreme periods, Kotzur et al. [57] also proposed the method to use the extreme period within a cluster as the cluster's representative, which should usually lead to a fairly conservative assumption, as this approach overestimates the frequency of extreme periods appearing in the time series.
Apart from that, some publications have aimed at increasing the robustness of their ESMs by artificially adding bias to the (aggregated) input data or favored stochastic optimization.
Spiecker et al. [116] used the stochastic E2M2s model implemented in GAMS to minimize the total annual costs of an energy system by establishing a recombining tree structure to the model consisting of two possible hydro power plant states and three possible wind feed-in states that changed in 2 h intervals. Furthermore, the storage levels across an entire year were also stochastically modeled. Wouters et al. [117] included variability of the season-based PV infeed into a neighborhood microgrid by splitting up the daily infeed into input-level histograms for each season. Then, the potential output profiles were determined by averaging all feed-in profiles within one season and the same cumulative feed-in level. Finally, the outputs of the PV panels for each season were determined using the seasonal average weighted by the days of occurrence at each feed-in level appearing in that season. Kools et al. [102] used synthesized PV profiles with minutely, quarter-hourly and hourly resolution, and artificially added fluctuations using a normal distribution and gamma distribution with a stochastic decomposition algorithm for a distributed generation system. Furthermore, the designs obtained for different temporal granularities were cross-compared with respect to the energy losses when operating the systems on a finer time scale.
Brodrick et al. [97] isolated three critical hours within six representative days for an integrated solar combined cycle through excessive testing and used this strongly reduced model for a multi-objective optimization based on an iteratively tightened CO 2 constraint which resembled an exhaustive approach. Although this method is not necessarily computationally less expensive, it differs from all the others because the aggregated amount of input data was not increased by this method.

Additional Constraints in Feature-Based Merging
Apart from assuring that the representation retains certain characteristics, methods that are even more sophisticated are capable of excluding extreme periods in the clustering process itself. For segmentation processes, Stein et al. [79] illustrated this, introducing a mixed integer program (MIP) that minimized the inter-time step differences for a given number of merging steps. Here, time steps not to be merged such as extreme values could be excluded with an additional side constraint. Moreover, it was assured that a maximum number of adjacent merges was not exceeded with an additional constraint. A similar approach was previously introduced in a publication by Fazlollahi et al. [69], in which the segmentation algorithm was based on iterative k-means clustering and maximum values were automatically excluded. Furthermore, segmentation was applied to typical periods that were determined using a clustering process to which extreme periods could be manually added. It is important to highlight that only maximum values were expected to be extreme. With respect to supply data such as the capacity factors of RES, it is trivial that periods with minimum values are likely critical as well.
With respect to an automatic inclusion of extreme days within a clustering algorithm, Zatti et al. [141] introduced the so-called k-MILP clustering, which is a modified version of the exact k-medoids algorithm and automatically excludes atypical periods. For this, the side constraint that each day from the original time series must be assigned to a representative day was relaxed so that the atypical days increasing the sum of distances the most could be excluded. However, the number of atypical days that were allowed to be excluded had to be set by an additional constraint. Moreover, additional constraints were added in order to assure that the sum over the repetition of representative days did not differ from that of the original data beyond a predefined share. Additionally, it was imposed that for some selected attributes, the extreme periods had to contain at least one day that was also close to the absolute extreme value of the respective attribute.
Apart from that, Gabrielli et al. [15] constrained the clustering procedure for TDs to maintain the maximum and minimum values of the heat and electricity demand profile used for a multi-energy district system, although this also included a solar input time series.
Concerning algorithms used for the integration of extreme events into TTSs, i.e., typical periods lasting for only one time step, a method based on a moving average has been proposed by Härtel et al. [86]. Here, the determined hourly TTSs derived from clustering were compared to their moving average within a 6 h window of the full time series. If more than 95% of these values were above or below the values in the cluster, the highest or lowest candidate within the system state cluster was chosen as representative.
The presented methods illustrate that considerable efforts have been made to integrate extreme periods into the clustering processes. However, as pointed out by Scott et al. [191], the extreme periods cannot be known in advance for most synthesis problems because the built capacities of each technology are an endogenous outcome from the optimizations, e.g., the peak capacity factors of wind turbines are not relevant if wind turbines are not chosen to be built in a greenfield energy system optimization. This imposes the need to gain information about possible designs of the energy system with preliminary optimizations, which ultimately led to the development of multi-level approaches.

A Posteriori Methods
The implementation of extreme periods normally increases the robustness of the aggregated energy system optimizations, but does not necessarily lead to feasible solutions for the full time series, for instance, because the component, for which an extreme value is integrated, is not chosen in the optimization. Storage units that smooth out the impact of extreme periods can be another reason why extreme values in the input time series are not necessarily the critical time steps in the energy system. Therefore, a number of publications focus on multi-level approaches in order to increase the robustness or operational exactness of aggregated energy system optimizations. The presented approaches can be divided into non-iterative and iterative methods. Figure 13 illustrates the interdependences of temporally aggregated energy system optimizations that motivate the inclusion of multi-stage approaches.

A Posteriori Methods
The implementation of extreme periods normally increases the robustness of the aggregated energy system optimizations, but does not necessarily lead to feasible solutions for the full time series, for instance, because the component, for which an extreme value is integrated, is not chosen in the optimization. Storage units that smooth out the impact of extreme periods can be another reason why extreme values in the input time series are not necessarily the critical time steps in the energy system. Therefore, a number of publications focus on multi-level approaches in order to increase the robustness or operational exactness of aggregated energy system optimizations. The presented approaches can be divided into non-iterative and iterative methods. Figure 13 illustrates the interdependences of temporally aggregated energy system optimizations that motivate the inclusion of multi-stage approaches. The main driver in the use of multi-stage approaches is based on the problems related to the inclusion of extreme periods. Due to the fact that the absolute importance of a single component with a given time series is unknown in advance, the impact of outliers within this time series is unknown as well. Therefore, different approaches aim at isolating certain information about potential energy system designs with preliminary optimizations in order to improve the aggregation process of the input data without increasing the size of the optimization problem. A second driver for multi-stage approaches is binary variables for design and operation, which significantly increase the complexity of large-scale energy system MILPs. However, the operational decisions depend on the design decisions and vice versa. Simply put, a component that is not chosen to be built is not operated. This can be exploited by deriving simpler aggregated design problems and separated optimization problems that can significantly reduce the complexity. Thirdly, not only aggregated ESMs but also the real energy systems face uncertain input data. TSA methods can thus be used to simplify models which are then re-calculated for slight variations in the input data. The resulting designs can then be compared to each other by checking the operational feasibility when being exposed to the time series of the other scenarios.
In the following, however, the approaches are divided into non-iterative approaches and iterative approaches, as iterative approaches focus on outperforming state-of-the-art solvers, while non-iterative approaches focus on the generation of fast and robust but suboptimal, or fast and optimal but only relatively robust, solutions.

Non-Iterative
Due to the fact that the main complexity of MILPs is caused by binary variables, Gabrielli et al. [15] introduced a method for reducing the number of binary operational variables, i.e., the on/off The main driver in the use of multi-stage approaches is based on the problems related to the inclusion of extreme periods. Due to the fact that the absolute importance of a single component with a given time series is unknown in advance, the impact of outliers within this time series is unknown as well. Therefore, different approaches aim at isolating certain information about potential energy system designs with preliminary optimizations in order to improve the aggregation process of the input data without increasing the size of the optimization problem. A second driver for multi-stage approaches is binary variables for design and operation, which significantly increase the complexity of large-scale energy system MILPs. However, the operational decisions depend on the design decisions and vice versa. Simply put, a component that is not chosen to be built is not operated. This can be exploited by deriving simpler aggregated design problems and separated optimization problems that can significantly reduce the complexity. Thirdly, not only aggregated ESMs but also the real energy systems face uncertain input data. TSA methods can thus be used to simplify models which are then re-calculated for slight variations in the input data. The resulting designs can then be compared to each other by checking the operational feasibility when being exposed to the time series of the other scenarios.
In the following, however, the approaches are divided into non-iterative approaches and iterative approaches, as iterative approaches focus on outperforming state-of-the-art solvers, while non-iterative approaches focus on the generation of fast and robust but suboptimal, or fast and optimal but only relatively robust, solutions.

Non-Iterative
Due to the fact that the main complexity of MILPs is caused by binary variables, Gabrielli et al. [15] introduced a method for reducing the number of binary operational variables, i.e., the on/off status of components. For this, the binary variables were modeled on the basis of a TD formulation obtained using k-means and linked to the fully resolved continuous variables by means of an assignment function. This approach did not necessarily lead to feasible solutions for less than six TDs, as the reconversion of hydrogen from the hydrogen storage involved was not able to match the thermal demand for a too limited number of operational modes.
A similar approach that focused on the reduction of binary variables was employed by Kannengießer et al. [140], who used the hierarchical clustering of sorted time series in a first step and determined the binary design variables of two ESMs. In a second step, the binary variables from the first step were taken as input parameters for a second iteration in which the capacities and (linearized) operation of the components were optimized for the full time series. This method was capable of identifying a feasible but not necessarily optimal system design with an overall computation time for the aggregated MILP and fully resolved LP that was smaller than the fully resolved MILP.
Apart from that, two recent publications dealt with the improvement of existing aggregation approaches for the input data. Sun et al. [135] introduced a cost-oriented two-level approach for solving an electricity investment model. Here, the model was independently solved for each input day and the cost factors for each unit were determined. These were dimensionally reduced with Laplacian Eigenmapping and then clustered for determining the cost-related TDs by choosing the medoid of each cluster in the dimensionally-reduced cost space, which was proven to be effective, compared to solely input data-based clustering.
Hilbers et al. [166] presented an approach based on random time steps. In a first run, a defined number of random samples were taken from 36 years of data and the energy system optimization (in the test case a power system model run with Calliope) was run once. From this, the 60 time steps with the highest variable costs were taken and introduced into a second set of random time steps that added up to the same total number of time steps. In order to avoid an overly conservative system design, the 60 extreme time steps are expected to appear only once in 36 years, which was considered with a corresponding small weight.

Iterative
Lin et al. [156] presented a two-stage approach for solving a semi-coarse model of a fully resolved MILP for cogeneration in energy-efficient buildings. For this, TDs were determined using k-means and the real days were chosen that were closest to the calculated centroids. The semi-coarse model was defined as an MILP with aggregated variables but a full number of constraints, while the coarse model was defined as an MILP with aggregated variables and constraints. Thus, the semi-coarse model was solved by solving the coarse model and iteratively adding violated constraints from the full model. The resulting semi-coarse model was an upper bound of the original problem with guaranteed feasibility, which was not the case for the coarse model. Here, storage units were taken into account and it was shown that the semi-coarse model had the same optimal value as the original model if the profiles were periodic and no intertemporal constraints reached across the periods.
A similar approach was introduced by Bahl et al. [157], who chose k-means clustering for determining TTSs for a distributed energy supply system without storage technologies. The system, optimized for the aggregated TTSs, was then operationally optimized for the full time series. If the system design was not feasible, additional feasibility time steps were defined for the aggregated optimization problem. When an operationally feasible design was obtained for both the aggregated and full time series, the difference between them was calculated and, if it was below a pre-defined threshold, the iteration was terminated. Otherwise, the number of TTSs was increased. It is noteworthy that a feasible operational optimization with the full time series for a system design based on an aggregated optimization is, in general, an upper bound for the original problem of a combined design and operational energy system optimization. Based on this initial approach, four consecutive publications [74,77,78,192] introduced an advanced iterative approach for simultaneously over-and underestimating the objective function of the original MILP by using TSA. In the latest of these publications [77], one branch for defining an upper bound and two branches for defining a lower bound were used, of which the larger one defined the tightest lower bound. The upper bound used k-means clustering and a randomly chosen further segmentation for determining the segmented TDs. For these, the design of the energy system was calculated and operationally optimized. If the design candidate was infeasible for the full time series, extreme values with no chronological order were added, as was performed by Bahl et al. [157]. Otherwise, the TR was increased [78]. The first of both the lower bounds was calculated using clustering to typical periods, segmentation and the relaxation of the determined time segments. This means that within the segments, the demand and operation only needed to lie within the maximum and minimum value of the original values in the segment. The second lower bound was obtained using a common branch-and-cut algorithm, which is also used in state-of-the-art solvers [78] such as Gurobi. For this, all binary variables were relaxed, which transformed the problem into an LP. In additional steps, the relaxed variables were consecutively fixed to binary variables, leading to a decision tree in which those branches were cut that did not improve the best solution obtained. Last but not least, the latest version of the approach [77] also used the seasonal storage formulation proposed by Kotzur et al. [14].
With respect to creating a robust system design, Gabrielli et al. [160] recently introduced an approach for creating artificial variance within given input data, deriving optimal energy systems from all the synthesized input scenarios and operationally testing these system designs for all other scenarios. The sum of the ratios between the system costs of all scenarios and the reference scenario divided by the number of scenarios was then denoted as optimality and the sum of the ratios of satisfied heat demand in all scenarios and the satisfied heat demand in the reference scenario divided by the number of scenarios was denoted as robustness. Besides, electricity could be taken from the grid at any time of the year. To depict situations in which the heat demand could not be fulfilled, a slack variable with arbitrarily high cost was introduced. In this manner, different system designs were examined with respect to their robustness and optimality. This led to the result that energy systems designed for a minimum emission of CO 2 also tended to be the most robust ones with respect to satisfying heat demand with a connection to the electricity grid only.
In another line of publications [66][67][68] by Yokoyama et al., semi-heuristic decomposition methods for energy systems without storage units or other intertemporal constraints were introduced, but with binary variables for both the design and operation of components. Here, the fact that operational binary variables generally depend on the design decision, i.e., if a unit is not built, the operational binary variables must be zero at any point in time, was exploited. In the first publication [66], the original MILP was sub-optimally, but feasibly solved and simplex variables were derived from the result. Then, sub-problems, each containing only one binary variable, were created, the optimal solution of the sub problem was determined for both assumptions of the binary variable and the difference between them was calculated. Based on this, the binary variables were set to either zero, one or remained a variable if the impact was small and positive. Then, the original MILP was solved again with partly fixed binary variables and, if a better solution was found, the process was repeated. Otherwise, it was terminated with a suboptimal solution.
In the second publication [67], the operational binary variables in the design problem were relaxed and the design binary variables were investigated using the branch and bound method. For each candidate, the operational sub-problems were optimized for each period. If the operation was infeasible, the design branch at the upper level was discarded. If all operational periods were feasible, the objective function of the overall problem was calculated and, if the objective function of the master problem was decreased, the design became the new incumbent. Otherwise, the branch in the upper, operationally-relaxed MILP was discarded again. In this way, the operational binary variables were not part of the branch and bound tree and could be calculated in parallel, exploiting the hierarchy between design and operation. This method was again not applicable to any system that included storage technologies.
In the most recent publication [68], this method was used in combination with a downsampling approach and further improved by defining bounds at the upper design and lower operational optimization level. This should help to discard solutions that would not be able to improve the objective function without calculating all the possible master and sub-problems. Additionally, an ordering strategy was also applied to increase the chance of discarding sub-optimal solutions even more rapidly.
In summary, multi-level approaches based on TSA in energy system optimizations try to exploit five different features that are not given in simple aggregation approaches: • Separating complicating binary variables from the vast majority of continuous variables.

•
Separating the design problem from the operational problem.

•
Obtaining feasible but suboptimal solutions instead of optimal but infeasible solutions for the fully resolved input data.

•
Deriving implications for a meaningful TSA from the system itself instead of the input data only.

•
Determining a more robust energy system by exposing the once optimized energy systems to different input data scenarios.
With respect to iterative approaches, however, it must be called into question as to whether these approaches are more efficient than well-known iterative decomposition approaches such as Bendersor Dantzig-Wolfe decomposition [193] (e.g., as used by Lara et al. [24] and Schwele et al. [194]).

Overview and Trends in the Integration of Additional Information
With respect to the methods to increase or even ensure the robustness of models optimized with aggregated time series, Figure 14 shows the number of publications that deal with at least one of the approaches presented above. Here, the publications from Sections 4.1.2 and 4.1.3 are summed up in one group, as both approaches do not increase the number of periods to be considered and thus do not suffer from the combinatorial problem presented in Section 4.1.1. Again, the trends are not keyword-driven, but methodology-driven for the reasons given in Section 2.1.
Energies 2020, 13, x FOR PEER REVIEW 33 of 71 optimization level. This should help to discard solutions that would not be able to improve the objective function without calculating all the possible master and sub-problems. Additionally, an ordering strategy was also applied to increase the chance of discarding sub-optimal solutions even more rapidly. In summary, multi-level approaches based on TSA in energy system optimizations try to exploit five different features that are not given in simple aggregation approaches:  Separating complicating binary variables from the vast majority of continuous variables.  Separating the design problem from the operational problem.  Obtaining feasible but suboptimal solutions instead of optimal but infeasible solutions for the fully resolved input data.  Deriving implications for a meaningful TSA from the system itself instead of the input data only.  Determining a more robust energy system by exposing the once optimized energy systems to different input data scenarios. With respect to iterative approaches, however, it must be called into question as to whether these approaches are more efficient than well-known iterative decomposition approaches such as Benders-or Dantzig-Wolfe decomposition [193] (e.g., as used by Lara et al. [24] and Schwele et al. [194]).

Overview and Trends in the Integration of Additional Information
With respect to the methods to increase or even ensure the robustness of models optimized with aggregated time series, Figure 14 shows the number of publications that deal with at least one of the approaches presented above. Here, the publications from Section 4.1.2 and Section 4.1.3 are summed up in one group, as both approaches do not increase the number of periods to be considered and thus do not suffer from the combinatorial problem presented in Section 4.1.1. Again, the trends are not keyword-driven, but methodology-driven for the reasons given in Section 2.1. Figure 14. Trends in methods to preserve additional information in time series aggregation (TSA) methods for energy system models (ESMs) based on the major approaches presented in Section 4.
In contrast to the clear trends in aggregation methods, the development of methods in the area of robustness is rather vague. The manual addition of extreme periods had a growing trend until 2016, but then drastically decreased again. As mentioned above, an extreme event in the input time series of a single attribute does not necessarily mean that it is also an extreme situation in the energy system. This is even more the case if storage capacities are considered. Moreover, the number of extreme periods is growing with the number of input time series, which makes this approach intractable for a large number of regions if all cases of potentially extreme periods are considered. This might explain why this method is slowly becoming unfavorable in times of growing ESMs. In contrast, the inclusion of extreme values, algorithmic considerations of extreme features within a given number of typical periods, or the definition of atypical days as extreme days are not subject to a combinatorial problem and therefore appear occasionally in the literature with a slightly rising trend. However, these methods fail to guarantee robustness. In contrast to the clear trends in aggregation methods, the development of methods in the area of robustness is rather vague. The manual addition of extreme periods had a growing trend until 2016, but then drastically decreased again. As mentioned above, an extreme event in the input time series of a single attribute does not necessarily mean that it is also an extreme situation in the energy system. This is even more the case if storage capacities are considered. Moreover, the number of extreme periods is growing with the number of input time series, which makes this approach intractable for a large number of regions if all cases of potentially extreme periods are considered. This might explain why this method is slowly becoming unfavorable in times of growing ESMs. In contrast, the inclusion of extreme values, algorithmic considerations of extreme features within a given number of typical periods, or the definition of atypical days as extreme days are not subject to a combinatorial problem and therefore appear occasionally in the literature with a slightly rising trend. However, these methods fail to guarantee robustness.
In contrast, the multi-stage approaches appear to have a clear upward trend, as they can be capable of guaranteeing robust but suboptimal solutions with respect to the non-aggregated time series. However, the convergence against the optimal solution can, to this end, only be guaranteed by increasing the number of typical periods and using a sophisticated iterative approach [77,78], which results in a resemblance to well-established and commercially-available solving algorithms. This leads to the question if convergence to the real optimum is the main target of aggregation methods, or if their focus will remain the creation of fast but satisfactorily accurate approximations that can be achieved by only two stages of design and operational optimization [140].
All things considered, the question of robustness is highly dependent on the size of the model, the considered attributes and the temporal interconnectedness. A field of future research thus remains the derivation of mathematical theorems, as introduced by Lin et al. [156] and Teichgräber et al. [41]. For example, the conditions under which an extreme input event leads to an extreme system situation or clear statements of under-and overestimation of the identified results for temporally-strongly coupled systems are of great interest.

Conclusions
This review of TSA methods for ESMs has revealed manifold key findings. Firstly, it is possible to systematically categorize the methods on the basis of their basic idea, the addressed problem and their compatibility. Secondly, the advances in TSA methods are clearly driven by shortcomings in both computational tractability and existing methods in models with changing requirements. Thirdly, it was shown that there are rival methods, of which the feature-based ones are usurping the time-based ones, as well as complementary methods. Moreover, compatible approaches can be applied stepwise and contain further sub-steps, such as clustering.
However, a systematic overview was lacking to this end, which this work has tried to rectify. One reason for this is also a major limitation of this literature review: As many publications focus more on the solvability of ESMs than on the applied aggregation methods itself, a keyword-or title-driven meta-analysis is not leading to a meaningful overview of existing methods and possible trends. This issue was addressed by defining a clear interval of publication dates and providing the most holistic categorization of the methods found in literature as possible.
Apart from that, open research questions are derived: • The question of the most important statistical features of the time series to be kept, i.e., whether the clustering of statistical features in a lower dimensional space is superior to the traditional TSA methods. • A way to measure the accuracy of different aggregation methods a priori by defining bounds that are also valid for the computationally intractable problem.

•
Enhancing the convergence rate of iterative methods in order to compete with the branch-and-bound or decomposition methods of commercial solvers • Expanding mathematical theorems regarding upper and lower bounds as introduced by Yokoyama et al. [67], Lin et al. [156] and Teichgräber et al. [41] to more general ones applicable to strongly temporally-interconnected ESMs.

•
Developing an approach that is capable of identifying the most critical situations in input time series in a non-empirical manner. This could lead to robust optimizations not based on MonteCarlo-like approaches.
However, it should be highlighted that temporal aggregation methods are always based on the complexity reduction of not perfectly redundant input data and thus introduce deviations from fully resolved models. Therefore, it should only be used for the sake of computational tractability. Apart from that, the clustering procedures can also be time-intensive, which can lead to trade-offs between the computational load of clustering and the saving of computational resources using the aggregated models.
Moreover, the trends in TSA also imply that the frequently used k-means, k-medoids and hierarchical clustering approaches to determine TDs are still state of the art. The review can thus be seen as both a useful introduction for researchers new to the topic of TSA and as a detailed guide for a standardized complexity reduction in ESMs including potential future research fields.

Conflicts of Interest:
The authors have no conflicts of interest to declare.     Single-node electricity model based on [200], no storage technologies,

Customer and Unit Partitioning
It is noteworthy that aggregations based on time series do not necessarily mean the aggregation of the temporal dimension. Instead, similar time series can also be clustered in order to generate a smaller number of possible technologies or regions with similar demand behavior or similar technologies. This is referred to as customer [136] and unit [204] partitioning. Given the fact that, in some cases, new methods were first introduced in this field of TSA, a short overview of methods used in this field could imply possible approaches for the TSA methods presented above.
One of these examples was published by Alzate et al. [195], who used spectral clustering with an out-of-sample extension to cluster customer profiles for electricity demand. For this, 123 time series were used for training and 122 for validation. The approach significantly outperformed the selection found by k-means clustering.
Benítez et al. [196] implemented a modified version of k-means for customer partitioning that was not only capable of clustering groups of customers with their daily profiles, but also with respect to their yearly profiles.
This means that each cluster center of a (daily) period followed a trajectory throughout the year resulting in representative yearly profiles. Sun et al. [136] used a C-vine copula-based mixture model to cluster residential electricity demands by maximizing a log-likelihood function, which slightly outperformed k-means clustering but was computationally significantly more demanding. Yang et al. [201] used k-shape clustering for forming typical residential daily profiles before applying it to TSA purposes. This highlights the importance of cross-linking different research fields within energy system analysis. Recently, Motlagh et al. [203] applied two different clustering algorithms on electricity demands for customer partitioning. The first one included a preliminary principal component analysis to decrease the complexity, followed by clustering using the adjacency metric, while the second one was model-based and transferring the profiles into the phase space. Then, a mapping strategy based on neural regression was used and the Euclidean distance between the map parameters was calculated, which outperformed the first, feature-based approach.
With respect to unit partitioning, Pavičević et al. [204] introduced three levels of potential clustering scopes.
The first was based on similar characteristics if the components were small, at the same location, with the same commodities and comparable temporal characteristics, while the second focused on the same location and the same commodities and the third only on the same commodities, such as averaged heat and electricity demands of industrial sites and residential buildings. Haikarainen et al. [198] used k-means clustering for grouping different nodes in an energy network that were then represented as a single component with averaged costs and the traits of all of the included technologies (supply, demand, storage). Based on decisions made for a coarse clustering, the number of clusters was stepwise increased and, in a final run, all binary decision variables were retained and a linear problem was solved for the fully resolved energy grid. This means that the clustering was not based on temporal features, but on spatial ones. This means that the units were merged based on their distance to each other, not their traits.
This highlights that temporal, spatial, and technological information can theoretically be aggregated based on their own or based on each other. Thus, TSA is traditionally seen as the aggregation of time series derived from temporal information.