Forecasting Hierarchical Time Series in Power Generation

: Academic attention is being paid to the study of hierarchical time series. Especially in the electrical sector, there are several applications in which information can be organized into a hierarchical structure. The present study analyzed hourly power generation in Brazil (2018–2020), grouped according to each of the electrical subsystems and their respective sources of generating energy. The objective was to calculate the accuracy of the main measures of aggregating and disaggregating the forecasts of the Autoregressive Integrated Moving Average (ARIMA) and Error, Trend, Seasonal (ETS) models. Speciﬁcally, the following hierarchical approaches were analyzed: (i) bottom-up (BU), (ii) top-down (TD), and (iii) optimal reconciliation. The optimal reconciliation models showed the best mean performance, considering the primary predictive windows. It was also found that energy forecasts in the South subsystem presented greater inaccuracy compared to the others, which signals the need for individualized models for this subsystem.


Introduction
The advent of Industry 4.0 revolutionized factories worldwide, since it allowed the connectivity between measuring machines and the automation of companies, distributing the capacity to collect massive volumes of data [1]. In high-level data analysis, forecasting models allow the extraction of behavior patterns, as well as the prediction of future values for the collected data set [2].
In the above-mentioned scenario, the construction of predictive models is gaining prominence in the literature [3][4][5], since economic agents deal with uncertainty in multiple spheres and aim to achieve the best results using available resources [6]. Developing acceptably accurate models presents a meaningful challenge, as prediction is a technique that deals with risk and there will always be a fundamental error associated with it. The best model is the one that most adequately represents the phenomenon of interest.
In relation to the object of our study, power generation, there are several forecasting applications: (i) classical time series models like the autoregressive moving average, autoregressive integrated moving average, and generalized autoregressive conditional heteroscedastic among others [7,8]; (ii) pre-processing techniques like spectrum analysis, wavelets, and Fourier analysis [9]; and, (iii) machine learning approaches such as neural networks, fuzzy systems, and support vector machine [10]. Alternatively, hybrid models aim to combine machine learning representations with different methods. These methods include focused time-delay neural networks [11], wavelet neuro-fuzzy systems [12], finite-impulse response neural networks [13], local feedback dynamic fuzzy

Materials and Methods
The secondary data used in this study correspond to the amounts of power generated by each of the Brazilian electrical subsystems (North, Northeast, Southeast/Midwest, and South). We separated these data according to the source of energy (wind, hydroelectric, thermal, solar, and nuclear). Data were obtained from the National Electric System Operator [28], due to their reliability. The observations of hourly power generation (GWh) were made during the period from January 2018 to January 2020, making a total of 17,521 h.
Based on Hyndman et al. [19], we present a schematic representation of the Brazilian energy generation system, comprising a three-level hierarchical structure ( Figure 1). Level 0 represents the total energy generated in Brazil (completely aggregated series). Level 1 denotes each of Brazil's electrical subsystems (first level of disaggregation). The last level, Level 2, represents each of the energy generating sources (Level k). According to this framework, it is possible to identify the most disaggregated time series (in this case k = 2). Table 1 shows the amounts of power generation in Brazil (GWh), according to generating sources and electrical subsystems. There is a predominance of hydroelectric generation (73%), making the Brazilian electrical matrix one of the cleanest in the world. At the same time, the Southeast/Midwest subsystem accounts for more than half (56%) of all energy generated in the country.  Table 1 shows the amounts of power generation in Brazil (GWh), according to generating sources and electrical subsystems. There is a predominance of hydroelectric generation (73%), making the Brazilian electrical matrix one of the cleanest in the world. At the same time, the Southeast/Midwest subsystem accounts for more than half (56%) of all energy generated in the country. Routines were implemented using the R ® programming language [29]. The R-package HTS was used to calculate the bottom-up, top-down, optimal combination reconciliation and trace minimization reconciliation. HTS is available at: https://cran.rproject.org/web/packages/hts/index.html. Although HTS includes functions for creating, plotting and forecasting hierarchical time series, it has some limitations. Those limitations include the fact that it has only three built-in forecasting options: ARIMA, ETS, and random walks [19]. This paper will use the ARIMA and the ETS models since they have automatic adjustment and allow consideration of factors such as the trend and seasonality of the data set. The computer used to execute the algorithms had CPU Intel Core i5-7200 2.70 GHz, RAM of 16 GB, and operating system Windows 10 x64. In the next subsection, we present the hierarchical reconciliation models used in the present paper, as well as the forecasting models.

The Bottom-Up (BU) Approach
The BU procedure requires first providing forecasts for every series at the bottom-level, and then summing these to generate forecasts for all the levels of the hierarchical structure [30]. In its simplicity, this approach neglects the relations between time series and works, mainly unsuccessfully, on highly disaggregated data. These data tend to have a low signal-to-noise ratio [27].  Routines were implemented using the R ® programming language [29]. The R-package HTS was used to calculate the bottom-up, top-down, optimal combination reconciliation and trace minimization reconciliation. HTS is available at: https://cran.r-project.org/web/packages/hts/index.html. Although HTS includes functions for creating, plotting and forecasting hierarchical time series, it has some limitations. Those limitations include the fact that it has only three built-in forecasting options: ARIMA, ETS, and random walks [19]. This paper will use the ARIMA and the ETS models since they have automatic adjustment and allow consideration of factors such as the trend and seasonality of the data set. The computer used to execute the algorithms had CPU Intel Core i5-7200 2.70 GHz, RAM of 16 GB, and operating system Windows 10 x64. In the next subsection, we present the hierarchical reconciliation models used in the present paper, as well as the forecasting models.

The Bottom-Up (BU) Approach
The BU procedure requires first providing forecasts for every series at the bottom-level, and then summing these to generate forecasts for all the levels of the hierarchical structure [30]. In its simplicity, this approach neglects the relations between time series and works, mainly unsuccessfully, on highly disaggregated data. These data tend to have a low signal-to-noise ratio [27]. According to the hierarchy (Figure 1), we first make h-step-ahead forecasts for all the bottom-level time series (n = 14): y AA,t ,ŷ AB,t ,ŷ AC,t ,ŷ BA,t ,ŷ BB,t ,ŷ BC,t ,ŷ BD,t ,ŷ CA,t ,ŷ CB,t ,ŷ CC,t ,ŷ CD,t ,ŷ DA,t ,ŷ DB,t ,ŷ DC,t .
(1) Summing these, we obtain h-step-ahead forecasts for the rest of the series: y t =ŷ AA,t +ŷ AB,t +ŷ AC,t +ŷ BA,t +ŷ BB,t +ŷ BC,t +ŷ BD,t +ŷ CA,t +ŷ CB,t +ŷ CC,t +ŷ CD,t +ŷ DA,t +ŷ DB,t +ŷ DC,t . y A,t =ŷ AA,t +ŷ AB,t +ŷ AC,t . y B,t =ŷ BA,t +ŷ BB,t +ŷ BC,t +ŷ BD,t . y C,t =ŷ CA,t +ŷ CB,t +ŷ CC,t +ŷ CD,t . y D,t =ŷ DA,t +ŷ DB,t +ŷ DC,t . (2) According to [19], it is possible to arrange the equations expressed in (2) into an algebra notation. Below is a complete notation for this problem: y t y A,t y B,t y C,t y D,t y AA,t y AB,t y AC,t y BA,t y BB,t y BC,t y BD,t y CA,t y CB,t y CC,t y CD,t y DA,t y DB,t y DC,t AA,t y AB,t y AC,t y BA,t y BB,t y BC,t y BD,t y CA,t y CB,t y CC,t y CD,t y DA,t y DB,t y DC,t Alternatively, the notation presented in (3) can be reformulated in a compact way by applying the summing matrix. Thus, the bottom-up approach can be represented as: where y t is an n-dimensional vector of h-step-ahead forecasts for the total energy, S is the summing matrix, andb t is an m-dimensional vector of h-step-ahead forecasts for each of the sources of energy at bottom-level. An advantage of this procedure is that we are forecasting at the bottom-level of a hierarchy. Consequently, no information is missed due to aggregation [17].

The Top-Down (TD) Approach
Top-down methods operate with strictly hierarchical aggregation structures, not with grouped structures. They involve first making forecasts for the Total level y t , and next disaggregating these down the hierarchy [17]. Let p 1 , . . . , p m be a set of disaggregation proportions that deliver the forecasts of the Total series, which are to be distributed in order to obtain forecasts for all series at the bottom-level y AA,t = p 1ŷt , y AB,t = p 2ŷt , y AC,t = p 3ŷt . y BA,t = p 4ŷt , y BB,t = p 5ŷt , y BC,t = p 6ŷt , y BD,t = p 7ŷt . y CA,t = p 8ŷt , y CB,t = p 9ŷt , y CC,t = p 10ŷt , y CD,t = p 11ŷt . y DA,t = p 12ŷt , y DB,t = p 13ŷt , y DC,t = p 14ŷt . (5) This can be rewritten using matrix notation. If we stack the set of proportions in an m-dimensional vector p = (p 1 , . . . , p m ) , we have the bottom-level h-step-ahead predictions. Overall, for a given set of proportions, top-down approaches can be written as: The main TD models stipulate disaggregation proportions according to the historical proportions of the data. Among the main models of this approach, we highlight the following three: (i) top-down Gross-Sohl method A (TDGSA), (ii) top-down Gross-Sohl method F (TDGSF), and (iii) Top-down forecast proportions (TDFP) ( Table 2). Additional details and demonstrations of Table 2 can be obtained from [18,31]. Table 2. TD disaggregation proportions according to the historical proportions of the data.

TD Gross-Sohl Method A TDGSA TD Gross-Sohl Method F TDGSF TD Forecast Proportions TDFP
Each proportion p j reflects the average of the historical proportions of the bottom-level series y j,t , t over the period t = 1, . . . , T relative to the total aggregate y t .
for j = 1, . . . , m. Each proportion p j takes the average historical value of the bottom-level series y j,t related to the average value of the total aggregate y t .
j,t is the sum of the h-step-ahead forecasts below the node that is l levels above node j.

The Optimal Reconciliation Approaches
The optimal reconciliation approach proposed by [19] consists of an ordinary least squares problem based on the calculation of independent projections for all hierarchical levels, then applying a regression model to optimize the combination of these forecasts. According to [32], we can write the base prediction as:ŷ where β t+h|t represents the unknown conditional mean of the most disaggregated series, and ε h is the error with mean of zero and covariance matrix h . If h were known, the estimator of β t+h|t would lead to the following weighted least squares, producing reconciled forecasts, as follows: where If the base forecastsŷ t+h|t are unbiased, then the reconciled forecasts y t+h|t will be unbiased, provided that SPS = S [19]. This condition is valid for this reconciliation procedure for the bottom-up, although not for the top-down, methods. Consequently, the top-down approaches will never give unbiased reconciled forecasts, even if the base forecasts are unbiased. Additionally, [27] proved that, in general, h is not known and not identifiable. The covariance matrix of the h-step-ahead reconciled forecast errors is given by the following expression: for any P such that SPS = S, then W h = Var (y t+h −ŷ t+h|t ) = E(ê t+h|tê t+h|t ) is the covariance matrix of the corresponding h-step ahead base forecast errors. The purpose is to get the matrix P that minimizes the error variances of the reconciled forecasts which are on the diagonal of the covariance matrix Var (y t+h − y t+h|t ) . Finally, [27] demonstrated that the optimal reconciliation matrix P that minimizes the trace of SPW h P S =, such that SPS = S, and the optimal reconciled forecasts, respectively, are given by: which is introduced as the MinT (minimum trace) estimator. The next step consists of estimating W h , a matrix of order n. Wickramasuriya, Athanasopoulos and Hyndman [27] proposed the following procedures (Table 3) to obtain the matrix: Table 3. Hierarchical forecasting for electricity generation based on the ARIMA procedure.

Procedure Description
OLS This is the most simplifying premise, and collapses the MinT estimator to the OLS estimator, proposed by Hyndman et al. [19]. This is optimal when the base forecast errors are uncorrelated and equivariant. WLSv , is the unbiased sample covariance estimator of the in-sample one-step-ahead base forecast errors. In this case, we can describe MinT as a WLS estimator applying variance scaling [27].

WLSs
W h = k h Λ, ∀h where k h > 0 and Λ = diag(S1) with 1 being a unit column vector of dimension n. We assume that each of the bottom-level base forecast errors has a variance k h and is uncorrelated between nodes. Consequently, every element of the diagonal Λ matrix receives the number of forecast error variances contributing to that aggregation level [27]. This estimator depends only on the grouping structure of the hierarchy.

MinT (Sample)
W h = k w W 1 , ∀h where k h > 0, the unrestricted sample covariance estimator for h = 1 [27]. In the results section, we denote this as MinT (Sample).

MinT (Shrink)
comprising the diagonal entries of W 1 , and λ D is the shrinkage intensity parameter. Thus, off-diagonal elements of W 1 are shrunk toward zero and diagonal elements (variances) remain unchanged [27]. Wickramasuriya, Athanasopoulos and Hyndman [27] suggested a scale and location invariant shrinkage estimator by parameterizing the shrinkage in terms of variances and correlations: wherer i j is the ijth element ofR 1 , the 1-step-ahead sample correlation matrix to shrink it toward an identity matrix.

ARIMA and ETS Formulation
ARIMA is one of the most-widely-used time series approaches for forecasting power generation [33]. Although studies have shown that ETS outperforms ARIMA [34], it is recommended to keep ARIMA Energies 2020, 13, 3722 7 of 17 as a reference model during the forecasting process. Moreover, several statistical software packages, like R ® , provide automatic model identification and parameter estimation skills for both ARIMA and ETS [17]. Professor Hyndman [19] developed the HTS package initially based on these predictive models. The present paper aims to test different approaches to optimal forecast reconciliation and, to do so, only the ARIMA and ETS models will be used. It is recommended that future studies extend these forecasting procedures using different predictive models, such as machine learning ones.
ARIMA was proposed by [33]. It is a linear forecasting method for dealing with stationary time series [34]. In the initial step, a time series is built stationary by differencing d times along with some nonlinear transformations, such as logging [34]. The consequential data are recognized as a linear function of past p data values and q errors (11), i.e., modeled as an autoregressive moving average (ARMA) model, where y t denotes real value at time t, ε t describes the error sequence: it is supposed to be white noise and Gaussian distributed (0, σ 2 ). ∅ i for (i = 1, 2, . . . , p) are autoregressive (AR) coefficients and Θ j for ( j = 1, 2, . . . , q) are moving average (MA) coefficients. p and q are integers referred to as model orders. The time series model is denoted as ARIMA(p, d, q) [35,36]. According to [34], the group of exponential smoothing methods utilizes the principle of weighted averages of past information for making forecasts. Since its formulation in 1950, a variety of exponential smoothing methods have been developed. All exponential smoothing methods were initially classified by [37], which has been continued by [38][39][40]. ETS stands for error, trend, and seasonality elements. As pointed by [34], the usual representation for these patterns involves a state vector x t = (l t , b t , s t , s t−1 , . . . , s t−m+1 ) , and the state space equations [39] have the resulting structure: where (ε t ) denotes a Gaussian white noise (0, σ 2 ) and µ t = w (x t − 1). The model with additive error has r t (x t − 1) = 1, so y t = µ t + ε t . The model with multiplicative errors has r t (x t − 1) = µ t = µ t , so y t = µ t (1 + ε t ). Consequently, ε t = (y t − µ t )/µ t is a relative error for the multiplicative model and any value of r t (x t − 1) will lead to the identical point forecast for y t [34,39].

Evaluating Forecast Accuracy
According to [20], there are several accuracy metrics, such as mean absolute percentage error (MAPE), mean absolute error (MAE), mean absolute scaled error (MASE), or root-mean-square error (RMSE), to evaluate the performance of point prediction methods, defined as follows: where y t is the amount of power generation at time t,ŷ t is the fitted value for power generation, and MAE in−sample,naive is the MAE generated by a naive forecast. Specifically, in studies of hierarchical time series, the MAPE indicator appears the most frequently in the literature [41][42][43]. MAPE was also the selected metric for the present paper (Figures 2 and 3). Complementarily, MAE, MASE, and RMSE were estimated, and the results can be found in the Appendix A (Figures A3 and A4). The values of the MAPE, MAE, MASE and RMSE statistics were obtained using a weighted average, with proportions from Table 1.
results of all linear reconciliation methods, such as OLS and WLS, with variations. Moreover, the MinT (Sample) approach returns the most accurate, coherent forecasts for all levels considering just the first forecast horizons. However, as the predictive window grows, the BU method becomes more accurate. Furthermore, the performance of the BU model increases as the time series disaggregate.
As expected, the results obtained using the top-down technique did not present good predictive results, since it is intended to generate forecasts for level 0, with worse accuracy for the other levels. Both BU and TD present disadvantages: they do not take the correlation among the series at each level into account.
The other accuracy metrics presented in the appendix (MAE, MAE, and RMSE) reinforce the results found. In general, the performance of the optimal reconciliation models, by trace minimization, provides more uniform estimates and better predictive potential for the first hours of the predictive horizon ( Figures A3 and A4).    performance, even with the increase of the forecast horizon hours. The average performance of the trace minimization (MinT) models shows stability, considering all hierarchical levels. As shown in Figure 2, the ETS-based predictive model shares some similarities with the ARIMA model. The BU technique is better for the most disaggregated levels, whereas the TD technique stands out only at the more aggregated levels. Note that the trace minimization procedures show significant gains over the classic linear models, namely OLS and WLS.

Results and Discussion
Figure 2, below, shows the predictive result obtained, using the ARIMA model, considering a predictive window of nine hours (h = 1, . . . , 9). Note that the model was estimated, taking the main hierarchical adjustment approaches into account, for the following levels: (i) total power generation in Brazil (Level 0), (ii) total energy generation by electrical subsystem (Level 1), and (iii) total energy generation by the energy generating source (Level 2). For Level 1, four forecasts (one for each electrical subsystem) were estimated. For Level 2, 14 forecasts (one for each energy source) were estimated.
Therefore, we estimated 1539 predictive models satisfying the following proportions: (i) 81 models for Level 0, (ii) 324 models for Level 1, and (iii) 1134 models for Level 2. The MAPE calculation for Levels 1 and 2 was based on a weighted average of the predictive errors. The weighting factors used are shown in Table 1. The performance of each predictive model, divided by the forecast horizon, is illustrated by a color scale. The green colors indicate the most accurate forecasts, while the red colors symbolize less accurate forecasts. The best forecasts, for each of the predictive horizons, are highlighted in bold. The last column of Table 1 presents the average performance for each forecast horizon (h) for each hierarchical approach.
As pointed by [27], the MinT procedure has a useful feature: it systematizes results into a unique analytical solution that incorporates information about the correlation structure of the entire dataset. Additionally, the minimum trace reconciliation, with or without regularization, presented the best results of all linear reconciliation methods, such as OLS and WLS, with variations. Moreover, the MinT (Sample) approach returns the most accurate, coherent forecasts for all levels considering just the first forecast horizons. However, as the predictive window grows, the BU method becomes more accurate. Furthermore, the performance of the BU model increases as the time series disaggregate.
As expected, the results obtained using the top-down technique did not present good predictive results, since it is intended to generate forecasts for level 0, with worse accuracy for the other levels. Both BU and TD present disadvantages: they do not take the correlation among the series at each level into account.
The other accuracy metrics presented in the Appendix A (MAE, MAE, and RMSE) reinforce the results found. In general, the performance of the optimal reconciliation models, by trace minimization, provides more uniform estimates and better predictive potential for the first hours of the predictive horizon ( Figures A3 and A4).
In addition to the ARIMA predictive model, Figure 3 presents the same forecasting procedures. However, they are based on the ETS automatic adjustment model. The objective is to show the influence of different forecasting methods for each hierarchical reconciliation model. In general, the error percentage produced by the ETS model was slightly higher than that produced by the ARIMA model. Figure 3 also shows the influence of trace minimization procedures (MinT) on the improvement of predictive performance. In particular, the MinT models have good predictive performance, even with the increase of the forecast horizon hours.
The average performance of the trace minimization (MinT) models shows stability, considering all hierarchical levels. As shown in Figure 2, the ETS-based predictive model shares some similarities with the ARIMA model. The BU technique is better for the most disaggregated levels, whereas the TD technique stands out only at the more aggregated levels. Note that the trace minimization procedures show significant gains over the classic linear models, namely OLS and WLS. Figures 2 and 3 present some limitations. In general, it is not possible to test the predictive influence of each of the subsystems within the established forecast horizon. To show this problem, Figure 4 presents a predictive comparison (MAPE) for each of the Brazilian electrical subsystems, considering the nine-hour predictive horizon. On the left is the technique with the best aggregation/disaggregation performance (BU) for the ARIMA model. On the right is the technique with the best average performance (MinT) for the ETS automatic selection model. Figure 4 thus shows a negative influence of the "south" electrical subsystem in the global measures of accuracy, especially from a predictive horizon of three hours onward. This system should be analyzed more thoroughly to identify energy sources located in the "south" subsystem that contributed most to the predictive instability of this system. Simultaneously, the use of individualized predictive models for this "south" system can be a good strategy, since unique climatic conditions exist in southern Brazil. Figure 4 thus shows a negative influence of the "south" electrical subsystem in the global measures of accuracy, especially from a predictive horizon of three hours onward. This system should be analyzed more thoroughly to identify energy sources located in the "south" subsystem that contributed most to the predictive instability of this system. Simultaneously, the use of individualized predictive models for this "south" system can be a good strategy, since unique climatic conditions exist in southern Brazil. Figures A1 and A2 (Appendix A) present the accuracy measure of the ARIMA and ETS models in detail, considering energy sources versus electrical subsystems. These results reinforce those in Figure 4, indicating instability in the southern subsystem, especially wind energy data.
Finally, some limitations of the present paper are recognized here. First, predictive models are based on past information evaluable, so the presented results cannot be extrapolated for different contexts and other time periods. Additionally, it is necessary to incorporate other predictive models to make the results more robust. In future research, it is recommended that models which integrate high-frequency data, e.g., the Wavelet approach, be adopted.

Conclusions
Analysis of the energy market is complicated. It involves the relationship between forecasting models and uncertainty, distinctly regarding the stochastic behavior of variables. The present paper is aimed at policymakers, offering a forecasting tool that deals with grouped time series. It also proposes a new forecasting approach, based on hierarchical modeling of the energy generation in Brazil.
The present paper introduces the use of trace minimization procedures (MinT) to aggregate and disaggregate forecasts based on the ARIMA and ETS models. MinT models performed better than the classic linear approaches, such as OLS and WLS. The MinT models also have high reliability for short predictive horizons. It is noteworthy that both hierarchical procedures and forecasting methods influence the predictive values of power generation in Brazil. Despite its advantages, the optimal reconciliation approach also has some limitations. This method could be unduly influenced by the sample period, and thus its ranking might change for other periods.  Figure 4, indicating instability in the southern subsystem, especially wind energy data.
Finally, some limitations of the present paper are recognized here. First, predictive models are based on past information evaluable, so the presented results cannot be extrapolated for different contexts and other time periods. Additionally, it is necessary to incorporate other predictive models to make the results more robust. In future research, it is recommended that models which integrate high-frequency data, e.g., the Wavelet approach, be adopted.

Conclusions
Analysis of the energy market is complicated. It involves the relationship between forecasting models and uncertainty, distinctly regarding the stochastic behavior of variables. The present paper is aimed at policymakers, offering a forecasting tool that deals with grouped time series. It also proposes a new forecasting approach, based on hierarchical modeling of the energy generation in Brazil.
The present paper introduces the use of trace minimization procedures (MinT) to aggregate and disaggregate forecasts based on the ARIMA and ETS models. MinT models performed better than the classic linear approaches, such as OLS and WLS. The MinT models also have high reliability for short predictive horizons. It is noteworthy that both hierarchical procedures and forecasting methods influence the predictive values of power generation in Brazil. Despite its advantages, the optimal reconciliation approach also has some limitations. This method could be unduly influenced by the sample period, and thus its ranking might change for other periods.
Therefore, the use of other predictive models, such as those based on analogs, machine learning, and other hybrid techniques, for example, is recommended. For future research, fine-tuning forecasts of the "south" electrical subsystem, as well as testing the accuracy of the hierarchal methods by using new forecasting approaches, is also recommended.
Finally, the present study contributes to the energy planning processes of different agents, given that understanding energy generation patterns is singularly important for minimizing risks and supporting reliable production planning. Good forecasts for future energy generation can support operational arrangements since energy supply and demand impact spot market sales prices. Acknowledgments: The authors would like to thank the National Council for Scientific and Technological Development (CNPq) and Companhia Energética Integrada (CEI) for supporting this research.

Conflicts of Interest:
The authors declare that there is no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: