Prediction Interval Adjustment for Load-Forecasting using Machine Learning

Abstract: Electricity load-forecasting is an essential tool for effective power grid operation and for energy markets. However, a lack of accuracy in the estimation of the electricity demand may cause an excessive or insufficient supply, which can produce instabilities in the power grid or cause load cuts. Hence, probabilistic load-forecasting methods have become more relevant, since these provide not only load-point forecasts but also the uncertainty associated with them. In this paper, we develop a probabilistic load-forecasting method based on Association Rules and Artificial Neural Networks for Short-Term Load Forecasting (2 h ahead). First, neural networks are used to estimate point-load forecasts and the variance between these and observations. Then, using the latter, a simple prediction interval is calculated. Next, association rules are employed to adjust the prediction intervals by exploiting the confidence and support of the rules. The main idea is to increase certainty regarding predictions, thus reducing prediction interval width in accordance with the rules found. Results show that the presented methodology provides a narrower prediction interval without sacrificing accuracy. Prediction interval quality and effectiveness are measured using the Prediction Interval Coverage Probability (PICP) and the Dawid–Sebastiani Score (DSS). PICP and DSS per horizon show that the Adjusted and Normal prediction intervals are similar. Also, probabilistic and point-forecast Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) metrics are used. Probabilistic MAE indicates that Adjusted prediction intervals fail by less than 2.5 MW along the horizons, which is not significant if we compare it to the 1.3 MW failure of the Normal prediction interval. Also, probabilistic RMSE shows that the probabilistic error tends to be larger than MAE along the horizons, but the maximum difference between Adjusted and Normal probabilistic RMSE is less than 6 MW, which is also not significant.


Introduction
Load-forecasting is an important tool for decision-makers that helps them in the creation of policies for planning and operation of the power system [1]. Most of these decisions must be taken based on electric demand forecasts, and the lack of accuracy in the estimations will lead to an inefficient decision-making process [2].
Specifically, the lack of accuracy may cause overestimation or underestimation of electricity demand [3]. The former causes an excessive amount of electricity to be purchased and supplied to the system, which causes power balance disturbances and instabilities in the power grid. The latter, on the other hand, leads to a risky operation of the power system by restraining the production of electricity, which may lead to load cuts directly affecting electricity users.
In particular, short-term demand forecasting is highly difficult due to the quick response it requires, the amount of information involved, and the complexity of the models. These models must take into consideration not only the electric consumption pattern of the region but also its regulatory requirements. For instance, the Mexican Electric Market (MEM) establishes that every short-term load forecast method must be capable of estimating 8 periods of 15 min ahead (2 h) [4].
Intending to improve the accuracy of load demand forecast, researchers have developed forecast methods for short-term, mid-term and long-term [5]. Mid-term and long-term load demand forecasting is closely related to planning activities (e.g., power system maintenance tasks and capacity expansion planning), whereas short-term forecasts are employed for the ongoing operation (e.g., everyday unit commitment).
The main characteristic of electricity demand is that it is mostly (if not completely) influenced by human behavior patterns [6]. In this regard, human behavior follows certain tendencies with cyclic patterns. That is, we humans do mostly the same things (e.g., waking, going to sleep, following a job schedule) on a day-to-day basis at around the same time (e.g., waking up early in the morning). For instance, most productive human adults work on a working-week basis. Hence, a weekday electricity consumption pattern is not only different from weekend patterns, but also different from holiday patterns (which may fall within the working week). This means that every period of time has different needs in terms of electricity demand forecasting, and those needs also differ for each type of day.
Regarding a more technical aspect, electricity load-forecasting can be performed using only historical measurements or also forecast predictors (i.e., predicted future period loads). For instance, load forecasting can be done 1 h ahead at 15-min intervals using only load data from the past hour, or by feeding the predicted 15-min loads within the 1-h horizon back as inputs. In particular, it has been stated that the incorporation of recursive forecast predictors leads to better performance in time series prediction [7]. Furthermore, if a different model is trained for each forecast horizon (i.e., each 15-min step), the performance deterioration associated with more distant forecasting horizons can be avoided (for instance, by using individual forecasting models for every 15 min) [7].
Therefore, an effective load-forecasting model must consider such patterns. In this paper, we investigate a modeling approach based on association rules (association rules are useful to describe a model in terms of cause and effect). The proposed approach aims at predicting electricity demand for two hours ahead in 8 periods of 15 min. The dataset is from a representative load zone of Mexico, which is 15-min load demand measures. The prediction intervals are estimated using Artificial Neural Network models. Then, the prediction intervals are adjusted through association-rule mining algorithms.

Literature Review
Unlike other Data Mining (DM) algorithms such as Artificial Neural Networks (ANN) or Random Forests (RF), association rules are not very popular for time series prediction. One reason is that these algorithms are usually associated with the design of expert systems, which have fallen into disuse. For instance, in a recent review of DM algorithms applied to electricity time series forecasting [8], of the more than 100 works reviewed only 6 corresponded to algorithms using rule-based prediction. Nevertheless, regarding time series prediction, the authors of [9] proposed an ensemble of forecasting algorithms combined using fuzzy rule-based forecasting. The purpose of the latter is to determine the best weight for each forecasting method, such that the weights reflect the dependence between forecasting methods and the statistical properties of the time series. The fuzzy rules are selected using linguistic association mining given the statistical properties of the time series. Using classical time series point forecasting methods, the proposed ensemble algorithm is tested against the individual methods and an equal-weights ensemble on the M3 competition time series. They found that the proposed ensemble performs slightly better than the tested algorithms.
In a more recent work [10], a modified a priori-based association-rule mining algorithm based on Continuous Target Sequential Pattern Discovery (CTSPD) is proposed; it is then used to generate a set of association rules that help to predict the concentration of air pollutants in New Delhi. In this work, time-dependent features of the air pollutant time series are identified first to form new variables (i.e., frequent sequences), which are then used (in the form of association rules) to predict the concentration of air pollutants. Their results showed that the proposed approach performed better than the India System of Air Quality and Weather Forecasting and Research (SAFAR). Similarly, in [11] the authors propose an improved a priori algorithm for temporal data mining of frequent item sets in time series. This improved algorithm focuses on reducing the computational burden of identifying all frequent item sets by constraining temporal relations. In this sense, the algorithm determines time-constraint intervals, which are then used to filter (using time interval algebra) and mine the corresponding transactions from the database. The method is compared against the classical a priori algorithm, obtaining better performance regarding the storage and time required to mine rules.
On the other hand, an approach to the analysis of the electricity demand required by home appliances is proposed in [12]. In that work, several unsupervised learning algorithms (among them association rules) are employed to identify appliance energy consumption patterns. Using sequential rules, the authors found a heavy interdependence between the usage patterns of home appliances, the time of use, and the user activities. In the same fashion, a more recent work on the analysis of smart metering data using a Big Data infrastructure is presented in [13]. By employing unsupervised data clustering and frequent pattern mining on energy time series, the authors derived accurate relationships between interval-based events and appliance usage. These patterns are then exploited by a Bayesian Network to predict the short- and long-term energy usage of home appliances. The method is compared with Support Vector Machines (SVM) and Multi-Layer Perceptrons (MLP), outperforming both in all tested forecasting horizons.
In general, there are many works focused on the effective estimation of prediction intervals using neural networks [14][15][16][17]. Although the approaches developed in these works are effective, none of them considers adjusting the prediction interval using a rule-based analysis. Also, some studies apply rule-based analysis to create point forecasts; however, the creation or adjustment of the prediction interval is not considered [18,19].

Materials and Methods
In this section, all the components of the developed methodology are described. First, there is a data preprocessing stage in which the raw data is transformed for use by machine learning algorithms. Specifically, in this step the time series data is transformed into a tabular form in which every element of the table is a segment of the original time series. Also, every element of the table is paired with its corresponding time period. Then, point forecasting and prediction interval estimation are performed using Artificial Neural Network (ANN) models. Specifically, the ANN models produce point forecasts on a test database and every error is stored; the stored errors are used to estimate the prediction intervals. At the same time, association-rule mining is performed to extract significant rules by means of the a priori algorithm. Then, the prediction intervals estimated with the ANN models are adjusted by means of the obtained rules. Finally, the prediction intervals and their adjusted versions are evaluated. In Figure 1, the overarching methodology is shown.

Data Preprocessing
The data are from a representative location of Mexico. The exact location cannot be revealed for confidentiality reasons. The data consist of 15-min immediate measurements of load demand, i.e., 96 periods of 15 min within a day. This also means that, by the rules of the Mexican wholesale market, any prediction model must be capable of predicting 8 values into the future at 15-min intervals (2 h ahead). In Figure 2, the graph of the complete data is shown. To understand how the values are distributed in the dataset, we use a histogram; in Figure 3, the histogram of the complete data is shown. The complete dataset consists of 81,128 measurements from 1 October 2015 to 24 January 2018. Table 1 gives a summary of the statistical properties of the data.
For preprocessing, the time series is arranged as a delay embedding. The selected number of delay periods is denoted by s. For this paper, s is selected by the rule of thumb described in [20], which states that for autoregressive models at least 50, but preferably more than 100, observations should be taken. Based on this rule, we selected 10 times the number of horizons h needed for this problem (8 horizons). Therefore, every example is a vector of 80 values, in which the last value is considered the dependent variable described by the remaining autoregressive values; thus, for this paper, s = 79. The final objective is to transform the original time series into tabular form. To achieve this, the set of delayed time series is constructed as follows. Let L be the set of load measurements:

L = {V_1, V_2, ..., V_n},  (1)

where n is the number of observations. Let D be the set of delayed time series:

D = {d_1, d_2, ..., d_m},  (2)

where m = n − s and d_i is defined as follows:

d_i = {V_{t−s}, ..., V_{t−2}, V_{t−1}, V_t},  (3)

where t = i + s. In summary, the constructed dataset D contains m delayed time series d_i (i ∈ {1, 2, ..., m}), and every delayed time series d_i contains values of L in the form {V_{t−s}, ..., V_{t−1}, V_t}. In Figure 4, an example of a delayed time series d_i is shown. For every d_i, V_t represents the dependent variable and {V_{t−s}, ..., V_{t−2}, V_{t−1}} the independent variables. Also, every d_i is paired with its corresponding period p ∈ {1, 2, ..., 96}. This pairing allows machine learning algorithms to be applied to subsets defined by each period. In the case of Association Rules, it is additionally necessary to apply a discretization method, although the format is essentially the same; in Section 3.3.1, this discretization process is explained. In the next section, the process of prediction interval estimation through artificial neural networks is explained.
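The delay-embedding construction above can be sketched in code; `delay_embed` is an illustrative helper of our own (not from the paper), using NumPy:

```python
import numpy as np

def delay_embed(load, s=79):
    """Build the delayed dataset D: each row d_i holds the s lagged values
    plus the current value V_t (the last column, the dependent variable)."""
    load = np.asarray(load, dtype=float)
    m = len(load) - s
    # Row i covers load[i : i + s + 1] = {V_{t-s}, ..., V_{t-1}, V_t}, t = i + s
    return np.stack([load[i:i + s + 1] for i in range(m)])

# Toy series of 100 points: m = 100 - 79 = 21 rows of s + 1 = 80 values each
D = delay_embed(np.arange(100.0))
```

In practice each row would also be paired with its 15-min period p, so the per-period subsets can be selected later.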

Prediction Interval Estimation
A prediction interval (PI) is the estimation of a range in which a load value will fall with a certain probability [21]. PI estimation is an important part of a forecasting process, intended to indicate the expected uncertainty of a point forecast. PIs also allow us to offer a set of values in which a future value will fall with a given probability, thus creating a probabilistic forecast. The general form of a 100(1 − α)% confidence prediction interval is:

V̂_t(p, h) ± z_α · sqrt(Var[ε(p, h)]),  (4)

where V̂_t(p, h) is the point forecast of period p at horizon h, z_α is the z-score of an empirical distribution given the probability 100(1 − α)%, and ε(p, h) is the empirical distribution of errors of the forecast method in period p and horizon h. In Equation (4), the z-score is the parameter that allows us to modify the prediction interval coverage [22]. Figure 5 shows an example of how the z-score modifies the prediction interval coverage. It is worth mentioning that the z-score depends on α; specifically, z-score = Z(100(1 − α)), where Z is a function that returns the z-score for a given coverage probability.
Prediction interval estimation using Equation (4) requires a set of prediction errors of a forecast model. In this paper, we use Artificial Neural Networks to generate a prediction model for each period p.
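As a sketch of Equation (4), the interval can be computed from a stored sample of forecast errors; the helper name and the use of SciPy's normal quantile are illustrative choices of ours:

```python
import numpy as np
from scipy.stats import norm

def prediction_interval(point_forecast, errors, alpha=0.05):
    """100(1 - alpha)% interval per Equation (4): the half-width is the
    z-score times the standard deviation of the stored forecast errors."""
    z = norm.ppf(1 - alpha / 2)        # two-sided z-score for the coverage
    sigma = np.std(errors, ddof=1)     # sqrt(Var[eps(p, h)])
    return point_forecast - z * sigma, point_forecast + z * sigma

lo, hi = prediction_interval(500.0, np.array([-3.0, 1.5, 2.0, -1.0, 0.5]))
```

The interval is symmetric around the point forecast, and a smaller α widens it.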

Artificial Neural Network Training and Validation
Artificial Neural Networks (ANN) are models inspired by the central nervous system, made of interconnected neurons [23]. One of the most common ANN paradigms for both classification and regression is the Multi-Layer Perceptron (MLP). An MLP is composed of multiple layers of neurons: an input layer, one or more hidden layers, and an output layer. The input layer is responsible for receiving a given input vector and transforming it into an output that becomes the input of the next layer. A hidden layer transforms the output of the previous layer through a transfer function. Each neuron receives the inputs from all the neurons in the preceding layer, multiplies each input by its corresponding weight, and then adds a bias. In this paper, an ANN with 3 hidden layers of 11 neurons each was implemented.
We selected 3 hidden layers following a rule described in [24] and later updated in [25]. The rule indicates that for complex problems, such as time series prediction and computer vision, 3 or more layers are adequate. Also, [24] states three rules to select the number of hidden neurons; for our problem we selected the rule that sets the number of hidden neurons to 2/3 of the number of input neurons. Thus, the number of hidden neurons would be (2/3) × 79 ≈ 52 neurons (about 17 neurons per layer). However, [24] warns that too many neurons per layer may lead to overfitting, so we tried to reduce the number of hidden neurons and tested architectures from 17 down to 10 neurons per layer; 11 neurons per layer was the smallest architecture with an error rate similar to that of 17 neurons per layer in terms of MAE (Mean Absolute Error).
The ANN models in this paper are trained with the Resilient Backpropagation method described in [26]. Regarding the activation function, despite the existence of several types (linear, tanh, Gaussian, etc.), the sigmoid function is conventionally used in time series forecasting; hence, it was employed here [27]. In Figure 6, a graphical representation of this configuration is shown.
To train the ANN models, create the prediction intervals, and test the prediction intervals, the total dataset D was divided into three groups: D_Train, D_TrainPI and D_Test. D_Train is composed of the first 70% of the data, D_TrainPI of the following 20%, and D_Test of the last 10%. D_Train is divided into 96 subsets, each in an 80% train-20% test format. The elements in every D_Train subdivision are sorted randomly. The subdivided D_Train is used to train one ANN model per subdivision. As a result, 96 ANN models were obtained and stored in a set A (a_p ∈ A | p ∈ {1, 2, ..., 96}). In Figure 7, a graphical representation of the ANN model training is shown. In Figure 8, a graphical representation of the training error per ANN model is shown; on the x axis, every model is labeled with its corresponding time period.
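The chronological 70/20/10 split and the per-period grouping described above might be sketched as follows (an illustrative helper with names of our own, not the paper's code):

```python
import numpy as np

def split_dataset(D, periods):
    """Chronological 70/20/10 split into D_Train, D_TrainPI and D_Test;
    the training portion is then grouped by its 15-min period (1..96),
    one subset per ANN model a_p."""
    n = len(D)
    i70, i90 = int(0.70 * n), int(0.90 * n)
    D_train, D_train_pi, D_test = D[:i70], D[i70:i90], D[i90:]
    p_train = periods[:i70]
    per_period = {p: D_train[p_train == p] for p in range(1, 97)}
    return per_period, D_train_pi, D_test

# Toy example: 960 rows whose periods cycle 1..96
rows = np.arange(960, dtype=float).reshape(960, 1)
pers = (np.arange(960) % 96) + 1
per_period, D_pi, D_te = split_dataset(rows, pers)
```

Each `per_period[p]` subset would then be shuffled and split 80/20 to train and validate the model a_p.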
Once the ANN models are obtained, we proceed to extract the prediction errors using D_TrainPI. The following section explains this process.

Prediction Error Extraction
For prediction error extraction, the obtained ANN models are used to predict the V_t values contained in D_TrainPI. Each ANN model takes the values W_i = {V_{t−s}, ..., V_{t−2}, V_{t−1}} as inputs and produces V̂_t(p, h) as output. Every measured error is stored in a separate database D_ε. The ANN models are purposely trained to predict only 1 horizon ahead (this yields 96 ANN models, each a specialist in its corresponding period). Therefore, to predict the 8 horizons h needed for our problem, the predictions of the previous models are used as inputs for the next horizon; thus, D_ε contains a set of prediction errors ε(p, h) for every period p and horizon h. For every ε(p, h), the value Var[ε(p, h)] is estimated to obtain the final prediction interval per horizon and per period.
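The recursive multi-horizon scheme can be sketched as below, assuming (for illustration) that each trained model is a callable mapping an input window to a one-step forecast:

```python
import numpy as np

def recursive_forecast(models, window, period, horizons=8):
    """Chain the one-step models over the 8 horizons: every forecast is
    appended to the input window and the next period's model takes over."""
    window = list(window)
    preds = []
    p = period
    for _ in range(horizons):
        p = p % 96 + 1                    # next 15-min period (wraps 96 -> 1)
        v_hat = float(models[p](np.array(window)))
        preds.append(v_hat)
        window = window[1:] + [v_hat]     # feed the forecast back as an input
    return preds

# Trivial stand-in models that simply repeat the last window value
models = {p: (lambda w: w[-1]) for p in range(1, 97)}
preds = recursive_forecast(models, [1.0] * 79, period=10)
```

Comparing these chained forecasts against the observed values in D_TrainPI yields the ε(p, h) error samples.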
In Figure 9, a series of graphs of the obtained prediction intervals is shown. We call these the Normal prediction intervals. Normal prediction intervals are used in conjunction with the support values of the association rules to construct the Adjusted prediction intervals. In the next section, the extraction of association rules from the dataset is explained.

Association Rules Extraction
Association rule mining is a data mining methodology for extracting relationships and dependencies between variables in datasets [28]. Its objective is to identify if-then patterns discovered in databases using measures of interest [29]. For this paper, the a priori algorithm (see Appendices A and B) is used to extract the needed rulesets.

Data Discretization
Numeric data is difficult to use with the a priori algorithm, so a discretization method is needed. In this paper, the data is discretized using the method described in [30], specifically the type 7 quantile method, carried out as follows. Let Q_7(P) be the type 7 quantile of the probability P ∈ {0.1, 0.2, ..., 1}:

Q_7(P) = x_(j) + γ(x_(j+1) − x_(j)),  (5)

where x_(j) is the jth order statistic of x, n is the sample size, j = ⌊n·P + m⌋, γ = n·P + m − j, and m = 1 − P. In this paper, X is the set of load values contained in the delayed series {d_i | i ∈ {1, 2, ..., m}}. Using Equation (5) on the set X, we obtain the bins for data discretization. In Table 2, the obtained bins and quantiles are shown. The last row of Table 2 indicates the bin corresponding to the quantiles above it; specifically, above every bin are its lower and upper limits. Every value of the set X is mapped to its corresponding bin: if a value falls inside the range of a bin, that value is substituted with the bin number and stored in a dataset L*. Therefore, using the dataset L*, we construct the transactional dataset D_Trans as follows. Let L* be the set of quantile-mapped load measurements:

L* = {V*_1, V*_2, ..., V*_{n*}},  (6)

where n* is the number of observations in L*. Let D_Trans be the set of delayed quantile-mapped time series:

D_Trans = {d*_1, d*_2, ..., d*_{m*}},  (7)

where m* = n* − s and d*_i is defined as follows:

d*_i = {V*_{t−s}, ..., V*_{t−2}, V*_{t−1}, V*_t},  (8)

where t = i + s. For every d*_i, r*_i = V*_t represents the right-hand side (consequent) of the rule, and l*_i = {V*_{t−s}, ..., V*_{t−2}, V*_{t−1}} represents the left-hand side (antecedent). Also, every d*_i is paired with its corresponding period p ∈ {1, 2, ..., 96}. The rule extraction is carried out on D_Trans using the a priori algorithm. In this paper, the parameters of the a priori algorithm are set so as to obtain rules with minimum support = 0.1 and confidence = 0.9 (in some periods where no rules were found, support < 0.1 was used). The period element of the transactional dataset is used to segment the data into 96 subsets, one for every period.
Then, the a priori algorithm is applied to each subset to obtain a ruleset rs_p for every subset. As a result, 96 rulesets were obtained and stored in a set R (rs_p ∈ R | p ∈ {1, 2, ..., 96}). In Figure 10, the distribution of the support values per period is shown; each period's distribution is represented as a box plot whose box contains 95% of the support values in period p.
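A minimal sketch of the decile discretization, relying on the fact that NumPy's default 'linear' quantile interpolation is the type 7 method of Equation (5); the helper name is ours:

```python
import numpy as np

def discretize(load, n_bins=10):
    """Map each load value to a decile bin; np.quantile's default 'linear'
    interpolation corresponds to the type 7 quantile of Equation (5)."""
    probs = np.linspace(0.1, 1.0, n_bins)
    edges = np.quantile(load, probs)              # upper edge of every bin
    # first bin whose upper edge reaches the value, numbered 1..n_bins
    return np.searchsorted(edges, load, side='left') + 1

rng = np.random.default_rng(0)
bins = discretize(rng.normal(500.0, 50.0, 1000))
```

The resulting bin numbers form the quantile-mapped dataset L*, from which the per-period transactions are assembled before mining.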
The process of prediction interval adjustment using the rulesets contained in R is explained in the next section.

Prediction Intervals Adjusted by Means of Association Rules Support Metric
The prediction interval is adjusted by subtracting the support value of the corresponding rule from the 100(1 − α)% value when estimating the prediction interval. This adjustment occurs only when a specific rule in rs_p matches the inputs of the corresponding ANN model a_p. In this case, Equation (4) can be re-written as follows:

V̂_t(p, h) ± z_δ · sqrt(Var[ε(p, h)]),  (9)

where δ = α + β. The parameter β is a bias that adjusts the value of the z-score; the modified z-score decreases (or leaves unchanged) the width of the prediction interval. The parameter β takes values according to the following expression:

β = supp(l* → r*), if the antecedent l* of a rule in rs_p matches W*_i; β = 0, otherwise,  (10)

where W*_i is the quantile-mapped version of the ANN model inputs W_i. To modify the value of the z-score, we re-write the confidence interval probability expression as follows:

100(1 − δ)% = 100(1 − (α + β))%.  (11)

From the modified confidence interval probability shown in Equation (11), we obtain the modified prediction interval confidence. With this value, we can look up the corresponding modified z-score in any z-score table [22]. This gives us the z-score needed to modify the coverage of the prediction interval.
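The z-score adjustment can be sketched as follows, using SciPy's normal quantile function in place of a z-score table (an illustrative substitution; the helper name is ours):

```python
from scipy.stats import norm

def adjusted_z(alpha, support):
    """Modified z-score of Equation (11): the matched rule's support is
    added to alpha (delta = alpha + beta), which narrows the interval."""
    delta = alpha + support        # beta = rule support if a rule fired, else 0
    return norm.ppf(1 - delta / 2)

z_normal = adjusted_z(0.05, 0.0)     # no matching rule: the standard z-score
z_adjusted = adjusted_z(0.05, 0.10)  # a rule with support 0.10 shrinks the interval
```

Because δ > α whenever a rule fires, the adjusted z-score is always smaller, so the Adjusted interval is never wider than the Normal one.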

Experiments and Results
To measure the efficiency of the prediction intervals (Normal and Adjusted), we propose the use of the Prediction Interval Coverage Probability (PICP) [31] and the Dawid-Sebastiani Score (DSS) [32]. We also measure probabilistic and point-forecast MAE and RMSE per horizon h, as well as the PICP per period p, for a better understanding of prediction interval efficiency. To evaluate the quality of the Adjusted prediction intervals, three experiments were conducted on dataset D_Test: All days, Weekdays, and Weekends. For every experiment, Normal prediction intervals (Normal) and Adjusted prediction intervals (Adjusted) are evaluated. In the following section, the implementation of the mentioned metrics is described.

Prediction Intervals Evaluation Metrics
In this section, prediction intervals evaluation metrics are described. These metrics are helpful to evaluate and understand both Normal and Adjusted prediction intervals.

PICP (Prediction Interval Coverage Probability)
The PICP is the rate of real values that lie within the prediction interval. The PICP is estimated using the following equation:

PICP = (1/w) · Σ_{g=1}^{w} θ_g,  (12)

where w is the number of observations and θ_g is defined by the following equation:

θ_g = 1, if the gth observed value falls inside its prediction interval; θ_g = 0, otherwise.  (13)
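A minimal PICP implementation consistent with the definition above (helper name ours):

```python
import numpy as np

def picp(y, lower, upper):
    """PICP: fraction of observed values inside their interval
    (theta_g = 1 when lower_g <= y_g <= upper_g)."""
    y = np.asarray(y, dtype=float)
    inside = (y >= np.asarray(lower)) & (y <= np.asarray(upper))
    return float(inside.mean())

coverage = picp([1.0, 2.0, 3.0, 10.0], np.zeros(4), np.full(4, 5.0))  # 3 of 4 inside
```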

Probabilistic and Point-Forecast RMSE (Root Mean Squared Error) and MAE (Mean Absolute error)
Probabilistic and point-forecast errors are measured as indicated in Figure 11. The point forecast corresponds to the predicted load value; Outside Prediction Interval stands for load measures that fall above or below the prediction interval range; and Inside Prediction Interval corresponds to actual load measures that fall inside the prediction interval range.
Using all the errors, we estimate probabilistic and point-forecast MAE and RMSE from the respective sets of errors. Point MAE and RMSE help us to estimate the precision of the forecast method and also to understand the prediction intervals in general. Probabilistic MAE and RMSE help us to better understand the PICP metric results.
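One plausible reading of the Figure 11 error scheme, sketched in code (the figure itself is not reproduced here, so the zero-inside/distance-to-bound convention is an assumption of ours):

```python
import numpy as np

def probabilistic_errors(y, lower, upper):
    """Interval errors: 0 when the measure lies inside the interval,
    otherwise the distance to the violated bound."""
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    return np.where(y > upper, y - upper, np.where(y < lower, lower - y, 0.0))

def mae_rmse(err):
    return float(np.mean(np.abs(err))), float(np.sqrt(np.mean(err ** 2)))

err = probabilistic_errors([1.0, 5.0, -2.0], np.zeros(3), np.full(3, 4.0))
mae, rmse = mae_rmse(err)
```

Under this convention, a low probabilistic MAE with a high probabilistic RMSE signals rare but large interval violations.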

DSS (Dawid-Sebastiani Score)
The Dawid-Sebastiani Score (DSS) helps us to assess the quality of the prediction interval. The DSS is estimated as indicated in the following equation:

DSS(p, h) = (1/K) · Σ_{k=1}^{K} [ (ε_k / σ(p, h))² + 2 ln σ(p, h) ],  (14)

where ε_k and σ(p, h) are the kth error and the standard deviation of the error distribution ε(p, h), respectively. Equation (14) is modified to estimate the DSS of the Adjusted prediction intervals based on the support of the rules. The following equation describes the modified version of the DSS:

DSS*(p, h) = (1/K) · Σ_{k=1}^{K} [ (ε_k / σ*_k)² + 2 ln σ*_k ],  (15)

where σ*_k is the standard deviation adjusted by the kth value of supp(p, h), and supp(p, h) is the set of support values used to adjust the prediction intervals in period p and horizon h. It is worth mentioning that if no interval was modified, then supp(p, h) is filled with 0s and Equation (15) reduces to Equation (14).
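Equation (14) can be sketched as follows (illustrative helper; averaging over the K stored errors of one (p, h) pair is our reading of the text):

```python
import numpy as np

def dss(errors, sigma):
    """Dawid-Sebastiani Score averaged over the K stored errors of one
    (period, horizon) pair, per Equation (14)."""
    errors = np.asarray(errors, dtype=float)
    return float(np.mean((errors / sigma) ** 2 + 2.0 * np.log(sigma)))

score = dss([1.0, -1.0], 1.0)
```

Lower scores are better: the log term penalizes wide intervals, while the squared term penalizes intervals too narrow for the observed errors.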

Results and Discussion
To compare the proposed approach, the Autoregressive Integrated Moving Average (ARIMA) model and a persistence model are also evaluated. The ARIMA model is a classical time series forecasting method. It depends on three parameters: p, the number of autoregressive variables; q, the number of moving average variables; and d, the number of times the data needs to be differenced so that the time series is stationary. For experimental purposes, we estimated the ARIMA model using the process described in [33]. The persistence model is often used to determine whether a forecast model provides better results than a trivial reference model [34]. First, we present the point-forecast MAE and RMSE. Whereas point-forecast MAE gives a general idea of the precision of the forecast model, point-forecast RMSE penalizes large errors, so if the ANN models tend to return large error values, the RMSE will be far larger than the point-forecast MAE. In Figure 12, the point-forecast MAE and RMSE for the Persistence, ARIMA and proposed models are shown.
As we can observe in Figure 12, the persistence and ARIMA models work better for point forecasts in the first 5 horizons in comparison to the proposed model. However, the proposed model's point-forecast RMSE follows the same tendency as its point-forecast MAE, so we can say that the errors are consistent along the horizons for the three experiments, unlike the ARIMA and persistence models, for which the errors grow along the horizons. Although errors are consistent along the horizons for the proposed model, we can also observe that point-forecast MAE and RMSE tend to be larger in the Weekend experiment for all three models. This behavior is expected, as we suppose that human activities on the Weekend are less regular than on Weekdays. Next, we present the DSS and the PICP along with the probabilistic MAE and RMSE. These results are presented per horizon for all three experiments and the three models evaluated. In Figure 13, the DSS and the PICP per horizon are shown. As we can observe in Figure 13, the DSS for the ARIMA and persistence models is larger than that of the proposed model for both Adjusted and Normal prediction intervals (larger values of DSS indicate lower-quality prediction intervals). Also, the PICP for the ARIMA and persistence models is lower than that of the proposed model for both Adjusted and Normal prediction intervals. For ARIMA and the proposed model, DSS and PICP per horizon of the Adjusted and Normal prediction intervals are very close along the horizons for the three experiments, which indicates that for those models the Adjusted and Normal prediction intervals are quite similar. Probabilistic MAE and RMSE provide another perspective on this result. In Figure 14, probabilistic MAE and RMSE for the persistence, ARIMA, and proposed models are shown.
As we can observe in Figure 14, for the ARIMA and persistence models the probabilistic MAE and RMSE are larger than those of the proposed model and increase along the horizons. Although probabilistic MAE and RMSE for the ARIMA and persistence models are quite similar between Adjusted and Normal prediction intervals, the probabilistic RMSE is far larger than the probabilistic MAE, which indicates that errors are sometimes very large.
For the proposed model, probabilistic MAE indicates that Adjusted prediction intervals fail by less than 2.5 MW along the horizons, which is not significant if we compare it to the 1.3 MW failure of the Normal prediction interval. Also, probabilistic RMSE shows that the probabilistic error tends to be larger than MAE along the horizons, but the maximum difference between Adjusted and Normal probabilistic RMSE is less than 6 MW, which is also not significant. This significance is measured against the ancillary services requirements [35], which are published daily on the Independent System Operator's official site. For the region where this method is applied, the ancillary services requirement for the first horizon is a constant 25 MW. This means that errors below 25 MW do not significantly affect the power system. It is also worth mentioning that probabilistic MAE and RMSE are smaller in the Weekend experiment. This may happen because point-forecast RMSE and MAE are larger in the Weekend experiment, so we can expect prediction intervals to be wider on Weekends. In general, we can observe that the error metrics of the Adjusted prediction intervals are similar to those of the Normal prediction intervals. To better understand this similarity, we make use of the PICP per period. In Figure 15, the PICP per period for all three experiments and the three models is shown. As we can observe in Figure 15, for the ARIMA and persistence models, Normal and Adjusted PICP are similar. However, we can also observe that their RMSE and MAE are larger than those measured for the proposed approach. It is also interesting to observe that the ARIMA and persistence models have larger errors in the periods 08:00-12:00 and 15:45-19:00; this may be caused by the change of load during the day with respect to the sun position. It is also interesting that these errors are lower for the persistence model in the periods 15:45-19:00.
For the proposed approach, we can observe that probabilistic MAE and RMSE are more stable along the periods than those measured for the ARIMA and persistence models. Also, we can observe that although the Adjusted PICP drops below 75%, probabilistic MAE indicates that the error is always less than 5 MW, which is not significant. Also, probabilistic RMSE shows that in most of the periods the error is less than 15 MW, which is also not significant.

Conclusions

A prediction interval creation method has been presented. The proposed approach allows modifying the prediction interval by means of an association rules method: the prediction interval can be reduced by as much as the corresponding support value. We construct prediction intervals using Artificial Neural Network models and adjust them by means of rules obtained with the a priori algorithm. Prediction interval quality and effectiveness are measured by means of the Prediction Interval Coverage Probability (PICP) and the Dawid-Sebastiani Score (DSS). PICP and DSS per horizon show that the Adjusted and Normal prediction intervals are quite similar. The proposed approach was compared to the ARIMA model and a persistence model, and it demonstrated better performance in all the prediction interval evaluation metrics. Also, probabilistic and point-forecast MAE and RMSE metrics are used. Probabilistic MAE indicates that Adjusted prediction intervals fail by less than 2.5 MW along the horizons, which is not significant if we compare it to the 1.3 MW failure of the Normal prediction interval. Also, probabilistic RMSE shows that the probabilistic error tends to be larger than MAE along the horizons, but the maximum difference between Adjusted and Normal probabilistic RMSE is less than 6 MW, which is also not significant.
This work focused on prediction interval adjustment, so as future work we will use an optimization method to select the optimal structure of the ANN models per period with the objective of increasing the accuracy of the ANN predictions. For the association rules method, the discretization method will be modified to obtain more quantiles so the rules can be more specific. Also, we will relax the Support and Confidence parameters to enlarge the diversity of rules and, at the same time, include the Confidence metric in the prediction interval adjustment. Finally, this method will be tested on other datasets such as ERCOT or GEFCom (2012, 2014, 2017).

Appendix A. Measures of Interest

Support is an indication of how frequently the rule appears in the database. Support is estimated by the following expression:

supp(X → Y) = |{t ∈ D : X ∪ Y ⊆ t}| / |D|.
Confidence is an indication of how frequently the rule has been found to be true. Confidence is estimated by the following expression:

conf(X → Y) = supp(X ∪ Y) / supp(X).
There is a third measure of interest called lift. This measure indicates the degree of independence between X and Y; in other words, it indicates whether the rule is not a coincidence. Lift is estimated by the following expression:

lift(X → Y) = supp(X ∪ Y) / (supp(X) · supp(Y)).
Any algorithm designed to extract association rules from a database must use at least one of these measures of interest to select reliable rules.
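The three measures of interest can be computed over a toy transaction list as follows (illustrative helper of our own, assuming each transaction is represented as a set of items):

```python
def measures(transactions, X, Y):
    """Support, confidence and lift of the rule X -> Y over a list of
    transactions, each represented as a set of items."""
    n = len(transactions)
    def supp(items):
        return sum(items <= t for t in transactions) / n
    support = supp(X | Y)
    confidence = support / supp(X)
    lift = support / (supp(X) * supp(Y))
    return support, confidence, lift

T = [{'a', 'b'}, {'a', 'b', 'c'}, {'a'}, {'b', 'c'}]
s, c, l = measures(T, {'a'}, {'b'})  # support 0.5, confidence 2/3, lift 8/9
```

A lift below 1 would indicate that X and Y co-occur less often than independence would predict.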

Appendix B. The a priori Algorithm
The most widely used algorithm for obtaining association rules is the a priori algorithm. It selects rules based on a minimum support, which is set by the user. The pseudocode is shown in Algorithm A1.

Algorithm A1. A priori algorithm pseudocode:

    C_k: set of candidate itemsets of size k
    L_k: set of frequent itemsets of size k

    Begin
      L_1 = {frequent itemsets of size 1}
      for (k = 1; L_k ≠ ∅; k++)
        C_{k+1} = candidates generated from L_k
        for each transaction t in database D
          increment the count of every candidate in C_{k+1} contained in t
        end
        L_{k+1} = candidates in C_{k+1} that meet the minimum support
      end
      return the union of all L_k
    End
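A minimal, unoptimized Python sketch of Algorithm A1 (illustrative only; real uses would rely on an optimized implementation):

```python
def apriori(transactions, min_support):
    """Minimal sketch of Algorithm A1: grow candidate itemsets level by
    level, keeping only those whose support meets min_support."""
    n = len(transactions)
    def support(itemset):
        return sum(itemset <= t for t in transactions) / n
    # L_1: frequent itemsets of size 1
    items = {frozenset([i]) for t in transactions for i in t}
    L = {c for c in items if support(c) >= min_support}
    frequent, k = set(L), 1
    while L:
        # Join step: candidate (k+1)-itemsets from unions of frequent k-itemsets
        C = {a | b for a in L for b in L if len(a | b) == k + 1}
        # Prune step: keep candidates that meet the minimum support
        L = {c for c in C if support(c) >= min_support}
        frequent |= L
        k += 1
    return frequent

T = [{'a', 'b'}, {'a', 'b', 'c'}, {'a'}, {'b', 'c'}]
freq = apriori(T, 0.5)
```

Rules would then be generated from each frequent itemset and filtered by confidence, as described in Appendix A.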