Adaptive Optimized Pattern Extracting Algorithm for Forecasting Maximum Electrical Load Duration Using Random Sampling and Cumulative Slope Index

Abstract: Load forecasting techniques can be an essential method to save energy and shave peak loads in order to improve energy efficiency and maintain the stability of a power grid. To achieve this goal, machine learning-based approaches have been proposed recently. Before moving toward the long-term and ultimate solution such as machine learning, we propose a simple and efficient method to forecast electricity usage patterns and the duration of maximum electrical load using a small data set. The proposed algorithm can forecast maximum electrical load duration using random sampling and a cumulative slope index. To verify the algorithm, we utilized electricity data (from 2015.11 to 2016.12) obtained from a building with a constant lifestyle and electricity pattern. The performance of the algorithm was evaluated using electricity bills, the discharging condition of an energy storage system, and the cumulative slope index. It was found that the proposed algorithm could provide electricity cost savings of 0.62–2.28% compared with other, conventional electricity prediction techniques, such as the moving average method and exponential smoothing. In future research, it is expected that this algorithm could be applied to electrical big data to handle real-time data processing.


Introduction
As industry and economies develop, the demand for electrical power grows, which can sometimes result in insufficient power. To solve this problem, efforts are being made to expand electricity supply facilities and enhance the power supply. However, there is a limit to the extent to which supply facilities can be increased. Therefore, demand response (DR) management [1-4] of electrical power is a current trend and an alternative to focusing on expanding supply facilities. DR can provide a predictive method to control the power flow of an energy storage system (ESS), photovoltaic (PV) supply, buildings, and the grid [1]. Based on a combination of time-based and incentive-based DR programs, a real-time incentive DR program can reduce the peak load [2]. DR programs using a game model [3,4] are effective in improving grid efficiency and reliability. The main policy of demand response is to improve energy efficiency and power load management. Regarding energy efficiency, forecasting electricity usage can be an essential method to save energy and shave peak loads. Research on load forecasting is currently in progress using statistical methods and time-series models.
Specifically, these load forecasting methods include regression analysis, exponential smoothing, and neural networks. Regression analysis [5][6][7][8] predicts the load by modeling its relationship with independent variables such as temperature and GDP (Gross Domestic Product) [5]. Exponential smoothing models [9][10][11][12] offer an alternative method to forecast electricity usage using recently measured data and different types of weighting. This method can reduce the error of the predicted data because recent data receive a greater weight, whereas the weight of older data decreases exponentially over time. Neural networks are one method to model the human brain mathematically, and such models can require more and different types of weights and variables. Neural networks can also be linked to non-linear models to distinguish between dependent variables and independent ones. A load forecasting method that combines soft computing and neural networks is a newer approach called hybrid methodologies [13]. This model is different from traditional forecasting models such as statistical methods. A data mining framework can also provide intelligent analysis and prediction of energy consumption based on electricity data [14,15].
In this paper, we propose an algorithm for forecasting the duration of maximum electrical load using random sampling and a cumulative slope index. We chose this approach for three reasons. First, it does not need a lot of training data to forecast electricity usage patterns. Second, electricity data alone (a single variable) can be used to predict maximum electrical load durations. Finally, the proposed approach is a simple and efficient method that maintains prediction accuracy without a complicated model, before moving toward a long-term and ultimate solution such as machine learning. We aim to forecast maximum electrical load duration for a building with a constant lifestyle and electricity pattern on a weekly basis (excluding holidays and weekends: Saturdays and Sundays). To verify the algorithm, we utilized the electricity data (2015.11-2016.12) obtained from the headquarters building of Korea Electric Power Knowledge Data & Network (KEPCO KDN) located in Naju City, South Korea. The proposed algorithm can forecast the future electrical load of such buildings because household and office buildings have relatively constant electrical usage patterns. Therefore, for these buildings, we can forecast the pattern of electrical use and the maximum electrical load duration using the algorithm. Normally, conventional sequential correlation analysis, which examines as many data as possible to model the pattern of electrical usage, might be used to forecast the pattern and load duration. The proposed algorithm instead forecasts electrical use patterns and maximum electrical load duration using repetitive correlation analysis on data extracted by random sampling.
By applying the proposed approach, the electrical use pattern can potentially be obtained more quickly than by using sequential correlation analysis, in terms of a stochastic method. In the second section, we show how conventional sequential correlation analysis is used to extract an electrical use pattern that can represent monthly data. In the third section, we propose the algorithm, defining an alternative approach to forecast electrical use patterns that can represent monthly data, employing random sampling and repetitive correlation analysis. Finally, to evaluate the performance of the proposed algorithm, we forecast maximum electrical load durations with each of three models (random sampling, moving average, and exponential smoothing). Then, we discuss the quantitative results of the proposed algorithm using random sampling.

Sequential Correlation Analysis
Correlation analysis examines the statistical and mathematical interrelations between components of large data sets. It can be used to analyze the statistical interrelation between two random variables in various fields of engineering. For example, it has been used to study the correlation of wind and hydro power with electricity demand and prices [16] and to develop a method for evaluating the characteristics of team communications based on social network analysis [17]. To measure the correlation and similarity between two random variables, the covariance can be used, and the correlation coefficient is obtained by dividing the covariance by the product of the standard deviations of the two variables. The values of the correlation coefficient lie within the range −1 to +1. When the two variables have a positive linear relationship, the value gets closer to +1; when they have a negative linear relationship, it gets closer to −1; a value near 0 indicates little or no linear relationship. Generally, a correlation coefficient of 0.7 or higher indicates a strong positive linear relationship. When the value is between 0.3 and 0.7, the linearity is moderate, and with values below 0.3, the linear relationship is weak. Table 1 shows the statistical meanings of correlation coefficients. In this paper, we apply a correlation coefficient threshold of 0.8 (or 0.7) to decide the representative electrical pattern. This means that the electrical use pattern is defined as the pattern with correlation coefficient ≥0.7, which means a strong positive association between patterns.
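For reference, the covariance and correlation coefficient described above can be written in their standard (Pearson) form. This is a sketch under the assumption that the standard definitions are the ones intended; the symbols X, Y, μ, and σ are our own notation.

```latex
\operatorname{Cov}(X, Y) = E\big[(X - \mu_X)(Y - \mu_Y)\big],
\qquad
\rho_{X,Y} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y},
\qquad
-1 \le \rho_{X,Y} \le +1
```

Here μ_X and μ_Y denote the means and σ_X and σ_Y the standard deviations of the two variables, for example the load profiles of two dates.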

Method for Extracting Representative Electrical Patterns for Each Month from the Original Data
In this section, we address the method used to extract the representative electrical pattern for each month from original data for the KEPCO KDN headquarters. The representative electrical pattern means the pattern of electrical usage of a specific date each month that could represent usage during the month. The main reason to extract the representative electrical pattern is to examine the performance of forecasting maximum electrical load duration for each model (random, moving average, and exponential smoothing model) compared to a representative electrical pattern. If the electrical pattern of a specific date within each month has probabilistically obvious affinities with the patterns of other dates (correlation coefficient ≥ correlation coefficient threshold) and also is in the majority within each month, the pattern can be a representative electrical pattern of the month. In this paper, a representative electrical pattern is defined as the pattern with correlation coefficient threshold of 0.8 that accounts for ≥65% (ratio threshold) within the data for a month. However, if any pattern within a month does not satisfy that condition, we apply the second definition of representative electrical pattern with correlation coefficient threshold of 0.7 that accounts for ≥65% within the data for a month. To achieve a representative electrical pattern, we perform correlation analysis sequentially using pre-processed data extracted from the original electricity data.
Algorithm 1 explains how to extract the representative electrical pattern using sequential correlation analysis. As shown in Algorithm 1, we perform correlation analysis sequentially using pre-processed data extracted from the original electricity data. First, the correlation coefficient threshold was set at 0.8 and the ratio threshold was set at 65%. Second, we performed correlation analysis between components of two data sets sequentially. Finally, among the electrical patterns with a correlation coefficient ≥0.8 that account for ≥65% of the data for the month, the pattern with the highest ratio was extracted. As already mentioned, if no pattern within a month satisfies that condition, we apply the second definition of the representative electrical pattern, with a correlation coefficient ≥0.7 that accounts for ≥65% of the data for the month, and then execute Algorithm 1 once again. The concept of extracting representative electrical patterns and maximum electrical load duration using sequential correlation analysis is presented in Figure 1.
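The sequential extraction described above can be sketched in Python as follows. This is a minimal illustration, not a reproduction of Algorithm 1: the function names, the dictionary layout of `profiles` (date label mapped to a daily load profile), and the choice to compute the ratio over the other dates of the month are our assumptions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a flat profile is treated as uncorrelated
    return cov / (sx * sy)

def match_ratio(candidate, profiles, corr_threshold):
    """Share of the other dates whose profile correlates with the candidate
    at or above the correlation coefficient threshold."""
    others = [d for d in profiles if d != candidate]
    hits = sum(pearson(profiles[candidate], profiles[d]) >= corr_threshold
               for d in others)
    return hits / len(others)

def representative_pattern(profiles, corr_thresholds=(0.8, 0.7),
                           ratio_threshold=0.65):
    """Test every date in turn; fall back to the second correlation
    threshold if no date reaches the ratio threshold with the first."""
    for corr_threshold in corr_thresholds:
        ratios = {d: match_ratio(d, profiles, corr_threshold) for d in profiles}
        best = max(ratios, key=ratios.get)
        if ratios[best] >= ratio_threshold:
            return best, corr_threshold, ratios[best]
    return None  # no date satisfies either definition
```

Every date is compared with every other date, so the number of correlation computations grows quadratically with the number of dates in the month.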
To forecast the duration of the maximum electrical load, we calculated cumulative slope index values for a representative electrical pattern and then used the index to compare it with each model for quantitative evaluation. The ratio of the representative electrical pattern with correlation coefficient ≥0.8 (or ≥0.7) is listed in Table 2. All these ratios were higher than 65% and represented the majority within each month. From these results, the representative use pattern of each month can serve as a statistical model of that month's data in terms of a stochastic method. As shown in Figures 2-7, the representative electrical patterns of a specific date within each month differ from each other. Consequently, the maximum electrical load durations of the representative electrical patterns that we want to forecast also differ from each other.
Figure 1. Process for extracting representative electrical patterns and maximum electrical load duration using sequential correlation analysis.

Proposed Algorithm Using Random Sampling
In this section, we address the proposed algorithm using random sampling and repeated correlation analysis. Using the random sampling technique and repeated correlation analysis with previous data (such as one month ago or two months ago) within the period 2015.11-2016.12, we were able to find electricity usage patterns with strong correlation values in the electricity data set. Random sampling or Monte Carlo techniques [18][19][20][21] are widely used to solve various kinds of problems through the computation of random numbers. Such research usually deals with the uncertainty and sensitivity of the energy model. These techniques are also used to provide statistical estimates in various fields of engineering. They can be applied to evaluate the composite reliability of power systems [18,19] and used for sensitivity analysis of scenario ranking for energy generation [20]. With the growing use of renewable energy resources, distributed generation (DG) systems are spreading, and Monte Carlo techniques can address the probabilistic assessment of DG penetration into the distribution network [21]. Compared with sequential correlation analysis, the advantage of correlation analysis using random sampling is that it offers the potential to obtain electrical use patterns more quickly, in terms of a stochastic method. With the proposed algorithm, we may perform fewer repetitive correlation analyses by using only the data extracted by random sampling. Algorithm 2 explains how to extract the pattern using random sampling. We perform correlation analysis between components of two data sets using random sampling to obtain a predicted pattern. First, the correlation coefficient threshold was set at 0.8, the ratio threshold was set at the specific value we want to achieve, and the maximum repetition count was set at M.
Second, we performed correlation analysis between components of two data sets randomly and repeatedly. If the ratio for the electrical pattern of a sampled date was above the ratio threshold, the electrical use pattern for that sample date was recorded. Finally, when the maximum repetition count was reached, the pattern with the highest ratio among the recorded sample dates was extracted. The concept of extracting the predicted pattern using random sampling is depicted in Figure 8.
Figure 8. Process of extracting the predicted pattern using random correlation analysis.
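The random sampling procedure can be sketched as follows. The details of Algorithm 2 are not spelled out in the text, so this is only one plausible reading: in each repetition a candidate date and a small comparison sample are drawn at random. The names `random_sampling_pattern`, `sample_size`, and `seed` are hypothetical, and `max_repeats` plays the role of the maximum repetition count M.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a flat profile is treated as uncorrelated
    return cov / (sx * sy)

def random_sampling_pattern(profiles, corr_threshold=0.8, ratio_threshold=0.65,
                            max_repeats=100, sample_size=5, seed=None):
    """Repeat up to max_repeats times: draw a candidate date and a random
    sample of other dates, and record the candidate if the share of sampled
    dates correlating at or above corr_threshold reaches ratio_threshold."""
    rng = random.Random(seed)
    days = list(profiles)
    recorded = {}
    for _ in range(max_repeats):
        candidate = rng.choice(days)
        others = [d for d in days if d != candidate]
        sample = rng.sample(others, min(sample_size, len(others)))
        hits = sum(pearson(profiles[candidate], profiles[d]) >= corr_threshold
                   for d in sample)
        ratio = hits / len(sample)
        if ratio >= ratio_threshold:
            recorded[candidate] = max(ratio, recorded.get(candidate, 0.0))
    if not recorded:
        return None  # no sampled date reached the ratio threshold
    best = max(recorded, key=recorded.get)
    return best, recorded[best]
```

Under these assumptions each repetition costs at most `sample_size` correlation computations, so the total work is bounded by `max_repeats * sample_size` regardless of how many dates are available.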
To verify the appropriateness of the algorithm, we compared each representative electrical pattern with the predicted pattern extracted by the proposed algorithm. Comparisons of representative and predicted patterns are shown in Figures 9-14. As shown in those figures, most of the predicted patterns resemble the representative electrical pattern. From these results, we can see roughly that the proposed algorithm can predict the maximum electrical load duration of the representative electrical pattern. However, the predicted patterns in Figure 14 cannot be seen as analogous to the representative electrical patterns. The cause might be that the link between the constant lifestyle and the electrical load pattern is not the same as for the other patterns. In the next section, to evaluate the quantitative performance of the proposed algorithm using random sampling, we discuss the prediction results of maximum electrical load duration.

Performance Results of the Proposed Algorithm
In this section, we examine the performance of the proposed algorithm. To evaluate its performance, we discuss the results for maximum electrical load duration. The reason that we attempted to find the maximum electrical load duration was to examine the electricity cost saving effects from use of the proposed algorithm. The electricity cost saving effects are maximized when the ESS discharges electricity during the period of maximum electrical load. To confirm the duration of maximum electrical load, we propose using the concept of the cumulative slope index, together with the ESS discharging condition and KEPCO electricity bills, for the quantitative analysis. The flowchart for evaluating the performance is shown in Figure 15. As shown in Figure 15, we extracted pre-processed data from the original data, performed correlation analysis, and then calculated the cumulative slope index values for each model. Finally, we compared them with the cumulative slope index value of representative electrical loads for quantitative evaluation.
Figure 15. The quantitative evaluation process of three models (random sampling, moving average, and exponential smoothing model).

The Concept of Cumulative Slope Index
To evaluate the algorithm quantitatively, the concept of the cumulative slope index can be used. The definition of the cumulative slope index, including other parameters, is shown in Table 3. The slope and the cumulative slope index are calculated in the same way for the predicted pattern and for the representative electrical pattern. The slope (X for the predicted pattern, S for the representative electrical pattern) is defined as the variation in electrical power divided by the change in time. The cumulative slope of the predicted pattern (CX) and the cumulative slope of the representative electrical pattern (CS) are defined as the sum of the next slope value and the cumulative slope value accumulated so far. The cumulative slope index (CP for the predicted pattern, CR for the representative electrical pattern) is defined as the ratio of the cumulative slope to the maximum cumulative slope (CX max and CS max, respectively). For example, if the cumulative slope index is <80%, the load is not within the maximum electrical load duration. In contrast, a cumulative slope index ≥80% indicates times within the maximum electrical load duration.
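The verbal definitions above can be turned into a short computation. The sketch below is written under our own assumptions: the cumulative slope starts from zero, the time points default to equally spaced steps, and the index is expressed in percent. The function name `cumulative_slope_index` is hypothetical.

```python
def cumulative_slope_index(power, times=None):
    """Return the cumulative slope index in percent.

    Element i of the result refers to the interval ending at times[i + 1],
    so the result has one element fewer than `power`.
    """
    if times is None:
        times = list(range(len(power)))  # assumption: unit time steps
    # Slope: variation in power divided by the change in time.
    slopes = [(power[i + 1] - power[i]) / (times[i + 1] - times[i])
              for i in range(len(power) - 1)]
    # Cumulative slope: previous cumulative value plus the next slope.
    cumulative, total = [], 0.0
    for s in slopes:
        total += s
        cumulative.append(total)
    peak = max(cumulative)
    if peak <= 0:
        raise ValueError("the load never rises above its starting value")
    # Index: cumulative slope relative to the maximum cumulative slope.
    return [100.0 * c / peak for c in cumulative]
```

For an illustrative hourly profile of 10, 20, 40, 50, 45, 20 (arbitrary units, not taken from the measured data), the slopes are 10, 20, 10, −5, −25, the cumulative slopes are 10, 30, 40, 35, 10, and the index values are 25%, 75%, 100%, 87.5%, 25%. Only the third and fourth intervals reach the 80% threshold.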

ESS Discharging Condition
Normally, an ESS cannot discharge continuously because of its limited capacity and lifecycle. Considering these characteristics, it is efficient to discharge an ESS only during the period of maximum electrical load. Therefore, the ESS is in the discharging condition only while the cumulative slope index is ≥80% (i.e., the electrical usage is at least 80% of the maximum load). In this paper, the maximum electrical load duration is defined as the period in which the cumulative slope index is ≥80%. When the ESS is discharging during the maximum electrical load, utility electricity is not used. During that time, electricity costs are saved compared to using utility energy all the time. We evaluated the performance of the proposed algorithm under the ESS discharging condition that the cumulative slope index was ≥80%.
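As a small illustration of the discharging condition, the following sketch selects the time zones in which the ESS would discharge. The function name `discharge_hours` and the list-based inputs are our assumptions; the 80% threshold is the one defined in this paper.

```python
def discharge_hours(hours, index_percent, threshold=80.0):
    """Return the hours whose cumulative slope index is at or above the
    threshold, i.e., the assumed maximum electrical load duration."""
    if len(hours) != len(index_percent):
        raise ValueError("hours and index_percent must have the same length")
    return [h for h, v in zip(hours, index_percent) if v >= threshold]
```

For example, with hours 1 to 5 and index values of 25%, 75%, 100%, 87.5%, and 25%, the ESS would discharge during hours 3 and 4 only.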

KEPCO Electricity Bills
The KEPCO electricity bills were used for the quantitative analysis. The electricity costs from the bills are shown in Tables 4 and 5. Using the actual costs from the bills, we were able to perform a quantitative evaluation by comparing the energy use predicted with the proposed algorithm against the predictions made with other techniques.

Other Conventional Electricity Prediction Models (Moving Average Method and Exponential Smoothing)
Two conventional electricity prediction techniques are introduced here for comparison. One is the moving average method and the other is the exponential smoothing model.
The moving average forecasts a value as the arithmetic mean of a fixed number of preceding observations. As a form of time series smoothing, this method is widely used to forecast electricity usage because the model is simple and practical. By applying the model, we can extract the predicted pattern from the averaged previous data (one and two months ago). The averaged previous data and the predicted pattern extracted from them using the moving average method are summarized in Table 6.
Table 6. Averaged previous data and the predicted pattern extracted from the averaged previous data using the moving average method.
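A minimal sketch of the moving average forecast follows, assuming that the predicted pattern is the element-wise mean of the averaged profiles of the preceding months (here one and two months ago). The function name and the input layout are hypothetical.

```python
def moving_average_pattern(previous_profiles):
    """Element-wise arithmetic mean of equally long load profiles."""
    length = len(previous_profiles[0])
    if any(len(p) != length for p in previous_profiles):
        raise ValueError("all profiles must have the same length")
    n = len(previous_profiles)
    return [sum(p[i] for p in previous_profiles) / n for i in range(length)]
```

For instance, averaging the illustrative profiles [10, 20, 30] and [20, 40, 50] gives the predicted pattern [15, 30, 40]. Every profile receives the same weight, which is the main difference from exponential smoothing.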

The exponential smoothing model is another form of time series smoothing. However, the two methods differ in the amount of data they use for forecasting. While the moving average method needs all the previous data (one and two months ago), the exponential smoothing model needs only a small amount of recent data.
In this paper, we applied data from the previous week (5 days) to forecast the monthly pattern. For example, to forecast the pattern of the month 2016.03, we applied the data from five days (02/23, 02/24, 02/25, 02/26, and 02/29). In this model, the most recent data receive the largest weight and the weights of older data decrease exponentially over time. The weight assigned to each date for forecasting a monthly pattern is listed in Table 7.
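The weighting scheme can be sketched as follows. The smoothing constant `alpha = 0.5` and the normalization of the weights so that they sum to one are our assumptions for illustration; the weights actually used are those of Table 7, which this sketch does not reproduce.

```python
def exponential_weights(n_days, alpha=0.5):
    """Weights alpha * (1 - alpha)**k for k = 0 (most recent) .. n_days - 1,
    normalized to sum to one (normalization is an assumption)."""
    raw = [alpha * (1 - alpha) ** k for k in range(n_days)]
    total = sum(raw)
    return [w / total for w in raw]

def exponential_smoothing_pattern(daily_profiles, alpha=0.5):
    """Weighted element-wise mean of daily profiles ordered oldest to newest."""
    weights = exponential_weights(len(daily_profiles), alpha)
    newest_first = list(reversed(daily_profiles))
    length = len(newest_first[0])
    return [sum(w * p[i] for w, p in zip(weights, newest_first))
            for i in range(length)]
```

With five days and `alpha = 0.5`, the unnormalized weights are 0.5, 0.25, 0.125, 0.0625, and 0.03125, so the most recent day counts sixteen times as much as the oldest one.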

The Performance Results of the Proposed Algorithm
In this section, we examine the performance results of the proposed algorithm. To evaluate the performance, we address the ability to forecast the maximum electrical load duration using the KEPCO electricity bills, ESS discharging condition, and cumulative slope index.
As already mentioned, the proposed algorithm performs correlation analysis between components of two data sets using random sampling to obtain a predicted pattern. Using the proposed algorithm, we may perform fewer repetitive correlation analyses by using only the data extracted by random sampling. We then compare the performance results of the proposed algorithm with other, more conventional electricity prediction techniques such as moving average and exponential smoothing. The evaluation of the three models (random sampling, moving average, and exponential smoothing) was divided into two parts: forecasting accuracy and cost savings.
For the forecasting accuracy part of the quantitative evaluation, we used the cumulative slope index values. If the cumulative slope index of the representative electrical load was <80%, the load was not at maximum. In contrast, when the cumulative slope index of the representative electrical load was ≥80%, conditions were within the maximum electrical load duration. The results of the cumulative slope index values for each model and the representative electrical load are shown in Tables 8-19. In Tables 8-19, some values of the CP parameter of the cumulative slope index show that the predicted result of the method is analogous to the result of the representative electrical load, that is, the method forecasts the maximum electrical load duration (cumulative slope index ≥80%) exactly while the representative electrical load was within the maximum electrical load duration.
In contrast, other values of the CP parameter show that the method forecasts the opposite result compared to the representative electrical load.
Through analysis of the results in Tables 8-19, the cumulative slope index values for the random sampling technique agree with the representative electrical load in 86% of the 72 maximum load time zones. The moving average agrees in 69% and the exponential smoothing model in 67%.
These results show that the pattern predicted by the proposed algorithm using random sampling is more analogous to the representative electrical pattern, which has probabilistically obvious affinities with the patterns for other dates (correlation coefficient ≥0.8), than are the patterns predicted by the other models (moving average and exponential smoothing).
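One way to read the agreement rates above is as the share of maximum load time zones of the representative electrical load (CR ≥80%) in which the model also forecasts the maximum load (CP ≥80%). The sketch below implements that reading; it is our interpretation, and the function name `agreement_rate` is hypothetical.

```python
def agreement_rate(predicted_index, representative_index, threshold=80.0):
    """Share of time zones with CR >= threshold in which CP >= threshold."""
    peak_zones = [i for i, cr in enumerate(representative_index)
                  if cr >= threshold]
    if not peak_zones:
        raise ValueError("the representative load has no maximum load time zone")
    hits = sum(predicted_index[i] >= threshold for i in peak_zones)
    return hits / len(peak_zones)
```

With the illustrative values CR = [50, 85, 100, 90, 60] and CP = [40, 70, 95, 88, 82], the representative load has three maximum load time zones and the model matches two of them, giving a rate of about 67%. The last time zone, where only the model exceeds the threshold, does not enter this particular measure.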
For the cost saving part of the quantitative evaluation, we performed an evaluation using the KEPCO electricity bill. Table 20 presents one such evaluation and shows the result for 2016.04 for the three models (random sampling, moving average, and exponential smoothing).
In the same way, the fee for the total duration (from 2016.01 to 2016.12: 12 months) can also be calculated. The evaluation results of 12-day fees and 240-day fees (weekdays over 12 months, excluding holidays) are shown in Table 21. As shown in Table 21, the proposed algorithm using random sampling could provide electricity cost savings of 0.62-2.28% compared with conventional electricity prediction techniques such as moving average and exponential smoothing for the total duration. Cost savings are specified as the cost difference between moving average and random sampling, or between exponential smoothing and random sampling. Cost saving effects are presented as cost saving percentages based on the cost for random sampling. When calculating the total electricity bill, we did not include basic charges and only considered the unit price in various time zones. This is because we only wanted to discuss the quantitative performance and efficiency of the proposed algorithm using random sampling.
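The cost comparison can be sketched as follows, assuming that utility electricity is billed only in time zones where the ESS is not discharging and that basic charges are excluded, as stated above. The function names and the numbers in the usage note are illustrative and are not taken from Tables 20 and 21.

```python
def utility_bill(load_kwh, unit_price, discharging):
    """Energy charge summed over the time zones in which the ESS is not
    discharging (basic charges are excluded)."""
    return sum(load * price
               for load, price, ess_on in zip(load_kwh, unit_price, discharging)
               if not ess_on)

def cost_saving_percent(cost_other, cost_random):
    """Cost saving of random sampling over another model, expressed as a
    percentage of the random sampling cost."""
    return 100.0 * (cost_other - cost_random) / cost_random
```

For example, loads of 10, 20, and 30 kWh at unit prices of 1, 2, and 3 per kWh with the ESS discharging only in the last time zone give a bill of 50. If another model led to a bill of 51 for the same day, the saving of random sampling would be 2%.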

Conclusions
In this paper, we propose an algorithm for forecasting maximum electrical load duration using random sampling and a cumulative slope index. Before moving toward the long-term and ultimate solution such as machine learning, we propose a simple and efficient method to forecast electricity usage patterns and the duration of maximum electrical load using a small data set. The algorithm can forecast electricity usage patterns and maximum electrical load duration for a building with a constant lifestyle and electricity use pattern on a weekly basis utilizing a random sampling technique.
In contrast to sequential correlation analysis, for which we might examine as many data as possible to determine the pattern of electrical usage, the proposed algorithm does repetitive correlation analysis with data extracted by random sampling. With the proposed approach, this paper offers the potential for electrical use patterns to be obtained more quickly than with sequential correlation analysis, in terms of stochastic method.
To verify the effectiveness of the algorithm, we used electricity data (2015.11-2016.12) for the headquarters building of KEPCO KDN located in Naju City, South Korea, to forecast the electrical use pattern. We proposed an algorithm to provide an alternative approach to forecasting the electrical pattern that represents the usage data each month, employing random sampling and repetitive correlation analysis. To evaluate the performance of the proposed algorithm, we forecast the duration of maximum electrical load with each model (random sampling, moving average, and exponential smoothing), and the quantitative performance of the algorithm was evaluated using KEPCO electricity bills, the ESS discharging condition, and the cumulative slope index. As a result, it was found that the proposed algorithm can provide electricity cost savings of 0.62-2.28% compared with other, more conventional electricity prediction techniques such as moving average and exponential smoothing. In future research, it is expected that this algorithm will become applicable to electrical applications involving big data, to deal with real-time data processing. It should also offer an alternative method for exploring the use of ESSs for incorporation into various complex big data systems.