Long Term Energy Consumption Forecasting Using Genetic Programming

Managing electrical energy supply is a complex task. The most important part of electric utility resource planning is forecasting the future load demand in the regional or national service area. This is usually achieved by constructing models from relevant information, such as climate and previous load demand data. In this paper, a genetic programming approach is proposed to forecast long term electrical power consumption in the area covered by a utility situated in the southeast of Turkey. The empirical results demonstrate a successful load forecast with a low error rate.


INTRODUCTION
Load management is a capability required by load distribution centers and electric utilities. Satisfactory and reliable load management makes it possible to balance supply and demand in the energy market and to serve customers more efficiently. Load forecasting is a prerequisite for successful load management: an accurate forecast allows dependable planning for the future. Long term load forecasting, in particular, guides the maintenance of electricity installations and construction planning. Power system engineers and electricity generation and distribution utilities therefore attach importance to load forecasting, and new forecasting methods that give reliable and accurate estimates have become attractive to power utilities.
Load forecasting can generally be classified as short term, mid term and long term. Hourly and daily (24 hours) forecasts are classified as short term load forecasting. Mid and long term forecasting cover weekly, monthly, seasonal and annual forecasts.
Short term load variations are quite non-linear because they depend on measurements taken over short intervals, from several minutes to an hour. Mid term (intermediate term) forecasts cover durations from a few days to several months. In long term forecasting, the forecast horizon varies between 1 and 10 years. A load profile can be obtained using short term estimations. However, power system planners require mid and long term load forecasts to make decisions about planning mid and long term maintenance, preparing future investment schedules and developing the generation, transmission and distribution systems. The planning of future investment in construction depends considerably on the accuracy of the long term load forecast [1]. Therefore, several estimation methods have been applied to short, mid and long term load forecasting. Conventional load forecasting techniques are based on statistical methods. Stochastic time series [2,3], autoregressive models [4] and non-parametric regression models [5,6] have been used in load forecasting. Several soft computing techniques have also been used as load estimators, such as fuzzy regression [7,8], self organizing maps (SOM) [9], a non-fixed neural network [10], dynamic neural networks [11], a combination of regression analysis and a fuzzy inference system [12], and a fuzzy regression tree combined with a multi-layer perceptron ANN model [13].
In this paper, we present a genetic programming approach to forecasting the long term electrical power consumption of a moderately sized city in Turkey. We use genetic programming to forecast future usage through symbolic regression on annual data from previous years.
The rest of the paper is structured as follows: Section 2 summarizes methods of forecasting an unknown function, Section 3 gives the details of the application of genetic programming to load forecasting, and Section 4 presents the results, followed by a conclusion.

METHODS OF FORECASTING AN UNKNOWN FUNCTION
Approximating an unknown function from sample data is an important practical problem. In order to forecast an unknown function using a finite set of sample data, a function is constructed to fit the sample data points. This process is called curve fitting. There are several methods of curve fitting. Interpolation is a special case of curve fitting where an exact fit of the existing data points is expected. In other forms of curve fitting, an approximate fit is permitted. The term "regression" is used to include many different methods of curve fitting.
Once a model is generated, its acceptability must be tested. There are several measures of the goodness of a model. The sum of absolute differences, mean absolute error, mean absolute percentage error, sum of squares due to error (SSE), mean squared error and root mean squared error can all be used to evaluate models. Minimizing the sum of the squared vertical distances between the data points and the curve (SSE) is one of the most widely used criteria.
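As an illustration of the measures listed above, the following minimal Python sketch computes them for paired actual and predicted values. The function name `error_measures` and the dictionary keys are assumptions made for this example; they are not part of the tools used in the study.

```python
import math

def error_measures(actual, predicted):
    """Return common goodness-of-fit measures for paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [a - p for a, p in zip(actual, predicted)]
    n = len(residuals)
    sad = sum(abs(r) for r in residuals)   # sum of absolute differences
    sse = sum(r * r for r in residuals)    # sum of squares due to error
    mse = sse / n                          # mean squared error
    return {
        "SAD": sad,
        "MAE": sad / n,
        # MAPE is undefined if any actual value is zero
        "MAPE": 100.0 * sum(abs(r / a) for r, a in zip(residuals, actual)) / n,
        "SSE": sse,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
    }
```

All of these measures are zero for a perfect fit; SSE, MSE and RMSE penalize large residuals more heavily than the absolute-error measures do.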

Genetic Programming and Symbolic Regression
In conventional regression, one has to decide on the form of the approximation function (an n-degree polynomial, a non-polynomial function, or a combination of both) and then find the coefficients of this selected function. Constructing an approximation function can be a difficult task. There is another form of regression called "symbolic regression". In the symbolic regression problem, the aim is to search for a symbolic representation of a model, instead of only searching for the coefficients of a predefined model. The genetic programming (GP) method introduced by Koza [14] can be used for the symbolic regression problem. GP searches for the model and the coefficients of the model at the same time.
GP works simultaneously on a group of possible solutions instead of a single solution. This group of candidate solutions is called a "population". GP as a whole tries to find the best solution. Since the original function is unknown, GP searches for an approximate, and hence "acceptable", solution. GP is an extension of genetic algorithms in which the individuals are programs or functions.
In GP, individuals are represented as trees. The elements of the trees are functions and terminals. Terminals are the variables, and functions are the operations applied to these variables; together they form the model. For example, Fig. 1 shows the representation of the simple expression x*y+x/2.

Fig. 1. An example tree representation in genetic programming
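The tree representation can be sketched in a few lines of Python. This is only an illustration, assuming a nested-tuple encoding `(function_symbol, left_subtree, right_subtree)` and a four-operator function set; GPLAB uses its own internal representation.

```python
import operator

# Function set: maps a symbol to a two-argument operation.
FUNCTIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def evaluate(node, variables):
    """Recursively evaluate a tree given as nested tuples.

    A node is either a terminal (a variable name or a numeric constant)
    or a tuple (function_symbol, left_subtree, right_subtree).
    """
    if isinstance(node, tuple):
        symbol, left, right = node
        return FUNCTIONS[symbol](evaluate(left, variables),
                                 evaluate(right, variables))
    if isinstance(node, str):
        return variables[node]   # variable terminal
    return node                  # constant terminal

# The expression x*y + x/2 as a tree.
TREE = ("+", ("*", "x", "y"), ("/", "x", 2))
```

Calling `evaluate(TREE, {"x": 4, "y": 3})` walks the tree from the leaves upward and returns 14.0.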
In applying genetic programming to a problem, there are five major preparatory steps. These five steps involve determining: (1) the set of terminals, (2) the set of primitive functions, (3) a fitness measure (a "fitness function" that specifies what needs to be done), (4) the parameters for controlling the run, such as the population size, the reproduction operators and their probabilities, and (5) the method for designating a result and the criterion for terminating a run [15]. Genetic programming is an iterative method. The algorithm used by genetic programming in finding solutions is summarized in Fig. 2.

Fig. 2. Simplified form of the algorithm used by genetic programming
The first step in the genetic programming algorithm is the generation of an initial population either by using random compositions of the functions and terminals or by using a predefined strategy.
In the next step, the termination condition is checked. If the termination condition is reached, the process ends and the best-so-far result is reported. If the termination condition is not reached, the following steps are repeated: (1) Each individual in the population is evaluated and assigned a fitness value using the fitness function. (2) A new population is created by applying genetic operations (reproduction, crossover and mutation) to individuals chosen from the population with a probability based on fitness. (3) The individual that is identified by the method of result designation is reported as the result for the run (e.g., the best-so-far individual). This result may be a solution (or an approximate solution) to the problem [16].
The crossover operation is one of the two primary reproduction operators. In the crossover operation, two solutions are combined to form two new solutions, or offspring. A random point is chosen in each of the parents, and the nodes below the crossover points are exchanged between the parents.
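The subtree exchange described above can be sketched as follows. This is a simplified illustration, assuming trees encoded as nested tuples `(function_symbol, child, ...)` and crossover points chosen uniformly over all nodes; the helper names are hypothetical, and the standard crossover in GPLAB may apply additional rules such as depth limits.

```python
import random

def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node in the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def get_subtree(tree, path):
    """Return the subtree found by following the child indices in path."""
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]

def crossover(parent_a, parent_b, rng=random):
    """Swap one randomly chosen subtree between two parents."""
    path_a = rng.choice(list(paths(parent_a)))
    path_b = rng.choice(list(paths(parent_b)))
    sub_a = get_subtree(parent_a, path_a)
    sub_b = get_subtree(parent_b, path_b)
    return (replace_subtree(parent_a, path_a, sub_b),
            replace_subtree(parent_b, path_b, sub_a))
```

Because the trees are immutable tuples, the parents are left unchanged and two new offspring are returned. The total number of nodes in the two offspring always equals the total in the two parents, since material is exchanged rather than created.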

APPLICATION OF GENETIC PROGRAMMING TO LOCAL DISTRIBUTION LOAD FORECASTING
In this study, power consumption data is processed with both conventional regression analysis and genetic programming techniques. We performed all of the computations described in this study using MATLAB (The MathWorks Inc., Natick, MA) on an IBM PC with a 3 GHz Pentium IV processor, 512 MB of memory, and the Windows XP Professional operating system. The curve fitting tool of MATLAB (cftool) is used for conventional regression, and the GPLAB toolbox for MATLAB (available from http://gplab.sourceforge.net) is used for applying genetic programming.
The curve fitting tool of MATLAB can be used to fit data using polynomial, exponential, rational, Gaussian and other equations. It also provides statistics to evaluate the goodness of the fit produced.
GPLAB is a free, highly configurable and extendable genetic programming toolbox that supports up-to-date features from recent genetic programming research. The curve fitting tool is used for comparison with the genetic programming application. Among the different types of fit, a 4th degree polynomial and a power equation produced the best results. Coefficients are calculated with 95% confidence bounds.
The power equation found is $f(x) = a x^{b} + c$ with calculated coefficients a=1.914, b=0.3409, c=-0.3431.
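As a worked illustration, the fitted power model can be evaluated directly from the reported coefficients. The sketch below assumes a non-negative input x; the exact encoding of the year as x is not stated here, so the function names and the example argument are purely illustrative. The `scale` default reflects the division of consumption values by 10^9 used in all three models.

```python
# Coefficients of the power model reported in the text.
A, B, C = 1.914, 0.3409, -0.3431

def power_model(x):
    """Evaluate f(x) = a * x**b + c for a non-negative input x."""
    if x < 0:
        raise ValueError("x must be non-negative for a real-valued result")
    return A * x ** B + C

def to_original_units(scaled_value, scale=1e9):
    """Undo the division by 10^9 applied to the consumption values."""
    return scaled_value * scale
```

For example, `power_model(1.0)` reduces to a + c, and `to_original_units` converts a model output back to the units of the raw data.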
The same data is used in GPLAB to find a symbolic model. The parameters used in the GP application are listed in Table 1. The standard crossover and mutation operators of GPLAB are used. GPLAB supports adaptive probabilities for genetic operators: the probability of an operator is incremented when it generates better individuals in a run and decremented when it produces worse individuals.
The "rand" operator generates a random number between 0 and 1. The "mydivide" operator returns the dividend if the divisor is zero, and the quotient otherwise. The "mypower" operator returns 0 if X1^X2 is NaN (not a number), infinite, or has an imaginary part; otherwise it returns X1^X2. The fitness function is the standard GPLAB function for symbolic regression problems. To be able to compare the results of GP with the other regression methods, the SSE of the best solution found by GP is also calculated. When selecting individuals for reproduction, the tournament method is used with a tournament size of 25. The GP is run for 200 generations with a population size of 200. Since the data values are large, each annual consumption value $y_n$ is represented as $y_n' = y_n / 10^{9}$ in all three models to make the calculations efficient.

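The protected operators and the tournament selection described above can be sketched in Python as follows. This is a minimal illustration of the stated behavior, not GPLAB code; it assumes that lower fitness is better and that contestants are drawn without replacement, neither of which is specified in the text.

```python
import math
import random

def mydivide(x1, x2):
    """Protected division: return the dividend when the divisor is zero."""
    return x1 if x2 == 0 else x1 / x2

def mypower(x1, x2):
    """Protected power: return 0 when x1**x2 is NaN, infinite or complex."""
    try:
        result = float(x1) ** float(x2)
    except (OverflowError, ZeroDivisionError):
        # Python raises instead of returning inf for these cases.
        return 0.0
    if isinstance(result, complex) or math.isnan(result) or math.isinf(result):
        return 0.0
    return result

def tournament_select(population, fitness, size=25, rng=random):
    """Draw `size` contestants at random and return the fittest one.

    Assumes lower fitness values are better (an error-based fitness).
    """
    contestants = rng.sample(population, min(size, len(population)))
    return min(contestants, key=fitness)
```

Protected operators keep every randomly generated tree evaluable, so an individual that divides by zero or takes a fractional power of a negative number still receives a finite fitness instead of aborting the run.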
RESULTS
The input to all three methods and the output calculated for the existing data are given in Table 2. All three models are constructed using the input data. After the models have been constructed, power consumption values for the previous years are calculated using these models. The data in the table are also shown graphically in Fig. 5. The SSE values of the three methods are given in Table 3. As seen from the table, GP outperformed the other two methods. Thus, it can be concluded that GP produced the better model.

CONCLUSION
Long term power consumption forecasting can provide important information for power distribution centers. As seen from the data, power consumption in this city is growing rapidly; accurate forecasts can therefore help authorities to make reliable plans.
In this work, a genetic programming based forecasting method is presented. Two other curve fitting methods are also presented for comparison with this technique. The data used in all three models are not preprocessed. The genetic programming technique is used to form a model and to evaluate the parameters of the model. The goodness of the fit produced by the genetic programming method, evaluated using SSE, is better than that of the other two regression methods. We conclude that genetic programming can be used effectively for electric utility resource planning and for forecasting the future load demand in a regional or national service area.
In step (2) of the algorithm, a new population is created by applying the following operations. The operations are applied to individuals chosen from the population with a probability based on fitness: (i) Darwinian reproduction: reproduce an existing individual by simply copying it into the new population. (ii) Crossover: create two new individuals by genetically recombining randomly chosen parts of two existing individuals using the crossover operation. (iii) Mutation: create one new individual from one existing individual by mutating a randomly chosen part of the individual.
Fig. 3 shows an example of the crossover operation.

Fig. 3. An example of crossover operation (X denotes the crossover point)

Fig. 5. Comparison of input data and values produced by three methods

Fig. 6. Comparison of forecasts produced by three methods

Table 1. Parameters used in the genetic programming application

Table 3. SSE values of the three methods
Forecasts for the next years from the three models are given in Table 4. Forecasts using the polynomial and power equation models have 95% prediction bounds. Forecasts are calculated by providing new years to the constructed models and computing the corresponding values. These results are also shown in Fig. 6.

Table 4. Forecasts from three models