Applications of the Chaotic Quantum Genetic Algorithm with Support Vector Regression in Load Forecasting

: Accurate electricity forecasting is still the critical issue in many energy management ﬁelds. The applications of hybrid novel algorithms with support vector regression (SVR) models to overcome the premature convergence problem and improve forecasting accuracy levels also deserve to be widely explored. This paper applies chaotic function and quantum computing concepts to address the embedded drawbacks including crossover and mutation operations of genetic algorithms. Then, this paper proposes a novel electricity load forecasting model by hybridizing chaotic function and quantum computing with GA in an SVR model (named SVRCQGA) to achieve more satisfactory forecasting accuracy levels. Experimental examples demonstrate that the proposed SVRCQGA model is superior to other competitive models.


Introduction
With rapid economic development, accurate electricity load forecasting has become essential for many energy applications, such as energy generation, power system operation security, load unit commitment, and energy marketing.For example, power system decision makers can optimize load dispatch and adjust the electricity supply/price based on the forecasted loads, i.e., improve the power system management efficiency.As indicated in Xiao et al. [1], in China, there would be a year-long operational benefit with a 1% increase in the forecasting accuracy level.In addition, accurate load forecasting could also help managers set up well electrical power scheduling and successfully reduce system management risks.On the customer side, accurate load forecasting also facilitates the power usage decisions of customers to avoid load usage during the peak times and paying higher electricity prices.This usage balance between peak and bottom periods would lead to reliable power system operation of a utility.On the contrary, inaccurate forecast results would lead to inefficient power system operations and increased operating costs.As mentioned in the literature, a 1% increase in load forecasting error can lead to a loss of millions of dollars [2]. Therefore, as electricity prices also play a critical role in electricity production decisions, there are also several scholars who have proposed electricity price forecasting models in the literature [3,4].Readers may refer to Weron [5] for more comprehensive overviews.
The electricity load data are influenced by lots of factors, such as socio-economical activities, population, weather conditions, holidays, policy, and so on [6].Therefore, the electric load data reveal Energies 2017, 10, 1832 3 of 18 Lee and Lin [28].The main innovative contribution of this paper is hybridizing the chaotic mapping function and quantum computing technique with GA into a SVR model, to improve the problems as mentioned above, and thus achieve improved forecasting accuracy levels.
The remainder of this paper is organized as follows: the implementation details of the proposed SVRCQGA model are demonstrated in Section 2. Brief illustrations of the SVR model and the proposed CQGA are also clearly addressed.Section 3 demonstrates an experimental example and provides a statistical comparison among other benchmarking models proposed in existing papers.Conclusions are provided in Section 4.

Brief Description of the SVR Model
The principal modeling processes of the SVR model are briefly summarized as follows: the training data set, {(x i , y i )} N i=1 , is mapped to a feature space, n h , by the defined function, ϕ(x) : n → n h .The SVR function, f, is employed to linearly formulate the relationship between feature values (i.e., training data, x i ) and forecast values (y i ), and it is shown as Equation (1): where, f (x) is the forecasted values; the weight, w (w ∈ n h ) and coefficient, b (b ∈ ), could be determined during the minimization process of the empirical risk function, Equation (2): where, L ε (y i , f (x)) represents the main empirical risk, it is also the so-called ε-insensitive loss function; C and ε are the essential parameters.When the forecasting error is smaller than ε, the loss would be zero (refer to Equation ( 3)).The second term, 1 2 w T w, is the weight of the SVR function as mentioned, it determines the steepness.Therefore, C represents a trade-off role to balance the empirical risk and the steepness.For quadratic programming, two slack variables, ξ and ξ * , are introduced to measure the length between the actual values and the edge values of ε-tube.Then, Equation (2) could be transformed to the standard programming form with constraints, as shown in Equation ( 4 The solution weight vector, w, in the quadratic programming problem (Equation ( 4) is optimized by using the Lagrange multipliers method, as shown in Equation (5): where γ * i and γ i are the Lagrangian multipliers and satisfy the equality γ * i × γ i = 0. Eventually, the SVR function is formulated as Equation (6): where, K(x i , x j ) is the so-called kernel function, its value could be calculated by the inner product of ϕ(x i ) and ϕ(x j ), i.e., K(x i , x j ) = ϕ(x i ) • ϕ(x j ).There are several kinds of kernel function, the most widely used kernel function is Gaussian function, K(x i , x j ) = exp (−0.5 x i − x j 2 /σ 2 ), due to its excellence in complex nonlinear relationships mapping capability.Therefore, this paper employs a Gaussian function as the kernel function.
The most important job for improving the performance of an SVR model is adjusting well the parameter values, i.e., the three parameters, C, ε, and σ.However, there are no structural methods to efficiently set up the SVR parameters.This paper will continue exploring the feasibility of a chaotic quantum-behaved approach to overcome the disadvantages of genetic algorithms, namely CQGA; and, hybridizing CQGA with the SVR model, producing the SVRCQGA model, to determine the three parameters to improve the forecasting accuracy level.
GA generates new individuals by its advanced operations, including selection, crossover, and mutation operations.Particularly, the mutation operation is effective for making individuals have more satisfactory fitness values, and plays a critical role in maintaining the evolution quality for the population.Therefore, it has been applied to deal with many optimization problems.However, the population diversity would be reduced after repeated iterative computations and this leads to several major drawbacks, such as being time consuming, slow convergence, and becoming trapped in local optima.
Recently, quantum computing techniques have been hybridized with genetic algorithms, i.e., QGA [29].By applying the main computing techniques of quantum computing, including qubit, quantum superposition, and quantum entanglement, the chromosomes in QGA have been presented by qubit coding.In addition, quantum rotation gate operation for the chromosomes is employed during the whole evolutionary process.Therefore, it has lots of superior advantages during searching, such as speedy convergence, time saving, little population scale, and robustness.The applications of QGA also receive attentions in recent years, including traveling salesmen problems, personal scheduling problems, and dynamic economic dispatch problems, as well as improvements [30].For more application details of QGA, readers should refer to Lahoz-Beltra [31].

Quantum Computing Concepts
The quantum computing concepts are briefly described as follows: a quantum bit, abbreviated as qubit, is defined as the smallest information unit.In the quantum system, a qubit may be in the state "0", in the state "1", or in any superposition of these two states.The state of a qubit can be shown as Equation (7): where |0 and |1 are the values of traditional bits 0 and 1, respectively; α 1 and α 2 are the probability of their associate states and meet the normalization condition, as illustrated in Equation ( 8): where |α 1 | 2 is the probability that the qubit is in "0" state, and |α 2 | 2 is the probability that the qubit is in "1" state.For generalization, if a system has n qubits and totally 2 n states, then, the linear superposition of all states can be presented as shown in Equation ( 9): where p k is the probability of its associate state, S k , and meets the normalization condition, The probability of a qubit individual as a string with n qubits is presented as Equation ( 10): where Therefore, in QGA, the chromosome, with n qubits, could be presented as, P = (q 1 , q 2 , . . ., q n ), where q j (j = 1, 2, . . ., n) is an individual qubit of population as shown in Equation (10).
The quantum gate is an operator for qubits to implement unitary transformations, in which, the operation is represented by matrices.The basic quantum gates with a single qubit are the identity gate I and Pauli gates X, Y, and Z, as shown in Equation ( 11): The identity gate I keeps a qubit unchanged, i.
To obtain more results, it is feasible to use the trigonometric function with a phase angle θ, i.e., the so-called quantum rotation gate.The quantum rotation gate (cf.Equation ( 16)), is employed to update as the better solution in its current state: where P is the updated chromosome; θ is the designate angle to be used in the quantum rotation gate.

Implementation Steps of CQGA
The outstanding property of QGA is using quantum mechanics, such as qubits and their state superposition as mentioned above to represent the chromosomes (instead of traditional binary strings).The chromosome is represented as the superposition of all possible states.In the meanwhile, to keep the diversity of the population to avoid premature convergence is also an important issue.Chaos has two advantages: (1) it is sensitive to the initial conditions, i.e., minute changes in initial conditions steer subsequent simulations towards radically different final states; and (2) any variable in the chaotic space can travel ergodically over the whole space of interest, i.e., the so-called ergodicity Energies 2017, 10, 1832 6 of 18 property.Therefore, employing chaotic sequences to keep the diversity of population in the whole optimization procedures, will lead to very different future solution-finding behaviors, due to the ergodicity property.Eventually, chaotic sequences can help to enrich the search behavior and to avoid premature convergence.Considering the above mentioned statements, this paper also applies the chaotic variable to be hybridized with QGA (namely CQGA) to prevent the premature convergence problem.Furthermore, for the better chaotic distribution characteristics of cat function, it is used to generate the chaotic sequence.The two-dimensional cat function [32] is commonly used and is employed in this paper, as shown in Equation ( 17): where frac function is used to keep the decimal parts of a real number x by reducing an approximate integer.The complete processes of the proposed CQGA model is demonstrated in what follows and a brief flowchart is shown in Figure 1.
Energies 2017, 10, 1832 6 of 18 the chaotic variable to be hybridized with QGA (namely CQGA) to prevent the premature convergence problem.Furthermore, for the better chaotic distribution characteristics of cat function, it is used to generate the chaotic sequence.The two-dimensional cat function [32] is commonly used and is employed in this paper, as shown in Equation ( 17): where frac function is used to keep the decimal parts of a real number x by reducing an approximate integer.The complete processes of the proposed CQGA model is demonstrated in what follows and a brief flowchart is shown in Figure 1.Step 1. Set up quantum chromosomes.In this paper, the quantum chromosome is composed of a string of m qubits (superposition of all possible states), as shown in Figure 2.These SVR's three parameters, C, , and , are presented into the quantum qubit superposition format, i.e., each Step 1. Set up quantum chromosomes.In this paper, the quantum chromosome is composed of a string of m qubits (superposition of all possible states), as shown in Figure 2.These SVR's three parameters, C, ε, and σ, are presented into the quantum qubit superposition format, i.e., each chromosome has three genes to represent it.Based on the authors' practical trials and experience, choosing a gene with 40 bits could produce more satisfactory results, thus, a chromosome in total contains 120 qubits (i.e., m = 120).A gene that contains more qubits would be associated with better partitioning around the space.
Energies 2017, 10, 1832 7 of 18 chromosome has three genes to represent it.Based on the authors' practical trials and experience, choosing a gene with 40 bits could produce more satisfactory results, thus, a chromosome in total contains 120 qubits (i.e., m = 120).A gene that contains more qubits would be associated with better partitioning around the space.Step 2. Initialize population.The population of the quantum chromosome is initialized by setting all the amplitudes of qubits as √ [30], i.e., all superposition states has equal probability in the initial population.
Step 3. Evaluate fitness (forecasting errors).Evaluate the objective fitness (forecasting errors) by using the values of each quantum chromosome.The mean absolute percentage error (MAPE), illustrated in Equation (18), is employed to measure the forecasting errors: where N is the total number of forecasting results; is the actual value at each forecasting point i; is the forecasted value at each forecasting point i.
Step 4. Selection.In each generation, an elitist selection mechanism is used to select the best chromosome (with smallest MAPE value), i.e., the competition strategy is applied and as mentioned the best chromosome with the smallest MAPE value is recorded as the elitist and is reproduced as the initial chromosome for the next generation.
Step 5.Quantum crossover.To keep the population diversity, a quantum crossover operation is employed.Based on predefined crossover probability, Pcr (set as 0.9 [26]), the single-point-crossover principle is applied to randomly select two chromosomes to conduct crossover operation at any random position.For each generation, a new chromosomes pool would be generated after the quantum crossover operation is finished.Figure 3 illustrates the processed results of the quantum crossover operation.
Step 6.Quantum mutation.This is a useful approach to ensure population diversity.In this operation, each selected position of the participated quantum chromosome would be mutated with other real numbers according to the designate mutation probability, Pm (set as 0.1 [26]).Figure 4 shows an example of the quantum mutation operation.Step 2. Initialize population.The population of the quantum chromosome is initialized by setting all the amplitudes of qubits as 1 √ 2 [30], i.e., all superposition states has equal probability in the initial population.
Step 3. Evaluate fitness (forecasting errors).Evaluate the objective fitness (forecasting errors) by using the values of each quantum chromosome.The mean absolute percentage error (MAPE), illustrated in Equation ( 18), is employed to measure the forecasting errors: where N is the total number of forecasting results; y i is the actual value at each forecasting point i; f i is the forecasted value at each forecasting point i.
Step 4. Selection.In each generation, an elitist selection mechanism is used to select the best chromosome (with smallest MAPE value), i.e., the competition strategy is applied and as mentioned the best chromosome with the smallest MAPE value is recorded as the elitist and is reproduced as the initial chromosome for the next generation.
Step 5.Quantum crossover.To keep the population diversity, a quantum crossover operation is employed.Based on predefined crossover probability, P cr (set as 0.9 [26]), the single-point-crossover principle is applied to randomly select two chromosomes to conduct crossover operation at any random position.For each generation, a new chromosomes pool would be generated after the quantum crossover operation is finished.Figure 3 illustrates the processed results of the quantum crossover operation.
Step 6.Quantum mutation.This is a useful approach to ensure population diversity.In this operation, each selected position of the participated quantum chromosome would be mutated with other real numbers according to the designate mutation probability, P m (set as 0.1 [26]).Figure 4 shows an example of the quantum mutation operation.Step 7. Quantum rotation gate.This operation modifies the oscillation ranges of individuals to improve the performance by changing the state of each qubit.It is performed by using a quantum rotation gate (as shown in Equation ( 16)), in which the rotation angle is a function of the oscillation amplitudes ( , ), and the value of the individual qubit located at the position i would also be modified accordingly [33].The rotation angle is updated by Equation ( 19): where is the current forecasting error; is the average value of all previous forecasting errors.Based on quantum genetic algorithm performance, a general criterion to set values between 0.1π and 0.005π [31].
Step 8. Premature convergence test.Compute the mean square error (MSE), given by Equation (20), to test the level of premature convergence [34], and set up the criterion value, δ:  Step 7. Quantum rotation gate.This operation modifies the oscillation ranges of individuals to improve the performance by changing the state of each qubit.It is performed by using a quantum rotation gate (as shown in Equation ( 16)), in which the rotation angle is a function of the oscillation amplitudes ( , ), and the value of the individual qubit located at the position i would also be modified accordingly [33].The rotation angle is updated by Equation ( 19): where is the current forecasting error; is the average value of all previous forecasting errors.Based on quantum genetic algorithm performance, a general criterion to set values between 0.1π and 0.005π [31].
Step 8. Premature convergence test.Compute the mean square error (MSE), given by Equation (20), to test the level of premature convergence [34], and set up the criterion value, δ: Step 7. Quantum rotation gate.This operation modifies the oscillation ranges of individuals to improve the performance by changing the state of each qubit.It is performed by using a quantum rotation gate (as shown in Equation ( 16)), in which the rotation angle θ is a function of the oscillation amplitudes (α i ,β i ), and the value of the individual qubit located at the position i would also be modified accordingly [33].The rotation angle θ is updated by Equation ( 19): where f i is the current forecasting error; f avg is the average value of all previous forecasting errors.
Based on quantum genetic algorithm performance, a general criterion to set θ values between 0.1π and 0.005π [31].
Energies 2017, 10, 1832 9 of 18 Step 8. Premature convergence test.Compute the mean square error (MSE), given by Equation (20), to test the level of premature convergence [34], and set up the criterion value, δ: where f is given by Equation ( 21): If the value of the calculated MSE is less than δ, it implies premature convergence occurs.Hence, the chaotic cat function (Equation ( 17)) is employed to escape the local optimum, i.e., finding out new optimum, and set the new optimum as the best solution.
Step 9. Stop criteria.If the number of generations is greater than a given scale, then, the best solution could be the presented quantum chromosomes; otherwise, go back to Step 3 and continue searching the next generation.

Data Sets of Experimental Examples
To compare the performances from the hybrid quantum-behaved evolutionary algorithms with an SVR model, this paper employs the same experimental examples used in Huang [27] and Lee and Lin [28].These three experimental examples are: (1) the regional electricity load data in Taiwan from a published paper [23]; (2) the annual electricity load data in Taiwan from a published paper [23]; and (3) the electricity load data per hour from the 2014 Global Energy Forecasting Competition [35].The data setting details for each examples are summarized in the following.The data characteristics of these three examples are summarized in Table 1.For Example 1, there are in total 20 years of regional electricity load values (from 1981 to 2000) for four regions in Taiwan.Based on the same forecasting performance comparison conditions, the modeling sub-data set division is the same as in a previous paper [23].Thus, three subsets are obtained: a training subset (12 years of load data in total, from 1981 to 1992), a validation subset (a total of 4 years of data, from 1993 to 1996), and a testing subset (a total of 4 years of data, from 1997 to 2000).The well-known window-rolling procedure is employed during the whole process including the electricity load forecasts produced.For details of the window-rolling forecasting procedure readers should refer to Hong [23] and Lee and Lin [28].Three parameters are determined by CQGA, while the validation error is also calculated.The most suitable parameters are finalized only when the smallest validation errors occur.Finally, the four-step (year) load forecasting for each region in Taiwan is implemented by the proposed SVRCQGA model.For Example 2, there are in total 59 annual electricity load values (from 1945 to 2003).Similarly, to make sure the same forecasting performance comparison conditions are used, the modeling sub-dataset division is the same as in a previous paper [23], i.e., a training subset (40 years of data, from 1945 to 1984), a validation subset (10 years of data, from 1985 to 1994), and a testing subset (9 years of data, from 1995 to 2003).The modeling processing details are almost as the same as in Example 1: the window-rolling procedure is applied, then, three parameters are also selected by CQGA.The most suitable parameters are finalized only based on the smallest validation errors.Eventually, the one-step (year) load forecasting in Taiwan is implemented using the proposed model.For Example 3, there are a total of 744 h of electricity load data (from 00:00 1 December 2011 to 00:00 1 January 2012).Similarly, to be based on the same forecasting performance comparison conditions, the modeling sub-data set division is the same as in a previous paper [27,28].Thus, we have a training subset (552 h of load data, from 01:00 1 December 2011 to 00:00 24 December 2011), a validation subset (96 h of load data, from 01:00 24 December 2011 to 00:00 28 December 2011), and a testing subset (96 h of load data, from 01:00 28 December 2011 to 00:00 1 January 2012).The modeling processing details are almost as the same as in the two previous examples: a window-rolling procedure is still used, and the most suitable three parameters must be finalized based only on the smallest validation errors.Finally, the one-step (hour) load forecasting results are obtained using the proposed model.

Setting the CQGA Parameters
The parameters of CQGA for the three experimental examples are set practically: the population scale (P scale ) is set to be 200; the generations of the population (q max ) are no larger than 500; the qubit string length of a quantum chromosome (m) is set as 120; the probabilities of quantum crossover (P cr ) and quantum mutation (P m ) are set as 0.5 and 0.1 [26], respectively.Some controlled parameters during the modeling procedure are set as follows: the maximal iteration for each example is all set as 10,000 in each generation; σ ∈ [0, 5], ε ∈ [0, 100] in all examples, C ∈ [0, 20,000] in Example 1, C ∈ [0, 3 × 10 10 ] in Examples 2 and 3; δ is fixed as 0.001.

Forecasting Accuracy Indexes
To comprehensively compare the forecasting accuracy for each models, the mean absolute percentage error (MAPE; as shown in Equation ( 18)), the root mean squared error (RMSE; as shown in Equation ( 22)), and the mean absolute error (MAE; as shown in Equation ( 23)) are employed: where N is the total number of forecasting results; y i is the actual value at each forecasting point i; f i is the forecasted value at each forecasting point i.

Forecasting Performance Superiority Tests
To ensure the forecasting superiority of the proposed model is statistically significant, it is necessary to conduct some statistical tests to verify the significance of the proposed model.Based on Diebold and Mariano's [36] and Derrac et al.'s [37] suggestions, two tests are conducted in this paper, they are Wilcoxon signed-rank test [38] and Friedman test [39].
(A) Wilcoxon Signed-rank Test The Wilcoxon signed-rank test is used to detect the significance of a difference in the central tendency of two data series when the size of the two data series is equal.The statistic W is represented as Equation ( 24): where: where N is the total number of forecasting results.

(B) Friedman test
The Friedman test is used to measure the ANOVA in nonparametric statistical procedures; thus, it is a multiple comparisons test that aims to detect significant differences between the behaviors of two or more algorithms.The statistic F is represented as Equation (30): where N is the total number of forecasting results; k is the number of compared models; R j is the average rank sum obtained in each forecasting value for each algorithm as shown in Equation ( 31): where r j i is the rank sum from 1 (the smallest forecasting error) to k (the worst forecasting error) for ith forecasting result, for jth compared model.
The null hypothesis for Friedman's test is that equality of forecasting errors among compared models.The alternative hypothesis is defined as the negation of the null hypothesis.

Results and Analysis: Example 1
For Example 1, SVR's three parameter values determine the most suitable model for each region, which are computed by the QGA algorithm and CQGA algorithm, respectively, and with the smallest testing error (MAPE value).These determined parameters for each region are illustrated in Table 2.
For forecasting results comparison details, Table 3 demonstrates the forecasting accuracy indexes of the proposed SVRCQGA and other competitive models [27,28] for each region.Figure 5 illustrates the cumulative differences of MAE for each competitive models in four regions.The competitive models include SVRCQPSO (hybrid SVR with chaotic quantum PSO) [27], SVRQPSO (hybrid SVR with quantum PSO) [27], SVRCQTS (hybrid SVR with chaotic quantum tabu search) [28], and SVRQTS (hybrid SVR with quantum tabu search) [28] models.From Table 3 and Figure 5, it is obvious from the comparison that the proposed SVRCQGA model outperforms the other quantum-SVR-based models.Thus, it once again demonstrates the superiority of an SVR model in that it could obtain a more satisfactory forecasting performance by hybridizing quantum computing mechanics with a genetic algorithm.In the same time, the super capability of the cat mapping function in looking for a closer solution to the theoretical global optimum while suffering from premature convergence is noted.The QGA almost has done its best to look for the best solutions for each region, however, these solutions are still unsatisfactory by comparison with the performances of other alternatives.These solutions could be improved by employing a chaotic mapping function (this paper uses the cat mapping function), i.e., the CQGA, to achieve satisfactory solutions.

Figure 2 .
Figure 2. Quantum chromosome structure for three parameters.

Figure 2 .
Figure 2. Quantum chromosome structure for three parameters.

Figure 3 .
Figure 3. Example of the chromosome form of the parameters for the quantum crossover operation.

Figure 4 .
Figure 4. Example of the chromosome form of the parameter for the quantum mutation operation.

Figure 3 . 18 Figure 3 .
Figure 3. Example of the chromosome form of the parameters for the quantum crossover operation.

Figure 4 .
Figure 4. Example of the chromosome form of the parameter for the quantum mutation operation.

Figure 4 .
Figure 4. Example of the chromosome form of the parameter for the quantum mutation operation.

2 .
Annual Electricity Load Data in Taiwan: Example 2

Table 1 .
Data characteristics summary of three examples.