Electricity Consumption Forecasting using Support Vector Regression with the Mixture Maximum Correntropy Criterion

The electricity consumption forecasting (ECF) technology plays a crucial role in the electricity market. The support vector regression (SVR) is a nonlinear prediction model that can be used for ECF. The electricity consumption (EC) data are usually nonlinear and non-Gaussian and present outliers. The traditional SVR with the mean-square error (MSE), however, is insensitive to outliers and cannot correctly represent the statistical information of errors in non-Gaussian situations. To address this problem, a novel robust forecasting method is developed in this work by using the mixture maximum correntropy criterion (MMCC). The MMCC, as a novel cost function of information theoretic, can be used to solve non-Gaussian signal processing; therefore, in the original SVR, the MSE is replaced by the MMCC to develop a novel robust SVR method (called MMCCSVR) for ECF. Besides, the factors influencing users’ EC are investigated by a data statistical analysis method. We find that the historical temperature and historical EC are the main factors affecting future EC, and thus these two factors are used as the input in the proposed model. Finally, real EC data from a shopping mall in Guangzhou, China, are utilized to test the proposed ECF method. The forecasting results show that the proposed ECF method can effectively improve the accuracy of ECF compared with the traditional SVR and other forecasting algorithms.


Motivation
Electricity consumption forecasting (ECF) refers to the estimation or forecast of the future monthly electricity sales by collating and analyzing historical data. Whether for the bid evaluation system of the State Grid Corporation or for the open electricity market today, electricity sales forecasting is very important, especially for power sales companies, since it will directly affect the deviation assessment. The bigger the deviation, the higher the penalty. Accurate prediction of monthly electricity sales is of vital practical significance and practical value to the power sales company.
ECF is a new problem challenging the current power sales company. In essence it consists of load forecasting. However, compared with load forecasting, it has some new characteristics, such as fewer historical load data and stronger randomness.

Literature Review
Extensive research has been conducted on load forecasting. Ref. [1] established a long-term power load forecasting model by using a support vector machine (SVM) model based on the comprehensive consideration of economic factors, social factors, and energy market structure and optimizing a multi-factor medium and the parameters of SVM by using particle swarm optimization (PSO) and an improved cross-validation method. In Ref. [2], the improved fly optimization algorithm (FOA) algorithm was used to optimize the parameters of the wavelet least-squares support vector machine (using the wavelet kernel function instead of the Gauss kernel function in least-squares support vector machines (LSSVM)), and the combined algorithm was applied to medium-and long-term power load forecasting. In Ref. [3], the author combined the Fuzzy Time Series with Seasonal Autoregressive Fractionally Integrated Moving model to predict load changes, and the PSO algorithm s used to optimize the weights of the model leading to a good prediction result. In Ref. [4], the author proposed a hybrid optimization grey model called Rolling-ALO-GM (1,1) for ECM based on known information from annual EC by using the grey control theory. In Ref. [5], the author used the Drosophila optimization algorithm with decreasing step size (SFOA) to optimize the extended parameters of generalized regression neural network (GRNN). By combining it with the periodic characteristics of weather factors and power load, a SFOA-GRNN power load forecasting model was proposed. The experimental results showed that the improved model was much better than the original model.
The SVM is a supervised learning algorithm which can solve the problems of small sample, non-linearity, and high latitude and is more efficient in solving pattern recognition and regression problems. Therefore, it has been widely used in short-term load forecasting. In Ref. [6], in order to improve the accuracy and efficiency of calculation, the secondary sampling strategy of support vector regression (SVR) learning process was designed, and a method of modeling and forecasting short-term load point estimation and its confidence interval length using subsampled SVR ensemble was obtained. In Ref. [7], a new grid algorithm was proposed, which provides a new method for the determination of parametric regression surface. A sequential grid method based on SVR was established for short-term load forecasting. Compared with the standard SVR model, the accuracy of short-term load forecasting was greatly improved. In order to deal with the cyclic nature of electric loads, Ref. [8] proposed the SVR with chaotic cuckoo search (SVRCCS) model, based on using a tent chaotic mapping function to enrich the cuckoo search space and diversify the population to avoid trapping in local optima, and combined the SVRCCS model with a seasonal mechanism, thus obtaining a seasonal SVRCCS model. The numerical results showed that the proposed novel model outperforms other alternative models. Ref. [9] used the proposed chaotic efficient bat algorithm, niche search, and evolution mechanisms to optimize the parameters of the hybrid kernel-based SVR model and used motion data for a real floating platform to evaluate the reliability and effectiveness of the proposed model. Ref. [10] proposed a novel vector field-based SVR method. Through multi-distortions in the sample data space or high-dimensional feature space mapped by a vector field, the optimal feature space was found, in which the high non-linearity between inputs and outputs was approximated by linearity. The results indicated that the proposed method achieves better performance than commonly used methods with regard to the accuracy, robustness, and generalization ability.
However, the loss function of the traditional SVR is the mean-square error (MSE) criterion. The MSE criterion only considers the second-order distance of the prediction error distribution. It is very effective for data with Gaussian characteristics, which results in high accuracy only when predicting a stationary sequence. Compared with the power consumption of large users and stable users, the power consumption of small users is constrained by many external factors, such as orders, operation plans, equipment, environmental factors, meteorological changes, emergencies, and so on. It produces non-linear characteristics, such as randomness and uncertainty, and prediction errors are often non-gaussian. Therefore, the application of MSE-based SVR in power consumption forecasting will be less accurate compared with that for large users and stable users. It produces non-linear characteristics, such as randomness and uncertainty, and prediction errors are often non-gaussian. Therefore, the application of SVR based on MSE in ECF has some shortcomings. The Neural Network Laboratory of the University of Florida, USA, proposed a new definition of the correlation function of stochastic processes, namely, correntropy [11]. When it is taken as the cost function in machine learning and signal processing, a Maximum Correntropy Criterion (MCC) can be defined for non-gauss, non-linear applications. The MCC contains higher moments of probability density function and is suitable for any noise environment. So far, many robust learning algorithms have been developed using MCC [12][13][14][15][16][17][18]. It has been proved that they can perform a robust analysis [19][20][21][22] and effectively deal with non-Gaussian situations and outliers [23][24][25]. In order to further improve the learning performance, Ref. [26] proposed the concept of the maximum mixture correntropy criterion (MMCC), which uses the mixture of two Gauss functions as the core function. The learning algorithm using MMCC has a better performance than the traditional MCC-based learning algorithm and several other advanced learning algorithms. On the basis of the above analysis, MMCC was introduced into SVR for the first time, and an MMCCSVR forecasting model for EC is proposed in this paper.

Our Contribution
The main contributions of this study can be summarized as follows: (1) In order to solve the problem due to the fact that traditional SVR based on MSE loss function only has high efficiency in data processing with Gaussian characteristics, which leads to its insensitivity to burst and low accuracy in predicting a non-stationary sequence, MMCC is introduced into SVR as loss function, and an MMCCSVR model is proposed.
(2) The MMCCSVR model is applied to ECF for the first time. The simulation results show that the forecasting accuracy of this method is higher than that of traditional SVR and other algorithms.

Organization of the Paper
This paper is structured as follows: Section 2 gives an overview of SVR and MMCC. Section 3 proposes the MMCCSVR model for ECF in detail. Section 4 introduces the main steps of forecasting. In Section 5, the model validation is introduced and next, experimental results and comparisons with traditional SVR and other algorithms are shown and discussed. Finally, a brief summary and a description of future work are given in Section 6.

SVR
The SVR model has its own unique advantages in dealing with such problems as small amount of data, obvious non-linear characteristics, and high-dimensional pattern recognition. It only needs a small number of support vectors to establish decision functions, which can effectively avoid dimension disasters. Moreover, the reduction or increase of non-support vector data has little impact on the algorithm and has better robustness. SVR uses the kernel function ϕ(x) in substitution of nonlinear mapping, the controlled parameters are few, and the algorithm is simple and easy to implement [27].
In an SVR model, suppose the training set sample is (x i , y i )|i = 1, 2, . . . , n , and n is the number of samples. The regression problem can be transformed into an optimization problem, as follows.
where ξ is the error variable, ω is the weight vector, ω ∈ H, b is the bias, and C is the penalty parameter; C is used to control the minimization of the estimation error and the function smoothness. In order to solve the optimization problem, the Lagrange function is developed as where α = (α 1 , α 2 , . . . , α n ) is the Lagrange multiplier. Differentiating L with respect to the variables ω, b, ξ, and α, we obtain and, after solving the above functions, we can obtain the solution of the problem in the following form where K(·) is the kernel function. In this paper, we used Gaussian kernel K(x, to construct nonlinear SVR since it is widely used in the literature and has shown good learning properties in a variety of applications. In the following subsection, we introduce the MMCC method.

Mixture Maximum Correntropy Criterion
First, correntropy is defined as follows where κ(X, Y) is the kernel function, and E[•] is the mathematical expectation. Correntropy is a generalized correlation measure. Suppose (x i , y i ) n i=1 is of the random variables X, Y, the sample mean estimator of corrrentropy isV where g( is the Gaussian kernel with bandwidth σ > 0. As a nonlinear similarity measure, correntropy has been successfully used as an efficient optimization cost in signal processing and machine learning [25], and the corresponding cost function is MCC. In order to further improve the learning performance, Ref. [25] proposed a mixture correntropy that uses the mixture of two Gaussian functions as the kernel function; the mixture correntropy is defined by where σ 1 and σ 2 are the kernel bandwidths of the Gaussian functions G σ 1 (•) and G σ2 (•), and 0 ≤ λ ≤ 1 is the mixture coefficient. Hence, Equation (10) can be written aŝ Especially, when σ 1 = σ 2 , MMCC degenerates into MCC.

SVR with MMCC
The central idea of SVR with MMCC (MMCCSVR) is to integrate the MMCC regularization technique and the kernel trick in a unified framework. Therefore, the MMCCSVR seeks the optimal Concretely, the first term of (12) is based on the aforementioned MMCC (11) in the ϕ − included kernel space and is used as an empirical loss function to measure the fitting error. The second term of (12) makes the required regression estimation f (x) as smooth as possible from the perspective of Tikhonov regularization. It can also be interpreted as a regularization to overcome possible overfitting.
However, there are two difficulties to solve (12) directly. The first one is that the correntropy cost (11) is hard to be optimized. Although Liu et al. proposed a steepest descend-based method in Ref. [27] to tackle this problem, it is not very efficient. The second problem concerning (12) is due to the fact that ω may lie in a high or even infinite dimensional feature space induced by nonlinear map ϕ(·), thus limiting the application of conventional optimization methods. Fortunately, (12) can be converted to a more straightforward form by considering the following proposition to tackle the first problem [28][29][30].

Proposition 1.
There exists a convex function φ: R → R , such that and, for a fixed x, the maximum is reached at p = −g(x, σ).
The proof of Proposition 1 is based on the convex conjugated function theory and can be found in Ref. [31]. Now, we construct the following optimization problem with augmented objective function in an enlarged parameter space: where p = [p 1 , p 2 , . . . , p n ] T stores the auxiliary variables introduced in the half-quadratic optimization. Then, according to proposition 1, we notice that when [ω, b] is fixed, the following equation holds Hence, we can further get the following equation That is to say, maximizing J(ω, b) with respect to [ω, b] is equivalent to maximizing the augmented function J(ω, b, p) in the enlarged parameter space [ω, b, p]. Now, to overcome the second problem, we introduce a slack vector to convert the unconstrained optimization problem (16) to the following constrained problem ξ = [ξ 1 , ξ 2 , . . . , ξ n ] T , as in standard SVR We can choose to iteratively optimize (17) by alternating the optimization with respect to either [ω, b, ξ i ] or p, while holding the other term fixed.
First, we hold p fixed and maximize (17) with respect to [ω, b, ξ i ]. By dropping the other unrelated variables and introducing the variable q i = −p i , we obtain the following equivalent problem In order to solve the above problem, we introduce the following Lagrangian function where α i is a Lagrangian multiplier. Then, by following the Karush-Kuhn-Tucker (KKT) condition [32], Combing Equations (20)-(23) leads to the following linear system of equations Noticing that K + Q is symmetric positive-definite and thus invertible, we can efficiently solve (23) by the following equations [33]. Second, we hold [ω, b, ξ] fixed and optimize (17) with respect to p. Actually, according to Proposition 1, the optimal p is directly given by Up to now, we have solved both sub-problems constituting the original optimization problem (17). For the sake of clarity, we prescribe the proposed MMCCSVR as follows. First, set the parameter values p i = −q i = −1 for all samples, tolerance ε = 1e − 3, MMCC parameter σ 1 and σ 2 , kernel parameter γ, regularization parameter C, build kernel matrix K.
Step 1. Calculate the diagonal matrix Q and then solve (25) by Cholesky factorization to obtain [α, b].
Step 2. If [α, b] changes less than ε, go to Step 4, otherwise go to Step 3.
Step 3. Calculate the error variable ξ i by (22) and update p i = −q i by (26) and then go to Step 1.
Referring to the convergence analysis method in Ref. [28][29][30], the convergence of the proposed algorithm is proved as follows.
According to Step 1 and Step 2, Based on the property of correntropy, we can verify that the objective function J(ω, b, p) is bounded above, since g(x, σ) and − ω 2 are both bounded above. Due to the above facts and the well-known monotone convergence theorem, the algorithm is guaranteed to converge to its local optimal solution.

Characteristic Analysis of Electricity Consumption Data
By analyzing the characteristics of users' EC, power sales company can accurately understand the demand response mechanism and formulate scientific marketing strategies. This is of great significance for peak shaving and valley filling, optimizing the EC curve, and improving power quality.
The monthly EC of a shopping mall in Guangdong Province in 2017 was analyzed as an example. Figure 1 shows the daily EC of the commercial property common area of the mall and its corresponding daily maximum temperature curve for 2017, which are normalized values. As shown in Figure 1, the greatest influence factor of Heating, Ventilation and Air Conditioning (HVAC) EC is the temperature, and it can be seen that the total EC of the property public area of the shopping mall fluctuated, obviously, according to the season and had strong non-linear and fluctuating characteristics. Besides the similar fluctuating trends of HVAC and the total EC of the property public area of the shopping mall, the other sub-items showed a steady trend and did not change with time.

Characteristic Analysis of Electricity Consumption Data
By analyzing the characteristics of users' EC, power sales company can accurately understand the demand response mechanism and formulate scientific marketing strategies. This is of great significance for peak shaving and valley filling, optimizing the EC curve, and improving power quality.
The monthly EC of a shopping mall in Guangdong Province in 2017 was analyzed as an example. Figure 1 shows the daily EC of the commercial property common area of the mall and its corresponding daily maximum temperature curve for 2017, which are normalized values. As shown in Figure 1, the greatest influence factor of Heating, Ventilation and Air Conditioning (HVAC) EC is the temperature, and it can be seen that the total EC of the property public area of the shopping mall fluctuated, obviously, according to the season and had strong non-linear and fluctuating characteristics. Besides the similar fluctuating trends of HVAC and the total EC of the property public area of the shopping mall, the other sub-items showed a steady trend and did not change with time.
Therefore, considering the influence of HVAC EC, it was possible to forecast the EC of the whole commercial property public area and further analyze the influencing factors of HVAC EC.  Therefore, considering the influence of HVAC EC, it was possible to forecast the EC of the whole commercial property public area and further analyze the influencing factors of HVAC EC.
Comparing the trends of daily air conditioning power consumption and daily maximum temperature, we obtained Figure 2. Comparing the trends of daily air conditioning power consumption and daily maximum temperature, we obtained Figure 2.  In order to further study their correlation, we calculated Pearson correlation coefficients between historical HVAC daily EC values and their corresponding daily maximum temperature values.
The results showed that there was a great positive correlation between HVAC EC and air temperature. Therefore, air temperature and historical EC were input in the prediction model in order to improve the prediction accuracy of user EC.

Data Preprocessing
In the process of forecasting, the difference of magnitude of the data may affect the accuracy of forecasting and the speed of forecasting, so the initial data should be normalized to ensure the accuracy of the forecasting results. In the equation below, y i indicates the normalized data. The formula for data normalization is as follows: Generally, the initial data are normalized to [0,1] to ensure the speed and accuracy of the forecasting process.
After the prediction results are obtained, the obtained results should be anti-normalized to get the actual value of the predicted electricity. The inverse normalization formula is as follows:

Parameter Optimization
In order to ensure the accuracy of the results and the stability of the model, it is necessary to optimize the parameters of the ECF model. In this paper, the grid optimization method and K-fold cross-validation were combined to optimize the parameters.
It is convenient and simple to use the grid optimization method to optimize the parameters, because it is faster than other intelligent algorithms and can generally find a global optimal solution. The algorithm represents each set of parameters in the form of a range grid, traverses each set of parameters by using the independence of each grid, and then verifies the rationality of the set of parameters by K-fold cross-validation theory. Based on experience, most of the parameters σ 1 and σ 2 of MMCC were selected in [0,50]. In the SVR model, both C and the kernel parameter γ have a wide range of values, but a too large range of values will lead to a lower search speed. After many experiments, the probability that γ falls into the [0, 1] interval is higher. Therefore, the grid range of the model parameters were specified in this paper as C = [0, 20]

Model Implementation
This paper proposes an ECF model based on MMCCSVR considering the users' characteristics. Based on the robustness of MMCC in non-Gaussian sequence processing, the MMCCSVR ECF model was established to predict future EC. The parameters were optimized by combining grid optimization and K-fold cross-validation. The prediction model is where x is input data, including historical EC and corresponding temperature data, f (x) is the forecasting results of EC. The specific implementation process is shown in Figure 3. where x is input data, including historical EC and corresponding temperature data, ( ) is the forecasting results of EC. The specific implementation process is shown in Figure 3.

Evaluation Criterion
In order to comprehensively and extensively evaluate the effectiveness of the proposed forecasting model, five widely used performance metrics in statistics, i.e., mean absolute error (MAE), root-mean-square error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination (R 2 ), were considered. The MAE, RMSE, and MAPE were used to measure the deviation between the actual EC and the prediction load. In addition, R 2 indicates the extent to which

Evaluation Criterion
In order to comprehensively and extensively evaluate the effectiveness of the proposed forecasting model, five widely used performance metrics in statistics, i.e., mean absolute error (MAE), root-mean-square error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination (R 2 ), were considered. The MAE, RMSE, and MAPE were used to measure the deviation between the actual EC and the prediction load. In addition, R 2 indicates the extent to which the EC is predictable. These five metrics are defined as follows:

Results
In order to verify the practicability and forecasting efficiency of the proposed ECF method, we considered the EC data of a shopping mall in Guangdong Province from 1 January 2018, to 3 June 2018, and the corresponding daily maximum temperature as the research object. We selected the daily EC from 1 January to 3 May as the training sample, forecasted the EC of 31 days from 4 May to 3 June 2018, one day ahead, and evaluated comprehensively the ECF model based on MMCCSVR.

Parameters Selection
In this subsection, the joint efficiency of the free parameters for the prediction accuracy was investigated by using the grid optimization and conducting many experiments. Because the penalty coefficient C showed various empirical values in different applications, and different λ values correspond to a set of different optimal σ 1 and σ 2 , which can make the prediction accuracy meet the requirements, we fixed C = 7 and λ = 0.3, and then determined the optimal σ 1 and σ 2 . As can be seen in Figure 4, when σ 1 and σ 2 were within the interval [15,25], the prediction accuracy was higher. Taking σ 1 as an example, we further studied the variation of prediction accuracy with kernel parameters. Figure 5 and Table 1 show the prediction accuracy corresponding to different σ 1 values. The highest prediction accuracy was achieved at σ 1 = 20; therefore, σ 1 = 20 was chosen in this paper. Using the same method, we obtained the optimal σ 2 = 15 in this case. the requirements, we fixed 7 = C and 3 . 0 = λ , and then determined the optimal 1 σ and 2 σ . As can be seen in Figure 4，when 1 σ and 2 σ were within the interval [15,25], the prediction accuracy was higher. Taking 1 σ as an example, we further studied the variation of prediction accuracy with kernel parameters.

Comparison of the Forecast Results Obtained with Different Inputs
According to the parameter selection method described in Section 3.3 and 5.1, the parameters were set as: In order to further illustrate the necessity of using the temperature as an input variable, the prediction results of two kinds of inputs were compared with those of using only historical electric energy as input. The final prediction results are shown in Figure 6. It can be seen from Figure 6 that when only historical electric energy was used as input, a better prediction value could be obtained at the smooth point of the curve, but the prediction at the inflection point was not as good as that of the MMCCSVR prediction model with temperature as input. By comparing the error analysis results of the proposed method with those obtained by using only historical EC as input, it was found that the average MAPE was about 3.29% lower than that of the proposed method. Therefore, the prediction accuracy can be improved to a certain extent by using two kinds of input: EC and temperature.

Comparison of the Forecast Results Obtained with Different Inputs
According to the parameter selection method described in Sections 4.3 and 5.1, the parameters were set as: C = 7, λ = 0.3, σ 1 = 20, σ 2 = 15.
In order to further illustrate the necessity of using the temperature as an input variable, the prediction results of two kinds of inputs were compared with those of using only historical electric energy as input. The final prediction results are shown in Figure 6. It can be seen from Figure 6 that when only historical electric energy was used as input, a better prediction value could be obtained at the smooth point of the curve, but the prediction at the inflection point was not as good as that of the MMCCSVR prediction model with temperature as input. By comparing the error analysis results of the proposed method with those obtained by using only historical EC as input, it was found that the average MAPE was about 3.29% lower than that of the proposed method. Therefore, the prediction accuracy can be improved to a certain extent by using two kinds of input: EC and temperature.
The relative errors of daily forecasts are shown in Table 2. The characteristics of the shopping mall EC vary in the week. From the data, it appears that the EC is low on Monday and peaks on Saturday and Sunday. Between 4 May and 3 June 2018, the weekends were 5-6 May, 12-13 May, 19-20 May, 26-27 May, and 2-3 June. As shown in Table 2, the peak and valley values of EC were accurately predicted in the first half month, whereas the predicted values were mostly lower than the true values in the second half month. According to the analysis of historical electricity and temperature data, the temperature in May 2018 was slightly higher than that in May of the previous year, but the thunderstorm weather in the first half of the month may have affected customer flow and activities in the shopping mall. The weather in the second half of the month continued to be cloudy, and the EC increased, since the temperature was slightly higher than in previous years.
According to the parameter selection method described in Section 3.3 and 5.1, the parameters were set as: In order to further illustrate the necessity of using the temperature as an input variable, the prediction results of two kinds of inputs were compared with those of using only historical electric energy as input. The final prediction results are shown in Figure 6. It can be seen from Figure 6 that when only historical electric energy was used as input, a better prediction value could be obtained at the smooth point of the curve, but the prediction at the inflection point was not as good as that of the MMCCSVR prediction model with temperature as input. By comparing the error analysis results of the proposed method with those obtained by using only historical EC as input, it was found that the average MAPE was about 3.29% lower than that of the proposed method. Therefore, the prediction accuracy can be improved to a certain extent by using two kinds of input: EC and temperature.   Table 2 shows the predicted values using MMCCSVR and the traditional SVR from 4 May to 3 June 2018, and the MAPE between each predicted value and the corresponding actual value. As can be seen from Table 2, compared with the SVR model, the MMCCSVR model allowed to reduce the MAPE of almost every predicted value, especially when approaching the turning point of fluctuation. The results show that using MMCC as the loss function of SVR can significantly improve the prediction accuracy.

Comparison of Different Forecasting Methods
In order to test whether the proposed model can effectively improve the prediction accuracy, its prediction results were compared with those of other methods. We set the parameters as follows: SVR: C = 7, γ = 0.1; Back-propagation algorithm (BP): 10 Hidden layer.
The comparison results are shown in Figure 7. As shown in Figure 7, at the inflection point of the curve, the prediction efficiency of MMCCSVR is obviously better than those of other algorithms, showing the robustness and stability of the proposed method. Moreover, the predicted values changed appropriately during the analysis period, even coinciding with the true values at some individual points. the MAPE of almost every predicted value, especially when approaching the turning point of fluctuation. The results show that using MMCC as the loss function of SVR can significantly improve the prediction accuracy.

Comparison of Different Forecasting Methods
In order to test whether the proposed model can effectively improve the prediction accuracy, its prediction results were compared with those of other methods. We set the parameters as follows: SVR: Back-propagation algorithm (BP): 10 Hidden layer. MMCCSVR: The comparison results are shown in Figure 7. As shown in Figure 7, at the inflection point of the curve, the prediction efficiency of MMCCSVR is obviously better than those of other algorithms, showing the robustness and stability of the proposed method. Moreover, the predicted values changed appropriately during the analysis period, even coinciding with the true values at some individual points. The percentages of relative error of the different methods are shown in Table 3. Table 3 shows that MMCCSVR has better performance compared with other algorithms, displaying improved prediction accuracy of both global and outlier values.  The percentages of relative error of the different methods are shown in Table 3. Table 3 shows that MMCCSVR has better performance compared with other algorithms, displaying improved prediction accuracy of both global and outlier values.
The relative errors of MMCCSVR for different inputs compared with those of other methods from 4 May to 3 June 2018, are presented in Figure 8. As shown in Figure 8, the relative error of MMCCSVR was evenly distributed on both sides of 0 and had little fluctuation. The relative error of MMCCSVR using historical electricity as input was very large, due to the fact that the temperature changes were not taken into account. BP and traditional SVR algorithms were unable to predict electricity accurately under the influence of random factors. The relative errors of MMCCSVR for different inputs compared with those of other methods from May 4 to June 3, 2018, are presented in Figure 8. As shown in Figure 8, the relative error of MMCCSVR was evenly distributed on both sides of 0 and had little fluctuation. The relative error of MMCCSVR using historical electricity as input was very large, due to the fact that the temperature changes were not taken into account. BP and traditional SVR algorithms were unable to predict electricity accurately under the influence of random factors.   Table 4 proves the superiority of MMCCSVR over single-input MMCCSVR and the traditional SVR and BP algorithm for four error evaluation indexes; the four error evaluation indexes were obviously reduced, and in particular, the MAPE of this method was about 1.79%, and the R 2 was about 0.9781. Therefore, compared with the SVR method and the BP neural network algorithm, the proposed method effectively improved the prediction accuracy, and when temperature was added as input, the prediction efficiency was further improved. Considering the change of EC for a shopping mall in different months, we implemented the method by using EC data for a whole year, using 65% of the data as the training set and 35% as the testing set. We selected the EC data from 1 January to 25 August 2018 as the training sample and forecasted the EC data from 26 August to 31 December 2018, one-day ahead. The forecasted results are shown in Figure 9. As shown in Figure 9, when more data were used to validate the model proposed in this paper, the prediction curve was still close to the actual curve. Only a few curve inflection points presented inevitable errors due to random factors and the impact of emergencies, resulting in larger errors. In addition, at the inflection point of the curve, the prediction ability of MMCCSVR was obviously better than those of other algorithms, showing its robustness and stability. The MMCCSVR achieved good overall prediction, with the predicted values even coinciding with the true values at individual points.   Table 4 proves the superiority of MMCCSVR over single-input MMCCSVR and the traditional SVR and BP algorithm for four error evaluation indexes; the four error evaluation indexes were obviously reduced, and in particular, the MAPE of this method was about 1.79%, and the R 2 was about 0.9781. Therefore, compared with the SVR method and the BP neural network algorithm, the proposed method effectively improved the prediction accuracy, and when temperature was added as input, the prediction efficiency was further improved.
Considering the change of EC for a shopping mall in different months, we implemented the method by using EC data for a whole year, using 65% of the data as the training set and 35% as the testing set. We selected the EC data from January 1 to August 25 in 2018 as the training sample and forecasted the EC data from August 26 to December 31, 2018, one-day ahead. The forecasted results are shown in Figure 9. As shown in Figure 9, when more data were used to validate the model proposed in this paper, the prediction curve was still close to the actual curve. Only a few curve inflection points presented inevitable errors due to random factors and the impact of emergencies, resulting in larger errors. In addition, at the inflection point of the curve, the prediction ability of MMCCSVR was obviously better than those of other algorithms, showing its robustness and stability. The MMCCSVR achieved good overall prediction, with the predicted values even coinciding with the true values at individual points. The relative errors of MMCCSVR for different inputs compared with those of other methods for August 26 to December 31, 2018, are given in Figure 10. As shown in Figure 10, the relative error of MMCCSVR is evenly distributed on both sides of 0 and has little fluctuation. BP and traditional SVR algorithms were unable to predict electricity accurately under the influence of random factors. The relative errors of MMCCSVR for different inputs compared with those of other methods for 26 August to 31 December 2018, are given in Figure 10. As shown in Figure 10, the relative error of MMCCSVR is evenly distributed on both sides of 0 and has little fluctuation. BP and traditional SVR algorithms were unable to predict electricity accurately under the influence of random factors.   Table 5 proves the superiority of MMCCSVR over the traditional SVR and BP algorithm, as shown by four error evaluation indexes; when the experimental data increased, the four error evaluation indexes were obviously reduced. We found that the MAPE of this method was about 3.86% and R 2 was about 0.9846. Therefore, compared with the SVR method and BP neural network algorithm, the proposed method has good predictive performance which can effectively improve the prediction accuracy.

Conclusions
Around the urgent demand of ECF technology for power sales companies under the wave of power system reform, a novel forecast method for ECF, called MMCCSVR, is proposed, combining the SVR with MMCC. First, the user's characteristics were analyzed to determine the factors influencing EC, indicating that the historical temperature and historical EC are the main influencing factors. Second, in view of the strong fluctuation of EC for small consumers and the large influence of random factors, the MMCCSVR algorithm with excellent performance under non-Gaussian noise was selected to construct the prediction model, which was applied to a shopping mall in Guangzhou. The following conclusions were reached: 1) Compared with the single-input MMCCSVR prediction model, the single-point prediction accuracy was effectively improved, and the average relative error was reduced. 2) Compared with the traditional SVR and other algorithms, the prediction errors of peak and valley values of EC were improved effectively.
3) The prediction error MAPE of this model was 1.79% and met the assessment criteria of power deviation in the location of the shopping mall and the prediction accuracy requirement of the power sales company.  Table 5 proves the superiority of MMCCSVR over the traditional SVR and BP algorithm, as shown by four error evaluation indexes; when the experimental data increased, the four error evaluation indexes were obviously reduced. We found that the MAPE of this method was about 3.86% and R 2 was about 0.9846. Therefore, compared with the SVR method and BP neural network algorithm, the proposed method has good predictive performance which can effectively improve the prediction accuracy.

Conclusions
Around the urgent demand of ECF technology for power sales companies under the wave of power system reform, a novel forecast method for ECF, called MMCCSVR, is proposed, combining the SVR with MMCC. First, the user's characteristics were analyzed to determine the factors influencing EC, indicating that the historical temperature and historical EC are the main influencing factors. Second, in view of the strong fluctuation of EC for small consumers and the large influence of random factors, the MMCCSVR algorithm with excellent performance under non-Gaussian noise was selected to construct the prediction model, which was applied to a shopping mall in Guangzhou. The following conclusions were reached: (1) Compared with the single-input MMCCSVR prediction model, the single-point prediction accuracy was effectively improved, and the average relative error was reduced. (2) Compared with the traditional SVR and other algorithms, the prediction errors of peak and valley values of EC were improved effectively. (3) The prediction error MAPE of this model was 1.79% and met the assessment criteria of power deviation in the location of the shopping mall and the prediction accuracy requirement of the power sales company.