Short-Term Load Forecasting with Tensor Partial Least Squares-Neural Network

Abstract: Short-term load forecasting is very important for power systems. The load is related to many factors which compose tensors. However, tensors cannot be input directly into most traditional forecasting models. This paper proposes a tensor partial least squares-neural network model (TPN) to forecast the power load. The model contains a tensor decomposition outer model and a nonlinear inner model. The outer model extracts common latent variables of the tensor input and vector output and iterates until the residuals fall below a threshold. The inner model determines the relationship between the latent variable matrix and the output by using a neural network. This model structure can preserve the information of tensors and the nonlinear features of the system. Three classical models, partial least squares (PLS), least squares support vector machine (LSSVM) and neural network (NN), are selected to compare the forecasting results. The results show that the proposed model is efficient for short-term load and daily load peak forecasting. Compared to PLS, LSSVM and NN, the TPN has the best forecasting accuracy.


Introduction
Load forecasting is very important in the planning, operation and maintenance of power systems [1,2]. The short-term forecasting technique can be used to predict the load in the next few hours or days. The forecasting accuracy directly affects the generation plan, the optimal combination of generators, power flow calculation, electricity market transactions, real-time power dispatching, etc.
A good prediction model is the key issue of load forecasting. Traditionally, forecasting was based mainly on the load history over a certain time period. Regardless of the length of that period, from the perspective of the data structure, the input of the prediction model is a time series of the power load. In fact, besides its history, the power load is also related to many factors such as seasons, meteorological conditions and people's living habits [3,4]. To improve the forecasting accuracy, the influence of these factors must be considered. The input of the prediction model therefore changes from a time series to a tensor, which has a complex structure [5]. Tensors are physical quantities which can be expressed by combinations of several base vectors and their components. Theoretically, tensors can represent physical quantities with arbitrarily complex relations because of the abundant combinations of base vectors. Using tensors to express the multiple factors affecting power load accords with the essential characteristics of the factors. Because the factors have different measurement and characterization methods, representing the data obtained from them in the same dimension may cause information loss. Although the information and relationships in different dimensions represented by tensors are implicit, the representation retains the complete information of the high-dimensional data, and the implicit relationships can be made explicit by tensor algorithms.

Some regression algorithms are adopted for building prediction models. Partial least squares (PLS) is a classical model which maps the variables into a new feature space with lower dimensions. It is widely used in various fields such as fault detection and diagnosis, robotics, industrial process control and traffic safety [6-9]. However, the method cannot be used directly for tensors. Some improved methods, which are combined with unfolding, have been proposed [10,11]. The main idea of unfolding is to convert tensors into matrices, so that the original tensors can be replaced by matrices which preserve the original value of every element. This process destroys the structure of the tensors, which may include a priori information, and makes the physical meaning of the data hard to understand [5,12]. The higher-order PLS (HOPLS) technique projects tensors into a latent space and applies PLS regression to the corresponding latent variables [13]. The method cannot be used when the output of the model is a one-dimensional tensor, namely a vector. N-way PLS decomposes independent and dependent data into rank-one tensors [14,15]. The decomposition must be subject to maximum pairwise covariance of the latent vectors. Canonical decomposition for tensors directly affects the computational complexity, the convergence speed and the fitting ability [16]. Furthermore, both HOPLS and N-way PLS focus on the linear correlations between the inputs and outputs of the models. Nonlinear characteristics of the data will reduce the forecasting accuracy.

Neural network (NN), which is a classical nonlinear regression algorithm, learns knowledge and determines parameters from the samples without mathematical derivations. Many adaptive forecast and control methods based on NN have been proposed to study and analyze nonlinear uncertain systems. For example, there are results on multiagent systems [17,18], full state constraints [19,20], uncertain nonlinear stochastic systems [21] and nonlinear MIMO systems [22]. However, the computational effort of NN becomes substantial as the complexity of the network structure increases. In addition, the black-box structure makes the learned knowledge hard to understand [23]. Least squares support vector machine (LSSVM) is another common nonlinear regression algorithm, which is an improvement on the support vector machine (SVM). The core idea is to map low-dimensional nonlinear problems to linear problems in high dimensions. LSSVM performs competently on some issues [24-26], but it may lose the sparseness of support vectors [27]. Moreover, for forecasting with tensor input, NN and LSSVM need a tensor-to-matrix simplification step, which is bound to lose some potentially useful information.
In this paper, a tensor partial least squares-neural network (TPN) method is proposed for load forecasting. The method integrates an outer model and an inner model. The outer model is used to decompose the input tensors. Tensor PLS decomposition is used in it because the method can extract common latent variables of the input and output of the system. However, PLS is a linear method. It can be used for tensor decomposition, but it is not suitable for nonlinear predictive modeling. Therefore, the inner model, which is used to forecast, needs a nonlinear structure. Since NN can approximate any nonlinear function to sufficient accuracy, it is selected to set up the inner model. According to the structure of the prediction model, the input and output of the system are projected into a low-dimensional common latent subspace in the outer model, and the latent variables extracted by the outer model are used as the input of the inner model. The modeling process involves linear decomposition, nonlinear fitting and spatial mapping. Three classical models (PLS, NN and LSSVM), which are typical representatives of these ideas, are therefore used as benchmarks for the forecast results.

Proposed Method
For the TPN, data from p measuring time points form the input tensor, which can be represented by

$$X = \{X_i\}_{i=1}^{p} \in \mathbb{R}^{I_1 \times I_2 \times I_3} \qquad (1)$$

where $X_i$, $i \in \{1, \cdots, p\}$, is the data obtained from the ith measuring time point, $I_1$ is the number of samples, $I_2$ is the number of measuring times and $I_3$ is the number of parameters. $y \in \mathbb{R}^{I_1}$ represents the loads of the power system and is the output vector. TPN contains a linear outer model and a nonlinear inner model. The outer model is built by tensor partial least squares and the inner model is built by the neural network. The structure of TPN is shown in Figure 1.
Energies 2019, 12

X and y are projected into a subspace which has common latent variables. The parameters can be determined by the decomposition process. Because X and y have the same number of samples, X can be represented as

$$X = \sum_{r=1}^{R} G_r \times_1 t_r \times_2 P_r^{(1)} \times_3 P_r^{(2)} + E = G \times_1 T \times_2 P^{(1)} \times_3 P^{(2)} + E \qquad (2)$$

where $G_r$ is the rth rank-(1, k, k) tensor, $t_r$ is the rth latent variable, r is the iteration index and R is the number of latent variables. $P^{(1)}$ and $P^{(2)}$ are the first and the second loading matrices, and E is the residual of X. The operation $\times_n$ denotes the n-mode product [13]. G is the core tensor, which has a special block-diagonal structure; its elements indicate the level of local interactions between the loading matrices and the corresponding latent vectors. T is the latent variable matrix and $t_r$ is the rth column of T. $P_r^{(1)}$ and $P_r^{(2)}$ can be obtained by singular value decomposition (SVD) [28-30].
y can be represented as

$$y = Td + f \qquad (3)$$

where d is the loading vector and f is the residual of y. The schematic diagram of the decomposition procedure is shown in Figure 2.
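The n-mode product used in the decomposition multiplies every mode-n fibre of a tensor by a matrix. The following is a minimal pure-Python sketch of that operation and of one rank-(1, k, k) term $G_r \times_1 t_r \times_2 P_r^{(1)} \times_3 P_r^{(2)}$. The helper names `mode_product` and `component`, the nested-list storage and the toy sizes are assumptions made for illustration; they are not taken from the paper.

```python
def mode_product(tensor, matrix, mode):
    """n-mode product of a 3-way tensor (nested lists) with a J x I_n matrix.

    mode is zero-based: 0, 1, 2 correspond to the products x_1, x_2, x_3.
    """
    dims = [len(tensor), len(tensor[0]), len(tensor[0][0])]
    assert len(matrix[0]) == dims[mode], "matrix columns must match the mode size"
    out_dims = dims[:]
    out_dims[mode] = len(matrix)
    out = [[[0.0] * out_dims[2] for _ in range(out_dims[1])]
           for _ in range(out_dims[0])]
    for a in range(dims[0]):
        for b in range(dims[1]):
            for c in range(dims[2]):
                idx = [a, b, c]
                for j in range(len(matrix)):
                    tgt = idx[:]
                    tgt[mode] = j  # the contracted index is replaced by j
                    out[tgt[0]][tgt[1]][tgt[2]] += matrix[j][idx[mode]] * tensor[a][b][c]
    return out


def component(core, t, P1, P2):
    """One term of the decomposition: core x_1 t x_2 P1 x_3 P2.

    core is 1 x k x k, t has length I1, P1 is I2 x k, P2 is I3 x k.
    """
    t_col = [[ti] for ti in t]            # I1 x 1 matrix acting on mode 1
    out = mode_product(core, t_col, 0)    # 1 x k x k  -> I1 x k x k
    out = mode_product(out, P1, 1)        #            -> I1 x I2 x k
    return mode_product(out, P2, 2)       #            -> I1 x I2 x I3


# Toy values (hypothetical): k = 2, I1 = 3, I2 = 2, I3 = 3.
core = [[[1.0, 0.0], [0.0, 2.0]]]
t = [1.0, 2.0, 3.0]
P1 = [[1.0, 0.0], [0.0, 1.0]]
P2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
term = component(core, t, P1, P2)
```

Summing R such terms and adding the residual gives the full representation of the input tensor; the sketch only illustrates how a single term is assembled.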
Then the NN is used to build the inner model. Equation (3) can be rewritten as

$$y = s(T) + f \qquad (4)$$

where s(T) is the output of the NN model. Generally, an NN has an input layer, an output layer and several hidden layers. A neuron is an activation function containing weight and bias parameters. The number of neurons in the hidden layer is usually determined by expert knowledge. For TPN in this paper, the number is 3. Figure 3 shows the structure of an NN with three hidden layers. The inner model uses a back-propagation neural network (BPNN) to iterate. The learning procedure includes the feed-forward stage and the error back-propagation stage. In the feed-forward stage, the value of each neuron is calculated from the sigmoid function, the weights and the values of the previous layer. In the error back-propagation stage, the weights are modified by feedback. The two stages are repeated until the output values converge to the target values.
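The two learning stages can be sketched with a very small network. The example below is an illustration under stated assumptions, not the authors' implementation: it uses a single hidden layer of three sigmoid neurons with a linear output, plain stochastic gradient descent on the squared error, and hypothetical values for the learning rate, epoch count and toy data.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, params):
    """Feed-forward stage: hidden activations h and scalar output s."""
    W1, b1, w2, b2 = params
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    s = sum(w * hj for w, hj in zip(w2, h)) + b2
    return h, s


def train(samples, targets, hidden=3, lr=0.1, epochs=2000, seed=0):
    """Train by repeating the feed-forward and back-propagation stages."""
    rng = random.Random(seed)
    n_in = len(samples[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, targets):
            h, s = forward(x, (W1, b1, w2, b2))
            err = s - y  # derivative of 0.5 * (s - y)^2 with respect to s
            # Error back-propagation stage: push err back through each layer.
            for j in range(hidden):
                grad_h = err * w2[j] * h[j] * (1.0 - h[j])  # uses w2 before its update
                w2[j] -= lr * err * h[j]
                for i in range(n_in):
                    W1[j][i] -= lr * grad_h * x[i]
                b1[j] -= lr * grad_h
            b2 -= lr * err
    return W1, b1, w2, b2


def mse(samples, targets, params):
    return sum((forward(x, params)[1] - y) ** 2
               for x, y in zip(samples, targets)) / len(targets)


# Toy latent scores and a nonlinear target (made-up values).
T = [[-1.0], [-0.5], [0.0], [0.5], [1.0]]
y = [1.0, 0.25, 0.0, 0.25, 1.0]
untrained = train(T, y, epochs=0)
trained = train(T, y, epochs=2000)
```

Comparing `mse(T, y, untrained)` with `mse(T, y, trained)` shows how repeating the two stages reduces the fitting error on the toy data.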
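For a new input tensor $X'$, the forecast value $\hat{y}'$ can be expressed as

$$\hat{y}' = s(T') + f \qquad (5)$$

where $T'$ denotes the latent variable matrix that the outer model extracts from $X'$, and f can be ignored once the iteration has reduced it below the threshold value.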

PLS
The relationship between the input and output of the PLS model can be written as

$$y = X\beta + \varepsilon \qquad (6)$$

where X is the input matrix, y is the output vector, β is a matrix of regression coefficients and ε is a bias vector.

It is assumed that a small number of principal components are defined by linear combinations of the input matrix. The original linear relationship can be rewritten as

$$y = Tv + \varepsilon \qquad (7)$$

where v is a vector of regression coefficients corresponding to the latent variables. T is the matrix of latent variables and it can be estimated as

$$T = XW(P^{\mathsf{T}}W)^{-1} \qquad (8)$$

where P is the loading matrix representing the influence of X, and W is the weight loading matrix indicating the correlation between output and input.
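The latent-variable regression can be illustrated with a single-component PLS step. The sketch below uses the common NIPALS update for one response variable, which is an assumption about the fitting procedure rather than something stated in the paper; the function name `pls1_component` and the toy data are hypothetical. With one component, the standard identity $T = XW(P^{\mathsf{T}}W)^{-1}$ reduces to $t = Xw$, because $p^{\mathsf{T}}w = 1$.

```python
import math


def pls1_component(X, y):
    """One NIPALS step for PLS with a single response (X and y assumed centered)."""
    n, m = len(X), len(X[0])
    # Weight vector w is proportional to X^T y (direction of maximum covariance).
    w = [sum(X[i][j] * y[i] for i in range(n)) for j in range(m)]
    norm = math.sqrt(sum(val * val for val in w))
    w = [val / norm for val in w]
    # Scores t = X w: the latent variable for every sample.
    t = [sum(X[i][j] * w[j] for j in range(m)) for i in range(n)]
    tt = sum(val * val for val in t)
    # Loadings p = X^T t / (t^T t).
    p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
    # Regression coefficient of y on the latent variable.
    v = sum(y[i] * t[i] for i in range(n)) / tt
    return w, t, p, v


# Centered toy data (made up): y = 2 * x1 + x2.
X = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
y = [-3.0, -1.0, 1.0, 3.0]
w, t, p, v = pls1_component(X, y)
y_hat = [ti * v for ti in t]
```

Further components would be extracted by deflating X and y with the current scores and repeating the step.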

LSSVM
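In the LSSVM model, a linear estimation is performed between the input X and the output y:

$$y = \omega^{\mathsf{T}} X + b \qquad (9)$$

where ω is a weight coefficient matrix and b is a threshold vector. It is assumed that ω can be written with Lagrange multipliers as follows:

$$\omega = \sum_{i} \alpha_i x_i \qquad (10)$$

where $x_i$ is a variable of the input matrix X and $\alpha_i$ is the Lagrange coefficient corresponding to $x_i$. Then Equation (9) can be written as

$$y = \sum_{i} \alpha_i \langle x_i^{\mathsf{T}}, X \rangle + b \qquad (11)$$

where $\langle x_i^{\mathsf{T}}, X \rangle$ is the inner product. It can be replaced by a kernel function $K(x_i, X)$, and the nonlinear equation can be established as

$$y = \sum_{i} \alpha_i K(x_i, X) + b \qquad (12)$$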
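The kernel form of the predictor, $y = \sum_i \alpha_i K(x_i, x) + b$, can be sketched in a few lines. The example assumes an RBF kernel and the standard LSSVM regression linear system, in which $\alpha$ and $b$ solve $\sum_i \alpha_i = 0$ and $(K + I/\gamma)\alpha + b = y$; the paper does not state its kernel or regularization, so `sigma`, `gamma` and all function names here are hypothetical.

```python
import math


def rbf(a, b, sigma=1.0):
    """RBF kernel K(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    return math.exp(-sum((u - v) ** 2 for u, v in zip(a, b)) / (2.0 * sigma ** 2))


def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def lssvm_fit(X, y, gamma=100.0, sigma=1.0):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [rbf(X[i], X[j], sigma) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    sol = solve(A, [0.0] + list(y))
    return sol[1:], sol[0]  # alpha, b


def lssvm_predict(x, X, alpha, b, sigma=1.0):
    """Kernel expansion: sum_i alpha_i K(x_i, x) + b."""
    return sum(a * rbf(xi, x, sigma) for a, xi in zip(alpha, X)) + b


# Toy data (made up): a nonlinear target on four points.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0.0, 1.0, 4.0, 9.0]
alpha, b = lssvm_fit(X_train, y_train, gamma=1e6)
fitted = [lssvm_predict(x, X_train, alpha, b) for x in X_train]
```

Because every training point receives a generally non-zero $\alpha_i$, the expansion runs over all samples, which is the loss of sparseness mentioned for LSSVM.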

NN
Similarly, BPNN is chosen for comparison. The schematic diagram of the NN structure is similar to the inner model of TPN, as shown in Figure 3. The activation function is a sigmoid-type function. The number of hidden layers is 3.

Data Interpretation
The power system load is related to many factors. Besides the known prior load data, the environment temperature and lifestyle (working or rest days), which are two major factors of load, are used to build the prediction models [31]. The data set is a part of the 2014 Global Energy Forecasting Competition Load Forecasting (GEFCOM2014-L) data, which contains load data from 1 January 2010 to 31 December 2014. The original data is a time series. For each time point, the element contains three dimensions, which are load, temperature and date. In the summer, the correlation between temperature and load is the highest [32], so the summer (June, July and August) load data is used. The set contains 11,040 actual data points. The temperature information comes from weather stations and the load information comes from power grid companies. The sampling time interval is one hour.

The data are divided into the calibration set (90% of the data) and the test set (10% of the data). The calibration set is used to train the prediction model and determine its coefficients. The test set is used to evaluate effectiveness. For TPN, p = 24 is set in Equation (1). This means that the input of the system is 24-h continuous data. So I_1 = 11040/24 × 90% = 414 (the number of samples), I_2 = 1 (the number of measuring times) and I_3 = 3 (the number of parameters). For PLS, LSSVM and NN, since the input of these models must be a matrix, the tensor input needs to be sliced. The load matrix, temperature matrix and date matrix are combined into one large input matrix. The matrix contains 414 rows and 72 columns (24 columns each of load data, temperature data and date data).

Root mean square error (RMSE) and mean absolute percentage error (MAPE) are used to evaluate the forecasting accuracies of the models. The two evaluation indexes can be calculated by

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \qquad (13)$$

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\% \qquad (14)$$

where $y_i$ is the ith actual load of the test set, $\hat{y}_i$ is the corresponding predicted load and n is the size of the test set.
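The sample counts and the two evaluation indexes can be checked with a short sketch. The hourly records below are synthetic stand-ins (the real GEFCOM2014-L values are not reproduced here), the date dimension is encoded as a hypothetical working-day flag, and the function names are assumptions made for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual loads assumed non-zero)."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


def daily_samples(hourly, hours=24):
    """Group hourly (load, temperature, day_type) records into 24-h samples."""
    days = len(hourly) // hours
    return [hourly[d * hours:(d + 1) * hours] for d in range(days)]


def flatten(sample):
    """Matricize one sample: 24 loads, then 24 temperatures, then 24 day-type flags."""
    return [rec[k] for k in range(3) for rec in sample]


# Synthetic stand-in for the 11,040 hourly summer records (values are made up).
hourly = [(100.0 + h % 24, 20.0 + (h % 24) / 2.0, float((h // 24) % 7 < 5))
          for h in range(11040)]
samples = daily_samples(hourly)          # 11040 / 24 = 460 samples
n_cal = round(0.9 * len(samples))        # 90% calibration split -> 414
calibration, test = samples[:n_cal], samples[n_cal:]
matrix = [flatten(s) for s in calibration]   # input for PLS, LSSVM and NN
```

The flattened `matrix` has one row per calibration sample and 3 × 24 columns, which matches the 414 × 72 input matrix used for the matrix-based models.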

Load Forecasting
Hourly loads in the next 12 h are forecasted. Tables 1 and 2 show the RMSE and MAPE of the proposed model and the three comparative models, respectively. Figure 4 shows the trends of RMSE and MAPE. At every forecasting horizon, both the RMSE and MAPE of the proposed model are the lowest. This indicates that the forecasting ability of TPN is the highest. The main reason is that the TPN outer model preserves the features and information of the input tensors and the inner model uses a nonlinear structure. When the prediction interval is longer than 6 h, the forecasting accuracies of all four models are clearly reduced. This indicates that the correlation of load with temperature, as well as with lifestyle, decreases as the prediction interval increases.

Daily Load Peak Forecasting
TPN can also forecast the daily load peak and the peak appearance time. Table 3 shows the forecasting results. Compared with the other three models, TPN has the highest daily load peak forecasting ability. There is little difference between the forecasting results of the four models for peak appearance time, and the results are unsatisfactory. The main reason is that the actual peak appearance time is a point on a time scale, whereas the outputs of the models are scalar quantities.
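How the two quantities relate to an hourly load profile can be sketched as follows. This is only an illustration of extracting the peak and its hour from 24 hourly values and comparing them with a forecast; the paper does not describe how its peak targets were constructed, and the profiles and function names below are made up.

```python
def daily_peak(hourly_loads):
    """Return (peak_load, hour_index) for one day of hourly loads."""
    hour = max(range(len(hourly_loads)), key=lambda h: hourly_loads[h])
    return hourly_loads[hour], hour


def peak_errors(actual_day, forecast_day):
    """Percentage error of the peak value and absolute error of the peak hour."""
    actual_peak, actual_hour = daily_peak(actual_day)
    forecast_peak, forecast_hour = daily_peak(forecast_day)
    peak_error_pct = 100.0 * abs(actual_peak - forecast_peak) / actual_peak
    return peak_error_pct, abs(actual_hour - forecast_hour)


# Hypothetical triangular load profiles peaking at hours 15 and 16.
actual = [100.0 + 50.0 * (1 - abs(h - 15) / 15.0) for h in range(24)]
forecast = [98.0 + 49.0 * (1 - abs(h - 16) / 16.0) for h in range(24)]
peak_pct, hour_gap = peak_errors(actual, forecast)
```

A small error in the peak value can coexist with a shifted peak hour, which is one way to see why the two targets behave differently.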

Discussion of the Results
Power load is affected by many factors. These factors constitute different dimensions of the system input. The relationships between different dimensions are important information about the system. Usually, the information related to the predicted output is invisible. Restricted by the dimension of the input data, traditional forecasting models such as PLS, LSSVM and NN need to reduce the dimensions of the input. This process may cause the loss of hidden information. Using tensors to represent the system input can avoid this problem. In theory, tensors can accurately describe the invisible relationships between data in different dimensions without information loss. For TPN, the outer model, which uses tensor PLS, can preserve the high-dimensional structure of tensors. The inner model, which uses NN, is very suitable for processing invisible information. This hidden information improves the prediction accuracy of the TPN model.

Conclusions
This paper proposed a short-term load forecasting model based on the tensor partial least squares-neural network. The model regards prior load data and other relevant quantities (temperature and lifestyle) as multisense tensors. The data processing method, which combines the outer model and the nonlinear inner model, can avoid information loss to a certain extent. Compared with classical PLS, LSSVM and NN, the proposed model has the highest forecasting accuracy.

Figure 1. The structure of the TPN model. E and f are the residuals of X and y, respectively. T is the latent variable matrix.

Figure 2. The schematic diagram of the decomposition procedure.

Figure 3. The structure of a neural network (NN) with three hidden layers.

Table 1. The root mean square error (RMSE) of four prediction models.

Table 2. The mean absolute percentage error (MAPE) of four prediction models.

Table 3. The forecasting result of daily load peak and peak appearance time.