TCNformer Model for Photovoltaic Power Prediction

Despite the growing capabilities of short-term photovoltaic power prediction, longer-range prediction still faces two challenges: error accumulation and long-term time series feature extraction. In order to improve the longer-range prediction accuracy of photovoltaic power, this paper proposes a seq2seq prediction model, TCNformer, which outperforms other state-of-the-art (SOTA) algorithms by introducing variable selection (VS), long- and short-term time series feature extraction (LSTFE), and one-step temporal convolutional network (TCN) decoding. The VS module employs correlation analysis and periodicity analysis to separate the time series correlation information, the LSTFE module extracts multiple time series features from the time series data, and one-step TCN decoding realizes generative prediction. We demonstrate here that TCNformer achieves the lowest mean squared error (MSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) in contrast to the other algorithms in the field of short-term photovoltaic power prediction, and furthermore, the effectiveness of each module has been verified through ablation experiments.


Introduction
At present, with the rapid development of perovskite solar cell technology [1,2], the maximum efficiency [3] and stability [4] of photovoltaic power have been greatly improved. Photovoltaic power is increasingly important in the field of new energy. According to the data of the International Energy Agency (IEA), the growth rate of global photovoltaic installed capacity has reached as much as 49%. It is estimated that global photovoltaic power will reach 16% of the total power by 2050 [5]. At the same time, China is promoting the construction of a new power system with new energy as the principal part. Photovoltaic power using solar energy is an important branch of new energy and one of the important means for China to achieve the goal of carbon neutrality. After the large-scale integration of photovoltaic power stations into the energy network, how to accurately predict photovoltaic power and dispatch the power grid accordingly has become an urgent problem to be addressed. Therefore, improving the prediction accuracy of photovoltaic power is significant for improving the operation efficiency of the power stations themselves and for maintaining the stability of power grids.
Many scholars in China and abroad have carried out a great deal of research on the prediction of photovoltaic power. At present, the mainstream prediction methods focus on traditional random learning and deep learning methods. In the field of traditional random learning, the literature [6] uses historical weather data and historical power data as inputs of a support vector machine (SVM) to build a short-term photovoltaic power prediction model, which has a higher level of accuracy than the traditional autoregressive (AR) or radial basis function (RBF) models. One study [7] proposed a model based on Support Vector Regression (SVR) and achieved better prediction performance. In the field of deep learning, recurrent neural network (RNN) structures, such as long short-term memory (LSTM), gated recurrent unit (GRU), and seq2seq structural models, are widely used to analyze and predict time series data for such applications as stock price prediction [8], gold price prediction [9], traffic flow [10], voice classification [11], etc. The prediction of photovoltaic power can also be regarded as a kind of time series data prediction, so the above algorithms have been used to predict short-term global horizontal irradiance (GHI) or comprehensive solar loads [12,13]. Furthermore, in order to ensure accuracy as much as possible and reduce the training time, the GRU network has been applied to short-term photovoltaic power prediction [14], and the multivariable GRU model [15-17] has been used to predict solar irradiance or power. Some hybrid models have been applied in the field of photovoltaic power generation prediction, such as the combination of a deep learning model and a heuristic algorithm [18,19], the combination of a deep learning model and a traditional random learning method [20,21], the combination of multiple deep learning models [22,23], etc. Seq2seq structural models represented by the Transformer series, such as Autoformer and Informer [24,25], take the photovoltaic power prediction problem as an experimental example for their models. However, these models usually use only the photovoltaic power data for prediction; that is, the corresponding weather data are not fully used, and the time series features of the data are not fully extracted.
Compared with traditional LSTM, GRU, and other models, the Transformer series seq2seq model can avoid the problem of error accumulation and read longer input data [26], but it is still limited by the length of the input data. It is difficult for the seq2seq series model to capture longer time series features. To address this problem, [27] proposes the long- and short-term time series network (LSTNet) model, in which a skip recurrent neural network (SkipRNN) structure is used to capture more long-term time series features.
Based on the above analysis, the current research mainly focuses on the prediction of data within a few hours. When applied to a longer prediction range [28] for photovoltaic power, these methods typically suffer from two major challenges: error accumulation and long-term time series feature extraction. A model is therefore needed that can simultaneously extract multiple time series features from the historical data of photovoltaic power and weather factors while avoiding error accumulation. Inspired by the application of the LSTM, LSTNet, and Transformer series models in the field of photovoltaic power prediction, this paper proposes TCNformer, a long- and short-term time series correction network, and we verified the model using the real data of a photovoltaic station in Australia. According to the experimental results, the TCNformer model greatly improves various indicators compared with LSTM, SkipGRU, Transformer, and Informer, improving the accuracy of photovoltaic power prediction.
The contributions of this paper include the following: (1) According to the different impacts of various weather factors on photovoltaic power generation, a VS module was designed to screen and process the data through correlation analysis and periodicity analysis. (2) Aiming at the challenge of extracting long-term time series features under the limitations of the traditional Transformer, an LSTFE module was designed to extract multiple time series features through an LSTM and a SkipGRU network. (3) In order to improve temporal feature extraction and avoid error accumulation, one-step temporal convolutional network (TCN) decoding was used to realize generative prediction.

Time Series Features of Photovoltaic Power Data
According to the literature [29,30], the current photovoltaic power prediction problem is usually defined as a time series data prediction problem. However, as the time granularity increases, the degree to which the photovoltaic power data are affected by external factors increases, and the self-similarity decreases. The basic photovoltaic power data studied in this paper are collected at a 15-min granularity, and they are greatly affected by external factors that have a certain regularity and contingency, so the statistical features of photovoltaic power data show certain periodicity, abruptness, and contingency.
As shown in Figure 1, the 4-day power history data of a photovoltaic station were randomly selected, showing obvious periodicity and volatility.
As shown in Figure 2, in order to explore the long-term time series features of photovoltaic power data, this study employed the classic technique of a seasonal prediction model and selected the historical data of a photovoltaic power station at 8:30 for 4 consecutive years. Although the data show greater volatility, a certain periodicity can still be seen.

LSTM and SkipGRU
LSTM is a classic model in the field of time series prediction. In the prediction process, LSTM updates the internal state and the external state at the same time, mainly through three gates: a forget gate, an input gate, and an output gate.
The GRU network [31] is a variant of the LSTM network, which combines the three gates of the LSTM unit into two gates. The SkipGRU module introduces a skip connection layer. By sampling at intervals, it can look back over a longer period of time while the length of the sampled sequence remains unchanged, so as to capture long-term features.
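To make the interval-sampling idea concrete, below is a minimal sketch, assuming a PyTorch implementation; the `SkipGRU` class, its skip-period handling, and all parameter names are illustrative assumptions rather than the paper's implementation.

```python
import torch
import torch.nn as nn

class SkipGRU(nn.Module):
    """Sketch of skip-interval sampling: with skip period p, the input is
    split into p interleaved sub-sequences, so a GRU unrolled over the same
    number of steps looks back p times further in real time."""
    def __init__(self, input_size: int, hidden_size: int, skip: int):
        super().__init__()
        self.skip = skip
        self.gru = nn.GRU(input_size, hidden_size, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, features); truncate so seq_len % skip == 0
        b, t, f = x.shape
        t = t - t % self.skip
        x = x[:, -t:, :]
        # group consecutive `skip` steps, then fold the phases into the batch
        x = x.reshape(b, t // self.skip, self.skip, f)
        x = x.permute(0, 2, 1, 3).reshape(b * self.skip, t // self.skip, f)
        _, h = self.gru(x)                              # (1, b*skip, hidden)
        return h.squeeze(0).reshape(b, self.skip, -1)   # one state per phase
```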

Self-Attention Mechanism and ProbSparse Self-Attention Module
The calculation formulas of the traditional self-attention mechanism are as follows:

$$Q = XW_Q, \quad K = XW_K, \quad V = XW_V \tag{1}$$

$$A(Q, K, V) = \mathrm{Softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V \tag{2}$$

In the formulas, $W_Q$, $W_K$, and $W_V$ are the three weight matrices. After random initialization, the three matrices $Q$, $K$, and $V$ are generated according to Equation (1), and then the weighted attention result $A(Q, K, V)$ is calculated according to Equation (2). The result contains the information via the attention of all of the input data.
The ProbSparse self-attention proposes to calculate the sparsity measurement of each query using KL divergence:

$$M(q_i, K) = \ln \sum_{j=1}^{L_K} e^{\frac{q_i k_j^T}{\sqrt{d}}} - \frac{1}{L_K}\sum_{j=1}^{L_K} \frac{q_i k_j^T}{\sqrt{d}}$$

Based on the calculated sparsity metric, each key focuses on only $u$ main queries to achieve ProbSparse self-attention:

$$A(Q, K, V) = \mathrm{Softmax}\left(\frac{\overline{Q}K^T}{\sqrt{d}}\right)V$$

In the formula, $\overline{Q}$ is a sparse matrix with the same size as $Q$, and it contains only the top-$u$ queries under the sparsity metric $M(q, K)$.
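As a rough illustration of the sparsity measurement, the sketch below computes $M(q_i, K)$ exactly and keeps the top-$u$ queries; note that Informer itself approximates $M$ by sampling keys, and all names here are assumptions for illustration.

```python
import torch

def prob_sparse_scores(Q: torch.Tensor, K: torch.Tensor, u: int):
    """Sparsity measurement from the ProbSparse idea:
    M(q_i, K) = logsumexp_j(q_i.k_j / sqrt(d)) - mean_j(q_i.k_j / sqrt(d)).
    Queries with the largest M dominate the attention distribution."""
    d = Q.shape[-1]
    scores = Q @ K.transpose(-2, -1) / d ** 0.5       # (..., L_Q, L_K)
    M = torch.logsumexp(scores, dim=-1) - scores.mean(dim=-1)
    top_idx = M.topk(u, dim=-1).indices               # top-u active queries
    return M, top_idx
```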

Temporal Convolutional Network (TCN) Module
TCN is a variant of the convolutional neural network for processing sequence modeling tasks. It combines RNN and CNN architectures. TCN performs better than standard recurrent networks on different tasks and data sets, and it demonstrates longer and more efficient memory. The main component of the TCN network is the dilated causal convolution. The other components are similar to the Feedforward module, which plays a role in deepening the linear features.
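Below is a minimal sketch of the dilated causal convolution named above, assuming a PyTorch layer; the class and parameter choices are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DilatedCausalConv(nn.Module):
    """Left-padding by (kernel_size - 1) * dilation keeps the convolution
    strictly causal; stacking layers with dilations 1, 2, 4, ... grows the
    receptive field exponentially, which gives the TCN its long memory."""
    def __init__(self, channels: int, kernel_size: int, dilation: int):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              dilation=dilation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time); pad on the left (the past) only
        x = nn.functional.pad(x, (self.pad, 0))
        return self.conv(x)
```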

Problem Definition
The present study abstracts the photovoltaic power prediction problem as a multistep time series prediction problem, which can be defined as a data series with an input of I × n and an output of O × 1, where I is the length of the input data, and O is the length of the output data. For example, under a 15-min sampling frequency, if the historical photovoltaic power data of the past 30 days are used to predict the photovoltaic power data of the next 24 h, the length I is 2880, and the length O is 96. The overall TCNformer network design follows the traditional Transformer structure, in which the Encoder module and the Decoder module are designed with a multilayer structure.
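As a worked illustration of this definition, the sketch below slices a 15-min series into (input, target) windows with I = 2880 and O = 96; the stride and the function name are assumptions for illustration.

```python
import numpy as np

def make_windows(series: np.ndarray, I: int = 2880, O: int = 96,
                 stride: int = 96):
    """Slice a 15-min photovoltaic series into (input, target) pairs:
    30 days of history (I = 30 * 96 = 2880 steps) predicts the next
    24 h (O = 96 steps)."""
    X, y = [], []
    for start in range(0, len(series) - I - O + 1, stride):
        X.append(series[start:start + I])
        y.append(series[start + I:start + I + O])
    return np.stack(X), np.stack(y)
```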


Variable Selection (VS) Module
Combined with the information shown in Figures 1 and 2, the historical data of photovoltaic power not only have timing features in the short term, but they also have certain timing features over the long term. Considering the length of the long-term cycle (as shown in Figure 2, the cycle is close to 365 days) and the subsequent optimization problems, it is difficult for a traditional model to capture these timing features at the same time. Therefore, we designed a VS module to divide the input sequence into three dimensions through a preliminary analysis and selection of the historical data. Then, the results from the VS module are transferred to the LSTFE module for feature fusion.
$$[d_l, d_s, d_t] = \mathrm{VariableSelection}(data)$$

In the formula, $data \in \mathbb{R}^{I \times n}$, $d_l \in \mathbb{R}^{I_l \times n_l}$, $d_s \in \mathbb{R}^{I_s \times n_s}$, and $d_t \in \mathbb{R}^{I \times n_t}$ respectively represent the preprocessed raw data, the month-level time series data, the week-level time series data, and the day-level time series data. $n$, $n_l$, $n_s$, and $n_t$ respectively represent the number of influencing factors. VariableSelection(•) represents the VS module, and the specific calculation method is as follows.
Photovoltaic power data often show strong time series features. Although the volatility is strong, they still have a certain periodicity over a longer time range. In this paper, the Fourier transform decomposition curves of the photovoltaic power data and its influencing factor data are selected for periodicity analysis [32] in order to obtain the fluctuation periods of the different periodic curves and to provide a certain degree of reference for the analysis of photovoltaic power prediction. The formula of the Fourier transform is as follows:

$$X(k) = \sum_{n=0}^{N-1} x(n) W_N^{nk}, \quad W_N = e^{-j\frac{2\pi}{N}}$$

In the formula, $X(k)$ represents the Fourier series, $x(n)$ represents the Fourier coefficient, $W_N^{nk}$ represents the complex function, $k$ represents the coordinate in the frequency domain, and $N$ represents the period.
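A minimal sketch of such a periodicity analysis is given below, assuming a 15-min (0.25 h) sampling granularity; reading the dominant cycle from the largest spectral magnitude is one plausible realization, not necessarily the paper's exact procedure.

```python
import numpy as np

def dominant_period(x: np.ndarray, sample_hours: float = 0.25) -> float:
    """Estimate the dominant cycle of a series from its Fourier spectrum;
    a daily cycle in 15-min data should come out near 24 h."""
    x = x - x.mean()                                   # drop the DC component
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=sample_hours)    # cycles per hour
    k = spectrum[1:].argmax() + 1                      # skip the zero bin
    return 1.0 / freqs[k]                              # period in hours
```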
Photovoltaic power is correlated with a large number of weather factors; in particular, there is a strong correlation between solar radiation intensity and photovoltaic power. In this study, the Pearson correlation coefficient was selected for the correlation analysis, and the calculation formula is as follows:

$$r = \frac{\sum_{i=1}^{n}\left(x_i - \overline{x}\right)\left(y_i - \overline{y}\right)}{\sqrt{\sum_{i=1}^{n}\left(x_i - \overline{x}\right)^2}\sqrt{\sum_{i=1}^{n}\left(y_i - \overline{y}\right)^2}}$$

The VS module processes the month-level time series data, week-level time series data, and day-level time series data according to the analytical results of the correlation and periodicity.
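The screening itself can be sketched as follows, assuming the 0.1 threshold reported in the results section; the function and variable names are illustrative.

```python
import numpy as np

def pearson_screen(power: np.ndarray, factors: dict, threshold: float = 0.1):
    """Keep weather factors whose absolute Pearson correlation with
    photovoltaic power exceeds the threshold."""
    kept = {}
    for name, series in factors.items():
        r = np.corrcoef(power, series)[0, 1]
        if abs(r) >= threshold:
            kept[name] = r
    return kept
```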


Long-and Short-Time Series Feature Extraction (LSTFE) Module
In this study, we designed an LSTFE module, and we used it to extract time series features from each time scale. The structure of the LSTFE module is shown in Figure 4. The LSTFE mainly includes the LSTM unit, the SkipGRU unit, and the CycleEmbed unit. We transferred the week-level time-series-related data and the month-level time-series-related data to the LSTM network and the SkipGRU network in the LSTFE module for prediction. The prediction results of the LSTM network made full use of the short-term time series features, while the SkipGRU network made full use of the long-term time series features:

$$f_l = \mathrm{SkipGRU}(d_l) \tag{10}$$

$$f_s = \mathrm{LSTM}(d_s) \tag{11}$$

In Formulas (10) and (11), $f_l \in \mathbb{R}^I$ and $f_s \in \mathbb{R}^I$ represent the month-level time series feature extraction results and the week-level time series feature extraction results in the LSTFE module, respectively. Using the excellent feature extraction capabilities of the LSTM and the SkipGRU, the extracted feature results were transformed into the input length I of the Encoder module.
Using the LSTM and the SkipGRU, the time series features at the weekly and monthly levels were extracted, but how can the time series features at an annual level be extracted? To solve this problem, we designed the CycleEmbed module.
The structure of the CycleEmbed unit is shown in Figure 5, including data projection, position coding, cycle coding, and timing coding.
Data projection is based on the results of the correlation and periodicity analysis, mapping the output data to a vector of dimension $d_{model}$ and aligning the dimensions. The alignment tool is a one-dimensional convolution filter.
The position coding is calculated in the same way as in the Transformer:

$$PE_{(pos,\,2j)} = \sin\left(\frac{pos}{10000^{2j/d_{model}}}\right) \tag{14}$$

$$PE_{(pos,\,2j+1)} = \cos\left(\frac{pos}{10000^{2j/d_{model}}}\right) \tag{15}$$

In Formulas (14) and (15), $j \in \{1, \ldots, \lfloor d_{model}/2 \rfloor\}$, $pos \in \{1, \ldots, L_x\}$, $L_x$ is the input sequence length, and $d_{model}$ is the Encoder input dimension.
Cycle coding is divided according to the results of the periodicity analysis and calculation. $\tau$ is the number of cycle data steps, which is determined by the result of the periodicity analysis $T$ and the granularity of the data sampling time $g$; that is, $\tau = T/g$. Then, the cycle information of the input data is coded according to $\tau$; that is, there are $\tau$ distinct values in the cycle coding, with $CE_{pos} = pos \bmod \tau$. Timing coding is used to add the month and year to the coding to extract the longer time series features. In this way, the annual time series features of the data are introduced into the codec along with the embedding operation.
Combining the results of the four parts, the output result of the final period embedding module is the input of the Encoder:

$$X_{en} = DP + PE + CE + TE$$

where $DP$, $PE$, $CE$, and $TE$ denote the data projection, position coding, cycle coding, and timing coding results, respectively.
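A minimal sketch of the position coding and cycle coding described above follows (assuming an even $d_{model}$); how the cycle code is ultimately injected, e.g., through an embedding lookup versus direct addition, is an assumption here.

```python
import torch

def cycle_embed(pos: torch.Tensor, d_model: int, tau: int):
    """Sinusoidal position codes as in the Transformer, plus a cycle code
    c = pos % tau tagging each step with its phase inside the period
    (tau = T / g). Example: cycle_embed(torch.arange(96), 64, tau=96)."""
    j = torch.arange(0, d_model, 2, dtype=torch.float32)
    angle = pos.unsqueeze(-1).float() / torch.pow(10000.0, j / d_model)
    pe = torch.zeros(pos.numel(), d_model)
    pe[:, 0::2] = torch.sin(angle)
    pe[:, 1::2] = torch.cos(angle)
    cycle_code = pos % tau          # indices for a learnable embedding table
    return pe, cycle_code
```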

Encoder
The input of the Encoder is the output of the LSTFE module. The structure of the Encoder is a multilayer network structure. Each layer of the Encoder is mainly composed of a sparse attention unit and a composition unit.
$$S_{en}^{l,1} = \mathrm{ProbSelfAttention}\left(S_{en}^{l-1,2}\right) \tag{17}$$

$$S_{en}^{l,2} = \mathrm{FeedForward}\left(S_{en}^{l,1}\right) \tag{18}$$

In Formulas (17) and (18), $S_{en}^{l,1} \in \mathbb{R}^{I \times d_{model}}$ is the calculation result of the sparse attention mechanism in the Layer $l$ Encoder module, $S_{en}^{l,2} \in \mathbb{R}^{I \times d_{model}}$ is the calculation result of the Feedforward layer in the Layer $l$ Encoder module, and FeedForward(•) is an important part of the traditional Transformer network structure, which is used to deepen the linear representation and better extract the features. The Feedforward structure used in this paper is shown in Figure 6. ProbSelfAttention(•) is the sparse attention mechanism in the Informer model [24].
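The sketch below shows one Encoder layer in the shape of Formulas (17) and (18); standard multi-head attention stands in for the ProbSparse attention, and the residual and normalization placement is an assumption carried over from the traditional Transformer.

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One Encoder layer: a self-attention step (S^{l,1}) followed by a
    Feedforward step (S^{l,2}), each with residual connection and norm."""
    def __init__(self, d_model: int, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        a, _ = self.attn(s, s, s)            # S^{l,1}: attention result
        s = self.norm1(s + a)
        return self.norm2(s + self.ff(s))    # S^{l,2}: feedforward result
```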



Decoder
In the Transformer model, the Encoder can be calculated in parallel, but the Decoder needs to decode step by step. As with the LSTM model, error accumulation will occur. This study introduced a one-step TCN decoding operation:

$$X_{de}^{0} = \mathrm{concat}(X_I, X_0) \tag{20}$$

In Formula (20), $X_0$ is the result of the zero-filling operation. One-step decoding divides the Decoder's input into two parts through a zero-filling operation. The first $I$ data are a known sequence, the last $O$ data are the sequence to be predicted, and $X_{de}^{0} \in \mathbb{R}^{(I+O) \times d_{model}}$ is the Decoder's input data. At this time, part of the time information of the data to be predicted is also transmitted to the Decoder through the period embedding module for prediction. The prediction process of the Decoder is similar to that of the Encoder, but it has one more self-attention layer than the Encoder.
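A minimal sketch of building this one-step decoding input is shown below; the function name and tensor layout are illustrative assumptions.

```python
import torch

def decoder_input(x_known: torch.Tensor, O: int) -> torch.Tensor:
    """Formula (20) as a tensor operation: concatenate the known sequence
    of length I with O zero-filled steps, so the whole future window is
    generated in one forward pass instead of step-by-step autoregression."""
    b, I, d = x_known.shape
    x_zero = torch.zeros(b, O, d, dtype=x_known.dtype,
                         device=x_known.device)
    return torch.cat([x_known, x_zero], dim=1)        # (b, I + O, d)
```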

Variable Selection Results and Discussion
The VS module in the long- and short-sequence correction network includes correlation analysis and periodicity analysis. The results of the correlation analysis on photovoltaic power are shown in Table 2. It can be seen from Table 2 that photovoltaic power is positively correlated with direct radiation intensity, scattered radiation intensity, temperature, and wind speed, while it is negatively correlated with humidity, wind direction, and rainfall. According to their numerical values, the data were filtered with a threshold of 0.1. It can be seen that the correlation between direct radiation intensity and photovoltaic power is the largest, while variables such as scattered radiation intensity, temperature, humidity, and wind speed have a certain correlation with photovoltaic power, which shows that these influencing factors have a certain degree of impact on the photovoltaic power, with the impact decreasing in turn. Although wind direction and rainfall are negatively correlated with the photovoltaic power, the values are too small to impact the output.
It can be seen from Table 3 that the cycle of photovoltaic power, humidity, direct radiation intensity, and scattered radiation intensity is 24.03 h, approximately 1 day, while the cycle of the wind speed, wind direction, and rainfall is 0.17 h, which can hardly be regarded as a meaningful periodicity. The temperature cycle is 8760 h; that is, the temperature cycle conforms to the changes of the four seasons. The above results basically conform to natural logic. Through the correlation analysis, five influencing factors were selected, including direct radiation, scattered radiation, temperature, humidity, and wind speed. Three influencing factors, namely, direct radiation, scattered radiation, and humidity, were screened through the periodicity analysis. Finally, the time-series-related variables of photovoltaic power screened through the VS module were direct radiation, scattered radiation, and humidity.

Prediction Results of Different Prediction Steps
In order to explore the prediction performance of each model under different prediction steps, this study selected LSTM, SkipGRU, Transformer, and Informer to compare with TCNformer.
The results are shown in Table 4. It can be seen from the results that, when the number of prediction steps is 1, the MSE errors of the five models differ little. With the increase in the number of prediction steps, the LSTM model demonstrated the largest error growth rate, and the error accumulation is obvious. Informer and TCNformer use the generative prediction method, so their errors were relatively stable, and the error accumulation was low. The TCNformer model proposed in this paper not only had a low level of error accumulation, but it also had the lowest MSE error. In order to more intuitively observe the error accumulation in the models, the prediction results were visualized, as shown in Figure 9.

Prediction Performance of Different Models
In this experiment, each model was trained five times in a 24-h (96 prediction steps) scenario, and the average value was taken. The final test set prediction results are shown in Table 5.


Error Analysis
Because the prediction of the TCNformer model is a time series, we did not calculate the standard error for multiple series. Instead, error analysis was carried out through the MSE of the prediction and the ground truth. Figure 11 shows the standard error diagram. The error bar in the diagram represents the standard error. Table 6 shows the mean value, standard deviation (SD), and standard error (SE) of the error under different sample numbers.



Ablation Experiment
In order to verify the effectiveness of each optimization module of the TCNformer model, we conducted ablation experiments in which we removed the innovative modules from the TCNformer model for comparative experiments; that is, we set them up separately as follows:
Experiment 1: Removal of the VS module.
Experiment 2: Removal of the long- and short-time series feature extraction module.
Experiment 3: Removal of the seq2seq structure, and use of the VS module + long- and short-time series feature extraction module + fully connected network.
Experiment 4: Removal of one-step TCN decoding.
Experiment 5: Use of the complete TCNformer model.
As shown in Table 7, the three innovations proposed in this paper are the VS module, the LSTFE module, and the seq2seq generative model structure combining Informer and Transformer. No matter which module was removed, the error of the model increased. When the seq2seq model structure was not used, the error was the largest; the VS module had the smallest impact on the overall model, but removing it still caused a decline in accuracy. From these data, it can be concluded that the TCNformer model proposed in this paper is effective, and its innovative modules are useful.

Conclusions
In this paper, a TCNformer model was proposed for photovoltaic power prediction, and we can draw the following three conclusions based on the experimental results:
1. The TCNformer model adopts the Transformer structure and introduces the sparse attention mechanism of the Informer model. The experimental results show that the photovoltaic output prediction accuracy is improved effectively. As shown in Table 6 and Figure 11, with the increase in the number of samples, the standard deviation and standard error gradually decreased, and the average value came closer to the average value of the overall sample. Therefore, the prediction result of the TCNformer model has a relatively stable level of error and a high level of reliability.
2. The VS module, the LSTFE module, and one-step TCN decoding extract more efficiently the impact of multiple time series features and other weather factors on photovoltaic power by classifying the data based on the time series, periodicity, and correlation.
3. Compared with the LSTM model and the Transformer series models, the TCNformer model has a higher level of accuracy in multistep prediction, but there is still room for optimization when the prediction range is further enlarged. In follow-up studies, we will focus on ways to solve the multistep prediction problem with a further increase in the time dimension.

For the time series features of photovoltaic power data, this paper proposed the TCNformer prediction model. The structure of the model is shown in Figure 3. Based on the traditional Transformer architecture, the TCNformer model mainly includes four modules: a variable selection (VS) module, a long- and short-time series feature extraction (LSTFE) module, an Encoder, and a Decoder.

Figure 3. The structure of the TCNformer model.

Figure 5. The structure of the CycleEmbed module.


Figure 6. The structure of the Feedforward layer.


Figure 9. Prediction performance of the different numbers of prediction steps for each model.

As shown in Figure 10, we visualized the prediction results of TCNformer using the test data set. The prediction results shown in the figure are 30 sets of 24-h prediction results, with little deviation when compared with the real data. It can be seen that TCNformer has a high level of accuracy and a low number of errors.

Figure 10. Results of the TCNformer model.


Figure 11. The standard error diagram of the MSE.




Table 2. Results of correlation analysis.

Table 3. Results of periodicity analysis.

Table 5. The 24-h scenario prediction results.

As shown in Table 5, the TCNformer performs best according to the three indicators of the MSE, MAE, and MAPE. Compared with the time series prediction model Informer, the MSE, MAE, and MAPE decreased by 81.90%, 50.03%, and 14.98%, respectively. The training time (153.43 s) and running time (1.29 ms) of TCNformer are relatively long, but considering the 15-min sampling granularity and the 24-h prediction scenario, the training time and running time do not affect the practical application of TCNformer.


Table 6. Results of error analysis.

Table 7. Results of ablation experiment.
