Photovoltaic Power Output Prediction Based on TabNet for Regional Distributed Photovoltaic Stations Group

Abstract: With the increasing proportion of distributed photovoltaic (DPV) installations in county-level power grids, to improve the centralized operation and maintenance of the stations and to meet the needs of power grid dispatching, the output of the county-level regional DPV stations group needs to be predicted. In this paper, the weather prediction information is used to predict the output based on the model input average strategy. To eliminate the effect of the selected non-optimal training sample collection period on the prediction accuracy, an ensemble prediction method based on the minimum redundancy maximum relevance criterion and the TabNet model is proposed. To reduce the influence of weather prediction errors on the power output prediction, a modified model based on error prediction is proposed. The ensemble prediction model is used to predict the day-ahead output, and a combination prediction model based on the proposed ensemble prediction model and the proposed modified model is established to predict the hour-ahead output. The experimental results verify the effectiveness of the proposed models. Compared with the corresponding reference models, the proposed ensemble prediction method reduces the normalized mean absolute errors (nMAEs) and the normalized root mean square errors (nRMSEs) of the day-ahead output prediction results by 2.86% and 5.51%, respectively. The combination prediction model reduces the nMAE and nRMSE of the hour-ahead output prediction results by 3.05% and 3.05%, respectively. Therefore, the prediction accuracy can be improved by the proposed models.


Introduction
As one of the most promising new energy technologies, photovoltaic (PV) power generation has attracted attention in most countries [1]. PV systems are available in two forms: centralized and distributed. Compared with centralized systems, distributed PV (DPV) systems are constructed close to loads, so the system output can be consumed locally, which helps to overcome the common mismatch between the actual distribution of PV resources and the application demand. Therefore, in recent years, the Chinese DPV industry has developed vigorously along with the whole-county promotion of DPV.
However, with the increasing proportion of DPV installations in county-level regions, power grid operation is greatly affected. Because of system construction and operation costs, the traditional operation and maintenance of DPV systems tends to be coarse and unrefined, resulting in low system benefits. Centralized operation and maintenance management of the DPV systems in a county can improve the overall efficiency of the systems. To meet the needs of power grid dispatching and the centralized operation and maintenance of DPV stations, the output of the regional DPV stations group needs to be predicted. Therefore, we take the DPV systems in a county as a whole to study regional DPV output prediction.
Regional PV prediction models can be divided into different types. Liu et al. [2] reviewed the research on regional PV prediction methods based on multiple time scales. Kim and Kim [3] divided the models into two categories: Type 1 concerns public utility-scale systems [4–13], and Type 2 concerns behind-the-meter systems [14–17]. Pierro et al. [9] proposed an interesting classification method based on prediction strategies:
(1) Bottom-up strategy. In this strategy, the output of each PV system in the region is predicted first, and the regional output is then obtained by summing these predicted values.
(2) Upscaling strategy. This strategy can be further divided into the model output average strategy and the model input average strategy. The model output average strategy selects a subset of the regional PV systems as representative of the whole and predicts the output of that subset. The predicted subset output is then rescaled, based on the subset capacity and the total capacity, to predict the regional power output. In the model input average strategy, the PV output of the predicted area is treated as the output of a single virtual PV system, and the regional PV power output is predicted directly.
For the model output average strategy, Shaker et al. [18] proposed a data-driven method to estimate the power output of invisible PV systems based on the measured values of a small number of representative systems. The representative sites were selected using the proposed data dimension reduction model based on K-means clustering and principal component analysis, and regional PV power generation was then obtained according to the mapping function. In addition to power generation estimation, Shaker et al. [17] also proposed a probabilistic prediction model based on a Fuzzy Arithmetic Wavelet Neural Network (FAWNN) to predict the power generation of a large number of small PV systems. Bright et al. [19] evaluated the satellite-only and upscaling-only PV output estimation methods and concluded that combining the two methods is more beneficial. Saint-Drenan et al. [8] analyzed the performance of the upscaling strategy by using measured power data from a set of 366 PV systems. They found that the error decreases with an increasing number of reference systems and a decreasing number of un-metered systems, and that the average distance between a reference system and the unknown system has a great influence on the performance of a set of reference systems.
For the model input average strategy, Fonseca et al. [5] proposed a method based on principal component analysis, support vector regression, and weather prediction data. One-day-ahead regional PV power outputs of the four main regions of Japan in 2009 were predicted with hourly power output data from 453 PV systems. Aillaud et al. [20] proposed a model that combines a convolutional neural network (CNN) with a long short-term memory architecture. The day-ahead regional PV power outputs of Germany were predicted, and the main result of this study shows that the proposed model is more accurate than the Random Forest model. Moschella et al. [21] attempted to directly predict wind and solar power generation in each Italian region based on the model input average strategy. Based on the same strategy, Pierro et al. [22] conducted a more detailed study on the solar power generation prediction of six regions in Italy by comparing six different prediction models. Yu et al. [23] presented a probabilistic prediction method based on CNN and non-linear quantile regression (QR). The model was used to predict the regional PV power output of PV systems in the Weifang region of China. The prediction result shows that the improved CNN can effectively process high-dimensional and complex input data, and that the non-linear QR model can provide quantile prediction information for the regional PV power output.
The upscaling prediction strategy is improved in some interesting studies, such as those of Pierro et al. [9], Wolff et al. [24], Saint-Drenan et al. [12], and Fu [10]. They first clustered the PV systems in the region and then applied the upscaling prediction strategy to each clustered subset to predict the PV power output of the whole region.
There are also some studies in which different strategies were compared. Fonseca et al. [6] conducted a comparative study of four models, each of which assumed a different scenario regarding the data available for prediction. When regional PV power data are completely available, the strategy of direct prediction followed by accumulation is adopted. When regional PV power data are only partially available, a prediction model based on stratified sampling is proposed. When only the regional aggregate PV power is available, the model input average strategy is adopted. When no power data can be obtained, the strategy of indirect prediction followed by accumulation is adopted. The comparison showed that, in a region with a variety of weather conditions, the prediction methods based on single-system predictions and the one based on stratified sampling provided the best results. Zamo et al. [7] predicted the regional PV power generation in two counties based on the bottom-up strategy and the model input average strategy. Using a reference system to directly predict the regional aggregate PV power gave an RMSE of about 6% regardless of the county, and the bottom-up strategy reduced the RMSE to about 5.8%. Pierro et al. [9] first clustered the PV systems in a region and then compared two strategies: (1) averaging the prediction results of each cluster to obtain the regional PV power (the model output average strategy), and (2) using the input variables of each cluster center to directly predict the regional PV power (the model input average strategy). The results show that the accuracy of the latter is slightly better.
Saint-Drenan et al. [25] proposed a new strategy that can be used as an alternative to the upscaling strategy for the scenario where no or few power measurements are available. The strategy uses an average PV model to calculate the power output of the most frequent module orientation angles. The calculated power values are finally weighted according to their probability of occurrence to estimate the real power output. The basic condition of this strategy is that the physical model information of the regional PV systems is available. However, in practice, it is usually difficult to meet this condition.
For regional PV output prediction, the bottom-up strategy needs to predict the output of all systems in the predicted area. It is necessary to establish a prediction model for each system and to perform a large amount of data processing and calculation. When some PV systems in the region lack historical data, so that a data-driven model cannot be applied, it is necessary to adopt a prediction method based on the physical model. However, in reality, it is difficult to obtain the physical models of all PV systems in a region. Therefore, the bottom-up strategy is difficult to apply in practice [10]. To reduce the amount of calculation through simplified methods, research on regional PV output prediction mainly focuses on the upscaling strategy [9].
From the previous research, we found that there are two problems with county-level regional DPV output prediction. The first concerns the data resources available for prediction. Most of the DPV systems in county-level regions are small rooftop PV systems. Considering the construction cost, there is generally no single output prediction device in these systems, which leads to a lack of predicted output data from single systems. For the same reason, there is also no meteorological data acquisition device in these systems, which leads to a lack of locally measured meteorological data for output prediction. Although easily available weather prediction data can be used to predict the regional power output, the inherent weather prediction errors will affect the output prediction accuracy. The lack of available data resources and the weather prediction errors make it difficult to directly use the previously proposed models to predict the county-level regional DPV output. The second concerns the prediction method. Most of the previously proposed deep neural network (DNN) architectures have been successfully applied to images, text, and audio but are not well suited to tabular data [26]. Therefore, there are few studies on regional PV output prediction methods based on DNNs. As is the case with other data-driven models, when a new DNN for tabular data, such as TabNet, is applied to predict the county-level regional DPV output, the optimal training sample collection period (TSCP) changes dynamically and this hyper-parameter is difficult to select. Generally, a fixed value is used, or all historical samples are taken as the training samples, which reduces the accuracy of the predicted results.
In this paper, the weather prediction information is used to predict the county-level regional DPV output based on the model input average strategy. To eliminate the effect of the selected non-optimal TSCP on the prediction accuracy, an ensemble prediction method based on the minimum redundancy maximum relevance (mRMR) criterion and the TabNet model is proposed. To reduce the influence of weather prediction errors on the power output prediction, a modified model based on error prediction is proposed. The proposed ensemble prediction method is used to predict the day-ahead output, and a combination prediction model based on the proposed ensemble prediction method and the modified model is established to predict the hour-ahead output. Finally, the performance of the proposed models is verified by error analysis.

TabNet Model
The TabNet model [26], a novel DNN for tabular data, is introduced in this paper to predict the county-level regional DPV output. As shown in Figure 1, TabNet's encoding is based on sequential multi-step processing with $N_{steps}$ decision steps and can be summarized as follows:
(1) The $D$-dimensional features $f \in \mathbb{R}^{B \times D}$ are normalized by applying batch normalization (BN) and then passed to each decision step, where $B$ is the batch size.
(2) Process the normalized features using a feature transformer and then split the result into two parts, $[d[0], a[0]]$, where $d[0] \in \mathbb{R}^{B \times N_d}$ and $a[0] \in \mathbb{R}^{B \times N_a}$.
(3) Perform decision step 1. Take $a[0]$ as the input of the attentive transformer to obtain the mask $M[1] \in \mathbb{R}^{B \times D}$. Use $M[1]$ to select the salient features, and then process the selected features with the feature transformer. Split the processed features into the decision step output and the information for the subsequent decision step, $[d[1], a[1]]$.
(4) Following [26], repeat the procedure of Step (3) for decision steps $i = 2, \ldots, N_{steps}$, with $a[i-1]$ as the input of the attentive transformer, to obtain $M[i]$ and $[d[i], a[i]]$.
(5) Following [26], aggregate the decision step outputs into the overall decision embedding $d_{out} = \sum_{i=1}^{N_{steps}} \mathrm{ReLU}(d[i])$, which is mapped to the model output by a final linear layer.
(6) Obtain the aggregate feature importance mask $M_{agg-b,j}$ by the following formula:

$$M_{agg-b,j} = \frac{\sum_{i=1}^{N_{steps}} \eta_b[i]\, M_{b,j}[i]}{\sum_{j=1}^{D} \sum_{i=1}^{N_{steps}} \eta_b[i]\, M_{b,j}[i]} \tag{1}$$

where $M_{b,j}[i]$ is the element in the $b$-th row and the $j$-th column of the mask in the $i$-th decision step. $\eta_b[i]$, which denotes the aggregate decision contribution at the $i$-th decision step for the $b$-th sample, is calculated as follows:

$$\eta_b[i] = \sum_{c=1}^{N_d} \mathrm{ReLU}(d_{b,c}[i]) \tag{2}$$

where $d_{b,c}[i]$ is the element in the $b$-th row and the $c$-th column of the split features for decision step $i$.
According to the structure of the attentive transformer shown in Figure 2, the feature selection mask is $M[i] = \mathrm{sparsemax}(P[i-1] \cdot h_i(a[i-1]))$, where sparsemax is the sparsemax normalization, $a[i-1]$ is the processed features from the preceding decision step, and $h_i$ is a trainable function consisting of a fully-connected (FC) layer followed by BN. $P[i-1]$ is the prior scale term, denoting how much a particular feature has been used previously. The prior scale term is defined as follows:

$$P[i] = \prod_{j=1}^{i} (\gamma - M[j]) \tag{3}$$

where $\gamma$ is a relaxation parameter. $P[0]$ is initialized as all ones, $\mathbf{1}^{B \times D}$.
For parameter-efficient and robust learning with high capacity, a feature transformer should comprise layers that are shared across all decision steps, as well as decision step-dependent layers. Figure 3 shows the implementation as a concatenation of two shared layers and two decision step-dependent layers. Each FC layer is followed by BN and a gated linear unit (GLU) nonlinearity [27], and is eventually connected to a normalized residual connection. Normalization with $\sqrt{0.5}$ helps to stabilize learning by ensuring that the variance throughout the network does not change dramatically [28]. For faster training, the ghost BN [29] form is used in the feature transformer.
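The mask computation in the attentive transformer can be illustrated with a small NumPy sketch. It is not the authors' implementation: the helper names `sparsemax` and `attentive_step` are hypothetical, the trainable function h_i is reduced to a plain weight matrix `W` (BN is omitted), and `gamma=1.3` is an arbitrary illustrative value for the relaxation parameter.

```python
import numpy as np

def sparsemax(z):
    """Row-wise sparsemax: Euclidean projection of each row onto the probability simplex."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z, axis=1)[:, ::-1]            # each row in descending order
    k = np.arange(1, z.shape[1] + 1)
    cumsum = np.cumsum(z_sorted, axis=1)
    support = 1 + k * z_sorted > cumsum               # True for the entries kept in the support
    k_z = support.sum(axis=1)                         # support size per row
    tau = (cumsum[np.arange(z.shape[0]), k_z - 1] - 1) / k_z
    return np.maximum(z - tau[:, None], 0.0)

def attentive_step(a_prev, prior, W, gamma=1.3):
    """One mask computation. W is a stand-in (assumption) for the trainable FC layer h_i."""
    mask = sparsemax(prior * (a_prev @ W))            # M[i] = sparsemax(P[i-1] * h_i(a[i-1]))
    new_prior = prior * (gamma - mask)                # P[i] = P[i-1] * (gamma - M[i])
    return mask, new_prior
```

Each row of the returned mask is non-negative and sums to one, and many entries are exactly zero, which is what lets a decision step attend to only a few input features.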

Mutual Information
In information theory, mutual information (MI) is used to represent the degree of dependence between two systems or the correlation between two variables [10]. MI can be represented based on system entropy, which expresses the complexity or uncertainty of a system. Assuming that the probability of system output $Y = y$ is $P(y)$, the system entropy $H(Y)$ is defined as follows:

$$H(Y) = -\sum_{y} P(y) \log P(y) \tag{4}$$

When the system input $X = x$ is known, the conditional entropy $H(Y|X)$ is defined in Equation (5):

$$H(Y|X) = -\sum_{x} P(x) \sum_{y} P_{Y|X}(y|x) \log P_{Y|X}(y|x) \tag{5}$$

where $P_{Y|X}(y|x)$ is the conditional probability of $Y$ when the system input $X = x$.
The system joint entropy is defined as follows:

$$H(X, Y) = -\sum_{x} \sum_{y} P_{XY}(x, y) \log P_{XY}(x, y) \tag{6}$$

Since knowing $X = x$ can reduce the uncertainty of the system, the conditional entropy $H(Y|X)$ is no larger than the system entropy $H(Y)$. MI is used to quantify this decrease in system uncertainty and is represented by $I(X, Y)$ in Equation (7):

$$I(X, Y) = H(Y) - H(Y|X) \tag{7}$$

Based on Equations (4), (5), and (7), $I(X, Y)$ is calculated as follows:

$$I(X, Y) = \sum_{x} \sum_{y} P_{XY}(x, y) \log \frac{P_{XY}(x, y)}{P(x) P(y)} \tag{8}$$

where $P_{XY}(x, y)$ represents the joint probability distribution when $X = x$ and $Y = y$. When $X$ and $Y$ are continuous variables, $I(X, Y)$ is calculated by Equation (9):

$$I(X, Y) = \iint P_{XY}(x, y) \log \frac{P_{XY}(x, y)}{P(x) P(y)} \, dx \, dy \tag{9}$$
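For sampled series, MI has to be estimated from data. The sketch below uses a simple two-dimensional histogram (plug-in) estimate of the discrete formula. This is an assumption made for illustration, since the text does not state which estimator is used. The function name `mutual_information` and the `bins` parameter are hypothetical.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram (plug-in) estimate of I(X, Y) in nats; the bin count is an illustrative choice."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()                        # joint probability table
    p_x = p_xy.sum(axis=1, keepdims=True)             # marginal of X, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)             # marginal of Y, shape (1, bins)
    nz = p_xy > 0                                     # skip empty cells (0 * log 0 = 0)
    return float(np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])))
```

The estimate depends on the bin count and is biased upward for small samples, so values should only be compared between series of the same length computed with the same settings.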

mRMR Criterion
The mRMR criterion [30] is a feature selection method, and its core idea is to establish the optimal feature subset based on the maximum relevance condition and the minimum redundancy condition [31]. Let the feature set consisting of $m$ features be $F_m$, from which $n$ ($n \le m$) features are selected to form a subset $S_n$, where $F_m = \{v_i, i = 1, 2, \ldots, m\}$, $S_n = \{v_j, j = 1, 2, \ldots, n\}$, and $S_n \subseteq F_m$. According to the maximum relevance condition, the average value of the MI between the feature variables in the subset and the target variable should be the largest, and the constraint is as follows:

$$\max D(S_n, c), \quad D = \frac{1}{n} \sum_{v_i \in S_n} I(v_i, c) \tag{10}$$

where $I(v_i, c)$ is the MI between the $i$-th feature and the target variable $c$. The features of the subset $S_n$ obtained by Equation (10) may be highly correlated, which may introduce unnecessary redundancy. Therefore, in addition to meeting the maximum relevance condition, the mean value of the MI between the features in the subset $S_n$ should be minimized. The minimum redundancy condition is as follows:

$$\min R(S_n), \quad R = \frac{1}{n^2} \sum_{v_i, v_j \in S_n} I(v_i, v_j) \tag{11}$$

where $I(v_i, v_j)$ is the MI between the features of the subset $S_n$. Based on Equations (10) and (11), the mRMR criterion is expressed by Equation (12):

$$\max \Phi(D, R), \quad \Phi = D - R \tag{12}$$
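In practice, the two conditions are usually applied greedily: start from the most relevant candidate and then repeatedly add the candidate with the best trade-off between relevance and redundancy. The sketch below assumes the difference form of the trade-off (relevance minus mean redundancy with the already selected candidates). The function name `mrmr_select` and its arguments are hypothetical, and the MI values are supplied by the caller.

```python
import numpy as np

def mrmr_select(relevance, redundancy, n_select):
    """Greedy mRMR selection.

    relevance[i]     : MI between candidate i and the target.
    redundancy[i, j] : MI between candidates i and j.
    Returns the indices of the selected candidates in selection order.
    """
    relevance = np.asarray(relevance, dtype=float)
    redundancy = np.asarray(redundancy, dtype=float)
    selected = [int(np.argmax(relevance))]            # most relevant candidate first
    remaining = [i for i in range(len(relevance)) if i != selected[0]]
    while len(selected) < n_select and remaining:
        # Assumed score: relevance minus mean redundancy with the selected set.
        scores = [relevance[i] - redundancy[i, selected].mean() for i in remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected
```

With relevance values of 0.9, 0.85, and 0.5, where the first two candidates are highly redundant with each other, the second pick is the third candidate rather than the second, because its lower redundancy outweighs its lower relevance.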

Data Experiment Scheme
As shown in Figure 4, the steps of the data experiment are:
Step 1: Collect the measured output data (sampling period: 15 min). Collect the weather prediction data (sampling period: 1 h), and then obtain weather prediction data with the same sampling period as the measured output data by linear interpolation. Preprocess and combine the output data and weather data to establish the experimental sample set.
Step 2: Take a test sample.
Step 3: For the test sample extracted in the previous step, $n$ fixed TSCPs are randomly generated. Based on the different TSCPs, $n$ training sample sets are established by extracting qualified training samples from the experimental sample set.
Step 4: Based on the $n$ training sample sets established in the previous step, the TabNet model is trained to generate $n$ prediction models.
Step 5: Taking the $n$ prediction models generated in the previous step as base predictors, an ensemble prediction model based on mRMR is established.
Step 6: Based on the test sample taken in Step 2 and the ensemble prediction model established in the previous step, the day-ahead and hour-ahead outputs are predicted, respectively.
Step 7: Based on the hour-ahead output predicted in the previous step and the proposed modified model, the final hour-ahead output is obtained.
Repeat Steps 2 to 7 $m$ times, where $m$ is the test sample size, to obtain the day-ahead and hour-ahead output prediction series, respectively. Steps 8 and 9 are prediction error analyses.
The normalized mean absolute error (nMAE), calculated by Equation (13), and the normalized root mean square error (nRMSE), calculated by Equation (14), are used to present the prediction errors in this paper:

$$nMAE = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{p}_i - p_i \right| \times 100\% \tag{13}$$

$$nRMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{p}_i - p_i \right)^2} \times 100\% \tag{14}$$

where $n$ is the test sample size, $\hat{p}_i$ is the normalized predicted regional DPV output, and $p_i$ is the normalized measured output.
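The two error measures can be computed with a few lines of NumPy. This is a minimal sketch that assumes both series are already normalized to the range 0 to 1 and that the errors are reported in percent. The function names `nmae` and `nrmse` are hypothetical.

```python
import numpy as np

def nmae(pred, meas):
    """Normalized mean absolute error in percent (inputs assumed normalized to [0, 1])."""
    pred, meas = np.asarray(pred, dtype=float), np.asarray(meas, dtype=float)
    return float(np.mean(np.abs(pred - meas)) * 100.0)

def nrmse(pred, meas):
    """Normalized root mean square error in percent (inputs assumed normalized to [0, 1])."""
    pred, meas = np.asarray(pred, dtype=float), np.asarray(meas, dtype=float)
    return float(np.sqrt(np.mean((pred - meas) ** 2)) * 100.0)
```

For example, predictions of 0.5 and 0.7 against measurements of 0.4 and 0.9 give an nMAE of 15% and an nRMSE of about 15.8%. The nRMSE is the larger of the two because it penalizes the bigger deviation more heavily.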

Proposed Ensemble Prediction Model
In predicting the regional DPV output based on the model input average strategy, the training sample set affects the prediction performance of the trained TabNet model. The training sample set depends on the TSCP. For the output prediction of a specific period, there is a relatively optimal training sample set, which corresponds to a specific TSCP and the historical samples in that TSCP. As the systems run, the newly generated samples update the historical sample set in the previously optimal TSCP, so the training sample set in this TSCP is no longer optimal. Therefore, the TSCP corresponding to the optimal training sample set is dynamic. However, it is difficult to select the optimal value of this hyper-parameter. The TSCP selected by traditional methods is often not optimal, which affects the prediction accuracy. To solve this problem, we propose an ensemble prediction model based on the mRMR criterion and the TabNet model. The specific steps of the model are as follows:
Step 1: Randomly generate $n$ fixed TSCPs and establish $n$ training sample sets for the day before the predicted day. Then, based on the established $n$ training sample sets and the TabNet model, the regional DPV output series $\{p_1, p_2, \ldots, p_n\}$ are predicted.
Step 2: Calculate each $I(p_i, p_j)$, the MI between pairs of the $n$ regional DPV output prediction series from the previous step, and each $I(p_i, c)$, the MI between each predicted output series and the measured output series $c$.
Step 3: Let $F = \{p_1, p_2, \ldots, p_n\}$, select the $p_i$ with the largest $I(p_i, c)$, let $S = \{p_i\}$, and then update $F$ as follows: $F = F - \{p_i\}$.
Step 4: Select from $F$ the $p_k$ that meets the condition expressed in Equation (12), and then update $S$ and $F$ as follows: $S = S \cup \{p_k\}$, $F = F - \{p_k\}$.
Step 5: Repeat Step 4 a total of $m - 1$ times to select $m$ output prediction series, according to the mRMR criterion, from the $n$ output prediction series in Step 1, and constitute a set $S_m = \{p_1, p_2, \ldots, p_m\}$. The $m$ fixed TSCPs corresponding to the $m$ output prediction series in the set $S_m$ are extracted to constitute a set $T_m = \{t_1, t_2, \ldots, t_m\}$. Then calculate the MI between the predicted output series and the measured output series to constitute a set $I(P_m, c) = \{I(p_1, c), I(p_2, c), \ldots, I(p_m, c)\}$.
Step 7: Predict the regional DPV output series of the predicted day based on the fixed TSCPs in $T_m$ and the TabNet model. Then, construct an output prediction matrix $P = [p_1, p_2, \ldots, p_m]$.
Step 8: Calculate the county-level regional DPV output prediction series $p_\omega$ of the predicted day by Equation (16):

$$p_\omega = P \times w \tag{16}$$

where $w$ is the vector of weights.
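The final combination step is a matrix-vector product of the base predictions with a weight vector. How the weights are derived from the MI values is not spelled out in the steps above, so the sketch below assumes, purely for illustration, weights proportional to each selected series' MI with the measured output, normalized to sum to one. The function name `combine_predictions` is hypothetical.

```python
import numpy as np

def combine_predictions(P, mi_scores):
    """Weighted ensemble of base predictions.

    P         : array of shape (T, m); column k is the k-th base prediction series.
    mi_scores : length-m MI values between each base series and the measured series.
    """
    w = np.asarray(mi_scores, dtype=float)
    w = w / w.sum()                                   # ASSUMPTION: weights proportional to MI
    return np.asarray(P, dtype=float) @ w, w          # p_omega = P x w
```

Because the weights sum to one, the combined series always lies between the smallest and largest base prediction at each time step.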

Proposed Modified Model
The weather prediction information is taken as the input to predict the power output of the regional DPV stations group.Therefore, weather prediction accuracy has a great influence on the prediction accuracy of power output.However, weather prediction errors are similar in adjacent time periods.In this paper, a modified model based on error prediction is established by mining this similarity.Based on the known prediction errors, the unknown prediction errors are predicted, thus the influence of weather prediction errors on the output prediction accuracy is reduced.
To describe the proposed modified model, the concepts of potential test sample (PTS), non-potential test sample (NPTS), and closest similar sample (CSS) are defined. A test sample is defined as a PTS if there are historical samples with the same weather type on the same day; otherwise, it is defined as an NPTS. The power output prediction error in a PTS period can potentially be reduced by the proposed modified model. The closest historical sample on the same day and with the same weather type as a PTS is defined as the CSS of that PTS. As shown in Figure 5, the steps of the proposed modified model are:
Step 1: Take a test sample and determine whether it is a PTS. If so, proceed to the next step; if not, return to take the next test sample.
Step 2: Extract the historical PTSs with the same weather type as the PTS taken in the previous step, together with the CSSs corresponding to these historical PTSs.
Step 3: Calculate the prediction errors in the periods of the PTSs extracted in Step 2 by Equation (17), and transform the errors by Equation (18), where $e$ is the prediction error, $p_p$ is the normalized predicted output, $p_a$ is the normalized measured output, and $e_T$ is the transformed error.
Step 4: Calculate the differences of extraterrestrial solar radiation in the periods of the PTSs extracted in Step 2 by Equation (19), and transform the differences by Equation (20), where $G_d$ is the difference of extraterrestrial solar radiation in one of the periods of the PTSs extracted in Step 2, $G_P$ is the normalized extraterrestrial solar radiation in the period of the PTS, $G_C$ is the normalized extraterrestrial solar radiation in the period of the CSS corresponding to the PTS, and $G_T$ is the transformed difference of extraterrestrial solar radiation.
Step 5: Establish a training sample set, where $e_C$ is the transformed error in the period of the CSS corresponding to the training sample.
Step 6: Train a TabNet model based on the training sample set established in the previous step, and then predict the error $e_p$.
Step 7: Transform the predicted error according to Equation (21) and modify the predicted power output according to Equation (22), where $e_m$ is the transformed predicted error, $p_m$ is the modified regional DPV output, and $p_p$ is the output of the proposed ensemble prediction model.

Experimental Data and Data Preprocessing
The raw data used in the experiment include measured regional DPV output data and weather prediction data from 7 January 2021 to 30 September 2021.The measured output data (sampling period: 15 min) is collected from 27 DPV systems in Xiaoshan District, Hangzhou City, Zhejiang Province, China.Weather prediction data (sampling period: 1 h) was obtained from Xinzhi weather prediction platform.The weather prediction data (sampling period: 15 min) is obtained based on the original weather prediction data and linear interpolation.The experimental sample set is established by combining multisource data sets according to the time attribute of samples.The attributes of the established experimental sample set include: Extraterrestrial solar radiation, weather type, air temperature, air index, wind speed, and measured regional DPV output.
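The resampling of hourly weather predictions to the 15 min grid of the measured output can be sketched with `numpy.interp`. The sketch assumes a numeric attribute such as air temperature, with one value per hour and no gaps. How a categorical attribute such as weather type is carried over to the finer grid is not covered here. The function name `upsample_hourly` is hypothetical.

```python
import numpy as np

def upsample_hourly(values_hourly, steps_per_hour=4):
    """Linearly interpolate an hourly numeric series onto a finer, evenly spaced grid."""
    values_hourly = np.asarray(values_hourly, dtype=float)
    n_hours = len(values_hourly)
    t_hourly = np.arange(n_hours, dtype=float)                    # 0, 1, 2, ... hours
    n_fine = (n_hours - 1) * steps_per_hour + 1                   # include both end points
    t_fine = np.linspace(0.0, n_hours - 1, n_fine)                # 0, 0.25, 0.5, ... hours
    return np.interp(t_fine, t_hourly, values_hourly)
```

Three hourly temperatures of 10, 14, and 12 become nine 15 min values: 10, 11, 12, 13, 14, 13.5, 13, 12.5, 12.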
After establishing the experimental sample set, data preprocessing is carried out. In order to improve the convergence speed and accuracy of DNN models, Max-Min normalization is usually carried out on the input and output features, as shown in Equation (23):

$$x_{nor} = \frac{x - x_{min}}{x_{max} - x_{min}} \tag{23}$$

where $x_{nor}$ is a normalized feature value, $x$ is the original feature value, $x_{max}$ is the maximum feature value, and $x_{min}$ is the minimum feature value.
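A column-wise version of this normalization, together with its inverse for converting predictions back to physical units, might look as follows. The function names are hypothetical, and the guard for constant columns is an added safeguard rather than something described in the text.

```python
import numpy as np

def min_max_normalize(x):
    """Scale each column of x to [0, 1]; returns the scaled data and the per-column min and max."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)   # safeguard: constant columns map to 0
    return (x - x_min) / span, x_min, x_max

def denormalize(x_nor, x_min, x_max):
    """Invert min_max_normalize using the stored per-column min and max."""
    return np.asarray(x_nor, dtype=float) * (x_max - x_min) + x_min
```

In a prediction setting, the minimum and maximum would normally be taken from the training samples and reused for the test samples, so that both are scaled consistently.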

Output Prediction
According to the proposed ensemble prediction model, 16 fixed TSCPs (1, 3, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 180 days) were randomly generated in this paper to establish training sample sets.In particular, the experimental data on 27 September 2021 was selected to verify the proposed models.For output prediction, based on the mRMR criterion, the first 14 optimal output prediction series were selected from the 16 series on the day before the predicted day to calculate weights.The weight calculation results are listed in Table 1.The regional DPV output series on the predicted day were predicted based on the fixed TSCPs and TabNet model.Then based on these predicted output series and calculated weights, the day-ahead output prediction series was calculated according to Equation (16).As shown in Figure 6, the predicted values of the day-ahead outputs on this day are generally lower than the corresponding measured values, which may be caused by the similarity of the weather prediction errors of the day.
Energies 2023, 16

Performance Analysis
In order to further verify the validity of the proposed models, test samples from 1 May 2021 to 30 September 2021 were used. Figure 8 presents the measured regional DPV outputs versus the predicted outputs based on the proposed and reference models. Power values are normalized from 0 to 1 to allow a better comparison. Each blue point pairs a predicted power value with the measured power value at the same time. The red solid line y = x marks the case in which the predicted value equals the measured value, so the dispersion of the blue points around the red line reflects the error between the measured power and the predicted power. The more tightly the blue points cluster around the red line, the smaller the prediction error of the corresponding model.

The proposed ensemble prediction strategy is used to predict the day-ahead output, and the proposed combination prediction strategy is used to predict the hour-ahead output. As shown in Figure 8a,c,e, the prediction errors of the models with the combination prediction strategy are smaller than that of the hour-ahead output persistence prediction model, and the prediction error of the TabNet-based combination prediction model is smaller than that of the SVM-based combination prediction model. Figure 8b,d,f show that the prediction errors of the models with the ensemble prediction strategy are smaller than that of the day-ahead output persistence prediction model, and the prediction error of the TabNet-based ensemble prediction model is smaller than that of the SVM-based ensemble prediction model.
As shown in Figure 8, the distribution of the measured power versus predicted power points gives an intuitive comparison between the prediction models but is not reliable for quantitative evaluation. Therefore, the prediction errors of the different models are listed in Tables 2 and 3 and plotted in Figures 9 and 10. In Figure 9, the numbers 1, 3, ..., 180 represent the fixed TSCPs, E represents the proposed ensemble prediction strategy, and C represents the proposed combination prediction strategy. The same abbreviations are used in the subsequent figures of this section. It can be seen from Figure 9 that, for the different data-driven models, the proposed ensemble prediction strategy can eliminate the influence of the non-optimal sampling period of training samples on the prediction accuracy, and the proposed combination prediction strategy can further improve the hour-ahead prediction accuracy.
Figure 10 presents a comparative analysis of the prediction errors of the proposed models and the reference models. For day-ahead output prediction, the nMAEs of the TabNet-based ensemble prediction model, SVM-based ensemble prediction model, and day-ahead persistence prediction model are 8.40%, 8.85%, and 11.26%, respectively, and the nRMSEs are 11.11%, 11.35%, and 16.62%, respectively. For hour-ahead output prediction, the nMAEs of the TabNet-based combination prediction model, SVM-based combination prediction model, and hour-ahead persistence prediction model are 6.90%, 7.71%, and 9.95%, respectively, and the nRMSEs are 9.49%, 9.97%, and 12.54%, respectively. The experimental results verify that the proposed methods are more accurate than the corresponding reference models.
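The two error indicators and the reported improvements can be checked with a short sketch. The normalization constant is not given in this section, so `p_ref` (for example, a rated capacity) is an assumption, as are the function names. The reported error values are copied from the paragraph above; the differences to the persistence models are absolute differences in percentage points.

```python
import numpy as np

def nmae(measured, predicted, p_ref):
    """Normalized mean absolute error in percent (p_ref is an assumed reference power)."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs(predicted - measured)) / p_ref

def nrmse(measured, predicted, p_ref):
    """Normalized root mean square error in percent."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.sqrt(np.mean((predicted - measured) ** 2)) / p_ref

# Reported overall errors (nMAE %, nRMSE %) from the text above.
reported = {
    "day-ahead": {"proposed": (8.40, 11.11), "persistence": (11.26, 16.62)},
    "hour-ahead": {"proposed": (6.90, 9.49), "persistence": (9.95, 12.54)},
}

# Reduction of each indicator relative to the persistence model, in percentage points.
reductions = {
    horizon: tuple(round(ref - prop, 2)
                   for prop, ref in zip(v["proposed"], v["persistence"]))
    for horizon, v in reported.items()
}
print(reductions)
```

The resulting differences match the 2.86% and 5.51% (day-ahead) and 3.05% and 3.05% (hour-ahead) reductions quoted in the abstract, which indicates that those figures are measured against the persistence models.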
In order to analyze the stabilities of the proposed models, the daily errors are analyzed. Figures 11 and 12 present the daily prediction error distributions of the TabNet-based models and the SVM-based models, respectively. It can be seen that the distributions of daily nMAE and nRMSE using the ensemble prediction strategy are lower than those using the fixed TSCPs, and the distributions using the combination prediction strategy are lower than those using the ensemble prediction strategy.
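A daily error distribution of this kind can be obtained by splitting the 15 min series into days and evaluating the indicator per day. The sketch below assumes 96 samples per day with no gaps and a reference power `p_ref` for normalization; both are assumptions, and the two-day series is synthetic.

```python
import numpy as np

SAMPLES_PER_DAY = 96  # 24 h at a 15-minute sampling period

def daily_nmae(measured, predicted, p_ref):
    """Return one nMAE value (in percent) per day."""
    measured = np.asarray(measured, dtype=float).reshape(-1, SAMPLES_PER_DAY)
    predicted = np.asarray(predicted, dtype=float).reshape(-1, SAMPLES_PER_DAY)
    return 100.0 * np.mean(np.abs(predicted - measured), axis=1) / p_ref

# Synthetic two-day example with a constant error on each day.
measured = np.concatenate([np.full(96, 0.5), np.full(96, 0.4)])
predicted = np.concatenate([np.full(96, 0.6), np.full(96, 0.2)])

errors = daily_nmae(measured, predicted, p_ref=1.0)
print(errors)                              # one value per day
print(np.mean(errors), np.median(errors))  # statistics summarized in a boxplot
```

The mean and median of such per-day values are the statistics that a boxplot comparison between models summarizes.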
In order to analyze the variations in the performance of the proposed models with time, the monthly errors are analyzed. Figures 15 and 16 present the monthly nMAE and nRMSE of the TabNet-based models, respectively. The closer the color of a bar in the figure is to yellow, the greater the error value, and the closer it is to blue, the smaller the error value. It can be seen that, compared with the models with fixed TSCPs, the monthly errors obtained with the ensemble prediction strategy are lower. With the proposed combination prediction strategy, the monthly errors are further reduced.
Figure 19 presents a comparative analysis of the day-ahead prediction models. In September, the nMAEs of the proposed day-ahead prediction model (TabNet-based ensemble prediction model) and the two reference models (SVM-based ensemble prediction model and day-ahead persistence prediction model) are close to each other, and the nRMSE of the proposed day-ahead prediction model is close to that of the SVM-based ensemble prediction model but significantly higher than that of the day-ahead persistence prediction model. In the other months, considering the two error indicators comprehensively, the performance of the proposed day-ahead prediction model is better than that of the reference models.
The experimental results verify that the proposed day-ahead and hour-ahead prediction models are more accurate and stable than the corresponding reference models and show robust performance with monthly variations.

Conclusions
This paper presents a new prediction method for the output of the county-level regional DPV stations group, which aims to improve the centralized operation and maintenance of the stations and to meet the needs of power grid dispatching. The weather prediction information is used to predict the output based on the model input average strategy. Considering the effect of the selected non-optimal TSCP on the prediction accuracy, an ensemble prediction method based on the mRMR criterion and the TabNet model is used to predict the day-ahead output. Firstly, multiple fixed TSCPs are randomly generated, and the output prediction series of the day before the predicted period are predicted based on the fixed TSCPs and the TabNet model. The weight vector of the output prediction series of the previous day is calculated according to the mRMR algorithm. Then, based on the fixed TSCPs and the TabNet model, the output vector of the predicted period is predicted. Finally, the output prediction value of the predicted period is obtained by the weighted average method. The nMAEs and nRMSEs of the prediction results based on the fixed TSCPs are in the interval (8.85%, 16.09%) and the interval (12.06%, 23.74%), respectively. The nMAE and nRMSE of the prediction results based on the proposed ensemble prediction model are 8.4% and 11.11%, respectively. Therefore, the effect of the selected non-optimal TSCP on the prediction accuracy can be eliminated by the proposed ensemble prediction model.
Taking into account the influence of weather prediction errors on the power output prediction, a modified model based on error prediction is proposed. Firstly, the functional relationship between the prediction errors of the same weather type on the same day is learned. Then, based on the functional relationship, the prediction error of the predicted period is predicted. Finally, the output prediction result of the proposed ensemble prediction model is modified by the predicted error. The nMAE and nRMSE of the hour-ahead output prediction results obtained by this combination prediction model are 6.9% and 9.49%, respectively, which are lower than those of the proposed ensemble prediction model. Thus, the influence of weather prediction errors on the power output prediction is reduced by the proposed modified model.
According to the overall error analysis, compared with the reference day-ahead prediction model, the proposed ensemble prediction model reduces the nMAE by 2.86% and the nRMSE by 5.51%, and compared with the reference hour-ahead prediction model, the proposed combination prediction model reduces the nMAE by 3.05% and the nRMSE by 3.05%. Based on the daily error analysis, compared with the reference day-ahead prediction model, the proposed ensemble prediction model reduces the mean value of daily nMAE by 2.9% and that of daily nRMSE by 4.2%, and compared with the reference hour-ahead prediction model, the proposed combination

Figure 3. A feature transformer block example.

Figure 4. The data experiment scheme.

Figure 5. The proposed modified model.

Figure 6. Prediction results based on the proposed ensemble prediction model.

Figure 7. Prediction results based on the proposed combination prediction model.

Figure 8. The predicted outputs versus the measured outputs: (a) using the TabNet-based combination prediction model; (b) using the TabNet-based ensemble prediction model; (c) using the SVM-based combination prediction model; (d) using the SVM-based ensemble prediction model; (e) using the hour-ahead output persistence prediction model; (f) using the day-ahead output persistence prediction model.

Figure 9. Prediction errors based on different models: (a) nMAEs predicted by the TabNet models with the fixed TSCPs, TabNet-based ensemble prediction model, and TabNet-based combination prediction model; (b) nRMSEs predicted by the TabNet models with the fixed TSCPs, TabNet-based ensemble prediction model, and TabNet-based combination prediction model; (c) nMAEs predicted by the SVM models with the fixed TSCPs, SVM-based ensemble prediction model, and SVM-based combination prediction model; (d) nRMSEs predicted by the SVM models with the fixed TSCPs, SVM-based ensemble prediction model, and SVM-based combination prediction model.

Figure 10. Histogram comparison of the prediction errors: (a) of day-ahead output prediction results; (b) of hour-ahead output prediction results.

Figure 13 presents a comparative analysis of the distributions of daily day-ahead prediction errors of the proposed model and the reference models. The mean values of daily nMAE of the TabNet-based ensemble prediction model, SVM-based ensemble prediction model, and day-ahead persistence prediction model are 8.22%, 8.77%, and 11.12%, respectively, and the median values are 7.95%, 8.39%, and 9.90%, respectively. The mean values of daily nRMSE of the three models are 10.19%, 10.65%, and 14.39%, respectively, and the median values are 9.58%, 10.03%, and 12.92%, respectively. Both the boxplots and the statistical indicators verify that the proposed TabNet-based ensemble prediction model is more stable than the reference day-ahead prediction models.

Figure 14 presents a comparative analysis of the distributions of daily hour-ahead prediction errors of the proposed model and the reference models. The mean values of daily nMAE of the TabNet-based combination prediction model, SVM-based combination prediction model, and hour-ahead persistence prediction model are 6.77%, 7.65%, and 9.88%, respectively, and the median values are 6.46%, 7.39%, and 10.77%, respectively. The mean values of daily nRMSE of the three models are 8.83%, 9.49%, and 11.91%, respectively, and the median values are 8.38%, 9.12%, and 12.78%, respectively. Both the boxplots and the statistical indicators verify that the proposed TabNet-based combination prediction model is more stable than the reference hour-ahead prediction models.

Figures 17 and 18, respectively, present the monthly nMAE and nRMSE of the SVM-based models. The experimental results are similar to those of the TabNet-based models.

Figure 15. The monthly nMAE histogram of the TabNet-based models.

Figure 16. The monthly nRMSE histogram of the TabNet-based models.

Figure 17. The monthly nMAE histogram of the SVM-based models.

Figure 18. The monthly nRMSE histogram of the SVM-based models.

Figure 20 presents a comparative analysis of the hour-ahead prediction models. Considering the two error indicators comprehensively, the proposed hour-ahead prediction model (TabNet-based combination prediction model) is better than the reference models.

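(4) Similarly, perform the other decision steps to obtain the split features $[\mathbf{d}[i], \mathbf{a}[i]]$, where $\mathbf{d}[i] \in \mathbb{R}^{B \times N_d}$ and $\mathbf{a}[i] \in \mathbb{R}^{B \times N_a}$.
(5) Construct the overall decision embedding as $\mathbf{d}_{out} = \sum_{i=1}^{N_{steps}} \mathrm{ReLU}(\mathbf{d}[i])$.
(6) Apply a linear mapping $\mathbf{W}_{final}\,\mathbf{d}_{out}$ to get the output mapping.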

Table 1. Calculation results of MI and weight.

Table 2. Prediction errors for each TabNet-based model.

Table 3. Prediction errors for each SVM-based model.