Distributed Photovoltaic Short-Term Power Prediction Based on Personalized Federated Multi-Task Learning

Luo, Wenxiang; Shen, Yang; Li, Zewen; Deng, Fangming

doi:10.3390/en18071796


by Wenxiang Luo, Yang Shen, Zewen Li and Fangming Deng *

School of Electrical and Automation Engineering, East China Jiaotong University, Nanchang 330032, China

* Author to whom correspondence should be addressed.

Energies 2025, 18(7), 1796; https://doi.org/10.3390/en18071796

Submission received: 17 February 2025 / Revised: 23 March 2025 / Accepted: 26 March 2025 / Published: 3 April 2025

(This article belongs to the Special Issue AI Facilitated Cyber–Physical Energy Systems—Planning, Operation, and Markets)


Abstract

In a distributed photovoltaic system, photovoltaic data are heterogeneous, which leads to low adaptability and poor accuracy of photovoltaic power prediction models. This paper proposes a distributed photovoltaic power prediction scheme based on Personalized Federated Multi-Task Learning (PFL). The federated learning (FL) framework is used to enhance the privacy of photovoltaic data and improve the model’s performance in a distributed environment. A multi-task module is added to PFL to solve the problem that a single global FL model cannot improve the prediction accuracy of all photovoltaic power stations. A CBAM-iTCN prediction algorithm is designed. By improving the parallel pooling structure of a temporal convolutional network (TCN), an improved temporal convolutional network (iTCN) prediction model is established, and the channel attention mechanism CBAM is added to highlight key meteorological feature information and improve the feature extraction ability for time series data in photovoltaic power prediction. The experimental analysis shows that, compared with a traditional LSTM, CBAM-iTCN reduces the MAE and RMSE by 45.06% and 42.16%, respectively. Compared with FL, the MAPE of the PFL proposed in this paper is reduced by 9.79%, and for photovoltaic power plants with large data feature deviation, the MAPE is reduced by 18.07%.

Keywords: personalized federated learning; multi-task; deep learning; photovoltaic power prediction

1. Introduction

With the advent of the “dual carbon” targets, renewable energy has become a primary direction for future international energy development [1]. In this context, the installed capacity of distributed photovoltaic (DPV) systems has surged, making accurate photovoltaic power forecasting important for optimizing grid scheduling and for integrating a high proportion of renewable energy sources into the grid. However, DPV systems are widely dispersed, and DPV power data are affected by differences in geographical location, photovoltaic array size, and local meteorological conditions. As a result, the data are unevenly distributed and have complex correlations, exhibiting significant non-independent and identically distributed (non-IID) characteristics. This poses a significant challenge to improving the adaptability of photovoltaic prediction models in distributed environments.

With the advancement of artificial intelligence research, deep neural networks have been widely applied in photovoltaic power generation prediction [2]. In the conventional centralized approach, each photovoltaic power station uploads its data to a server, and the server uses the data to train a prediction model that forecasts future photovoltaic power [3]. Centralized deep learning training is now well developed. However, DPV systems, which are the most widely used, are characterized by a dispersed installation of photovoltaic stations and by the influence of geographical and meteorological conditions, which cause temporal and spatial variations in photovoltaic data and result in heterogeneous data [4,5]. Therefore, it is difficult to establish a single model with strong generalization ability and high prediction accuracy for a DPV system.

Federated learning (FL) is a novel distributed model-building framework. Unlike centralized training, FL adopts a decentralized structure that retains data on local devices and jointly develops a global model through collaboration [6,7]. In the context of DPV systems, FL enables secure information sharing and knowledge distillation among DPV stations and can also enhance model performance in a distributed environment. However, because photovoltaic (PV) data are non-IID, traditional FL converges poorly and tends to bias the model toward PV stations with abundant data, so the global model performs unevenly across PV stations [8]. Consequently, the conventional FL strategy of training a single global forecasting model cannot meet the distinct prediction needs of each individual PV station.

In this paper, a personalized federated learning approach is introduced for the prediction of DPV power. The innovations presented herein are as follows:

(1) This paper introduces a framework for DPV power prediction grounded in PFL. In this framework, local prediction models are trained at individual PV stations, and the model parameters are updated using federated learning algorithms on a cloud server. This enables secure information sharing and knowledge distillation among multiple PV stations.
(2) Within Personalized Federated Multi-Task Learning (PFL), a multi-task training method is proposed. Clients keep the Weight Normalization (WN) layer private and develop distinct personalized models tailored to their own data characteristics. This addresses the limitation of conventional federated learning (FL), wherein a unified global model fails to enhance prediction accuracy across all PV stations.
(3) An improved CBAM-iTCN PV power prediction model is proposed. The parallel pooling structure of the TCN is modified, and the CBAM attention mechanism is added. By assigning different weights to the features extracted by the hidden layers of the TCN, key meteorological feature information is emphasized, enhancing the feature extraction capability for time series data in PV power prediction.

The remainder of this paper is structured as follows. Section 2 reviews the relevant literature. Section 3 presents the overall framework of the proposed method. Section 4 introduces the implementation of the PFL algorithm. Section 5 introduces the CBAM-iTCN prediction model. Section 6 validates the effectiveness of the proposed method. Section 7 summarizes the advantages of the proposed scheme and directions for future development.

2. Related Works

2.1. Photovoltaic Power Prediction Based on Deep Learning

Traditional prediction methods include statistical methods, physical models, and traditional machine learning methods. Statistical methods include time series analysis based on historical power data (such as ARIMA and SARIMA) and regression models such as linear regression and multiple regression. Their advantage is low computational complexity, which makes them suitable for short-term forecasting, but they have difficulty dealing with nonlinear relations and cannot adapt to complex weather. Physical models include simulation prediction and indirect prediction methods based on physical formulas. They have a clear physical meaning and can be customized with equipment parameters to expand the prediction range, but they rely heavily on high-precision data, resulting in large errors in practical applications. Traditional machine learning methods include support vector machines, random forest, gradient boosting, etc. They can adapt to some nonlinear problems through function selection and have a strong anti-noise ability, but they have difficulty dealing with large-scale data and consume considerable storage and computing resources.

Deep learning (DL) can establish a prediction model between meteorological factors and photovoltaic power. Because of its strong nonlinear mapping ability, DL has been widely used in this field. While convolutional neural networks (CNNs) excel at processing the spatial correlation features present in weather image data for photovoltaic power forecasting [9,10], they are not proficient at capturing the long-range dependencies inherent in time series data. Therefore, the CNN is generally combined with Informer, the Gated Recurrent Unit (GRU), and other networks in photovoltaic power prediction [11,12,13]. The Recurrent Neural Network (RNN) can capture temporal dependence. The long short-term memory network (LSTM), BiLSTM, and other networks [14,15,16] perform well in photovoltaic power prediction; however, in long-sequence prediction, the challenge lies in avoiding vanishing gradients and effectively learning long-range dependencies. Furthermore, researchers have suggested integrated prediction methods that combine several neural networks, such as CNN-LSTM, GBDT-BiLSTM, and other integrated models [17,18,19]; nevertheless, their computational expense is significant, leading to prolonged model training. The above studies target centralized photovoltaic power prediction, whereas the DPV system is more widely used.

The power prediction of a DPV system is challenged by geographical diversity, data dispersion, and non-IID data, all of which degrade the performance of the prediction model. Cheng et al. [20,21] proposed photovoltaic power prediction based on satellite image data sources. Through cloud image processing, short-term photovoltaic power generation prediction can be achieved, but processing satellite cloud images requires a lot of computing resources. Asiri et al. [22] introduced a prediction method for distributed photovoltaic power generation at the regional scale by dividing the region into different clusters and selecting a representative site in each cluster to realize photovoltaic power prediction. Some scholars use clustering methods such as SOM [23] and K-means [24] to achieve photovoltaic power prediction. However, clustering only classifies the data samples without considering the learning ability of the model.

The above photovoltaic power prediction methods usually adopt centralized model training and improve prediction accuracy solely by refining the algorithm’s architecture, without taking differences in the data into account. However, in a distributed environment, due to the scattered installation of photovoltaic power stations, the PV data are affected by factors such as the size, location, and meteorological conditions of PV arrays, and the distribution of photovoltaic power data in time and space is heterogeneous. Therefore, it is difficult to establish a model with strong generalization ability and high prediction accuracy in the DPV system.

2.2. Federated Learning

Since federated learning was proposed, its data privacy protection and other advantages have attracted the attention of many scholars. McMahan et al. [25,26] proposed the federated averaging (FedAvg) method, in which model parameters or gradients are uploaded to the cloud server and the cloud server aggregates them by weighted averaging over all clients. At the same time, researchers [27] optimized and improved federated learning from the perspective of client selection, communication, and security. However, the above methods converge poorly on highly heterogeneous data.

To improve the convergence of FL, scholars have tried to personalize the local devices and models. Prevalent methods for personalization include Federated Transfer Learning (FTL), Federated Meta-Learning (FML), and Adaptive Federated Learning (AFL), among others. Zhang et al. [28] used FTL to train devices with limited computational capabilities, thereby safeguarding data privacy and significantly enhancing the efficiency and adaptability of model training. Deng et al. [29] introduced an AFL framework, adding an adaptive algorithm to the original FL algorithm to improve the training efficiency and reduce the communication cost. Chen et al. [30] introduced a dynamic FML approach for forecasting from small data samples. These personalization strategies customize the global model: a global model is constructed first, followed by client-specific customization. However, such methods tend to prolong model training and struggle to improve the accuracy of the global model when dealing with non-IID data. Therefore, the existing federated learning strategies cannot be directly applied to DPV power prediction [31].

In current research on DPV power prediction, the dispersion of the DPV system means that the PV array size and the meteorological environment differ between PV power stations in different locations. This results in non-IID PV data and makes it difficult to train a prediction model with strong adaptability. Therefore, this paper proposes PFL with a multi-task strategy to train the prediction model collaboratively across the region and improve its adaptability.

3. Overall Framework

Consider a DPV system within a specified region. The overall framework is illustrated in Figure 1. The system comprises N decentralized PV stations and a cloud server. The power stations are distributed across different geographical locations, such as roofs, mountains, and open areas. Every power station is equipped with a photovoltaic power prediction model and holds local historical power generation data and meteorological data. In the PFL prediction framework proposed in this paper, each photovoltaic power station first uses its own data to train the prediction model and uploads the trained parameters or gradients to the cloud server. The cloud server then uses the PFL algorithm to aggregate them into a global model, which establishes a collaborative training framework for the power prediction models of all DPV power stations while protecting the privacy of photovoltaic data. A multi-task strategy is added to PFL so that each photovoltaic power station generates its own personalized model. This addresses the inability of a single global FL model to predict accurately for all power stations and improves the prediction accuracy and adaptability of the photovoltaic power model. The PFL prediction framework, illustrated in Figure 2, consists of the following steps:

(1) For each photovoltaic power station, the local historical meteorological data and photovoltaic power data are preprocessed, including filling in missing data, handling abnormal data, and normalization.
(2) In response to task requests from the cloud server, each photovoltaic power station receives the global model parameters and uses its local historical photovoltaic and meteorological data to train the prediction model.
(3) Upon completion of training, each photovoltaic power station transmits its model parameters to the cloud server and retains its personalized update for the current round. The ECS then collects the received model parameters and updates the global model accordingly. If the accuracy threshold is satisfied, the FL training process ends, yielding the final global model. The private patch retained in the last round is then combined with the final global model to form the personalized model, improving the prediction accuracy and model adaptability for each photovoltaic power plant.
(4) If the accuracy of the aggregated global prediction model does not meet the standard, the global model is redistributed to each photovoltaic power station, and steps (2) and (3) are repeated until it meets the standard.

In the PFL prediction framework proposed in this paper, the edge server integrates the global model with the individual patch and trains the personalized model so that it better fits the prediction task of each photovoltaic power plant. In the PFL process, the multi-task collaborative training mode provides a structured solution for DPV power prediction.
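The training loop described in the steps above can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the authors' implementation: each model is a dictionary of scalars, `local_train` is a stand-in that pulls parameters toward a client-specific target, the parameter names `conv.w`, `wn.g`, and `wn.b` are hypothetical, and a fixed number of rounds replaces the accuracy threshold check.

```python
from typing import Dict, List, Tuple

Params = Dict[str, float]  # toy model: one scalar per named parameter

# Hypothetical names for the private WN-layer parameters (the "patch").
PRIVATE_KEYS = {"wn.g", "wn.b"}


def local_train(params: Params, target: float, lr: float = 0.5) -> Params:
    """Stand-in for local training: move every parameter toward a client target."""
    return {k: v - lr * (v - target) for k, v in params.items()}


def aggregate(updates: List[Params], sizes: List[int]) -> Params:
    """Data-size-weighted average of the shared (non-private) parameters only."""
    total = sum(sizes)
    shared = [k for k in updates[0] if k not in PRIVATE_KEYS]
    return {k: sum(n * u[k] for u, n in zip(updates, sizes)) / total for k in shared}


def run_rounds(targets: List[float], sizes: List[int],
               rounds: int = 20) -> Tuple[Params, List[Params]]:
    global_params: Params = {"conv.w": 0.0}
    patches = [{k: 0.0 for k in PRIVATE_KEYS} for _ in targets]
    for _ in range(rounds):  # a fixed round count stands in for the accuracy check
        updates = []
        for i, target in enumerate(targets):
            # Step (2): start from the global model plus this station's private patch.
            local = local_train({**global_params, **patches[i]}, target)
            # Step (3): keep the private patch locally, upload the rest.
            patches[i] = {k: local[k] for k in PRIVATE_KEYS}
            updates.append(local)
        global_params = aggregate(updates, sizes)
    # Final personalized model = global model combined with the last private patch.
    personalized = [{**global_params, **p} for p in patches]
    return global_params, personalized


global_model, personalized_models = run_rounds(targets=[1.0, 3.0], sizes=[1, 1])
```

In this toy setting the shared parameter settles at the average of the two client targets, while each private patch settles at its own client's target, which is the behaviour the personalized layer is meant to provide.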

4. Personalized Federated Learning

4.1. Federated Learning

The introduction of federated learning breaks down data silos. The specific framework is shown in Figure 3. During the training process, the data of the photovoltaic power stations are never exchanged, which effectively ensures the privacy of power data.

Federated learning is, in essence, a variant of distributed learning in which each PV station contributes model parameters to collaboratively improve a global model. Owing to the non-IID characteristics of photovoltaic data, the local prediction models trained by individual PV power stations diverge significantly. As a result, during aggregation the global model is biased toward the characteristics of photovoltaic power stations with large amounts of data, resulting in poor user accuracy (UA) for some photovoltaic power stations. In this paper, PFL with a multi-task strategy is used to avoid the impact of non-IID photovoltaic data on model performance by training a personalized model locally.

4.2. Personalized Federated Learning

The Federated Averaging (FedAvg) algorithm is the foundational algorithm in FL. During the execution of FedAvg, each client performs one or multiple local parameter updates, as shown in Formula (1), where η denotes the learning rate and \nabla F_{i} denotes the local gradient.

ω_{i}^{t} = ω_{i}^{t - 1} - η \nabla F_{i} (ω_{i}^{t - 1})

(1)

In contrast to the gradient averaging algorithm, the FedAvg approach allows for multiple gradient descent iterations based on the computational capacity and communication latency of individual clients. Subsequently, the updated local model ω_{i}^{t} is uploaded to the central server, where the next iteration of the global model is computed using Formula (2).

ω^{t} = \sum_{i = 1}^{N} \frac{n_{i}}{n} ω_{i}^{t}

(2)

Here, N is the number of FL clients, n_i is the size of the data set of the ith client, and n is the total amount of data across all participants. The central server’s objective function is given by Formula (3).

\min_{ω \in R^{d}} \{F (ω) = \sum_{i = 1}^{N} \frac{n_{i}}{n} F_{i} (ω)\}

(3)
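As a concrete illustration of Formulas (1) and (2), the following minimal sketch performs one local gradient step per client and then aggregates the client models with weights n_i / n. The function names, the gradients, and the client sizes are made up for the example, and models are represented as plain lists of floats.

```python
from typing import List, Sequence


def local_update(w: Sequence[float], grad: Sequence[float], eta: float) -> List[float]:
    """Formula (1): one gradient step on a client's local model."""
    return [wi - eta * gi for wi, gi in zip(w, grad)]


def fedavg(client_weights: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Formula (2): average the client models, weighting client i by n_i / n."""
    n = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(n_i * w[j] for w, n_i in zip(client_weights, client_sizes)) / n
        for j in range(dim)
    ]


# Two clients start from the same global model but see different local gradients.
w1 = local_update([1.0, 2.0], grad=[0.5, -1.0], eta=0.1)
w2 = local_update([1.0, 2.0], grad=[-0.5, 1.0], eta=0.1)

# The client holding 300 samples pulls the global model three times as hard
# as the client holding 100 samples.
global_w = fedavg([w1, w2], client_sizes=[300, 100])
```

The example also makes the bias discussed above visible: the aggregated model lies closer to the client with more data.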

In the PFL of this paper, however, during each round of model updating, the client updates the personalized model using local data in addition to training the global model. Consequently, the loss function in PFL comprises both the local and personalized models and is represented by Formula (4).

f_{i} (θ_{i}) + \frac{λ}{2} {‖θ_{i} - ω‖}^{2}

(4)

θ_i represents the personalized model trained by client i, whereas ω represents the global model from the preceding training round. In the first round of training, ω is the initial model issued by the ECS. λ controls how strongly the global model influences the personalized model; its value is set in proportion to the client’s computing power and the extent of data heterogeneity. Based on Formula (3), the PFL objective function can be rewritten as Formula (5).

\min_{ω \in R^{d}} \{F (ω) = \frac{1}{N} \sum_{i = 1}^{N} F_{i} (ω)\}, \quad F_{i} (ω) = \min_{θ_{i} \in R^{d}} \{f_{i} (θ_{i}) + \frac{λ}{2} {‖θ_{i} - ω‖}^{2}\}

(5)
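To make the role of λ in Formula (4) concrete, the sketch below minimizes the personalized objective by plain gradient descent for a toy quadratic local loss. The local loss, step size, and iteration count are assumptions chosen for illustration; the paper does not specify the inner solver.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def personalized_loss(theta: Vector, omega: Vector, lam: float,
                      local_loss: Callable[[Vector], float]) -> float:
    """Formula (4): f_i(theta) + (lam / 2) * ||theta - omega||^2."""
    prox = sum((t - w) ** 2 for t, w in zip(theta, omega))
    return local_loss(theta) + 0.5 * lam * prox


def personalize(omega: Vector, lam: float,
                local_grad: Callable[[Vector], List[float]],
                lr: float = 0.1, steps: int = 200) -> List[float]:
    """Gradient descent on Formula (4), starting from the global model omega."""
    theta = list(omega)
    for _ in range(steps):
        g = local_grad(theta)
        # Gradient of the proximal term is lam * (theta - omega).
        theta = [t - lr * (gi + lam * (t - w)) for t, gi, w in zip(theta, g, omega)]
    return theta


# Toy local loss f_i(theta) = 0.5 * (theta - 4)^2, whose own optimum is 4.0.
def toy_loss(th: Vector) -> float:
    return 0.5 * sum((t - 4.0) ** 2 for t in th)


def toy_grad(th: Vector) -> List[float]:
    return [t - 4.0 for t in th]


theta_free = personalize(omega=[0.0], lam=0.0, local_grad=toy_grad)  # ignores omega
theta_tied = personalize(omega=[0.0], lam=3.0, local_grad=toy_grad)  # pulled to omega
```

For this toy loss the minimizer is (4 + λ·ω) / (1 + λ), so a larger λ keeps the personalized model closer to the global model ω, while λ = 0 recovers purely local training.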

In this study, we employ the WN layer of the convolutional neural network as the patching layer to personalize FL. There are two reasons for keeping the WN layer private: (1) the parameter storage cost of the WN layer is low; (2) the local model parameters obtained from FL training exhibit characteristics akin to those of the data in the WN layer. Consequently, the client can store these parameters within the WN layer and reuse them in the subsequent round of federated training.

Initially, it is necessary to apply whitening to the input data of the WN layer, as illustrated by Formulas (6) and (7).

μ^{t} = \frac{1}{H} \sum_{i = 1}^{H} x_{i}^{t}

(6)

σ^{t} = \sqrt{\frac{1}{H} \sum_{i = 1}^{H} (x_{i}^{t} - μ^{t})^{2}}

(7)

μ^{t} and σ^{t} represent the mean and standard deviation, respectively, of the neuron inputs x_{i}^{t}. Here, H denotes the number of nodes in a hidden layer, and I indicates the total count of MLP layers. The weighted moving averages of μ^{t} and σ^{t} recorded by the WN layer during training can be used directly at inference time to speed up inference. By transforming the input with these two parameters, we obtain data with a mean of 0 and a variance of 1. The normalized input of the WN layer is given by Formula (8).

{\hat{x}}_{i}^{t} = \frac{x_{i}^{t} - μ^{t}}{\sqrt{{(σ^{t})}^{2} + τ}}

(8)

τ is a small positive constant introduced to prevent division by zero when the variance is 0.
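Formulas (6) to (8) can be checked numerically with a short sketch. The function name and the value τ = 1e-5 are assumptions made for the example; the input is the list of H activations of one hidden layer at a single time step.

```python
import math
from typing import List, Sequence, Tuple


def normalize(x: Sequence[float], tau: float = 1e-5) -> Tuple[List[float], float, float]:
    """Return the normalized activations together with mu and sigma."""
    h = len(x)
    mu = sum(x) / h                                        # Formula (6)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / h)   # Formula (7)
    denom = math.sqrt(sigma ** 2 + tau)                    # tau keeps this non-zero
    x_hat = [(v - mu) / denom for v in x]                  # Formula (8)
    return x_hat, mu, sigma


x_hat, mu, sigma = normalize([1.0, 2.0, 3.0, 4.0])

# A constant input has zero variance; tau prevents a division by zero.
flat_hat, _, flat_sigma = normalize([5.0, 5.0, 5.0])
```

After normalization the activations have a mean of approximately 0 and a variance of approximately 1, and the constant input maps to all zeros instead of raising an error.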

In the convolutional neural network employed in this study, the relationship between the input and the output of the WN layer is not a straightforward sequential one. For any given node at time t, its input combines the hidden layer state at time t − 1, denoted as h^{t−1}, and the input data at time t, denoted as c^{t}. Therefore, the input representation for the WN layer is formulated as shown in Formula (9).

a^{t} = W_{h h} h^{t - 1} + W_{c h} c^{t}, \quad W_{h h} \in R^{4 d_{h} \times d_{h}}, \quad W_{c h} \in R^{4 d_{h} \times d}

(9)

W_hh denotes the recurrent hidden-to-hidden weight, while W_ch denotes the bottom-up input-to-hidden weight. The hidden layer state h^{t} and the cell input c^{t} of the WN layer at time step t are given by Formulas (10) and (11), where σ(·) denotes the sigmoid function (distinct from the standard deviation σ^{t}) and ⊙ denotes element-wise multiplication.

c^{t} = σ (f^{t}) ⊙ c^{t - 1} + σ (i^{t}) ⊙ \tanh (g^{t})

(10)

h^{t} = σ (o^{t}) ⊙ \tanh (L N (c^{t}))

(11)

During the execution of a CNN, numerous multiplications are carried out. When data values are less than 1, they progressively converge toward 0 after undergoing multiple multiplications, leading to the vanishing gradient problem. To address this, the incorporation of an activation function becomes imperative to introduce nonlinearity into the network layers, thereby enhancing the learning capability of the CNN. Given that the activation function is denoted as f, the ultimate output of the WN layer is expressed by Formula (12).

WN ({\hat{x}}_{i}) = f (\frac{g}{\sqrt{{(σ^{t})}^{2} + τ}} ⊙ (a^{t} - μ^{t}) + b)

(12)

g and b are reconstruction parameters learned during neural network training. The normalization applied by the WN layer changes the feature distribution seen by subsequent layers. Introducing these reconstruction parameters makes it possible to restore the original feature distribution of the network, so that the model’s expressive capacity is not compromised by normalization.

5. Improved CBAM-iTCN

The proposed CBAM-iTCN prediction algorithm targets power prediction for distributed photovoltaic power stations within a region. It first analyzes the meteorological factors that are strongly related to photovoltaic power and introduces the CBAM attention mechanism to highlight the key meteorological feature information. The improved CBAM-iTCN is then used to predict the photovoltaic power. The prediction process is shown in Figure 4.

5.1. Temporal Convolutional Networks

In this paper, a TCN model based on CNNs is used to predict the photovoltaic output, and its structure is improved to obtain the iTCN: a parallel pooling structure is introduced into the residual module of the TCN, and the ReLU activation function is replaced by GELU. Figure 5 compares the original TCN with the improved TCN. Compared with an LSTM network, the iTCN is more stable and can capture longer sequence dependencies, further improving the accuracy of the model. It also has a simple structure and can process data in parallel, which improves the efficiency of model training [32].

The dilated convolution is computed by Formula (13), where k is the size of the convolution kernel, d is the dilation factor, f(i) is the ith kernel weight, and s − d·i indexes the past input element to which that weight is applied.

F (x_{s}) = (x * f_{d}) (s) = \sum_{i = 0}^{k - 1} f (i) x_{s - d \cdot i}, i \in (0, 1, \dots, k - 1)

(13)

The TCN introduces dilated convolution, as shown in Figure 6. Unlike conventional convolution, dilated convolution samples the input at intervals, which are controlled by d in the figure. Generally, the higher the layer, the larger the value of d. Therefore, the TCN can capture longer dependencies in temporal data with fewer network layers.
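The effect of the dilation factor in Formula (13) is easy to see on a short sequence. The sketch below is a minimal, unoptimized version that assumes zero padding on the left (so the output has the same length as the input) and, for the receptive field, one convolution per level; neither detail is specified in the text.

```python
from typing import List, Sequence


def dilated_causal_conv(x: Sequence[float], f: Sequence[float], d: int) -> List[float]:
    """Formula (13): out[s] = sum_i f[i] * x[s - d*i], treating x[j] as 0 for j < 0."""
    k = len(f)
    out = []
    for s in range(len(x)):
        acc = 0.0
        for i in range(k):
            idx = s - d * i
            if idx >= 0:  # positions before the start of the sequence are zero padding
                acc += f[i] * x[idx]
        out.append(acc)
    return out


def receptive_field(k: int, dilations: Sequence[int]) -> int:
    """Number of input steps seen by one output, assuming one conv per level."""
    return 1 + (k - 1) * sum(dilations)


# With kernel [1, 1] and d = 2, each output adds the current sample
# and the sample two steps earlier.
y = dilated_causal_conv([1.0, 2.0, 3.0, 4.0, 5.0], f=[1.0, 1.0], d=2)

# Three levels with d = 1, 2, 4 and k = 3 already cover 15 time steps.
rf = receptive_field(k=3, dilations=[1, 2, 4])
```

Doubling d at each level makes the receptive field grow much faster than stacking undilated layers, which is why fewer layers suffice for long dependencies.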

An increase in the number of network parameters can lead to vanishing or exploding gradients. To ensure the stability of the TCN, a residual module is introduced. The residual connection is given by Formula (14): the input x of the module is added to the output F(x) of its convolutional branch, and the output o of the TCN is obtained after activation.

o = \mathrm{ReLU} (x + F (x))

(14)
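The residual connection of Formula (14) and the two activation functions mentioned for the TCN and the iTCN can be written down directly. This sketch uses the exact GELU definition x·Φ(x); whether the iTCN uses the exact form or an approximation, and where exactly GELU replaces ReLU inside the block, are not stated in the text, so the `activation` argument is only an illustrative assumption.

```python
import math
from typing import Callable, List, Sequence


def relu(v: float) -> float:
    return max(0.0, v)


def gelu(v: float) -> float:
    """Exact GELU: v * Phi(v), with Phi the standard normal CDF."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def residual_output(x: Sequence[float], fx: Sequence[float],
                    activation: Callable[[float], float] = relu) -> List[float]:
    """Formula (14): o = activation(x + F(x)), applied element-wise."""
    return [activation(a + b) for a, b in zip(x, fx)]


o_relu = residual_output([1.0, -2.0], [0.5, 0.5])                   # ReLU, as in Formula (14)
o_gelu = residual_output([1.0, -2.0], [0.5, 0.5], activation=gelu)  # GELU variant
```

Unlike ReLU, GELU lets small negative values through with a reduced magnitude instead of setting them to exactly zero.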

The iTCN is used to establish the photovoltaic prediction model. The iTCN can capture dependencies in photovoltaic power data over longer time spans, but the TCN gives every input feature the same weight. In practice, however, factors such as humidity have little impact on photovoltaic power. Therefore, an attention mechanism is introduced to enhance the sensitivity of the TCN to important features and improve prediction performance.

5.2. Attention Mechanism

The Convolutional Block Attention Module (CBAM) is used here as a channel attention mechanism. It uses one-dimensional convolution to dynamically assign weights to different channels, which effectively captures local cross-channel interactions and enhances the sensitivity of the TCN to important features. Its structure is shown in Figure 7.

The global average pooling (GAP) layer of CBAM is computed by Formula (15), where U is the weighted sequence and T is the number of time steps of the sequence. The convolution kernel size k is adjusted adaptively according to Formula (16), where c is the number of channels, γ = 2, and b = 1.

GAP (U) = \frac{1}{T} \sum_{i = 1}^{T} U_{i, j}

(15)

k = |\frac{\log_{2} (c)}{γ} + \frac{b}{γ}|

(16)

By adding the CBAM attention mechanism to the TCN and focusing on strongly correlated data features, the model can extract and use the key information in time series data more effectively. This optimizes the feature representation and improves prediction accuracy without significantly increasing the computational cost, owing to the efficiency of the CBAM.
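Formulas (15) and (16) can be illustrated as follows. The sequence U is assumed to be stored as T rows of C channel values. Formula (16) does not state how the result is turned into an integer kernel size, so truncating and then moving to the next odd value is an assumption of this sketch.

```python
import math
from typing import List, Sequence


def global_average_pool(u: Sequence[Sequence[float]]) -> List[float]:
    """Formula (15): average each channel j over the T time steps."""
    t = len(u)
    channels = len(u[0])
    return [sum(row[j] for row in u) / t for j in range(channels)]


def adaptive_kernel_size(c: int, gamma: float = 2.0, b: float = 1.0) -> int:
    """Formula (16), followed by an assumed rounding to an odd integer."""
    raw = abs(math.log2(c) / gamma + b / gamma)
    k = int(raw)
    return k if k % 2 == 1 else k + 1


pooled = global_average_pool([[1.0, 10.0], [3.0, 30.0]])  # T = 2, C = 2
k_64 = adaptive_kernel_size(64)    # log2(64) = 6  -> 3.5 -> 3
k_256 = adaptive_kernel_size(256)  # log2(256) = 8 -> 4.5 -> 4 -> 5
```

The kernel size grows only logarithmically with the number of channels, which keeps the cost of the one-dimensional convolution across channels small.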

6. Experimental Results and Analysis

To assess the efficacy of the PFL and DPV power prediction techniques presented in this article, experimental validation is conducted. Specifically, this paper examines and validates the PFL algorithm alongside the CBAM-iTCN prediction model. The experimental setup involves four computers within the same local area network (LAN), which are utilized to emulate DPV power stations (clients). Each of these power stations is equipped with an identical TCN prediction model for consistency.

See Table 1 for the experimental configuration.

6.1. Dataset

The data used in this paper are from photovoltaic power generation companies in North China. The selected data were collected from 1 January 2018 to 31 December 2019 at an interval of 5 min. The data set includes the meteorological features and photovoltaic power data of the photovoltaic power stations. To achieve reliable prediction, the data set is preprocessed as follows. First, missing values in the original data are repaired using a layered, differentiated strategy. Second, to reduce the noise caused by sensor errors, sudden weather changes, and shadow occlusion, multi-modal noise processing is applied to the data. Finally, the sampling interval of the processed data set is unified to one hour. Each photovoltaic power station is given a different training data set to better simulate the data deviation found in practice.

6.2. Evaluation Indicators

Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Root Mean Square Error (RMSE) are selected to evaluate the performance of the prediction model. They are defined in Formulas (17), (18), and (19), respectively, where y_{i} is the actual value and {\tilde{y}}_{i} is the predicted value in the ith hour, and n is the number of samples.

MAE = \frac{1}{n} \sum_{i = 1}^{n} |y_{i} - {\tilde{y}}_{i}|

(17)

MAPE = \frac{1}{n} \sum_{i = 1}^{n} |\frac{y_{i} - {\tilde{y}}_{i}}{y_{i}}|

(18)

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\tilde{y}}_{i})}^{2}}

(19)
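The three indicators of Formulas (17) to (19) are straightforward to compute. In the sketch below, `y_true` holds the actual values and `y_pred` the predictions. Because photovoltaic power is zero at night, the MAPE function skips samples whose actual value is zero; this handling is an assumption of the example and is not described in the text.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Formula (17)."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Formula (18), as a fraction; samples with a zero actual value are skipped."""
    terms = [abs((a - p) / a) for a, p in zip(y_true, y_pred) if a != 0]
    return sum(terms) / len(terms)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Formula (19)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true))


actual = [2.0, 4.0]
predicted = [1.0, 5.0]
scores = (mae(actual, predicted), mape(actual, predicted), rmse(actual, predicted))
```

Multiplying the MAPE value by 100 gives the percentage form in which such errors are usually reported.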

6.3. Model Training and Experimental Results

Table 2 presents the correlation between photovoltaic power and various meteorological characteristics. According to the Grey Relational Analysis (GRA), the correlation coefficient for wind speed is less than 0.1, indicating a weak relationship between surface wind speed and photovoltaic power. The primary factors influencing active power are solar radiation and temperature.

By comparing the LSTM, TCN, CBAM-TCN, and CBAM-iTCN algorithms on the test data set, the effectiveness of the CBAM-iTCN prediction algorithm is verified. The experiment was conducted on typical sunny, cloudy, and rainy days.

Meteorological factors vary greatly across seasons, and even within the same season they can differ considerably from day to day. To prevent seasonal bias and verify the adaptability of the CBAM-iTCN model, the experiment considers two seasons, spring and summer, and within each season selects days with different weather conditions, namely, sunny (18 April and 2 July 2018), cloudy (25 April and 8 July 2018), and rainy (19 April and 13 July 2018). The experimental results are shown in Table 3.

The experimental results show that the average MAE of the CBAM-iTCN model is reduced by 45.06%, 36.37%, and 29.97%, respectively, under different weather conditions, and the average RMSE is reduced by 42.16%, 21.9%, and 13.75%, respectively. To illustrate the performance advantages of the CBAM-iTCN model under different weather conditions more intuitively, the predicted and true values of each model are shown in Figure 8. In addition, the MAE and RMSE values of the PFL-CBAM-iTCN and FL-CBAM-iTCN models are lower than those of the FL-Transformer model in both spring and summer.

To demonstrate the feasibility of keeping the WN layer private in multi-task federated learning, we examine how the activations of CNN neurons on the local client test set change during FL aggregation. When the WN layer is designated as the personalized private component during model training, it suppresses neuron activation more strongly than when the WN layer is not used as a patch. This adjustment brings the neuron distribution closer to its pre-aggregation state. As illustrated in Figure 9, unlike conventional FL, inserting a WN layer between CNN layers regularly constrains abrupt changes in the mean and variance during propagation and ultimately brings the network output closer to the pre-aggregation output.

To verify the superiority of the PFL optimization strategy used in this paper, we set a target average UA of 0.85; the number of communication rounds that each optimization strategy requires to reach this target is shown in Table 4. The table shows that the combination of PFL and the FedAvg-Adam distributed optimization strategy adopted in this paper requires the fewest rounds (12 for Epoch = 1 and 9 for Epoch = 5).
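The rounds-to-target measurement can be sketched as below, assuming a per-round list of average UA values is available; the helper names are hypothetical. The dictionary reproduces the counts in Table 4 so that the smallest entry can be looked up programmatically.

```python
def rounds_to_target(ua_per_round, target=0.85):
    """Return the 1-based round at which UA first reaches `target`, else None."""
    for round_idx, ua in enumerate(ua_per_round, start=1):
        if ua >= target:
            return round_idx
    return None


# Communication rounds copied from Table 4: (Epoch = 1, Epoch = 5).
ROUNDS = {
    "FL": {"FedAvg": (256, 171), "FedAdam": (124, 73), "FedAvg-Adam": (22, 18)},
    "PFL": {"FedAvg": (53, 48), "FedAdam": (31, 24), "FedAvg-Adam": (12, 9)},
}
EPOCHS = (1, 5)


def fewest_rounds(table):
    """Return (framework, strategy, epochs, rounds) for the smallest entry."""
    entries = [
        (framework, strategy, epochs, rounds)
        for framework, strategies in table.items()
        for strategy, pair in strategies.items()
        for epochs, rounds in zip(EPOCHS, pair)
    ]
    return min(entries, key=lambda entry: entry[3])


if __name__ == "__main__":
    print(rounds_to_target([0.50, 0.80, 0.86, 0.90]))
    print(fewest_rounds(ROUNDS))
```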

The aforementioned analysis reveals that by privatizing the WN layer in FL and developing a distinct personalized model for each client, a significant enhancement in UA can be achieved. The efficacy of personalized federated learning (PFL) frameworks becomes particularly pronounced when addressing expanding client populations and heterogeneous data distributions across decentralized nodes. Compared with conventional federated learning (FL) paradigms, PFL implementations incorporating Adam-optimized federated averaging demonstrate enhanced convergence properties through adaptive gradient-based coordination mechanisms. This hybrid approach synergizes individualized model personalization with collaborative parameter aggregation, enabling accelerated convergence to predefined accuracy thresholds while minimizing inter-node communication overhead.

Next, this paper examines the performance of PFL on a real-world photovoltaic data set. All designated clients take part in the FL process, the total number of communication rounds is set to 75, and the training data sets are chosen so that data features deviate strongly between clients, to better simulate the non-IID data of actual DPV power stations.

To highlight the advantages of the PFL algorithm, we compare the training results of pFedMe, FL (FedAvg), and PFL. The FedAvg approach has no client-specific private layer. The training results are shown in Figure 10.

Figure 10 shows that when Epoch = 1, the method proposed in this paper achieves a higher UA than the other methods. However, because the training loss during the personalization phase covers both the local model and the personalized model, more training rounds are required for the model to converge in accuracy. Therefore, the performance of PFL in terms of training loss is not as good as that of FedAvg.

When Epoch = 5, the PFL algorithm is still better than other algorithms, and increasing the number of epochs leads to a reduction in the training loss of the personalized model, enabling it to reach convergence more rapidly while preserving a high level of UA.

In order to show the advantages of PFL over the FL method in detail, a comparative analysis of the training outcomes for the personalized model and the global model is presented in Table 5.

Despite achieving model convergence via multi-round collaborative training, substantial parameter divergence persists across participating nodes in federated learning frameworks, primarily because of the statistical heterogeneity inherent in client-specific data distributions. Nevertheless, the private WN layer in PFL preserves the unique characteristics of each client, so that a personalized model can be tailored to the specific attributes of the local data. This, in turn, helps to keep the prediction error within an acceptable range for all parties involved. Compared with the global model, the personalized model reduces the average MAPE by 9.79 percentage points (from 15.82% to 6.03%, Table 5) and the average RMSE by 7.06 kW. For clients with significant deviations in data features, the MAPE reduction can reach up to 18.07%.
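The averages quoted above can be checked directly against Table 5. The short sketch below copies the per-client values from the table and computes the mean MAPE and RMSE of the global and personalized models; the dictionary layout and names are illustrative assumptions.

```python
# Per-client values copied from Table 5 (MAPE in %, RMSE in kW).
TABLE_5 = {
    "Client 1": {"mape_global": 7.82, "mape_personal": 5.79,
                 "rmse_global": 8.81, "rmse_personal": 2.55},
    "Client 2": {"mape_global": 19.13, "mape_personal": 6.08,
                 "rmse_global": 9.75, "rmse_personal": 2.67},
    "Client 3": {"mape_global": 11.87, "mape_personal": 5.84,
                 "rmse_global": 9.58, "rmse_personal": 2.62},
    "Client 4": {"mape_global": 24.47, "mape_personal": 6.41,
                 "rmse_global": 10.61, "rmse_personal": 2.68},
}


def column_mean(table, column):
    """Arithmetic mean of one column over all clients."""
    return sum(row[column] for row in table.values()) / len(table)


# Differences between the global-model and personalized-model averages.
mape_drop = column_mean(TABLE_5, "mape_global") - column_mean(TABLE_5, "mape_personal")
rmse_drop = column_mean(TABLE_5, "rmse_global") - column_mean(TABLE_5, "rmse_personal")

if __name__ == "__main__":
    print(f"MAPE drop: {mape_drop:.2f} percentage points")
    print(f"RMSE drop: {rmse_drop:.2f} kW")
```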

7. Conclusions

This paper proposes a distributed photovoltaic power prediction method based on personalized federated learning. A PFL collaborative training scheme is adopted to address the poor generalization ability and low accuracy of prediction models caused by the highly non-IID nature of photovoltaic data in a distributed environment. PFL with a multi-task strategy improves the adaptability of the model by training a separate personalized model for each client. At the same time, by improving the parallel pooling structure of the TCN and introducing the attention mechanism CBAMnet, a photovoltaic power prediction model is established that captures multi-scale time series features more efficiently and accurately. Lastly, the practicality and efficacy of the proposed methodology are substantiated through experimental analysis conducted on real DPV power data. Compared with the traditional prediction method and the single TCN prediction method, the MAE and RMSE of the proposed CBAM-iTCN are 45.06% and 42.16% lower, respectively, than those of LSTM. In the real DPV prediction scenarios, the PFL-iTCN algorithm achieves an overall reduction of 9.79% in the MAPE of the predicted values, and for photovoltaic power plants with large data feature deviation, the MAPE is reduced by 18.07%. Experiments show that the proposed scheme can improve model accuracy in a DPV environment.

However, this paper does not consider the model's performance when the data are highly heterogeneous or when the meteorological data contain significant noise, and it ignores prediction time and cost while improving prediction accuracy. In addition, how to improve the forecasting performance for newly built photovoltaic power plants with few samples is one of the key research directions for the future. This research can be further expanded in the following aspects:

(1) Improve the personalized federated learning algorithm to reduce computational cost;
(2) Enhance the generalization ability of federated learning in highly heterogeneous environments or with noisy data;
(3) Introduce continual learning strategies and update model parameters regularly.

Author Contributions

Conceptualization, F.D.; data curation, W.L.; formal analysis, Y.S.; investigation, Z.L.; methodology, Y.S.; project administration, F.D.; resources, W.L. and Y.S.; software, W.L.; supervision, F.D.; validation, W.L. and Z.L.; writing—original draft, Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

Natural Science Foundation of China (52167008, 52377103), Natural Science Foundation of Jiangxi Province (20232BAB204064).

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

PFL	Personalized Federated Multi-Task Learning
iTCN	Improved Temporal Convolutional Network
TCN	Temporal Convolutional Network
DPV	Distributed Photovoltaic
WN	Weight Normalization
FL	Federated Learning
CNN	Convolutional Neural Network
LSTM	Long Short-Term Memory Network
FedAvg	Federated Averaging
FML	Federated Meta-Learning
AFL	Adaptive Federated Learning
MAE	Mean Absolute Error
MAPE	Mean Absolute Percentage Error
GRA	Grey Relational Analysis

Figure 1. Overall Framework.

Figure 2. Flowchart of prediction system.

Figure 3. Personalized Federated Learning.

Figure 4. CBAM-iTCN forecast flowchart.

Figure 5. iTCN model structure.

Figure 6. TCN dilated convolution.

Figure 7. CBAMnet structure diagram.

Figure 8. Prediction results under different weather conditions.

Figure 9. Distribution of neuron activation before and after aggregation.

Figure 10. Comparison of training results between PFL and other algorithms.

Table 1. Experimental platform.

Category	Configuration
Operating system	Windows 10
CPU (Central server)	Intel Core i9-9900K Processor (Beijing, China)
CPU (Local server)	Intel Core i5-10200H Processor (Beijing, China)
GPU (Central server)	NVIDIA GeForce RTX 3080 Ti Graphics Card (Beijing, China)
GPU (Local server)	NVIDIA GeForce GTX 1660 Graphics Card (Beijing, China)
RAM	32 GB

Table 2. Correlation analysis.

Meteorological Factors	GRA Correlation
Active power	1.000
Solar radiation	0.918
Temperature	0.56
Wind speed	0.056
Relative humidity	−0.408
Rainfall	−0.056

Table 3. Model training results under different weather conditions.

Learning Mode	Model	Metric	Spring (Sunny)	Spring (Cloudy)	Spring (Rain)	Summer (Sunny)	Summer (Cloudy)	Summer (Rain)
Centralized Learning	TCN	MAE	0.7764	1.0652	1.1115	0.6352	0.8956	1.0325
Centralized Learning	TCN	RMSE	0.8756	1.5483	1.6168	0.7356	1.2497	1.4629
Centralized Learning	LSTM	MAE	0.3895	0.9546	1.2057	0.3036	0.7936	0.9832
Centralized Learning	LSTM	RMSE	0.4928	1.1719	1.4274	0.3982	1.0425	1.2952
Centralized Learning	CBAM-TCN	MAE	0.3255	0.8454	1.1456	0.2891	0.7052	0.9826
Centralized Learning	CBAM-TCN	RMSE	0.4565	0.9964	1.2564	0.4092	0.8925	1.1748
Centralized Learning	Transformer	MAE	0.2951	0.7562	0.9098	0.2581	0.6982	0.8791
Centralized Learning	Transformer	RMSE	0.3858	0.9076	1.2048	0.3287	0.8672	1.1349
Centralized Learning	CBAM-iTCN	MAE	0.2641	0.6423	0.7158	0.2273	0.6013	0.6891
Centralized Learning	CBAM-iTCN	RMSE	0.3562	0.8547	1.1259	0.3159	0.7903	0.9864
Federated Learning	FL-Transformer	MAE	0.2731	0.6891	0.8314	0.2243	0.6142	0.8032
Federated Learning	FL-Transformer	RMSE	0.3418	0.8158	1.0314	0.3014	0.7631	0.9868
Federated Learning	FL-CBAM-iTCN	MAE	0.2519	0.5981	0.6482	0.2107	0.5572	0.6194
Federated Learning	FL-CBAM-iTCN	RMSE	0.3215	0.7139	0.9381	0.2981	0.6739	0.8971
Federated Learning	PFL-CBAM-iTCN	MAE	0.2117	0.5341	0.5982	0.1982	0.4832	0.5531
Federated Learning	PFL-CBAM-iTCN	RMSE	0.2971	0.6538	0.7891	0.2541	0.5936	0.7013

Table 4. Communication rounds required to reach the target UA under different optimization strategies.

Framework	FedAvg (Epoch = 1)	FedAvg (Epoch = 5)	FedAdam (Epoch = 1)	FedAdam (Epoch = 5)	FedAvg-Adam (Epoch = 1)	FedAvg-Adam (Epoch = 5)
FL	256	171	124	73	22	18
PFL	53	48	31	24	12	9

Table 5. Performance comparison between the global model and the personalized model.

Client	MAPE/% (Global Model)	MAPE/% (Personalized Model)	RMSE/kW (Global Model)	RMSE/kW (Personalized Model)
Client 1	7.82	5.79	8.81	2.55
Client 2	19.13	6.08	9.75	2.67
Client 3	11.87	5.84	9.58	2.62
Client 4	24.47	6.41	10.61	2.68
Average	15.82	6.03	9.69	2.63

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Luo, W.; Shen, Y.; Li, Z.; Deng, F. Distributed Photovoltaic Short-Term Power Prediction Based on Personalized Federated Multi-Task Learning. Energies 2025, 18, 1796. https://doi.org/10.3390/en18071796

