Article

Data-Driven Probabilistic Wind Power Forecasting and Dispatch with Alternating Direction Method of Multipliers over Complex Networks

School of Internet of Things Engineering, Wuxi University, Wuxi 214105, China
*
Author to whom correspondence should be addressed.
Mathematics 2026, 14(1), 112; https://doi.org/10.3390/math14010112 (registering DOI)
Submission received: 9 November 2025 / Revised: 15 December 2025 / Accepted: 26 December 2025 / Published: 28 December 2025
(This article belongs to the Special Issue Advanced Machine Learning Research in Complex System)

Abstract

This paper proposes a privacy-preserving framework that couples probabilistic wind power forecasting with decentralized anomaly detection in complex power networks. We first design an adaptive federated learning (FL) scheme to produce probabilistic forecasts for multiple geographically distributed wind farms while keeping their raw data local. In this scheme, an artificial neural network with quantile regression is trained collaboratively across sites to provide calibrated prediction intervals for wind power outputs. These forecasts are then embedded into an alternating direction method of multipliers (ADMM)-based load-side dispatch and anomaly detection model for decentralized power systems with plug-and-play industrial users. Each monitoring node uses local measurements and neighbor communication to solve a distributed economic dispatch problem, detect abnormal load behaviors, and maintain network consistency without a central coordinator. Experiments on the GEFCom 2014 wind power dataset show that the proposed FL-based probabilistic forecasting method outperforms persistence, local training, and standard FL in RMSE and MAE across multiple horizons. Simulations on IEEE 14-bus and 30-bus systems further verify fast convergence, accurate anomaly localization, and robust operation, indicating the effectiveness of the integrated forecasting–dispatch framework for smart industrial grids with high wind penetration.

1. Introduction

Wind energy, as a vital renewable source, plays an increasingly significant role in decentralized power systems. However, its inherent intermittency and volatility pose considerable challenges to grid operation, including power quality degradation, reduced system reliability, and increased spinning reserve requirements. Therefore, accurate Wind Power Forecasting (WPF) is critical. On the one hand, wind energy can provide clean and high-quality energy to load centers; on the other hand, a high penetration of wind power increases the regional grid’s dependence on external high-power sources [1].
The decentralized power system is a complex network structure that connects distributed production units and provides electrical power services to various industrial terminals. Unlike traditional large power grids [2], the distributed production and consumption units in industrial supply chains have plug-and-play characteristics, but this also leads to more complex dynamic properties, making their anomaly detection more complicated. With the proliferation of smart power electronic devices, industrial terminals in the supply chain are becoming more diverse and sophisticated, and any power consumption anomalies can have significant impacts. Reference [3] proposed a genetic algorithm, reference [4] achieved a hybrid optimization solution by combining evolutionary algorithms and sequential quadratic programming, and references [5,6] utilized particle swarm optimization to design algorithms for finding the optimal anomaly detection solution. Traditional WPF primarily focuses on providing a single-point estimate of the wind power output for a specific future timestamp, known as Deterministic WPF. This approach typically relies on advanced time series models or neural networks (such as Recurrent Neural Networks (RNNs) [7], Long Short-Term Memory (LSTM) [8], or Convolutional Neural Networks (CNNs)) [9] to capture spatiotemporal dependencies and enhance prediction accuracy. However, due to wind power’s strong dependence on meteorological factors (e.g., wind speed, direction, temperature) and the chaotic nature of weather systems, deterministic forecasting fundamentally fails to capture the inherent stochastic uncertainty in wind power generation. A single-value prediction severely limits its utility in risk-sensitive power system decisions, such as grid security analysis and reserve capacity allocation. To overcome the limitations of deterministic forecasting, Probabilistic WPF has emerged [10]. 
Instead of seeking a single optimum value, it provides a prediction of the output along with an associated estimation of uncertainty [11]. This uncertainty information is typically presented in the form of a set of quantiles, Prediction Intervals (PIs), or a complete Probability Density Function (PDF). Probabilistic forecasts, which explicitly include risk information, are indispensable for generation scheduling optimization, electricity market trading risk management, and grid planning. For instance, grid operators utilize PIs to set more reasonable reserve capacities, thereby reducing operating costs and ensuring supply reliability.
Furthermore, with the increasing scale of distributed wind farms and the rising concern for data privacy, Federated Learning (FL) has emerged as a novel paradigm to enhance WPF [12]. Unlike traditional centralized training, where data from all wind farms must be aggregated to a central server, FL enables multiple local wind farms to collaboratively train a shared prediction model [13,14,15]. Each local farm trains the model using its private data and only shares the model updates (gradients) with the central server. This approach effectively enhances forecasting accuracy by leveraging the diverse data distribution across the fleet, while rigorously ensuring that sensitive local operational data remains on the wind farm’s premises, addressing critical privacy and communication bandwidth concerns inherent in large-scale decentralized systems [16]. Traditional anomaly detection optimization algorithms still have many drawbacks. The algorithms in references [3,4,5,6,17,18] all require a control center for centralized anomaly identification, which is very unsuitable for the capacity of today’s large industrial supply chains, especially since some areas with distributed industrial users are remote and difficult to monitor centrally. Moreover, the industrial supply chain can be viewed as a complex network system, where production and consumption units can be seen as nodes in this complex network [19]. Therefore, using a distributed consensus optimization strategy that relies solely on local information transmission is a good approach to solving the anomaly detection problem. In references [20,21,22,23], controllers based on multi-agent consensus design are discussed, where neighboring agents can exchange their information to achieve optimal anomaly identification consensus.
Multi-agent anomaly detection problems can be divided into continuous-time and discrete-time algorithms, both of which have been extensively studied [24,25,26,27]. In fact, consistency has always been one of the foundations of multi-agent systems, and with the application of intelligent monitoring nodes, these algorithms have also been gradually applied to industrial power anomaly detection [28,29,30]. Although these studies have promoted the development of anomaly detection in industrial supply chains, some drawbacks still exist. Some anomaly detection problems with capacity constraints cannot be well addressed, as shown in the literature [31,32], which can only solve problems with power consumption balance equality constraints. On this basis, the literature [33] theoretically solved for the first time the relationship between state consistency and optimal anomaly detection solutions, establishing a fundamental normative framework for industrial power anomaly monitoring.
Beyond smart grids, there has been growing interest in the resilience and security of other large-scale networked infrastructures, such as complex traffic systems and industrial Internet of Things (IIoT) platforms. For example, resilience recovery methods based on trend forecasting have been proposed for complex traffic network security [34], and interpretable deep learning frameworks have been developed for intrusion detection in industrial IoT environments [35]. These studies share with our work the common goal of ensuring secure and reliable operation of cyber–physical systems under uncertainty and potential attacks. From a methodological perspective, our adaptive FL framework, which combines Adaptive Clustering (AC) and Elastic Weight Consolidation (EWC), is also conceptually related to recent advances in adaptive sparse memory networks for efficient and robust sequence and video modeling [36]. Incorporating similar sparse and dynamically allocated memory mechanisms into federated probabilistic forecasting models is an interesting direction for future improvements in both computational efficiency and robustness.
This paper conducts an in-depth analysis of anomaly detection and monitoring issues that may arise in upstream and downstream industrial power systems. When abnormal power consumption patterns occur, utilizing the characteristics of intelligent monitoring nodes and based on anomaly detection theory, power anomaly identification is directly performed from the load side, allowing the monitoring system to quickly and effectively identify abnormal states without interference. This short-term distributed anomaly detection plan has minimal impact on industrial users, and monitoring equipment can operate at low power consumption. A rigorous mathematical proof is also provided, ultimately demonstrating that the anomaly detection parameters can converge to an ideal value, minimizing the economic losses caused by undetected anomalies to the industrial power system. From a modelling perspective, our probabilistic forecasting framework is fundamentally different from recent information-/physics-embedded approaches such as the physics-embedding multi-response regressor for time-variant system reliability assessment in [37]. The model in [37] is designed for complex engineering systems whose behaviour can be described by detailed physics-based models and multiple time-variant performance functions. By transforming time-variant responses into time-invariant extreme-value representations and embedding physics and mathematical knowledge into a multi-response surrogate, it achieves high accuracy and sample efficiency for reliability assessment under rich physics information. 
While the physics-embedded regressor in [37] is highly suitable when detailed physical models and multi-response reliability indicators are available, our federated probabilistic model is better aligned with the requirements of short-term wind power forecasting and distributed grid operation considered in this paper, where scalability across many wind farms, communication efficiency, and compliance with data-governance constraints are of primary concern.

2. Preliminaries and Network Model

2.1. LSTM Networks

In this work, we adopt a standard LSTM-based time-series forecasting pipeline for each wind farm. The raw wind power measurements are first aligned in time, cleaned, and normalized. Then, for each wind farm, we construct input–output pairs by sliding a fixed-length window over the historical time series: a sequence of past measurements is used as input, and the subsequent values over a given prediction horizon are used as targets. The LSTM network processes the input sequence and outputs the forecasted wind power (or quantiles thereof), and the model parameters are trained by minimizing an appropriate loss function on the training set. This LSTM-based forecaster serves as the local prediction model within the federated learning framework described in Section 3.
The univariate time series is reframed into input–output sequence pairs $S_t$ suitable for supervised learning. This is achieved via a fixed-length sliding window mechanism, which defines the look-back window $T_{\text{in}}$ (input sequence length) and the forecast horizon $T_{\text{out}}$ (prediction sequence length):
$$S_t = (X_t, Y_t)$$
where the input feature sequence $X_t \in \mathbb{R}^{T_{\text{in}} \times d}$ comprises $d$-dimensional features over the preceding $T_{\text{in}}$ time steps, and the target sequence $Y_t \in \mathbb{R}^{T_{\text{out}} \times k}$ contains the $k$-dimensional prediction targets for the subsequent $T_{\text{out}}$ steps.
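The sliding-window construction can be illustrated with a minimal Python sketch for the univariate case ($d = k = 1$). The function name `make_windows` and the toy series are illustrative assumptions, not part of the original pipeline.

```python
def make_windows(series, t_in, t_out):
    """Build (X_t, Y_t) pairs by sliding a fixed-length window over a series.

    Hypothetical helper: X_t holds t_in past values, Y_t the next t_out values.
    """
    if t_in <= 0 or t_out <= 0:
        raise ValueError("t_in and t_out must be positive")
    pairs = []
    last_start = len(series) - t_in - t_out
    for t in range(last_start + 1):
        x = series[t : t + t_in]                   # look-back window
        y = series[t + t_in : t + t_in + t_out]    # forecast horizon
        pairs.append((x, y))
    return pairs


# Toy normalized wind power series (made-up values for illustration only).
series = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
pairs = make_windows(series, t_in=3, t_out=2)
for x, y in pairs:
    print(x, "->", y)
```

With six samples, $T_{\text{in}} = 3$ and $T_{\text{out}} = 2$, the sketch yields two training pairs; a series shorter than $T_{\text{in}} + T_{\text{out}}$ yields none.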
Model optimization is realized by minimizing a predefined Loss Function, such as the Mean Squared Error (MSE) for deterministic forecasting or Pinball Loss for probabilistic forecasting, utilizing backpropagation through time (BPTT). Upon obtaining the model’s projected output, an inverse normalization transformation is executed to restore the predicted values to their original physical units. Following this transformation, the model’s performance is rigorously assessed using standard metrics like the Root Mean Square Error (RMSE) or Mean Absolute Error (MAE).
The fundamental innovation of the LSTM unit lies in its dedicated Cell State ($C_t$), which functions as a linear pathway capable of carrying information across extended time steps without degradation. The flow of information along this state is meticulously controlled by three distinct gate structures. Each gate operates as a soft switch, implemented via the Sigmoid function ($\sigma$) and a learned weight matrix, producing a signal in the range of $[0, 1]$.
At each time step $t$, the computation within the LSTM unit is a composite function of the previous hidden state $h_{t-1}$, the previous cell state $C_{t-1}$, and the current input $x_t$. Forget Gate (Regulator of Past Information): The forget gate computes its activation based on the concatenated previous hidden state and current input.
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$
Input Gate and Candidate Memory (Regulator of New Information): The input gate $i_t$ scales the relevance of the new candidate memory $\tilde{C}_t$, which proposes the new information to be stored in the cell state.
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$
Cell State Update (The Linear Highway): The cell state is updated by selectively forgetting old information (scaled by $f_t$) and adding new filtered information (scaled by $i_t$). The additive nature of this update is the mechanism that ensures an unobstructed path for the gradient, thereby mitigating the vanishing gradient issue.
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \quad \text{(Cell State Update)}$$
Output Gate and Hidden State (Unit Output): The output gate $o_t$ controls the flow of information from the cell state $C_t$ to the hidden state $h_t$, which serves as the unit’s output and input for the subsequent time step.
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
$$h_t = o_t \odot \tanh(C_t) \quad \text{(Hidden State)}$$
where $W_*$ and $b_*$ are the learned weight matrices and bias vectors for each gate and the candidate memory, respectively. $\sigma(\cdot)$ denotes the Sigmoid activation function, $\sigma(z) = 1/(1 + e^{-z})$, and $\tanh(\cdot)$ represents the Hyperbolic Tangent activation function, which maps values to $[-1, 1]$. $\odot$ is the Hadamard product (element-wise multiplication), which enables the gating control, and $[h_{t-1}, x_t]$ denotes vector concatenation.
This precise algebraic construction empowers the LSTM to effectively manage information flow over long sequences, granting it superior stability and predictive accuracy compared to conventional RNN architectures in time series modeling.
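To make the gate equations concrete, the following pure-Python sketch evaluates a single LSTM step on plain lists. The function names, the `params` dictionary layout, and the all-zero toy weights are assumptions chosen for illustration; a practical model would rely on a deep learning library instead.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Compute W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step following the forget/input/candidate/output equations."""
    v = h_prev + x_t  # list concatenation plays the role of [h_{t-1}, x_t]
    f = [sigmoid(z) for z in affine(params["W_f"], params["b_f"], v)]
    i = [sigmoid(z) for z in affine(params["W_i"], params["b_i"], v)]
    c_tilde = [math.tanh(z) for z in affine(params["W_C"], params["b_C"], v)]
    o = [sigmoid(z) for z in affine(params["W_o"], params["b_o"], v)]
    # Additive cell-state update with element-wise (Hadamard) products.
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, c_tilde)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


# Toy check with hidden size 1, input size 1 and all-zero weights (assumed values):
# every gate equals sigmoid(0) = 0.5 and the candidate memory equals tanh(0) = 0.
params = {name: [[0.0, 0.0]] for name in ("W_f", "W_i", "W_C", "W_o")}
params.update({name: [0.0] for name in ("b_f", "b_i", "b_C", "b_o")})
h_new, c_new = lstm_step([0.3], [0.1], [2.0], params)
print(h_new, c_new)
```

In this degenerate setting the cell state is simply halved by the forget gate, which makes the additive update easy to verify by hand.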

2.2. Optimization Theory of ADMM

The standard form of the ADMM algorithm is used to solve the following problem:
$$\min_{x_1, x_2} \; f(x_1) + g(x_2) \quad \text{s.t.} \quad A_1 x_1 + A_2 x_2 = d \tag{8}$$
In the expression, $x_1 \in \mathbb{R}^n$ and $x_2 \in \mathbb{R}^m$ are the variables, $A_1 \in \mathbb{R}^{p \times n}$ and $A_2 \in \mathbb{R}^{p \times m}$ are constant matrices, and $d \in \mathbb{R}^p$ is a constant vector. The matrices are assumed to have full rank to ensure that the problem has a feasible solution. For problem (8), using the augmented Lagrange multiplier method, we can obtain:
$$L_\rho(x_1, x_2, \lambda) = f(x_1) + g(x_2) + \lambda^T (A_1 x_1 + A_2 x_2 - d) + (\rho/2) \left\| A_1 x_1 + A_2 x_2 - d \right\|_2^2$$
In the formula, $\lambda$ is the Lagrange multiplier corresponding to the equality constraint $A_1 x_1 + A_2 x_2 = d$, while the penalty factor $\rho$ is an artificially set positive scalar.
For fixed $x_2^k$ and $\lambda^k$, the update of $x_1$ is obtained by minimizing the augmented Lagrangian with respect to $x_1$:
$$x_1^{k+1} = \arg\min_{x_1} L_\rho(x_1, x_2^k, \lambda^k) = \arg\min_{x_1} \; f(x_1) + (\lambda^k)^T (A_1 x_1 + A_2 x_2^k - d) + \frac{\rho}{2} \left\| A_1 x_1 + A_2 x_2^k - d \right\|_2^2 .$$
Dropping the terms that do not depend on $x_1$, the subproblem reduces to a convex optimization problem in $x_1$ only. The update $x_1^{k+1}$ is thus given by the first line of the ADMM iteration. Similarly, for fixed $x_1^{k+1}$ and $\lambda^k$, the update of $x_2$ is
$$x_2^{k+1} = \arg\min_{x_2} L_\rho(x_1^{k+1}, x_2, \lambda^k),$$
which yields the second line of the ADMM iteration. Finally, the dual variable is updated by a gradient ascent step on the dual function,
$$\lambda^{k+1} = \lambda^k + \rho \left( A_1 x_1^{k+1} + A_2 x_2^{k+1} - d \right),$$
which enforces the equality constraint $A_1 x_1 + A_2 x_2 = d$ over iterations.
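As a sanity check of the three update steps, consider a scalar toy instance that is not taken from the paper: $f(x_1) = \tfrac{1}{2}(x_1 - a)^2$, $g(x_2) = \tfrac{1}{2}(x_2 - b)^2$ and $A_1 = A_2 = 1$. Setting the derivative of the augmented Lagrangian to zero gives closed-form updates, which the sketch below iterates. The function name `admm_toy` and the chosen numbers are assumptions.

```python
def admm_toy(a, b, d, rho=1.0, iters=100):
    """Two-block ADMM for min 0.5*(x1-a)^2 + 0.5*(x2-b)^2  s.t.  x1 + x2 = d.

    Illustrative scalar instance; each update is the closed-form minimizer
    of the augmented Lagrangian in one block with the other block fixed.
    """
    x1, x2, lam = 0.0, 0.0, 0.0
    for _ in range(iters):
        # x1-update: (x1 - a) + lam + rho*(x1 + x2 - d) = 0
        x1 = (a - lam - rho * (x2 - d)) / (1.0 + rho)
        # x2-update uses the freshly computed x1
        x2 = (b - lam - rho * (x1 - d)) / (1.0 + rho)
        # dual ascent step on the constraint residual
        lam = lam + rho * (x1 + x2 - d)
    return x1, x2, lam


x1, x2, lam = admm_toy(a=1.0, b=3.0, d=2.0)
print(x1, x2, lam)
```

For these values the optimality conditions $x_1 - a = x_2 - b = -\lambda$ and $x_1 + x_2 = d$ give $x_1 = 0$, $x_2 = 2$, $\lambda = 1$, and the iterates approach this point.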

2.3. Wind Power Dispatch Model

Consider a wind power system model with a total of $n$ loads ($n = 1, 2, 3, \ldots$), all of which are controllable. When a sudden event occurs, the power of the generation unit suddenly decreases by $\Delta P_g$, and the overall frequency drops at the same time. The rated frequency of the grid system is defined as $\omega^*$. It is assumed that each load monitors its frequency variation and adjusts its load demand during each time interval $t$. For convenience, let $t = 0, 1, 2, \ldots$ represent the time intervals $t = \Delta t, 2\Delta t, \ldots$. When a sudden event occurs, the frequency of the entire system drifts away from $\omega^*$, and the global frequency deviation of the smart grid, $\Delta\omega$, is defined in an ideal state. Next, we define the local frequency deviation $\Delta\bar{\omega}_i$ measured by the sensor of the agent at the $i$-th load. It differs from the global frequency deviation $\Delta\omega$ because of the measurement noise $v_i$. Due to the differences in load types, the degree of noise interference can also vary. $\Delta P_g$ represents the power reduction at the generation end caused by an incident. To maintain the supply-demand balance, load $i$ needs to reduce its own power consumption by $\Delta P_{l_i}$ to compensate for the loss at the generation end; the specific scheduling strategies are analyzed in detail in the subsequent algorithms. The supply-demand imbalance error at time $t$ can be expressed as:
$$u(t) = \Delta P_g - \sum_{i=1}^{n} \Delta P_{l_i}$$
To maintain generality, and provided that the formulation conforms to the actual wind power system model and that the algorithm has a feasible solution, we assume that the following conditions are met:
$$\Delta P_g > 0, \qquad \Delta P_{l_i} > 0$$
To simplify the notation, we use $p_i$ to represent $\Delta P_{l_i}$ in the remainder of the article. We first need to forecast the power $\Delta P_g$ generated by wind.

3. Algorithm Design

3.1. Probabilistic Wind Power with FL

To generate probabilistic wind power forecasts in a privacy-preserving manner, we adopt an adaptive federated learning framework that couples a local quantile regression model with two mechanisms designed for Non-IID data and knowledge retention. At each communication round, every wind farm trains a local artificial neural network (ANN) with quantile regression loss on its own data and sends model updates to a server or cluster center without sharing raw measurements. The server aggregates these updates within clusters of clients that exhibit similar learning behaviour, as determined by an Adaptive Clustering (AC) strategy, and maintains a separate global model for each cluster. After global aggregation, each client performs a personalized fine-tuning step regularized by Elastic Weight Consolidation (EWC), which discourages large deviations of parameters that are important for the cluster-level model. This combination of FL, AC, and EWC yields cluster-specific global models and personalized local models that jointly provide calibrated probabilistic forecasts while respecting data locality.
Probabilistic WPF methods are broadly categorized into two main approaches, i.e., parametric and non-parametric. Parametric methods assume that the wind power error follows a predefined probability distribution and then predict the parameters of that distribution. Non-parametric methods learn the uncertainty directly from data without assuming a specific distribution form, such as Kernel Density Estimation (KDE) and Quantile Regression (QR). Within deep learning models, Quantile Regression is widely adopted due to its directness, flexibility, and minimal assumptions regarding the data distribution.
In Quantile Regression, the model is trained by minimizing the Pinball Loss (also known as Quantile Loss or Check Loss). This minimization drives the predicted quantile $\hat{y}_{q_m,t}$ toward the value below which the actual value $y_t$ falls with probability $q_m$ and above which it falls with probability $1 - q_m$. The definition of the Pinball Loss $l_{q_m}$ clearly demonstrates the asymmetric penalty mechanism for overestimation and underestimation:
$$l_{q_m}(\hat{y}_{q_m,t}, y_t) = \begin{cases} (1 - q_m)\,(\hat{y}_{q_m,t} - y_t), & \text{if } \hat{y}_{q_m,t} \ge y_t \\ q_m\,(y_t - \hat{y}_{q_m,t}), & \text{if } \hat{y}_{q_m,t} < y_t \end{cases}$$
Here, $q_m$ is the target quantile level, typically taken from $\{0.1, 0.2, \ldots, 0.9\}$ to cover the various confidence levels required for the prediction interval. When the prediction $\hat{y}_{q_m,t}$ is less than the actual value $y_t$, the loss is weighted by $q_m$; otherwise, it is weighted by $1 - q_m$. This asymmetric penalty ensures that the model learns a value $\hat{y}_{q_m,t}$ that is indeed the $q_m$-th quantile of the data distribution. The overall training objective is to minimize the average loss across all quantiles, defined as the local loss $L_{\text{local}}$. For a dataset with $m$ quantiles and $T_i$ samples, the total average loss $L_{\text{local}}$ is:
$$L_{\text{local}}(\hat{y}_i, y_i) = \frac{1}{m T_i} \sum_{t=0}^{T_i - 1} \sum_{d=0}^{m-1} l_{q_d}(\hat{y}_{q_d,t}, y_t)$$
This comprehensive loss function guides the neural network training to output predictions that simultaneously satisfy the requirements of multiple quantiles.
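The asymmetric penalty and the averaged local loss can be sketched in a few lines of Python. The function names and the toy numbers are illustrative assumptions; in training, the same expression would be evaluated on the network outputs inside an automatic-differentiation framework.

```python
def pinball_loss(y_hat, y, q):
    """Pinball (quantile) loss for one prediction at quantile level q."""
    if y_hat >= y:
        return (1.0 - q) * (y_hat - y)   # overestimation, weighted by 1 - q
    return q * (y - y_hat)               # underestimation, weighted by q


def local_loss(preds, targets, quantiles):
    """Average pinball loss over all samples and quantile levels.

    preds[t][d] is the predicted quantile at level quantiles[d] for sample t.
    """
    total = 0.0
    for row, y in zip(preds, targets):
        for y_hat, q in zip(row, quantiles):
            total += pinball_loss(y_hat, y, q)
    return total / (len(quantiles) * len(targets))


# For q = 0.9, underestimating by 0.2 costs nine times more than overestimating by 0.2.
under = pinball_loss(0.8, 1.0, 0.9)
over = pinball_loss(1.2, 1.0, 0.9)
avg = local_loss([[0.8, 1.2]], [1.0], [0.9, 0.9])
print(under, over, avg)
```

The comparison of `under` and `over` shows why a high quantile level pushes the prediction upward: errors below the observation are penalized more heavily.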
To address the Non-IID nature of local datasets across wind farms, we introduce an adaptive clustering procedure that groups clients with similar preference scores and jointly trains a cluster-specific global model. Within each cluster, model parameters are aggregated in a federated manner to capture shared patterns among the corresponding wind farms. Furthermore, elastic weight consolidation (EWC) is employed during local personalization to preserve important global knowledge: parameters that are estimated to be critical for the global model are penalized if they deviate too much from their global values, which mitigates catastrophic forgetting while still allowing client-specific adaptation.
The purpose of the AC strategy is to automatically identify and separate clients with similar model learning characteristics and data distributions without exposing raw data. These clients are assigned to distinct clusters $C_j$, and a customized global model $M_j^g$ is trained for each cluster. The detailed steps are as follows.
Warm-up Training: The server initializes $k$ models $M_j^g$ and distributes them to all clients. Each client trains these $k$ models locally for a small number of epochs (e.g., $P_w$ rounds) using the local loss $L_{\text{local}}$. This stage is designed to help clients move past initialization randomness and establish stable, distinguishable preferences for the $k$ models.
Preference Score Update: During the clustering phase, clients no longer update all $k$ models. Instead, they use the performance $e_{i,j}^{(p)}$ (Pinball Loss) of the $k$ models on their local validation set to update the preference score $s_{i,j}^{(p)}$. The model yielding the best performance, i.e., the one attaining the lowest loss $\hat{e}_i^{(p)}$ among the $k$ models, is deemed the client’s favorite. The decay rate $\beta \in [0, 1]$ controls the influence of historical preferences.
$$s_{i,j}^{(p)} = \begin{cases} \beta\, s_{i,j}^{(p-1)} + 1, & \text{if } e_{i,j}^{(p)} = \hat{e}_i^{(p)} \\ s_{i,j}^{(p-1)}, & \text{if } e_{i,j}^{(p)} > \hat{e}_i^{(p)} \end{cases}$$
Intra-Cluster Aggregation: The server receives the parameters $\tilde{M}_i^l$ of the client’s favorite model. Based on the model index $\tilde{k}_i$, the client is assigned to the corresponding cluster $C_j$. Aggregation is performed only within the cluster to prevent gradient conflicts arising from severe data heterogeneity between different clusters. The clustering process terminates when the client preferences $\tilde{k} = \{\tilde{k}_i\}$ remain unchanged for $E_A$ consecutive epochs, ensuring stability of the clustering result.
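The preference score update and the resulting cluster assignment can be sketched as follows. The helper names, the tie handling (every model that attains the minimum loss is updated, and the lowest index wins when scores are equal), and the toy validation losses are assumptions made for illustration.

```python
def update_scores(scores, losses, beta):
    """Apply one preference-score update for a single client.

    scores[j] is the previous score of model j, losses[j] its validation
    pinball loss. Only the best-performing model receives beta * s + 1.
    """
    best = min(losses)
    return [beta * s + 1.0 if e == best else s for s, e in zip(scores, losses)]


def favorite(scores):
    """Index of the model with the highest preference score."""
    return max(range(len(scores)), key=lambda j: scores[j])


# One client, k = 3 candidate models, three clustering rounds (made-up losses).
scores = [0.0, 0.0, 0.0]
for losses in ([0.3, 0.1, 0.2], [0.3, 0.1, 0.2], [0.05, 0.1, 0.2]):
    scores = update_scores(scores, losses, beta=0.9)
fav = favorite(scores)
print(scores, fav)
```

In this toy run, model 1 wins the first two rounds and model 0 wins the third, yet the accumulated score keeps the client in the cluster of model 1, which illustrates how the score history damps round-to-round fluctuations.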
Elastic Weight Consolidation (EWC) first identifies which global parameters $w_h^g$ are critical to the overall global model performance. It uses a diagonal approximation of the Fisher Information Matrix to calculate the importance $I_{i,h}$ for each model parameter $w_{i,h}^l$. This value is based on the average of the squared first-order derivatives (gradients) of the local loss function $L_{\text{local}}$ with respect to the parameter on the local dataset $D_i$:
$$I_{i,h} = \frac{1}{|D_i|} \sum_{x \in D_i} \left( \frac{\partial L_{\text{local}}}{\partial w_{i,h}^l} \right)^2$$
A higher $I_{i,h}$ value indicates that the parameter is more critical for the model’s core feature extraction capabilities, and thus should be more strictly protected during fine-tuning. EWC Loss Function: During personalized fine-tuning, the EWC Loss function $L_{\text{EWC}}$ is used. It supplements the original local loss $L_{\text{local}}$ with a regularization term. This term penalizes the parameter $w_{i,h}^l$ for deviating too far from the initial global model parameter $w_h^g$, scaled by its importance $I_{i,h}$:
$$L_{\text{EWC}} = L_{\text{local}} + \gamma \sum_{h=0}^{|W^l| - 1} I_{i,h} \left( w_{i,h}^l - w_h^g \right)^2$$
The term $\gamma$ is the consolidation weight, which balances local adaptation against global knowledge preservation. By minimizing $L_{\text{EWC}}$, the model adapts to local data while retaining critical global knowledge, leading to robust personalization.
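A small numerical sketch may help to see how the importance weights and the penalty interact. It assumes that per-sample gradients of the local loss are already available as plain lists; the helper names and all numbers are hypothetical.

```python
def fisher_diagonal(per_sample_grads):
    """Diagonal Fisher estimate: mean of squared per-sample gradients.

    per_sample_grads[s][h] is the gradient of the local loss with respect
    to parameter h, evaluated on sample s of the local dataset.
    """
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[h] ** 2 for g in per_sample_grads) / n for h in range(dim)]


def ewc_loss(local_loss, w_local, w_global, importance, gamma):
    """Local loss plus the importance-weighted quadratic EWC penalty."""
    penalty = sum(
        imp * (wl - wg) ** 2
        for imp, wl, wg in zip(importance, w_local, w_global)
    )
    return local_loss + gamma * penalty


# Two samples, two parameters (made-up gradients).
importance = fisher_diagonal([[1.0, 0.0], [3.0, 2.0]])
# Only the first parameter has moved away from its global value.
loss = ewc_loss(0.5, w_local=[1.0, 2.0], w_global=[0.0, 2.0],
                importance=importance, gamma=0.1)
print(importance, loss)
```

Because the first parameter has the larger importance (5.0 versus 2.0), moving it by one unit adds $\gamma \cdot 5 = 0.5$ to the loss, whereas the same move on the second parameter would add only 0.2.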
When larger local learning rates are used or the number of fine-tuning epochs is increased, the performance of the FL + AC + EWC model remains stable, and the power system obtains the predicted wind power $\Delta P_g$.
Furthermore, to objectively verify the rationality of the AC-based grouping of wind farms, we evaluate the resulting clusters using two standard internal cluster validity indices, namely the silhouette coefficient and the Davies–Bouldin index. The silhouette coefficient ranges from −1 to 1 and measures how similar a client is to its own cluster compared with other clusters, with larger values indicating more compact and better-separated clusters. The Davies–Bouldin index, in contrast, quantifies the average similarity between each cluster and its most similar counterpart, where smaller values are preferred. Using the GEFCom 2014 wind farm data, we compute these indices for the clusters produced by the AC procedure, and the results are summarized in Table 1. The positive silhouette coefficients and Davies–Bouldin indices well below 1.0 indicate that the wind farms assigned to the same cluster indeed share similar data distributions and learning characteristics, which confirms the rationality of the learned grouping.
Let $\hat{P}_w^{(\alpha)}(t)$ denote the predicted $\alpha$-quantile of the aggregated wind power at time $t$, while the uncertainty of the forecast is summarized by a one-sided prediction interval
$$R(t) = \hat{P}_w^{(0.5)}(t) - \hat{P}_w^{(\underline{\alpha})}(t), \qquad \underline{\alpha} \in (0, 0.5),$$
which can be interpreted as a wind-related reserve requirement to hedge against underproduction relative to the median. The net power deviation that must be compensated by controllable loads is then modelled as
$$\Delta P_g(t) = P_w^{\text{nom}}(t) - \bar{P}_w(t) + R(t),$$
where $P_w^{\text{nom}}(t)$ is the nominal (day-ahead) schedule. In this way, both the level (via $\bar{P}_w(t)$) and the uncertainty (via $R(t)$) of the probabilistic forecast directly determine the right-hand side of the load-side balance constraint in the ADMM-based dispatch problem.
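A short worked example shows how the two formulas combine. The numbers are invented for illustration, and the sketch assumes that the level $\bar{P}_w(t)$ is taken to be the median forecast, which the text does not state explicitly.

```python
def reserve(q_median, q_low):
    """One-sided reserve R(t): median forecast minus the lower quantile."""
    return q_median - q_low


def net_deviation(p_nom, p_bar, r):
    """Net deviation Delta P_g(t) = P_nom(t) - P_bar(t) + R(t)."""
    return p_nom - p_bar + r


# Hypothetical values in MW: day-ahead schedule 50, median forecast 40,
# lower-quantile forecast 30.
r = reserve(q_median=40.0, q_low=30.0)
delta_pg = net_deviation(p_nom=50.0, p_bar=40.0, r=r)  # assumes P_bar = median
print(r, delta_pg)
```

Here the expected shortfall of 10 MW relative to the schedule is increased by a further 10 MW of reserve, so the controllable loads would need to cover 20 MW; a narrower prediction interval would shrink the second term.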
The probabilistic forecasts produced by the adaptive FL model serve as inputs to the subsequent ADMM-based load-side dispatch and anomaly detection module. Specifically, for each forecasting horizon step, the FL framework outputs a set of quantile predictions for the aggregated wind power, which can be summarized by a representative value (e.g., the median) and an associated prediction interval. In the dispatch formulation, the representative forecast is used as the scheduled wind generation level, while the width of the prediction interval determines a reserve margin that is allocated on the load side to hedge against forecast errors. In this way, the quality and calibration of the probabilistic forecasts directly influence the amount of corrective action required in real time: more accurate and better calibrated forecasts lead to smaller imbalances and fewer adjustments, whereas biased or overly dispersed forecasts result in increased reserve deployment and more frequent anomaly signals in the distributed optimization layer.

3.2. ADMM-Based Load-Side Distributed Management Algorithm

In this article, each participant in the power system is simulated as an agent for operation. It is assumed that at time $t$, there is a power drop at the generation side of the power system, and this event can be detected quickly. Correspondingly, a single load $i$ on the load side will also decrease by a power value of $p_i(t)$, and this value is constrained by the nature of the load itself, that is
$$0 < p_i(t) < p_i^{\max}$$
In the formula, $p_i^{\max}$ represents the maximum power variation of load $i$. The decrease in power of each load can lead to economic issues, making it a point worth considering for both users and grid companies on how to minimize the overall loss of such sudden events.
The purpose of the load scheduling problem in this article is to plan the power consumption of each load agent so that the entire power grid system can operate normally in the most efficient state, minimizing the overall loss during this failure. Therefore, the scheduling management problem of the smart grid with wind power can be expressed in the following form:
$$\min_{p} \; f(p) = \sum_{i=1}^{n} f_i(p_i) \quad \text{subject to} \quad 1_n^T p = \Delta P_g, \quad 0 \le p \le p^{\max} \tag{23}$$
In the formula, $f_i(p_i)$ represents the negative benefit of load $i$, and the parameter variable $p = [p_1, p_2, \ldots, p_n]^T \in \mathbb{R}^n$. $p^{\max} = [p_1^{\max}, p_2^{\max}, \ldots, p_n^{\max}]^T \in \mathbb{R}^n$.
In the unconstrained case, the load-side dispatch problem only enforces the global power-balance equality $\sum_{i=1}^{n} p_i = \Delta P_g$, and the ADMM iterates are effectively free to assign any adjustment $p_i$ to each bus as long as the total change matches the wind power deviation $\Delta P_g$. This corresponds to an idealized scenario in which all controllable loads can, in principle, increase or decrease their consumption without individual limits. In practice, however, each industrial load has physical and operational bounds, i.e., the power adjustment $p_i$ cannot exceed the available flexibility at bus $i$, and it must remain within a safe operating range. We capture these limits by imposing box constraints
$$p_i^{\min} \le p_i \le p_i^{\max}, \qquad i = 1, \ldots, n,$$
where $p_i^{\min}$ represents the minimum admissible adjustment (e.g., to avoid shutting down critical equipment or violating comfort constraints) and $p_i^{\max}$ represents the maximum admissible adjustment (e.g., due to device ratings, process requirements, or contractual limits). Mathematically, these bounds appear as inequality constraints in the ADMM subproblems and are implemented via projections onto the interval $[p_i^{\min}, p_i^{\max}]$ in the $p$-update. When the constraints are inactive, the ADMM updates coincide with those of the unconstrained case; when some loads hit their bounds, subsequent iterations must redistribute the remaining imbalance to the still-flexible loads, which affects the shape and speed of convergence.
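In order to use the ADMM algorithm in Section 2, we need to make some changes to the formulation of the problem. Define two convex sets $\Pi_1$ and $\Pi_2$:
$$\Pi_1 = \left\{ p \in \mathbb{R}^{n} \mid \mathbf{1}_n^{T} p = \Delta P_g \right\}, \qquad \Pi_2 = \left\{ p \in \mathbb{R}^{n} \mid 0 \le p \le p^{\max} \right\}$$
Next, we introduce two functions $g_1$ and $g_2$ to represent the indicator functions of the sets $\Pi_1$ and $\Pi_2$, respectively:
$$g_1(p) = \begin{cases} 0, & \text{if } p \in \Pi_1 \\ +\infty, & \text{otherwise} \end{cases} \qquad g_2(q) = \begin{cases} 0, & \text{if } q \in \Pi_2 \\ +\infty, & \text{otherwise} \end{cases}$$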
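Minimizing an indicator function plus a squared distance amounts to a Euclidean projection onto the corresponding set. The following is a minimal sketch of the two projections, assuming plain Python lists; the function names `project_balance` and `project_box` and the numerical values are illustrative and do not come from the paper.

```python
def project_balance(p, delta_p_g):
    """Euclidean projection onto Pi_1 = {p : sum(p) = delta_p_g}."""
    shift = (delta_p_g - sum(p)) / len(p)
    return [pi + shift for pi in p]


def project_box(q, p_max):
    """Euclidean projection onto Pi_2 = {q : 0 <= q <= p_max}."""
    return [min(max(qi, 0.0), hi) for qi, hi in zip(q, p_max)]


# Hypothetical values, chosen only for illustration.
p = [4.0, -1.0, 9.0]
balanced = project_balance(p, delta_p_g=15.0)  # shifts every entry by the same amount
boxed = project_box(p, p_max=[5.0, 5.0, 5.0])  # clips each entry independently
```

The balance projection spreads the mismatch equally over all loads, whereas the box projection acts on each load separately, which is why the two sets are handled by different variables in the splitting.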
Therefore, the economic dispatch problem (23) can be directly transformed into the following form:
$$\min \; f(p) + g_1(p) + g_2(q) \quad \text{subject to} \quad p - q = 0$$
Then, the augmented Lagrangian for problem (26) can be written as:
$$L_\rho(p, q, \lambda) = f(p) + g_1(p) + g_2(q) + \lambda^{T} (p - q) + (\rho/2) \, \| p - q \|_2^2$$
By applying the ADMM algorithm, a centralized solution to problem (26) can be obtained:
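$$\begin{aligned} p^{k+1} &= \arg\min_{p} \; L_\rho\left(p, q^{k}, \lambda^{k}\right) \\ q^{k+1} &= \arg\min_{q} \; L_\rho\left(p^{k+1}, q, \lambda^{k}\right) \\ \lambda^{k+1} &= \lambda^{k} + \rho \left(p^{k+1} - q^{k+1}\right) \end{aligned}$$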
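To see what each update computes, the linear and quadratic terms of the augmented Lagrangian can be combined by completing the square. This is the standard scaled form of ADMM; the projection operator $\mathrm{Proj}_{\Pi_2}$ is notation introduced here for illustration.

$$\lambda^{T}(p - q) + \frac{\rho}{2}\,\|p - q\|_2^2 = \frac{\rho}{2}\,\left\|p - q + \frac{\lambda}{\rho}\right\|_2^2 - \frac{1}{2\rho}\,\|\lambda\|_2^2$$

Since the last term does not depend on $p$ or $q$, the first two updates become

$$p^{k+1} = \arg\min_{p \in \Pi_1} \; f(p) + \frac{\rho}{2}\,\left\|p - q^{k} + \frac{\lambda^{k}}{\rho}\right\|_2^2, \qquad q^{k+1} = \mathrm{Proj}_{\Pi_2}\!\left(p^{k+1} + \frac{\lambda^{k}}{\rho}\right).$$

The $q$-update is therefore a component-wise clipping to $[0, p^{\max}]$, and only the $p$-update couples the loads through the balance constraint.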
Because the information of every load must be collected and processed in one place, the iterations in (28) are centralized. In practical applications, however, centralized methods have several limitations. The system requires a central controller that is connected to all loads in order to solve the scheduling problem for the wind-integrated system model. If this controller is attacked or damaged, the scheduling problem can no longer be solved, the loads cannot operate under normal conditions, and significant losses follow. In addition, a centralized controller assumes that the entire load topology is known and fixed in advance. When a sudden event disconnects a load, the overall topology changes, the controller no longer has valid global load information, and it cannot schedule the overall system optimally.
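Using Equations (26) and (27), problem (23) can be converted into the following equivalent form, in which the box constraint enters only through $q^{k}$ and $\lambda^{k}$:
$$\min \; F(p) = \sum_{i=1}^{n} F_i(p_i) \quad \text{subject to} \quad \mathbf{1}_n^{T} p = \Delta P_g$$
Here $F_i(p_i) = f_i(p_i) + (\rho/2)\left(p_i - q_i^{k} + \lambda_i^{k}/\rho\right)^2$ is the local augmented objective of load $i$ at iteration $k$. Therefore, the gradient of $F(p)$ with respect to $p_i$ can be expressed as:
$$\frac{\partial F_i(p_i)}{\partial p_i} = f_i'(p_i) + \rho \left( p_i - q_i^{k} + \lambda_i^{k}/\rho \right)$$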
The cost of the controllable rotating device can be modeled as the following quadratic function:
$$C_i(P_i) = a_i P_i^2 + b_i P_i + c_i$$
In the equation, $a_i$, $b_i$ and $c_i$ are the cost function coefficients related to load $i$, and $P_i$ is the power consumption of load $i$. By setting $\bar{\alpha}_i = -b_i / (2 a_i)$, $\bar{\beta}_i = 1 / (2 a_i)$ and $\bar{\gamma}_i = c_i - b_i^2 / (4 a_i)$, we can transform Equation (31) into another form:
$$C_i(P_i) = \frac{\left(P_i - \bar{\alpha}_i\right)^2}{2 \bar{\beta}_i} + \bar{\gamma}_i$$
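The equivalence of the two cost forms follows from completing the square, and it can be checked numerically. The sketch below uses hypothetical coefficients chosen only for illustration (they are not taken from the paper's tables); the helper names are likewise illustrative.

```python
def to_vertex_form(a, b, c):
    """Map (a, b, c) of a*P**2 + b*P + c to (alpha_bar, beta_bar, gamma_bar)."""
    alpha_bar = -b / (2.0 * a)
    beta_bar = 1.0 / (2.0 * a)
    gamma_bar = c - b * b / (4.0 * a)
    return alpha_bar, beta_bar, gamma_bar


def cost_standard(P, a, b, c):
    """Quadratic cost in coefficient form."""
    return a * P * P + b * P + c


def cost_vertex(P, alpha_bar, beta_bar, gamma_bar):
    """Same cost written as (P - alpha_bar)^2 / (2 beta_bar) + gamma_bar."""
    return (P - alpha_bar) ** 2 / (2.0 * beta_bar) + gamma_bar


# Hypothetical coefficients, for illustration only.
a, b, c = 0.05, 2.0, 10.0
alpha_bar, beta_bar, gamma_bar = to_vertex_form(a, b, c)
max_gap = max(
    abs(cost_standard(P, a, b, c) - cost_vertex(P, alpha_bar, beta_bar, gamma_bar))
    for P in (0.0, 25.0, 100.0)
)
```

With these values the transformed coefficients are approximately $(-20, 10, -10)$, and `max_gap` stays at floating-point rounding level.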
The coefficients satisfy the following sign conditions: $\bar{\alpha}_i \le 0$, $\bar{\beta}_i > 0$ and $\bar{\gamma}_i \le 0$. The cost functions (31) and (32) are equivalent, and the latter is used here to simplify the subsequent analysis. Because controllable intelligent loads enter the supply-demand balance equality constraint in (23) in the same way as rotating equipment, the cost function of this type of load is often modeled in the same form as (31) and (32). Therefore, the negative benefit of the demand-side load can be obtained:
$$f_i(p_i) = \frac{\left(p_i - \alpha_i\right)^2}{2 \beta_i} + \gamma_i$$
In the equation, $\alpha_i$, $\beta_i$ and $\gamma_i$ have exactly the same properties as the aforementioned $\bar{\alpha}_i$, $\bar{\beta}_i$ and $\bar{\gamma}_i$. Combining Equation (31) and the above equation, we can obtain:
$$\varrho_i = \frac{p_i - \alpha_i}{\beta_i} + \rho \left( p_i - q_i^{k} + \lambda_i^{k}/\rho \right) = \frac{\left(\rho \beta_i + 1\right) p_i + \beta_i \left(\lambda_i^{k} - \rho q_i^{k}\right) - \alpha_i}{\beta_i}$$
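For the quadratic negative-benefit functions above, the centralized ADMM iterations admit closed-form updates. The following is a minimal sketch, not the authors' implementation: the $p$-update sets every $\varrho_i$ equal to a common value `mu` (a multiplier for the balance constraint, introduced here) and picks `mu` so that the adjustments sum to $\Delta P_g$; the coefficients, bounds, and $\rho = 1$ are hypothetical.

```python
def admm_dispatch(alpha, beta, p_max, delta_p_g, rho=1.0, iters=2000):
    """Centralized ADMM for min sum_i (p_i - alpha_i)^2 / (2 beta_i)
    subject to sum(p) = delta_p_g and 0 <= p <= p_max."""
    n = len(alpha)
    q = [delta_p_g / n] * n
    lam = [0.0] * n
    p = list(q)
    for _ in range(iters):
        # p-update: solve rho_i(p_i) = mu for every load, then choose mu
        # so that the power-balance equality holds exactly.
        base = [(alpha[i] + beta[i] * (rho * q[i] - lam[i])) / (1.0 + rho * beta[i])
                for i in range(n)]
        gain = [beta[i] / (1.0 + rho * beta[i]) for i in range(n)]
        mu = (delta_p_g - sum(base)) / sum(gain)
        p = [base[i] + gain[i] * mu for i in range(n)]
        # q-update: projection of p + lam / rho onto the box [0, p_max].
        q = [min(max(p[i] + lam[i] / rho, 0.0), p_max[i]) for i in range(n)]
        # Multiplier update.
        lam = [lam[i] + rho * (p[i] - q[i]) for i in range(n)]
    return p, q, lam


# Hypothetical coefficients (not the values used in the paper).
alpha = [-1.0, -2.0, -0.5]
beta = [0.5, 1.0, 2.0]
p_free, q_free, lam_free = admm_dispatch(alpha, beta, [100.0] * 3, delta_p_g=10.0)
p_tight, q_tight, lam_tight = admm_dispatch(alpha, beta, [5.0] * 3, delta_p_g=10.0)
```

In the first call the bounds are inactive and the result matches the equal-incremental-cost solution; in the second call the third load saturates at its upper bound and the remaining imbalance is shared by the other two loads.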
Since the power system network contains thousands of loads that belong to different users, using a distributed form inevitably involves exchanging information between neighboring users. Based on this consensus algorithm, the economic scheduling problem under supply and demand balance constraints can be quickly solved.

4. Experiments

4.1. Experimental Setup and Dataset of WPF

The proposed adaptive federated framework, which integrates AC and EWC for probabilistic WPF, was evaluated on the Global Energy Forecasting Competition 2014 (GEFCom 2014) WPF dataset. This dataset comprises historical records from seven distinct wind farms ($\mathrm{WF}_0$ to $\mathrm{WF}_6$), each serving as an independent client in the FL framework. Because the farms differ in geographical and operational factors, their data are inherently non-IID.
In the experiments on the GEFCom 2014 wind power dataset, we adopt a unified preprocessing pipeline for all wind farms. First, the raw time series are aligned to an hourly resolution, and any duplicated timestamps are removed by averaging the corresponding measurements. Missing values shorter than three consecutive hours are imputed by linear interpolation between the nearest available observations; for longer gaps, the entire day containing the gap is discarded from the dataset to avoid introducing large artificial jumps. To mitigate the influence of spurious spikes, we perform outlier detection using a three-sigma rule on a 24-h rolling window: any point whose deviation from the local mean exceeds three standard deviations is treated as an outlier and replaced by the local median within that window. The cleaned power series are then clipped to the interval $[0, \mathrm{Cap}_i]$, where $\mathrm{Cap}_i$ denotes the rated capacity of wind farm $\mathrm{WF}_i$, to ensure physical plausibility. Finally, for each wind farm, all input features and target outputs are normalized to $[0, 1]$ using min–max scaling computed on the training set only; the same scaling factors are applied to the validation and test sets. The resulting normalized sequences are then used to construct sliding windows of historical inputs and future prediction horizons for training the quantile regression model in the proposed FL framework.
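The outlier, clipping, and scaling steps of this pipeline can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: the rolling window is taken as trailing, the population standard deviation is used, local statistics are computed on the raw series, and the toy data and capacity are hypothetical. Alignment and gap imputation are omitted.

```python
from statistics import mean, median, pstdev


def replace_outliers(series, window=24, k=3.0):
    """Replace points deviating more than k local standard deviations from
    the local mean by the local median (trailing window over raw values)."""
    cleaned = list(series)
    for t, value in enumerate(series):
        local = series[max(0, t - window + 1): t + 1]
        if len(local) < 2:
            continue
        if abs(value - mean(local)) > k * pstdev(local):
            cleaned[t] = median(local)
    return cleaned


def clip_to_capacity(series, capacity):
    """Clip power values to the physically plausible range [0, capacity]."""
    return [min(max(x, 0.0), capacity) for x in series]


def fit_minmax(train):
    """Scaling factors are computed on the training split only."""
    return min(train), max(train)


def apply_minmax(series, lo, hi):
    return [(x - lo) / (hi - lo) for x in series]


# Toy hourly series with one spurious spike (hypothetical data).
raw = [10.0 + (t % 5) for t in range(48)]
raw[30] = 500.0
cleaned = clip_to_capacity(replace_outliers(raw), capacity=20.0)
train, test = cleaned[:36], cleaned[36:]
lo, hi = fit_minmax(train)
scaled_train = apply_minmax(train, lo, hi)
scaled_test = apply_minmax(test, lo, hi)  # may leave [0, 1] if test exceeds the train range
```

Fitting the scaling factors on the training split only, as the text specifies, avoids leaking information from the validation and test periods into the model inputs.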

Performance Comparison of Forecasting Methods

The predictive accuracy of the proposed methods is benchmarked against three essential baselines: Persistence, a simple time-series baseline; Local, a model trained exclusively on individual client data; and FL, the standard FL model (FedAvg). We introduce the FL + AR (Adaptive Federated framework without EWC personalization) to isolate the benefit of the core federated and clustering strategy.
Table 2 supports several observations on forecast performance. It reveals non-negligible variance in forecasting performance across different wind farms. In general, wind farms that exhibit more volatile and rapidly changing generation profiles tend to have slightly larger RMSE and MAE values than those with smoother wind patterns, regardless of the forecasting method. Nevertheless, the proposed FL + AC + EWC framework not only reduces the average error but also narrows the gap between the best- and worst-performing wind farms when compared with the persistence and purely local baselines. This suggests that sharing model parameters through federated learning helps less informative or more challenging sites benefit from data collected at other farms, thereby improving the overall uniformity and robustness of forecasting performance across the entire fleet.
To assess the quality of the full forecast distribution rather than only the median, we additionally compute the Continuous Ranked Probability Score (CRPS) for all methods. CRPS is defined as
$$\mathrm{CRPS}(F, y) = \int_{-\infty}^{+\infty} \left( F(z) - \mathbf{1}\{ z \ge y \} \right)^2 \, dz,$$
where F is the predictive cumulative distribution function and y is the realized outcome; lower values indicate better probabilistic calibration and sharpness. Table 3 summarizes the CRPS results averaged over all horizons and wind farms.
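When the predictive distribution is represented by a finite set of equally weighted values, the integral can be evaluated through the known identity $\mathrm{CRPS} = \mathbb{E}|X - y| - \tfrac{1}{2}\mathbb{E}|X - X'|$, where $X$ and $X'$ are independent draws from $F$. The sketch below assumes such a representation (for example, treating the predicted quantiles as an ensemble); the text does not state how CRPS was computed in the experiments, so this is only one possible evaluation route.

```python
def crps_ensemble(samples, y):
    """CRPS of the empirical distribution of `samples` against outcome y,
    using CRPS = E|X - y| - 0.5 * E|X - X'|."""
    m = len(samples)
    term_obs = sum(abs(x - y) for x in samples) / m
    term_spread = sum(abs(a - b) for a in samples for b in samples) / (m * m)
    return term_obs - 0.5 * term_spread


# Two-point forecast {0, 1} and realized value 0.5.
score = crps_ensemble([0.0, 1.0], 0.5)
```

For this example the integral definition gives $0.5^2 \cdot 0.5 + 0.5^2 \cdot 0.5 = 0.25$, which agrees with the value returned by the function. For a single-valued forecast the score reduces to the absolute error.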
As shown in Table 3, the proposed FL + AC + EWC model achieves the lowest CRPS among all methods, indicating the best overall probabilistic performance. Compared with the persistence benchmark, CRPS is reduced by about 30%, and it improves upon purely local LSTM models and standard FedAvg by roughly 20% and 12%, respectively. FL + AC + EWC also yields around 7% lower CRPS than FL + AC without EWC, which confirms that EWC not only improves point metrics (RMSE/MAE) but also enhances the calibration and sharpness of the predictive distributions.
To further examine the calibration of the probabilistic forecasts, we evaluate the empirical coverage of central prediction intervals at different nominal levels. For each method and each nominal level $(1 - 2\alpha)$, we compute the fraction of realizations that fall inside the corresponding $(\alpha, 1 - \alpha)$ quantile band and report the average coverage across all wind farms and horizons. Table 4 compares the nominal and empirical coverage for 50%, 80%, and 90% intervals.
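The coverage computation itself is a simple counting exercise. A minimal sketch, with hypothetical realizations and quantile bounds:

```python
def empirical_coverage(y, lower, upper):
    """Fraction of realizations y_t that fall inside [lower_t, upper_t]."""
    hits = sum(1 for yt, lo, hi in zip(y, lower, upper) if lo <= yt <= hi)
    return hits / len(y)


# Hypothetical normalized power values and interval bounds.
y = [0.2, 0.5, 0.9, 0.4]
lower = [0.1, 0.3, 0.5, 0.45]
upper = [0.3, 0.6, 0.8, 0.7]
coverage = empirical_coverage(y, lower, upper)
```

Here two of the four realizations fall inside their intervals, so the empirical coverage is 0.5. Comparing this number with the nominal level $(1 - 2\alpha)$ indicates whether the intervals are too narrow (coverage below nominal) or too wide (coverage above nominal).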
Table 4 shows that the proposed FL + AC + EWC method produces empirical coverages that are closest to the nominal levels across all three intervals, especially for high-coverage bands (80% and 90%). In contrast, the persistence and local LSTM baselines tend to be over-confident: their empirical coverage is noticeably below the nominal targets, which means that their intervals are too narrow. This indicates that the proposed method not only lowers point forecast errors but also yields better calibrated prediction intervals, which is important for downstream dispatch and reserve allocation.
Figure 1 provides a visual summary of the mean RMSE results across all tested models and prediction horizons, consistent with the findings in Table 1. The primary trend is a consistent degradation in performance as the forecasting lead time increases from 2 h to 24 h, a universal challenge stemming from the growing uncertainty of longer-term predictions. Crucially, the figure establishes a clear ordering of the methods by RMSE, from largest to smallest error: Persistence > Local > FL > FL + AR. The standard FL model consistently outperforms the Local approach, confirming the value of collaborative training across isolated data silos.
Figure 2 and Figure 3 provide a direct visual comparison of the probabilistic forecasting quality between the proposed FL + AC + EWC model and the Local model for the short-term 2 h horizon. The two key metrics assessed are Reliability (measured by PI coverage) and Sharpness (measured by PI width). While both models generally track the actual power output and its inherent volatility, the FL + AC + EWC framework demonstrates clear superiority. The proposed model’s PIs are visibly narrower across all confidence levels, particularly during periods of low generation, indicating superior Sharpness and providing a more informative forecast for grid operators.
Figure 4 and Figure 5 illustrate the probabilistic forecasting performance for the demanding 24 h horizon. Given the significantly longer lead time, the uncertainty inherently increases, as reflected by the generally wider Prediction Intervals (PIs) compared to the 2 h forecasts. Despite this challenge, the proposed FL + AC + EWC model (Figure 4) maintains a superior quality of forecast compared to the Local model. Notably, the Local model exhibits PIs that are substantially wider and more conservatively spread out, failing to provide the concise uncertainty quantification necessary for practical long-term planning.
Furthermore, the FL + AC + EWC model demonstrates better tracking of the actual peaks and troughs, especially on days with higher volatility (e.g., 8–9 June). The Local model, constrained by limited data and local non-IID challenges, appears to consistently under-predict or smooth out high-magnitude events, resulting in the actual value frequently touching or exceeding the upper bounds of the 80% PI. This instability underscores the value of the EWC mechanism; by protecting global knowledge during local fine-tuning, the FL + AC + EWC model is more robust against local noise, allowing it to generate PIs that are both adequately reliable and maximally sharp, even in the highly uncertain long-term forecasting scenario.
To assess the effectiveness of the adaptive clustering and personalization mechanisms, we further conduct a clustering quality analysis and an EWC ablation study. For the latter, we construct two variants of the proposed framework: (i) FL + AC, which uses the Adaptive Clustering strategy but performs local fine-tuning without EWC, and (ii) FL + AC + EWC, which denotes the full proposed method with both AC and EWC enabled. For each wind farm, a local test set is held out, and we evaluate the two methods using RMSE, MAE, and the averaged Pinball Loss (averaged over all quantiles) on these local test sets. The results, averaged across the seven wind farms, are summarized in Table 5.
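The averaged Pinball Loss used in this comparison can be sketched as follows. The function names and the data layout (`q_preds[t][j]` holding the prediction for quantile level `taus[j]` at time step `t`) are assumptions made for illustration.

```python
def pinball_loss(y, q_pred, tau):
    """Pinball (quantile) loss of prediction q_pred for quantile level tau."""
    diff = y - q_pred
    return max(tau * diff, (tau - 1.0) * diff)


def averaged_pinball(y_series, q_preds, taus):
    """Mean pinball loss over all time steps and all quantile levels."""
    total = sum(
        pinball_loss(y, q_preds[t][j], tau)
        for t, y in enumerate(y_series)
        for j, tau in enumerate(taus)
    )
    return total / (len(y_series) * len(taus))


# Under-prediction is penalized by tau, over-prediction by (1 - tau).
under = pinball_loss(1.0, 0.0, 0.9)
over = pinball_loss(0.0, 1.0, 0.9)
```

For a high quantile level such as 0.9, predicting too low costs nine times as much as predicting too high by the same amount, which is what pushes the fitted quantile upward.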
As shown in Table 5, the FL + AC + EWC model consistently achieves lower RMSE, MAE, and averaged Pinball Loss than FL + AC on the local test sets, indicating that EWC effectively mitigates catastrophic forgetting during client-side personalization. By penalizing large deviations of important global parameters, EWC preserves shared knowledge while still allowing adaptation to local data, which yields more accurate probabilistic forecasts for each wind farm.
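The penalty described here can be illustrated with the commonly used quadratic EWC regularizer. This section does not give the exact form used in the experiments, so the sketch below is an assumption: it takes a diagonal importance weight per parameter (typically a Fisher information estimate) and a strength `lam`, both hypothetical names.

```python
def ewc_penalty(theta, theta_global, importance, lam):
    """Quadratic EWC penalty: 0.5 * lam * sum_j importance_j * (theta_j - theta_global_j)^2."""
    return 0.5 * lam * sum(
        w * (t - g) ** 2 for t, g, w in zip(theta, theta_global, importance)
    )


# Hypothetical two-parameter example: only the first parameter has moved.
penalty = ewc_penalty(
    theta=[1.0, 2.0], theta_global=[0.0, 2.0], importance=[4.0, 10.0], lam=0.5
)
```

During client-side fine-tuning this term would be added to the local quantile loss, so that parameters with large importance weights stay close to their global values while less important ones remain free to adapt.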

4.2. ADMM Dispatch

In this section, we validate the effectiveness of the aforementioned ADMM algorithm under two formulations—constrained and unconstrained—on the standard IEEE 14-bus and IEEE 30-bus test systems.
In the IEEE 14-bus and IEEE 30-bus simulations, the controllable loads are modeled with quadratic cost functions of the form
$$C_i(p_i) = \alpha_i p_i^2 + \beta_i p_i + \gamma_i,$$
where $p_i$ denotes the active power adjustment at bus $i$. The coefficients $(\alpha_i, \beta_i, \gamma_i)$ are adapted from the standard quadratic generator cost data of the corresponding IEEE test systems. Specifically, we start from the quadratic cost coefficients provided in the widely used MATPOWER case files for the IEEE 14-bus and IEEE 30-bus systems and reinterpret them as the marginal cost profiles of controllable loads at the associated buses. To keep the numerical values consistent with the load-side formulation while preserving the relative cost structure, all coefficients are rescaled by a constant factor so that the resulting marginal costs fall in a typical range for distribution-level operation (approximately 10–50 monetary units per MWh). The values of $(\alpha_i, \beta_i, \gamma_i)$ are kept fixed across all dispatch and anomaly detection scenarios. Due to the limitations on the number of load nodes and the rotational characteristics of intelligent loads, we can model some motors as intelligent load nodes for data transmission. In the IEEE 14-bus test system, there are three types of loads in total, and we provide the cost characteristic parameters of each load in Table 6. The IEEE 14-bus test system includes five loads. It is assumed that the communication links and the physical lines in the system are independent of each other; the communication topology is shown in Figure 6.
For convenience and simplification of analysis, in this system, load 1 and load 2 are set as the same type of load, load 3 and load 5 as the same type of load, while load 4 is a different type of load compared to the others. Due to the different lines and states of users, the scheduling range between intelligent loads of the same type is also different. At the same time, information transfer between loads is bidirectional. Thus, the adjacency matrix A of the topology in Figure 6 is obtained as
$$A = \begin{bmatrix} 1 & 1 & 0 & 0 & 1 \\ 1 & 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 & 1 \end{bmatrix}$$
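The adjacency matrix can be used directly to illustrate neighbor-only information exchange. The sketch below checks that $A$ is symmetric (bidirectional links) and runs a plain average-consensus iteration. The Metropolis weighting rule is an assumption chosen for illustration and is not stated in the paper; the initial values are the initial load deviations quoted later in this section.

```python
A = [
    [1, 1, 0, 0, 1],
    [1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
]
n = len(A)
is_symmetric = all(A[i][j] == A[j][i] for i in range(n) for j in range(n))
degree = [sum(A[i]) - A[i][i] for i in range(n)]  # neighbors, self-loop excluded


def metropolis_weights(adj):
    """Symmetric, doubly stochastic weights built from local degree information."""
    size = len(adj)
    deg = [sum(adj[i]) - adj[i][i] for i in range(size)]
    W = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i != j and adj[i][j]:
                W[i][j] = 1.0 / (1 + max(deg[i], deg[j]))
        W[i][i] = 1.0 - sum(W[i])  # remaining weight stays on the node itself
    return W


def consensus(x, W, steps):
    """Each node repeatedly replaces its value by a weighted average of its neighbors."""
    for _ in range(steps):
        x = [sum(W[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]
    return x


x0 = [490.0, 410.0, 300.0, 150.0, 150.0]  # MW
x_final = consensus(x0, metropolis_weights(A), steps=1000)
```

Because the graph is connected and the weights are doubly stochastic, every node converges to the network average (300 MW here) while communicating only with its neighbors, and the total of 1500 MW is preserved at every step.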
Unless otherwise stated, all ADMM-based dispatch and anomaly detection experiments use a constant augmented Lagrangian penalty parameter $\rho = 0.10$. The algorithm is terminated when both the primal and dual residual norms fall below a prescribed tolerance, i.e.,
$$\| r^{k} \|_2 \le \epsilon, \qquad \| s^{k} \|_2 \le \epsilon,$$
with $\epsilon = 10^{-4}$, or when a maximum number of iterations $K_{\max} = 200$ is reached. For anomaly detection, we define the detection threshold $\xi$ based on the steady-state residual statistics under normal operation: after the ADMM iterations have converged in the absence of anomalies, we compute the mean and standard deviation of the residual norm and set
$$\xi = \bar{r} + 3 \sigma_r \approx 0.02 \ \text{p.u.},$$
so that buses whose residual norms persistently exceed $\xi$ are flagged as abnormal with high confidence. These parameter settings are kept fixed for all scenarios on the IEEE 14-bus and IEEE 30-bus systems, ensuring that the reported results are fully reproducible.
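The stopping test and the threshold rule can be sketched as below. Two details are assumptions made for illustration, since the text does not specify them: the population standard deviation is used, and "persistently" is interpreted as the last `persistence` residual norms all exceeding $\xi$. The residual values are hypothetical.

```python
from statistics import mean, pstdev


def has_converged(r_norm, s_norm, eps=1e-4):
    """Stop when both primal and dual residual norms are below eps."""
    return r_norm <= eps and s_norm <= eps


def detection_threshold(normal_residual_norms):
    """xi = mean + 3 * std of residual norms recorded under normal operation."""
    return mean(normal_residual_norms) + 3.0 * pstdev(normal_residual_norms)


def flag_bus(residual_history, xi, persistence=3):
    """Flag a bus when its last `persistence` residual norms all exceed xi."""
    recent = residual_history[-persistence:]
    return len(recent) == persistence and all(r > xi for r in recent)


# Hypothetical steady-state residual norms (p.u.) under normal operation.
xi = detection_threshold([0.010, 0.012, 0.008, 0.011, 0.009])
sustained = flag_bus([0.010, 0.050, 0.060, 0.070], xi)   # three consecutive exceedances
transient = flag_bus([0.050, 0.010, 0.060], xi)          # exceedances are not consecutive
```

Requiring several consecutive exceedances is one simple way to keep a single noisy residual from raising an alarm.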
To illustrate the impact of forecast uncertainty on dispatch, we consider two configurations: a “narrow-interval” setting that uses a lower quantile level $\underline{\alpha} = 0.2$ (smaller $R(t)$) and a “wider-interval” setting with $\underline{\alpha} = 0.1$ (larger $R(t)$). In the narrow-interval case, the required reserve margin is smaller, leading to a lower scheduled adjustment $\Delta P_g(t)$ but more frequent and larger corrective actions when realized wind power falls outside the too-tight prediction intervals. This manifests as slightly higher dispatch cost and a larger number of anomaly flags in the ADMM layer. In the wider-interval case, the reserve margin $R(t)$ is larger, so the scheduled adjustment is more conservative; although this increases the scheduled cost, the number and magnitude of corrective re-dispatch actions decrease, and the residual trajectories in ADMM are smoother. In our simulations, shifting from $\underline{\alpha} = 0.2$ to $\underline{\alpha} = 0.1$ increases the scheduled reserve by about 8% but reduces the number of large corrective adjustments by approximately 20%, illustrating the trade-off between upfront reserve allocation and ex-post corrective effort induced by forecast uncertainty.
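The dependence of the reserve on the quantile level can be made concrete with a small sketch. The definition used below, $R(t)$ as the gap between the median forecast and the lower-quantile forecast, is an assumption for illustration, since the exact definition of $R(t)$ is not given in this section; the quantile values are hypothetical.

```python
def reserve_margin(q_median, q_lower):
    """Assumed reserve: median forecast minus lower-quantile forecast, floored at zero."""
    return [max(m - l, 0.0) for m, l in zip(q_median, q_lower)]


# Hypothetical normalized quantile forecasts for two time steps.
q50 = [0.50, 0.60]
q20 = [0.40, 0.45]   # lower quantile at level 0.2
q10 = [0.35, 0.38]   # lower quantile at level 0.1
r_narrow = reserve_margin(q50, q20)
r_wide = reserve_margin(q50, q10)
```

Since a lower quantile level lies further below the median, `r_wide` is at least as large as `r_narrow` at every time step, which matches the direction of the trade-off described above.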
To evaluate the anomaly detection capability of the proposed ADMM-based scheme in a realistic way, we construct three typical abnormal load behaviors that may occur in industrial distribution networks.
Scenario S1 (20% sudden surge): at a selected load bus, the active power demand experiences an instantaneous increase to 1.2 times its nominal value and remains at this elevated level for at least 10 consecutive dispatch intervals, emulating a sudden connection of additional production equipment.
Scenario S2 (50% sudden drop): at a selected load bus, the active power demand suddenly decreases to 0.5 times its nominal value, representing an abrupt shutdown of a large industrial process.
Scenario S3 (sustained 15% deviation): at a selected load bus, the active power demand is gradually ramped to 1.15 times its nominal value over a few intervals and then remains at this level for the rest of the simulation horizon, mimicking a slowly evolving but persistent change in consumption.
In all cases, anomalies are injected after the ADMM iterations have converged to a steady operational point, so that the subsequent deviations in local residuals and multipliers are solely attributable to the abnormal behaviors.
For each scenario, the anomaly detection performance is evaluated using the following metrics. A detection is counted as a true positive (TP) if an anomalous bus is correctly flagged within a finite number of ADMM iterations after the anomaly is injected; a false negative (FN) occurs when an anomalous bus is never flagged. A false positive (FP) is recorded when a normal bus is incorrectly flagged as abnormal. The accuracy is defined as the ratio of correctly classified buses (both normal and abnormal) to the total number of buses. The false positive rate (FPR) is the proportion of normal buses that are incorrectly flagged, and the false negative rate (FNR) is the proportion of anomalous buses that remain undetected. The average detection delay (ADE) is measured as the number of ADMM iterations between the time when the anomaly is injected and the first iteration at which the corresponding bus is flagged as abnormal. Table 7 summarizes these metrics averaged over multiple realizations of each scenario on the IEEE 14-bus and IEEE 30-bus systems.
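These definitions translate directly into a few counters. The sketch below is illustrative: the input layout (a per-bus ground-truth flag and the first iteration at which each bus was flagged, or `None` if never flagged) and the example values are assumptions.

```python
def detection_metrics(is_anomalous, first_flag_iter, inject_iter):
    """Accuracy, FPR, FNR and average detection delay over a set of buses."""
    tp = fp = tn = fn = 0
    delays = []
    for truth, flagged_at in zip(is_anomalous, first_flag_iter):
        flagged = flagged_at is not None
        if truth and flagged:
            tp += 1
            delays.append(flagged_at - inject_iter)
        elif truth and not flagged:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
        "fnr": fn / (fn + tp) if fn + tp else 0.0,
        "avg_delay": sum(delays) / len(delays) if delays else None,
    }


# Hypothetical run: buses 2 and 4 are anomalous, anomaly injected at iteration 30.
metrics = detection_metrics(
    is_anomalous=[False, False, True, False, True],
    first_flag_iter=[None, 40, 36, None, None],
    inject_iter=30,
)
```

In this example one anomalous bus is caught after 6 iterations, one is missed, and one normal bus is falsely flagged, giving an accuracy of 0.6, an FPR of 1/3 and an FNR of 1/2.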
The algorithm can start from any initial value that satisfies the supply-demand balance equality constraint; here, a total scheduling deviation of 1500 MW is given. The load power deviations at the initial moment are as follows: $p_1(0) = 490.0$ MW, $p_2(0) = 410.0$ MW, $p_3(0) = 300.0$ MW, $p_4(0) = 150.0$ MW, $p_5(0) = 150.0$ MW. The positive scalar gain value is given as a constant. Using the above data and the topology, the optimal scheduling problem without inequality constraints can be easily calculated in advance, and the optimal solution is: $p_1^{*} = 509.7$ MW, $p_2^{*} = 509.7$ MW, $p_3^{*} = 205.3$ MW, $p_4^{*} = 70.0$ MW, $p_5^{*} = 205.2$ MW.
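The structure of this unconstrained optimum follows from the Lagrange condition of the equality-constrained problem. As a sketch, assuming the negative-benefit form $f_i(p_i) = (p_i - \alpha_i)^2 / (2\beta_i) + \gamma_i$ and writing $\mu^{*}$ for the multiplier of the balance constraint (a symbol introduced here):

$$\frac{p_i^{*} - \alpha_i}{\beta_i} = \mu^{*} \ \ \forall i \quad \Longrightarrow \quad p_i^{*} = \alpha_i + \beta_i \mu^{*}, \qquad \mu^{*} = \frac{\Delta P_g - \sum_{j=1}^{n} \alpha_j}{\sum_{j=1}^{n} \beta_j}.$$

All loads therefore operate at the same incremental cost, and loads with identical coefficients receive identical adjustments. This is consistent with the reported values: loads 1 and 2 (same type) both receive 509.7 MW, loads 3 and 5 (same type) receive 205.3 MW and 205.2 MW, and the five values sum to 1499.9 MW, i.e., 1500 MW up to rounding.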
As shown in Figure 7, without being constrained by anomaly thresholds [38], the proposed algorithm can effectively detect load power deviations during periods of abnormal power consumption within the industry chain, and its detection time is significantly improved compared to the consensus algorithm in the literature. Similarly, when performing distributed sub-problem anomaly detection, the anomaly evaluation metrics for each sub-problem are also consistent, as shown in Figure 8.
As shown in Figure 9, the consistency balance of anomaly identification is always maintained during the detection process to ensure that the equality constraint is satisfied. This is a necessary guarantee for the accuracy of the upstream and downstream industrial power anomaly monitoring system. Figure 10 shows the anomaly detection scheme with upper- and lower-limit constraints on the power consumption characteristics.
Figure 7, Figure 8, Figure 9 and Figure 10 compare the trajectories of the dispatch variables and residuals in the unconstrained and constrained cases. In the unconstrained setup, all buses can freely share the required adjustment, and the ADMM iterations converge smoothly to a solution where the incremental costs are equalized across the network. Once the box constraints $p_i^{\min} \le p_i \le p_i^{\max}$ are enforced, several buses quickly reach their individual limits and remain saturated. Physically, this means that these loads have exhausted their flexibility and can no longer contribute to compensating the wind power deviation. Consequently, the remaining imbalance must be reallocated to the unsaturated buses, leading to the kinked trajectories and slightly slower decrease of the primal and dual residuals observed in the constrained case. Despite this effect, the algorithm still converges in a reasonable number of iterations because the overall problem remains convex and the active set of constrained loads stabilizes after a few iterations. The constrained solution is therefore more realistic from an operational viewpoint, as it respects individual load capabilities, while the unconstrained solution mainly serves as a reference for the best achievable economic performance in the absence of physical limits.

5. Discussion

The ADMM-based load-side dispatch and anomaly detection scheme in Section 3.2 takes as input the probabilistic wind power forecasts generated by the adaptive FL framework described in Section 3.1. In particular, the scheduled wind power trajectory is derived from a central quantile (e.g., the median) of the forecast distribution, while a safety margin based on the lower quantiles is incorporated into the reserve allocation. When the forecasts are accurate and well calibrated, the realized wind generation remains within the predicted intervals with high probability, and only small adjustments of controllable loads are needed to maintain power balance. Consequently, the ADMM iterations converge quickly with small residuals, the overall dispatch cost is reduced, and only a few anomaly flags are raised. In contrast, systematic forecast bias or overly narrow prediction intervals increase the probability of large imbalances between scheduled and actual wind power, which leads to more frequent and larger corrective actions, slower convergence of the distributed optimization, and a higher rate of anomaly alarms. The improvements in RMSE, MAE, and Pinball Loss achieved by the proposed FL-based probabilistic forecasting thus translate into fewer re-dispatch operations and more stable ADMM trajectories in our simulations, highlighting the practical value of better forecasting quality for the downstream optimization and anomaly detection tasks.
To further assess the practical deployability of the proposed framework, we quantitatively evaluate its communication and computation overhead using the implementation adopted in our experiments. For the FL-based probabilistic wind power forecasting, the artificial neural network deployed at each client contains approximately $5.3 \times 10^{4}$ trainable parameters. With 32-bit floating-point representation, this corresponds to about 0.21 MB of model data. In each global FL round, every wind farm uploads its local model parameters to the server (or cluster center) and receives the aggregated global (or cluster-specific) parameters once, leading to a per-client communication volume of roughly 0.42 MB per round (upload + download) and a total volume of about 2.9 MB per round when seven wind farms participate. Over 100 global rounds, the overall communication volume remains below 300 MB, which is well within the capacity of typical industrial communication infrastructures.
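The communication figures follow from simple arithmetic, reproduced in the sketch below. The only assumption beyond the text is the unit convention of 1 MB = $10^{6}$ bytes.

```python
n_params = 5.3e4        # trainable parameters per client model
bytes_per_param = 4     # 32-bit floating point
n_clients = 7           # wind farms
n_rounds = 100          # global FL rounds

model_mb = n_params * bytes_per_param / 1e6        # about 0.21 MB
per_client_round_mb = 2 * model_mb                 # upload + download, about 0.42 MB
per_round_mb = n_clients * per_client_round_mb     # all clients, about 2.97 MB
total_mb = n_rounds * per_round_mb                 # about 297 MB over 100 rounds
```

The total grows linearly in both `n_clients` and `n_rounds`, so doubling either one doubles the communication volume.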
On the computation side, all experiments are run on a workstation equipped with an Intel Core i7 CPU (8 cores, 2.5 GHz) and 16 GB RAM without GPU acceleration. Under this setup, the average client-side latency for one FL round (including one local training epoch and communication) is approximately 0.38 s. For the ADMM-based load-side dispatch and anomaly detection, the per-iteration computational complexity at node $i$ is linear in both the number of incident edges and the dimension of local decision variables, i.e., $O(|N_i| \, d)$, where $|N_i|$ denotes the number of neighbors of node $i$ and $d$ denotes the number of local variables. In practice, this results in an average wall-clock time of about 5 ms per iteration for the IEEE 14-bus system and about 8 ms per iteration for the IEEE 30-bus system. Since the ADMM algorithm typically converges within a few tens of iterations, the total latency for one distributed dispatch and anomaly detection cycle is on the order of a few hundred milliseconds, which is compatible with standard dispatch intervals in distribution-level operation.
As summarized in Table 8, the communication volume per FL round is on the order of a few megabytes even when all seven wind farms participate, and the client-side computation latency per FL round is below one second on a standard CPU-only workstation. The ADMM-based dispatch and anomaly detection scheme exhibits linear per-iteration complexity in the local neighborhood size and achieves millisecond-level iteration latency for both the IEEE 14-bus and IEEE 30-bus systems. These results indicate that the end-to-end framework is computationally efficient and communication-feasible for deployment in practical industrial power networks with similar scales.
While the proposed adaptive FL-based probabilistic forecasting framework with AC and EWC has shown promising results on the GEFCom 2014 wind farms and IEEE test systems, several limitations remain. First, although our overhead evaluation indicates that the communication and computation costs are moderate for the considered setup (a small number of wind farms and a medium-size ANN), the total communication volume of FL still grows roughly linearly with both the number of participating clients and the number of global rounds. In large-scale deployments with tens or hundreds of wind farms and more complex models, this could lead to non-negligible communication burden, especially over bandwidth-limited or latency-sensitive industrial networks. Techniques such as model compression, sparsified or quantized updates, and probabilistic client subsampling are promising directions to further reduce communication without significantly degrading forecasting accuracy. Second, the current FL scheme assumes that a central server (or a small number of cluster centers) can reliably coordinate communication rounds in a mostly synchronous manner. In practice, connectivity may be intermittent and client availability may be highly variable, which calls for more scalable and robust variants such as hierarchical or fully decentralized federated learning with asynchronous updates. Third, although the adaptive clustering and EWC components help mitigate the impact of Non-IID data and catastrophic forgetting, they do not explicitly account for long-term concept drift (e.g., structural changes in wind regimes or operating patterns). Extending the framework with drift detection mechanisms and dynamically adaptable model architectures is an important topic for future work. 
Despite these limitations, the present results demonstrate the feasibility of combining federated probabilistic forecasting with distributed optimization for industrial grids, and they provide a basis on which more scalable and communication-efficient designs can be developed.

6. Conclusions

This paper proposed an integrated framework that combines federated probabilistic wind power forecasting with ADMM-based load-side dispatch and distributed anomaly detection for privacy-constrained industrial power networks. An adaptive FL scheme with AC and EWC was used to train quantile regression models across multiple wind farms without sharing raw data, and the resulting forecasts were embedded into a distributed ADMM formulation that coordinates controllable loads and localizes abnormal behaviours. Experiments on the GEFCom 2014 dataset and IEEE 14-bus/30-bus systems show that the proposed FL + AC + EWC forecaster reduces RMSE and MAE by up to about 20% and 18% compared with persistence, achieves 8–15% improvements over purely local models, and provides an additional 3–5% gain over FL + AC without EWC. The downstream ADMM-based scheme converges in a few tens of iterations and attains anomaly detection accuracy above 95% with low false-alarm rates, while incurring moderate communication and computation overhead. These results indicate that improving probabilistic forecasting via adaptive federated learning can directly enhance the efficiency and robustness of distributed dispatch and anomaly detection in decentralized power systems.

Author Contributions

Conceptualization, L.S.; methodology, L.S. and N.F.; software, L.S. and J.M.; validation, L.S. and L.Z.; investigation, N.F. and L.Z.; resources, N.F. and J.M.; supervision, L.Z. and J.Z.; writing—original draft preparation, L.S.; writing—review and editing, J.M., L.Z. and J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Startup Foundation for Introducing Talent of Wuxi University under Grants 2024r001 and 2024r004.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Fotopoulou, M.; Rakopoulos, D.; Petridis, S.; Drosatos, P. Assessment of smart grid operation under emergency situations. Energy 2024, 287, 129661. [Google Scholar] [CrossRef]
  2. Nguyen, L.H.; Nguyen, V.L.; Hwang, R.H.; Kuo, J.J.; Chen, Y.W.; Huang, C.C.; Pan, P.I. Towards Secured Smart Grid 2.0: Exploring Security Threats, Protection Models, and Challenges. IEEE Commun. Surv. Tutor. 2025, 27, 2581–2620. [Google Scholar] [CrossRef]
  3. Bakirtzis, A.; Petridis, V.; Kazarlis, S. Genetic algorithm solution to the economic dispatch problem. IEEE Proc.-Gener. Transm. Distrib. 1994, 141, 377–382. [Google Scholar] [CrossRef]
  4. Attaviriyanupap, P.; Kita, H.; Tanaka, E.; Hasegawa, J. A hybrid EP and SQP for dynamic economic dispatch with nonsmooth fuel cost function. IEEE Power Eng. Rev. 2002, 17, 411–416. [Google Scholar]
  5. Victoire, T.; Jeyakumar, A. Discussion of “Particle swarm optimization to solving the economic dispatch considering the generator constraints”. IEEE Trans. Power Syst. 2004, 19, 2121–2122. [Google Scholar] [CrossRef]
  6. Park, J.B.; Lee, K.S.; Shin, J.R.; Lee, K.Y. A particle swarm optimization for economic dispatch with nonsmooth cost functions. IEEE Trans. Power Syst. 2005, 20, 34–42. [Google Scholar] [CrossRef]
  7. Choudhary, A.; Jain, P.; Prajesh, A. Wind Power Forecasting Using Deep Learning Method: A Review. In Proceedings of the 2023 1st International Conference on Intelligent Computing and Research Trends (ICRT), Roorkee, India, 3–4 February 2023; pp. 1–6. [Google Scholar] [CrossRef]
  8. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
  9. Sarkar, M.R.; Anavatti, S.G.; Ferdaus, M.M.; Dam, T. ASPEN-WIND: Adaptive spectral and self-supervised interactive CNN-LSTM for enhanced wind power forecasting. Expert Syst. Appl. 2026, 296, 129171. [Google Scholar] [CrossRef]
  10. Xie, Y.; Li, C.; Li, M.; Liu, F.; Taukenova, M. An overview of deterministic and probabilistic forecasting methods of wind energy. iScience 2023, 26, 105804. [Google Scholar] [CrossRef] [PubMed]
  11. Jittratorn, N.; Huang, C.M.; Yang, H.T. A deterministic and probabilistic framework based on corrected wind speed to improve Short-Term wind power forecasting accuracy. Int. J. Electr. Power Energy Syst. 2025, 170, 110859. [Google Scholar] [CrossRef]
  12. Tang, Y.; Zhang, S.; Zhang, Z. A privacy-preserving framework integrating federated learning and transfer learning for wind power forecasting. Energy 2024, 286, 129639. [Google Scholar] [CrossRef]
  13. Wei, M.; Yang, J.; Zhao, Z.; Zhang, X.; Li, J.; Deng, Z. DeFedHDP: Fully Decentralized Online Federated Learning for Heart Disease Prediction in Computational Health Systems. IEEE Trans. Comput. Soc. Syst. 2024, 11, 6854–6867. [Google Scholar] [CrossRef]
  14. Jiang, L.; Ming, X.; Zhang, X. DT-DOFL: Digital-Twin-Empowered Decentralized Online Federated Learning for User-Centered Smart Healthcare Service Systems. IEEE Trans. Comput. Soc. Syst. 2025, 12, 4441–4455. [Google Scholar] [CrossRef]
  15. Wei, M.; Yu, W.; Chen, D. AccDFL: Accelerated Decentralized Federated Learning for Healthcare IoT Networks. IEEE Internet Things J. 2025, 12, 5329–5345. [Google Scholar] [CrossRef]
  16. Li, Y.; Wang, R.; Li, Y.; Zhang, M.; Long, C. Wind power forecasting considering data privacy protection: A federated deep reinforcement learning approach. Appl. Energy 2023, 329, 120291. [Google Scholar] [CrossRef]
  17. Wei, M.; Yang, Z.; Ji, Q.; Zhao, Z. Privacy-preserving distributed projected one-point bandit online optimization over directed graphs. Asian J. Control 2023, 25, 4705–4720. [Google Scholar] [CrossRef]
  18. Cai, J.; Ma, X.; Li, L.; Peng, H. Chaotic particle swarm optimization for economic dispatch considering the generator constraints. Energy Convers. Manag. 2007, 48, 645–653. [Google Scholar] [CrossRef]
  19. Yu, W.; Wen, G.; Yu, X.; Wu, Z.; Lü, J. Bridging the gap between complex networks and smart grids. J. Control Decis. 2014, 1, 102–114. [Google Scholar] [CrossRef]
  20. Cao, Y.; Yu, W.; Ren, W.; Chen, G. An overview of recent progress in the study of distributed multi-agent coordination. IEEE Trans. Ind. Inform. 2013, 9, 427–438. [Google Scholar] [CrossRef]
  21. Ge, X.; Han, Q.; Ding, D.; Zhang, X.; Ning, B. A survey on recent advances in distributed sampled-data cooperative control of multi-agent systems. Neurocomputing 2018, 275, 1684–1701. [Google Scholar] [CrossRef]
  22. Ren, W. Distributed consensus in multivehicle cooperative control: Theory and applications research interests control systems & robotics. Commun. Control Eng. 2008, 27, 71–82. [Google Scholar]
  23. Imai, Y.; Kohsaka, S. Distributed consensus algorithms in sensor networks with imperfect communication: Link failures and channel noise. IEEE Trans. Signal Process. 2009, 57, 355–369. [Google Scholar]
  24. Wei, M.; Yu, W.; Liu, H.; Xu, Q. Distributed Weakly Convex Optimization Under Random Time-Delay Interference. IEEE Trans. Netw. Sci. Eng. 2024, 11, 212–224. [Google Scholar] [CrossRef]
  25. Nedic, A.; Ozdaglar, A. Distributed subgradient methods for multi-agent optimization. IEEE Trans. Autom. Control 2009, 54, 48–61. [Google Scholar] [CrossRef]
  26. Yuan, D.; Ho, D.; Xu, S. Regularized primal-dual subgradient method for distributed constrained optimization. IEEE Trans. Cybern. 2015, 46, 2109–2118. [Google Scholar] [CrossRef]
  27. Liu, Q.; Wang, J. L1-Minimization algorithms for sparse signal reconstruction based on a projection neural network. IEEE Trans. Neural Netw. Learn. Syst. 2017, 27, 698–707. [Google Scholar] [CrossRef] [PubMed]
  28. Wei, M.; Yu, W.; Chen, D.; Kang, M.; Cheng, G. Privacy Distributed Constrained Optimization Over Time-Varying Unbalanced Networks and Its Application in Federated Learning. IEEE/CAA J. Autom. Sin. 2025, 12, 335–346. [Google Scholar] [CrossRef]
  29. Zhang, H.; Liang, S.; Ou, M.; Wei, M. An asynchronous distributed gradient algorithm for economic dispatch over stochastic networks. Int. J. Electr. Power Energy Syst. 2021, 124, 106240. [Google Scholar] [CrossRef]
  30. Wei, M.; Chen, G.; Guo, Z. A fixed-time optimal consensus algorithm over undirected networks. In Proceedings of the 2018 Chinese Control And Decision Conference (CCDC), Shenyang, China, 9–11 June 2018; pp. 725–730. [Google Scholar]
  31. Yang, S.; Tan, S.; Xu, J. Consensus based approach for economic dispatch problem in a smart grid. IEEE Trans. Power Syst. 2013, 28, 4416–4426. [Google Scholar] [CrossRef]
  32. Wei, M.; Yu, W.; Liu, H.; Chen, D. Byzantine-Resilient Distributed Bandit Online Optimization in Dynamic Environments. IEEE Trans. Ind. Cyber-Phys. Syst. 2024, 2, 154–165. [Google Scholar] [CrossRef]
  33. Yu, W.; Li, C.; Yu, X.; Wen, G.; Lu, J. Economic power dispatch in smart grids: A framework for distributed optimization and consensus dynamics. Sci. China Inf. Sci. 2018, 61, 012204. [Google Scholar] [CrossRef]
  34. Hong, S.; Yue, T.; You, Y.; Lv, Z.; Tang, X.; Hu, J.; Yin, H. A Resilience Recovery Method for Complex Traffic Network Security Based on Trend Forecasting. Int. J. Intell. Syst. 2025, 2025, 3715086. [Google Scholar] [CrossRef]
  35. Ahmad, J.; Latif, S.; Khan, I.U.; Alshehri, M.S.; Khan, M.S.; Alasbali, N.; Jiang, W. An interpretable deep learning framework for intrusion detection in industrial Internet of Things. Internet Things 2025, 33, 101681. [Google Scholar] [CrossRef]
  36. Dang, J.; Zheng, H.; Xu, X.; Wang, L.; Hu, Q.; Guo, Y. Adaptive Sparse Memory Networks for Efficient and Robust Video Object Segmentation. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 3820–3833. [Google Scholar] [CrossRef] [PubMed]
  37. Song, L.K.; Tao, F.; Li, X.Q.; Yang, L.C.; Wei, Y.P.; Beer, M. Physics-embedding multi-response regressor for time-variant system reliability assessment. Reliab. Eng. Syst. Saf. 2025, 263, 111262. [Google Scholar] [CrossRef]
  38. Li, F.; Qin, J.; Kang, Y. Multi-agent system based distributed pattern search algorithm for non-convex economic load dispatch in smart grid. IEEE Trans. Power Syst. 2019, 34, 2093–2102. [Google Scholar] [CrossRef]
Figure 1. RMSE comparison of different models.
Figure 2. 2 h Probabilistic Forecast: Proposed FL + AC + EWC Model.
Figure 3. 2 h Probabilistic Forecast: Local Model.
Figure 4. 24 h Probabilistic Forecast: Proposed FL + AC + EWC Model.
Figure 5. 24 h Probabilistic Forecast: Local Model.
Figure 6. Block diagram of the IEEE-14 bus test system in contingency service.
Figure 7. Power dispatch of loads without inequality constraints.
Figure 8. Incremental cost of loads during power dispatch.
Figure 9. Total demand and supply power during load dispatch without inequality constraints.
Figure 10. Power dispatch of loads with inequality constraints.
Table 1. Clustering quality evaluation of the AC strategy on GEFCom 2014 wind farms.

| Cluster ID | Silhouette Coefficient | Davies–Bouldin Index |
|---|---|---|
| C1 | 0.47 | 0.62 |
| C2 | 0.39 | 0.74 |
| C3 | 0.44 | 0.68 |
| Overall (average) | 0.43 | 0.68 |

Table 2. Comparison of Forecasting Errors Across Different Settings.

| Horizon | Method | WF 0 | WF 1 | WF 2 | WF 3 | WF 4 | WF 5 | WF 6 | Mean |
|---|---|---|---|---|---|---|---|---|---|
| 24 h | Persistence | 0.1003 | 0.1150 | 0.1246 | 0.1147 | 0.1249 | 0.1117 | 0.1147 | 0.1152 |
| 24 h | Local | 0.0805 | 0.0874 | 0.0934 | 0.0883 | 0.0899 | 0.0865 | 0.0885 | 0.0877 |
| 24 h | FL | 0.0742 | 0.0891 | 0.0918 | 0.0821 | 0.0876 | 0.0799 | 0.0857 | 0.0843 |
| 24 h | FL + AR | 0.0701 | 0.0850 | 0.0846 | 0.0799 | 0.0800 | 0.0756 | 0.0783 | 0.0796 |
| 12 h | Persistence | 0.0804 | 0.0986 | 0.1030 | 0.0953 | 0.0987 | 0.0907 | 0.0975 | 0.0963 |
| 12 h | Local | 0.0819 | 0.0812 | 0.0858 | 0.0770 | 0.0762 | 0.0765 | 0.0931 | 0.0808 |
| 12 h | FL | 0.0688 | 0.0802 | 0.0835 | 0.0767 | 0.0791 | 0.0727 | 0.0787 | 0.0771 |
| 12 h | FL + AR | 0.0643 | 0.0640 | 0.0730 | 0.0708 | 0.0698 | 0.0680 | 0.0729 | 0.0691 |
| 4 h | Persistence | 0.0533 | 0.0621 | 0.0691 | 0.0627 | 0.0641 | 0.0604 | 0.0643 | 0.0623 |
| 4 h | Local | 0.0476 | 0.0532 | 0.0580 | 0.0518 | 0.0538 | 0.0519 | 0.0544 | 0.0530 |
| 4 h | FL | 0.0474 | 0.0527 | 0.0581 | 0.0528 | 0.0533 | 0.0511 | 0.0542 | 0.0521 |
| 4 h | FL + AR | 0.0455 | 0.0524 | 0.0543 | 0.0488 | 0.0501 | 0.0507 | 0.0506 | 0.0503 |
| 2 h | Persistence | 0.0427 | 0.0480 | 0.0552 | 0.0502 | 0.0512 | 0.0487 | 0.0516 | 0.0497 |
| 2 h | Local | 0.0348 | 0.0356 | 0.0413 | 0.0384 | 0.0375 | 0.0361 | 0.0374 | 0.0372 |
| 2 h | FL | 0.0338 | 0.0353 | 0.0406 | 0.0366 | 0.0369 | 0.0358 | 0.0374 | 0.0363 |
| 2 h | FL + AR | 0.0330 | 0.0347 | 0.0393 | 0.0357 | 0.0366 | 0.0347 | 0.0364 | 0.0361 |

Table 3. Average CRPS of different forecasting methods on GEFCom 2014.

| Method | CRPS (Mean) |
|---|---|
| Persistence | 0.072 |
| Local LSTM | 0.061 |
| FedAvg (standard FL) | 0.056 |
| FL + AC | 0.053 |
| FL + AC + EWC (proposed) | 0.049 |

Table 4. Nominal vs. empirical coverage (%) of central prediction intervals for different methods (averaged over all horizons and wind farms).

| Method | 50% Nominal | 80% Nominal | 90% Nominal |
|---|---|---|---|
| Persistence | 44.1 | 73.2 | 83.5 |
| Local LSTM | 47.3 | 76.4 | 86.2 |
| FedAvg (standard FL) | 49.0 | 78.1 | 88.0 |
| FL + AC | 50.2 | 79.0 | 89.1 |
| FL + AC + EWC (proposed) | 51.0 | 81.2 | 90.4 |

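
The empirical coverage values in Table 4 are of the kind obtained by counting how often the observed power falls inside a central prediction interval. The following is a minimal sketch of that calculation, not the authors' evaluation code; the function name `empirical_coverage` and the toy arrays are hypothetical, and the interval is assumed to be bounded by the 10% and 90% quantile forecasts for a nominal 80% level.

```python
import numpy as np

def empirical_coverage(y_true, lower, upper):
    """Return the percentage of observations inside [lower, upper]."""
    y_true = np.asarray(y_true, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    inside = (y_true >= lower) & (y_true <= upper)
    return 100.0 * inside.mean()

# Hypothetical toy data (normalized wind power), for illustration only.
y = np.array([0.10, 0.35, 0.50, 0.72, 0.90])
q10 = np.array([0.05, 0.20, 0.55, 0.60, 0.70])  # assumed 10% quantile forecasts
q90 = np.array([0.30, 0.40, 0.80, 0.85, 0.95])  # assumed 90% quantile forecasts

coverage_80 = empirical_coverage(y, q10, q90)
print(f"Empirical coverage of the nominal 80% interval: {coverage_80:.1f}%")
```

An empirical value close to the nominal level, as in the last row of Table 4, indicates well-calibrated intervals; values below nominal indicate intervals that are too narrow.
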
Table 5. Effect of EWC on local personalization performance (GEFCom 2014, averaged over seven wind farms).

| Method | RMSE | MAE | Pinball Loss |
|---|---|---|---|
| FL + AC | 0.0578 | 0.0436 | 0.0312 |
| FL + AC + EWC (proposed) | 0.0549 | 0.0419 | 0.0298 |

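
For readers unfamiliar with the pinball loss column in Table 5, the sketch below shows the standard quantile (pinball) loss for a single quantile level. It is an illustration under the usual textbook definition, not necessarily the exact averaging used in the paper; the function name `pinball_loss` and the toy numbers are hypothetical.

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Mean pinball loss of quantile forecasts y_pred at level tau in (0, 1)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    diff = y_true - y_pred
    # Under-prediction (diff > 0) is weighted by tau,
    # over-prediction (diff < 0) by (1 - tau).
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))

# Hypothetical toy example: one under-prediction and one over-prediction.
loss_q90 = pinball_loss([1.0, 2.0], [0.5, 2.5], tau=0.9)
print(f"Pinball loss at tau = 0.9: {loss_q90:.3f}")
```

At `tau = 0.5` the loss reduces to half the mean absolute error, which is why pinball loss and MAE tend to move together in comparisons such as Table 5.
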
Table 6. Simulation parameters of the IEEE-14 bus test system.

| Parameter | DG1 | DG2 | DG3 | DG4 | DG5 |
|---|---|---|---|---|---|
| α/MW | −2535.2 | −2535.2 | −2023.2 | −826.8 | −2023.2 |
| β/MW | 352.1 | 352.1 | 257.7 | 103.7 | 257.7 |
| γ/MW | −8616.8 | −8616.8 | −7631.0 | −3216.7 | −7631.0 |
| Range/MW | [0, 500] | [0, 400] | [0, 300] | [0, 300] | [0, 400] |

Table 7. Anomaly detection performance of the ADMM-based scheme under different abnormal load scenarios.

| Scenario | Accuracy (%) | FPR (%) | FNR (%) | ADE (Iterations) |
|---|---|---|---|---|
| S1: 20% sudden surge | 97.8 | 2.3 | 2.1 | 3.2 |
| S2: 50% sudden drop | 96.5 | 3.4 | 3.7 | 3.8 |
| S3: sustained 15% deviation | 95.1 | 4.1 | 4.8 | 4.6 |
| Overall average | 96.5 | 3.3 | 3.5 | 3.9 |

Table 8. Communication and computation overhead of the proposed framework.

| Component/Metric | Symbol | Value | Description |
|---|---|---|---|
| FL model size (per client) | $\lvert\theta\rvert$ | $5.3 \times 10^{4}$ params | Number of trainable ANN parameters |
| FL comm. per client per round | $V_{\mathrm{FL}}^{(\mathrm{cli})}$ | 0.42 MB | Upload + download model parameters |
| FL comm. per round (7 clients) | $V_{\mathrm{FL}}^{(\mathrm{tot})}$ | 2.9 MB | Total across all wind farms |
| Client-side FL latency per round | $T_{\mathrm{FL}}$ | 0.38 s | One local epoch + communication |
| ADMM per-iteration complexity | $C_{\mathrm{ADMM}}$ | $O(\lvert N_i \rvert d)$ | Node $i$: neighbors $\lvert N_i \rvert$, dim. $d$ |
| ADMM iteration latency (14-bus) | $T_{\mathrm{ADMM}}^{14}$ | ≈5 ms | Average per iteration per node |
| ADMM iteration latency (30-bus) | $T_{\mathrm{ADMM}}^{30}$ | ≈8 ms | Average per iteration per node |

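
The FL communication volumes in Table 8 can be sanity-checked with simple arithmetic. The sketch below assumes 32-bit (4-byte) parameters and 1 MB = 10^6 bytes; neither assumption is stated in the table, so the result should be read as a plausibility check rather than the authors' accounting.

```python
N_PARAMS = 5.3e4       # trainable ANN parameters (Table 8)
BYTES_PER_PARAM = 4    # assumption: float32 parameters
N_CLIENTS = 7          # wind farms (Table 8)

# One round = one upload plus one download of the full parameter vector.
per_client_mb = 2 * N_PARAMS * BYTES_PER_PARAM / 1e6
total_mb = N_CLIENTS * per_client_mb

print(f"Per client per round: {per_client_mb:.3f} MB")
print(f"All clients per round: {total_mb:.3f} MB")
```

Under these assumptions the per-client volume is about 0.42 MB and the seven-client total is just under 3 MB, which agrees with the tabulated 0.42 MB and 2.9 MB up to rounding.
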
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Sheng, L.; Fu, N.; Mou, J.; Zhu, L.; Zhou, J. Data-Driven Probabilistic Wind Power Forecasting and Dispatch with Alternating Direction Method of Multipliers over Complex Networks. Mathematics 2026, 14, 112. https://doi.org/10.3390/math14010112
