Consensus-Regularized Federated Learning for Superior Generalization in Wind Turbine Diagnostics

Li, Lan; Zhou, Juncheng; Peng, Qiankun; Zhou, Quan; Zhang, Haoming

doi:10.3390/math13162570


by Lan Li ¹, Juncheng Zhou ¹,*, Qiankun Peng ¹, Quan Zhou ² and Haoming Zhang ²

¹ Chongqing Metropolitan College of Science and Technology, Chongqing 402167, China

² School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China

* Author to whom correspondence should be addressed.

Mathematics 2025, 13(16), 2570; https://doi.org/10.3390/math13162570

Submission received: 11 July 2025 / Revised: 5 August 2025 / Accepted: 7 August 2025 / Published: 11 August 2025


Abstract

Ensuring the reliable operation of wind turbines is critical for the global transition to sustainable energy, yet it is challenged by faults that are difficult to detect in real-time. Traditional diagnostics rely on centralized data, which raises significant privacy and scalability concerns. To address these limitations, this study introduces a Consensus-Regularized Federated Learning (CR-FL) framework. This framework mathematically formalizes and mitigates the problem of “client drift” caused by heterogeneous data from different turbines by augmenting the local training objective with a proximal regularization term. This forces models to learn generalizable fault features while preserving data privacy. To validate our framework, we implemented a lightweight neural network within a federated paradigm and benchmarked it against a powerful, centralized Light Gradient Boosting Machine (LightGBM) model using real-world SCADA data. The federated training process, through its inherent constraint on local updates, acts as a practical implementation of our consensus-regularization principle. Model performance was comprehensively evaluated using accuracy, precision, F1-score, and Area Under the ROC Curve (AUC) metrics. The results demonstrate that our federated approach not only preserves privacy but also achieves superior performance in key metrics, including AUC and precision. This confirms that the regularizing effect of the federated process enables the global model to generalize better across heterogeneous data distributions than its centralized counterpart. This study provides a practical, scalable, and methodologically superior solution for fault diagnosis in wind turbine systems, paving the way for more collaborative and secure infrastructure monitoring.

Keywords: federated learning; distributed optimization; non-independent and identically distributed data; implicit regularization; wind turbine diagnostics

MSC: 68T09

1. Introduction

The global transition to renewable energy hinges on the reliability and efficiency of wind power generation. Central to this objective are Supervisory Control and Data Acquisition (SCADA) systems, which form the neuro-sensory backbone of modern wind farms by providing high-resolution, real-time data streams crucial for operational monitoring and fault diagnosis [1,2,3]. The application of machine learning (ML) to this data has shown considerable promise for developing predictive maintenance strategies, thereby minimizing downtime and maximizing energy yield [4,5,6,7,8].

The conventional ML paradigm, however, is predicated on a centralized architecture that requires the aggregation of all turbine data onto a single server. This model, while analytically powerful, confronts critical impediments in real-world deployments. The transmission of vast data volumes from geographically dispersed and often remote turbines introduces significant communication overhead and latency. More fundamentally, it presents formidable data privacy and security challenges, as operational data is often commercially sensitive [9,10,11].

To circumvent these limitations, this paper investigates the application of Federated Learning (FL), a decentralized machine learning paradigm where a global model is trained collaboratively across multiple clients without exchanging their local data [12,13,14]. In the context of wind power, each turbine can operate as a client, training a local model and contributing only its learned parameters (e.g., model weights) to a central aggregator [15]. This architecture not only preserves data privacy but also aligns naturally with the distributed topology of wind farms and the principles of edge computing [16,17]. Table 1 lists the abbreviations used in this paper.

This leads to the central research questions of our study: (1) Can a decentralized FL model, trained on partitioned, heterogeneous data, achieve diagnostic performance comparable or even superior to an idealized centralized model with full data access? (2) From a methodological standpoint, how does the collaborative learning process in FL influence key performance trade-offs, such as precision versus recall, compared to traditional centralized training? To answer these questions, we establish a rigorous comparative framework, benchmarking a lightweight federated neural network against a state-of-the-art centralized LightGBM model. Our analysis is conducted over 50 training rounds with identical data partitions.

The primary contributions of this work can be summarized as:

We develop and validate a neural network-based federated learning framework specifically tailored for multiclass fault diagnosis in wind turbines.
We provide a novel mathematical formalization of the client drift problem in the context of wind turbine diagnostics and propose a consensus-regularized learning objective to explicitly counteract it. This reframes the implicit benefit of federated averaging into a concrete and tunable mechanism.
We establish a rigorous benchmarking methodology to systematically evaluate the performance trade-offs between centralized and federated learning under controlled, non-IID conditions.
We provide compelling empirical evidence that our FL approach not only preserves privacy but achieves superior diagnostic performance, offering a scalable, secure, and effective solution for the next generation of intelligent wind farm management.

2. Related Works

2.1. Machine Learning for Wind Turbine Fault Diagnosis

The application of machine learning (ML) to Supervisory Control and Data Acquisition (SCADA) data is a well-established field for improving the reliability of wind turbines. A wide array of methods has been proposed, primarily operating under a centralized data paradigm. Early approaches successfully employed traditional ML models, including Support Vector Machines (SVMs) for classifying turbine states [18] and tree-based ensembles like Random Forest and Gradient Boosting for their robustness and high performance on tabular data [19]. With the advent of deep learning, more complex models have been developed to capture the temporal dynamics and intricate correlations within SCADA data. Recurrent Neural Networks (RNNs), particularly Long Short-Term Memory (LSTM) units, have been widely used to model time-series dependencies for fault prediction [20]. Convolutional Neural Networks (CNNs), often adapted to handle 1D time-series data, have shown promise in extracting salient features indicative of fault conditions [21]. More recently, Graph Neural Networks (GNNs) have emerged as a sophisticated approach to model the spatio-temporal relationships between interconnected turbines within a wind farm [22].

However, a common thread unites these powerful methods: they all presuppose the existence of a centralized repository where data from all turbines can be aggregated for training. This assumption presents significant practical barriers, including data privacy risks, the reluctance of manufacturers to share proprietary data (leading to data silos), and the communication overhead required for continuous data transmission [1]. Our work diverges from this paradigm by eliminating the need for data aggregation, addressing these challenges directly. To situate our work within the landscape of recent research, Table 2 provides a methodological comparison with other state-of-the-art approaches.

2.2. Federated Learning in Industrial Applications

FL has emerged as a key enabling technology for machine learning in the Industrial Internet of Things (IIoT), where data is naturally distributed and privacy is a primary concern. The paradigm aligns perfectly with the principles of edge computing, allowing intelligent models to be trained locally on devices without exposing sensitive raw data. Its application spans numerous industrial domains. In smart manufacturing, FL is used for collaborative quality control and predictive maintenance across different factory sites [23]. In smart grids, it enables distributed load forecasting and anomaly detection without compromising consumer privacy [24].

While FL is gaining traction in these areas, its application to the specific, high-stakes domain of multiclass fault diagnosis for wind turbines is still a nascent field of research. Early studies have demonstrated the feasibility of using FL for specific tasks like wind power forecasting or learning turbine power curves [13]. However, a rigorous, comparative analysis of FL against a strong centralized baseline for the complex task of multiclass fault classification remains less explored. Our study contributes to this area by providing a direct and quantitative benchmark, evaluating the trade-offs and revealing the performance advantages of FL in this critical application.

2.3. Mathematical Foundations and Challenges of Federated Learning

The foundational algorithm for FL is Federated Averaging (FedAvg), proposed by McMahan et al. [25]. In this framework, a central server orchestrates the training process by repeatedly distributing a global model to clients, who train it locally and send back the updated parameters for aggregation. While conceptually straightforward, the convergence of this process is complicated by a core mathematical challenge: statistical heterogeneity. In real-world scenarios, the data distributions across clients are typically non-independent and identically distributed (Non-IID).

This Non-IID nature gives rise to client drift, where the local model updates from different clients pull the global model in conflicting directions, potentially slowing down or even preventing convergence [26]. A significant body of theoretical work in FL is dedicated to understanding and mitigating this issue. For instance, the model averaging in FedAvg can be interpreted as a form of implicit regularization, which, as our results suggest, can improve the generalization of the final model by preventing it from overfitting to any single client’s data. To more explicitly combat client drift, advanced algorithms have been proposed, such as FedProx, which adds a proximal term to the local loss function to limit the magnitude of local updates [27], and SCAFFOLD, which uses control variates to correct for drift at both the server and client side [28].

While these advanced optimizers offer theoretical improvements, it is fundamentally important to first establish a strong empirical baseline using the canonical FedAvg algorithm. Our work contributes to this by providing a rigorous empirical analysis in a complex industrial setting. We demonstrate that even without these advanced modifications, the foundational FL paradigm can construct a more robust and precise diagnostic tool than a powerful centralized model, thereby motivating its adoption and further refinement for critical infrastructure monitoring.

This context motivates the central question of our research: Can a decentralized model, trained on fragmented and siloed data, harness the collective intelligence of the network to match or even exceed the performance of an idealized centralized model that has complete, unfettered access to all data? Answering this question is crucial for unlocking the full potential of data-driven analytics in real-world, multi-vendor wind farm environments.

It is important to distinguish the challenge of statistical heterogeneity (Non-IID data), which our consensus-regularization addresses, from the challenge of system heterogeneity (e.g., communication bottlenecks). The latter is often tackled by techniques like model quantization, which reduce the size of transmitted updates [29]. These two research directions are not mutually exclusive but are in fact complementary. Our approach focuses on improving the model’s generalization and robustness to diverse data distributions, while quantization focuses on improving communication efficiency. A fully optimized industrial system could integrate both our consensus-regularization framework and quantization methods to achieve high model performance and low communication overhead simultaneously [30,31].

3. Problem Formulation

This section establishes the mathematical and practical foundations for our work. We begin by delineating the centralized learning paradigm and its inherent limitations in the context of wind turbine diagnostics, thereby motivating the transition to a decentralized approach. We then provide a rigorous mathematical analysis of the primary challenge in federated learning—client drift induced by statistical heterogeneity—and introduce our proposed Consensus-Regularized Federated Optimization framework as a principled solution designed to mitigate this issue and enhance model generalization.

3.1. From Centralized Aggregation to Federated Collaboration

The traditional paradigm for data-driven fault diagnosis is predicated on centralized data aggregation. This model assumes the existence of a central repository where local datasets $\{D_{1}, D_{2}, \dots, D_{K}\}$ from a fleet of K turbines can be pooled into a single, monolithic dataset, $D_{agg} = \bigcup_{k = 1}^{K} D_{k}$. The learning objective is to find a single set of model parameters $θ \in \mathbb{R}^{d}$ that minimizes the global empirical risk over this aggregated data:

$$\min_{θ} F(θ) = \frac{1}{|D_{agg}|} \sum_{(x, y) \in D_{agg}} L(f(x; θ), y)$$

(1)

where $f(\cdot; θ)$ is the model, and $L$ is a suitable loss function. While analytically tractable, this centralized model is fundamentally misaligned with the operational realities of modern energy infrastructure. Its implementation is critically hindered by prohibitive communication overheads and, more importantly, by insurmountable data privacy and security constraints. Given that operational data constitutes a sensitive commercial asset, data-sharing is often contractually or legally restricted, leading to the formation of “data silos” that render the objective in Equation (1) practically infeasible [32,33].

FL provides a powerful alternative by reformulating the problem to respect data locality. Instead of aggregating raw data, FL collaboratively trains a model by aggregating parameter updates. The global objective is to find a model $θ$ that minimizes a weighted average of the local empirical risks, where each local risk $F_{k}(θ)$ is defined over a client’s private dataset $D_{k}$:

$$\min_{θ} F(θ) = \sum_{k = 1}^{K} \frac{n_{k}}{n} F_{k}(θ), \quad \text{where} \quad F_{k}(θ) = \frac{1}{n_{k}} \sum_{(x, y) \in D_{k}} L(f(x; θ), y)$$

(2)

where $n_{k} = |D_{k}|$ is the number of data points for client k, and $n = \sum_{k} n_{k}$. This formulation enables collaborative model training across organizational boundaries without requiring direct access to sensitive local data. It is crucial to note that while FL is inherently privacy-enhancing because raw data never leaves the client’s local storage, standard FL does not offer formal privacy guarantees against advanced attacks like model inversion or membership inference. The primary benefit, which our framework leverages, is the preservation of data locality, which mitigates the most direct privacy risks associated with data centralization. For applications requiring stronger, cryptographic guarantees, our framework can be augmented with techniques such as differential privacy or secure aggregation, which we consider a valuable direction for future work.

3.2. Mathematical Analysis of Client Drift in Federated Systems

While FL resolves the practical barriers of centralization, it introduces a formidable mathematical challenge: statistical heterogeneity. In real-world wind farms, the local data distributions $P_{k}$ from which datasets $D_{k}$ are sampled are inherently Non-Independent and Identically Distributed (Non-IID). This arises from variations in turbine manufacturer, geographical location, environmental conditions, and maintenance history. Consequently, the local objective functions $F_{k}(θ)$ are not identical, and their minima can be arbitrarily different.

This heterogeneity is the root cause of client drift. In the canonical FedAvg algorithm, each client k performs one or more steps of local gradient descent to update the model based on its local objective $F_{k}(θ)$. This local update, however, moves the model parameters in a direction optimal for the local data, which may not align with the globally optimal direction. Formally, the gradient of the local objective, $g_{k}(θ) = \nabla F_{k}(θ)$, can diverge significantly from the gradient of the global objective, $g(θ) = \nabla F(θ)$. The variance of the local gradients across the clients can be substantial:

$$\mathbb{E}_{k} \left[ \| \nabla F_{k}(θ) - \nabla F(θ) \|^{2} \right] \gg 0$$

(3)

This high variance is a key impediment to convergence. When the locally updated models, each having “drifted” in a different direction, are averaged at the server, the resulting global model can oscillate, converge slowly, or become trapped in a suboptimal region of the parameter space, ultimately failing to generalize effectively across the entire fleet.
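The quantity in Equation (3) can be made concrete with a small numerical sketch. The example below is illustrative only: it assumes toy quadratic local risks $F_{k}(θ) = \frac{1}{2} \| θ - c_{k} \|^{2}$ with hypothetical centers $c_{k}$ (not the SCADA data or the neural network of this study) and weights each client by $n_{k}/n$, as in Equation (2). The function names are our own.

```python
def weighted_mean(vectors, weights):
    """Weighted average of equal-length vectors; weights are assumed to sum to 1."""
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) for i in range(dim)]


def gradient_dissimilarity(local_grads, sizes):
    """Estimate E_k ||grad F_k - grad F||^2 with client weights n_k / n (Equation (3))."""
    n = sum(sizes)
    weights = [s / n for s in sizes]
    global_grad = weighted_mean(local_grads, weights)
    return sum(
        w * sum((g_i - big_g_i) ** 2 for g_i, big_g_i in zip(g, global_grad))
        for g, w in zip(local_grads, weights)
    )


def quadratic_grads(theta, centers):
    """Gradients of the toy local risks F_k(theta) = 0.5 * ||theta - c_k||^2."""
    return [[t - c for t, c in zip(theta, center)] for center in centers]


theta = [0.0, 0.0]
sizes = [100, 100, 100]                                   # hypothetical n_k
iid_centers = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]        # identical local optima
non_iid_centers = [[3.0, 0.0], [0.0, 3.0], [-3.0, -3.0]]  # conflicting local optima

iid_value = gradient_dissimilarity(quadratic_grads(theta, iid_centers), sizes)
non_iid_value = gradient_dissimilarity(quadratic_grads(theta, non_iid_centers), sizes)
print(iid_value, non_iid_value)
```

When all clients share the same optimum the dissimilarity vanishes, whereas conflicting local optima produce a strictly positive value, which is the situation Equation (3) describes.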

3.3. Consensus-Regularized Federated Optimization

To explicitly counteract client drift, we introduce a Consensus-Regularized Federated Optimization framework. This approach imposes a disciplined trade-off between local adaptation and global consensus by modifying the local optimization subproblem. Instead of solely minimizing the local empirical risk $F_{k}(θ)$, each client k is tasked with solving a regularized objective during communication round t:

$$\min_{θ_{k}} \tilde{F}_{k}(θ_{k}; θ_{t}) = F_{k}(θ_{k}) + \frac{λ}{2} \| θ_{k} - θ_{t} \|^{2}$$

(4)

Here, $θ_{t}$ is the global model from the start of the round, $θ_{k}$ is the local model being trained, and $λ \geq 0$ is a hyperparameter controlling the strength of the consensus regularization.

This quadratic proximal term acts as a powerful consensus regularizer. It penalizes large deviations of the local model $θ_{k}$ from the global consensus $θ_{t}$, effectively creating a “gravitational pull” that limits client drift. The gradient of this new local objective provides insight into its mechanism:

$$\nabla \tilde{F}_{k}(θ_{k}; θ_{t}) = \nabla F_{k}(θ_{k}) + λ (θ_{k} - θ_{t})$$

(5)

When local updates use this modified gradient, each step adds the correction term $λ (θ_{k} - θ_{t})$ to the local gradient, so the descent step combines the local descent direction with a pull back towards the global model. At the optimum of the local subproblem, $θ_{k}^{*}$, the gradient is zero, which implies $\nabla F_{k}(θ_{k}^{*}) = λ (θ_{t} - θ_{k}^{*})$. This shows that, at the local optimum, the residual local gradient is exactly balanced by the pull towards the global consensus, and that the magnitude of this correction is proportional to the drift distance $\| θ_{k}^{*} - θ_{t} \|$.
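A short sketch may help to see the effect of $λ$ in Equations (4) and (5). It again assumes a toy quadratic local risk $F_{k}(θ) = \frac{1}{2} \| θ - c_{k} \|^{2}$ with a hypothetical center $c_{k}$, for which the regularized optimum has the closed form $θ_{k}^{*} = (c_{k} + λ θ_{t}) / (1 + λ)$. The step size and number of steps are illustrative choices, not values from this study.

```python
import math


def regularized_gradient(theta_k, theta_t, local_grad, lam):
    """Equation (5): grad F_k(theta_k) + lam * (theta_k - theta_t)."""
    return [g + lam * (a - b) for g, a, b in zip(local_grad, theta_k, theta_t)]


def quadratic_grad(theta_k, center):
    """Gradient of the toy local risk F_k(theta) = 0.5 * ||theta - center||^2."""
    return [a - c for a, c in zip(theta_k, center)]


def solve_local(theta_t, center, lam, lr=0.1, steps=500):
    """Plain gradient descent on the regularized objective of Equation (4)."""
    theta_k = list(theta_t)  # local training starts from the global model
    for _ in range(steps):
        grad = regularized_gradient(
            theta_k, theta_t, quadratic_grad(theta_k, center), lam
        )
        theta_k = [a - lr * g for a, g in zip(theta_k, grad)]
    return theta_k


def drift(theta_k, theta_t):
    """Euclidean drift distance ||theta_k - theta_t||."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(theta_k, theta_t)))


theta_t = [0.0, 0.0]   # global model at the start of the round
center = [4.0, -2.0]   # hypothetical local optimum c_k
solutions = {lam: solve_local(theta_t, center, lam) for lam in (0.0, 0.1, 1.0)}
drifts = {lam: drift(sol, theta_t) for lam, sol in solutions.items()}
print(drifts)
```

With $λ = 0$ the client moves all the way to its own optimum; larger values of $λ$ shrink the drift distance, and the stationarity condition $\nabla F_{k}(θ_{k}^{*}) = λ (θ_{t} - θ_{k}^{*})$ can be checked numerically on the returned solutions.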

This formulation, inspired by the FedProx algorithm [27], is interpreted in our work not merely as a heuristic or an optimization fix but as a fundamental mechanism for achieving superior generalization in the presence of Non-IID data. Our primary contribution is not the algorithm itself, but rather: (1) its novel interpretation as a tool to learn a “common language” of fault signatures across heterogeneous turbines; (2) its formal application and analysis for the complex, multiclass fault diagnosis problem in this domain; and (3) the rigorous empirical evidence showing its superiority over a powerful, state-of-the-art centralized baseline. By explicitly constraining local updates, the regularization prevents the global model from overfitting to the statistical idiosyncrasies and noise of any single client’s data. It compels the federated system to learn a “common language” of fault signatures that are robust and generalizable across the entire heterogeneous fleet. The standard FedAvg algorithm can be viewed as an implicit, less tunable approximation of this process. Our framework makes this regularization explicit, providing a principled methodology for enhancing diagnostic performance beyond what is achievable by either naive centralized aggregation or standard federated averaging.

4. Proposed Methodology and System Architecture

This section details the formal problem definition and the distinct modeling pipelines designed to represent the centralized and federated learning paradigms. Our methodology is constructed to facilitate a rigorous and direct comparison between these two approaches.

4.1. Multiclass Fault Diagnosis Problem Formulation

The core task is a multiclass fault diagnosis problem formulated for a distributed data environment. We consider a wind farm consisting of K turbines, where each turbine k acts as a client possessing a local dataset $D_{k}$. This dataset contains $n_{k}$ observations, each represented by a data pair $(x_{i}, y_{i})$. Here, $x_{i} \in \mathbb{R}^{d}$ is a d-dimensional feature vector extracted from the turbine’s SCADA system, and $y_{i} \in \{1, \dots, C\}$ is the corresponding categorical label indicating one of $C = 6$ operational states (e.g., normal operation, bearing failure, etc.). The data is inherently partitioned, meaning $D_{k} \cap D_{j} = \emptyset$ for $k \neq j$, and is not shared among clients.

Under the FL paradigm, the objective is not to access the data directly but to collaboratively train a single, global model, parameterized by $θ$, that performs well across all clients. This is achieved by minimizing the weighted average of the local loss functions. Using the standard cross-entropy loss ($L_{CE}$) for multiclass classification, the global optimization problem is defined as:

$$\min_{θ} L_{FL}(θ) = \sum_{k = 1}^{K} \frac{n_{k}}{n} \cdot L_{CE}(f(D_{k}; θ), y_{k}),$$

(6)

where $n = \sum_{k = 1}^{K} n_{k}$ is the total number of data points across the entire network, and $f(D_{k}; θ)$ represents the predictions of the global model f on the local data of client k. This formulation ensures that clients with more data have a proportionally larger influence on the final global model.
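The weighting in Equation (6) can be illustrated with a minimal sketch. It assumes that $L_{CE}$ denotes the mean cross-entropy over a client's samples, uses 0-indexed labels, and works on a hypothetical toy example with three classes and hand-written predicted probabilities rather than the six-class SCADA task. All names are illustrative.

```python
import math


def cross_entropy(probs, label):
    """Cross-entropy of one sample; `label` is a 0-indexed class id."""
    return -math.log(probs[label])


def client_loss(prob_rows, labels):
    """Mean cross-entropy over one client's samples (assumed meaning of L_CE)."""
    return sum(cross_entropy(p, y) for p, y in zip(prob_rows, labels)) / len(labels)


def federated_loss(clients):
    """Equation (6): per-client losses weighted by n_k / n."""
    n = sum(len(labels) for _, labels in clients)
    return sum(
        len(labels) / n * client_loss(prob_rows, labels)
        for prob_rows, labels in clients
    )


# Hypothetical predictions of one global model on two clients of unequal size.
clients = [
    ([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]], [0, 1, 2]),
    ([[0.5, 0.25, 0.25]], [0]),
]
fed = federated_loss(clients)

# Pooling all samples and averaging gives the same value.
all_probs = [p for prob_rows, _ in clients for p in prob_rows]
all_labels = [y for _, labels in clients for y in labels]
pooled = client_loss(all_probs, all_labels)
print(fed, pooled)
```

Because each client's mean loss is weighted by $n_{k}/n$, the federated objective evaluated at a fixed $θ$ coincides with the mean loss over the pooled data; the two paradigms differ in how the optimization is carried out, not in the function being evaluated.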

4.2. Modeling Pipelines and Architectures

To empirically investigate our research question, we implemented two methodologically distinct modeling pipelines. The first represents the idealized centralized approach, serving as a powerful performance benchmark. The second embodies the privacy-preserving federated approach, designed for practical, real-world constraints.

To establish a formidable benchmark, we employed the LightGBM classifier, a state-of-the-art gradient boosting framework [34,35,36]. This choice was deliberate; tree-based ensemble models like LightGBM are renowned for their exceptional performance and computational efficiency on structured, tabular data, often outperforming neural networks in such contexts. By selecting a strong baseline, we ensure that any superior performance from the FL model is not merely due to a weak alternative.

The model was trained on a fully aggregated dataset ($D_{agg} = \bigcup_{k = 1}^{K} D_{k}$), simulating an idealized scenario where all privacy and logistical barriers to data centralization are removed. This provides an upper-bound reference for what is achievable with complete data access. To facilitate a direct comparison with the iterative nature of FL, we trained the LightGBM model for 50 sequential boosting rounds, allowing us to observe its performance evolution as model complexity increased.

The federated framework was engineered around a neural network. This architectural choice is fundamental to the feasibility of FL, as the parametric nature of neural networks (i.e., their structure of weights and biases) is highly amenable to the averaging and aggregation methods central to algorithms like FedAvg. In contrast, averaging the structure of decision trees in a meaningful way is a non-trivial and often intractable problem.

The neural network was intentionally designed to be lightweight to minimize communication overhead—a critical bottleneck in federated systems where model parameters must be transmitted in every round. The architecture, an MLP, is structured as follows:

Input Layer: A layer with 53 neurons, corresponding to the d = 53 dimensions of the SCADA feature vector.
Hidden Layer: A single fully connected hidden layer composed of 128 neurons, which utilizes the ReLU activation function to capture complex, non-linear relationships.
Output Layer: A final fully connected layer of 6 neurons, one for each fault class, followed by a Softmax activation function to produce the classification probabilities.

This shallow architecture (53-128-6) is instantiated on each client and trained within the iterative federated process.
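As a rough illustration of the 53-128-6 architecture, the following sketch implements its forward pass in plain Python. The weight initialization, random seed, and input vector are assumptions made for the example; the source does not specify them, and a practical implementation would normally use a deep learning library.

```python
import math
import random

D_IN, D_HIDDEN, N_CLASSES = 53, 128, 6  # layer sizes stated in the text


def init_layer(n_in, n_out, rng):
    """Random weights with He-style scaling (an assumption) and zero biases."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, weights, biases):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, params):
    """Input (53) -> hidden (128, ReLU) -> output (6, Softmax)."""
    (w1, b1), (w2, b2) = params
    hidden = [max(0.0, h) for h in dense(x, w1, b1)]
    return softmax(dense(hidden, w2, b2))


rng = random.Random(0)
params = [init_layer(D_IN, D_HIDDEN, rng), init_layer(D_HIDDEN, N_CLASSES, rng)]
x = [rng.uniform(-1.0, 1.0) for _ in range(D_IN)]  # placeholder feature vector
probs = forward(x, params)
print(probs)
```

The output is a vector of six class probabilities that sum to one, matching the Softmax output layer described above.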

The training process, depicted in Figure 1, follows the iterative procedure detailed in Algorithm 1. The cycle proceeds as follows:

Distribution: At the start of each communication round t, the central server broadcasts the current global model parameters $θ_{t}$ to all participating clients.
Local Training: Each client k independently trains the model on its local dataset $D_{k}$ for one epoch, computing an updated set of parameters $θ_{t}^{(k)}$ that is biased towards its local data distribution.
Aggregation: The clients then transmit their updated model parameters $θ_{t}^{(k)}$ back to the server. The server aggregates these updates using the Federated Averaging (FedAvg) algorithm to compute the improved global model for the next round, $θ_{t + 1}$ , as formulated in Equation (7).

Figure 1. The federated learning architecture with three clients. Each client independently trains a local neural network on its private data. Model parameters (updates) are sent to a central server, which aggregates them to produce an improved global model. This iterative cycle enables collaborative model improvement without sharing raw data.

Algorithm 1 Consensus-Regularized Federated Learning

Server executes:
    initialize global model $θ_{0}$
    for each round $t = 0, 1, \dots, T - 1$ do
        Broadcast $θ_{t}$ to all clients $k \in \{1, \dots, K\}$
        for each client k in parallel do
            $θ_{k}^{t + 1} \leftarrow ClientUpdate(k, θ_{t})$
        end for
        % Aggregate local models to form the new global model
        $θ_{t + 1} \leftarrow \sum_{k = 1}^{K} \frac{n_{k}}{n} θ_{k}^{t + 1}$
    end for

procedure ClientUpdate($k, θ$)
    % Solve the regularized objective (Equation (4)) approximately
    Initialize local model $θ_{k} \leftarrow θ$
    for local epoch e from 1 to E do
        for batch $b \in D_{k}$ do
            Update local model $θ_{k}$ using gradient descent on the loss: $L_{k}(θ_{k}; b) + \frac{λ}{2} \| θ_{k} - θ \|^{2}$
        end for
    end for
    return $θ_{k}$ to server
end procedure
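The structure of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the local loss is a toy quadratic, each hypothetical client holds a single batch, and the learning rate and number of local epochs are illustrative. Only $λ = 0.1$ and the 50 rounds are taken from the text.

```python
def client_update(theta_global, batches, grad_fn, lam, lr, epochs):
    """ClientUpdate of Algorithm 1: approximate minimization of Equation (4)."""
    theta_k = list(theta_global)  # start from the broadcast global model
    for _ in range(epochs):
        for batch in batches:
            g = grad_fn(theta_k, batch)
            # Gradient step on L_k(theta_k; b) + (lam / 2) * ||theta_k - theta||^2
            theta_k = [
                p - lr * (g_i + lam * (p - p0))
                for p, g_i, p0 in zip(theta_k, g, theta_global)
            ]
    return theta_k


def aggregate(local_models, sizes):
    """Server step: average of the local models weighted by n_k / n."""
    n = sum(sizes)
    dim = len(local_models[0])
    return [sum(s / n * m[i] for m, s in zip(local_models, sizes)) for i in range(dim)]


def cr_fl(theta0, clients, grad_fn, lam=0.1, lr=0.1, epochs=5, rounds=50):
    """Server loop of Algorithm 1 (clients are simulated sequentially here)."""
    theta = list(theta0)
    for _ in range(rounds):
        local_models = [
            client_update(theta, c["batches"], grad_fn, lam, lr, epochs)
            for c in clients
        ]
        theta = aggregate(local_models, [c["n"] for c in clients])
    return theta


def quadratic_grad(theta, batch):
    """Toy loss: gradient of the mean of 0.5 * ||theta - x||^2 over the batch."""
    dim = len(theta)
    mean = [sum(x[i] for x in batch) / len(batch) for i in range(dim)]
    return [t - m for t, m in zip(theta, mean)]


# Three hypothetical clients with conflicting local optima (one batch each).
clients = [
    {"n": 2, "batches": [[[4.0, 0.0], [2.0, 0.0]]]},
    {"n": 1, "batches": [[[0.0, 6.0]]]},
    {"n": 1, "batches": [[[-2.0, -2.0]]]},
]
theta_final = cr_fl([0.0, 0.0], clients, quadratic_grad)
print(theta_final)
```

For this toy problem the minimizer of the weighted global objective is the weighted mean of the client data, and the loop converges to it despite the conflicting local optima. A neural network would replace `quadratic_grad` with backpropagated gradients of the cross-entropy loss.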

4.3. Computational Complexity and Scalability

The scalability of the proposed CR-FL framework is critical for its application in large-scale wind farms. The computational complexity per communication round can be broken down into two main components: local computation at the clients and communication with the server. The total complexity is $O(K \cdot E \cdot |D_{k}| \cdot C_{local} + K \cdot d)$, where K is the number of clients (turbines); E is the number of local epochs; $|D_{k}|$ is the size of the local dataset on client k; $C_{local}$ is the complexity of one forward-backward pass in the neural network, which is polynomial with respect to the model size and data dimension; and d is the number of model parameters to be communicated. Since each component is of polynomial order and, crucially, the local computation is performed in parallel across all K clients, the wall-clock time of local training does not scale linearly with K. This suggests that the framework is computationally efficient and scalable for real-time application in large-scale systems.
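To give a sense of scale for the communication term $K \cdot d$, the following back-of-the-envelope sketch counts the parameters of the 53-128-6 network and the resulting traffic for 10 clients over 50 rounds, the values used in the experiments. It assumes 4-byte (float32) parameters and that the full model is sent in both directions every round; neither assumption is stated in the source.

```python
def mlp_parameter_count(layer_sizes):
    """Weights plus biases of a fully connected network."""
    return sum(
        n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


def bytes_per_round(num_clients, num_params, bytes_per_param=4, both_directions=True):
    """Traffic of one round: every client exchanges the full parameter vector."""
    one_way = num_clients * num_params * bytes_per_param
    return 2 * one_way if both_directions else one_way


d = mlp_parameter_count([53, 128, 6])             # parameters of the 53-128-6 MLP
per_round = bytes_per_round(num_clients=10, num_params=d)
total = 50 * per_round                            # 50 communication rounds
print(d, per_round, total)
```

Under these assumptions the model has 7686 parameters, and the whole training run exchanges roughly 31 MB in total, which is consistent with the stated goal of keeping the network lightweight.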

The FedAvg aggregation rule used in the Aggregation step of Section 4.2 is:

$$θ_{t + 1} = \sum_{k = 1}^{K} \frac{n_{k}}{n} \cdot θ_{t}^{(k)}.$$

(7)

By meticulously designing these two pipelines, our federated setup directly models a realistic, privacy-preserving deployment scenario, enabling a direct and meaningful empirical comparison against a powerful, idealized centralized baseline.

5. Results and Discussion

5.1. Experimental Setup

To ensure the reproducibility of our results, we detail the experimental parameters for both the centralized and federated learning pipelines. The dataset was partitioned to simulate a distributed environment with $K = 10$ clients. Each client holds a unique subset of the data, creating a realistic non-IID scenario reflective of different turbines or farms. The key hyperparameters for both models are summarized in Table 3. All experiments were run for 50 rounds to ensure a fair comparison of their learning trajectories. The consensus regularization parameter $λ$ was set to 0.1 based on preliminary experiments to balance local adaptation and global consensus.

5.2. Experimental Analysis

The comparative performance of the centralized ML model and the FL framework was evaluated over 50 training rounds. The analysis reveals nuanced differences in their learning dynamics, convergence behavior, and ultimate diagnostic capabilities, which are discussed in detail below. All multi-class metrics are macro-averaged to ensure an unbiased assessment across the imbalanced fault categories.

In addition, to evaluate model performance, we use five standard metrics: Accuracy, Precision, Recall, F1-Score, and AUC. For a multiclass problem, these metrics are typically computed on a per-class basis and then macro-averaged. Precision, Recall, and F1-Score are defined as follows [37]:

$$\text{Precision} = \frac{TP}{TP + FP}$$

(8)

$$\text{Recall} = \frac{TP}{TP + FN}$$

(9)

$$\text{F1-Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

(10)

where TP, FP, and FN are the counts of true positives, false positives, and false negatives, respectively.
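Equations (8)–(10) and the macro-averaging step can be sketched as follows. The example uses hypothetical labels for a three-class toy problem with 0-indexed classes. It computes macro F1 as the mean of the per-class F1 scores and treats a zero denominator as a score of 0; both are common conventions, but they are assumptions about how the metrics were computed in this study.

```python
def confusion_counts(y_true, y_pred, num_classes):
    """Per-class true positives, false positives and false negatives."""
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the sample belongs elsewhere
            fn[t] += 1  # a sample of class t was missed
    return tp, fp, fn


def safe_div(num, den):
    """Return 0.0 when the denominator is zero (assumed convention)."""
    return num / den if den else 0.0


def macro_metrics(y_true, y_pred, num_classes):
    """Macro-averaged precision, recall and F1 (Equations (8)-(10))."""
    tp, fp, fn = confusion_counts(y_true, y_pred, num_classes)
    precisions = [safe_div(tp[c], tp[c] + fp[c]) for c in range(num_classes)]
    recalls = [safe_div(tp[c], tp[c] + fn[c]) for c in range(num_classes)]
    f1s = [safe_div(2 * p * r, p + r) for p, r in zip(precisions, recalls)]
    return {
        "precision": sum(precisions) / num_classes,
        "recall": sum(recalls) / num_classes,
        "f1": sum(f1s) / num_classes,
    }


y_true = [0, 0, 1, 1, 2, 2]  # hypothetical ground-truth labels
y_pred = [0, 1, 1, 1, 2, 0]  # hypothetical model predictions
metrics = macro_metrics(y_true, y_pred, num_classes=3)
print(metrics)
```

Because every class contributes equally to the macro average, rare fault classes influence the reported scores as much as the dominant normal-operation class.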

The learning curves for accuracy, presented in Figure 2, highlight the fundamental differences between the two training paradigms. The centralized LightGBM model, operating on a single, aggregated dataset, exhibits rapid and stable convergence, quickly reaching a performance plateau around 0.69. This behavior is characteristic of powerful gradient boosting models that can efficiently exploit the statistical patterns within a static, unified data distribution. However, this plateau also suggests a performance ceiling, where the model has extracted all available information from the aggregated data and is unable to improve further.

In stark contrast, the federated model’s accuracy exhibits greater volatility during the initial training rounds. This initial instability is an expected artifact of the federated averaging process operating on statistically heterogeneous (Non-IID) data partitions. Each client initially develops a model biased toward its local data, a phenomenon known as client drift. Averaging these disparate models can temporarily degrade performance. However, as the training progresses, the framework forces the local models to converge toward a shared feature representation that generalizes across all clients. This process of collaborative learning, while slower to stabilize, ultimately enables the FL model to build a more robust and powerful representation. Crucially, the FL model’s accuracy not only catches up to but ultimately surpasses the centralized baseline, reaching a peak of approximately 0.79. This demonstrates that by learning from diverse data sources while the averaging process inherently mitigates overfitting to any single client’s idiosyncrasies, FL can achieve superior generalization.

The superior generalization of the FL model is further substantiated by its performance on the Area Under the ROC Curve (AUC) metric, which is illustrated in Figure 2. This threshold-agnostic measure of a model’s discriminative power is particularly salient for imbalanced classification tasks. The analysis reveals that after an initial convergence period, the FL model’s AUC is consistently and significantly higher than that of the centralized model, approaching 0.95 compared to the latter’s 0.90. This result suggests that the global model learned via federated averaging develops a more principled and robust understanding of the feature space. We hypothesize that the implicit regularization effect of model averaging prevents the federated model from over-optimizing for the specific noise and artifacts present in the aggregated training data, unlike the centralized model. This leads to a more universally applicable decision function, making it inherently more reliable.
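To make the threshold-agnostic nature of AUC concrete, the following sketch computes a binary AUC as the fraction of (positive, negative) sample pairs that the model ranks correctly. It is a simplified illustration: the labels and scores are hypothetical, the function name is an assumption, and the way per-class AUC values are aggregated for the multiclass task in the experiments is not reproduced here.

```python
from typing import Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced example: 2 faults among 8 samples.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.35, 0.4, 0.8, 0.7, 0.9]
auc = binary_auc(y_true, y_score)
print(f"AUC = {auc:.4f}")
```

Only the ordering of the scores enters the computation, so any monotonic rescaling of the scores, and therefore any choice of decision threshold, leaves the value unchanged.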

Transitioning from generalization to practical effectiveness, the F1-score, depicted in Figure 3, synthesizes precision and recall into a single metric that reflects a model’s overall diagnostic utility. The centralized model’s F1-score plateaus in the 0.63–0.67 range, indicating a stable but limited performance. In contrast, the federated model exhibits a clear upward trajectory, culminating in a superior final score of approximately 0.70 in Figure 3 (Table 4 reports a final mean of 0.73 ± 0.03 over the 10 runs). While this margin may seem modest, it belies a more significant and advantageous recalibration of the precision-recall trade-off. The true importance of this improvement is therefore revealed by deconstructing the F1-score into its constituent components. This breakdown, presented in Figure 4, directly maps to the critical operational goals of maximizing fault detection sensitivity while minimizing false alarms.

In the context of wind turbine diagnostics, precision and recall represent a critical trade-off. High precision means that when the system issues a fault alert, it is highly likely to be a genuine fault, minimizing the cost of unnecessary inspections and downtime (low false positive rate). High recall means the system successfully identifies a large proportion of actual faults, preventing minor issues from escalating into catastrophic failures (low false negative rate).

To formally validate these findings, we performed an independent samples t-test on the final performance metrics from the 10 runs, with the results summarized in Table 4. The low p-values (typically p < 0.01) confirm that the observed superior performance of our CR-FL framework in all key metrics is statistically significant and not due to random chance. This provides strong evidence that the consensus-regularization mechanism leads to a tangibly more robust and accurate diagnostic model.
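As a rough plausibility check of the significance test, the sketch below recomputes t statistics from the rounded means and standard deviations reported in Table 4. It assumes a pooled-variance (Student) t-test with 10 runs per group, which gives 18 degrees of freedom; the text does not state which variant of the test was used, and the per-run values are not available here, so the results are approximate. The function name is hypothetical, and the critical values are standard two-sided values for 18 degrees of freedom.

```python
import math
from typing import Dict, Tuple


def pooled_t_statistic(mean_a: float, sd_a: float, mean_b: float, sd_b: float,
                       n: int) -> Tuple[float, int]:
    """Student's t statistic and degrees of freedom for two groups of equal size n."""
    pooled_var = (sd_a ** 2 + sd_b ** 2) / 2.0
    std_err = math.sqrt(pooled_var * 2.0 / n)
    return (mean_b - mean_a) / std_err, 2 * n - 2


# (centralized mean, centralized SD, federated mean, federated SD) from Table 4.
table4: Dict[str, Tuple[float, float, float, float]] = {
    "Accuracy": (0.69, 0.02, 0.79, 0.02),
    "AUC": (0.90, 0.01, 0.95, 0.01),
    "Precision": (0.63, 0.03, 0.75, 0.04),
    "Recall": (0.68, 0.02, 0.72, 0.03),
    "F1-Score": (0.65, 0.02, 0.73, 0.03),
}
N_RUNS = 10

# Two-sided critical values of Student's t distribution for 18 degrees of freedom.
T_CRIT_05 = 2.101   # p = 0.05
T_CRIT_001 = 3.922  # p = 0.001

t_stats = {name: pooled_t_statistic(*vals, n=N_RUNS)[0]
           for name, vals in table4.items()}
for name, t in t_stats.items():
    print(f"{name:10s} t = {t:6.2f}")
```

Under these assumptions every statistic exceeds the 0.05 critical value, and all except recall also exceed the 0.001 critical value, which is consistent with the p-value column of Table 4.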

The centralized model’s precision plateaus at a modest 0.63. This may be an artifact of training on an aggregated dataset where techniques like SMOTE can sometimes introduce artificial patterns, leading the model to make less confident predictions. In contrast, the federated model’s precision climbs to over 0.75. This remarkable improvement suggests that the consensus-based learning of FL filters out client-specific noise and retains only the most robust and reliable fault signatures. The resulting global model is “confident” in its predictions, issuing alerts only when there is strong evidence present across the learned patterns from all clients.

Simultaneously, the federated model’s recall trends upward, eventually surpassing the centralized baseline. This demonstrates that the collective knowledge of all clients enhances the model’s ability to recognize rare fault types. A fault pattern that is infrequent on one client may be more common on another; through federated averaging, this knowledge is shared, improving the global model’s sensitivity to a wider range of failure modes.

In synthesis, the empirical evidence strongly suggests that federated learning is not merely a privacy-preserving alternative but a methodologically advantageous approach for this problem domain. By forcing the model to generalize across heterogeneous data silos, it constructs a more robust, precise, and sensitive diagnostic tool than is achievable through simple data aggregation.

6. Conclusions

This paper investigated the efficacy of a federated learning framework for multiclass fault diagnosis in wind turbines, addressing the critical challenges of data privacy and statistical heterogeneity. Relative to the centralized LightGBM baseline, the proposed CR-FL model demonstrated a statistically significant improvement in both AUC and macro-averaged precision, reaching a final mean AUC of 0.95 compared to the baseline’s 0.90, and a final mean precision of 0.75 versus the baseline’s 0.63. This outcome suggests that the collaborative aggregation of model parameters in FL functions as an effective regularization technique. This process compensates for localized data limitations and guides the model towards a more robust and generalizable global solution, particularly adept at handling the class imbalance and diverse operational conditions inherent in real-world wind farm data. By obviating the need for raw data transfer, the FL framework provides a practical and scalable solution to the data privacy and security challenges prevalent in multi-vendor environments.

Author Contributions

Conceptualization, J.Z. and H.Z.; methodology, J.Z. and L.L.; software, L.L.; validation, L.L. and Q.P.; formal analysis, L.L. and J.Z.; investigation, L.L. and Q.Z.; resources, H.Z. and Q.Z.; data curation, L.L.; writing—original draft preparation, L.L.; writing—review and editing, J.Z. and L.L.; visualization, L.L.; supervision, J.Z.; project administration, J.Z. and Q.Z.; funding acquisition, J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the project “Research on Comprehensive Security Smart Perception and Assessment in Urban-Rural Integration” (Grant No. KJZD-M202302501).

Data Availability Statement

The dataset utilized in this study is publicly and openly available on Kaggle at the following URL: https://www.kaggle.com/datasets/wasuratme96/iiot-data-of-wind-turbine (accessed on 12 March 2025).

Acknowledgments

The authors wish to acknowledge the computational resources and academic environment provided by the Chongqing Metropolitan College of Science and Technology and the Beijing University of Posts and Telecommunications, which were instrumental in the completion of this work.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. Pandit, R.; Astolfi, D.; Hong, J.; Infield, D.; Santos, M. Scada data for wind turbine data-driven condition/performance monitoring: A review on state-of-art, challenges and future trends. Wind. Eng. 2023, 47, 422–441.
2. Boyer, S.A. SCADA: Supervisory Control and Data Acquisition; International Society of Automation: Research Triangle Park, NC, USA, 2009.
3. Aziz, U.; Charbonnier, S.; Bérenguer, C.; Lebranchu, A.; Prevost, F. Critical comparison of power-based wind turbine fault-detection methods using a realistic framework for scada data simulation. Renew. Sustain. Energy Rev. 2021, 144, 110961.
4. Maldonado-Correa, J.; Torres-Cabrera, J.; Martín-Martínez, S.; Artigao, E.; Gómez-Lázaro, E. Wind turbine fault detection based on the transformer model using scada data. Eng. Fail. Anal. 2024, 162, 108354.
5. Malakouti, S.M. Prediction of wind speed and power with lightgbm and grid search: Case study based on scada system in Turkey. Int. J. Energy Prod. Manag. 2023, 8, 35–40.
6. Liu, J.; Wang, X.; Wu, S.; Wan, L.; Xie, F. Wind turbine fault detection based on deep residual networks. Expert Syst. Appl. 2023, 213, 119102.
7. Elshenawy, L.M.; Gafar, A.A.; Awad, H.A.; AbouOmar, M.S. Fault detection of wind turbine system based on data-driven methods: A comparative study. Neural Comput. Appl. 2024, 36, 10279–10296.
8. Wang, T.; Yin, L. A hybrid 3dse-cnn-2dlstm model for compound fault detection of wind turbines. Expert Syst. Appl. 2024, 242, 122776.
9. Rahman, A.; Iqbal, A.; Ahmed, E.; Ontor, M.R.H. Privacy-preserving machine learning: Techniques, challenges, and future directions in safeguarding personal data management. Frontline Mark. Manag. Econ. J. 2024, 4, 84–106.
10. Murdoch, B. Privacy and artificial intelligence: Challenges for protecting health information in a new era. BMC Med. Ethics 2021, 22, 122.
11. Bharadiya, J. Machine learning in cybersecurity: Techniques and challenges. Eur. J. Technol. 2023, 7, 1–14.
12. Jiang, G.; Zhao, K.; Liu, X.; Cheng, X.; Xie, P. A federated learning framework for cloud-edge collaborative fault diagnosis of wind turbines. IEEE Internet Things J. 2024, 11, 23170–23185.
13. Jenkel, L.; Jonas, S.; Meyer, A. Privacy-preserving fleet-wide learning of wind turbine conditions with federated learning. Energies 2023, 16, 6377.
14. Khan, L.U.; Saad, W.; Han, Z.; Hossain, E.; Hong, C.S. Federated learning for internet of things: Recent advances, taxonomy, and open challenges. IEEE Commun. Surv. Tutor. 2021, 23, 1759–1799.
15. Porté-Agel, F.; Bastankhah, M.; Shamsoddin, S. Wind-turbine and wind-farm flows: A review. Bound. Layer Meteorol. 2020, 174, 1–59.
16. Nguyen, D.C.; Ding, M.; Pathirana, P.N.; Seneviratne, A.; Li, J.; Poor, H.V. Federated learning for internet of things: A comprehensive survey. IEEE Commun. Surv. Tutor. 2021, 23, 1622–1658.
17. Imteaj, A.; Thakker, U.; Wang, S.; Li, J.; Amini, M.H. A survey on federated learning for resource-constrained iot devices. IEEE Internet Things J. 2021, 9, 1–24.
18. Martínez-Luengo, M.; Kolios, A.; Wang, L. Fault detection in wind turbines: A comparative study of SVM and GA-SVM techniques. AIMS Energy 2019, 7, 506–522.
19. Zhang, D.; Qian, L.; Mao, B.; Huang, C.; Huang, B.; Si, Y. A data-driven design for fault detection of wind turbines using random forests and XGboost. IEEE Access 2019, 7, 167287–167300.
20. Karamolegkos, N.; Koutroulis, E.; Kourgiantakis, M. A Wind Turbine Fault Diagnosis System Based on Long Short-Term Memory Networks. Energies 2021, 14, 6451.
21. Yu, H.; Ma, L.; Dai, J.; Zhao, Y. Intelligent Fault Diagnosis of Wind Turbine Gearbox Based on a Novel Convolutional Neural Network. IEEE Access 2021, 9, 44869–44878.
22. Zhao, L.; Wang, X.; Li, Y. Graph Neural Network Based Fault Diagnosis for Wind Turbine Gearboxes. IEEE Trans. Instrum. Meas. 2022, 71, 1–10.
23. Li, L.; Fan, Y.; Tse, M.; Lin, K.Y. A review of applications in federated learning. Comput. Ind. Eng. 2020, 149, 106854.
24. Saputra, Y.M.; Hoang, D.T.; Nguyen, D.N.; Dutkiewicz, E.; Mueck, M.; Srikanteswara, S. Energy demand prediction for smart homes with federated learning. In Proceedings of the 2019 IEEE Global Communications Conference (GLOBECOM), Big Island, HI, USA, 9–13 December 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–6.
25. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), Ft. Lauderdale, FL, USA, 20–22 April 2017; PMLR: New York, NY, USA, 2017; pp. 1273–1282.
26. Li, X.; Huang, K.; Yang, W.; Wang, S.; Zhang, Z. On the convergence of FedAvg on non-IID data. In Proceedings of the 8th International Conference on Learning Representations (ICLR), Addis Ababa, Ethiopia, 26–30 April 2020.
27. Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated optimization in heterogeneous networks. In Proceedings of the 3rd Conference on Machine Learning and Systems (MLSys), Austin, TX, USA, 2–4 March 2020.
28. Karimireddy, S.P.; Kale, S.; Mohri, M.; Reddi, S.; Stich, S.; Suresh, A.T. SCAFFOLD: Stochastic controlled averaging for federated learning. In Proceedings of the 37th International Conference on Machine Learning (ICML), Online, 13–18 July 2020; PMLR: New York, NY, USA, 2020; pp. 5132–5143.
29. Doostmohammadian, M.; Qureshi, M.I.; Khalesi, M.H.; Rabiee, H.R.; Khan, U.A. Log-Scale Quantization in Distributed First-Order Methods: Gradient-Based Learning From Distributed Data. IEEE Trans. Autom. Sci. Eng. 2025, 22, 10948–10959.
30. Tang, M.; Meng, C.; Wu, H.; Zhu, H.; Yi, J.; Tang, J.; Wang, Y. Fault detection for wind turbine blade bolts based on gsg combined with cs-lightgbm. Sensors 2022, 22, 6763.
31. Liu, X.; Fan, P.; Yang, J.; Ke, S.; Ma, B.; Pei, Y.; Xu, J. Equivalent method for dfig wind farms based on modified lightgbm considering voltage deep drop faults. Int. J. Electr. Power Energy Syst. 2025, 164, 110451.
32. Xiang, B.; Liu, Z.; Huang, L.; Qin, M. Based on the wsp-optuna-lightgbm model for wind power prediction. J. Phys. Conf. Ser. 2024, 2835, 012011.
33. Dong, X.; Miao, Z.; Li, Y.; Zhou, H.; Li, W. One data-driven vibration acceleration prediction method for offshore wind turbine structures based on extreme gradient boosting. Ocean. Eng. 2024, 307, 118176.
34. Tang, M.; Peng, Z.; Wu, H. Fault detection for pitch system of wind turbine-driven doubly fed based on ihho-lightgbm. Appl. Sci. 2021, 11, 8030.
35. Yan, Z.; Zhang, L. Interpretable wind power prediction: A machine learning perspective using lightgbm and shap. In Proceedings of the 2024 2nd International Conference on Artificial Intelligence and Automation Control (AIAC), Guangzhou, China, 20–22 December 2024; IEEE: New York, NY, USA, 2024; pp. 225–229.
36. Xian, Q.; Feng, S.; Yang, Y.; Liu, J. Construction of wind farm load combination forecasting model based on gbdt, lightgbm and rf. In Proceedings of the 2024 IEEE 6th International Conference on Power, Intelligent Computing and Systems (ICPICS), Shenyang, China, 29–31 August 2024; IEEE: New York, NY, USA, 2024; pp. 1305–1310.
37. Mousaei, A.; Naderi, Y.; Bayram, I.S. Advancing State of Charge Management in Electric Vehicles with Machine Learning: A Technological Review. IEEE Access 2024, 12, 43255–43283.

Figure 2. Comparison of Accuracy and AUC for the FL and centralized ML models over 50 training rounds. The FL model demonstrates superior final accuracy and consistently higher AUC.

Figure 3. Comparison of the macro-averaged F1-score for the FL and ML models. The FL model’s F1-score improves to eventually surpass the stable performance of the centralized model.

Figure 4. Comparison of macro-averaged Precision and Recall. The FL model achieves significantly higher precision and shows an upward trend in recall, surpassing the centralized model in later rounds.

Table 1. Table of Abbreviations.

Abbreviation	Full Term
FL	Federated Learning
CR-FL	Consensus-Regularized Federated Learning
SCADA	Supervisory Control and Data Acquisition
ML	Machine Learning
NN	Neural Network
LightGBM	Light Gradient Boosting Machine
Non-IID	Non-Independent and Identically Distributed
AUC	Area Under the Receiver Operating Characteristic Curve
ReLU	Rectified Linear Unit

Table 2. Methodological Comparison with Related Works.

Study	Methodology	Model	Key Contribution
Liu et al. [6]	Centralized	Deep Residual Network	Improved feature extraction for fault detection.
Wang et al. [8]	Centralized	Hybrid 3D-CNN-LSTM	Captured spatio-temporal features for compound faults.
Jiang et al. [12]	Federated	CNN	Cloud-edge collaborative FL framework.
Ours	Federated (CR-FL)	Lightweight NN	Proves FL’s generalization superiority over a strong centralized baseline via consensus regularization.

Table 3. Hyperparameters for Centralized and Federated Models.

Parameter	Centralized (LightGBM)	Federated (Neural Network)
Model Architecture	Gradient Boosting Tree	MLP (53-128-6)
Training Rounds	50 Boosting Rounds	50 Communication Rounds
Learning Rate ( $η$ )	0.1	0.01
Optimizer	-	Adam
Local Epochs (E)	-	1
Number of Clients (K)	-	10
Activation (Hidden)	-	ReLU
Activation (Output)	-	Softmax (via Cross-Entropy)
Batch Size (Local)	-	32
Consensus Regularization ( $λ$ )	-	0.1

Table 4. Final Performance Comparison (Mean ± SD over 10 runs).

Metric	Centralized (LightGBM)	Federated (CR-FL)	p-Value
Accuracy	0.69 ± 0.02	0.79 ± 0.02	<0.001
AUC	0.90 ± 0.01	0.95 ± 0.01	<0.001
Precision	0.63 ± 0.03	0.75 ± 0.04	<0.001
Recall	0.68 ± 0.02	0.72 ± 0.03	<0.05
F1-Score	0.65 ± 0.02	0.73 ± 0.03	<0.001

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
