Article

Research on the Application of Federated Learning Based on CG-WGAN in Gout Staging Prediction

School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China
*
Author to whom correspondence should be addressed.
Computers 2025, 14(11), 455; https://doi.org/10.3390/computers14110455
Submission received: 27 August 2025 / Revised: 20 September 2025 / Accepted: 25 September 2025 / Published: 23 October 2025
(This article belongs to the Special Issue Mobile Fog and Edge Computing)

Abstract

Traditional federated learning frameworks face significant challenges posed by non-independent and identically distributed (non-IID) data in the healthcare domain, particularly in multi-institutional collaborative gout staging prediction. Differences in patient population characteristics, distributions of clinical indicators, and proportions of disease stages across hospitals lead to inefficient model training, increased category prediction bias, and heightened risks of privacy leakage. In the context of gout staging prediction, these issues result in decreased classification accuracy and recall, especially when dealing with minority classes. To address these challenges, this paper proposes FedCG-WGAN, a federated learning method based on conditional gradient penalization in Wasserstein GAN (CG-WGAN). By incorporating conditional information from gout staging labels and optimizing the gradient penalty mechanism, this method generates high-quality synthetic medical data, effectively mitigating the non-IID problem among clients. Building upon the synthetic data, a federated architecture is further introduced, which replaces traditional parameter aggregation with synthetic data sharing. This enables each client to design personalized prediction models tailored to their local data characteristics, thereby preserving the privacy of original data and avoiding the risk of information leakage caused by reverse engineering of model parameters. Experimental results on a real-world dataset comprising 51,127 medical records demonstrate that the proposed FedCG-WGAN significantly outperforms baseline models, achieving up to a 7.1% improvement in accuracy. Furthermore, by maintaining the composite quality score of the generated data between 0.85 and 0.88, the method achieves a favorable balance between privacy preservation and model utility.

1. Introduction

Gout is a common form of inflammatory arthritis, and accurate disease staging is essential for developing effective treatment plans. In traditional clinical practice, gout staging is typically determined through specialized diagnostic tests conducted after patients present with painful symptoms [1,2]. In recent years, machine learning techniques have made notable advances in gout staging prediction. For example, XGBoost-based models have been employed to assess disease risk using features such as serum uric acid levels [3], while other studies have leveraged natural language processing techniques to identify gout attacks from electronic medical records [4]. However, these approaches generally face two fundamental challenges.
On the one hand, the inherent non-independent and identically distributed (non-IID) nature of medical data significantly limits the generalization capability of machine learning models. Existing studies have explored three primary directions to address this issue: data rebalancing, client selection, and personalized modeling. Pu Tian et al. proposed alleviating distributional discrepancies by sharing a small portion of real data [5]; however, this approach poses significant privacy risks. Wang et al. introduced the FedACS framework, which optimizes client participation strategies through client-side filtering [6]. Nevertheless, its static selection mechanism struggles to adapt to the dynamically evolving distributions common in healthcare data. Jeong et al. [7] proposed the FAug method, which leverages Generative Adversarial Networks for data augmentation, but the quality and diversity of the generated samples remain suboptimal. Overall, these methods still face critical limitations, including potential privacy leakage and low-quality synthetic data, when addressing complex non-IID challenges in medical scenarios.
On the other hand, privacy concerns in cross-institutional data sharing pose significant challenges to multicenter collaborative research. Traditional federated learning mitigates privacy risks by aggregating model parameters instead of sharing raw data [8]. However, recent studies have demonstrated that model parameters can still leak sensitive information through reverse engineering techniques [9]. To further strengthen privacy protection, researchers have explored advanced methods such as differential privacy [10] and secure multi-party computation [11]. Nonetheless, these approaches often come at the cost of reduced model performance. This trade-off is particularly problematic for tasks like gout staging prediction, which demand fine-grained feature representations. As a result, existing privacy-preserving techniques struggle to achieve an optimal balance between security and practical utility in such settings.
This paper proposes a federated learning approach based on a Conditional Gradient Penalty Wasserstein Generative Adversarial Network (CG-WGAN) to address the aforementioned challenges. The main contributions are summarized as follows:
  • This paper proposes a Conditional Gradient Penalty Wasserstein Generative Adversarial Network (CG-WGAN), which incorporates an optimized gradient penalty mechanism by introducing gout-specific label condition information and feature-aware noise perturbations. This design enables the generation of high-quality medical tabular data and significantly alleviates the issue of category distribution bias caused by non-IID data.
  • This paper designs a FedCG-WGAN federated learning framework that avoids the transmission of raw data or model parameters by employing synthetic data sharing instead of traditional parameter aggregation. This approach enables clients to train local personalized prediction models while preserving data privacy and enhancing security.
  • Experiments conducted on real-world gout medical records and the MIMIC-III dataset demonstrate that the proposed method improves prediction accuracy by 6.0%, while reducing training time by 32% and communication overhead by 40%, thereby validating its effectiveness and practical applicability.

2. Related Works

2.1. Existing Approaches to Gout Prediction

Accurate staging of gout is essential for guiding treatment decisions. However, traditional diagnostic methods heavily rely on clinical symptom assessment, joint fluid analysis, radiographic imaging, and other factors [12,13]. These approaches are often time-consuming, invasive, and dependent on patient-reported symptoms, which may result in diagnostic delays or misdiagnoses, particularly in early-stage or atypical cases.
In recent years, machine learning has been extensively applied to the prediction of gout staging. For instance, Brikman et al. [3] developed an XGBoost model using a national cohort to predict the risk of gout development in patients with hyperuricemia based on features such as serum uric acid levels, age, and comorbidities. Zheng et al. [4] employed natural language processing (NLP) and machine learning techniques to identify gout flares from electronic medical records. Lei et al. [14] constructed a logistic regression model to predict the risk of tophus formation in gout patients and integrated it with the SHAP method to enable personalized risk assessment. However, these studies were limited to single-center datasets and did not address privacy concerns associated with multi-institutional collaborations.
In addition, Cüre et al. [15] employed a machine learning model to analyze factors associated with renal impairment in patients with gouty arthritis, identifying urea, creatinine, and hemoglobin levels as key predictors. Xiao et al. [16], on the other hand, developed an interpretable machine learning model for gout diagnosis based on clinical data and ultrasound features. These studies have expanded the application of machine learning in gout-related prediction from the perspectives of renal function correlation and multimodal data integration, respectively. However, none of them specifically addressed the prediction of gout staging.
Although existing studies have made notable progress in gout prediction, there remains a need for more comprehensive prediction frameworks for gout staging that consider data privacy concerns and support cross-institutional collaboration.

2.2. Applications of Federated Learning in Healthcare

In federated learning, handling non-IID data remains a major challenge, particularly in healthcare scenarios. For instance, different hospitals may specialize in distinct diseases and use different devices for data collection, resulting in heterogeneous data distributions. Meanwhile, data privacy and security are of paramount importance in the medical domain. Therefore, it is essential to develop effective federated learning approaches capable of addressing the non-IID issue without requiring the sharing of raw data.
Pu Tian et al. [5] and Zhao et al. [17] proposed methods to address the non-IID data problem by sharing a small amount of real data among clients. Although these approaches improve model accuracy, they introduce potential privacy risks due to the exchange of real data. To mitigate the issue of uneven class distribution, Wang et al. [6] proposed FedACS, a client selection framework that filters participants based on data quality and diversity. Similarly, Taïk et al. developed a scheduling algorithm that prioritizes devices with high-quality datasets. While these strategies enhance client participation through selective data utilization, they may inadvertently exacerbate the model drift problem caused by substantial heterogeneity in client data distributions. To address label imbalance in non-IID settings, Jeong et al. [7] introduced FAug, a method that employs a Generative Adversarial Network (GAN) to generate missing labels. However, the inherent instability of traditional GAN training often results in limited sample diversity in the synthetic data produced by FAug.
In summary, although existing approaches have made notable progress in addressing the non-IID data challenge and enhancing model performance, they continue to face significant limitations, including potential risks of privacy leakage and a lack of sufficient adaptability to dynamic data environments.

2.3. Applications of Generative Adversarial Networks in Healthcare

Generative Adversarial Networks (GANs), as a powerful class of generative models, have been widely employed in data generation tasks in recent years. However, traditional GAN architectures face considerable challenges when processing structured data, such as clinical electronic medical records—particularly in generating complex data types and in managing imbalanced datasets.
In recent years, several studies have proposed GAN-based architectures specifically tailored for applications in the medical domain. For instance, MedGAN [18] integrates adversarial training with non-adversarial loss functions to enable high-quality image translation between source and target domains. Similarly, Fabbri et al. [19] combine conditional GAN (cGAN) with Wasserstein GAN (WGAN) to facilitate directed image modification and latent space exploration. Their approach enhances the stability of GAN training by extending the training duration and is primarily designed for medical image synthesis. While these methods demonstrate strong performance in generating synthetic image data, they remain limited in their ability to handle tabular data effectively.
A.H. Aziira et al. [20] proposed a method for synthesizing continuous numerical data using a GAN, where the quality of the generated data is evaluated by comparing the classification accuracy between real and synthetic datasets using the XGBoost algorithm. GS-WGAN [21] introduced a gradient-sanitized variant of the Wasserstein GAN, which enables generator training with differential privacy guarantees and can be naturally extended to both centralized and federated data settings. CTAB-GAN [22] further addresses challenges such as long-tailed distributions and data imbalance by incorporating mixed-type encoders and conditional vectors. However, these GAN-based models exhibit notable limitations. Specifically, they are unable to generate class-specific samples for designated variables and face difficulties when handling mixed-type data that includes both categorical and continuous attributes, as well as imbalanced class distributions.
cGAN plays a critical role in addressing data imbalance by enabling targeted control over the class labels of generated samples. For instance, CTGAN [23] mitigates imbalance issues by introducing a conditional generator that models complex data distributions during synthetic data generation. Similarly, CW-GAN [24] utilizes conditional vectors to oversample minority classes and improves generation quality by optimizing the Wasserstein distance. Although these methods enhance the ability of GAN-based generators to control class-specific outputs, they still demonstrate suboptimal performance when applied to electronic medical records (EMRs), particularly due to the inherent complexity of mixed-type data.
In summary, although existing GAN-based approaches have achieved notable progress in image synthesis and the generation of partially structured data, significant challenges remain when applying these methods to complex tabular data, such as clinical EMRs.

3. Method

To address the challenges arising from the non-IID nature of client data in federated learning, such as category distribution bias, slow model convergence, and concerns regarding privacy protection, this study proposes a federated learning framework named FedCG-WGAN, which is based on the Conditional Gradient Penalty Wasserstein GAN (CG-WGAN). As illustrated in Figure 1, the proposed method is implemented in three phases. First, each client employs CG-WGAN to generate synthetic samples that reflect the statistical characteristics of its local data. Second, a central server aggregates high-quality synthetic data and redistributes it according to the specific requirements of individual clients. Finally, each client combines the received synthetic data with its original dataset to train a personalized predictive model. By replacing the traditional parameter aggregation mechanism with synthetic data sharing, this architecture alleviates the challenges posed by non-IID data distributions while preserving data privacy.

3.1. Improved Generative Adversarial Network CG-WGAN

Generative Adversarial Networks (GANs) are designed to generate realistic synthetic data by training a generator G and a discriminator D in an adversarial framework [25]. The objective function is defined as shown in Formula (1).
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_r}[\log D(x)] + \mathbb{E}_{z \sim P_z}[\log(1 - D(G(z)))] \tag{1}$$
where $x$ denotes a real data sample, $z$ represents a random noise vector input to the generator, and $G(z)$ is the synthetic data sample generated from $z$. The term $D(x)$ indicates the probability assigned by the discriminator that $x$ is a real sample. The notation $x \sim P_r$ denotes samples drawn from the real data distribution, while $z \sim P_z$ indicates samples drawn from the prior distribution over the latent space.
The Conditional Generative Adversarial Network (cGAN) extends the original GAN framework by incorporating an additional condition variable c (e.g., class labels or diagnostic categories) [26], enabling both the generator and the discriminator to model conditional distributions based on c. The objective function of cGAN is defined as shown in Formula (2).
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_r}[\log D(x \mid c)] + \mathbb{E}_{z \sim P_z}[\log(1 - D(G(z \mid c) \mid c))] \tag{2}$$
where $G(z \mid c)$ denotes the sample generated by the generator conditioned on $c$, and $D(x \mid c)$ represents the probability that the discriminator assigns to sample $x$ being real, given the condition $c$.
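One common way to realize the conditioning on $c$ is to append a one-hot encoding of the label to the generator's noise vector. The sketch below assumes this concatenation scheme, an 8-dimensional noise vector, and four hypothetical stage labels; the text does not specify how $c$ is encoded, so all of these are illustrative choices.

```python
import random


def one_hot(label, num_classes):
    """Return a one-hot vector for an integer class label."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def conditioned_input(z, label, num_classes):
    """Concatenate noise z with the one-hot condition c (assumed encoding)."""
    return list(z) + one_hot(label, num_classes)


rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(8)]          # noise vector z ~ P_z
g_in = conditioned_input(z, label=2, num_classes=4)  # 4 stages is a made-up count
```

The same concatenation can be applied to the discriminator input, so that both networks see the label alongside the sample.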
Although cGAN introduces class-conditional control, it still adopts the cross-entropy loss of the original GAN, which can lead to vanishing gradients and mode collapse during training. Wasserstein GAN (WGAN) instead replaces the cross-entropy with the Wasserstein distance, which quantifies the discrepancy between the real data distribution $P_r$ and the generated data distribution $P_g$ [27]. The corresponding objective function is expressed as Formula (3).
$$\max_{D \in \mathcal{D}_1} \; \mathbb{E}_{x \sim P_r}[D(x)] - \mathbb{E}_{\tilde{x} \sim P_g}[D(\tilde{x})] \tag{3}$$
where $\mathcal{D}_1$ denotes the set of functions that satisfy the 1-Lipschitz condition, and $\tilde{x} \sim P_g$ represents samples drawn from the generator.
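To give a feel for the quantity that Formula (3) estimates, the following sketch computes the Wasserstein-1 distance between two equal-sized one-dimensional samples, where it reduces to the mean absolute difference of the sorted values. This closed form holds only in one dimension, and the sample values are made up for illustration.

```python
def wasserstein_1d(xs, ys):
    """Empirical Wasserstein-1 distance between two equal-sized 1-D samples."""
    if len(xs) != len(ys):
        raise ValueError("samples must have the same size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


real = [5.0, 6.0, 7.0, 8.0]        # made-up "real" values
fake_far = [1.0, 2.0, 3.0, 4.0]    # generator output far from the data
fake_near = [5.5, 6.5, 7.5, 8.5]   # generator output close to the data

d_far = wasserstein_1d(real, fake_far)    # 4.0
d_near = wasserstein_1d(real, fake_near)  # 0.5
```

Unlike a saturating cross-entropy signal, the distance keeps shrinking smoothly as the generated sample moves toward the real one, which is the property the WGAN loss relies on.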
To ensure that the discriminator D lies within the specified function space, traditional WGAN enforces the Lipschitz continuity constraint through weight clipping [28,29]. However, this strategy often limits the expressive capacity of the model. To address this, WGAN-GP further introduces a gradient penalty term [30], which penalizes the deviation of the gradient norm at interpolated samples from 1, thereby achieving the 1-Lipschitz constraint more smoothly. The gradient penalty term is defined as Formula (4).
$$L_{gp} = \lambda \cdot \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\left[\left(\left\|\nabla_{\hat{x}} D(\hat{x}, c)\right\|_2 - 1\right)^2\right] \tag{4}$$
where $\lambda$ is the gradient penalty coefficient, $\hat{x}$ denotes a randomly interpolated sample between real and generated samples, and $\nabla_{\hat{x}} D(\hat{x}, c)$ represents the gradient of the discriminator with respect to the interpolated sample $\hat{x}$.
However, traditional cGAN and WGAN-GP still encounter significant challenges during training [31,32,33]. In particular, the loss functions of the generator and discriminator often fail to effectively guide the training process, resulting in limited diversity and quality of the generated samples. To address these issues, this study proposes an improved model, CG-WGAN, which incorporates gout staging labels as conditional constraints, extending the conventional cGAN and WGAN-GP frameworks. Additionally, the computation of the gradient penalty term in the discriminator is optimized, and a feature-aware perturbation term is introduced into the generator’s input noise. These enhancements collectively improve both the realism and diversity of the generated samples. The overall architecture of CG-WGAN is illustrated in Figure 2.
The overall objective function of CG-WGAN is defined as Formula (5).
$$\min_G \max_D \; \mathbb{E}_{x \sim P_r}[D(x, c)] - \mathbb{E}_{z \sim P_z}[D(G(z, c), c)] + \lambda \cdot L_{gp} \tag{5}$$
where $c$ denotes the conditional information (i.e., gout staging labels), $G(z, c)$ represents the conditionally generated samples, $D(x, c)$ is the conditional discriminator function, and $\lambda$ is the weighting coefficient for the gradient penalty term.
The generator network architecture consists of multiple residual blocks, each comprising a linear layer and a ReLU activation function. This design enhances training stability and convergence speed, while residual connections help mitigate the vanishing gradient problem. The output layer of the generator is a linear layer that maps the hidden representation to the target data dimension. Depending on the feature type, different activation functions are applied at the generator output: continuous features use the tanh function, while categorical features are processed via Gumbel-Softmax to preserve differentiability during backpropagation through discrete variables. The loss function of the generator is defined as Formula (6).
$$L_G = -\mathbb{E}_{z \sim P_z}\left[D(G(z, c), c)\right] \tag{6}$$
where $G(z, c)$ denotes the sample generated under condition $c$, and $D(G(z, c), c)$ represents the discriminator's output for the generated sample. The negative sign indicates that the generator aims to maximize the discriminator's output, thereby "fooling" the discriminator.
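The output activations described for the generator can be sketched in plain Python as follows. The sketch assumes a single categorical feature occupying the trailing slots of the hidden vector and a temperature `tau` of 0.5; both are hypothetical choices, and a real implementation would typically rely on a deep learning framework rather than hand-written sampling.

```python
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Draw a relaxed one-hot sample from categorical logits."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(u)) with u ~ U(0, 1);
        # u is kept away from 0 and 1 so both logarithms stay finite.
        u = rng.uniform(1e-12, 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / tau)
    peak = max(noisy)                           # subtract the max for stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def output_head(hidden, n_continuous, tau=0.5, rng=None):
    """Apply tanh to the continuous slots and Gumbel-Softmax to the rest."""
    rng = rng or random.Random()
    continuous = [math.tanh(h) for h in hidden[:n_continuous]]
    categorical = gumbel_softmax(hidden[n_continuous:], tau, rng)
    return continuous + categorical


row = output_head([0.3, -1.2, 2.0, 0.1, -0.5], n_continuous=2,
                  rng=random.Random(0))
```

Lower temperatures push the categorical part closer to a hard one-hot vector, at the cost of noisier gradients.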
The discriminator is composed of multiple fully connected layers. It takes either generated or real samples as input. The hidden layers employ ReLU activation functions, while the output layer uses a Sigmoid activation function to produce a probability score indicating the authenticity of the input data. The loss function of the discriminator is defined as Formula (7).
$$L_D = \mathbb{E}_{z \sim P_z}\left[D(G(z, c), c)\right] - \mathbb{E}_{x \sim P_r}\left[D(x, c)\right] + \lambda \cdot L_{gp} \tag{7}$$
To satisfy the 1-Lipschitz constraint required by the Wasserstein distance, a gradient penalty is introduced by interpolating between real and generated samples, as defined in Formula (8).
$$\hat{x} = \epsilon \cdot x + (1 - \epsilon) \cdot \tilde{x}, \quad \epsilon \sim U(0, 1) \tag{8}$$
where $\epsilon$ is a randomly sampled weighting coefficient from a uniform distribution $U(0, 1)$, which determines the interpolation between the real sample $x$ and the generated sample $\tilde{x}$.
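The interpolation and the penalty of Formula (4) can be illustrated with a toy critic whose input gradient is known in closed form. The critic $D(x, c) = \tfrac{1}{2}\sum_j w_j x_j^2 + b_c$ used here is an assumption made purely so that the example needs no automatic differentiation; it is not the network described in this paper, and the feature-aware perturbation is not modelled.

```python
import math
import random


def critic_grad_x(x_hat, weights):
    """Input gradient of the toy critic D(x, c) = 0.5 * sum(w_j * x_j**2) + b_c.

    The condition-dependent bias b_c does not depend on x, so it drops out.
    """
    return [w * x for w, x in zip(weights, x_hat)]


def gradient_penalty(real_batch, fake_batch, weights, lam, rng):
    """Monte Carlo estimate of the gradient penalty over paired samples."""
    total = 0.0
    for x, x_tilde in zip(real_batch, fake_batch):
        eps = rng.random()                                  # eps ~ U(0, 1)
        x_hat = [eps * a + (1.0 - eps) * b for a, b in zip(x, x_tilde)]
        grad = critic_grad_x(x_hat, weights)
        norm = math.sqrt(sum(g * g for g in grad))          # L2 gradient norm
        total += (norm - 1.0) ** 2
    return lam * total / len(real_batch)


rng = random.Random(0)
# Gradient norm is 1 at x_hat = [1, 0], so the penalty vanishes.
gp_zero = gradient_penalty([[1.0, 0.0]], [[1.0, 0.0]], [1.0, 1.0], 10.0, rng)
# Gradient norm is 3 at the same point, so the penalty is 10 * (3 - 1)**2.
gp_big = gradient_penalty([[1.0, 0.0]], [[1.0, 0.0]], [3.0, 1.0], 10.0, rng)
```

With a neural critic, the call to `critic_grad_x` would be replaced by a framework's automatic differentiation of the critic output with respect to $\hat{x}$.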
Residual structures improve gradient propagation and alleviate the vanishing gradient problem, thereby ensuring stable training. The Gumbel-Softmax technique provides a differentiable approximation for discrete features, allowing effective backpropagation when learning classification variables.
In contrast to CWGAN-GP [34], which does not explicitly address the challenge of non-IID data in federated learning, our method further refines the computation of gradient penalties to better capture the characteristics of medical tabular data. Specifically, we introduce feature-aware noise perturbations during interpolation, which align the gradient penalty direction with the principal components of the actual data distribution. This refinement proves particularly effective for handling mixed data with discontinuities and sparsity, enabling the generator to more efficiently learn the joint distribution of categorical and continuous variables.
During each training iteration, real samples along with their corresponding labels are randomly drawn from the dataset to train the discriminator, which learns to distinguish between real and generated data. Simultaneously, random noise is combined with conditional information—such as gout staging labels—and input into the generator to produce synthetic samples. Training is terminated after a predefined number of iterations to prevent overfitting and to ensure the generation of high-quality data. The pseudo-code for the CG-WGAN framework based on this design is presented in Algorithm 1.
Algorithm 1 CG-WGAN training algorithm.
Require: Generator parameters $G_0$, discriminator parameters $D_0$, gradient penalty coefficient $\lambda$, number of critic steps $n_{critic}$
Ensure: Updated generator $G_\theta$ and discriminator $D_\phi$
Initialize $G \leftarrow G_0$, $D \leftarrow D_0$
Set hyperparameters: epochs $n_{epochs}$, batch size $m$, learning rates $lr_G$, $lr_D$
while not converged do
    for $t = 1$ to $n_{epochs}$ do
        for $i = 1$ to $m$ do
            Sample real data $(x, c) \sim P_r$
            Sample noise $z \sim P_z$, generate fake data $\tilde{x} = G(z, c)$
            Sample $\epsilon \sim U(0, 1)$
            Interpolate: $\hat{x} = \epsilon x + (1 - \epsilon)\tilde{x}$
            Compute gradient penalty: $GP = \lambda \cdot \left(\left\|\nabla_{\hat{x}} D(\hat{x}, c)\right\|_2 - 1\right)^2$
            Compute discriminator loss: $L_D = D(\tilde{x}, c) - D(x, c) + GP$
            Update $D$: $\phi \leftarrow \phi - lr_D \cdot \nabla_\phi L_D$
            if $i \bmod n_{critic} = 0$ then
                Sample noise $z \sim P_z$, generate $\tilde{x} = G(z, c)$
                Compute generator loss: $L_G = -D(\tilde{x}, c)$
                Update $G$: $\theta \leftarrow \theta - lr_G \cdot \nabla_\theta L_G$
            end if
        end for
    end for
end while
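The control flow of Algorithm 1 can be traced on a one-dimensional toy problem with hand-derived gradients. Everything below is an illustrative assumption rather than the paper's implementation: the critic is linear per class, $D(x, c) = w_c x$, the generator is $G(z, c) = a z + b_c$, the two class centres are invented, and a fixed epoch budget stands in for the outer convergence check.

```python
import math
import random


def train_toy_cg_wgan(real_sampler, n_epochs=2, m=20, n_critic=5,
                      lam=10.0, lr_d=0.01, lr_g=0.01, seed=0):
    """Run the loop structure of Algorithm 1 with manual gradients."""
    rng = random.Random(seed)
    w = {0: 0.1, 1: 0.1}          # critic parameters: D(x, c) = w[c] * x
    a, b = 1.0, {0: 0.0, 1: 0.0}  # generator parameters: G(z, c) = a * z + b[c]
    g_updates = 0
    for _ in range(n_epochs):
        for i in range(1, m + 1):
            x, c = real_sampler(rng)          # sample real data (x, c)
            z = rng.gauss(0.0, 1.0)           # sample noise z
            x_tilde = a * z + b[c]            # fake sample G(z, c)
            # For this linear critic the input gradient equals w[c] at every
            # point, so the interpolate x_hat need not be formed explicitly.
            sign = math.copysign(1.0, w[c])
            grad_w = (x_tilde - x) + 2.0 * lam * (abs(w[c]) - 1.0) * sign
            w[c] -= lr_d * grad_w             # critic step on L_D
            if i % n_critic == 0:
                z = rng.gauss(0.0, 1.0)
                # L_G = -w[c] * (a * z + b[c]); descend on a and b[c].
                a -= lr_g * (-w[c] * z)
                b[c] -= lr_g * (-w[c])
                g_updates += 1
    return w, a, b, g_updates


def sample_real(rng):
    """Hypothetical real data: class 0 centred at 2.0, class 1 at 5.0."""
    c = rng.randrange(2)
    return rng.gauss(2.0 if c == 0 else 5.0, 0.5), c


w, a, b, g_updates = train_toy_cg_wgan(sample_real)
```

The generator is updated once every `n_critic` critic steps, so the critic stays ahead of the generator; a linear critic can only compare means, which is why this remains a structural illustration.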

3.2. CG-WGAN-Based Federated Learning Approach

To address the non-IID problem arising from the uneven distribution of client data in federated learning, as well as the potential risk of privacy leakage associated with traditional parameter aggregation, this paper proposes FedCG-WGAN—a federated learning framework based on CG-WGAN. FedCG-WGAN maintains the stability of GAN training while avoiding the need to transmit high-fidelity synthetic samples to the central server. This design choice mitigates privacy risks, as excessively realistic generated data may inadvertently reveal sensitive information [35,36,37]. Notably, the intentional moderation of data quality has a minimal impact on model performance, yet significantly enhances privacy protection and computational efficiency. The core procedures of the FedCG-WGAN framework are outlined as follows:
  • Local Model Training: Each client $k$ trains a CG-WGAN using its local dataset $D_k$. The newly generated synthetic samples $\tilde{D}_k$ are defined as in Formula (9). The feature distribution of these samples is evaluated locally. A comprehensive quality score $Q$ is then computed by combining the discriminator confidence score (D-score) and the feature distribution similarity score (F-score). Finally, the synthetic samples, their quality scores, missing labels, and sample counts are uploaded to the central server.
    $$\tilde{D}_k = \left\{\, G_k(z_i \mid c_i) \;\middle|\; z_i \sim p_z,\; c_i \sim p_{\text{local}}(c) \,\right\} \tag{9}$$
    where $G_k$ denotes the generator network of the CG-WGAN on client $k$, $z_i$ is a random noise vector sampled from the noise distribution $p_z$, and $c_i$ is a conditional label sampled from the local label distribution $p_{\text{local}}(c)$.
  • Verification of Data Quality and Assessment of Data Balance: The central server receives the synthetic data and corresponding quality scores from each client. For the synthetic dataset $\tilde{D}_k$ uploaded by client $k$, candidate samples satisfying $Q_k \ge Q_i$ are retained, where $Q_i$ is the quality threshold.
  • Aggregation and Distribution of Project Data by the Central Server: The server constructs a shared dataset of synthetic cases by aggregating all candidate samples $\hat{D}_k$ from clients. Based on the missing label categories and the number of missing samples uploaded by each client, the server samples from the shared dataset $D_{\text{shared}}$ and distributes the data to each client accordingly. Assuming the missing category for client $k$ is $c_m$, the server allocates samples from the shared dataset $D_{\text{shared}}$ as in Formula (10).
    $$D_k^{\text{new}} = \left\{\, \tilde{x} \in D_{\text{shared}} \;\middle|\; c(\tilde{x}) \in \{c_m\} \,\right\}, \quad \left|D_k^{\text{new}}\right| = n_m \tag{10}$$
    where $D_k^{\text{new}}$ denotes the new synthetic dataset received by client $k$ from the server, $c(\tilde{x})$ is the label category of the synthetic sample $\tilde{x}$, and $n_m$ is the number of samples requested by client $k$ for category $c_m$.
  • Client Model Update: After receiving the data distributed by the central server, each client merges the new data with its local dataset to form an updated dataset. This updated dataset is then used to train and update the client's local model. The loss function is defined as Formula (11).
    $$L_k(\theta_k) = \mathbb{E}_{(x, y) \sim D_k \cup D_k^{\text{new}}}\left[\, l\!\left(f_{\theta_k}(x), y\right) \right] \tag{11}$$
    where $L_k(\theta_k)$ denotes the loss function minimized with respect to the local model parameters $\theta_k$ after merging the local and synthetic datasets on client $k$; $l$ is the cross-entropy loss, and $f_{\theta_k}$ is the local classification model, implemented as XGBoost.
  • Model Evolution and Optimization: The above process is repeated multiple times, where each iteration includes retraining the CG-WGAN to generate additional synthetic data and using the updated dataset to train the model. Each client’s model will gradually adapt to the newly generated data and increasingly complex tasks, resulting in a more personalized and accurate model. Algorithm 2 presents the pseudocode of the FedCG-WGAN model.
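The client-side bookkeeping in the steps above can be sketched as follows. Several details are assumptions, since the text does not fix them: the quality score is taken to be a convex combination of the D-score and F-score with a hypothetical weight `alpha`, a class counts as "missing" when it falls below a hypothetical per-class target, and the stage names are placeholders.

```python
from collections import Counter


def quality_score(d_score, f_score, alpha=0.5):
    """Combine D-score and F-score; the convex weighting is an assumption."""
    return alpha * d_score + (1.0 - alpha) * f_score


def missing_label_request(local_labels, all_labels, target_per_class):
    """Return {label: shortfall} for classes under-represented on this client."""
    counts = Counter(local_labels)
    return {c: target_per_class - counts[c]
            for c in all_labels if counts[c] < target_per_class}


# Made-up local label column for one client.
local_labels = ["acute"] * 40 + ["intercritical"] * 5
request = missing_label_request(
    local_labels, ["acute", "intercritical", "chronic"], target_per_class=20)
q = quality_score(d_score=0.9, f_score=0.8)
```

Here `request` plays the role of the missing labels and sample counts that the client uploads together with its synthetic data and quality score.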
Algorithm 2 FedCG-WGAN training algorithm.
Require: Local datasets $\{D_k\}_{k=1}^{K}$, communication rounds $T$, quality threshold $Q_i$
Ensure: Personalized models $\{\theta_k\}_{k=1}^{K}$
for $t = 1, \dots, T$ do
    for each client $k = 1, \dots, K$ do
        Train CG-WGAN using local dataset $D_k$
        Generate synthetic data $\tilde{D}_k$ as per Formula (9)
        Evaluate data quality score $Q_k$
        Upload $\tilde{D}_k$, $Q_k$, and missing labels to server
    end for
    for each client $k = 1, \dots, K$ do
        Retain candidate samples $\hat{D}_k \leftarrow \{\hat{x} \in \tilde{D}_k \mid Q_k \ge Q_i\}$
    end for
    Construct shared dataset $D_{\text{shared}} = \bigcup_{k=1}^{K} \hat{D}_k$
    for each client $k = 1, \dots, K$ do
        Distribute necessary samples $D_k^{\text{new}}$ from $D_{\text{shared}}$ to client $k$ as per Formula (10)
    end for
    for each client $k = 1, \dots, K$ do
        Merge new data $D_k^{\text{new}}$ with local dataset $D_k$ to form $D_k \cup D_k^{\text{new}}$
        Update model $\theta_k$ using $L_k(\theta_k)$ as per Formula (11)
    end for
    Repeat the training and updating process to refine models
end for
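The server-side part of Algorithm 2 (quality filtering, pooling, and per-client allocation) can be sketched in a few lines. The dictionary layout, field names, threshold value, and sample contents are hypothetical; only the filter on the per-client quality score and the per-label allocation follow the description above.

```python
import random


def build_shared_pool(uploads, q_threshold):
    """Pool the samples of every upload whose quality score meets the threshold."""
    pool = []
    for upload in uploads:
        if upload["quality"] >= q_threshold:
            pool.extend(upload["samples"])
    return pool


def allocate(pool, request, rng):
    """Draw up to n_m pooled samples for each requested label c_m."""
    allocation = []
    for label, n_needed in request.items():
        candidates = [s for s in pool if s["label"] == label]
        allocation.extend(rng.sample(candidates, min(n_needed, len(candidates))))
    return allocation


uploads = [
    {"quality": 0.87,
     "samples": [{"label": "chronic", "x": [0.1 * i]} for i in range(5)]},
    {"quality": 0.60,                       # below threshold, discarded
     "samples": [{"label": "chronic", "x": [9.9]}]},
    {"quality": 0.90,
     "samples": [{"label": "acute", "x": [1.0]}]},
]
pool = build_shared_pool(uploads, q_threshold=0.85)
new_data = allocate(pool, {"chronic": 3, "acute": 2}, random.Random(0))
```

When the pool holds fewer samples than requested for a label, this sketch simply returns what is available; how the actual system handles such shortfalls is not stated in the text.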

3.3. Convergence Analysis

To theoretically demonstrate the convergence of FedCG-WGAN under non-IID data conditions, this paper establishes an optimization-based analytical framework. By incorporating the effects of local stochastic updates and distributional mismatch, and modeling the regularization effect of data augmentation via CG-WGAN, an upper bound on the convergence of the global model’s gradient norm is derived. This analysis illustrates how the proposed method manages gradient variability and mitigates data heterogeneity, thereby enabling stable progress during training.
Assume that the local objective function $f_k(w)$ at client $k$ satisfies Lipschitz smoothness. That is, there exists a Lipschitz constant $L > 0$ such that, for any parameter vectors $w_1, w_2$, Formula (12) holds.
$$\left\|\nabla f_k(w_1) - \nabla f_k(w_2)\right\| \le L \cdot \left\|w_1 - w_2\right\| \tag{12}$$
where $\nabla f_k(w)$ denotes the gradient of the local objective function at client $k$ with respect to the model parameters. This smoothness assumption ensures the continuity and bounded variation of the gradients, serving as a prerequisite for the convergence of gradient-based optimization methods.
To quantify statistical noise arising from local computations on private datasets, it is assumed that the variance of the local gradient estimates is bounded, as shown in Formula (13).
$$\mathbb{E}\left\|\nabla f_k(w) - \nabla F(w)\right\|^2 \le \sigma^2 \tag{13}$$
where $\nabla F(w)$ denotes the gradient of the global objective function, and $\sigma^2$ represents the bounded variance of the local gradients.
In highly non-IID data settings, traditional federated averaging methods may suffer from aggregation bias. To mitigate this issue, this paper employs CG-WGAN to generate distribution-aligned synthetic samples, enabling each client's data distribution to approximate the central distribution $P_r$. Let $\delta$ bound the discrepancy between the expected loss under the synthetic data distribution $P_g$ and that under the global data distribution $P_r$, as shown in Formula (14).
$$\left|\, \mathbb{E}_{\hat{x} \sim P_g}\left[l(\hat{x})\right] - \mathbb{E}_{x \sim P_r}\left[l(x)\right] \right| \le \delta \tag{14}$$
where $\mathbb{E}_{\hat{x} \sim P_g}$ denotes the expectation over the synthetic data distribution $P_g$, $\mathbb{E}_{x \sim P_r}$ denotes the expectation over the global data distribution $P_r$, and $l(\cdot)$ denotes the loss function. This assumption models the regularization effect introduced by CG-WGAN, allowing the distribution alignment to be quantified and controlled analytically.
In each communication round, each client performs local stochastic gradient descent based on the global model parameter w t at round t and obtains the updated local parameter in Formula (15).
w t + 1 k = w t η F k ( w t ) ,
where w t + 1 k denotes the updated local model of client k in round t, η is the learning rate, and F k ( w t ) represents the stochastic gradient of client k under the current global model. Let Δ t k = w t + 1 k w t denote the parameter deviation between client k and the global model in this round. By controlling the learning rate and bounding the variance, this paper derives Formula (16).
E Δ t k 2 η 2 F ( w t ) 2 + σ 2
This inequality quantifies the local update error for a single local step and its dependence on the learning rate $\eta$, the global gradient norm $\|\nabla F(w_t)\|$, and the gradient variance $\sigma^2$.
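The bound in Formula (16) can be checked numerically. The following is a minimal sketch, assuming a toy noise model in which the client gradient equals the global gradient plus zero-mean Gaussian noise with total variance $\sigma^2$; the function names and the numeric values are illustrative and do not come from the paper.

```python
import random


def local_sgd_step(w_t, grad, eta):
    """One local update w_{t+1}^k = w_t - eta * grad, as in Formula (15)."""
    return [w - eta * g for w, g in zip(w_t, grad)]


def squared_norm(v):
    """Squared Euclidean norm of a vector given as a list."""
    return sum(x * x for x in v)


def mean_squared_deviation(w_t, global_grad, eta, sigma2, trials=20000, seed=0):
    """Monte Carlo estimate of E||Delta_t^k||^2 under the toy noise model."""
    rng = random.Random(seed)
    dim = len(w_t)
    # Spread the total noise variance sigma2 evenly over the coordinates.
    per_coord_std = (sigma2 / dim) ** 0.5
    total = 0.0
    for _ in range(trials):
        noisy_grad = [g + rng.gauss(0.0, per_coord_std) for g in global_grad]
        w_next = local_sgd_step(w_t, noisy_grad, eta)
        delta = [a - b for a, b in zip(w_next, w_t)]
        total += squared_norm(delta)
    return total / trials


# Illustrative values only (assumptions, not taken from the paper).
w_t = [0.5, -1.0, 2.0]
global_grad = [0.3, -0.2, 0.1]
eta, sigma2 = 0.1, 0.04

estimate = mean_squared_deviation(w_t, global_grad, eta, sigma2)
bound = eta ** 2 * (squared_norm(global_grad) + sigma2)
```

Under this unbiased noise model the bound holds with equality in expectation, so `estimate` should land close to `bound`; a biased client gradient would need the additional bias terms introduced in Formulas (17) and (18).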
When the central server aggregates the models, the bias caused by data heterogeneity can be represented by the bias term $\delta_t$, which satisfies the following condition under CG-WGAN, as shown in Formula (17).
$$\delta_t \le \frac{1}{m} \sum_{k=1}^{m} \delta_k \tag{17}$$
Therefore, the expected squared norm of the average bias across all clients is bounded as shown in Formula (18).
$$\mathbb{E}\,\|\delta_t\|^2 \le \frac{1}{m} \sum_{k=1}^{m} \delta_k^2 + \frac{\sigma^2}{m} \tag{18}$$
where $m$ denotes the number of clients, $\delta_k$ represents the systematic bias caused by distributional inconsistency, and $\sigma^2$ denotes the variance among clients. The bound shows that the variance term $\sigma^2/m$ shrinks as the number of clients $m$ increases, while the bias term $\frac{1}{m}\sum_{k=1}^{m}\delta_k^2$ is reduced to the extent that synthetic alignment lowers each $\delta_k$.
Considering the change in the global loss function $F(w_t)$ between consecutive rounds, the following inequality is derived under the smoothness condition, as shown in Formula (19).
$$F(w_{t+1}) \le F(w_t) - \eta \|\nabla F(w_t)\|^2 + \frac{L\eta^2}{2} \left( \|\nabla F(w_t)\|^2 + \sigma^2 + \delta_t^2 \right) \tag{19}$$
where $w_t$ denotes the global model parameters at the $t$-th communication round, $\nabla F(w_t)$ represents the gradient of the global loss function at round $t$, and $\delta_t^2$ denotes the variance of distributional shift caused by data heterogeneity in round $t$. After normalization across all $T$ communication rounds, the upper bound on the expected average squared gradient norm is obtained, as shown in Formula (20).
$$\frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\,\|\nabla F(w_t)\|^2 \le \frac{2\left(F(w_0) - F^*\right)}{\eta T} + L\eta \left( \sigma^2 + \frac{1}{m} \sum_{k=1}^{m} \delta_k^2 \right) \tag{20}$$
where $F(w_0)$ denotes the initial global loss value, $F^*$ represents the minimum of the global loss function, and $\delta_k^2$ denotes the synthetic variance of client $k$, which reflects the squared deviation between local and global data distributions.
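The step from Formula (19) to Formula (20) can be sketched as follows. This is a reconstruction under two assumptions that the text does not state explicitly: the learning rate satisfies $\eta \le 1/L$, and the per-round shift satisfies $\delta_t^2 \le \frac{1}{m}\sum_{k=1}^{m}\delta_k^2$. The sum below runs over $t = 0, \dots, T-1$, which differs from the indexing of Formula (20) by one round.

```latex
% Assumption: \eta \le 1/L, hence L\eta^2/2 \le \eta/2.
\begin{align*}
F(w_{t+1}) &\le F(w_t) - \eta\|\nabla F(w_t)\|^2
  + \tfrac{L\eta^2}{2}\left(\|\nabla F(w_t)\|^2 + \sigma^2 + \delta_t^2\right) \\
&\le F(w_t) - \tfrac{\eta}{2}\|\nabla F(w_t)\|^2
  + \tfrac{L\eta^2}{2}\left(\sigma^2 + \delta_t^2\right), \\
\|\nabla F(w_t)\|^2 &\le \tfrac{2}{\eta}\left(F(w_t) - F(w_{t+1})\right)
  + L\eta\left(\sigma^2 + \delta_t^2\right).
\end{align*}
% Average over t = 0, ..., T-1: the loss differences telescope,
% and F(w_T) \ge F^*.
\begin{align*}
\frac{1}{T}\sum_{t=0}^{T-1}\|\nabla F(w_t)\|^2
  \le \frac{2\left(F(w_0) - F^*\right)}{\eta T}
  + L\eta\Big(\sigma^2 + \frac{1}{T}\sum_{t=0}^{T-1}\delta_t^2\Big).
\end{align*}
```

Taking expectations and applying the assumed bound on $\delta_t^2$ gives the right-hand side of Formula (20).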
The results indicate that global convergence is influenced by various factors, including the total number of communication rounds, the variance of local gradients, and the degree of mismatch in data distributions across clients. Although increasing the number of communication rounds can reduce optimization error over time, a smaller gradient variance is more conducive to stable parameter updates. The distributional discrepancy $\delta_k^2$ reflects the impact of non-IID data, which is alleviated by the sample alignment mechanism introduced by CG-WGAN.
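To make the roles of the two terms concrete, the right-hand side of Formula (20) can be evaluated directly. The sketch below is illustrative only: the function name `convergence_bound` and all numeric inputs are assumptions, not values reported in the paper.

```python
def convergence_bound(f0_minus_fstar, eta, rounds, lipschitz, sigma2, delta_sq):
    """Right-hand side of Formula (20).

    f0_minus_fstar: initial optimality gap F(w_0) - F*
    eta:            learning rate
    rounds:         number of communication rounds T
    lipschitz:      smoothness constant L
    sigma2:         bound on the local gradient variance
    delta_sq:       list of per-client squared discrepancies delta_k^2
    """
    optimization_term = 2.0 * f0_minus_fstar / (eta * rounds)
    heterogeneity_term = lipschitz * eta * (sigma2 + sum(delta_sq) / len(delta_sq))
    return optimization_term + heterogeneity_term
```

The first term decays as the number of rounds grows, whereas the second term is a floor set by the gradient variance and the client discrepancies; only better distribution alignment (smaller $\delta_k^2$) or a smaller learning rate lowers that floor.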
In summary, the proposed convergence analysis demonstrates that the CG-WGAN mechanism effectively controls gradient variance induced by statistical heterogeneity, thereby achieving stable and bounded convergence behavior. The derived convergence bounds provide theoretical support for the robustness and efficiency of the proposed method, particularly in real-world medical scenarios characterized by high data heterogeneity and limited data availability.

4. Experiment

4.1. Description of the Dataset

The experimental dataset was collected from gout outpatient clinics at three hospitals in Qingdao, Shandong Province. These are real-world electronic medical records obtained under strict confidentiality agreements, comprising a total of 55,032 records spanning from September 2016 to May 2023. After data preprocessing, 51,127 records met the inclusion criteria for analysis. These records were used for diagnosing gout and identifying its various clinical stages. Table 1 presents the final data distribution across the three hospitals. The data reveal a significant class imbalance among the samples, with the intermittent phase being the most represented and the acute arthritis phase the least. In addition, the MIMIC-III public database was utilized as a supplementary validation set [38]. A total of 2163 cases meeting the diagnostic criteria for gout (ICD-9: 274.x) were selected, each containing complete records of key clinical indicators such as serum uric acid (SUA) and C-reactive protein (CRP).

4.2. Pre-Processing of Gout History Data

Electronic medical records (EMRs) provide valuable digital resources for data analysis, mining, and research, supporting clinical decision-making and improving medical practice. However, EMRs often suffer from issues that limit their effectiveness in research applications. Common problems include missing test items, abnormal values caused by human or mechanical errors, and the lack of standardized formats across records [39]. To improve data quality, this study applied a systematic preprocessing workflow to gout case records collected from three hospitals in Qingdao. The preprocessing steps are as follows:
  • Outlier handling: The dataset contained ‘dirty data’, i.e., records with values beyond clinically reasonable ranges (e.g., height = 1888 cm, weight = 300 kg, negative ALT values, or random urine FEUA exceeding 1). Based on clinical guidelines and typical distributions of medical test results, acceptable ranges were defined for each feature, and unreasonable records were removed. A total of 3905 records (approximately 7.1% of the initial dataset) were discarded. The resulting clean dataset was denoted as feature set F1.
  • Missing value imputation: The missing rate of each feature in F1 was first calculated. Fifteen features with a missing rate ≥ 80% were directly discarded. For the remaining features, different imputation strategies were applied:
    • Mean substitution stratified by gender for height and weight.
    • Group-wise averages stratified by gender and age for smoking duration.
    • Multivariate linear regression using height and weight as predictors for waist and hip circumference.
    • Random forest regression for biochemical test items, leveraging feature importance and ensemble predictions.
    After imputation, a complete dataset was obtained and denoted as feature set F2.
  • Feature selection: To improve diagnostic accuracy and ensure privacy, irrelevant or sensitive attributes were removed, including mobile phone numbers, ID numbers, home addresses, and prescription details. This step excluded eight features, resulting in feature set F3.
  • Standardization and alignment: To mitigate dataset drift and enhance generalization in the federated setting, F3 underwent standardization and alignment. Continuous features were standardized using Z-scores, categorical features were consistently encoded, and discrepancies in feature naming and units across hospitals were harmonized. This step is essential for alleviating inter-client distribution heterogeneity (non-IID). The final processed dataset was denoted as F4, which was used for all subsequent experiments.
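Parts of the workflow above can be sketched in a few lines. This is a minimal illustration rather than the authors' pipeline: the field names (`height_cm`, `weight_kg`, `gender`) and the plausibility ranges in `VALID_RANGES` are hypothetical, records are plain dictionaries with `None` marking a missing value, and only the range filter, the missing-rate filter, group-mean imputation, and Z-score standardization are shown (the regression-based imputations are omitted).

```python
from statistics import mean, pstdev

# Hypothetical plausibility ranges; the paper does not list its actual ranges.
VALID_RANGES = {"height_cm": (100, 230), "weight_kg": (25, 250)}


def drop_outliers(records, ranges=VALID_RANGES):
    """Remove records in which a present value falls outside its range."""
    def is_plausible(rec):
        return all(
            rec.get(name) is None or low <= rec[name] <= high
            for name, (low, high) in ranges.items()
        )
    return [rec for rec in records if is_plausible(rec)]


def drop_sparse_features(records, features, max_missing=0.8):
    """Keep only features whose missing rate is below max_missing."""
    kept = []
    for name in features:
        missing = sum(1 for rec in records if rec.get(name) is None)
        if missing / len(records) < max_missing:
            kept.append(name)
    return kept


def impute_group_mean(records, feature, group_key):
    """Fill missing values with the mean of the record's group (e.g., gender).

    Assumes every group has at least one observed value for the feature.
    """
    observed = {}
    for rec in records:
        if rec.get(feature) is not None:
            observed.setdefault(rec[group_key], []).append(rec[feature])
    group_means = {group: mean(values) for group, values in observed.items()}
    return [
        {**rec, feature: group_means[rec[group_key]]}
        if rec.get(feature) is None else dict(rec)
        for rec in records
    ]


def zscore(records, feature):
    """Standardize a complete numeric feature to zero mean and unit variance."""
    values = [rec[feature] for rec in records]
    mu, sd = mean(values), pstdev(values)
    return [
        {**rec, feature: (rec[feature] - mu) / sd if sd else 0.0}
        for rec in records
    ]
```

In a federated setting, the statistics used for standardization would need to be agreed across hospitals; how that is done is not specified in the text, so the sketch standardizes a single local table.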
To ensure fairness and reproducibility, the MIMIC-III dataset was processed strictly following the same preprocessing workflow as the Qingdao dataset. This unified workflow guarantees consistency in feature engineering across all input models, thereby enabling reliable performance comparisons between datasets.

4.3. Experimental Setup

To reduce the complexity and training time of the diagnostic model and further enhance its predictive accuracy, a correlation analysis was performed on the medical record dataset F4. After evaluating several analytical techniques, a combination of the chi-square test and analysis of variance (ANOVA) was adopted, supplemented by clinical relevance assessments. This integrated approach led to the selection of 30 key features for gout stage prediction. These features include fundamental physiological indicators such as age, gender, heart rate, and blood pressure, as well as critical biochemical markers like uric acid, creatinine, and bilirubin. Table 2 presents the statistical analysis results of representative features. Among them, uric acid was identified as both statistically significant and clinically essential; systolic blood pressure showed a correlation with the degree of inflammation; the arthrogryposis score proved particularly relevant for diagnosing the acute phase of gout; and gender was retained based on clinical consensus, despite its relatively limited statistical significance.
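The two test statistics named above can be computed as in the following sketch. It assumes the usual pairing (chi-square for categorical features against the stage label, one-way ANOVA for continuous features grouped by stage), which the text does not spell out; it returns only the statistics, not p-values, and the function names are illustrative.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table.

    Rows are feature categories, columns are gout stages. Assumes every
    expected cell count is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def anova_f_statistic(groups):
    """One-way ANOVA F statistic; each group holds one stage's feature values."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]
    ss_between = sum(
        len(group) * (m - grand_mean) ** 2 for group, m in zip(groups, group_means)
    )
    ss_within = sum(
        (x - m) ** 2 for group, m in zip(groups, group_means) for x in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

Features would then be ranked by these statistics and screened further by clinical relevance, as described above for gender.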
This study conducts a comparative evaluation of XGBoost, LightGBM, CatBoost, and MLP. All experiments were performed on a local dataset (Hospital A) under an IID setting, using identical training and testing splits. The evaluation focused on three aspects: single-round local training time, memory consumption, and prediction accuracy. The comparison results are summarized in Table 3.
The results show that XGBoost achieves slightly higher local accuracy compared to LightGBM and CatBoost and significantly outperforms MLP. While LightGBM demonstrates faster training speed, XGBoost offers a more favorable balance between accuracy and computational efficiency. More importantly, XGBoost exhibits superior capability in identifying feature importance when handling structured medical data. Based on these advantages, this study adopts XGBoost as the core client-side model.
To evaluate the effectiveness of the proposed method, a comparative analysis is conducted against existing federated learning approaches. The evaluation focuses on both model performance—measured by metrics such as accuracy and F1-score—and time efficiency, including single-round training time and communication overhead. The federated learning methods included in the comparison are as follows:
  • FedAvg [40] trains models locally on multiple clients and then aggregates the client models by averaging their parameters into a global model.
  • FedProx [41] is an improved version of FedAvg that introduces a proximal term to address the issue of non-IID data distribution.
  • FedMD [42] utilizes model distillation techniques to integrate knowledge from different clients.
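For reference, the aggregation rule of FedAvg and the proximal term of FedProx can be written compactly. The sketch below uses flat parameter lists and hypothetical function names; it illustrates the standard definitions of these two baselines and is not the configuration used in the experiments. FedMD, which exchanges distilled knowledge rather than parameters, is not covered.

```python
def fedavg_aggregate(client_weights, client_sizes):
    """Sample-size-weighted average of client parameter vectors (FedAvg)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def fedprox_local_objective(local_loss, w, w_global, mu):
    """Local loss plus the FedProx proximal term (mu / 2) * ||w - w_global||^2."""
    proximal = 0.5 * mu * sum((a - b) ** 2 for a, b in zip(w, w_global))
    return local_loss + proximal
```

With `mu = 0` the FedProx objective reduces to the plain local loss, which is the FedAvg case.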
Grid search was employed to optimize the hyperparameters of the XGBoost model. Key tuning parameters included the learning rate, the number of base learners (n_estimators), and the maximum tree depth (max_depth). The final configuration was determined as follows: the task type was set to multi-class classification, and the number of base learners was fixed at 3, meaning the model consists of three gradient boosting trees. Regarding the tree structure, max_depth was set to 9, and the minimum sum of instance weight required in a child node (min_child_weight) was set to 3, aiming to balance model complexity and the risk of overfitting.
This study adopts a multi-level validation strategy. Each client’s local dataset is randomly partitioned into training and independent test sets using an 80/20 split. Within the training set, 20% is further reserved as a validation set for hyperparameter tuning and monitoring the training process. The test set remains strictly isolated throughout and is used exclusively for final performance evaluation. To reduce bias introduced by random partitioning, all experiments are repeated five times under different random seeds, and the reported metrics represent the averaged results.
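The partitioning scheme amounts to a 64/16/20 split of each client's data into training, validation, and test sets. A minimal sketch follows, assuming a plain random shuffle over record indices; the seed values and the `evaluate` callback are placeholders, not the ones used in the paper.

```python
import random


def split_indices(n, seed, test_frac=0.2, val_frac=0.2):
    """Hold out a test set, then reserve part of the remainder for validation."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = int(round(n * test_frac))
    test, train_full = indices[:n_test], indices[n_test:]
    n_val = int(round(len(train_full) * val_frac))
    val, train = train_full[:n_val], train_full[n_val:]
    return train, val, test


def average_over_seeds(evaluate, n, seeds=(0, 1, 2, 3, 4)):
    """Repeat the split under each seed and average the metric from `evaluate`.

    `evaluate(train, val, test)` is a user-supplied function returning a number.
    """
    scores = [evaluate(*split_indices(n, seed)) for seed in seeds]
    return sum(scores) / len(scores)
```

Because the test indices are fixed before the validation set is carved out, hyperparameter tuning never touches the held-out records.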

4.4. Experimental Evaluation Indicators

4.4.1. Composite Assessment Method for Data Quality

A single metric often reflects only one aspect of data quality. For example, the discriminator confidence score primarily indicates whether the generated data can deceive the discriminator, while the feature distribution similarity score focuses on the statistical closeness between generated and real data in the feature space.
To evaluate the quality of data generated by CG-WGAN, this paper proposes a dual-metric fusion-based composite evaluation mechanism, which integrates the discriminator confidence score ($D_{\text{score}}$) and the feature distribution similarity score ($F_{\text{score}}$) to construct a comprehensive quality score ($Q$). The computation process is defined in Formulas (21)–(23).
$$D_{\text{score}} = \frac{1}{N} \sum_{i=1}^{N} D(G(z_i)) \tag{21}$$
$$F_{\text{score}} = \text{Wasserstein}\left( \mathbb{E}_{x \sim P_r}[f(x)],\; \mathbb{E}_{x \sim P_g}[f(x)] \right) \tag{22}$$
$$Q = \alpha \cdot D_{\text{score}} + \beta \cdot (1 - F_{\text{score}}) \tag{23}$$
where $P_r$ denotes the real data distribution, $P_g$ denotes the generated data distribution, and $f(x)$ represents the feature extractor. The term $D_{\text{score}}$ reflects the average probability assigned by the discriminator to generated samples being real, indicating their realism. The $F_{\text{score}}$ measures the feature-level similarity between generated and real data using the Wasserstein distance; a smaller distance indicates better alignment.
The weight coefficients $\alpha$ and $\beta$ are set based on the core principle of balancing privacy and utility in federated learning. Specifically, $D_{\text{score}}$ measures the local fidelity of generated samples. A higher $D_{\text{score}}$ indicates that the generated data are highly similar to real data at the sample level. This can improve model performance but may also lead to overfitting to individual samples, which significantly increases the risk of membership inference attacks. In contrast, $F_{\text{score}}$ evaluates the global consistency of feature distributions across clients. Its goal is to align data distributions without reproducing specific sample details, thus fundamentally mitigating the non-IID problem.
From a theoretical perspective, increasing the weight of $F_{\text{score}}$ helps enhance global consistency among clients. This effectively alleviates performance degradation caused by data heterogeneity. Reducing the weight of $D_{\text{score}}$ can suppress the generator’s overfitting to individual samples, serving as a built-in privacy protection mechanism.
Therefore, this paper adopts a weight setting of $\alpha = 0.3$ and $\beta = 0.7$. This prioritizes global alignment of data distributions across clients while limiting the over-realism of generated samples. Grid search experiments show that this combination maintains high model accuracy and effectively reduces the risk of privacy leakage. Further experiments demonstrate that this weighting strategy consistently balances privacy protection and model performance across different tasks, showing strong effectiveness.
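The composite score can be illustrated with a short sketch. Two points are assumptions rather than details from the paper: the Wasserstein term is computed here as the empirical 1-Wasserstein distance between two equal-size one-dimensional feature samples, and features are assumed to be scaled so that the resulting $F_{\text{score}}$ lies in $[0, 1]$. The function names are illustrative.

```python
def discriminator_score(probabilities):
    """D_score: mean discriminator output D(G(z_i)) over generated samples."""
    return sum(probabilities) / len(probabilities)


def wasserstein_1d(real, generated):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples."""
    if len(real) != len(generated):
        raise ValueError("this sketch assumes equal sample sizes")
    pairs = zip(sorted(real), sorted(generated))
    return sum(abs(a - b) for a, b in pairs) / len(real)


def quality_score(d_score, f_score, alpha=0.3, beta=0.7):
    """Q = alpha * D_score + beta * (1 - F_score), following Formula (23)."""
    return alpha * d_score + beta * (1.0 - f_score)
```

For example, a discriminator score of 0.8 combined with a feature distance of 0.1 gives $Q = 0.3 \times 0.8 + 0.7 \times 0.9 = 0.87$ under the weights adopted above.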

4.4.2. Quantitative Analysis of Model Performance Improvement

To quantify the performance improvement of the model, this paper introduces a new metric termed the Relative Performance Improvement Rate (RPIR), defined as shown in Formula (24).
$$\text{RPIR} = \frac{P_{\text{current}}}{P_{\text{base}}} \tag{24}$$
where $P_{\text{base}}$ denotes the test accuracy obtained from locally trained models without the proposed method, and $P_{\text{current}}$ represents the test accuracy achieved after applying the proposed approach.

4.4.3. Performance Indicators in Gout Diagnostic Tasks

This paper conducts an in-depth study on the diagnosis of gout, which is inherently a multi-class classification problem. Gout diagnosis typically involves the identification of multiple disease stages, such as the acute arthritis phase, intermittent phase, and chronic arthritis phase. To evaluate the classification performance, this paper introduces four commonly used metrics: Accuracy, Precision, Recall, and the F1-score, as defined in Formulas (25)–(28).
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \tag{25}$$
$$\text{Precision} = \frac{TP}{TP + FP} \times 100\% \tag{26}$$
$$\text{Recall} = \frac{TP}{TP + FN} \times 100\% \tag{27}$$
$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \times 100\% \tag{28}$$
where $TP$ (True Positives) refers to the number of correctly predicted positive cases, $FP$ (False Positives) denotes the number of incorrectly predicted positive cases, $FN$ (False Negatives) indicates the number of incorrectly predicted negative cases, and $TN$ (True Negatives) represents the number of correctly predicted negative cases.
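Because gout staging is a multi-class task, the counts above are typically obtained per stage in a one-vs-rest fashion. The sketch below assumes that reading (the text does not state how the per-class values are averaged) and uses hypothetical stage labels; precision and recall are kept as percentages, so the F1 value is also a percentage.

```python
def one_vs_rest_counts(y_true, y_pred, positive):
    """Return (TP, FP, FN, TN), treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn


def stage_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall, and F1 (all in percent) for one stage."""
    tp, fp, fn, tn = one_vs_rest_counts(y_true, y_pred, positive)
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    precision = 100.0 * tp / (tp + fp) if tp + fp else 0.0
    recall = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Returning 0.0 when a denominator is zero is a convention chosen for this sketch; other conventions (such as leaving the value undefined) are equally common.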
Before transmission, all data are converted into an efficient binary array format and compressed using zlib to emulate the optimized transmission process in real-world network environments. To rigorously quantify communication efficiency within the federated learning framework, this study explicitly defines the metrics for communication cost. Specifically, MB values denote the compressed data size, representing the total effective payload transmitted between all clients and the central server in each round of federated learning.
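The measurement described here can be reproduced with the Python standard library. The following sketch assumes 32-bit floats as the binary array format and 1 MB = 10^6 bytes; neither choice is specified in the text, and the function names are illustrative.

```python
import zlib
from array import array


def pack_payload(values):
    """Serialize floats as a float32 binary array and compress with zlib."""
    raw = array("f", values).tobytes()
    return zlib.compress(raw)


def unpack_payload(blob):
    """Inverse of pack_payload: decompress and decode the float32 array."""
    restored = array("f")
    restored.frombytes(zlib.decompress(blob))
    return list(restored)


def size_in_mb(blob):
    """Compressed payload size in MB, taking 1 MB = 10**6 bytes (assumption)."""
    return len(blob) / 1_000_000


def round_cost_mb(blobs):
    """Total compressed payload exchanged in one round, summed over all clients."""
    return sum(size_in_mb(blob) for blob in blobs)
```

Note that float32 serialization is lossy for values that are not exactly representable, so a round trip returns the nearest float32 value rather than the original double.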

5. Results and Discussion

The experiment is divided into two parts to comprehensively validate the effectiveness of the proposed method. In the first part, the quality and privacy preservation of the data generated by CG-WGAN are evaluated using a composite scoring mechanism and visualization analysis. In the second part, model performance is assessed under both independent and identically distributed (IID) and non-independent and identically distributed (non-IID) scenarios. Further comparative experiments under the non-IID setting are conducted using electronic medical record data of gout patients from three hospitals in Qingdao, along with 2163 cases from the MIMIC-III database that meet the diagnostic criteria for gout (ICD-9: 274.x).

5.1. CG-WGAN Generated Data Quality and Privacy Security Assessment

In this phase of the experiment, CG-WGAN models were independently constructed using electronic medical record data of gout patients from three hospitals in Qingdao to generate synthetic samples. To monitor the training process, three line charts were plotted, each depicting Q-value across different training rounds. In these plots, the horizontal axis represents the number of training rounds, while the vertical axis denotes the corresponding Q-value.
As illustrated in Figure 3, the experimental results demonstrate that the Q-values for all three institutions exhibit a pattern of rapid increase followed by gradual stabilization. This trend indicates that the quality of the generated data becomes stable during the middle to later stages of training. Based on this observation, the inflection point in the number of training rounds can be identified to effectively mitigate the risk of overfitting while maintaining data quality. This approach ensures a reliable foundation of synthetic data for subsequent experiments.
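The text does not specify how the inflection point is located. One simple possibility, shown here purely as an illustration, is to stop at the first recorded point after which the Q-value improves by less than a tolerance over a fixed window; the function name, window length, and tolerance are all hypothetical.

```python
def plateau_round(q_values, window=3, tol=0.005):
    """Index of the first point whose Q gain over the next `window` points is below tol.

    `q_values` holds the Q-value recorded at successive checkpoints.
    Returns None if no such plateau is found.
    """
    for t in range(len(q_values) - window):
        if q_values[t + window] - q_values[t] < tol:
            return t
    return None
```

Stopping at such a point reflects the trade-off described above: further training adds little quality while increasing the risk that the generator overfits to individual records.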
Subsequent experiments were conducted using different Q-value thresholds, and the corresponding RPIR was calculated. As shown in Table 4, the RPIR scores of all experimental groups significantly exceed the benchmark value, confirming the effectiveness of the proposed method in enhancing model performance. Notably, when the Q-value surpasses 0.85, the rate of RPIR improvement slows considerably, suggesting that performance optimization approaches its theoretical upper bound. Therefore, selecting a Q-value threshold of 0.85 strikes a balance between maintaining performance gains, reducing computational overhead, and, most importantly, mitigating the risk of privacy leakage associated with overly realistic synthetic data.
To intuitively assess the consistency between generated and real data distributions, this study further employs t-SNE for dimensionality reduction and visual comparison. A random subset of 3000 real samples and 3000 synthetic samples with Q-values above 0.85 were projected into a shared two-dimensional space, as shown in Figure 4. The resulting point clouds of real and synthetic data exhibit substantial overlap and intermixing, without forming distinct clusters. This suggests that, in the original high-dimensional feature space, the distribution of synthetic data produced by CG-WGAN closely approximates that of real data, making them indistinguishable through straightforward visualization methods.
To evaluate the clinical applicability of the generated data, the distribution of key clinical features in the synthetic dataset (with Q-value = 0.85) was compared to that of the real dataset. As shown in Table 5, the synthetic data exhibited a highly consistent distribution with the real data, with correlation coefficients exceeding 0.92 for core clinical features such as serum uric acid levels, sites of joint involvement, and types of complications. In addition, the synthetic data preserved reasonable diversity in feature combinations. For instance, the proportion of patients with elevated CRP during acute exacerbations was 72.3% in the synthetic data, compared to 75.1% in the real data; similarly, the incidence of tophi in patients during the chronic phase was 41.6%, versus 39.8% in the real data. These results demonstrate a high degree of fidelity and diversity, confirming the practical utility of the synthetic data for clinical research applications.

5.2. Comparison of Model Performance Based on EMRs

To evaluate the effectiveness of the proposed method, comparative experiments were conducted using data from three hospitals in Qingdao under both IID and non-IID settings. The Q-values of the generated samples were set between 0.85 and 0.88. The relationship between the number of communication rounds and the Relative Performance Improvement Rate (RPIR) was analyzed for FedAvg, FedProx, FedMD, and the proposed method. The RPIR values achieved by each method across different communication rounds were recorded and are presented in Figure 5.
In the IID scenario, the RPIR values of all methods increase steadily with the number of communication rounds. Among them, FedAvg achieves the best performance as training progresses. Although the FedCG-WGAN method outperforms the other three methods during the initial stages, it does not attain the highest performance in later rounds, which is consistent with expectations, as the method is primarily optimized for non-IID settings. In contrast, under non-IID conditions, traditional methods exhibit slow convergence and noticeable performance fluctuations. FedCG-WGAN, however, maintains a rapid convergence rate and a stable upward trend, with its RPIR values significantly surpassing those of the other methods at equivalent communication rounds. These results clearly demonstrate the method’s effectiveness in addressing data heterogeneity in federated learning environments.
Building on the above analysis, this study further integrates and examines the model performance on both the Qingdao hospital dataset and the MIMIC-III dataset under the non-IID scenario. A systematic comparison of the four federated learning methods—FedAvg, FedProx, FedMD, and the proposed method—is conducted in terms of accuracy, precision, recall, F1-score, round time, and communication cost. The detailed results are summarized in Table 6.
FedCG-WGAN demonstrates superior performance across both data sources. On the Qingdao hospital dataset, the model achieves an overall accuracy of 89.6%, with a prediction accuracy of 93.0% during the acute arthritis phase—representing a 4.5% improvement over the best-performing baseline method, and an average improvement of 6.0%. Furthermore, compared to other federated learning frameworks, the proposed method reduces the per-round training time by an average of 47.3 s and lowers communication overhead by an average of 5.6 MB, highlighting its advantages in both computational and communication efficiency.
On the MIMIC-III dataset, despite the limited sample size leading to a slight decrease in overall accuracy (85.3%), the model maintains a high recall (83.1%) and F1-score (84.2%), indicating robust generalization. Notably, by controlling the quality score of the generated data within the range of 0.85 to 0.88, the proposed method effectively mitigates the risk of privacy leakage while enhancing model performance. In addition, it achieves average reductions of 32% in training time and 40% in communication overhead per round, underscoring its combined strengths in computational efficiency and privacy preservation.
The results demonstrate that FedCG-WGAN not only achieves higher accuracy but also significantly reduces both single-round training time and communication overhead compared with baseline methods. Specifically, FedCG-WGAN attains the highest accuracy (89.6%) while incurring a communication overhead of only 7.3 MB—approximately 57% of that of FedAvg (12.8 MB). In contrast to parameter aggregation approaches such as FedAvg, FedProx, and FedMD, which require multiple rounds of high-frequency communication for incremental model refinement, FedCG-WGAN fundamentally alleviates data heterogeneity by transmitting synthetic data with fewer communication rounds. As a result, it reaches superior performance peaks while maintaining substantially lower overall communication costs.

5.3. Ablation Experiments and Analysis

To evaluate the effectiveness of the core components within the proposed FedCG-WGAN framework and their contributions to overall model performance, a series of systematic ablation studies and sensitivity analyses is conducted in this section. These experiments are performed using gout case datasets collected from three hospitals in Qingdao. All comparative experiments are carried out under consistent federated learning settings and utilize the same evaluation metrics to ensure fairness and reproducibility.

5.3.1. Core Component Ablation Experiment

As a core component of FedCG-WGAN, the architectural design of CG-WGAN plays a critical role in determining both the quality of generated data and the overall performance of the federated learning process. To assess its necessity, this experiment fixes the number of training rounds at 300 and substitutes CG-WGAN with two baseline generative models: standard WGAN-GP (which incorporates unconditional information with feature-aware noise perturbation) and the original GAN (trained using cross-entropy loss).
The experimental results, presented in Figure 6, reveal substantial performance disparities among the different GAN architectures. CG-WGAN consistently outperforms both baseline models across all evaluation metrics. This comparison indicates that, among the generative architectures evaluated, only CG-WGAN delivers high-quality gout staging prediction within the federated learning framework. The alternative GAN architectures exhibit notable performance degradation, primarily due to their limited capacity to model the complex distributions inherent in medical data.

5.3.2. Sensitivity Analysis

To evaluate the robustness of FedCG-WGAN with respect to key hyperparameters, this section investigates the impact of training rounds and scoring weights on model performance. Specifically, the number of training rounds for CG-WGAN is gradually reduced from 300 to 200, generating synthetic data with varying Q-values, and the corresponding changes in model performance are analyzed.
As shown in Figure 7, when the number of training rounds is set to 300, the Q-value reaches 0.88 and the model achieves an accuracy of 89.8%, representing optimal performance. However, as the training rounds decrease to 250 and 200, the accuracy drops to 86.2% and 79.7%, respectively. These results support the rationale for keeping the Q-value between 0.85 and 0.88 in the main experiments. Although increasing the number of training rounds can improve model performance, it also leads to significantly higher computational costs. Moreover, when the Q-value falls below 0.80, model performance deteriorates markedly, negatively impacting the final prediction outcomes.
To verify the optimality of the weighting coefficients α and β in the comprehensive quality score, this study employs a grid search strategy to traverse combinations of α and β from 0.1 to 0.9, generating a total of nine parameter configurations. For each configuration, the corresponding model accuracy is recorded, and the performance trend is analyzed.
As illustrated in Figure 8, the model achieves the highest accuracy of 89.6% when $\alpha = 0.3$ and $\beta = 0.7$. In contrast, the case of equal weighting ($\alpha = \beta$), marked by the blue dashed line in the figure, consistently yields lower accuracy compared to the optimal configuration. This finding suggests that simply assigning equal weights to the D-score and F-score fails to achieve an ideal balance between model performance, feature distribution alignment, and privacy preservation.
Specifically, a higher $\beta$ value (e.g., $\beta \ge 0.8$) enhances distribution alignment but reduces the diversity of the generated data, thereby limiting model performance. Conversely, a higher $\alpha$ value (e.g., $\alpha \ge 0.4$) increases the realism of synthetic data, which may raise privacy concerns. Therefore, the configuration of $\alpha = 0.3$ and $\beta = 0.7$ provides the best trade-off among accuracy, distribution alignment, and privacy protection.
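The nine configurations of the grid search can be enumerated as below. The sketch assumes the weights are constrained by $\alpha + \beta = 1$, which is consistent with nine configurations between 0.1 and 0.9 and with the chosen pair (0.3, 0.7) but is not stated explicitly; `accuracy_of` stands in for a full train-and-evaluate run and is hypothetical.

```python
def weight_grid(step=0.1):
    """Candidate (alpha, beta) pairs with alpha + beta = 1 (assumed constraint)."""
    n = int(round(1 / step))
    # Rounding removes floating-point residue such as 0.30000000000000004.
    return [(round(k * step, 10), round(1 - k * step, 10)) for k in range(1, n)]


def best_weights(accuracy_of, grid=None):
    """Return the (alpha, beta) pair that maximizes a user-supplied accuracy function."""
    grid = weight_grid() if grid is None else grid
    return max(grid, key=lambda pair: accuracy_of(*pair))
```

In practice each call to `accuracy_of` would involve regenerating synthetic data filtered by the resulting Q-score and retraining the client models, so the grid is deliberately coarse.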
By keeping the network architecture and training iterations fixed, we evaluated federated learning performance across different values of the gradient penalty coefficient $\lambda$. As shown in Table 7, an optimal range exists for $\lambda$. When $\lambda = 1$, the gradient penalty is too weak to effectively enforce the Lipschitz constraint, resulting in unstable training, potential mode collapse, and the poorest performance. At $\lambda = 10$, the model achieves optimal stability and performance. Further increasing $\lambda$ to 20 or 30 maintains training stability; however, excessively strict constraints limit the discriminator’s ability to learn critical distinctions, reducing gradient information. While the quality of generated data remains stable, this ultimately causes declines in both model accuracy and F1-scores.
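The role of $\lambda$ can be seen in a toy case. The sketch below uses the standard WGAN-GP penalty $\lambda(\|\nabla_x D(x)\| - 1)^2$, which may differ from the conditional variant used in CG-WGAN, and a linear critic $D(x) = w \cdot x$, chosen because its input gradient is simply $w$ at every point; both choices are assumptions made for illustration.

```python
import math


def gradient_penalty_linear(weights, lam=10.0):
    """Standard WGAN-GP penalty for a linear critic D(x) = w . x.

    The gradient of D with respect to x equals w everywhere, so the penalty
    is lam * (||w|| - 1)^2 regardless of where it is evaluated.
    """
    grad_norm = math.sqrt(sum(w * w for w in weights))
    return lam * (grad_norm - 1.0) ** 2
```

The penalty scales linearly with $\lambda$: for the same violation of the unit-gradient condition, $\lambda = 30$ penalizes the critic three times as heavily as $\lambda = 10$, which matches the intuition that large values constrain the critic more tightly.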

5.3.3. Quantitative Analysis of Privacy Risks

To quantitatively assess the practical effectiveness of synthetic data in privacy protection and its relationship with data quality, this study employs membership inference attack (MIA) as a benchmark metric for measuring privacy risk. A candidate set of 1000 randomly selected real gout cases from three Qingdao hospitals was constructed, with 500 samples used to train the CG-WGAN model as member samples and the remaining 500 serving as non-member samples. The attacker’s goal is to determine whether a given sample belongs to the training set based on the synthetic data. Experimental results are summarized in Table 8.
When the attack model is applied to real raw data, both attack accuracy and AUC reach their maximum values, indicating a significant privacy leakage risk. In contrast, synthetic data substantially reduce this risk. A clear trade-off exists between privacy protection and the quality score of generated data: at a Q-value of 0.80, attack performance approaches random guessing, yielding the strongest privacy protection. As the Q-value increases to 0.88, the synthetic data better approximate the true data distribution, resulting in an attack accuracy of 58.7% and an AUC of 0.62, reflecting an increase in privacy risk.
These quantitative results demonstrate that privacy protection can be effectively controlled by adjusting the Q-value of the generated data. This study selects a Q-value range of 0.85–0.88 to achieve an optimal balance between model performance and privacy risk. Under this setting, the AUC for MIA attacks remains between 0.53 and 0.62—substantially lower than the 0.74 observed with raw data—indicating that the risk of privacy leakage from original training data is effectively mitigated.
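The two attack metrics reported here can be computed from per-sample attack scores. The sketch below is generic and does not reproduce the attack model, which the text does not describe: it assumes the attacker outputs a membership score per candidate, uses a hypothetical decision threshold of 0.5 for accuracy, and computes AUC as the probability that a member receives a higher score than a non-member.

```python
def attack_accuracy(member_scores, nonmember_scores, threshold=0.5):
    """Fraction of candidates classified correctly at the given threshold."""
    correct = sum(1 for s in member_scores if s >= threshold)
    correct += sum(1 for s in nonmember_scores if s < threshold)
    return correct / (len(member_scores) + len(nonmember_scores))


def attack_auc(member_scores, nonmember_scores):
    """AUC: probability that a member outscores a non-member (ties count half)."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))
```

An AUC of 0.5 corresponds to random guessing, which is the reference point for the values discussed above.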

6. Conclusions and Future Work

To address the challenges of non-IID settings and privacy protection in federated learning within the medical domain, this study proposes a novel federated learning framework, FedCG-WGAN, based on conditional gradient penalization in Wasserstein GAN (CG-WGAN). The proposed method leverages CG-WGAN to generate high-quality synthetic data, thereby mitigating the impact of non-IID data distributions. Additionally, a synthetic data sharing mechanism is introduced to protect the privacy of original data and prevent risks associated with parameter inversion attacks.
Experimental results on medical datasets collected from three hospitals in Qingdao demonstrate that FedCG-WGAN achieves an accuracy of 89.6%, significantly outperforming other federated learning frameworks, which generally maintain accuracy levels above 80%. In comparison with existing methods, the proposed framework yields comprehensive improvements across all performance metrics, notably enhancing prediction accuracy and offering a promising solution for intelligent gout diagnosis. Furthermore, when applied to the MIMIC-III dataset, the method achieves an accuracy of 85.3%, indicating strong generalizability and applicability to broader clinical scenarios.
While the CG-WGAN-based data augmentation approach effectively alleviates the non-IID problem, it does not fully eliminate it. Over-optimization of the CG-WGAN network may result in the generation of synthetic samples that closely resemble real data, potentially leading to privacy leakage. To mitigate this risk, data quality is controlled by limiting the number of training rounds. Experimental findings indicate that even with lower Q-values, significant improvements in model performance can still be achieved.
Although this study focuses on gout stage prediction, FedCG-WGAN addresses challenges that are common to federated learning and is not specific to gout. The method can be directly applied to other classification tasks involving tabular EMR data, such as predicting diabetes progression, cancer subtypes, or sepsis risk. CG-WGAN handles mixed data types by combining Gumbel-Softmax for categorical variables with activation functions for continuous variables, which supports its use across clinical prediction tasks based on structured data.
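The mixed-type handling can be illustrated with a minimal sketch of a generator output head. It assumes that continuous features use `tanh` (the specific activation is not stated here, so this is a placeholder) and that each categorical feature is sampled with Gumbel-Softmax at temperature `tau`. The function names, the temperature value, and the example vector are hypothetical.

```python
import math
import random


def gumbel_softmax(logits, tau=0.5, rng=random):
    """Draw a relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for logit in logits:
        # Clamp u strictly inside (0, 1) so both logarithms are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        noisy.append((logit + gumbel_noise) / tau)
    peak = max(noisy)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def generator_output_head(raw, n_continuous, categorical_sizes, tau=0.5, rng=random):
    """Split one raw generator vector into continuous and categorical outputs."""
    # Assumption: tanh bounds continuous features to (-1, 1) after normalization.
    continuous = [math.tanh(v) for v in raw[:n_continuous]]
    categorical = []
    start = n_continuous
    for size in categorical_sizes:
        categorical.append(gumbel_softmax(raw[start:start + size], tau, rng))
        start += size
    return continuous, categorical


# Two continuous features, then a 3-way and a 2-way categorical feature.
rng = random.Random(0)
cont, cat = generator_output_head([0.3, -1.2, 2.0, 0.1, -0.5, 1.0, 1.0], 2, [3, 2], rng=rng)
print(cont)
print(cat)
```

Lower values of `tau` push each categorical output closer to a one-hot vector, while the relaxation remains differentiable in a framework with automatic differentiation.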
While FedCG-WGAN shows strong performance within the scope of this study, its generalization to other data modalities (e.g., medical imaging, time-series signals) remains to be explored. Future work will involve adapting the generator architecture—such as using convolutional or recurrent neural networks—and modifying the loss functions to accommodate multimodal data. The framework’s effectiveness will also be validated on a broader range of diseases beyond gout.

Author Contributions

J.W. and K.Z. designed the methodology and contributed to the development and implementation of the model. J.W. and Z.G. coordinated the research activities and proposed the initial concept. K.Z. and Z.Y. performed the experiments and analyzed the data. C.M. provided critical insights into the data interpretation. K.Z. and H.H. wrote and revised the manuscript. All authors made significant contributions to the writing and revision of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation, China (No. 62172123), the Key Research and Development Program of Heilongjiang (Grant No. 2022ZX01A36) and the Harbin Manufacturing Technology Innovation Talent Project (No. CXRC20221104236).

Institutional Review Board Statement

This study utilized anonymized observational data of public behavior without any intervention. According to the “Research on Life Sciences and Medicine Involving Human Beings” (released in February 2023 by The National Health Commission, Ministry of Education, Ministry of Science and Technology, and State Administration of Traditional Chinese Medicine of China), ethical review is exempted for research that involves the use of human information data or biological samples under the condition that it does not cause harm to the human body, does not involve sensitive personal information or commercial interests, and does not involve prohibited activities such as human embryonic and reproductive cloning, chimerism, or heritable gene manipulation. This study fully complies with these provisions; therefore, ethical approval and participant consent were not required.

Informed Consent Statement

Informed consent for participation was not required for this study in accordance with local legislation [“Research on Life Sciences and Medicine Involving Human Beings” (released in February 2023 by The National Health Commission, Ministry of Education, Ministry of Science and Technology, and State Administration of Traditional Chinese Medicine of China)].

Data Availability Statement

The data that support the findings of this study are available from the gout outpatient department of the hospital in Shandong, but restrictions apply to their availability: they were used under license for the current study and are therefore not publicly available. The data are, however, available from the authors upon reasonable request and with the permission of the gout outpatient department of the hospital in Shandong.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Bodakçi, E. How well do we recognise gout disease? Dicle Tıp Derg. 2024, 51, 173–1181. [Google Scholar] [CrossRef]
  2. Newberry, S.J.; FitzGerald, J.D.; Motala, A.; Booth, M.; Maglione, M.A.; Han, D.; Tariq, A.; O’Hanlon, C.E.; Shanman, R.; Dudley, W.; et al. Diagnosis of gout: A systematic review in support of an American College of Physicians Clinical Practice Guideline. Ann. Intern. Med. 2017, 166, 27–36. [Google Scholar] [CrossRef]
  3. Brikman, S.; Serfaty, L.; Abuhasira, R.; Schlesinger, N.; Bieber, A.; Rappoport, N. A machine learning-based prediction model for gout in hyperuricemics: A nationwide cohort study. Rheumatology 2024, 63, 2411–2417. [Google Scholar] [CrossRef]
  4. Zheng, C.; Rashid, N.; Wu, Y.L.; Koblick, R.; Lin, A.T.; Levy, G.D.; Cheetham, T.C. Using natural language processing and machine learning to identify gout flares from electronic clinical notes. Arthritis Care Res. 2014, 66, 1740–1748. [Google Scholar] [CrossRef]
  5. Tian, P.; Chen, Z.; Yu, W.; Liao, W. Towards asynchronous federated learning based threat detection: A DC-Adam approach. Comput. Secur. 2021, 108, 102344. [Google Scholar] [CrossRef]
  6. Wang, Z.; Zhu, Y.; Wang, D.; Han, Z. FedACS: Federated skewness analytics in heterogeneous decentralized data environments. In Proceedings of the 2021 IEEE/ACM 29th International Symposium on Quality of Service (IWQOS), Tokyo, Japan, 25–28 June 2021; pp. 1–10. [Google Scholar]
  7. Jeong, E.; Oh, S.; Kim, H.; Park, J.; Bennis, M.; Kim, S.L. Communication-efficient on-device machine learning: Federated distillation and augmentation under non-iid private data. arXiv 2018, arXiv:1811.11479. [Google Scholar]
  8. Fang, L.; Yin, C.; Zhu, J.; Ge, C.; Tanveer, M.; Jolfaei, A.; Cao, Z. Privacy protection for medical data sharing in smart healthcare. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 2020, 16, 1–18. [Google Scholar] [CrossRef]
  9. Xu, X.; Wu, J.; Yang, M.; Luo, T.; Duan, X.; Li, W.; Wu, Y.; Wu, B. Information leakage by model weights on federated learning. In Proceedings of the 2020 Workshop on Privacy-Preserving Machine Learning in Practice, Virtual, 9 November 2020; pp. 31–36. [Google Scholar]
  10. El Ouadrhiri, A.; Abdelhadi, A. Differential privacy for deep and federated learning: A survey. IEEE Access 2022, 10, 22359–22380. [Google Scholar] [CrossRef]
  11. Kanagavelu, R.; Li, Z.; Samsudin, J.; Yang, Y.; Yang, F.; Goh, R.S.M.; Cheah, M.; Wiwatphonthana, P.; Akkarajitsakul, K.; Wang, S. Two-phase multi-party computation enabled privacy-preserving federated learning. In Proceedings of the 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), Melbourne, VIC, Australia, 11–14 May 2020; pp. 410–419. [Google Scholar]
  12. Clebak, K.T.; Morrison, A.; Croad, J.R. Gout: Rapid evidence review. Am. Fam. Physician 2020, 102, 533–538. [Google Scholar]
  13. Han, T.; Chen, W.; Qiu, X.; Wang, W. Epidemiology of gout—Global burden of disease research from 1990 to 2019 and future trend predictions. Ther. Adv. Endocrinol. Metab. 2024, 15, 20420188241227295. [Google Scholar] [CrossRef]
  14. Lei, T.; Guo, J.; Wang, P.; Zhang, Z.; Niu, S.; Zhang, Q.; Qing, Y. Establishment and validation of predictive model of tophus in gout patients. J. Clin. Med. 2023, 12, 1755. [Google Scholar] [CrossRef] [PubMed]
  15. Cüre, O.; Bal, F. Application of Machine Learning for Identifying Factors Associated with Renal Function Impairment in Gouty Arthritis Patients. Appl. Sci. 2025, 15, 3236. [Google Scholar] [CrossRef]
  16. Xiao, L.; Zhao, Y.; Li, Y.; Yan, M.; Liu, Y.; Liu, M.; Ning, C. Developing an interpretable machine learning model for diagnosing gout using clinical and ultrasound features. Eur. J. Radiol. 2025, 184, 111959. [Google Scholar] [CrossRef]
  17. Zhao, Y.; Li, M.; Lai, L.; Suda, N.; Civin, D.; Chandra, V. Federated learning with non-iid data. arXiv 2018, arXiv:1806.00582. [Google Scholar] [CrossRef]
  18. Armanious, K.; Jiang, C.; Fischer, M.; Küstner, T.; Hepp, T.; Nikolaou, K.; Gatidis, S.; Yang, B. MedGAN: Medical image translation using GANs. Comput. Med. Imaging Graph. 2020, 79, 101684. [Google Scholar] [CrossRef]
  19. Fabbri, C. (University of Minnesota, Minneapolis, MN, USA). Conditional Wasserstein Generative Adversarial Networks. Unpublished student paper. 2017. [Google Scholar]
  20. Aziira, A.; Setiawan, N.; Soesanti, I. Generation of synthetic continuous numerical data using generative adversarial networks. In Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2020; Volume 1577, p. 012027. [Google Scholar]
  21. Chen, D.; Orekondy, T.; Fritz, M. Gs-wgan: A gradient-sanitized approach for learning differentially private generators. Adv. Neural Inf. Process. Syst. 2020, 33, 12673–12684. [Google Scholar]
  22. Zhao, Z.; Kunar, A.; Birke, R.; Chen, L.Y. Ctab-gan: Effective table data synthesizing. In Proceedings of the Asian Conference on Machine Learning, Bangkok, Thailand, 1–3 December 2021; pp. 97–112. [Google Scholar]
  23. Xu, L.; Skoularidou, M.; Cuesta-Infante, A.; Veeramachaneni, K. Modeling tabular data using conditional GAN. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Curran Associates Inc.: Red Hook, NY, USA, 2019. Article No. 659. pp. 1–11. [Google Scholar]
  24. Engelmann, J.; Lessmann, S. Conditional Wasserstein GAN-based oversampling of tabular data for imbalanced learning. arXiv 2020, arXiv:2008.09202. [Google Scholar] [CrossRef]
  25. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
  26. Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv 2014, arXiv:1411.1784. [Google Scholar] [CrossRef]
  27. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 214–223. [Google Scholar]
  28. Liu, K.; Qiu, G. Lipschitz constrained GANs via boundedness and continuity. Neural Comput. Appl. 2020, 32, 18271–18283. [Google Scholar] [CrossRef]
  29. Zhou, Z.; Song, Y.; Yu, L.; Wang, H.; Liang, J.; Zhang, W.; Zhang, Z.; Yu, Y. Understanding the effectiveness of lipschitz-continuity in generative adversarial nets. arXiv 2018, arXiv:1807.00751. [Google Scholar] [CrossRef]
  30. Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A.C. Improved training of wasserstein gans. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
  31. Wei, M.; Vogel, C. Generative Adversarial Networks in Federated Learning. In Applications of Artificial Intelligence and Neural Systems to Data Science; Springer: Berlin/Heidelberg, Germany, 2023; pp. 341–350. [Google Scholar]
  32. Durgadevi, M.; Karthika, S. Generative Adversarial Network (GAN): A general review on different variants of GAN and applications. In Proceedings of the 2021 6th International Conference on Communication and Electronics Systems (ICCES), Coimbatre, India, 8–10 July 2021; pp. 1–8. [Google Scholar]
  33. Alajaji, S.A.; Khoury, Z.H.; Elgharib, M.; Saeed, M.; Ahmed, A.R.; Khan, M.B.; Tavares, T.; Jessri, M.; Puche, A.C.; Hoorfar, H.; et al. Generative adversarial networks in digital histopathology: Current applications, limitations, ethical considerations, and future directions. Mod. Pathol. 2024, 37, 100369. [Google Scholar] [CrossRef]
  34. Zheng, M.; Li, T.; Zhu, R.; Tang, Y.; Tang, M.; Lin, L.; Ma, Z. Conditional Wasserstein generative adversarial network-gradient penalty-based approach to alleviating imbalanced data classification. Inf. Sci. 2020, 512, 1009–1023. [Google Scholar] [CrossRef]
  35. Tan, A.Z.; Yu, H.; Cui, L.; Yang, Q. Towards personalized federated learning. IEEE Trans. Neural Networks Learn. Syst. 2022, 34, 9587–9603. [Google Scholar] [CrossRef] [PubMed]
  36. Cao, X.; Sun, G.; Yu, H.; Guizani, M. PerFED-GAN: Personalized federated learning via generative adversarial networks. IEEE Internet Things J. 2022, 10, 3749–3762. [Google Scholar] [CrossRef]
  37. Ji, X.; Tian, J.; Sun, C.; Zhang, M. PFed-ME: Personalized Federated Learning Based on Model Enhancement. In Proceedings of the International Conference on Intelligent Computing, Tianjin, China, 5–8 August 2024; pp. 263–274. [Google Scholar]
  38. Johnson, A.E.; Pollard, T.J.; Shen, L.; Lehman, L.w.H.; Feng, M.; Ghassemi, M.; Moody, B.; Szolovits, P.; Anthony Celi, L.; Mark, R.G. MIMIC-III, a freely accessible critical care database. Sci. Data 2016, 3, 160035. [Google Scholar] [CrossRef]
  39. Özkan, Y.; Demirarslan, M.; Suner, A. Effect of data preprocessing on ensemble learning for classification in disease diagnosis. Commun. Stat.-Simul. Comput. 2024, 53, 1657–1677. [Google Scholar] [CrossRef]
  40. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 20–22 April 2017; pp. 1273–1282. [Google Scholar]
  41. Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated optimization in heterogeneous networks. Proc. Mach. Learn. Syst. 2020, 2, 429–450. [Google Scholar]
  42. Li, D.; Wang, J. Fedmd: Heterogenous federated learning via model distillation. arXiv 2019, arXiv:1910.03581. [Google Scholar] [CrossRef]
Figure 1. The architecture diagram of the FedCG-WGAN framework.
Figure 2. The architecture diagram of the CG-WGAN framework.
Figure 3. Q-values of EMRs from three hospitals in Qingdao at different training rounds of CG-WGAN.
Figure 4. t-SNE visualization of real versus synthetic data distributions.
Figure 5. Trends in RPIR Values for Different Methods under Various Data Distributions: (a) RPIR Values under IID Conditions; (b) RPIR Values under Non-IID Conditions.
Figure 6. Comparative analysis of GAN architectures on federated data augmentation effectiveness.
Figure 7. Impact of training rounds on model performance and efficiency for CG-WGAN.
Figure 8. Heatmap of accuracy under different α and β combinations. The color boxes represent varying accuracy levels, with darker colors indicating higher accuracy.
Table 1. Gout electronic medical record dataset.

| Gout Staging | Participants, Hospital A | Participants, Hospital B | Participants, Hospital C |
|---|---|---|---|
| Acute arthritis phase | 1008 | 612 | 737 |
| Intermittent phase | 10,009 | 8750 | 9550 |
| Chronic arthritis phase | 8050 | 6110 | 6301 |

Table 2. Statistical significance of retained features.

| Feature Name | Chi-Square (χ²) / ANOVA (F) | Clinical Relevance |
|---|---|---|
| Uric acid | χ² = 15.2; F = 18.6 | Core indicator for staging |
| Systolic BP | χ² = 6.8; F = 5.4 | Positively correlates with inflammation |
| Joint swelling score | 12.1 | Acute-phase specific marker |
| Gender | 1.2 | Retained (clinically essential) |

Table 3. Performance comparison of different classifiers on local data.

| Model | Local Accuracy | Complete Training Time | Peak Memory Usage |
|---|---|---|---|
| XGBoost | 82.5% | 1250.5 s | 245 MB |
| LightGBM | 81.8% | 1050.2 s | 210 MB |
| CatBoost | 82.1% | 1480.7 s | 290 MB |
| MLP | 78.9% | 1720.3 s | 180 MB |

Table 4. RPIR Values at Different Q-Value Thresholds.

| Q-Value Threshold | 0.75 | 0.80 | 0.85 | 0.90 | 0.95 |
|---|---|---|---|---|---|
| RPIR | 1.37 | 1.40 | 1.42 | 1.43 | 1.44 |

Table 5. Clinical Characteristics and Correlation Analysis.

| Feature Name | Real Distribution | Synthetic Distribution | Correlation Coefficient |
|---|---|---|---|
| Blood Uric Acid (μmol/L) | 512 ± 118 | 503 ± 125 | 0.96 |
| Percentage of Knee Involvement | 68.3% | 66.9% | 0.93 |
| Acute Phase (CRP > 10 mg/L) | 75.1% | 72.3% | 0.94 |
| Incidence of Gouty Stones | 39.8% | 41.6% | 0.91 |
| Percentage of Renal Function Abnormalities | 22.4% | 24.1% | 0.89 |
| Proportion of Combined Hypertension | 43.6% | 41.2% | 0.95 |

Table 6. Performance comparison of different models.

| Metric | Dataset | FedAvg (baseline) | FedProx (baseline) | FedMD (baseline) | FedCG-WGAN (proposed) | Improvement vs. Best | Improvement vs. Avg |
|---|---|---|---|---|---|---|---|
| Accuracy | Qingdao Hospitals | 83.2% | 85.1% | 82.5% | 89.6% | ↑4.5% | ↑6.0% |
| Accuracy | MIMIC-III | 78.3% | 80.1% | 77.6% | 85.3% | ↑5.2% | ↑6.7% |
| Precision | Qingdao Hospitals | 81.6% | 83.5% | 80.8% | 86.3% | ↑2.8% | ↑4.3% |
| Precision | MIMIC-III | 76.5% | 78.9% | 75.2% | 87.5% | ↑8.6% | ↑10.3% |
| Recall | Qingdao Hospitals | 80.9% | 82.7% | 79.4% | 85.5% | ↑2.8% | ↑4.5% |
| Recall | MIMIC-III | 75.8% | 77.3% | 74.1% | 83.1% | ↑5.8% | ↑7.3% |
| F1-score | Qingdao Hospitals | 81.2% | 83.1% | 80.1% | 85.9% | ↑2.8% | ↑4.4% |
| F1-score | MIMIC-III | 76.1% | 78.1% | 74.7% | 84.2% | ↑6.1% | ↑8.0% |
| Round Time | Qingdao Hospitals | 142.7 s | 135.4 s | 155.2 s | 97.1 s | ↓38.3 s | ↓47.3 s |
| Round Time | MIMIC-III | 138.5 s | 130.1 s | 148.7 s | 94.3 s | ↓35.8 s | ↓45.2 s |
| Comm Cost | Qingdao Hospitals | 12.8 MB | 11.2 MB | 14.6 MB | 7.3 MB | ↓3.9 MB | ↓5.6 MB |
| Comm Cost | MIMIC-III | 11.5 MB | 10.3 MB | 13.2 MB | 6.8 MB | ↓3.2 MB | ↓5.1 MB |

Table 7. Impact of gradient penalty coefficient λ on federated learning performance.

| λ | Accuracy | Recall | F1-Score | Training Stability |
|---|---|---|---|---|
| 1 | 82.3% | 79.5% | 80.8% | Unstable (mode collapse) |
| 5 | 88.5% | 86.1% | 87.2% | Relatively stable |
| 10 (Default) | 89.8% | 87.3% | 88.2% | Stable |
| 20 | 88.9% | 86.8% | 87.8% | Stable |
| 30 | 87.2% | 84.9% | 86.0% | Stable (slower convergence) |

Table 8. Membership inference attack (MIA) evaluation results.

| Training Data Source | Attack Accuracy | Attack AUC |
|---|---|---|
| Synthetic Data (Q = 0.80) | 51.0% | 0.52 |
| Synthetic Data (Q = 0.85) | 51.2% | 0.53 |
| Synthetic Data (Q = 0.88) | 58.7% | 0.62 |
| Original Data | 68.7% | 0.74 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, J.; Zhang, K.; Guan, Z.; Ye, Z.; Ma, C.; Huang, H. Research on the Application of Federated Learning Based on CG-WGAN in Gout Staging Prediction. Computers 2025, 14, 455. https://doi.org/10.3390/computers14110455
