The proposed Federated Graph-Transformer Network (FGTN) was evaluated on a dataset comprising 232 X-ray coronary angiography images from 231 patients, annotated with clinically computed SYNTAX scores and angiographic variables. The vascular trees were segmented and converted into node-edge graphs, preserving topological features. The FGTN processed these graphs using graph convolutional layers combined with transformer-based self-attention to capture both local stenosis and global vessel relationships.
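As a rough illustration of how a graph convolution followed by self-attention combines local and global information on a vessel graph, the following minimal sketch may help. It is not the FGTN implementation: the mean-aggregation convolution, the identity projections (no learned weights), and the toy four-node vessel tree are all simplifying assumptions introduced here.

```python
import math

def gcn_layer(features, edges):
    """Simplified graph convolution: each node averages itself and its neighbours."""
    n = len(features)
    neighbours = [{i} for i in range(n)]      # self-loops
    for u, v in edges:                        # undirected vessel segments
        neighbours[u].add(v)
        neighbours[v].add(u)
    dim = len(features[0])
    return [
        [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
        for nbrs in neighbours
    ]

def self_attention(features):
    """Scaled dot-product self-attention with identity projections (assumption)."""
    dim = len(features[0])
    outputs, weights = [], []
    for query in features:
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            for key in features
        ]
        top = max(scores)                     # subtract max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        row = [e / total for e in exps]       # softmax over all nodes
        weights.append(row)
        outputs.append(
            [sum(row[j] * features[j][d] for j in range(len(features)))
             for d in range(dim)]
        )
    return outputs, weights

# Hypothetical toy vessel tree: node 0 is the root, node 1 a bifurcation.
node_features = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.9]]
edges = [(0, 1), (1, 2), (1, 3)]

local = gcn_layer(node_features, edges)       # local neighbourhood context
global_ctx, attn = self_attention(local)      # every node attends to every node
```

The convolution only mixes features along existing edges, whereas the attention step lets distant segments influence each other, which is the division of labour described above.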
The class distribution across all dataset splits remained statistically balanced, ensuring unbiased evaluation of CAD severity grading. Severity is color-coded as follows: green for non-obstructive or mild plaques, yellow for moderate stenosis, and red for severe stenosis. The bottom-right panel shows the severity classification for the major coronary branch: severe stenosis in the key vessel is shown in red, alongside regions of moderate stenosis and of non-obstructive plaques, indicating a spectrum of coronary artery disease. The quantitative analysis indicates severe stenosis in 40% of the vessel length, moderate stenosis in 25%, and non-obstructive plaques in 35%, a clinically significant plaque distribution for planning targeted intervention.
The highlighted vascular regions generated by the proposed FGTN framework showed strong spatial correspondence with lesion locations identified by expert clinical assessment. In severe CAD cases, the attention-weighted graph representations predominantly focused on stenotic vessel segments exhibiting pronounced luminal narrowing and irregular contrast flow patterns that were consistent with clinically annotated high-risk lesions. This alignment suggests that the topology-aware graph transformer successfully captured diagnostically relevant vascular structures rather than relying on unrelated image artifacts or background patterns.
5.1. Quantitative Performance
All comparative baseline models (CNN, Attention U-Net, and Capsule Network) were independently re-implemented and evaluated using the same preprocessing pipeline, augmentation strategy, patient-level training-validation-testing splits, federated partitioning protocol, optimization settings, and evaluation metrics to ensure fair and reproducible comparison. Hyperparameter tuning for all models was conducted using validation-based optimization under identical experimental conditions.
Figure 8 compares the performance of FGTN with the standard models: CNN, Attention U-Net, and Capsule Network. The FGTN achieved high values on all metrics, with an accuracy of 99.4%, precision of 97.6%, recall of 98.8%, and F1-score of 98.2%. The baselines, with F1-scores ranging between 83% and 94%, highlight the advantages of topology-aware graph embedding and transformer-based attention.
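For readers who want to see how accuracy, precision, recall, and F1-score relate to each other in a multi-class setting, the sketch below derives them from a confusion matrix. Macro averaging over classes is an assumption made here, since the averaging scheme is not stated, and the 4×4 matrix is a made-up placeholder rather than data from this study.

```python
def classification_metrics(cm):
    """cm[i][j] = number of samples with true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[k][k] for k in range(n)) / total

    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))   # column sum
        actual_k = sum(cm[k])                           # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)

    def macro(values):
        return sum(values) / len(values)

    return {
        "accuracy": accuracy,
        "precision": macro(precisions),   # macro averaging is an assumption
        "recall": macro(recalls),
        "f1": macro(f1s),
    }

# Placeholder matrix for four severity classes (not the paper's results).
toy_cm = [
    [9, 1, 0, 0],
    [1, 8, 1, 0],
    [0, 1, 8, 1],
    [0, 0, 1, 9],
]
metrics = classification_metrics(toy_cm)
```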
Table 3 demonstrates consistent performance across folds with low variance, confirming the stability and generalization capability of the proposed FGTN framework. The low standard deviation observed across folds indicates stable model performance and reduced sensitivity to variations in dataset partitioning. The overall AUC values reported in Table 4 correspond to the macro-average multi-class AUC computed from one-vs-rest ROC analysis across the four CAD severity categories (non-obstructive, mild, moderate, and severe). This macro-average AUC summarizes the model’s overall discrimination capability across all severity classes under the same patient-level 10-fold cross-validation protocol.
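The macro-average one-vs-rest AUC described above can be sketched as follows. Each class is treated in turn as the positive class, a binary AUC is computed from the predicted probability of that class, and the per-class values are averaged with equal weight. The rank-based (Mann-Whitney) formulation and the eight toy samples are assumptions chosen for illustration; they are not taken from the study.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

def macro_ovr_auc(y_true, y_prob, n_classes):
    """Unweighted mean of the per-class one-vs-rest AUC values."""
    per_class = []
    for k in range(n_classes):
        labels = [1 if y == k else 0 for y in y_true]
        scores = [probs[k] for probs in y_prob]
        per_class.append(binary_auc(labels, scores))
    return sum(per_class) / n_classes, per_class

# Toy example: 0 = non-obstructive, 1 = mild, 2 = moderate, 3 = severe.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_prob = [
    [0.7, 0.1, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.6, 0.2, 0.1],
    [0.1, 0.5, 0.3, 0.1],
    [0.1, 0.2, 0.5, 0.2],
    [0.1, 0.1, 0.3, 0.5],   # moderate case that leans towards severe
    [0.0, 0.1, 0.4, 0.5],
    [0.0, 0.1, 0.2, 0.7],
]
macro_auc, class_aucs = macro_ovr_auc(y_true, y_prob, n_classes=4)
```

In this toy data the only imperfect classes are moderate and severe, because one moderate sample receives a high severe probability.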
To evaluate statistical reliability, performance metrics were computed using patient-level 10-fold cross-validation and are reported as the mean ± standard deviation across folds. In addition, 95% confidence intervals (CI) were estimated for the primary evaluation metrics of the proposed FGTN model. The FGTN achieved an accuracy of 98.3% (95% CI: 97.8–98.8%), F1-score of 98.2% (95% CI: 97.7–98.7%), and macro-average AUC of 0.96 (95% CI: 0.95–0.97). Paired statistical comparisons across folds demonstrated that the improvements of FGTN over baseline CNN, Attention U-Net, and Capsule Network models were statistically significant (p < 0.05). The relatively high performance is considered plausible due to the combined effects of topology-aware graph representation, transformer-based global context encoding, robust vascular segmentation, and strict patient-level data separation during cross-validation, which collectively improved feature discrimination while minimizing data leakage.
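One common way to obtain a mean ± standard deviation, a 95% confidence interval, and a paired comparison from 10 fold-wise scores is sketched below. The Student-t interval and the paired t statistic are assumptions, since the text does not name the exact procedures used, and the fold scores are invented placeholders rather than the reported results.

```python
import math
import statistics

# Two-sided 95% Student-t critical value for 9 degrees of freedom (10 folds).
T_CRIT_DF9 = 2.262

def summarize_folds(scores, t_crit=T_CRIT_DF9):
    """Return mean, sample standard deviation, and a t-based 95% CI."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)               # uses n - 1 in the denominator
    half_width = t_crit * std / math.sqrt(len(scores))
    return mean, std, (mean - half_width, mean + half_width)

def paired_t_statistic(model_scores, baseline_scores):
    """t statistic of the fold-wise differences (same folds for both models)."""
    diffs = [a - b for a, b in zip(model_scores, baseline_scores)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

# Placeholder fold scores, for illustration only.
model_folds = [0.97, 0.98, 0.99, 0.98, 0.97, 0.99, 0.98, 0.98, 0.97, 0.99]
baseline_folds = [0.90, 0.92, 0.91, 0.93, 0.90, 0.92, 0.93, 0.91, 0.92, 0.90]

mean, std, ci = summarize_folds(model_folds)
t_stat = paired_t_statistic(model_folds, baseline_folds)
significant = abs(t_stat) > T_CRIT_DF9           # corresponds to p < 0.05, two-sided
```

Pairing the scores by fold removes the variation caused by the partitioning itself, which is why a paired comparison suits cross-validation results of this kind.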
The shallow CNN baseline achieved substantially lower performance compared to the proposed FGTN framework, indicating that the reported improvements were not solely attributable to dataset characteristics or evaluation bias. The progressive performance gains observed from shallow CNNs to conventional CNNs, and subsequently to graph-transformer-based architectures, further support the contribution of topology-aware vascular representation learning and transformer-based contextual modeling toward improved CAD severity discrimination. To assess training stability, the proposed FGTN framework was evaluated across three independent runs using different random initialization seeds. Only minor performance variation (below ±0.5%) was observed across runs, indicating stable convergence and reproducible optimization behavior.
5.3. Federated Learning Performance
The federated experiments were conducted using three simulated non-IID clinical clients over 20 communication rounds, with one local training epoch per round and FedAvg aggregation. Compared with centralized training, the global model trained with horizontal Federated Learning, which mimics multi-site collaboration, achieved stable convergence with a 1.8% performance degradation. The proposed FGTN can therefore be trained effectively from distributed, heterogeneous data without compromising accuracy or privacy. The experimental results indicate that the FGTN framework effectively integrates local and global vessel information. The graph convolutional layers capture local stenosis patterns around vessel segments and bifurcations, while the transformer-based self-attention models long-range dependencies across the vascular tree. This combination enables highly accurate severity grading, surpassing traditional CNNs, Attention U-Net, and Capsule Networks, which emphasize neighborhood features or lack the ability to preserve the vessel graph topology. Federated Learning ensures privacy-preserving cooperation among the simulated clinical sites. The FGTN should therefore be suitable for real multi-center applications in which data sharing for centralized training is restricted, and it shows robustness to data heterogeneity.
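The FedAvg protocol used here (three clients, 20 communication rounds, one local update per round) can be sketched on a toy problem as follows. This is only an illustration under stated assumptions: the one-parameter quadratic loss, the client targets, the client sizes, and the learning rate are all hypothetical stand-ins for the real FGTN training step.

```python
def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample count."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

def local_step(weights, target, lr=0.1):
    """One gradient step on the toy local loss sum_i (w_i - target_i)^2."""
    return [w - lr * 2.0 * (w - t) for w, t in zip(weights, target)]

# Hypothetical non-IID clients: each one prefers a different optimum.
client_targets = [[1.0], [2.0], [3.0]]
client_sizes = [50, 30, 20]

global_weights = [0.0]
for _ in range(20):                       # 20 communication rounds
    local_models = [local_step(global_weights, t) for t in client_targets]
    global_weights = fedavg(local_models, client_sizes)
```

In this toy setting the global parameter moves towards the sample-weighted mean of the client optima (1.7), which illustrates why FedAvg favours larger clients when local distributions differ.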
The global federated model was evaluated using the independent test dataset obtained from the same publicly available X-ray angiography dataset. Patient-wise separation was strictly maintained so that no patient images appeared simultaneously in training and testing subsets across clients.
The per-severity classification shows a slightly lower AUC for severe cases (0.95), reflecting the inherent difficulty of separating severe from moderate stenosis due to overlapping anatomic features in X-ray angiography. Nevertheless, automated CAD severity grading has high clinical applicability: it may support cardiologists in diagnosis, reduce manual annotation effort, and improve treatment planning. Overall, the FGTN provides a topology-aware, privacy-preserving, and clinically reliable framework for automated CAD assessment from coronary angiography, combining the advantages of graph-based modeling, transformer attention, and Federated Learning.
Figure 10 shows the progression of the F1-score of the federated global model over 20 training rounds, compared with centralized training. The federated FGTN steadily approaches a 98.2% F1-score, closely matching the centralized model, which achieves 97.0%, demonstrating stable convergence and minimal performance degradation (1.8%) in a privacy-preserving multi-client setup.
The global federated model, as shown in Table 6, improves continuously across rounds and surpasses the individual clients, demonstrating effective aggregation and minimal performance loss compared to centralized training.
The iterative performance of the FGTN model over 20 federated training rounds is shown in Figure 11. The main panel shows the convergence of the F1-score for the three simulated clients and the global model, highlighting the steady increase in client and global performance, with the global F1-score reaching 98.2% by round 20. The second panel shows the progression of the per-class AUC for the non-obstructive, mild, moderate, and severe CAD classes, showing stable convergence with final AUC values of 0.98, 0.97, 0.96, and 0.95, respectively. The third panel reports the average image-level prediction confidence, which increased from 91.2% to 97.5% while its standard deviation decreased, indicating stable and reliable image-level predictions. Overall, the figure shows that the federated FGTN model achieves reliable, precise, and confident image-level CAD severity classification through iterative collaborative learning.
Figure 12 illustrates the effect of varying the hidden embedding dimension on the validation F1-score of the proposed FGTN model. Increasing the hidden dimension from 64 to 128 improves the validation F1-score from 96.1% to 98.2%, indicating enhanced feature representation capability. However, a further increase to 256 results in a marginal decrease to 98.1%, suggesting that larger embedding dimensions provide limited additional benefit while increasing computational complexity. Therefore, a hidden dimension of 128 was selected as the optimal configuration.
Figure 13 shows the impact of different dropout rates on the validation F1-score. The model achieves a validation F1-score of 97.2% at a dropout rate of 0.1, which increases to 98.2% at 0.3, demonstrating improved generalization and reduced overfitting. When the dropout rate is further increased to 0.5, the validation F1-score decreases slightly to 97.6%, likely due to excessive regularization. These results confirm that a dropout rate of 0.3 provides the best balance between model robustness and classification performance.
The confusion matrix shown in Figure 14 represents patient-level CAD severity predictions obtained from the independent testing subsets during patient-level 10-fold cross-validation. Although the complete dataset contained a substantially larger number of angiographic images acquired from multiple projections and frames, all images belonging to the same patient were grouped during evaluation to maintain patient exclusivity and avoid data leakage. Predictions from individual angiographic images were aggregated to generate a single final CAD severity classification for each patient. Consequently, the 232 entries shown in the confusion matrix correspond to patient-wise evaluation samples rather than individual angiographic images. The matrix reports the classification performance of the proposed model for the four artery stenosis severity classes: Non-Obstructive, Mild, Moderate, and Severe. There is a strong dominance along the diagonal, indicating a high level of correctness. Misclassification is minimal and occurs mostly between neighboring severity levels, such as between Mild and Moderate or between Moderate and Severe, reflecting the ambiguity characteristic of borderline stenotic lesions.
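The aggregation of image-level predictions into one label per patient could look like the sketch below. The aggregation rule is not specified in the text, so averaging the class probabilities over a patient's images and taking the most probable class is an assumption; the patient identifiers and probabilities are hypothetical.

```python
from collections import defaultdict

CLASSES = ["Non-Obstructive", "Mild", "Moderate", "Severe"]

def aggregate_by_patient(image_predictions):
    """Map each patient to one class from per-image class probabilities.

    image_predictions: iterable of (patient_id, [p_class0, ..., p_class3]).
    Assumed rule: average the probabilities per patient, then take the argmax.
    """
    sums = defaultdict(lambda: [0.0] * len(CLASSES))
    counts = defaultdict(int)
    for patient_id, probs in image_predictions:
        for k, p in enumerate(probs):
            sums[patient_id][k] += p
        counts[patient_id] += 1

    result = {}
    for patient_id, total in sums.items():
        mean_probs = [s / counts[patient_id] for s in total]
        best = max(range(len(CLASSES)), key=mean_probs.__getitem__)
        result[patient_id] = CLASSES[best]
    return result

# Hypothetical predictions: two views of patient P001, one of patient P002.
predictions = [
    ("P001", [0.10, 0.20, 0.60, 0.10]),
    ("P001", [0.10, 0.10, 0.30, 0.50]),
    ("P002", [0.80, 0.10, 0.05, 0.05]),
]
patient_labels = aggregate_by_patient(predictions)
```

Averaging probabilities lets a confident view outweigh an uncertain one, whereas a majority vote over per-image labels would treat all views equally; either choice keeps the evaluation at the patient level.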
The ablation results presented in Table 7 indicate that each component of the proposed model contributes to improved performance. Removing the graph module reduces the ability of the model to capture vessel topology, leading to lower classification accuracy. Excluding the transformer module limits the model’s capability to learn global contextual relationships across vessel segments. Similarly, removing the Federated Learning framework slightly decreases the generalization capability of the model across distributed datasets. The complete FGTN architecture achieves the best performance, confirming the effectiveness of integrating graph learning, transformer attention, and federated optimization for accurate coronary artery disease severity classification.
Further, the performance of the FGTN model is compared with various recent deep learning architectures that have been proposed for CAD detection, such as CNN, RF-CNN-F, U-Net, Capsule Networks, Graph Neural Networks, Transformer, and Federated Learning architectures.
As shown in Table 8, the proposed model achieves superior performance across key evaluation metrics such as accuracy, F1-score, and AUC. Conventional CNN and segmentation-based approaches reported relatively lower performance due to limited capability in capturing complex vascular structures [8, 22], whereas hybrid learning models such as RF-CNN-F demonstrated moderate improvements [23]. Graph-based learning methods improved topology awareness of coronary vessels [7], and transformer architectures enhanced global contextual modeling through attention mechanisms [10, 24]. Federated Learning frameworks further enabled privacy-preserving distributed training across institutions [11, 20, 21]. By integrating graph learning, transformer attention, and federated optimization, the proposed FGTN framework achieves higher diagnostic accuracy and robustness compared to existing methods.
Table 8 shows that the proposed FGTN achieves the highest classification performance (99.4% accuracy, 98.2% F1-score, and 0.96 AUC) with 10.1 M parameters and 17.3 GFLOPs. Although some models, such as the Transformer model (12.8 M parameters, 24.7 GFLOPs) and Capsule Network (11.3 M parameters, 22.5 GFLOPs), exhibit higher computational complexity, they achieve lower classification performance. Conversely, lightweight CNN-based methods require fewer parameters and FLOPs but provide substantially lower accuracy. These results indicate that the superior performance of FGTN arises from the effective combination of graph-based topological learning and transformer-based contextual modeling rather than increased model complexity alone, resulting in a favorable complexity–performance trade-off.
The proposed FGTN framework is intended as a decision-support tool for cardiologists rather than a replacement for clinical expertise. In practical settings, the model could assist in automated CAD severity assessment by highlighting high-risk stenotic regions and providing preliminary severity grading during coronary angiography interpretation. Such support may help improve diagnostic consistency, reduce interpretation time, and facilitate early triage of patients requiring urgent interventional evaluation. However, the current framework should be considered a promising assistive technology requiring further large-scale clinical validation before routine deployment.
In real-world federated healthcare environments, participating hospitals may exhibit substantially different CAD severity distributions, imaging protocols, patient demographics, and acquisition conditions. Consequently, the local datasets available at each institution are inherently heterogeneous and non-IID, which may affect the generalization capability of a single global federated model across all hospitals. Although the present study simulated mildly heterogeneous client distributions, stronger institutional variability may introduce client-specific performance imbalance and convergence challenges. The proposed FGTN framework improves robustness by combining topology-aware graph learning with transformer-based global contextual encoding; however, future work will investigate personalized Federated Learning, adaptive aggregation strategies, and validation using real multi-center hospital datasets to further improve hospital-specific generalization and robustness under highly heterogeneous clinical settings.