HFed-MIL: Patch Gradient-Based Attention Distillation Federated Learning for Heterogeneous Multi-Site Ovarian Cancer Whole-Slide Image Analysis
Abstract
1. Introduction
- Patch-CAM-based structural distillation. We extend the intuition of Grad-CAM to the patch level in WSIs and propose Patch-CAM, which computes gradient-based attention scores for each patch embedding. This design enables knowledge distillation in heterogeneous MIL frameworks without requiring explicit attention modules, while avoiding the privacy risks associated with directly sharing gradients or features.
- Dual-level distillation objective (class-level and structural-level). Beyond conventional logit distillation, we introduce a structural distillation term that enforces consistency of patch-level attention distributions across clients. This keeps the attention signal from washing out under naive averaging and strengthens the discriminative power and interpretability of the global model.
- Balanced trade-off between privacy, efficiency, and heterogeneity. Patch-CAM scores lie between logits and raw features: they provide sufficient information for effective distillation while leaking minimal private data (MIA AUC ≈ 0.6, close to random guessing). In addition, HFed-MIL reduces communication cost to 0.32 MB per round, which is orders of magnitude smaller than parameter or gradient sharing, making it practical for real-world federated pathology.
- Enhanced interpretability and generalization in heterogeneous federated WSI analysis. Extensive experiments on multiple cancer subtypes and cross-domain datasets (Camelyon16, BreakHis) demonstrate that HFed-MIL yields more robust performance across heterogeneous models. Moreover, the global attention visualizations show sharper and clinically meaningful heatmaps, offering pathologists transparent insights into model decisions.
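The Patch-CAM idea in the first contribution can be sketched concretely. The snippet below is a minimal illustration, not the authors' implementation: it assumes an attention-free MIL head (mean pooling over patch embeddings) and scores each patch by a Grad-CAM-style gradient-times-activation term; the class names, the ReLU, and the normalization are illustrative choices.

```python
import torch
import torch.nn as nn

class MeanPoolMIL(nn.Module):
    """Toy MIL head with no explicit attention module (mean pooling)."""
    def __init__(self, dim=512, n_classes=2):
        super().__init__()
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, patch_embeddings):          # (N, dim)
        bag = patch_embeddings.mean(dim=0)        # (dim,)
        return self.classifier(bag)               # (n_classes,)

def patch_cam_scores(model, patch_embeddings, target_class):
    """Grad-CAM-style patch importance: gradient of the target-class logit
    w.r.t. each patch embedding, reduced to one scalar per patch."""
    emb = patch_embeddings.clone().requires_grad_(True)
    logits = model(emb)
    logits[target_class].backward()
    # weight each patch by (gradient * embedding), summed over the feature dim
    scores = torch.relu((emb.grad * emb).sum(dim=1)).detach()
    return scores / (scores.sum() + 1e-8)         # attention-like distribution

model = MeanPoolMIL()
patches = torch.randn(100, 512)                   # 100 patches per WSI
scores = patch_cam_scores(model, patches, target_class=1)
print(scores.shape)                               # torch.Size([100])
```

Because only the resulting score vector (not gradients or embeddings) would be shared, this is what allows distillation across MIL architectures that lack explicit attention modules.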
2. Related Work
2.1. Advancements in Ovarian Cancer
2.2. Attention Mechanisms in Deep Learning
2.3. Federated Learning with WSIs
- Meaningful Gigapixel Analysis: WSIs require patch-wise modeling and MIL-based aggregation, which complicates communication and synchronization in federated settings.
- Data and Model Heterogeneity: Differences in data distributions, annotation quality, and model architectures across hospitals degrade the performance of conventional FL pipelines, as discussed further in Section 2.4.
- Privacy vs. Performance trade-off: Existing methods often compromise diagnostic performance for stronger privacy or vice versa.
2.4. Challenges in Multi-Site Data Distribution
3. Dataset and Methodology
3.1. Dataset
3.2. Local WSI MIL Modeling
3.3. Label Distillation Fed-MIL
3.4. Patch-CAM-Based Distillation
3.5. Model Architecture Diagram
Algorithm 1: Proposed HFed-MIL (algorithm figure not recoverable from the source).
4. Experiments and Discussion
4.1. Baseline and Parameter Setting
- Optimizer: Adam with learning rate ;
- Number of communication rounds: 20;
- Distillation temperature: ;
- Patch sampling: 100 patches per WSI, each of size ;
- Number of clients: 4;
- Batch size: 1 WSI bag per iteration (due to memory constraints with large patch sets);
- GPU: 4× Titan RTX.
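Given the settings above, the dual-level distillation objective (class-level plus structural-level) could be sketched as follows. This is an illustrative reconstruction, not the paper's exact loss: the use of KL divergence for both terms and the values of the temperature T and weight lam are assumptions.

```python
import torch
import torch.nn.functional as F

def dual_distillation_loss(student_logits, teacher_logits,
                           student_attn, teacher_attn,
                           T=2.0, lam=0.5):
    """Class-level KD (softened logits) + structural KD (patch attention).

    student_attn / teacher_attn: (N,) attention-like distributions over the
    N patches of one WSI bag (e.g., Patch-CAM scores), each summing to 1.
    T and lam are illustrative values, not the paper's settings.
    """
    # class-level term: KL between temperature-softened logit distributions
    kd_class = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean") * (T * T)
    # structural term: KL between patch-level attention distributions
    eps = 1e-8
    kd_struct = F.kl_div((student_attn + eps).log(),
                         teacher_attn + eps,
                         reduction="sum")
    return kd_class + lam * kd_struct

s_logits = torch.randn(1, 2)            # one bag, two classes
t_logits = torch.randn(1, 2)
s_attn = torch.softmax(torch.randn(100), dim=0)   # 100 patches per WSI
t_attn = torch.softmax(torch.randn(100), dim=0)
loss = dual_distillation_loss(s_logits, t_logits, s_attn, t_attn)
print(float(loss))
```

The structural term is what distinguishes this objective from plain logit distillation: it penalizes clients whose patch-level attention diverges from the aggregated teacher distribution.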
4.2. Experiment Type—A: Comparative Analysis
| Method | Model | Data | Accuracy | Loss | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| Centralized | - | - | 75.7% | 0.70 | 76.2% | 75.0% |
| FedAvg [21] | Homo | Homo | 74.0% | 0.74 | 73.5% | 74.2% |
| | Homo | Heter | 47.2% | 1.54 | 46.0% | 48.0% |
| | Heter | Homo | - | - | - | - |
| | Heter | Heter | - | - | - | - |
| FedDF [28] | Homo | Homo | 56.5% | 1.07 | 57.0% | 56.0% |
| | Homo | Heter | 42.0% | 1.33 | 41.5% | 42.2% |
| | Heter | Homo | 61.5% | 1.08 | 62.0% | 61.0% |
| | Heter | Heter | 50.1% | 1.32 | 50.5% | 49.8% |
| FedMD [30] | Homo | Homo | 54.7% | 1.14 | 55.0% | 54.5% |
| | Homo | Heter | 47.1% | 1.37 | 47.5% | 46.8% |
| | Heter | Homo | 55.7% | 1.25 | 56.0% | 55.2% |
| | Heter | Heter | 49.3% | 1.31 | 49.0% | 49.5% |
| FedRAD [32] | Homo | Homo | 48.3% | 1.31 | 48.5% | 48.0% |
| | Homo | Heter | 43.5% | 1.45 | 44.0% | 43.0% |
| | Heter | Homo | 54.0% | 1.23 | 54.5% | 53.8% |
| | Heter | Heter | 50.0% | 1.20 | 50.5% | 49.8% |
| HFed-MIL | Homo | Homo | 66.1% | 0.91 | 66.5% | 65.8% |
| | Homo | Heter | 57.3% | 1.03 | 57.0% | 57.5% |
| | Heter | Homo | 57.2% | 0.88 | 57.5% | 57.0% |
| | Heter | Heter | 53.9% | 0.95 | 54.0% | 53.8% |


4.3. Experiment Type—B: Global Model Attention Visualization
4.4. Experiment Type—C: Ablation Study of Heterogeneity
4.5. Experiment Type—D: Ablation Study of Client Number
4.6. Experiment Type—E: Generalization Test with Proxy Dataset from Another Domain
4.7. Experiment Type—F: Privacy and Security Analysis
4.8. Experiment Type—G: Computation and Communication Cost Analysis
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Che, M.; Yin, R. Analysis of the global burden of ovarian cancer in adolescents. Int. J. Gynecol. Cancer 2025, 35, 101620. [Google Scholar] [CrossRef]
- Farahani, H.; Boschman, J.; Farnell, D.; Darbandsari, A.; Zhang, A.; Ahmadvand, P.; Jones, S.J.; Huntsman, D.; Köbel, M.; Gilks, C.B.; et al. Deep learning-based histotype diagnosis of ovarian carcinoma whole-slide pathology images. Mod. Pathol. 2022, 35, 1983–1990. [Google Scholar] [CrossRef]
- Breen, J.; Allen, K.; Zucker, K.; Adusumilli, P.; Scarsbrook, A.; Hall, G.; Orsi, N.M.; Ravikumar, N. Artificial intelligence in ovarian cancer histopathology: A systematic review. NPJ Precis. Oncol. 2023, 7, 83. [Google Scholar] [CrossRef]
- Jiang, Y.; Wang, C.; Zhou, S. Artificial intelligence-based risk stratification, accurate diagnosis and treatment prediction in gynecologic oncology. In Seminars in Cancer Biology; Elsevier: Amsterdam, The Netherlands, 2023; Volume 96, pp. 82–99. [Google Scholar]
- Ilse, M.; Tomczak, J.M.; Welling, M. Attention-based deep multiple instance learning. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 2127–2136. [Google Scholar]
- Lu, M.Y.; Williamson, D.F.; Chen, T.Y.; Chen, R.J.; Barbieri, M.; Mahmood, F. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat. Biomed. Eng. 2021, 5, 555–570. [Google Scholar] [CrossRef] [PubMed]
- Li, B.; Li, Y.; Eliceiri, K.W. Dual-stream multiple instance learning network for whole slide image classification with self-supervised contrastive learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 14318–14328. [Google Scholar]
- Shao, Z.; Bian, H.; Chen, Y.; Wang, Y.; Zhang, J.; Ji, X. Transmil: Transformer based correlated multiple instance learning for whole slide image classification. Adv. Neural Inf. Process. Syst. 2021, 34, 2136–2147. [Google Scholar]
- Zhang, H.; Meng, Y.; Zhao, Y.; Qiao, Y.; Yang, X.; Coupland, S.E.; Zheng, Y. Dtfd-mil: Double-tier feature distillation multiple instance learning for histopathology whole slide image classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 18802–18812. [Google Scholar]
- Lin, T.; Yu, Z.; Hu, H.; Xu, Y.; Chen, C.W. Interventional bag multi-instance learning on whole-slide pathological images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 19830–19839. [Google Scholar]
- Tang, W.; Huang, S.; Zhang, X.; Zhou, F.; Zhang, Y.; Liu, B. Multiple instance learning framework with masked hard instance mining for whole slide image classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023; pp. 4078–4087. [Google Scholar]
- Zhong, L.; Wang, G.; Liao, X.; Zhang, S. HAMIL: High-Resolution Activation Maps and Interleaved Learning for Weakly Supervised Segmentation of Histopathological Images. IEEE Trans. Med. Imaging 2023, 42, 2912–2923. [Google Scholar] [CrossRef] [PubMed]
- McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the Artificial Intelligence and Statistics, PMLR, Fort Lauderdale, FL, USA, 20–22 April 2017; pp. 1273–1282. [Google Scholar]
- Konečnỳ, J.; McMahan, H.B.; Yu, F.X.; Richtárik, P.; Suresh, A.T.; Bacon, D. Federated learning: Strategies for improving communication efficiency. arXiv 2016, arXiv:1610.05492. [Google Scholar]
- Yang, Q.; Liu, Y.; Chen, T.; Tong, Y. Federated machine learning: Concept and applications. ACM Trans. Intell. Syst. Technol. (TIST) 2019, 10, 1–19. [Google Scholar] [CrossRef]
- Rieke, N.; Hancox, J.; Li, W.; Milletari, F.; Roth, H.R.; Albarqouni, S.; Bakas, S.; Galtier, M.N.; Landman, B.A.; Maier-Hein, K.; et al. The future of digital health with federated learning. NPJ Digit. Med. 2020, 3, 1–7. [Google Scholar] [CrossRef]
- Liu, Y.; Kang, Y.; Zou, T.; Pu, Y.; He, Y.; Ye, X.; Ouyang, Y.; Zhang, Y.Q.; Yang, Q. Vertical Federated Learning: Concepts, Advances, and Challenges. IEEE Trans. Knowl. Data Eng. 2024, 36, 3615–3634. [Google Scholar] [CrossRef]
- Guo, W.; Zhuang, F.; Zhang, X.; Tong, Y.; Dong, J. A Comprehensive Survey of Federated Transfer Learning: Challenges, Methods and Applications. arXiv 2024, arXiv:2403.01387. [Google Scholar] [CrossRef]
- Li, Q.; Wen, Z.; Wu, Z.; Hu, S.; Wang, N.; Li, Y.; Liu, X.; He, B. A survey on federated learning systems: Vision, hype and reality for data privacy and protection. IEEE Trans. Knowl. Data Eng. 2021, 35, 3347–3366. [Google Scholar] [CrossRef]
- Karimireddy, S.P.; Kale, S.; Mohri, M.; Reddi, S.; Stich, S.; Suresh, A.T. Scaffold: Stochastic controlled averaging for federated learning. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual Event, 13–18 July 2020; pp. 5132–5143. [Google Scholar]
- Li, X.; Huang, K.; Yang, W.; Wang, S.; Zhang, Z. On the convergence of fedavg on non-iid data. arXiv 2019, arXiv:1907.02189. [Google Scholar]
- Zhao, Y.; Li, M.; Lai, L.; Suda, N.; Civin, D.; Chandra, V. Federated learning with non-iid data. arXiv 2018, arXiv:1806.00582. [Google Scholar] [CrossRef]
- Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated optimization in heterogeneous networks. In Proceedings of the Machine Learning and Systems, Seoul, Republic of Korea, 25–28 October 2020; Volume 2, pp. 429–450. [Google Scholar]
- Xie, H.; Xia, M.; Wu, P.; Wang, S.; Huang, K. Decentralized Federated Learning with Asynchronous Parameter Sharing for Large-scale IoT Networks. IEEE Internet Things J. 2024, 11, 34123–34139. [Google Scholar] [CrossRef]
- Chen, J.; Pan, X.; Monga, R.; Bengio, S.; Jozefowicz, R. Revisiting distributed synchronous SGD. arXiv 2016, arXiv:1604.00981. [Google Scholar]
- Xiao, Z.; Chen, Z.; Liu, S.; Wang, H.; Feng, Y.; Hao, J.; Zhou, J.T.; Wu, J.; Yang, H.; Liu, Z. Fed-grab: Federated long-tailed learning with self-adjusting gradient balancer. Adv. Neural Inf. Process. Syst. 2024, 36, 77745–77757. [Google Scholar]
- Wang, H.; Xu, W.; Fan, Y.; Li, R.; Zhou, P. AOCC-FL: Federated Learning with Aligned Overlapping via Calibrated Compensation. In Proceedings of the IEEE INFOCOM 2023-IEEE Conference on Computer Communications, New York, NY, USA, 17–20 May 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–10. [Google Scholar]
- Lin, T.; Kong, L.; Stich, S.U.; Jaggi, M. Ensemble distillation for robust model fusion in federated learning. Adv. Neural Inf. Process. Syst. 2020, 33, 2351–2363. [Google Scholar]
- Zhu, Z.; Hong, J.; Zhou, J. Data-free knowledge distillation for heterogeneous federated learning. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual Event, 18–24 July 2021; pp. 12878–12889. [Google Scholar]
- Li, D.; Wang, J. Fedmd: Heterogenous federated learning via model distillation. arXiv 2019, arXiv:1910.03581. [Google Scholar] [CrossRef]
- Wang, H.; Yurochkin, M.; Sun, Y.; Papailiopoulos, D.; Khazaeni, Y. Federated learning with matched averaging. arXiv 2020, arXiv:2002.06440. [Google Scholar] [CrossRef]
- Sturluson, S.P.; Trew, S.; Muñoz-González, L.; Grama, M.; Passerat-Palmbach, J.; Rueckert, D.; Alansary, A. Fedrad: Federated robust adaptive distillation. arXiv 2021, arXiv:2112.01405. [Google Scholar] [CrossRef]
- Yang, Z.; Zhang, Y.; Zheng, Y.; Tian, X.; Peng, H.; Liu, T.; Han, B. FedFed: Feature distillation against data heterogeneity in federated learning. Adv. Neural Inf. Process. Syst. 2024, 36, 60397–60428. [Google Scholar]
- Rocher, L.; Hendrickx, J.M.; De Montjoye, Y.A. Estimating the success of re-identifications in incomplete datasets using generative models. Nat. Commun. 2019, 10, 1–9. [Google Scholar] [CrossRef] [PubMed]
- Geiping, J.; Bauermeister, H.; Drozdzal, M.; Moeller, M. Inverting gradients–How easy is it to break privacy in federated learning? Adv. Neural Inf. Process. Syst. 2020, 33, 16937–16947. [Google Scholar]
- Zhu, L.; Liu, Z.; Han, S. Deep leakage from gradients. Adv. Neural Inf. Process. Syst. 2019, 32, 14774–14784. [Google Scholar]
- Zhang, Y.; Jia, R.; Pei, H.; Wang, W.; Li, B.; Song, D. The secret revealer: Generative model-inversion attacks against deep neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 253–261. [Google Scholar]
- Carlini, N.; Liu, C.; Erlingsson, Ú.; Kos, J.; Song, D. The secret sharer: Evaluating and testing unintended memorization in neural networks. In Proceedings of the 28th USENIX Security Symposium (USENIX Security 19), Santa Clara, CA, USA, 14–16 August 2019; pp. 267–284. [Google Scholar]
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
- Beg, A.; Shah, S.N.A.; Parveen, R. Deep learning approaches for interpreting Non-coding regions in Ovarian cancer. In Deep Learning in Genetics and Genomics; Elsevier: Amsterdam, The Netherlands, 2025; pp. 71–86. [Google Scholar]
- Hong, M.K.; Ding, D.C. Early diagnosis of ovarian cancer: A comprehensive review of the advances, challenges, and future directions. Diagnostics 2025, 15, 406. [Google Scholar] [CrossRef]
- Xu, Y.; Jia, Z.; Wang, L.B.; Ai, Y.; Zhang, F.; Lai, M.; Chang, E.I.C. Large scale tissue histopathology image classification, segmentation, and visualization via deep convolutional activation features. BMC Bioinform. 2017, 18, 281. [Google Scholar] [CrossRef]
- Nahid, A.A.; Kong, Y. Histopathological breast-image classification using local and frequency domains by convolutional neural network. Information 2018, 9, 19. [Google Scholar] [CrossRef]
- Ahmed, A.; Xiaoyang, Z.; Tunio, M.H.; Butt, M.H.; Shah, S.A.; Chengxiao, Y.; Pirzado, F.A.; Aziz, A. OCCNET: Improving Imbalanced Multi-Centred Ovarian Cancer Subtype Classification in Whole Slide Images. In Proceedings of the 2023 20th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), Chengdu, China, 15–17 December 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–8. [Google Scholar]
- Wang, Y.; Lin, W.; Zhuang, X.; Wang, X.; He, Y.; Li, L.; Lyu, G. Advances in artificial intelligence for the diagnosis and treatment of ovarian cancer. Oncol. Rep. 2024, 51, 1–17. [Google Scholar] [CrossRef]
- Jahanifar, M.; Raza, M.; Xu, K.; Vuong, T.T.L.; Jewsbury, R.; Shephard, A.; Zamanitajeddin, N.; Kwak, J.T.; Raza, S.E.A.; Minhas, F.; et al. Domain generalization in computational pathology: Survey and guidelines. ACM Comput. Surv. 2025, 57, 1–37. [Google Scholar] [CrossRef]
- Yoon, J.S.; Oh, K.; Shin, Y.; Mazurowski, M.A.; Suk, H.I. Domain generalization for medical image analysis: A review. Proc. IEEE 2024, 112, 1583–1609. [Google Scholar] [CrossRef]
- Herath, H.; Herath, H.; Madusanka, N.; Lee, B.I. A Systematic Review of Medical Image Quality Assessment. J. Imaging 2025, 11, 100. [Google Scholar] [CrossRef] [PubMed]
- Koçak, B.; Ponsiglione, A.; Stanzione, A.; Bluethgen, C.; Santinha, J.; Ugga, L.; Huisman, M.; Klontzas, M.E.; Cannella, R.; Cuocolo, R. Bias in artificial intelligence for medical imaging: Fundamentals, detection, avoidance, mitigation, challenges, ethics, and prospects. Diagn. Interv. Radiol. 2025, 31, 75. [Google Scholar] [CrossRef] [PubMed]
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090. [Google Scholar]
- Xi, R.; Ahmed, A.; Zeng, X.; Hou, M. A novel transformers-based external attention framework for breast cancer diagnosis. Biomed. Signal Process. Control 2025, 110, 108065. [Google Scholar] [CrossRef]
- Cornia, M.; Stefanini, M.; Baraldi, L.; Cucchiara, R. Meshed-memory transformer for image captioning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10578–10587. [Google Scholar]
- Huo, Y.; Gang, S.; Guan, C. FCIHMRT: Feature cross-layer interaction hybrid method based on Res2Net and transformer for remote sensing scene classification. Electronics 2023, 12, 4362. [Google Scholar] [CrossRef]
- Tan, F.; Zhai, M.; Zhai, C. Foreign object detection in urban rail transit based on deep differentiation segmentation neural network. Heliyon 2024, 10, e37072. [Google Scholar] [CrossRef]
- Tang, Y.; Yi, J.; Tan, F. Facial micro-expression recognition method based on CNN and transformer mixed model. Int. J. Biom. 2024, 16, 463–477. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 6000–6010. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. In Proceedings of the International Conference on Learning Representations (ICLR), Vienna, Austria, 3–7 May 2021. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–3 October 2021; pp. 10012–10022. [Google Scholar]
- Alahmadi, A. Towards ovarian cancer diagnostics: A vision transformer-based computer-aided diagnosis framework with enhanced interpretability. Results Eng. 2024, 23, 102651. [Google Scholar] [CrossRef]
- Adak, D.; Sonawane, S.; Verma, G. OvarianNet-Ca: A Hybrid Cross-Attention Ensemble Model Approach Using MixTransformer and EfficientNet. In Proceedings of the 2025 4th OPJU International Technology Conference (OTCON) on Smart Computing for Innovation and Advancement in Industry 5.0, Raigarh, India, 9–11 April 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 1–6. [Google Scholar]
- Liu, P.; Fu, B.; Ye, F.; Yang, R.; Ji, L. DSCA: A dual-stream network with cross-attention on whole-slide image pyramids for cancer prognosis. Expert Syst. Appl. 2023, 227, 120280. [Google Scholar] [CrossRef]
- Gupta, D.K.; Mago, G.; Chavan, A.; Prasad, D.K. Patch gradient descent: Training neural networks on very large images. arXiv 2023, arXiv:2301.13817. [Google Scholar] [CrossRef]
- Yufei, C.; Liu, Z.; Liu, X.; Liu, X.; Wang, C.; Kuo, T.W.; Xue, C.J.; Chan, A.B. Bayes-MIL: A New Probabilistic Perspective on Attention-based Multiple Instance Learning for Whole Slide Images. In Proceedings of the Eleventh International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023. [Google Scholar]
- Chen, R.J.; Chen, C.; Li, Y.; Chen, T.Y.; Trister, A.D.; Krishnan, R.G.; Mahmood, F. Scaling vision transformers to gigapixel images via hierarchical self-supervised learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 16144–16155. [Google Scholar]
- Sheller, M.J.; Edwards, B.; Reina, G.A.; Martin, J.; Pati, S.; Kotrotsou, A.; Milchenko, M.; Xu, W.; Marcus, D.; Colen, R.R.; et al. Federated learning in medicine: Facilitating multi-institutional collaborations without sharing patient data. Sci. Rep. 2020, 10, 12598. [Google Scholar] [CrossRef]
- Xu, W.; Fu, Y.L.; Zhu, D. ResNet and its application to medical image processing: Research progress and challenges. Comput. Methods Programs Biomed. 2023, 240, 107660. [Google Scholar] [CrossRef] [PubMed]
- Han, K.; Wang, Y.; Chen, H.; Chen, X.; Guo, J.; Liu, Z.; Tang, Y.; Xiao, A.; Xu, C.; Xu, Y.; et al. A survey on vision transformer. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 87–110. [Google Scholar] [CrossRef] [PubMed]
- Hassanin, M.; Anwar, S.; Radwan, I.; Khan, F.S.; Mian, A. Visual attention methods in deep learning: An in-depth survey. Inf. Fusion 2024, 108, 102417. [Google Scholar] [CrossRef]
- Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114. [Google Scholar]
- d’Ascoli, S.; Touvron, H.; Leavitt, M.L.; Morcos, A.S.; Biroli, G.; Sagun, L. Convit: Improving vision transformers with soft convolutional inductive biases. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual Event, 18–24 July 2021; pp. 2286–2296. [Google Scholar]
- Hsu, T.M.H.; Qi, H.; Brown, M. Measuring the effects of non-identical data distribution for federated visual classification. arXiv 2019, arXiv:1909.06335. [Google Scholar] [CrossRef]
- Shokri, R.; Stronati, M.; Song, C.; Shmatikov, V. Membership inference attacks against machine learning models. In Proceedings of the 2017 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–26 May 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 3–18. [Google Scholar]







| Proxy Composition | Method | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|
| Camelyon16 | FedDF | 31.0 | 30.2 | 31.8 |
| | FedMD | 30.2 | 29.5 | 31.0 |
| | FedRAD | 31.8 | 32.5 | 31.1 |
| | HFed-MIL | 34.1 | 34.0 | 33.5 |
| BreakHis | FedDF | 25.0 | 26.0 | 24.3 |
| | FedMD | 29.0 | 28.2 | 29.8 |
| | FedRAD | 26.3 | 27.0 | 25.5 |
| | HFed-MIL | 32.0 | 32.8 | 31.2 |
| Both (Camelyon16 + BreakHis) | FedDF | 28.2 | 27.6 | 28.9 |
| | FedMD | 29.5 | 29.0 | 30.1 |
| | FedRAD | 29.0 | 29.1 | 28.8 |
| | HFed-MIL | 35.0 | 35.6 | 34.3 |
| Shared Information | MIA AUC |
|---|---|
| Gradients | 0.85 |
| Patch Embedding Feature | 0.80 |
| Class Logits | 0.55 |
| Gradient-Based Attention | 0.60 |
| HFed-MIL (Logits + Attention) | 0.62 |
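To make the MIA AUC column above concrete, the sketch below shows a generic membership-inference evaluation (not the paper's attack setup): an adversary observes the per-sample shared information, trains a binary classifier to separate training-set members from non-members, and reports the held-out AUC. The synthetic Gaussian features and the logistic-regression attacker are illustrative assumptions; AUC near 0.5 means the shared signal leaks little membership information.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the shared information (e.g., attention scores):
# members get a small mean shift, mimicking weak per-sample leakage.
members = rng.normal(0.1, 1.0, size=(500, 16))
non_members = rng.normal(0.0, 1.0, size=(500, 16))
X = np.vstack([members, non_members])
y = np.r_[np.ones(500), np.zeros(500)]            # 1 = member, 0 = non-member

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
attack = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, attack.predict_proba(X_te)[:, 1])
print(round(auc, 2))  # modest AUC: weak leakage, close to random guessing
```

Under this reading, the table says gradients and patch embeddings carry a strongly separable membership signal (AUC 0.80–0.85), while logits and gradient-based attention do not (AUC 0.55–0.62).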
| Method | Training Time/Round (s) | Communication Cost/Round | GPU Memory (MB) |
|---|---|---|---|
| FedAvg | 20.4 | 0.37 GB | 3100 |
| FedDF | 28.4 | 7.8 KB | 3100 |
| FedMD | 29.1 | 15.6 KB | 3250 |
| FedRAD | 28.9 | 7.8 KB | 3180 |
| HFed-MIL | 32.5 | 0.32 MB | 3450 |
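As a back-of-envelope check on the sub-megabyte cost reported for HFed-MIL: if each client uploads float32 class logits plus one attention score per patch (100 patches per WSI) for a shared proxy set, the payload is hundreds of kilobytes rather than the hundreds of megabytes needed for parameter sharing. The proxy-set size of 800 WSIs below is a hypothetical value chosen for illustration, not a figure from the paper.

```python
def payload_bytes(n_wsi, n_patches=100, n_classes=2, bytes_per_float=4):
    """Per-client upload: class logits + one attention score per patch."""
    logits = n_wsi * n_classes * bytes_per_float
    attention = n_wsi * n_patches * bytes_per_float
    return logits + attention

# 800 hypothetical proxy WSIs -> roughly a third of a megabyte,
# matching the order of magnitude of the reported 0.32 MB per round.
mb = payload_bytes(800) / 1e6
print(round(mb, 2))  # 0.33
```

By contrast, sharing the weights of even a small MIL backbone (tens of millions of float32 parameters) costs hundreds of megabytes per round, which is consistent with the 0.37 GB listed for FedAvg.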
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zeng, X.; Ahmed, A.; Tunio, M.H. HFed-MIL: Patch Gradient-Based Attention Distillation Federated Learning for Heterogeneous Multi-Site Ovarian Cancer Whole-Slide Image Analysis. Electronics 2025, 14, 3600. https://doi.org/10.3390/electronics14183600


