A Multi-Teacher Knowledge Distillation Framework for Enhancing the Robustness of Automated Sperm Morphology Assessment
Abstract
1. Introduction
2. Related Works
2.1. Traditional Approaches to Sperm Morphology Classification
2.2. Knowledge Distillation for Sperm Morphology Image Classification
3. Materials and Methods
3.1. Dataset Information and Preprocessing
- Rotation (): Selected to introduce orientation variance, mimicking the natural angular variations of sperm cells on a slide. Larger rotations were avoided to prevent the elongated tail structures from being cut off by the crop boundaries.
- Translation (Width and Height Shift 0–0.1): Minor spatial shifts were incorporated to compensate for potential slight misalignments or off-center positioning that occurred during the manual, expert-guided bounding box cropping phase.
- Horizontal Flipping: Applied with a probability of to enhance viewpoint diversity, reflecting the arbitrary left-right positioning of the cells under the microscope.
- Shear () and Zoom (0.9–1.0): These subtle operations were utilized to simulate the physical, fluid-induced bending of the sperm tail and slight variations in focal magnification, respectively.
3.2. The Implementation Details of Knowledge Distillation
3.3. Multi-Teacher Knowledge Distillation Framework
3.4. Evaluation Metrics
4. Results
- Majority Class Bias and the Class 1–3 Trade-off: In the BesLab dataset (Figure 5a), a critical interaction is observed between ‘Amorphous Head’ (Class 1) and ‘Narrow Acrosome’ (Class 3). The Multi-Teacher model achieves a substantial improvement for Class 1 (Diagonal: +44), significantly reducing instances where Amorphous Heads were mistaken for Narrow Acrosome (Row 1, Column 3: −41). However, this improvement comes with a penalty: the model under-predicts Class 3, leading to a sharp drop in its diagonal count (−66) and a corresponding surge in actual Narrow Acrosomes misclassified as Amorphous Head (Row 3, Column 1: +42). This suggests that the Multi-Teacher ensemble recovers more true majority-class samples (fewer false negatives for Class 1), but does so partly by absorbing morphologically similar samples from the less frequent class into the majority category (more false positives for Class 1).
- Consistent Robustness in Vacuolated Head (Class 8) and Asymmetric Neck (Class 9): Unlike the volatile classes, specific morphological defects show stable and significant improvements across all datasets. ‘Vacuolated Head’ (Class 8) exhibits remarkable diagonal gains, particularly in BesLab (+25), followed by Histoplus (+13) and GBL (+10). Similarly, ‘Asymmetric Neck’ (Class 9) shows consistent diagonal increases of +11 (BesLab), +14 (Histoplus), and +22 (GBL). This indicates that the Multi-Teacher architecture extracts more distinctive features for these specific anomalies, successfully distinguishing them from similar defects such as ‘Pyriform Head’ (Class 5), where confusion with Class 8 was reduced by 12 samples in the BesLab dataset.
- Systematic Loss of Sensitivity in Thick Neck (Class 10): Conversely, for ‘Thick Neck’ (Class 10), the Multi-Teacher model loses true positives on every dataset: BesLab (−20), Histoplus (−32), and GBL (−18). This consistent drop, accompanied by increased confusion dispersed among various classes, suggests a systematic challenge rather than dataset-specific noise. While we cannot definitively isolate the cause without further ablation studies, we hypothesize that the specific inductive bias of the Single-Teacher model may have been more advantageous for defining neck thickness. It is possible that the ensemble averaging process softened the sharp decision boundaries required to distinguish this specific midpiece defect from normal morphology.
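The signed values quoted in the bullets above (for example "Diagonal: +44" or "Row 1, Column 3: −41") read as entries of a difference between two confusion matrices, with rows as true classes and columns as predicted classes. The following is a minimal sketch of that bookkeeping. The function names and the 3-class toy matrices are hypothetical and are not taken from the paper's data.

```python
def confusion_delta(multi, single):
    """Element-wise difference (multi - single) of two confusion matrices.

    Rows are true classes, columns are predicted classes.
    """
    if len(multi) != len(single) or any(
        len(rm) != len(rs) for rm, rs in zip(multi, single)
    ):
        raise ValueError("confusion matrices must have the same shape")
    return [[m - s for m, s in zip(rm, rs)] for rm, rs in zip(multi, single)]


def diagonal_gains(delta):
    """Change in correctly classified samples per class (the diagonal)."""
    return [delta[i][i] for i in range(len(delta))]


# Toy 3-class example (made-up counts, same test samples for both models).
single = [[50, 5, 10],
          [4, 30, 6],
          [12, 3, 40]]
multi = [[58, 4, 3],
         [4, 31, 5],
         [18, 3, 34]]

delta = confusion_delta(multi, single)
print(delta)                  # [[8, -1, -7], [0, 1, -1], [6, 0, -6]]
print(diagonal_gains(delta))  # [8, 1, -6]
```

When both models are evaluated on the same samples, every row of the difference sums to zero: a gain on the diagonal must be matched by fewer off-diagonal errors in that row. The toy example also mirrors the trade-off described above, where class 0 gains correct predictions while class 2 loses samples to class 0.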

5. Discussion
Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Shahzad, S.; Ilyas, M.; Lali, M.I.U.; Rauf, H.T.; Kadry, S.; Nasr, E.A. Sperm abnormality detection using sequential deep neural network. Mathematics 2023, 11, 515.
- Guo, Y.; Li, J.; Hong, K.; Wang, B.; Zhu, W.; Li, Y.; Lv, T.; Wang, L. Automated Deep Learning Model for Sperm Head Segmentation, Pose Correction, and Classification. Appl. Sci. 2024, 14, 11303.
- Shojaedini, S.V.; Kermani, A.; Nafisi, V. A new method for sperm detection in human semen: Combination of hypothesis testing and local mapping of wavelet sub-bands. Iran. J. Med. Phys. 2012, 9, 283–292.
- Bijar, A.; Benavent, A.P.; Mikaeili, M.; Khayati, R. Fully automatic identification and discrimination of sperm’s parts in microscopic images of stained human semen smear. J. Biomed. Sci. Eng. 2012, 5, 384–395.
- Shi, Y.; Wang, Y.K.; Tian, X.P.; Zhang, T.Y.; Yao, B.; Wang, H.; Shao, Y.; Wang, C.C.; Zeng, R. SPEHEATAL: A Cluster-Enhanced Segmentation Method for Sperm Morphology Analysis. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; Volume 39, pp. 729–737.
- Lei, P.; Saadat, M.; Hassani, M.G.; Shu, C. Deep Learning Models for Multi-Part Morphological Segmentation and Evaluation of Live Unstained Human Sperm. Sensors 2025, 25, 3093.
- Tseng, K.K.; Li, Y.; Hsu, C.Y.; Huang, H.N.; Zhao, M.; Ding, M. Computer-assisted system with multiple feature fused support vector machine for sperm morphology diagnosis. BioMed Res. Int. 2013, 2013, 687607.
- Keskenler, M.F.; Haşiloğlu, A.; Özyer, G.T.; Özyer, B.; Şımşek, E. Sperm Detection and Analysis Using Feature Description Algorithms. In Proceedings of the 2019 27th Signal Processing and Communications Applications Conference (SIU); IEEE: Piscataway, NJ, USA, 2019; pp. 1–4.
- Diyasa, I.G.S.M.; Prasetya, D.A.; Kuswardhani, H.A.C.; Halim, C. Detection of Abnormal Human Sperm Morphology Using Support Vector Machine (SVM) Classification. Inf. Technol. Int. J. 2024, 2, 57–63.
- Mirsky, S.K.; Barnea, I.; Levi, M.; Greenspan, H.; Shaked, N.T. Automated analysis of individual sperm cells using stain-free interferometric phase microscopy and machine learning. Cytom. Part A 2017, 91, 893–900.
- Kılıç, Ş. Deep feature engineering for accurate sperm morphology classification using CBAM-enhanced ResNet50. PLoS ONE 2025, 20, e0330914.
- Liang, B.; Wang, M. Deep learning-based approach for sperm morphology analysis. BMC Urol. 2025, 25, 261.
- Suleman, M.; Ilyas, M.; Lali, M.I.U.; Rauf, H.T.; Kadry, S. A review of different deep learning techniques for sperm fertility prediction. AIMS Math. 2023, 8, 16360–16416.
- Zhang, Y.; Zhang, J.; Zha, X.; Zhou, Y.; Cao, Y.; Chen, D. Improving human sperm head morphology classification with unsupervised anatomical feature distillation. In Proceedings of the 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI); IEEE: Piscataway, NJ, USA, 2022; pp. 1–5.
- Iqbal, I.; Mustafa, G.; Ma, J. Deep learning-based morphological classification of human sperm heads. Diagnostics 2020, 10, 325.
- Gou, J.; Yu, B.; Maybank, S.J.; Tao, D. Knowledge distillation: A survey. Int. J. Comput. Vis. 2021, 129, 1789–1819.
- Chen, W.; Gao, L.; Li, X.; Shen, W. Lightweight convolutional neural network with knowledge distillation for cervical cells classification. Biomed. Signal Process. Control 2022, 71, 103177.
- Belinga, A.G.; Tekouabou Koumetio, C.S.; El Haziti, M.; El Hassouni, M. Knowledge Distillation in Image Classification: The Impact of Datasets. Computers 2024, 13, 184.
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531.
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2009; pp. 248–255.
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part V 13; Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755.
- Sun, C.; Shrivastava, A.; Singh, S.; Gupta, A. Revisiting unreasonable effectiveness of data in deep learning era. In Proceedings of the IEEE International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2017; pp. 843–852.
- Aktas, A.; Serbes, G.; Yigit, M.H.; Aydin, N.; Uzun, H.; Ilhan, H.O. Hi-LabSpermMorpho: A Novel Expert-Labeled Dataset with Extensive Abnormality Classes for Deep Learning-Based Sperm Morphology Analysis. IEEE Access 2024, 12, 196070–196091.
- Şavkay, O.L.; Cesur, E.; Yalçın, M.E.; Tavşanoğlu, V. Sperm Morphology Analysis with CNN based algorithms. In Proceedings of the 2014 14th International Workshop on Cellular Nanoscale Networks and Their Applications (CNNA); IEEE: Piscataway, NJ, USA, 2014; pp. 1–2.
- Alquézar-Baeta, C.; Gimeno-Martos, S.; Miguel-Jiménez, S.; Santolaria, P.; Yániz, J.; Palacín, I.; Casao, A.; Cebrián-Pérez, J.Á.; Muiño-Blanco, T.; Pérez-Pé, R. OpenCASA: A new open-source and scalable tool for sperm quality analysis. PLoS Comput. Biol. 2019, 15, e1006691.
- Kruger, T.F.; DuToit, T.C.; Franken, D.R.; Acosta, A.A.; Oehninger, S.C.; Menkveld, R.; Lombard, C.J. A new computerized method of reading sperm morphology (strict criteria) is as efficient as technician reading. Fertil. Steril. 1993, 59, 202–209.
- Yi, W.J.; Park, K.S.; Paick, J.S. Morphological classification of sperm heads using artificial neural networks. In MEDINFO’98; IOS Press: Amsterdam, The Netherlands, 1998; pp. 1071–1074.
- Maalej, R.; Abdelkefi, O.; Daoud, S. Advancements in automated sperm morphology analysis: A deep learning approach with comprehensive classification and model evaluation. Multimed. Tools Appl. 2025, 84, 27345–27378.
- Adinugroho, S.; Nakazawa, A. Deep learning-based sperm motility and morphology estimation on stacked color-coded MotionFlow. Inform. Med. Unlocked 2024, 45, 101459.
- Nguyen, V.D.; Ngo, T.T.H.; Duong, L.M. Automated Sperm Morphology Analysis with Deep Learning and Feature Extraction. In Proceedings of the 2024 IEEE-EMBS Conference on Biomedical Engineering and Sciences (IECBES); IEEE: Piscataway, NJ, USA, 2024; pp. 458–463.
- Xu, Y.; Chen, Y.; Zhang, B.; Yan, Y.; Liao, H.; Liu, R. Deep learning-based morphological analysis of human sperm. Med. Biol. Eng. Comput. 2026, 64, 49–59.
- Abdelkefi, O.; Maalej, R.; Rebai, T.; Sellami, A.; Daoud, S. Deep-learning based model for sperm morphology assessment using the SMD/MSS dataset. Future Sci. OA 2025, 11, 2583010.
- Shaker, F.; Monadjemi, S.A.; Alirezaie, J.; Naghsh-Nilchi, A.R. A dictionary learning approach for human sperm heads classification. Comput. Biol. Med. 2017, 91, 181–190.
- Chang, V.; Garcia, A.; Hitschfeld, N.; Härtel, S. Gold-standard for computer-assisted morphological sperm analysis. Comput. Biol. Med. 2017, 83, 143–150.
- Javadi, S.; Mirroshandel, S.A. A novel deep learning method for automatic assessment of human sperm images. Comput. Biol. Med. 2019, 109, 182–194.
- Turkoglu, A.K.; Serbes, G.; Uzun, H.; Aktas, A.; Yigit, M.H.; Ilhan, H.O. Category-aware two-stage divide-and-ensemble framework for sperm morphology classification. Diagnostics 2025, 15, 2234.
- Wang, L.; Yoon, K.J. Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3048–3068.
- Nabipour, A.; Shams Nejati, M.J.; Boreshban, Y.; Mirroshandel, S.A. Less-supervised learning with knowledge distillation for sperm morphology analysis. Comput. Methods Biomech. Biomed. Eng. Imaging Vis. 2024, 12, 2347978.
- World Health Organization. WHO Laboratory Manual for the Examination and Processing of Human Semen; World Health Organization: Geneva, Switzerland, 2021.
- TorchVision Maintainers and Contributors. TorchVision: PyTorch’s Computer Vision Library, 2016. Available online: https://github.com/pytorch/vision (accessed on 16 April 2026).
- Tan, M.; Le, Q. EfficientNetV2: Smaller models and faster training. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 10096–10106.
- Woo, S.; Debnath, S.; Hu, R.; Chen, X.; Liu, Z.; Kweon, I.S.; Xie, S. ConvNeXt V2: Co-designing and scaling ConvNets with masked autoencoders. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 16133–16142.
- Ilham, W.; Ahmad, A. A Comprehensive Review of ConvNeXt Architecture in Image Classification: Performance, Applications, and Prospects. IJACI Int. J. Adv. Comput. Inform. 2026, 2, 108–114.
- Hosseini, A.; Serag, A. Is synthetic data generation effective in maintaining clinical biomarkers? Investigating diffusion models across diverse imaging modalities. Front. Artif. Intell. 2025, 7, 1454441.




| Study | Distillation Strategy | Dataset(s) | Classification Scope | Limitations & Addressed Gaps |
|---|---|---|---|---|
| Nabipour et al. [38] | Less-supervised learning (Anomaly detection via normal samples) | MHSMA | Binary (Normal vs. Abnormal parts) | Focuses solely on anomaly detection rather than multi-class defect categorization. Uses a single dataset domain. |
| Zhang et al. [14] | Unsupervised anatomical feature distillation (pseudo-masks) | SCIAN, HuSHeM | Multi-class (Head only) | Limited to sperm head morphology. Relies on single-teacher distillation, prone to single-domain architectural biases. |
| Proposed Method | Multi-Teacher Soft Distillation with Cross-Dataset Training | Hi-LabSpermMorpho (BesLab, Histoplus, GBL) | 18 fine-grained classes (Head, Neck, Tail) | Addresses extreme class imbalance and cross-domain stain variations by aggregating diverse architectural features (SwinV2 + ConvNeXtV2). |
| Morphology Class | BesLab | Histoplus | GBL |
|---|---|---|---|
| Amorphous Head | 3572 | 2861 | 1537 |
| Double Head | 48 | 35 | 52 |
| Narrow Acrosome | 2055 | 1992 | 1355 |
| Pin Head | 782 | 423 | 118 |
| Pyriform Head | 979 | 1843 | 1123 |
| Round Head | 257 | 415 | 201 |
| Tapered Head | 1399 | 1356 | 1150 |
| Vacuolated Head | 1697 | 604 | 509 |
| Asymmetric Neck | 366 | 492 | 458 |
| Thick Neck | 1978 | 1340 | 934 |
| Thin Neck | 192 | 173 | 180 |
| Twisted Neck | 1154 | 1566 | 855 |
| Curly Tail | 1445 | 1385 | 955 |
| Double Tail | 200 | 183 | 81 |
| Long Tail | 42 | 128 | 106 |
| Short Tail | 991 | 538 | 359 |
| Twisted Tail | 706 | 2502 | 2277 |
| Normal | 599 | 433 | 381 |
| Strategy | Padding | Primary Transform | Secondary Transform |
|---|---|---|---|
| 1 | Edge (50 px) | Rotation () | — |
| 2 | Edge (30 px) | Height Shift (0–0.1) | Shear () |
| 3 | Edge (30 px) | Horizontal Flip () | Zoom (0.9–1.0) |
| 4 | Reflect (20 px) | Width Shift (0–0.1) | Height Shift (0–0.1) |
| 5 | Reflect (20 px) | Horizontal Flip () | Height Shift (0–0.1) |
| Parameters | Student Model | Teacher Model |
|---|---|---|
| Optimizer | AdamW | AdamW |
| Learning Rate | 0.00002 | 0.00002 |
| Learning Rate Scheduler | StepLR | StepLR |
| LR Scheduler Gamma | 0.45 | 0.5 |
| LR Scheduler Step Size | 1 | 2 |
| Weight Decay | 0.0001 | 0.01 |
| Epochs | 10 | 4 |
| Batch Size | 4 | 4 |
| Stochastic Depth | 0.3 | - |
| Label Smoothing | 0.1 | - |
| Distillation Weight | 0.3 | - |
| Distillation Temperature | 8 | - |
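To make the student-side hyperparameters in the table above concrete, the following is a minimal pure-Python sketch of a multi-teacher soft-distillation loss for one sample. It uses the tabulated distillation temperature (8), distillation weight (0.3), and label smoothing (0.1). Two points are assumptions rather than details confirmed by this excerpt: the teachers' temperature-softened probabilities are averaged with equal weight, and the terms are combined in the Hinton-style form (1 − w)·CE + w·T²·KL. The function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def multi_teacher_kd_loss(student_logits, teacher_logits_list, label,
                          temperature=8.0, distill_weight=0.3,
                          label_smoothing=0.1):
    """Per-sample loss: smoothed cross-entropy plus KL to averaged teachers."""
    n = len(student_logits)

    # Hard-label term: cross-entropy against a label-smoothed target.
    target = [label_smoothing / n] * n
    target[label] += 1.0 - label_smoothing
    p_student = softmax(student_logits)
    ce = -sum(t * math.log(p) for t, p in zip(target, p_student))

    # Soft-label term: equal-weight average of the teachers' softened
    # distributions (assumption), compared with the softened student.
    teacher_probs = [softmax(t, temperature) for t in teacher_logits_list]
    avg_teacher = [sum(col) / len(teacher_probs) for col in zip(*teacher_probs)]
    q_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / q)
             for t, q in zip(avg_teacher, q_student) if t > 0)

    # T^2 keeps the soft-term gradients on a scale comparable to the CE term.
    return (1.0 - distill_weight) * ce + distill_weight * temperature ** 2 * kl


student = [2.0, 0.5, -1.0]
agreeing = multi_teacher_kd_loss(student, [student, student], label=0)
disagreeing = multi_teacher_kd_loss(
    student, [[-1.0, 0.5, 2.0], [0.0, 3.0, 0.0]], label=0)
print(agreeing < disagreeing)  # True
```

When both teachers reproduce the student's logits, the KL term vanishes and only the weighted cross-entropy remains. Disagreeing teachers add a positive penalty, which is the signal the student follows during distillation.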
| Exp. | Aug | Dist | SD | LS | TT | Accuracy | Wei. F1 | Wei. Precision | Wei. Recall | Macro F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ✘ | ✘ | ✘ | ✘ | ✘ | 0.6929 ± 0.0050 | 0.6886 | 0.6865 | 0.6929 | 0.6433 ± 0.0169 |
| 2 | ✔ | ✘ | ✘ | ✘ | ✘ | 0.7002 ± 0.0037 | 0.6958 | 0.6945 | 0.7002 | 0.6587 ± 0.0081 |
| 3 | ✔ | ✔ | ✘ | ✘ | ✘ | 0.7063 ± 0.0039 | 0.6993 | 0.6984 | 0.7063 | 0.6444 ± 0.0143 |
| 4 | ✔ | ✔ | ✔ | ✘ | ✘ | 0.7063 ± 0.0053 | 0.6996 | 0.6978 | 0.7063 | 0.6382 ± 0.0153 |
| 5 | ✔ | ✔ | ✘ | ✔ | ✘ | 0.7067 ± 0.0046 | 0.6998 | 0.6993 | 0.7067 | 0.6456 ± 0.0119 |
| 6 | ✔ | ✔ | ✔ | ✔ | ✘ | 0.7072 ± 0.0047 | 0.7004 | 0.6988 | 0.7072 | 0.6413 ± 0.0187 |
| 7 | ✔ | ✔ | ✔ | ✔ | ✔ | 0.7082 ± 0.0064 | 0.7033 | 0.7033 | 0.7082 | 0.6624 ± 0.0102 |
| Comparison | Metric | t-Value | p-Value | Significance (p < 0.05) |
|---|---|---|---|---|
| Exp. 7 vs. Exp. 1 | Macro F1 | 3.1830 | 0.0334 | Yes |
| Exp. 7 vs. Exp. 2 | Macro F1 | 0.8931 | 0.4223 | No |
| Exp. 7 vs. Exp. 3 | Macro F1 | 4.2303 | 0.0134 | Yes |
| Exp. 7 vs. Exp. 4 | Macro F1 | 3.9221 | 0.0172 | Yes |
| Exp. 7 vs. Exp. 5 | Macro F1 | 2.2154 | 0.0911 | No |
| Exp. 7 vs. Exp. 6 | Macro F1 | 3.1358 | 0.0350 | Yes |
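The t-value and p-value pairs in the table above appear consistent with a two-sided paired t-test on four degrees of freedom, that is, five paired runs per comparison. The sketch below shows how such a statistic is computed from per-run Macro F1 scores using only the standard library. The per-run scores are hypothetical placeholders, since the excerpt does not list them, and the function name is illustrative.

```python
import math
from statistics import mean, stdev

# Two-sided critical value of Student's t for alpha = 0.05 and df = 4.
T_CRIT_DF4 = 2.776


def paired_t_statistic(scores_a, scores_b):
    """t statistic of a paired t-test on matched per-run scores."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long samples with at least 2 runs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    spread = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if spread == 0:
        raise ValueError("all paired differences are identical")
    return mean(diffs) / (spread / math.sqrt(len(diffs)))


# Hypothetical per-run Macro F1 for two configurations (5 paired runs).
config_a = [0.66, 0.67, 0.65, 0.68, 0.66]
config_b = [0.64, 0.66, 0.64, 0.65, 0.65]

t_value = paired_t_statistic(config_a, config_b)
print(round(t_value, 3))             # 4.0
print(abs(t_value) > T_CRIT_DF4)     # True
```

Under this reading, a comparison is marked significant when |t| exceeds 2.776. That matches the table: 2.2154 (Exp. 7 vs. Exp. 5) falls below the threshold, while 3.1358 (Exp. 7 vs. Exp. 6) exceeds it.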
| Method | Model Configuration | Dataset | Accuracy | Wei. F1 | Wei. Precision | Wei. Recall | Macro F1 |
|---|---|---|---|---|---|---|---|
| Single Teacher | SwinV2-large | BesLab | 0.7082 ± 0.0064 | 0.7033 | 0.7033 | 0.7082 | 0.6624 ± 0.0102 |
| Single Teacher | SwinV2-large | Histoplus | 0.7348 ± 0.0057 | 0.7302 | 0.7312 | 0.7348 | 0.6818 ± 0.0120 |
| Single Teacher | SwinV2-large | GBL | 0.7163 ± 0.0073 | 0.7095 | 0.7115 | 0.7163 | 0.6644 ± 0.0113 |
| Multi Teacher | SwinV2-large + EfficientNetV2-m | BesLab | 0.7088 ± 0.0067 | 0.7040 | 0.7036 | 0.7088 | 0.6666 ± 0.0101 |
| Multi Teacher | SwinV2-large + EfficientNetV2-m | Histoplus | 0.7344 ± 0.0059 | 0.7313 | 0.7318 | 0.7344 | 0.6881 ± 0.0154 |
| Multi Teacher | SwinV2-large + EfficientNetV2-m | GBL | 0.7129 ± 0.0042 | 0.7073 | 0.7076 | 0.7129 | 0.6630 ± 0.0133 |
| Multi Teacher | SwinV2-large + ConvNeXtV2-large | BesLab | 0.7094 ± 0.0068 | 0.7052 | 0.7053 | 0.7094 | 0.6765 ± 0.0056 |
| Multi Teacher | SwinV2-large + ConvNeXtV2-large | Histoplus | 0.7361 ± 0.0047 | 0.7324 | 0.7327 | 0.7361 | 0.6895 ± 0.0093 |
| Multi Teacher | SwinV2-large + ConvNeXtV2-large | GBL | 0.7153 ± 0.0043 | 0.7107 | 0.7114 | 0.7153 | 0.6726 ± 0.0065 |
| Dataset | Comparison (Multi vs. Single) | t-Value | p-Value | Significance (p < 0.05) |
|---|---|---|---|---|
| BesLab | SwinV2 + ConvNeXtV2-large vs. SwinV2-large | 4.9327 | 0.0079 | Yes |
| Histoplus | SwinV2 + ConvNeXtV2-large vs. SwinV2-large | 2.8506 | 0.0464 | Yes |
| GBL | SwinV2 + ConvNeXtV2-large vs. SwinV2-large | 3.2177 | 0.0324 | Yes |
| Classes | BesLab F1 | BesLab Prec. | BesLab Rec. | BesLab PR-AUC | Histoplus F1 | Histoplus Prec. | Histoplus Rec. | Histoplus PR-AUC | GBL F1 | GBL Prec. | GBL Rec. | GBL PR-AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Amorphous Head | 0.6345 | 0.6400 | 0.6291 | 0.6926 | 0.6457 | 0.6532 | 0.6383 | 0.7055 | 0.6199 | 0.6050 | 0.6357 | 0.6626 |
| Asymmetric Neck | 0.2476 | 0.4088 | 0.1776 | 0.2337 | 0.3140 | 0.4474 | 0.2419 | 0.3437 | 0.3295 | 0.4715 | 0.2533 | 0.3489 |
| Curly Tail | 0.9003 | 0.8812 | 0.9203 | 0.9617 | 0.8806 | 0.8520 | 0.9111 | 0.9449 | 0.8250 | 0.7834 | 0.8712 | 0.8983 |
| Double Head | 0.7750 | 0.9687 | 0.6458 | 0.7420 | 0.6349 | 0.7143 | 0.5714 | 0.7699 | 0.8132 | 0.9487 | 0.7115 | 0.9191 |
| Double Tail | 0.8763 | 0.9477 | 0.8150 | 0.9044 | 0.8626 | 0.8674 | 0.8579 | 0.9360 | 0.7792 | 0.8219 | 0.7407 | 0.8396 |
| Long Tail | 0.3390 | 0.5882 | 0.2381 | 0.4534 | 0.3676 | 0.5965 | 0.2656 | 0.4562 | 0.1935 | 0.6667 | 0.1132 | 0.3355 |
| Narrow Acrosome | 0.6706 | 0.6480 | 0.6947 | 0.7390 | 0.6967 | 0.6737 | 0.7214 | 0.7716 | 0.6832 | 0.6837 | 0.6827 | 0.7567 |
| Normal | 0.8545 | 0.8647 | 0.8445 | 0.9250 | 0.7173 | 0.7258 | 0.7090 | 0.7968 | 0.7378 | 0.8153 | 0.6737 | 0.8192 |
| Pin Head | 0.9910 | 0.9961 | 0.9859 | 0.9915 | 0.9905 | 0.9976 | 0.9835 | 0.9928 | 0.9573 | 0.9655 | 0.9492 | 0.9762 |
| Pyriform Head | 0.6459 | 0.6175 | 0.6769 | 0.6924 | 0.7461 | 0.7363 | 0.7562 | 0.8210 | 0.7317 | 0.6991 | 0.7676 | 0.8120 |
| Round Head | 0.5500 | 0.6612 | 0.4708 | 0.6284 | 0.6192 | 0.6316 | 0.6072 | 0.6833 | 0.5514 | 0.6036 | 0.5075 | 0.6552 |
| Short Tail | 0.8757 | 0.8735 | 0.8779 | 0.9382 | 0.8542 | 0.8707 | 0.8383 | 0.9118 | 0.7588 | 0.7686 | 0.7493 | 0.8090 |
| Tapered Head | 0.6846 | 0.6687 | 0.7012 | 0.7658 | 0.6873 | 0.6873 | 0.6873 | 0.7599 | 0.6791 | 0.6782 | 0.6800 | 0.7535 |
| Thick Neck | 0.6582 | 0.6619 | 0.6545 | 0.7358 | 0.6344 | 0.6023 | 0.6701 | 0.6934 | 0.6199 | 0.6115 | 0.6285 | 0.6451 |
| Thin Neck | 0.3211 | 0.4486 | 0.2500 | 0.2926 | 0.4364 | 0.5882 | 0.3468 | 0.4375 | 0.4158 | 0.5859 | 0.3222 | 0.4831 |
| Twisted Neck | 0.7975 | 0.7725 | 0.8241 | 0.8768 | 0.8276 | 0.8290 | 0.8263 | 0.9082 | 0.7760 | 0.7265 | 0.8327 | 0.8612 |
| Twisted Tail | 0.7211 | 0.6732 | 0.7762 | 0.8230 | 0.8782 | 0.8776 | 0.8782 | 0.9413 | 0.8851 | 0.8758 | 0.8946 | 0.9445 |
| Vacuolated Head | 0.6350 | 0.6396 | 0.6305 | 0.7007 | 0.6177 | 0.5906 | 0.6177 | 0.6558 | 0.6021 | 0.5935 | 0.6110 | 0.6210 |
| Accuracy | 0.7094 | | | | 0.7361 | | | | 0.7163 | | | |
| Macro Avg. | 0.6765 | 0.7200 | 0.6563 | 0.7276 | 0.6895 | 0.7190 | 0.6755 | 0.7516 | 0.6644 | 0.7169 | 0.6458 | 0.7300 |
| Weighted Avg. | 0.7052 | 0.7053 | 0.7094 | 0.7673 | 0.7324 | 0.7327 | 0.7361 | 0.7965 | 0.7095 | 0.7115 | 0.7063 | 0.7706 |
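The difference between the Macro Avg. and Weighted Avg. rows can be checked by hand from numbers already in the tables. The sketch below recomputes both averages for BesLab from the per-class F1 column above and the BesLab class counts in the class-distribution table. Using those counts as the per-class support is an assumption, because the excerpt does not state how support was defined. The function names are illustrative.

```python
# BesLab per-class F1 (per-class results table) paired with BesLab class
# counts (class-distribution table). Counts as support is an assumption.
BESLAB = {
    "Amorphous Head": (0.6345, 3572),
    "Asymmetric Neck": (0.2476, 366),
    "Curly Tail": (0.9003, 1445),
    "Double Head": (0.7750, 48),
    "Double Tail": (0.8763, 200),
    "Long Tail": (0.3390, 42),
    "Narrow Acrosome": (0.6706, 2055),
    "Normal": (0.8545, 599),
    "Pin Head": (0.9910, 782),
    "Pyriform Head": (0.6459, 979),
    "Round Head": (0.5500, 257),
    "Short Tail": (0.8757, 991),
    "Tapered Head": (0.6846, 1399),
    "Thick Neck": (0.6582, 1978),
    "Thin Neck": (0.3211, 192),
    "Twisted Neck": (0.7975, 1154),
    "Twisted Tail": (0.7211, 706),
    "Vacuolated Head": (0.6350, 1697),
}


def macro_f1(per_class):
    """Unweighted mean of per-class F1: every class counts equally."""
    return sum(f1 for f1, _ in per_class.values()) / len(per_class)


def weighted_f1(per_class):
    """Support-weighted mean of per-class F1: frequent classes dominate."""
    total = sum(count for _, count in per_class.values())
    return sum(f1 * count for f1, count in per_class.values()) / total


print(f"macro F1    = {macro_f1(BESLAB):.5f}")
print(f"weighted F1 = {weighted_f1(BESLAB):.5f}")
```

Both results agree, within rounding, with the BesLab Macro Avg. (0.6765) and Weighted Avg. (0.7052) F1 values in the table. The gap between them comes from rare, poorly recognized classes such as Long Tail (42 samples) and Thin Neck (192 samples), which pull the macro average down but barely move the weighted one.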
| Role | Model Architecture | Parameters (M) | Inference Time per Image (ms) |
|---|---|---|---|
| Teacher 1 | SwinV2-large | 197 | 11.64 |
| Teacher 2 | ConvNeXtV2-large | 198 | 9.78 |
| Student | SwinV2-base | 88 | 6.64 |
| Staining Technique | Number of Classes | Model | Accuracy (%) | Wei. F1 (%) | Wei. Precision (%) | Wei. Recall (%) |
|---|---|---|---|---|---|---|
| BesLab | 18 | EfficientNetV2-M [23] | 65.05 | 64.39 | 64.35 | 65.05 |
| BesLab | 18 | Splitter Model + Ensemble [36] | 69.43 | 68.77 | 68.84 | 69.43 |
| BesLab | 18 | Proposed Multi-Teacher Method | 70.94 | 70.52 | 70.53 | 70.94 |
| Histoplus | 18 | EfficientNetV2-M [23] | 67.42 | 67.13 | 67.00 | 67.42 |
| Histoplus | 18 | Splitter Model + Ensemble [36] | 71.34 | 70.60 | 70.67 | 71.34 |
| Histoplus | 18 | Proposed Multi-Teacher Method | 73.61 | 73.24 | 73.27 | 73.61 |
| GBL | 18 | EfficientNetV2-M [23] | 63.58 | 63.10 | 63.03 | 63.46 |
| GBL | 18 | Splitter Model + Ensemble [36] | 68.41 | 67.54 | 67.73 | 68.41 |
| GBL | 18 | Proposed Multi-Teacher Method | 71.63 | 70.95 | 71.15 | 71.63 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Tutay, O.E.; Ilhan, H.O.; Uzun, H.; Huner Yigit, M.; Serbes, G. A Multi-Teacher Knowledge Distillation Framework for Enhancing the Robustness of Automated Sperm Morphology Assessment. Diagnostics 2026, 16, 1230. https://doi.org/10.3390/diagnostics16081230

