Article

Automated Deep Learning Model for Sperm Head Segmentation, Pose Correction, and Classification

Yunbo Guo, Junbo Li, Kaicheng Hong, Bilin Wang, Wenliang Zhu, Yuefeng Li, Tiantian Lv and Lirong Wang
1 Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou 215163, China
2 School of Electronic and Information Engineering, Soochow University, Donghuan Road No. 50, Suzhou 215031, China
* Authors to whom correspondence should be addressed.
Appl. Sci. 2024, 14(23), 11303; https://doi.org/10.3390/app142311303
Submission received: 6 November 2024 / Revised: 28 November 2024 / Accepted: 3 December 2024 / Published: 4 December 2024

Abstract

Male infertility remains a significant global health concern, with abnormal sperm head morphology recognized as a key factor impacting fertility. Traditional analysis of sperm morphology through manual microscopy is labor-intensive and susceptible to variability among observers. In this study, we introduce a deep learning framework designed to automate sperm head classification, integrating EdgeSAM for precise segmentation with a Sperm Head Pose Correction Network to standardize orientation and position. The classification network employs flip feature fusion and deformable convolutions to capture symmetrical characteristics, which enhances classification accuracy across morphological variations. Our model achieves a test accuracy of 97.5% on the HuSHem and Chenwy datasets, outperforming existing methods and demonstrating greater robustness to rotational and translational transformations. This approach offers a streamlined, automated solution for sperm morphology analysis, providing a reliable tool to support clinical fertility diagnostics and research applications.

1. Introduction

Infertility is a widespread health concern, affecting about 15% of couples globally, with male-related factors responsible for 30–40% of cases. Abnormal sperm head morphology is one of the main factors contributing to this problem [1]. A typical sperm head is oval and consists primarily of the acrosome and nucleus. Upon reaching the egg, the acrosome releases hydrolytic enzymes that degrade the protective layers, corona radiata, and zona pellucida, allowing the sperm’s genetic material to enter and initiate fertilization [2]. Abnormal size, shape, or structure of the sperm head can impair motility and compromise its ability to penetrate protective barriers, reducing fertilization potential [3]. Accurate sperm morphology analysis is essential for fertility prognosis and treatment planning.
There exists a wide range of abnormal sperm head morphologies that can impair fertility and reduce the likelihood of successful fertilization. Among these, amorphous, pyriform, and tapered forms are the most frequently observed [4], as illustrated in Figure 1. Amorphous heads lack symmetry and defined structure, with irregular borders deviating from the standard oval shape. Pyriform heads display a narrowing from base to tip, resembling a pear shape; although they maintain symmetry along the long axis, the short axis shows pronounced asymmetry. Tapered heads are excessively elongated, often ending in a sharp or pointed tip, diverging significantly from the standard oval structure. These abnormal morphologies can impair sperm motility, acrosome function, and DNA integrity, ultimately reducing fertility potential and decreasing the likelihood of successful fertilization.
Traditionally, sperm morphology has been manually evaluated by trained clinicians using microscopy. The assessment involves preparing specimens by smearing, washing, and staining semen samples. Under the microscope, at least 200 sperm heads and tails are examined for morphological features [3]. The proportion of normal sperm is then calculated to determine whether it meets clinical criteria. However, this method is labor-intensive, highly subjective, and prone to inconsistencies across different laboratories and observers [6,7]; inter-laboratory coefficients of variation in analysis results have been reported to range from 4.8% to as high as 132% [8].
Computer-aided sperm analysis (CASA) systems have been developed to automate and standardize sperm morphology evaluation, addressing these limitations by reducing subjectivity and improving consistency. In CASA systems, sperm head morphology recognition algorithms based on digital image processing techniques form a crucial element. Traditional algorithms have largely relied on hand-crafted features derived from images, such as area, length-to-width ratio, perimeter, Fourier descriptors, image moments, and image gradient [9,10]. Some studies design specialized features targeting the symmetry of abnormal sperm heads. For example, when sperm heads are oriented horizontally, pyriform sperm exhibit symmetry along the long axis but asymmetry along the short axis (Figure 2), which can be captured using features such as Quadrant Fitness and Bilateral Symmetry [4]. However, handcrafted feature extraction involves multiple steps, including image preprocessing, feature extraction, sperm head segmentation, and numerical analysis, making the algorithms overly complex with numerous hyperparameters. This complexity can lead to cumulative errors and reduced efficiency.
In recent years, research on sperm morphology analysis has increasingly focused on deep learning models, which offer the advantage of automatically learning key features from images without the need for manual feature extraction. For example, VGG16 achieved an accuracy of 94% for identifying tapered, pyriform, amorphous, and small-headed sperm on the HuSHeM dataset [5,11,12]. InceptionV3 reached an accuracy of 87.3% [13,14], and other models without significant modifications also achieved over 85% accuracy [15]. With the ongoing development of deep learning, various new techniques have been introduced. For example, a combination of Generative Adversarial Networks (GANs) and Capsule Networks (CapsNets) has been employed to synthesize sperm images, effectively addressing data imbalance issues and achieving an accuracy of 97.8% [16,17]. Similarly, SHMC-Net, a model designed for both sperm segmentation and classification, integrates features across multiple scales and has achieved a notable accuracy of 98.3% [18]. These models have demonstrated high accuracy when trained on specific datasets, surpassing traditional methods in terms of precision and robustness.
Classical ensemble methods remain effective for sperm head classification. For example, integrating SHMC-Net models with different structural variations achieved an accuracy of 99.17%. Similarly, some studies have combined Transformer and MobileNet [19] or integrated VGG16, VGG19, ResNet34, and DenseNet161 [20], with accuracy surpassing that of individual models. However, the computational complexity of these models is considerable, which limits their feasibility for practical clinical applications.
However, deep learning models are sensitive to changes in the target position and orientation, which affects their robustness and limits their applicability in diverse clinical settings [21,22]. Moreover, the sperm head under examination often overlaps with other sperm heads or noise, causing the network to extract irrelevant features, which interfere with predictions. Most studies manually standardize the position, angle, and scale of the sperm head before inputting it into the model for classification [12,23]. This approach is cumbersome and labor-intensive, and it is also influenced by human subjectivity and fatigue, making it difficult to improve the accuracy and efficiency of sperm head morphology classification in clinical settings.
In this paper, we propose an adaptive sperm head pose correction and morphology classification model. We use EdgeSAM for initial feature extraction and segmentation, suppressing irrelevant content in the feature map through prompts [24]. Then, a sperm head pose correction network and Rotated RoI alignment are used to normalize the sperm head’s position and orientation [25]. To leverage the symmetry of pyriform and amorphous sperm heads in morphology classification, we adapted a flip feature fusion module with deformable convolutions to align and enhance feature maps, ultimately improving classification accuracy. Lastly, this architecture has been rigorously tested on two sperm head image datasets, demonstrating its effectiveness and superiority.
The main contributions of this article are summarized as follows:
  • We introduce EdgeSAM into the sperm head segmentation task, using a single coordinate point as a prompt to indicate the rough location of the sperm head, enabling accurate feature extraction and segmentation for the specific sperm.
  • We propose a sperm head pose correction network that can accurately predict the position, angle, and orientation of the sperm head, achieving standardization with a low computational cost. This significantly improves the accuracy and efficiency of sperm head morphology classification.
  • We propose a flip feature fusion module that leverages the symmetry of pyriform and amorphous sperm heads by processing flipped feature maps to enhance the accuracy of sperm head morphology classification.

2. Materials and Methods

2.1. Dataset Collection and Preprocessing

Our method uses three annotations during training: contour annotations for fine-tuning EdgeSAM, acrosome position to determine sperm polarity after angle correction, and morphology categories for training the classification network.
The Human Sperm Head Morphology dataset (HuSHem) consists of 216 RGB images, including 54 images of normal sperm heads, 57 images of pyriform sperm heads, 52 images of amorphous sperm heads, and 53 images of tapered sperm heads [5]. Most images are sized 131 × 131, although some deviate slightly from this size. Experienced male fertility specialists annotated the sperm head contours and the vertex (Figure 3). The dataset was split into training and testing sets at an 8:2 ratio. Table 1 lists the image numbers used in the test set; the remaining images form the training set.
Chenwy Sperm-Dataset contains 320 RGB images with a resolution of 1280 × 1024 [26]. The authors split the data into training and testing sets at an 8:2 ratio. Each image contains 5–6 complete sperm cells, with annotations for the contours of the sperm head, midpiece, and tail. The head is further divided into the acrosome, nucleus, and vacuole. However, this dataset does not provide sperm morphology categories, so it is only used to evaluate the model’s segmentation performance. Based on the annotations, we extracted the sperm heads and used only the head contour and acrosome position annotations, resulting in a total of 1314 images resized to 201 × 201 pixels.
For preprocessing, all images in the HuSHem dataset were first padded to 131 × 131 using reflection padding and then upsampled to 201 × 201 to match the dimensions of the Chenwy dataset. Data augmentation techniques, including rotation, translation, brightness, and color jittering, were applied to expand the training data of both the HuSHem and Chenwy Sperm-Datasets, increasing the number of images from 8450 to 26,280.
In addition, the training set was split at an 8:2 ratio for five-fold cross-validation. In each fold, the sub-training sets consisted of augmented images, while the validation sets contained only original images. To prevent data leakage, original and augmented images of the same sperm head were ensured not to appear in both the sub-training and validation sets within the same fold.
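This preprocessing and augmentation pipeline can be sketched with standard torchvision transforms, as below; the jitter ranges and the rotation/translation limits are assumptions, since the paper does not report exact values.

```python
import torchvision.transforms as T

# Minimal augmentation sketch (assumed parameter ranges). HuSHem images are
# first brought to 131 x 131 with reflection padding and upsampled to
# 201 x 201 to match the Chenwy crops; augmentation is applied to the
# training split only, while validation folds keep the original images.
train_transform = T.Compose([
    T.Resize((201, 201)),                                     # match Chenwy crop size
    T.RandomAffine(degrees=180, translate=(0.1, 0.1)),        # rotation + translation
    T.ColorJitter(brightness=0.2, saturation=0.2, hue=0.05),  # brightness/colour jitter
    T.ToTensor(),
])
```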

2.2. Model Construction

In the following section, we introduce the Sperm Head Pose Correction and Classification architecture, as shown in Figure 4. We begin with a thorough overview of the architecture, followed by a detailed explanation of its key components, including sperm feature extraction and segmentation, the Sperm Head Pose Correction Network, and the Sperm Head Classification Network.

2.2.1. Sperm Feature Extraction and Segmentation

EdgeSAM achieves results similar to the original SAM while utilizing only 1.5% of the trainable parameters, significantly lowering the computational demands for both training and inference. As a result, we use EdgeSAM for feature extraction and segmentation. However, the pre-trained EdgeSAM mistakenly considers the sperm tail as part of the head, which leads to inaccurate segmentation results. To address this issue and enable mini-batch training on a single 24 GB GPU, we freeze the weights in the prompt encoder, re-train RepViT, and fine-tune the self-attention parameters Q, K, and V in the mask decoder using Low-Rank Adaptation (LoRA) technology [27].
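As an illustration of this fine-tuning scheme, the sketch below wraps a pre-trained linear projection with a low-rank LoRA update; the rank, the scaling factor, and the way the wrapper is attached to the decoder’s Q, K, and V projections are assumptions rather than the paper’s exact settings.

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA adapter: y = W x + (alpha / r) * B A x, with W frozen.
    Only A and B are trained, keeping memory low enough for a 24 GB GPU."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():       # freeze pre-trained weights
            p.requires_grad = False
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)     # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# Hypothetical attachment point: wrap the decoder attention projections, e.g.
#   attn.q_proj = LoRALinear(attn.q_proj)   # likewise for k_proj and v_proj
```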
Clinical analysts often provide rough annotations by marking any point within the sperm head when collecting images, leading to deviations between the center points of the captured image and the sperm head. Therefore, during training, our prompt encoder utilizes a single random point within the sperm head as input, while during inference, it employs the center point of the image. This approach allows the network to achieve accurate segmentation with limited prompts.
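A minimal sketch of this prompt-selection rule is given below; the helper name and the NumPy mask interface are illustrative, not taken from the paper’s code.

```python
import numpy as np

def select_prompt_point(image_hw, head_mask=None, training=True):
    """Pick the single (x, y) point prompt for EdgeSAM's prompt encoder:
    a random pixel inside the annotated head mask during training, and the
    image centre (mimicking a rough clinical click) at inference time."""
    if training and head_mask is not None:
        ys, xs = np.nonzero(head_mask)
        i = np.random.randint(len(xs))
        return int(xs[i]), int(ys[i])
    h, w = image_hw
    return w // 2, h // 2
```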

2.2.2. Sperm Head Pose Correction Network

The network includes four main modules: Detection Network (DNet), Direction Estimation Module (DEM), Polarity Network (PNet), and Rotated Region of Interest Align (RROI align) module. The first three modules predict the position, size, and angle of the sperm head. These predictions are then input into the RROI align module for feature sampling at specified locations, thereby enabling the standardization of sperm head pose.
The input of DNet is the confidence map from the mask decoder, which has been downsampled. DNet includes two 3 × 3 convolutional layers, followed by two pooling layers and two fully connected layers. The last fully connected layer is used to regress the center point (x_c, y_c) and length s of the square bounding box. The benefit of the square bounding box is that it eliminates concerns about aspect ratio changes affecting features when standardizing image scale, and it helps the network learn the spatial differences between pyriform, tapered, and normal sperm.
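A sketch of such a detection head is shown below; the channel widths, hidden-layer size, and assumed resolution of the downsampled confidence map are illustrative choices, not values reported in the paper.

```python
import torch.nn as nn

class DNet(nn.Module):
    """Two 3x3 convolutions, two poolings, and two fully connected layers
    regressing the square bounding-box parameters (x_c, y_c, s)."""
    def __init__(self, in_ch: int = 1, width: int = 32, in_size: int = 64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        flat = width * (in_size // 4) ** 2
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(flat, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 3),                  # x_c, y_c, side length s
        )

    def forward(self, conf_map):                # downsampled confidence map
        return self.regressor(self.features(conf_map))
```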
In image angle prediction, the periodicity of angles (e.g., the square bounding boxes corresponding to 0 and 360 degrees are exactly the same) prevents the loss function from computing meaningful gradients, which makes network training unstable or difficult to converge. In our work, we decompose the sperm’s direction into angle and polarity. The angle between the sperm’s long axis and the x-axis is calculated in DEM using image moments and the arctangent function, with a value range of [−90, 90). The equations are as follows:
$$\mu_{20} = \sum_{x=0}^{W-1} \sum_{y=0}^{H-1} (x - \bar{x})^2 \, I(x, y)$$
$$\mu_{02} = \sum_{x=0}^{W-1} \sum_{y=0}^{H-1} (y - \bar{y})^2 \, I(x, y)$$
$$\mu_{11} = \sum_{x=0}^{W-1} \sum_{y=0}^{H-1} (x - \bar{x})(y - \bar{y}) \, I(x, y)$$
$$\theta = \frac{1}{2} \tan^{-1}\!\left( \frac{2\mu_{11}}{\mu_{20} - \mu_{02}} \right)$$
where $\bar{x}$ and $\bar{y}$ represent the centroid of the image, and $\mu_{20}$, $\mu_{02}$, and $\mu_{11}$ represent the second-order central moments and the mixed central moment.
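A direct NumPy transcription of these equations might look like the sketch below, which treats the confidence map (or binary mask) as the intensity image I(x, y).

```python
import numpy as np

def head_angle(mask: np.ndarray) -> float:
    """Long-axis angle of the sperm head (degrees) from second-order central
    moments; arctan2 resolves the sign so the half-angle covers [-90, 90)."""
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    total = mask.sum()
    x_bar, y_bar = (xs * mask).sum() / total, (ys * mask).sum() / total
    mu20 = ((xs - x_bar) ** 2 * mask).sum()
    mu02 = ((ys - y_bar) ** 2 * mask).sum()
    mu11 = ((xs - x_bar) * (ys - y_bar) * mask).sum()
    theta = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)   # radians
    return float(np.degrees(theta))
```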
Polarity refers to whether the sperm’s acrosome is positioned on the left or right of the image’s vertical centerline when the angle is zero degrees. Here, we downsample the input image and use RROI align to sample the sperm head based on the predicted bounding box and angle, then feed it into PNet for polarity prediction. The hyperparameters in PNet are the same as in DNet. Once all pose parameters are obtained, RROI Align extracts the region containing the sperm head features from the feature map produced by EdgeSAM’s RepViT encoder. The extracted features have a shape of 256 × 64 × 64. If the output from PNet is labeled as L (with a value of 0), it indicates that the sperm head is oriented to the left. In this case, the extracted features are flipped both vertically and horizontally to correct the orientation. If the output from PNet is not L, the extracted features are retained without any flipping.
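As an illustration of this sampling-and-flip step, the sketch below crops a rotated square region from a feature map with `torch.nn.functional.affine_grid`/`grid_sample` and flips it when PNet predicts left polarity. This is not the paper’s implementation (which uses an RROI align module [25]); the grid-based crop, the normalization details, and the function names are assumptions.

```python
import math
import torch
import torch.nn.functional as F

def sample_head(feat, cx, cy, s, angle_deg, polarity_left, out_size=64):
    """Crop a rotated square (centre (cx, cy), side s in pixels, angle in
    degrees) from feat of shape (1, C, H, W) and flip it if the head points
    left. Assumes a roughly square feature map so the per-axis normalisation
    is interchangeable."""
    _, c, h, w = feat.shape
    a = math.radians(angle_deg)
    # Box centre and half-extent in normalised [-1, 1] grid coordinates.
    tx, ty = 2.0 * cx / (w - 1) - 1.0, 2.0 * cy / (h - 1) - 1.0
    sx, sy = s / w, s / h
    theta = torch.tensor([[math.cos(a) * sx, -math.sin(a) * sy, tx],
                          [math.sin(a) * sx,  math.cos(a) * sy, ty]],
                         dtype=feat.dtype, device=feat.device).unsqueeze(0)
    grid = F.affine_grid(theta, (1, c, out_size, out_size), align_corners=False)
    roi = F.grid_sample(feat, grid, align_corners=False)
    if polarity_left:                       # PNet output "L": flip both axes
        roi = torch.flip(roi, dims=[-1, -2])
    return roi
```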
Notably, EdgeSAM’s features are not used for training DNet and PNet; they are used only for RROI sampling. Using confidence maps and downsampled images significantly reduces computational overhead; for example, PNet’s complexity is only 0.082 GFLOPS, whereas operating on the feature maps would push the complexity of conv1 alone to 0.6 GFLOPS. Additionally, PNet’s input is downsampled from the original image, and DNet can be trained using ground truth to simulate segmentation results. Since these networks are independent of EdgeSAM’s features, they can be trained separately.

2.2.3. Sperm Head Classification Network

As mentioned earlier, the symmetry of pyriform and amorphous sperm heads is a useful characteristic for describing their morphology. To make full use of this property, we adapted the flip feature fusion module, originally developed for lane feature extraction [28], and applied it to the task of sperm head morphology classification. As illustrated in Figure 3, the feature maps are combined with their horizontally and vertically flipped versions; these feature maps are processed separately through convolutional and normalization layers, then fused by element-wise addition and activated using the ReLU function. We expect this will enable the model to learn the inherent symmetry of the sperm head.
To address slight misalignments, such as rotational or positional shifts during sperm head orientation correction, we apply deformable convolutions with a 3 × 3 kernel. These convolutions adjust the flipped features by learning offsets from the original feature map, ensuring proper alignment and improving the accuracy of sperm morphology classification. The fused features are then passed into a classification network composed of residual modules and fully connected layers to classify sperm head morphology. All convolutional layers within the flip feature fusion module have output dimensions of 256. The residual modules used in the classification part are taken from layers 3 and 4 of ResNet, while the fully connected layers have a hidden layer with 1024 neurons.
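A sketch of the fused block is given below using `torchvision.ops.DeformConv2d`; fusing a single copy flipped along both axes, and predicting the deformable offsets from the original feature map, are assumptions about details the paper does not spell out.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class FlipFeatureFusion(nn.Module):
    """Fuse a feature map with its flipped copy: separate conv + norm per
    branch, a deformable 3x3 conv to realign the flipped branch, then
    element-wise addition and ReLU. Channel width (256) follows the paper."""
    def __init__(self, ch: int = 256):
        super().__init__()
        self.branch_orig = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                                         nn.BatchNorm2d(ch))
        self.offset = nn.Conv2d(ch, 18, 3, padding=1)    # 2 * 3 * 3 offset channels
        self.deform = DeformConv2d(ch, ch, 3, padding=1)
        self.norm_flip = nn.BatchNorm2d(ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        flipped = torch.flip(x, dims=[-1, -2])           # horizontal + vertical flip
        offsets = self.offset(x)                         # alignment learned from the original map
        fused = self.branch_orig(x) + self.norm_flip(self.deform(flipped, offsets))
        return self.relu(fused)
```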

2.2.4. Three-Phase Training Strategy

We employ a three-phase training strategy. In the first phase, we train the EdgeSAM and the Sperm Head Pose Correction Network independently. The true bounding boxes and polarity labels are derived directly from the annotations rather than relying on segmentation outputs from EdgeSAM. After completing the initial training, the weights of both networks are frozen, and we proceed to train the trainable layers of the Sperm Head Classifier. In the final phase, we load the weights from the previous two steps and perform end-to-end training of all components.
This approach is designed to prevent potential errors in RRoI alignment and incorrect region extraction that might result from inaccurate segmentation, ensuring the stability and accuracy of the subsequent network training. For EdgeSAM, we apply both binary cross-entropy and Dice loss for optimization. DNet is optimized using L1 smooth loss, while PNet is optimized with binary cross-entropy loss and the head classifier with multi-class cross-entropy.
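The per-component objectives can be sketched as follows; the loss types match those listed above, while the soft Dice formulation and the equal weighting of terms are assumptions.

```python
import torch
import torch.nn.functional as F

def dice_loss(logits, target, eps=1e-6):
    """Soft Dice loss on sigmoid logits (sketch)."""
    p = torch.sigmoid(logits)
    inter = (p * target).sum(dim=(-2, -1))
    union = p.sum(dim=(-2, -1)) + target.sum(dim=(-2, -1))
    return (1 - (2 * inter + eps) / (union + eps)).mean()

def segmentation_loss(mask_logits, gt_mask):             # EdgeSAM (phase 1)
    return F.binary_cross_entropy_with_logits(mask_logits, gt_mask) + dice_loss(mask_logits, gt_mask)

def pose_loss(box_pred, box_gt, polarity_logit, polarity_gt):   # DNet + PNet (phase 1)
    return F.smooth_l1_loss(box_pred, box_gt) + \
           F.binary_cross_entropy_with_logits(polarity_logit, polarity_gt)

def classification_loss(class_logits, class_gt):          # head classifier (phases 2-3)
    return F.cross_entropy(class_logits, class_gt)
```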

3. Implementation Details

In the experiment, EdgeSAM, Sperm Head Pose Correction Network, and Sperm Head Classifier use the AdamW optimizer, with a batch size of 8 and an initial learning rate of 1 × 10−4. The models are trained for 30 epochs, and the learning rate is reduced by a factor of 10 every 10 epochs. The PyTorch version is 2.1.0 with CUDA 12.1. All models were trained sequentially on a single RTX 4090 GPU, with the CPU being an Intel® Xeon® Gold 6133 (Santa Clara, CA, USA).
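A sketch of the corresponding optimizer and schedule setup is shown below; `model`, `train_loader`, and `compute_loss` are placeholders for whichever component and phase-specific objective are being trained.

```python
import torch

def train_component(model, train_loader, compute_loss, epochs=30):
    """Training loop matching the reported settings: AdamW, lr 1e-4,
    batch size 8 (set in the DataLoader), lr divided by 10 every 10 epochs."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
    for _ in range(epochs):
        for batch in train_loader:
            optimizer.zero_grad()
            loss = compute_loss(model, batch)   # phase-specific objective (see Section 2.2.4)
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model
```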
A fixed random seed was used to ensure reproducibility throughout the process, including both CPU and GPU computations. Additionally, the cuDNN backend was configured for deterministic operations, and automatic benchmarking was disabled to prevent variations in the selection of operations. These measures were taken to minimize variability during training and evaluation, ensuring consistent results across different runs.
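These settings correspond to the standard PyTorch reproducibility switches sketched below; the seed value itself is an arbitrary choice.

```python
import random
import numpy as np
import torch

def set_deterministic(seed: int = 42):
    random.seed(seed)                            # Python RNG
    np.random.seed(seed)                         # NumPy RNG
    torch.manual_seed(seed)                      # CPU RNG
    torch.cuda.manual_seed_all(seed)             # GPU RNGs
    torch.backends.cudnn.deterministic = True    # deterministic conv kernels
    torch.backends.cudnn.benchmark = False       # disable auto-tuned kernel selection
```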

4. Evaluation Metrics

EdgeSAM’s segmentation results are evaluated using the Dice coefficient to measure overlap between the predicted segmentation and ground truth, while the Hausdorff distance assesses boundary alignment by calculating the maximum deviation between predicted and actual boundaries.
$$\mathrm{Dice} = \frac{2\,|S \cap G|}{|S| + |G|}$$
$$H(G, S) = \max\left\{ \sup_{x \in G} \inf_{y \in S} d(x, y),\ \sup_{x \in S} \inf_{y \in G} d(x, y) \right\}$$
where $S$ represents the segmentation result, $G$ represents the corresponding ground truth, $|\cdot|$ denotes the number of pixels, and $d(x, y)$ denotes the Euclidean distance.
The Intersection over Union (IoU) metric is used to evaluate the deviation between the predicted bounding box (bbox) from DNet and the ground truth bbox. The average of the three metrics (including IoU, Dice coefficient, and Hausdorff distance) is calculated to assess the performance of each model on individual folds of the validation set and the overall test set.
$$\mathrm{IoU} = \frac{|B_{pred} \cap B_{true}|}{|B_{pred} \cup B_{true}|}$$
For evaluating sperm head polarity and morphology classification, accuracy and the confusion matrix are employed, with macro F1-score being additionally used for morphology classification.
$$\mathrm{Macro\ F1} = \frac{1}{N} \sum_{i=1}^{N} \frac{2 \times \mathrm{Precision}_i \times \mathrm{Recall}_i}{\mathrm{Precision}_i + \mathrm{Recall}_i}$$
For the validation set, the metrics are averaged across all folds, and their standard deviation is calculated to provide a comprehensive evaluation of model performance.
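For reference, the overlap and boundary metrics can be computed as in the sketch below; SciPy’s `directed_hausdorff` is used here, and the boundary points are assumed to be extracted from the masks beforehand.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice(seg: np.ndarray, gt: np.ndarray) -> float:
    inter = np.logical_and(seg, gt).sum()
    return 2.0 * inter / (seg.sum() + gt.sum())

def iou(box_pred: np.ndarray, box_true: np.ndarray) -> float:
    inter = np.logical_and(box_pred, box_true).sum()
    union = np.logical_or(box_pred, box_true).sum()
    return inter / union

def hausdorff(seg_boundary: np.ndarray, gt_boundary: np.ndarray) -> float:
    """Symmetric Hausdorff distance between two (N, 2) boundary point sets."""
    return max(directed_hausdorff(seg_boundary, gt_boundary)[0],
               directed_hausdorff(gt_boundary, seg_boundary)[0])
```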

5. Results

Table 2 and Figure 5 compare existing sperm head classification methods across metrics. Our method shows slightly lower performance than SHMC-Net in cross-validation, achieving 97.6% accuracy compared to SHMC-Net’s 98.2%. The inclusion of a separate test dataset reduces the number of images available for training and cross-validation, increasing the impact of individual samples on overall accuracy and F1 scores. For instance, in the test set, each sample contributes 2.5% to the overall accuracy (the test set contains 40 images). Despite this, the small standard deviation of our results demonstrates that our method is notably more stable across folds than SHMC-Net, highlighting the consistency of our approach. The macro-specificity of our method on the test set is 99.1%.
Furthermore, most studies incorporate pose correction, except for one that reports the lowest accuracy, underscoring the critical role of pose correction in achieving reliable results. While some methods rely on manual correction, this approach is labor-intensive, time-consuming, and impractical for clinical applications. Automated strategies, such as those proposed by Liu et al. and Sapkota et al., involve complex workflows or additional computational overhead. By contrast, our method integrates segmentation, pose correction, and classification in an end-to-end manner, effectively reducing complexity and cumulative errors while enhancing both efficiency and accuracy.

6. Discussion

6.1. Segmentation Result

Table 3 presents a performance comparison between the two models. For the Chenwy dataset, the Dice coefficient improves from 0.905 to 0.975 during validation and from 0.906 to 0.974 in the test set, while the Hausdorff Distance reduces from 8.19 to 2.785 in validation and from 7.962 to 2.011 in the test set, highlighting significant gains in segmentation accuracy and boundary precision after retraining. Similarly, on the HuSHem dataset, the retrained model shows consistent improvements, with the Dice coefficient rising from 0.924 to 0.975 during validation and from 0.925 to 0.973 in the test set. The Hausdorff Distance also decreases from 3.615 to 3.145 in validation and from 3.867 to 2.615 in the test set. These results demonstrate that task-specific retraining effectively enhances model generalization and performance.
Figure 6 presents the performance statistics of the original and retrained models using box plots, where the median is represented by the central red line, the 25th and 75th percentiles form the bottom and top of each box, and outliers are shown as red points. The retrained model consistently achieves higher Dice coefficients and lower Hausdorff Distances on both datasets, with tighter distributions and fewer outliers, indicating more accurate and reliable segmentation results.
Figure 7 showcases the segmentation results of the retrained model (red) and the original model (yellow). In both datasets, the retrained model demonstrates significantly better segmentation accuracy, with the red contours closely following the sperm head boundaries, capturing even subtle and irregular shapes. In contrast, the original model’s yellow contours exhibit less precision, particularly around edges and complex shapes, where deviations from the true boundaries are more pronounced.

6.2. Pose Correction Result

Table 4 presents the performance of the sperm head pose correction network on two datasets, Chenwy and HuSHem, evaluated on both validation (5-fold cross-validation) and test tasks. The IoU scores of 0.950 and 0.916 in validation for Chenwy and HuSHem, respectively, and 0.946 and 0.887 in the test phase, suggest that DNet is performing well in localizing the sperm heads. The standard deviation values in the table and the boxplot in Figure 8 both provide insights into the stability of the model’s performance across datasets. PNet achieves perfect polarity prediction accuracy (100%) during validation on both datasets and maintains 100% accuracy on Chenwy, with a slight drop to 97.5% on HuSHem, and the confusion matrix shows only one mistake out of 40 predictions. These results demonstrate that the network effectively performs polarity prediction with accurate RoI extraction and reliable generalization across both datasets.
Figure 9 illustrates the results of the sperm head pose correction network across two datasets. Although the pose correction network primarily operates on feature maps rather than the raw images, the original images are used here to better demonstrate its function. In both datasets, the network effectively normalizes the orientation of sperm heads, aligning them consistently despite variations in the original images. Furthermore, this correction is especially useful for removing the effects of background noise and the presence of other sperm, which may otherwise interfere with accurate classification.

6.3. Classification Results on Augmented Test Data

We applied 20 random rotation and displacement transformations to each image in the test dataset to evaluate the accuracy and robustness of the proposed method. In addition, the performance of 24 CNN models was evaluated on both the cross-validation and test sets (Table 5). The top five models were then selected to classify the augmented test data for comparison with the proposed method.
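A sketch of how such a perturbed test set can be generated is shown below; `test_set` is a placeholder, and the rotation and translation ranges are assumptions, since only the number of variants per image (20) is stated.

```python
import torchvision.transforms as T

def build_augmented_test_set(test_set, n_variants=20):
    """Generate n_variants randomly rotated/translated copies of every test
    image, keeping the original label (parameter ranges are assumptions)."""
    perturb = T.RandomAffine(degrees=180, translate=(0.1, 0.1))
    return [(perturb(img), label)
            for img, label in test_set
            for _ in range(n_variants)]
```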
Table 6 and Figure 10 provide a comparative analysis of performance metrics, specifically accuracy, precision, recall, F1 score, and confusion matrix, for five CNN models (VGG13, SqueezeNet1_1, ResNet34, ShuffleNet_V2_x1_0, MobileNet_V3_small) as well as our proposed method. Our method demonstrates superior performance across all metrics, achieving an accuracy of 95.8%, precision of 95.9%, recall of 96.1%, and F1 score of 95.7%. These results represent a substantial improvement over the next-highest performing model, ShuffleNet_V2_x1_0, indicating that the proposed method offers enhanced classification accuracy and robustness. The higher precision and recall values also suggest a strong capacity to minimize both false positives and false negatives. Figure 11 shows representative misclassified images.

7. Limitations and Future Work

EdgeSAM operates with an input size of 1024 × 1024, resulting in considerable computational requirements. While the use of RepViT facilitates re-parameterization during inference to mitigate computational overhead, the overall model complexity remains high. Furthermore, the heterogeneity in imaging equipment, such as variations in microscopes, cameras, and staining protocols across different hospitals, leads to inconsistencies in the acquired sperm images. These discrepancies present challenges for achieving consistent image standardization and may impact the generalizability of the model across diverse clinical settings.
In the future, we plan to reduce the model’s parameter size and input image resolution to decrease computational complexity, thereby enabling deployment on resource-constrained devices. Additionally, we will investigate approaches for standardizing the color style of sperm images or refining the training methodology to improve the robustness and generalization capability of the classification model.

8. Conclusions

In this study, we developed an advanced deep-learning framework for automated sperm head classification, integrating efficient segmentation, pose correction, and morphological classification. By leveraging EdgeSAM for precise segmentation, we achieved high accuracy with minimal computational requirements. The inclusion of a Sperm Head Pose Correction Network standardized the orientation and position of sperm heads, enhancing the consistency and reliability of downstream classification tasks. Furthermore, the use of a flip feature fusion module improved the model’s ability to capture the inherent symmetries in abnormal sperm morphology, ultimately increasing classification accuracy. This robust, computationally efficient approach provides a clinically viable solution for automated sperm morphology analysis.

Author Contributions

Conceptualization, Y.G.; Methodology, Y.G.; Software, K.H.; Validation, Y.G. and T.L.; Investigation, Y.G. and W.Z.; Resources, Y.G. and J.L.; Writing—original draft, Y.G.; Writing—review & editing, Y.G., J.L., K.H., B.W., Y.L. and L.W.; Project administration, Y.G., J.L. and L.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by The High-tech industrialization special project in cooperation between Jilin Province and the Chinese Academy of Sciences under Grant 2024SYHZ0045 for the project “AI-Based Automated Analysis System for Human Chromosomal Structural Abnormalities”.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to privacy.

Acknowledgments

We thank Peng Song and Miao Zhang for their support in software development and data processing. Their expertise and dedication were instrumental in enhancing the quality and efficiency of our work.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Agarwal, A.; Mulgund, A.; Hamada, A.; Chyatte, M.R. A Unique View on Male Infertility around the Globe. Reprod. Biol. Endocrinol. 2015, 13, 37. [Google Scholar] [CrossRef] [PubMed]
  2. Hamada, M.Z.; Saadeldin, I.M. Physiology of the Reproductive System. In Recent Advances in Biotechnology; Saadeldin, I.M., Ed.; Bentham Science Publishers: Sharjah, United Arab Emirates, 2022; Volume 5, pp. 1–59. ISBN 978-981-5051-66-7. [Google Scholar]
  3. World Health Organization (WHO). Laboratory Manual for the Examination and Processing of Human Semen, 6th ed.; World Health Organization: Geneva, Switzerland, 2021.
  4. Chang, V.; Heutte, L.; Petitjean, C.; Härtel, S.; Hitschfeld, N. Automatic Classification of Human Sperm Head Morphology. Comput. Biol. Med. 2017, 84, 205–216. [Google Scholar] [CrossRef] [PubMed]
  5. Shaker, F.; Monadjemi, S.A.; Alirezaie, J.; Naghsh-Nilchi, A.R. A Dictionary Learning Approach for Human Sperm Heads Classification. Comput. Biol. Med. 2017, 91, 181–190. [Google Scholar] [CrossRef] [PubMed]
  6. Kruger, T.F.; Menkveld, R.; Stander, F.S.H.; Lombard, C.J.; Van Der Merwe, J.P.; Van Zyl, J.A.; Smith, K. Sperm Morphologic Features as a Prognostic Factor in in Vitro Fertilization. Fertil. Steril. 1986, 46, 1118–1123. [Google Scholar] [CrossRef]
  7. Kohn, T.P.; Kohn, J.R.; Ramasamy, R. Effect of Sperm Morphology on Pregnancy Success via Intrauterine Insemination: A Systematic Review and Meta-Analysis. J. Urol. 2018, 199, 812–822. [Google Scholar] [CrossRef]
  8. Wang, Y.; Yang, J.; Jia, Y.; Xiong, C.; Meng, T.; Guan, H.; Xia, W.; Ding, M.; Yuchi, M. Variability in the Morphologic Assessment of Human Sperm: Use of the Strict Criteria Recommended by the World Health Organization in 2010. Fertil. Steril. 2014, 101, 945–949. [Google Scholar] [CrossRef]
  9. Beletti, M.E.; Costa, L.D.F.; Viana, M.P. A Comparison of Morphometric Characteristics of Sperm from Fertile Bos Taurus and Bos Indicus Bulls in Brazil. Anim. Reprod. Sci. 2005, 85, 105–116. [Google Scholar] [CrossRef]
  10. Chang, V.; Garcia, A.; Hitschfeld, N.; Härtel, S. Gold-Standard for Computer-Assisted Morphological Sperm Analysis. Comput. Biol. Med. 2017, 83, 143–150. [Google Scholar] [CrossRef]
  11. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2015, arXiv:1409.1556. [Google Scholar]
  12. Riordon, J.; McCallum, C.; Sinton, D. Deep Learning for the Classification of Human Sperm. Comput. Biol. Med. 2019, 111, 103342. [Google Scholar] [CrossRef]
  13. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. arXiv 2015, arXiv:1512.00567. [Google Scholar]
  14. Ilhan, H.O.; Sigirci, I.O.; Serbes, G.; Aydin, N. A Fully Automated Hybrid Human Sperm Detection and Classification System Based on Mobile-Net and the Performance Comparison with Conventional Methods. Med. Biol. Eng. Comput. 2020, 58, 1047–1068. [Google Scholar] [CrossRef] [PubMed]
  15. Yüzkat, M.; Ilhan, H.O.; Aydin, N. Multi-Model CNN Fusion for Sperm Morphology Analysis. Comput. Biol. Med. 2021, 137, 104790. [Google Scholar] [CrossRef] [PubMed]
  16. Sabour, S.; Frosst, N.; Hinton, G.E. Dynamic Routing Between Capsules. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: San Francisco, CA, USA, 2017; Volume 30. [Google Scholar]
  17. Jabbari, H.; Bigdeli, N. New Conditional Generative Adversarial Capsule Network for Imbalanced Classification of Human Sperm Head Images. Neural. Comput. Applic. 2023, 35, 19919–19934. [Google Scholar] [CrossRef]
  18. Sapkota, N.; Zhang, Y.; Li, S.; Liang, P.; Zhao, Z.; Zhang, J.; Zha, X.; Zhou, Y.; Cao, Y.; Chen, D.Z. Shmc-Net: A Mask-Guided Feature Fusion Network for Sperm Head Morphology Classification. In Proceedings of the 2024 IEEE International Symposium on Biomedical Imaging (ISBI), Athens, Greece, 27–30 May 2024; IEEE: New York, NY, USA, 2024; pp. 1–5. [Google Scholar]
  19. Spencer, L.; Fernando, J.; Akbaridoust, F.; Ackermann, K.; Nosrati, R. Ensembled Deep Learning for the Classification of Human Sperm Head Morphology. Adv. Intell. Syst. 2022, 4, 2200111. [Google Scholar] [CrossRef]
  20. Mahali, M.I.; Leu, J.-S.; Darmawan, J.T.; Avian, C.; Bachroin, N.; Prakosa, S.W.; Faisal, M.; Putro, N.A.S. A Dual Architecture Fusion and AutoEncoder for Automatic Morphological Classification of Human Sperm. Sensors 2023, 23, 6613. [Google Scholar] [CrossRef]
  21. Azulay, A.; Weiss, Y. Why Do Deep Convolutional Networks Generalize so Poorly to Small Image Transformations? arXiv 2019, arXiv:1805.12177. [Google Scholar]
  22. Zhang, R. Making Convolutional Networks Shift-Invariant Again. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; Chaudhuri, K., Salakhutdinov, R., Eds.; PMLR: Birmingham, UK, 2019; Volume 97, pp. 7324–7334. [Google Scholar]
  23. Liu, R.; Wang, M.; Wang, M.; Yin, J.; Yuan, Y.; Liu, J. Automatic Microscopy Analysis with Transfer Learning for Classification of Human Sperm. Appl. Sci. 2021, 11, 5369. [Google Scholar] [CrossRef]
  24. Zhou, C.; Li, X.; Loy, C.C.; Dai, B. EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM. arXiv 2024, arXiv:2312.06660. [Google Scholar]
  25. Huang, J.; Sivakumar, V.; Mnatsakanyan, M.; Pang, G. Improving Rotated Text Detection with Rotation Region Proposal Networks. arXiv 2018, arXiv:1811.07031. [Google Scholar]
  26. Chen, W.; Song, H.; Dai, C.; Jiang, A.; Shan, G.; Liu, H.; Zhou, Y.; Abdalla, K.; Dhanani, S.N.; Fatemeh Moosavi, K.; et al. Automated Sperm Morphology Analysis Based on Instance-Aware Part Segmentation. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13–17 May 2024; IEEE: New York, NY, USA, 2024; pp. 17743–17749. [Google Scholar]
  27. Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. LoRA: Low-Rank Adaptation of Large Language Models. arXiv 2021, arXiv:2106.09685. [Google Scholar]
  28. Feng, Z.; Guo, S.; Tan, X.; Xu, K.; Wang, M.; Ma, L. Rethinking Efficient Lane Detection via Curve Modeling. arXiv 2022, arXiv:2203.02431. [Google Scholar]
  29. Iqbal, I.; Mustafa, G.; Ma, J. Deep Learning-Based Morphological Classification of Human Sperm Heads. Diagnostics 2020, 10, 325. [Google Scholar] [CrossRef] [PubMed]
Figure 1. The typical abnormal sperm head morphology [5].
Figure 2. The symmetry of normal, pyriform, and amorphous sperm heads.
Figure 3. Original images with annotations and their augmented versions from the Chenwy (top row) and HuSHem (bottom row) datasets.
Figure 4. Illustration of Sperm Head Pose Correction and Morphology Classification Model.
Figure 5. Performance comparison between existing sperm head classification methods and our proposed method.
Figure 6. Statistics for Dice and Hausdorff distance on the test results obtained from the original and retrained EdgeSAM.
Figure 7. A sample of typical results for the original and retrained EdgeSAM is shown, with yellow contours for ground truth and red for segmentation, where the left side shows results from the retrained EdgeSAM, and the right side shows results from the original EdgeSAM.
Figure 8. Statistics for Sperm Head Pose Correction Network results.
Figure 9. A sample of typical results for the Sperm Head Pose Correction Network is shown, where the left side shows the original sperm head and the right side shows the corrected sperm head.
Figure 10. Confusion matrices of top five CNNs and our method.
Figure 11. Typical misclassified images; the image source IDs are Amorphous_004 and Amorphousimage_011. The image on the left and its augmented image are misclassified as normal or pyriform, while the image on the right and its augmented image are misclassified only as pyriform.
Table 1. The image numbers in the HuSHem test set for different morphologies.
Morphology | Image numbers
Amorphous | 1, 4, 6, 8, 11, 15, 18, 24, 35, 42
Normal | 5, 8, 11, 15, 26, 28, 37, 39, 48, 50
Pyriform | 5, 15, 21, 24, 25, 35, 39, 46, 48, 57
Tapered | 6, 14, 23, 27, 32, 37, 41, 48, 50, 52
Table 2. Comparison of sperm head classification results for existing and our method.
Method | PC | Accuracy (%) | Precision (%) | Recall (%) | F1 (%)
Shaker et al., 2017 [5] | M | 92.2 | 93.5 | 92.3 | 92.9
Iqbal et al., 2020 [29] | M | 95.7 | 96.1 | 95.5 | 95.5
Yüzkat et al., 2021 [15] | - | 85.2 | 85.2 | 85.3 | 89
Riordon et al., 2019 [12] | M | 94.0 | 94.7 | 94.1 | 94.1
Liu et al., 2021 [23] | A | 96.4 | 96.4 | 96.1 | 96.0
Sapkota et al., 2024 [18] | A | 98.2 ± 0.3 | 98.3 ± 0.3 | 98.1 ± 0.3 | 98.2 ± 0.3
Ours (validation) | A | 97.6 ± 0.01 | 97.9 ± 0.01 | 97.6 ± 0.01 | 97.2 ± 0.01
Ours | A | 97.5 | 97.7 | 97.5 | 97.5
PC: Pose correction approach, M: Manual approach, A: Automatic approach, -: No Correction.
Table 3. Mean and standard deviation values for different metrics and segmentation methods.
Task | Dataset | Module | Dice Mean | Dice Std | Hausdorff Mean | Hausdorff Std
Validation (5-fold) | Chenwy | Original | 0.905 | 0.005 | 8.19 | 0.424
Validation (5-fold) | Chenwy | Retrained | 0.975 | 0.002 | 2.785 | 0.086
Validation (5-fold) | HuSHem | Original | 0.924 | 0.003 | 3.615 | 0.335
Validation (5-fold) | HuSHem | Retrained | 0.975 | 0.002 | 3.145 | 0.332
Test | Chenwy | Original | 0.906 | 0.054 | 7.962 | 6.208
Test | Chenwy | Retrained | 0.974 | 0.011 | 2.011 | 1.594
Test | HuSHem | Original | 0.925 | 0.023 | 3.867 | 2.832
Test | HuSHem | Retrained | 0.973 | 0.021 | 2.615 | 1.096
Table 4. Mean values and standard deviations for the metrics of sperm head detection and polarity classification.
Task | Dataset | IoU Mean | IoU Std | Accuracy Mean (%) | Accuracy Std
Validation (5-fold) | Chenwy | 0.950 | 0.001 | 100 | 0
Validation (5-fold) | HuSHem | 0.916 | 0.009 | 100 | 0
Test | Chenwy | 0.946 | - | 100 | -
Test | HuSHem | 0.887 | - | 97.5 | -
Table 5. Comparison of validation and test results for CNNs and our method.
Method | CV Accuracy Mean (%) | CV Accuracy Std | CV Macro-F1 Mean (%) | CV Macro-F1 Std | Test Accuracy (%) | Test Macro-F1 (%)
VGG11 | 91.7 | 0.029 | 91.4 | 0.03 | 80.0 | 79.7
VGG13 | 91.1 | 0.043 | 91.1 | 0.041 | 92.5 | 92.6
VGG16 | 92.3 | 0.064 | 92.1 | 0.066 | 82.5 | 81.8
VGG19 | 88.8 | 0.034 | 88.4 | 0.036 | 80.0 | 79.7
ResNet18 | 92.3 | 0.024 | 92.4 | 0.024 | 85.0 | 84.8
ResNet34 | 90.6 | 0.015 | 90.5 | 0.015 | 90.0 | 89.6
ResNet50 | 89.3 | 0.042 | 89.4 | 0.041 | 82.5 | 81.9
ResNet101 | 92.9 | 0.031 | 93.1 | 0.03 | 87.5 | 87.4
ResNet152 | 91.7 | 0.034 | 92.0 | 0.034 | 82.5 | 82.3
SqueezeNet1_0 | 94.7 | 0.035 | 94.5 | 0.037 | 87.5 | 87.2
SqueezeNet1_1 | 90.9 | 0.023 | 90.3 | 0.023 | 90.0 | 90.1
DenseNet121 | 92.9 | 0.023 | 92.8 | 0.024 | 82.5 | 82.6
DenseNet161 | 91.7 | 0.012 | 91.6 | 0.012 | 85.0 | 84.3
DenseNet169 | 91.7 | 0.054 | 91.6 | 0.054 | 85.0 | 85.1
DenseNet201 | 92.9 | 0.015 | 92.9 | 0.014 | 87.5 | 87.5
ShuffleNet_v2_x0_5 | 86.4 | 0.035 | 86.4 | 0.033 | 85.0 | 84.8
ShuffleNet_v2_x1_0 | 91.7 | 0.047 | 91.6 | 0.047 | 90.0 | 90.0
MobileNet_v2 | 90.6 | 0.06 | 90.5 | 0.061 | 87.5 | 87.6
MobileNet_v3_small | 88.2 | 0.041 | 88.3 | 0.04 | 90.0 | 90.0
MobileNet_v3_large | 88.8 | 0.039 | 88.6 | 0.04 | 80.0 | 79.6
ResNeXt50_32x4d | 94.1 | 0.026 | 94.1 | 0.026 | 80.0 | 79.5
ResNeXt101_32x8d | 91.1 | 0.033 | 91.1 | 0.034 | 82.5 | 82.6
WideResNet50_2 | 92.9 | 0.015 | 92.9 | 0.014 | 87.5 | 87.2
WideResNet101_2 | 89.3 | 0.025 | 89.3 | 0.025 | 82.5 | 82.3
Ours | 97.6 | 0.012 | 97.2 | 0.011 | 97.5 | 97.5
Table 6. Comparison of top five CNNs with our method on the augmented test data.
Method | Accuracy (%) | Precision (%) | Recall (%) | F1 (%)
VGG13 | 91.0 | 91.8 | 91.0 | 91.0
SqueezeNet1_1 | 87.6 | 88.3 | 87.6 | 87.6
ResNet34 | 89.3 | 90.2 | 89.3 | 89.0
ShuffleNet_V2_x1_0 | 92.1 | 92.9 | 92.1 | 92.1
MobileNet_V3_small | 91.4 | 91.9 | 91.4 | 91.3
Ours | 95.8 | 96.2 | 95.8 | 95.7
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
