Sensors
  • Article
  • Open Access

31 October 2024

Deep Learning Approaches for the Assessment of Germinal Matrix Hemorrhage Using Neonatal Head Ultrasound

1 Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, Dammam 31451, Saudi Arabia
2 College of Medicine, Imam Abdulrahman Bin Faisal University, Dammam 31441, Saudi Arabia
3 Department of Radiology, King Fahad University Hospital, Khobar 34445, Saudi Arabia
* Author to whom correspondence should be addressed.
This article belongs to the Section Biomedical Sensors

Abstract

Germinal matrix hemorrhage (GMH) is a critical condition affecting premature infants, commonly diagnosed through cranial ultrasound imaging. This study presents an advanced deep learning approach for automated GMH grading using the YOLOv8 model. By analyzing a dataset of 586 infants, we classified ultrasound images into five distinct categories: Normal, Grade 1, Grade 2, Grade 3, and Grade 4. Utilizing transfer learning and data augmentation techniques, the YOLOv8 model achieved exceptional performance, with a mean average precision (mAP50) of 0.979 and a mAP50-95 of 0.724. These results indicate that the YOLOv8 model can significantly enhance the accuracy and efficiency of GMH diagnosis, providing a valuable tool to support radiologists in clinical settings.

1. Introduction

Germinal matrix hemorrhage (GMH) is a significant cause of morbidity in premature neonates, especially those with very low birth weight or gestational age less than 32 weeks [1,2]. This condition, characterized by bleeding within the fragile and vascularized germinal matrix, can progress to intraventricular hemorrhage (IVH), leading to severe neurological impairments such as cerebral palsy and cognitive disabilities [3,4]. Therefore, early detection and accurate grading of GMH are crucial for effective clinical management and better neurodevelopmental outcomes [5].
The conventional method of diagnosing GMH involves cranial ultrasound imaging, which relies on the subjective interpretation of experienced radiologists. This method, however, is prone to variability and can be time-consuming. Advances in deep learning have opened promising avenues for automating and enhancing diagnostic accuracy in medical imaging, particularly in conditions like GMH [6,7]. By integrating artificial intelligence with imaging diagnostics, it is possible to reduce variability and expedite diagnosis. In this study, we aim to develop and evaluate deep learning models for the automatic detection and classification of GMH in cranial ultrasound images.
We utilized a dataset from King Fahad Hospital that includes 586 neonates who underwent cranial ultrasound exams. Given the limited availability of public datasets on neonatal ultrasounds, privacy concerns, and the unique challenges of acquiring data from multiple hospitals, we were restricted to a single-source dataset. Although this dataset size is relatively small, it offers high-quality images and detailed labels, which are essential for accurate model training. Additionally, GMH is a rare condition, making this dataset uniquely valuable for research and clinical applications. To overcome the limitations of dataset size, we applied augmentation techniques and evaluated the performance of several deep learning models, including ResNet-18, ResNet-50, ResNet-152, and YOLOv8, for GMH detection and classification.

3. Materials and Methods

3.1. System Framework

As shown in Figure 1, the system requires the user to input the necessary data, which include labeled cranial ultrasound images and maternal/neonatal risk factors. The process begins with preprocessing the ultrasound images through steps such as filtering, denoising, contrast enhancement, and region of interest (ROI) selection.
Figure 1. Classification training model.
The preprocessed images then enter the ResNet-18 model, where the image features are fused with the text input (maternal and neonatal risk factors) at a late fusion layer. Combining visual and textual data in this way is crucial for generating accurate predictions. During the training phase, 70% of the dataset is used to train the model: the preprocessed images and text inputs pass through the ResNet-18 model, and the results are saved in the trained classifier. For the testing phase, the remaining 30% of the dataset undergoes the same preprocessing steps and passes through the trained classifier. The system compares the testing results against the training data to calculate the model's accuracy. Ultimately, this process supports the detection, grading, and standardization of diagnoses for GMH.
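To make the late-fusion design concrete, the following is a minimal PyTorch sketch of such an architecture. This is not the authors' exact implementation; the number of risk-factor features and the fusion head sizes are illustrative assumptions.

```python
# Minimal sketch of a late-fusion ResNet-18 classifier (assumed dimensions).
import torch
import torch.nn as nn
from torchvision import models

class LateFusionGMHClassifier(nn.Module):
    def __init__(self, num_risk_factors: int = 16, num_classes: int = 5):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        backbone.fc = nn.Identity()          # keep the 512-d image embedding
        self.backbone = backbone
        self.classifier = nn.Sequential(     # late fusion: image + tabular data
            nn.Linear(512 + num_risk_factors, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),     # Normal + Grades 1-4
        )

    def forward(self, image: torch.Tensor, risk_factors: torch.Tensor):
        img_feat = self.backbone(image)                      # (B, 512)
        fused = torch.cat([img_feat, risk_factors], dim=1)   # (B, 512 + F)
        return self.classifier(fused)

# Example: a batch of 4 ultrasound images with 16 risk factors each.
model = LateFusionGMHClassifier()
logits = model(torch.randn(4, 3, 224, 224), torch.randn(4, 16))
```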

3.2. Study Population

The study population comprised neonates who underwent cranial ultrasound (US) at King Fahad Hospital of the University (KFUH) in Khobar, Kingdom of Saudi Arabia. Neonatal cerebral ultrasound images were collected for this study. The inclusion criteria encompassed neonates who had undergone cranial US at KFUH, with complete sets of left sagittal, right sagittal, and coronal cranial ultrasound images, and those with confirmed diagnoses of germinal matrix hemorrhage (GMH), as shown in Figure 2. Exclusion criteria included neonates without complete sets of the required cranial US images and patients with incomplete or missing medical records. The study initially considered a total of 582 neonates who underwent cranial US, among whom approximately 40 were identified to have GMH.
Figure 2. Sample of the dataset ((left): coronal US, (right): left US).

3.3. Data Collection

We reviewed a hospital database of 586 neonates who underwent cranial ultrasound examinations over five years, focusing on germinal matrix hemorrhage (GMH). Due to ethical considerations and the rarity of this condition, data availability was limited. Our inclusion criteria prioritized high-quality images from advanced ultrasound devices, excluding those from portable devices due to noise concerns. Although this decision reduced the dataset size, it ensured data reliability and consistency, crucial for training accurate deep learning models.
Despite the smaller dataset, our model benefits from the detailed and high-quality labels. Each neonate’s brain regions (sagittal left, sagittal right, and coronal views) were comprehensively covered, and bilateral GMH cases were classified separately to capture the full spectrum of severity. This focus on a rare and critical condition adds significant value to the dataset.
While larger datasets are generally preferred for deep learning, we addressed the limitations of our smaller dataset through advanced augmentation techniques, which artificially increased the diversity of the training data. This approach allows our model to generalize effectively across the different GMH grades, even for rare cases like Grade 4 hemorrhages. Additionally, our dataset’s focus on rare and severe conditions highlights its clinical significance, as early detection and accurate classification of GMH can have profound implications for patient care and treatment decisions.
In future iterations, we aim to explore advanced noise reduction and image enhancement techniques to incorporate images from portable devices and further expand the dataset, balancing data quantity with quality. However, the dataset we have curated remains a significant contribution to GMH research, given the rarity of such high-quality, labeled medical data in neonatal care.
The dataset was meticulously organized into patient-specific folders, each labeled with a unique patient ID and containing the corresponding ultrasound images. This systematic approach to data management facilitated efficient retrieval and analysis of the images.
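As an illustration of this organization, a small Python sketch for indexing such patient-specific folders might look as follows; the root directory name and image extension are hypothetical.

```python
# Sketch of indexing a patient-ID folder layout (names are placeholders).
from pathlib import Path

DATASET_ROOT = Path("gmh_dataset")  # hypothetical root: one folder per patient ID

def load_patient_index(root: Path) -> dict[str, list[Path]]:
    """Map each patient ID to its ultrasound image files."""
    index = {}
    for patient_dir in sorted(root.iterdir()):
        if patient_dir.is_dir():
            index[patient_dir.name] = sorted(patient_dir.glob("*.png"))
    return index

index = load_patient_index(DATASET_ROOT)
print(f"{len(index)} patients indexed")
```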

3.4. Dataset Augmentation and Preprocessing

To optimize the performance of the YOLOv8 model for detecting germinal matrix hemorrhages (GMHs), we implemented several preprocessing and augmentation techniques aimed at enhancing data quality and increasing the dataset's diversity. Given the relatively small size and imbalance of our dataset, these steps were crucial for improving model generalization and mitigating the risk of overfitting.
Initially, the dataset was highly imbalanced, as shown in Figure 3. G1 (Grade 1) contained the largest number of images at 60, followed by G2 (Grade 2) with 57, and the Normal class with 47 images. G3 (Grade 3) held 41 images, while G4 (Grade 4) was significantly underrepresented, with only 7 images. This imbalance posed challenges for training a deep learning model, as underrepresented classes like G4 could lead to poor model performance in those categories.
Figure 3. Dataset count by grade before preprocessing chart.
To mitigate this issue, we utilized a range of data augmentation techniques. These included horizontal and vertical translations (±10%), zooming in and out (±5%), rotations (±10°), as shown in Figure 5, and random horizontal flips. These augmentations simulated variations that could occur during real-world image acquisition, such as shifts in scan angle or changes in focus, making the model more robust in recognizing hemorrhagic regions under diverse conditions.
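A minimal sketch of this augmentation pipeline using the Albumentations library (which the training setup in Section 3.6 also uses) is shown below; the application probabilities are assumptions, but the ranges match those stated above.

```python
# Sketch of the stated augmentations with Albumentations (probabilities assumed).
import albumentations as A
import cv2

augment = A.Compose([
    A.ShiftScaleRotate(
        shift_limit=0.10,    # horizontal/vertical translation +/-10%
        scale_limit=0.05,    # zoom in/out +/-5%
        rotate_limit=10,     # rotation +/-10 degrees
        border_mode=cv2.BORDER_CONSTANT,
        p=0.9,
    ),
    A.HorizontalFlip(p=0.5),
])

image = cv2.imread("ultrasound.png")   # hypothetical input image
augmented = augment(image=image)["image"]
```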
Figure 4. Dataset count by grade after preprocessing chart.
Augmentation was thus performed to artificially expand the number of images in the underrepresented classes. After augmentation, the dataset saw a significant increase in images for each class, as illustrated in Figure 4: G2 now contains 513 images, G1 has 495, and the Normal class includes 235 images. G3 has increased to 342 images, while G4 has risen from 7 to 90 images.
While these augmentation techniques helped address the imbalance, we recognize that augmentation alone may not completely resolve the challenge posed by such limited examples in classes like G4. In future work, we aim to explore more advanced methods, such as class reweighting, synthetic data generation, or transfer learning from larger, similar datasets, to further enhance the model’s ability to generalize across all classes.
For preprocessing, we applied cropping to automatically center and focus on the region of interest (ROI) in each image, as shown in Figure 6. This step removed irrelevant background and emphasized the critical areas where hemorrhages were most likely to appear. We then resized all images to a uniform 255 × 255 pixels to ensure consistency across the dataset, allowing the model to process the images efficiently while retaining sufficient resolution to identify fine details in the ultrasound scans.
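The cropping and resizing steps can be sketched with OpenCV as follows; the ROI bounds here are placeholders, since the actual ROI selection depends on the scan geometry.

```python
# Sketch of ROI cropping and resizing (crop coordinates are placeholders).
import cv2

image = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)
h, w = image.shape
y0, y1 = int(0.1 * h), int(0.9 * h)   # hypothetical ROI bounds
x0, x1 = int(0.1 * w), int(0.9 * w)
roi = image[y0:y1, x0:x1]             # drop irrelevant background
roi = cv2.resize(roi, (255, 255))     # uniform 255 x 255 input size
cv2.imwrite("scan_roi.png", roi)
```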
Figure 5. Rotation by 10°.
Accurate annotation was performed using Roboflow, with bounding boxes manually drawn around hemorrhagic areas under the guidance and supervision of KFUH doctors. A different color was assigned to each hemorrhage grade (g1, g2, g3, g4) and to the Normal class, ensuring the model could effectively distinguish between them. This meticulous annotation process allowed the YOLOv8 model to focus on the most relevant areas, significantly enhancing its performance. For instance, in the annotated images (Figure 14), an orange box marks a Grade 3 hemorrhage, a red box Grade 4, a purple box Grade 1, a green box Grade 2, and a yellow-green box the Normal class. Together, these preprocessing and augmentation techniques played a critical role in enhancing the quality and diversity of the dataset, which in turn improved the model's performance in detecting and classifying GMH across the various grades.
Figure 6. Cropped ROI.
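For training, annotations of this kind are typically exported from Roboflow in the YOLO text format, with one label file per image. A hypothetical label file might look as follows, assuming the class index order g1, g2, g3, g4, Normal:

```
# grade3_example.txt -- one line per bounding box:
# <class_id> <x_center> <y_center> <width> <height>, all normalized to [0, 1]
2 0.512 0.430 0.180 0.140
```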

3.5. Conducted Experiments

In this study, we conducted three experiments using the ResNet-18, ResNet-50, and ResNet-152 models. Comparing these experiments revealed substantial differences in performance and suitability for GMH detection and classification.
First Experiment using ResNet-18: The first experiment utilized the Residual Network with 18 deep layers (ResNet-18), a well-known model widely used for classification and object detection. We loaded the pre-trained ResNet-18 model from the TorchVision library; it consisted of 20 convolutional layers with filter sizes ranging from 64 to 512, one max pooling layer, and three downsampling layers. We trained the model on the preprocessed dataset, randomly splitting the data into 80% for training and 20% for testing. The results from this model, shown in Table 1, were promising in initial tests. However, when evaluated on the entire dataset, it failed to maintain consistent performance.
Table 1. ResNet-18 accuracy on each side.
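A minimal sketch of this transfer-learning setup (pretrained TorchVision ResNet-18, final layer replaced for the five GMH classes, random 80/20 split) is given below; the dataset path, batch size, and learning rate are assumptions.

```python
# Sketch of ResNet-18 fine-tuning for five GMH classes (hyperparameters assumed).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, models, transforms

tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
dataset = datasets.ImageFolder("gmh_images", transform=tfm)  # hypothetical path
n_train = int(0.8 * len(dataset))
train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 5)  # Normal + Grades 1-4
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One training pass over the data (epoch count elided for brevity).
for images, labels in DataLoader(train_set, batch_size=16, shuffle=True):
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```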
Second Experiment using ResNet-50: The second experiment was conducted using the Residual Network with 50 deep layers (ResNet-50). We loaded the pre-trained ResNet-50 model from the TensorFlow library, which consists of 53 convolutional layers with filter sizes ranging from 64 to 2048, along with one max pooling layer. In this experiment, we divided the dataset into three segments based on the imaging sides: left, right, and coronal. Each subset comprised five classes: Grade 1, Grade 2, Grade 3, Grade 4, and Normal. Despite careful segmentation, the model struggled significantly with imbalanced data, particularly for underrepresented classes such as Grade 4, yielding poor results, as demonstrated in Table 2.
Table 2. ResNet-50 evaluation on each side.
The following confusion matrices in Figure 7, Figure 8 and Figure 9 illustrate the classification performance of each model, providing insights into false positives and false negatives that may have significant clinical implications, as follows:
Figure 7. Confusion matrix for the right side for ResNet-50.
Figure 8. Confusion matrix for the left side for ResNet-50.
Figure 9. Confusion matrix for the coronal side for ResNet-50.
Third Experiment using ResNet-152: The third experiment was conducted using the Residual Network with 152 deep layers (ResNet-152), designed primarily for recognition and classification tasks. We loaded the pre-trained ResNet-152 model from the TensorFlow libraries. This model comprises 151 convolutional layers (including the initial layers) with filter sizes ranging from 64 to 2048, one max pooling layer, and two dropout layers (the first with a rate of 0.5 after the dense layer with 1024 units, and the second with a rate of 0.3 after the second dense layer with 512 units). Following the same approach as the second experiment, we trained this model on the provided dataset, randomly split into 80% for training and 20% for testing. Although this model showed improved performance on coronal images, it performed inadequately on the left and right sides, as illustrated in Table 3.
Table 3. ResNet-152 evaluation on each side.
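The classification head described above can be sketched in TensorFlow/Keras as follows; the input resolution and optimizer settings are assumptions, while the dense and dropout layers follow the description.

```python
# Sketch of the described ResNet-152 head: Dense(1024) + Dropout(0.5),
# Dense(512) + Dropout(0.3), and a 5-way softmax (input size assumed).
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.ResNet152(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(1024, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(5, activation="softmax"),  # Grades 1-4 + Normal
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```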
The following confusion matrices in Figure 10, Figure 11 and Figure 12 illustrate the classification performance of each model, providing insights into false positives and false negatives that may have significant clinical implications, as follows:
Figure 10. Confusion matrix for the right side for ResNet-152.
Figure 11. Confusion matrix for the left side for ResNet-152.
Figure 12. Confusion matrix for the coronal side for ResNet-152.
Table 4 summarizes the accuracy results for each model (ResNet-18, ResNet-50, and ResNet-152) across the three imaging segments: left, right, and coronal views. These results highlight how model performance differed depending on the side from which the cranial ultrasound images were taken. As shown, ResNet-18 consistently outperformed the other models, particularly in the left view, while ResNet-50 and ResNet-152 faced significant challenges, especially in the right view.
Table 4. Model accuracy on each side.

3.6. Model Training

Since the previous experiments showed no meaningful improvement, we changed both the model and our approach, this time adopting the YOLOv8 (You Only Look Once) model through the ultralytics library [31]. Training this model involves several primary steps to achieve optimal performance. As shown in Figure 13, the data are organized into five classes: Grade 1, Grade 2, Grade 3, Grade 4, and Normal. Benefiting from transfer learning, pretrained weights were imported to adapt the model specifically to this task.
Figure 13. YOLOv8 training with Albumentations library chart.
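With the ultralytics library, the five classes are declared in a dataset configuration file. A hypothetical data.yaml for this setup might look as follows; the paths and split layout are placeholders consistent with the 70-20-10 division described later in this section.

```yaml
# Hypothetical data.yaml for the five-class GMH dataset (paths assumed).
path: gmh_dataset
train: images/train
val: images/val
test: images/test
names:
  0: g1
  1: g2
  2: g3
  3: g4
  4: Normal
```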
The YOLOv8 model relies heavily on annotation, which establishes the foundation for successfully training the algorithm to identify and categorize objects within images. Third-party applications such as Roboflow made it possible to efficiently label the classes g1, g2, g3, g4, and Normal. This improves the model's ability to detect various abnormalities in medical images, especially germinal matrix hemorrhages. Through careful annotation, the model receives the labeled data it needs to identify and extrapolate patterns, enabling accurate detection and categorization of hemorrhagic regions. By drawing boxes around the hemorrhagic white areas, the annotation process ensures the model concentrates on the important information, which improves inference accuracy. This thorough annotation approach therefore significantly improves the YOLOv8 model's performance. As shown in Figure 14, a purple box marks Grade 1, a green box Grade 2, an orange box Grade 3, red Grade 4, and yellow-green the Normal class.
Figure 14. Annotated images.
Figure 15 shows an analysis of the bounding box annotations for GMH classification, with the following observations:
Figure 15. An analysis of the bounding box annotations.
  • Grade 2 has the most instances.
  • Grade 4 has the fewest instances.
  • The annotations are concentrated in the central area of the images, indicating a specific region of interest.
  • There is a positive correlation between bounding box width and height.
The YOLOv8 model architecture, which consists of 225 layers and about 11.1 million parameters, was adjusted to recognize these five classes. The model was trained for 100 epochs with eight data loader workers, while TensorBoard logged and visualized the training process in real time and tracked the evaluation metrics. The training environment was set up to manage the computational requirements of YOLOv8: Google Colab Pro provided access to a T4 GPU and a high-RAM setting, which optimized efficiency. The necessary Python 3.10 tools and libraries, such as ultralytics, roboflow, opencv-python, torch, torchvision, and pytorch-lightning, were imported to manage the dataset, build the model, and handle visualization. The pipeline also includes augmentations from the Albumentations library to enhance the model's resilience.

In addition, the training process splits the dataset into training, validation, and testing subsets in a 70-20-10 ratio. A pre-processing step ensures that all images are resized to a unified 640 × 640 pixels before being input to the model, and an adaptive optimizer automatically selects the optimal learning rate and momentum. During training, evaluation metrics such as the F1-Confidence, Precision-Confidence, and Precision-Recall curves are monitored to assess and analyze the model's performance. A thorough analysis of these metrics assists in fine-tuning the model and handling challenges such as distinguishing between similar grades (e.g., Grade 2 and Grade 3), helping to guarantee a strong and resilient classification model for GMH.
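A condensed sketch of this training run using the ultralytics API is shown below. The YOLOv8s variant is an assumption consistent with the reported 225 layers and roughly 11.1 million parameters, and the file paths are placeholders.

```python
# Sketch of the described training run with ultralytics (variant/paths assumed).
from ultralytics import YOLO

model = YOLO("yolov8s.pt")           # pretrained weights for transfer learning
results = model.train(
    data="data.yaml",                # five classes: g1-g4 + Normal
    epochs=100,
    imgsz=640,                       # unified 640 x 640 input size
    workers=8,                       # eight dataloader workers
    optimizer="auto",                # let ultralytics pick LR and momentum
)
metrics = model.val()                # mAP50, mAP50-95, precision/recall curves
```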

3.7. Model Performance

The performance of the finished model was evaluated on the test data. Confusion matrices were constructed using the scikit-learn library, providing insight into the behavior of the YOLOv8 model. The YOLO loss function is composite, designed to train the model on several tasks simultaneously: accurately predicting bounding box parameters, objectness confidence, and class probabilities. The loss function consists of several components, as follows:
$$
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (w_i - \hat{w}_i)^2 + (h_i - \hat{h}_i)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} (C_i - \hat{C}_i)^2 + \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} (C_i - \hat{C}_i)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
$$
Coordinate Loss: penalizes deviations of the predicted bounding box from the ground truth (box center coordinates $b_x$, $b_y$, width $b_w$, and height $b_h$). It includes terms for the differences in the box's center, width, and height, weighted by $\lambda_{\text{coord}}$ to emphasize the importance of accurate localization.
Confidence Loss: penalizes errors in the predicted objectness score $C_i$ for boxes that contain an object, with the corresponding term for empty cells down-weighted by $\lambda_{\text{noobj}}$ so that the abundant background regions do not dominate training.
Class Probability Loss: penalizes the difference between the predicted and true class probabilities, ensuring the predicted class matches the true class.
Each bounding box prediction consists of five elements: $b_x$, $b_y$, $b_w$, $b_h$, and $p_c$ (the confidence score).
The YOLO algorithm is a cornerstone in real-time object detection due to its efficiency and accuracy, allowing it to run well even on limited hardware [32].
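As a brief illustration of the evaluation step, the confusion matrices mentioned above can be built with scikit-learn from per-image true and predicted class labels; the labels below are placeholders, and the extraction of per-image labels from the detector output is simplified.

```python
# Sketch of confusion matrix construction with scikit-learn (labels are placeholders).
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

classes = ["g1", "g2", "g3", "g4", "Normal"]
y_true = [0, 1, 2, 3, 4, 1, 2]       # placeholder ground-truth labels
y_pred = [0, 1, 1, 3, 4, 1, 2]       # placeholder model predictions
cm = confusion_matrix(y_true, y_pred, labels=range(len(classes)))
ConfusionMatrixDisplay(cm, display_labels=classes).plot()
```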

4. Results and Visualization

The YOLOv8 model delivered excellent results in classifying the images into five categories. Throughout the validation process, it accurately differentiated between the given classes, achieving high precision and recall scores across the board. The validation metrics shown in Table 5 were remarkable: the model maintained a mean average precision (mAP50) of 0.979, reflecting its efficiency in accurately recognizing and detecting objects in the input images, while the mAP50-95, which averages precision over a range of intersection-over-union (IoU) thresholds, reached 0.724, highlighting the model's robust performance. Furthermore, a comprehensive analysis of the Precision-Confidence and Recall-Confidence curves shows that the YOLOv8 model performed consistently even at varying confidence thresholds, ensuring reliable and resilient predictions.

The model's efficient handling of data augmentation and pre-processing also played an essential role in its success. By resizing the input images to a unified 640 × 640 pixels and applying multiple augmentation methods, the model generalized better and became more robust to dataset variability, an essential step given the imbalanced nature of the dataset, in which some classes are underrepresented. Additionally, the confusion matrices constructed during the validation phase present a detailed view of the model's performance, emphasizing its strength in detecting the Normal and Grade 4 classes with almost perfect precision and recall. Nevertheless, the confusion matrices reveal a small challenge in distinguishing between Grade 2 and Grade 3, whose bleeding locations are similar, which highlights the inherent complexity of medical image classification. In general, the YOLOv8 model demonstrated its potential as an effective and reliable tool for GMH detection and classification, capable of assisting radiologists in making an accurate diagnosis.
Table 5. Validation metrics of YOLOv8 in our experiment.
Figure 16 demonstrates the validation confusion matrices for YOLO version 8. The confusion matrices illustrate the classification performance of each model, providing insights into false positives and false negatives that may have significant clinical implications. Figure 17 demonstrates the validation confusion matrices for YOLO version 8 after dataset normalization. Figure 18, Figure 19, Figure 20 and Figure 21 demonstrate the validation curve.
Figure 16. YOLOv8 validation confusion matrix.
Figure 17. YOLOv8 validation confusion matrix normalized.
Figure 18. The validation F1-score and confidence curve.
Figure 19. The validation precision and confidence curve.
Figure 20. The validation recall and confidence curve.
Figure 21. The validation precision and recall curve.
For a comparative analysis of GMH detection capabilities, Table 6 summarizes the performance metrics of YOLOv8 against ResNet-18, ResNet-50, and ResNet-152, providing a comprehensive overview of each model’s precision, recall, and mean average precision. This comparison underscores YOLOv8’s superior mAP50 and mAP50-95 values, illustrating its efficacy in detecting and classifying GMH cases.
Table 6. Comparative performance of YOLOv8, ResNet-18, ResNet-50, and ResNet-152 on GMH Detection.

5. Model Evaluation

The addition of class weights and k-fold cross-validation resulted in significant improvements in model recall, as shown in Table 7, particularly for underrepresented classes. Notably, the recall for Grade 1 and Grade 2 increased substantially, indicating the model’s enhanced ability to consistently detect positive instances from these classes. However, this improvement in recall came with a slight trade-off in precision, which impacted the mean average precision (mAP), particularly under the more rigorous mAP50-95 metric. This trade-off suggests that while the model has become more adept at handling imbalanced data, its precision in distinguishing between similar or overlapping classes has slightly diminished. Despite this, the inclusion of these techniques enhanced the model’s robustness and generalizability across various subsets of data, making it more reliable in real-world scenarios.
Table 7. Comparative performance of YOLOv8 original results and after adding weights and K-fold.
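A minimal sketch of these two additions is shown below; the fold count and the inverse-frequency ("balanced") weighting scheme are assumptions, and the per-fold training itself is elided.

```python
# Sketch of class weighting and k-fold cross-validation (fold count assumed).
import numpy as np
from sklearn.model_selection import KFold
from sklearn.utils.class_weight import compute_class_weight

labels = np.array([0, 0, 1, 1, 1, 2, 3, 4, 4])  # placeholder per-image class labels

# Inverse-frequency weights emphasize underrepresented grades such as Grade 4.
weights = compute_class_weight("balanced", classes=np.unique(labels), y=labels)

kf = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(kf.split(labels)):
    # Train one model per fold on labels[train_idx]; validate on labels[val_idx].
    pass
```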
To comprehensively evaluate the model’s performance, several key metrics were analyzed: the F1–Confidence curve, Precision–Recall (PR) curve, and Recall–Confidence curve.
The F1–Confidence curve (Figure 22) demonstrates how the F1-score varies with the confidence threshold across different classes. The model achieves an optimal F1-score of 0.89 at a confidence threshold of 0.405 for the combined class output, indicating strong performance across multiple categories. However, classes like Grade 3 exhibit a steeper decline in F1-score as the confidence threshold increases, highlighting the model’s sensitivity to confidence variations for certain classes.
Figure 22. The validation F1-score and confidence curve after adding weights and K-folds.
The Precision–Recall (PR) curve (Figure 23) provides further insights into the balance between precision and recall. At an IoU threshold of 0.5, the model achieves an overall mean average precision (mAP) of 0.954. While classes such as Grade 1 and Grade 4 maintain high precision, the slight decline in precision, particularly for Grade 2, aligns with the trade-off between recall and precision. Nonetheless, the PR curve demonstrates the model’s ability to maintain high precision, even when balancing recall, making it especially effective in detecting classes with fewer instances.
Figure 23. The validation precision and recall curve after adding weights and K-fold.
The Recall–Confidence curve (Figure 24) highlights how recall varies across confidence levels. For the combined class output, the model achieves a perfect recall of 1.0 at a confidence threshold of 0, though recall values for certain classes, such as Grade 3, exhibit sharper declines as the confidence threshold increases. This behavior illustrates the model’s strength in maintaining high recall values, particularly for underrepresented classes, despite some loss at higher confidence thresholds.
Figure 24. The validation recall and confidence curve after adding weights and K-fold.
Finally, the Precision–Confidence curve (Figure 25) further emphasizes the trade-off between precision and confidence. The model maintains high precision at lower confidence thresholds, but precision for classes like Grade 3 and Grade 4 drops off as the confidence level rises. This pattern reflects the overall behavior observed throughout the evaluation, where a high precision at lower thresholds is balanced by a reduction in performance for certain classes at higher confidence levels.
Figure 25. The validation precision and confidence curve after adding weights and K-fold.
In summary, the evaluation metrics confirm that while the model exhibits some trade-offs between precision and recall, it remains highly robust and generalizable. The addition of class weights and k-fold cross-validation has improved its performance, particularly for imbalanced classes, with overall strong precision, recall, and F1-scores across varying confidence thresholds.

6. Discussion

Our study demonstrates the efficacy of the YOLOv8 model in the automated grading of GMH from cranial ultrasound images, achieving high precision and recall rates, particularly for the Normal and Grade 4 categories. The initial experiments using ResNet-18, ResNet-50, and ResNet-152 models provided critical insights into model performance and limitations.

As shown in Figure 26, ResNet-18 showed promising results initially but failed to maintain accuracy across the entire dataset, particularly in handling complex hemorrhages like Grade 4 (mAP50 of 0.815). ResNet-50 struggled significantly with imbalanced data, particularly underrepresented classes like Grade 4, resulting in unsatisfactory outcomes (mAP50-95 of 0.670). ResNet-152 improved performance for coronal images but still performed inadequately for left- and right-side images (mAP50-95 of 0.690), as illustrated in Figure 27. These challenges led us to adopt the YOLOv8 model, which, through effective data augmentation and pre-processing, handled image variability better and improved performance on underrepresented classes (mAP50 of 0.979).

However, the model faced challenges in distinguishing between Grade 2 and Grade 3 hemorrhages due to their similar bleeding patterns, highlighting the complexity of medical image classification. Despite this, YOLOv8's strong performance indicates its potential as a reliable tool for radiologists, enhancing diagnostic accuracy and efficiency. Ensemble methods combining the strengths of YOLOv8 with the ResNet models were considered as a potential alternative to improve classification performance; however, they did not yield significant improvements over YOLOv8's single-model performance. Future research should focus on refining feature extraction and classification strategies to address closely related classes and explore integrating additional clinical data to further improve diagnostic capabilities.
Figure 26. YOLOv8 and ResNet model accuracy chart.
Figure 27. Model accuracy for each side chart.

6.1. Challenges and Limitations

Throughout this study, we encountered several significant challenges, primarily related to data availability and quality. The most prominent challenge was the limited size of the dataset, which impacted the model’s performance. When trained on this small dataset, the models tended to overfit, showing strong results on the training data but failing to generalize effectively to new, unseen cases. This limitation also affected the models’ accuracy, as they struggled to learn meaningful patterns from the limited data available.
To mitigate these issues, we implemented data augmentation techniques that expanded the size and diversity of the dataset. This approach improved the model’s ability to generalize by simulating variations in the data, thus reducing overfitting and enhancing overall performance.
Another key challenge was the quality of the initial images. The cranial ultrasound scans we received from the hospital were often unclear or of low resolution, making it difficult for models such as ResNet-18, ResNet-50, and ResNet-152 to accurately detect and classify hemorrhage locations. To address this, we employed a more rigorous data annotation process, ensuring that the images were precisely labeled. This improvement in data quality allowed the YOLOv8 model to perform significantly better, leading to more accurate and reliable detection and classification of hemorrhage sites.
Despite these challenges, the steps taken to address data limitations and improve image quality have led to a more robust model. However, we acknowledge that the limited dataset and lack of external validation remain constraints, which we aim to address in future work through collaboration with other institutions and exploring advanced image enhancement techniques.

6.2. Generalizability and External Validation

We acknowledge the absence of external validation in this study, as the model was trained and tested exclusively on data from King Fahad Hospital. While we recognize the importance of external datasets for assessing generalizability, the availability of neonatal ultrasound data is extremely limited. Due to privacy regulations and ethical considerations, we were unable to acquire data from other hospitals. Furthermore, there are no publicly available datasets of neonatal cranial ultrasound images, particularly for conditions as rare as germinal matrix hemorrhage (GMH).
Despite this limitation, our dataset remains valuable due to its high-quality imaging and detailed labeling of a rare condition. Each image was rigorously selected and annotated, ensuring that the model was trained on data of exceptional quality. Additionally, the use of advanced data augmentation techniques enhances the model’s ability to generalize by simulating variations that could occur in different clinical settings.
While external validation is not feasible at this stage, we believe that the model provides a strong foundation for GMH detection and classification in neonates. In future work, we aim to collaborate with other institutions to explore the possibility of data sharing, adhering to privacy and ethical standards, and eventually validating the model on diverse datasets.
Given the rarity of neonatal cranial ultrasound data and the constraints of patient privacy, this study represents a significant step forward in GMH research, with the potential to inform future work on this critical condition.

7. Conclusions

In conclusion, this study demonstrates the effectiveness of the YOLOv8 model in the automated detection and classification of germinal matrix hemorrhage (GMH) in premature neonates using cranial ultrasound imaging. Employing advanced deep learning techniques, including transfer learning and data augmentation, the model analyzed images from a dataset of 586 patients, categorizing them into five distinct grades. The YOLOv8 model achieved a mean average precision (mAP50) of 0.979 and an mAP50-95 of 0.724, highlighting its capability to distinguish between the various GMH grades with high accuracy. Despite challenges in differentiating between closely similar grades, the model's overall performance supports its potential as a substantial aid in clinical settings, enhancing the accuracy and efficiency of GMH diagnosis. This could significantly impact patient management by facilitating early intervention and potentially improving neurodevelopmental outcomes in this vulnerable population.

Author Contributions

Conceptualization, N.M.I.; methodology, N.M.I., H.A. (Hadeel Alanize), L.A. and L.J.A.; software, H.A. (Hadeel Alanize), L.A., L.J.A., R.A. and W.A.; validation, N.M.I., A.A. and H.A. (Hanadi Althani); formal analysis, N.M.I., H.A. (Hadeel Alanize), L.J.A. and L.A.; investigation, H.A. (Hadeel Alanize), L.A., L.J.A., R.A. and W.A.; data curation, H.A. (Hadeel Alanize) and H.A. (Hanadi Althani); writing—original draft, H.A. (Hadeel Alanize), L.A., L.J.A., W.A. and R.A.; writing—review and editing, N.M.I. and H.A. (Haila Alabssi); manuscript revision, N.M.I. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board of Imam Abdulrahman bin Faisal University (IRB: 2023-09-393, NCBE Registration No.: (HAP-05-D-003) 19 October 2023).

Data Availability Statement

Data were obtained from King Fahd Hospital of Imam Abdulrahman bin Faisal University and are available from the authors with the permission of King Fahd Hospital of the University.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. You, S.K. Neuroimaging of Germinal Matrix and Intraventricular Hemorrhage in Premature Infants. J. Korean Neurosurg. Soc. 2023, 66, 239–246. [Google Scholar] [CrossRef] [PubMed]
  2. Kim, K.Y.; Nowrangi, R.; McGehee, A.; Joshi, N.; Acharya, P.T. Assessment of germinal matrix hemorrhage on head ultrasound with deep learning algorithms. Pediatr. Radiol. 2022, 52, 533–538. [Google Scholar] [CrossRef] [PubMed]
  3. Parodi, A.; Govaert, P.; Horsch, S.; Bravo, M.C.; Ramenghi, L.A. Cranial ultrasound findings in preterm germinal matrix haemorrhage, sequelae and outcome. Pediatr. Res. 2020, 87, 13–24. [Google Scholar] [CrossRef] [PubMed]
  4. Zegarra, R.; Ghi, T. Use of artificial intelligence and deep learning in fetal ultra-sound imaging. Ultrasound Obstet. Gynecol. 2023, 62, 185–194. [Google Scholar] [CrossRef]
  5. Wang, L.; Lei, Y.; Liu, S.X.; Wang, T. Deep Learning in Medical Ultrasound Analysis: A Review. Engineering 2019, 5, 261–275. [Google Scholar] [CrossRef]
  6. Eason, G.; Noble, B.; Sneddon, I.N. On certain integrals of Lipschitz-Hankel type involving products of Bessel functions. Phil. Trans. R. Soc. 1955, A247, 529–551. [Google Scholar]
  7. Maxwell, J.C. A Treatise on Electricity and Magnetism, 3rd ed.; Clarendon: Oxford, UK, 1892; Volume 2, pp. 68–73. [Google Scholar]
  8. Qiao, S.; Pang, S.; Luo, G.; Pan, S.; Wang, X.; Wang, M.; Zhai, X.; Chen, T. Automatic Detection of Cardiac Chambers Using an Attention-based YOLOv4 Framework from Four-chamber View of Fetal Echocardiography. arXiv 2020, arXiv:2011.13096. [Google Scholar]
  9. Dadjouy, S.; Sajedi, H. Gallbladder Cancer Detection in Ultrasound Images based on YOLO and Faster R-CNN. In Proceedings of the 2024 10th International Conference on Artificial Intelligence and Robotics (QICAR), Qazvin, Iran, 29 February 2024; pp. 227–231. [Google Scholar] [CrossRef]
  10. Ertuğrul, Ö.F.; Akıl, M.F. Detecting hemorrhage types and bounding box of hemorrhage by deep learning. Biomed. Signal Process. Control. 2022, 71 Pt A, 103085. [CrossRef]
  11. Dadjouy, S.; Sajedi, H. Artificial intelligence applications in the diagnosis of gallbladder neoplasms through ultrasound: A review. Biomed. Signal Process. Control. 2024, 93, 106149. [Google Scholar] [CrossRef]
  12. Lin, M.-F.; He, X.; Hao, C.; He, M.; Guo, H.; Zhang, L.; Xian, J.; Zheng, J.; Xu, Q.; Feng, J.; et al. Real-time artificial intelligence for detection of fetal intracranial malformations in ultrasonic images: A multicenter retrospective diagnostic study. Authorea 2020. [Google Scholar] [CrossRef]
  13. Selcuk, B.; Serif, T. Brain Tumor Detection and Localization with YOLOv8. In Proceedings of the 8th International Conference on Computer Science and Engineering (UBMK), Burdur, Turkiye, 13–15 September 2023; pp. 477–481. [Google Scholar] [CrossRef]
  14. Paul, S.; Ahad, M.T.; Hasan, M.M. Brain Cancer Segmentation Using YOLOv5 Deep Neural Network. J. Med. Imaging Health Inform. 2021, 11, 658–665. [Google Scholar]
  15. Wang, Y.; Yang, C.; Yang, Q.; Zhong, R.; Wang, K.; Shen, H. Diagnosis of cervical lymphoma using a YOLO-v7-based model with transfer learning. Sci. Rep. 2024, 14, 11073. [Google Scholar] [CrossRef] [PubMed]
  16. Kothala, L.P.; Jonnala, P.; Guntur, S.R. Localization of mixed intracranial hemorrhages by using a ghost convolution-based YOLO network. Biomed. Signal Process. Control 2023, 80, 104378. [Google Scholar] [CrossRef]
  17. Pham, T.-L.; Le, V.-H. Ovarian Tumors Detection and Classification from Ultrasound Images Based on YOLOv8. J. Adv. Inf. Technol. 2024, 15, 264–275. [Google Scholar] [CrossRef]
  18. Holland, L.; Torres, S.I.H.; Snider, E.J. Using AI Segmentation Models to Improve Foreign Body Detection and Triage from Ultrasound Images. Bioengineering 2024, 11, 128. [Google Scholar] [CrossRef] [PubMed]
  19. Widayani, A.; Putra, A.M.; Maghriebi, A.R.; Adi, D.Z.C.; Ridho, M.H.F. Review of Application YOLOv8 in Medical Imaging. Indones. Appl. Phys. Lett. 2024, 5, 23–33. [Google Scholar] [CrossRef]
  20. Inui, A.; Mifune, Y.; Nishimoto, H.; Mukohara, S.; Fukuda, S.; Kato, T.; Furukawa, T.; Tanaka, S.; Kusunose, M.; Takigami, S.; Kuroda, R. Detection of Elbow OCD in the Ultrasound Image by Artificial Intelligence Using YOLOv8. Appl. Sci. 2023, 13, 7623. [Google Scholar] [CrossRef]
  21. Passa, R.S.; Nurmaini, S.; Rini, D.P. YOLOv8 Based on Data Augmentation for MRI Brain Tumor Detection. Sci. J. Inform. 2023, 10, 363–370. [Google Scholar]
  22. Qureshi, M.; Ragab, M.G.S.; Abdulkader, J.; Muneer, A.; Alqushaib, A.; Sumiea, B.H.; Alhussian, H. A Review of YOLO-Based Object Detection in Medical Imaging: Applications and Advancements (2018–2023). Biomed. Signal Process. Control. 2024, 81. [Google Scholar]
  23. Yan, L.; Ling, S.; Mao, R.; Xi, H.; Wang, F. A deep learning framework for identifying and segmenting three vessels in fetal heart ultrasound images. Biomed. Eng. Online 2024, 23, 39. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
  24. Natali, T. Automatic real-time prostate detection in transabdominal ultrasound images. Med. Imaging Deep Learn. 2024. under review. [Google Scholar]
  25. Vidhya, V.; Raghavendra, U.; Gudigar, A.; Basak, S.; Mallappa, S.; Hegde, A.; Menon, G.R.; Barua, P.D.; Salvi, M.; Ciaccio, E.J.; et al. YOLOv5s-CAM: A Deep Learning Model for Automated Detection and Classification for Types of Intracranial Hematoma in CT Images. IEEE Access 2023, 11, 141309–141324. [Google Scholar] [CrossRef]
  26. Cortes-Ferre, L.; Ramos-Polla, A.; Sierra, C.-M.; Perez-Díaz, J. Intracranial Hemorrhage Detection in CT Scans Using EfficientDet and Grad-CAM. J. Imaging 2023, 9. [Google Scholar]
  27. Mansour, R.F.; Aljehane, N.O. An optimal segmentation with deep learning based inception network model for intracranial hemorrhage diagnosis. Neural Comput. Appl. 2021, 33, 13831–13843. [Google Scholar] [CrossRef]
  28. Passa, R.S.; Khan, M.A.; Hussain, T.; Akhtar, F. A Comprehensive Systematic Review of YOLO for Medical Object Detection (2018 to 2023). Authorea 2023, 10, 363–370. [Google Scholar]
  29. Burkitt, K.; Kang, O.; Jyoti, R.; Mohamed, A.L.; Chaudhari, T. Comparison of cranial ultrasound and MRI for detecting brain injury in extremely preterm infants and correlation with neurological outcomes at 1 and 3 years. Eur. J. Pediatr. 2019, 178, 1053–1061. [Google Scholar] [CrossRef]
  30. Ye, H.; Gao, F.; Yin, Y.; Guo, D.; Zhao, P.; Lu, Y.; Wang, X.; Bai, J.; Cao, K.; Song, Q.; et al. Precise diagnosis of intracranial hemorrhage and subtypes using a three-dimensional joint convolutional and recurrent neural network. Eur. Radiol. 2019, 29, 6191–6201. [Google Scholar] [CrossRef]
  31. Ultralytics. Ultralytics GitHub Repository. GitHub Repository. Available online: https://github.com/ultralytics/ultralytics (accessed on 28 October 2024).
  32. Available online: https://medium.com/adventures-with-deep-learning/yolo-v1-part3-78f22bd97de4 (accessed on 28 October 2024).
