Deep Learning Approach for Arm Fracture Detection Based on an Improved YOLOv8 Algorithm

Meza, Gerardo; Ganta, Deepak; Gonzalez Torres, Sergio

doi:10.3390/a17110471


by Gerardo Meza †, Deepak Ganta *,† and Sergio Gonzalez Torres

School of Engineering, Texas A&M International University, Laredo, TX 78041, USA

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Algorithms 2024, 17(11), 471; https://doi.org/10.3390/a17110471

Submission received: 17 September 2024 / Revised: 14 October 2024 / Accepted: 21 October 2024 / Published: 22 October 2024

(This article belongs to the Section Evolutionary Algorithms and Machine Learning)


Abstract

Artificial intelligence (AI)-assisted computer vision is an evolving field in medical imaging. However, existing AI models lose accuracy and precision on small, easy-to-miss objects such as bone fractures, which limits their applicability and effectiveness in clinical settings. The proposed integration of the Hybrid-Attention (HA) mechanism into the YOLOv8 architecture offers a robust way to improve accuracy, reliability, and speed in medical imaging applications. Experimental results demonstrate that our HA-modified YOLOv8 models achieve a 20% higher Mean Average Precision (mAP@50) and improved processing speed in arm fracture detection.

Keywords:

YOLOv8; deep learning; CNN; computer vision; bone fracture; object detection

1. Introduction

Bone fractures are a common emergency in hospital settings, especially among pediatric patients, who frequently sustain wrist or arm injuries [1]. In Great Britain, approximately one-third of children experience at least one fracture before reaching 17 years of age [2]. Accurate diagnosis and treatment are critical to preventing long-term complications. Radiology, the medical specialty that uses imaging to diagnose and treat diseases within the body, plays a crucial role in emergency medicine, particularly in detecting bone fractures. Fractures can be classified into four types: angle fractures (with some degree of bone angulation), typical fractures (aligned bone pieces), line fractures (thin fractures without significant displacement), and messed-up angle fractures (involving fracture, displacement, or rotational displacement) [3,4]. Treatment varies with the fracture type and location, making accurate and rapid identification essential.

To diagnose fractures, radiologists use different imaging modalities, including Magnetic Resonance Imaging (MRI), Computed Tomography (CT), and X-rays [5]. X-rays are the most widely used because of their cost-effectiveness and versatility, while MRI has a superior detection rate [6]. However, the effectiveness of medical imaging is limited by the skill of the surgeons analyzing the scans, and in regions with a shortage of experienced surgeons, patient safety can be compromised [7]. Studies have found that a significant portion of patients’ X-rays in African hospitals were misinterpreted [8]. To address this, integrating computer vision into radiology has been proposed as a promising way to improve patient care significantly. Fracture detection has become a popular research topic because the complexity of the data and the high accuracy requirements challenge any object detection algorithm.

Recent advancements in deep learning and computer vision offer promising solutions to these challenges. Object detection models, particularly those based on convolutional neural networks (CNNs), have shown great potential in medical image analysis [9]. Among these, the YOLO (You Only Look Once) [10] series of algorithms is particularly well suited to real-time applications because of its balance between accuracy and inference speed [11]. Deep learning models such as R-CNN [12], Fast R-CNN [13], and the YOLO series are commonly used in computer vision for efficient and accurate object detection. These models are divided into one-stage and two-stage algorithms. Two-stage algorithms have distinct stages: they first propose candidate regions and then extract features to classify and refine them. In contrast, one-stage algorithms generate regions and search for features simultaneously, in a single pass.

Attention-based models, such as transformers, were initially developed for natural language processing and language translation [14] but have started to gain popularity in computer vision [15]. In one study, a Dilated Convolutional Feature Pyramid Network (DCFPN) [16] was trained to an Average Precision (AP) of 82.1% on a dataset of thigh fracture X-ray images [17]. Further, a Region-based Convolutional Neural Network (R-CNN) model, combined with image preprocessing, obtained an AP of 62% [18] on the Musculoskeletal Radiograph (MURA) [19] dataset of nearly 4000 arm fractures. A Faster Region-based Convolutional Neural Network (Faster R-CNN) model [20] achieved an 88.4% mAP score [21] on a dataset of 1052 images, 526 of which showed various fractures. These results show the potential of object detection algorithms in the medical field; however, the more accurate two-stage CNN approaches tend to be slow and too hardware-intensive for practical use. More recent applications of CNNs in medical imaging include an IoMT (Internet of Medical Things)-based deep learning architecture for fetal QRS analysis built on the pre-trained ResNet18 and MobileNet models, with accuracies of 83.2% and 88.7%, respectively [22], and a DenseNet model with an added soft attention block that classifies SPECT images for Parkinson’s disease with an overall accuracy of 99.2% [23].

Because two-stage CNN algorithms have extensive hardware requirements and long inference times, recent years have seen an emphasis on improving one-stage algorithms such as YOLO. The YOLO series is one of the best-known families of object detection algorithms, balancing accuracy, inference speed, system requirements, ease of use, and adaptability, with each iteration building on its predecessors to improve accuracy and performance [11]. YOLOv8 is a notable iteration in the series, known for its balance between speed and accuracy in object detection tasks [24]. YOLOv8 offers a range of model sizes to suit different computational resources and performance requirements. These range from YOLOv8n (nano) and YOLOv8s (small), for lightweight applications with limited computational power, to YOLOv8m (medium), YOLOv8l (large), and YOLOv8x (extra-large), for scenarios demanding higher accuracy and more detailed detections. Each model size is optimized for its intended use case, making YOLOv8 a versatile choice for diverse computer vision applications, including medical imaging tasks such as bone fracture detection.
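The five variants share one layout and differ mainly in two compound-scaling multipliers: a depth multiplier that sets how often each block repeats and a width multiplier that sets how many channels each layer has. The sketch below illustrates that rule for the four backbone stages. The multiplier values, the per-variant channel caps, and the base stage layout are assumptions taken from the public Ultralytics `yolov8.yaml` configuration as we understand it, not values reported in this paper, and the function names are hypothetical.

```python
import math

# Assumed (depth_multiple, width_multiple, max_channels) per variant,
# as we understand the "scales" section of the public yolov8.yaml.
SCALES = {
    "n": (0.33, 0.25, 1024),
    "s": (0.33, 0.50, 1024),
    "m": (0.67, 0.75, 768),
    "l": (1.00, 1.00, 512),
    "x": (1.00, 1.25, 512),
}

# Assumed unscaled backbone stages: (C2f repeats, output channels).
BASE_STAGES = [(3, 128), (6, 256), (6, 512), (3, 1024)]


def make_divisible(value: float, divisor: int = 8) -> int:
    """Round a channel count up to the nearest multiple of `divisor`."""
    return math.ceil(value / divisor) * divisor


def scale_backbone(variant: str) -> list[tuple[int, int]]:
    """Return the scaled (repeats, channels) pair for each backbone stage."""
    depth, width, max_ch = SCALES[variant]
    stages = []
    for repeats, channels in BASE_STAGES:
        # Depth scaling: fewer repeated bottlenecks, but never fewer than one.
        n = max(round(repeats * depth), 1) if repeats > 1 else repeats
        # Width scaling: cap the channel count, then shrink or widen it.
        c = make_divisible(min(channels, max_ch) * width)
        stages.append((n, c))
    return stages


for name in SCALES:
    print(name, scale_backbone(name))
```

Under these assumptions, the nano model keeps the same four-stage layout but repeats each C2f block only once or twice and uses a quarter of the channels, which is where its speed advantage comes from.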

An improved version of YOLOv2 achieved a mAP of 75.3% [25] on a dataset of spinal fracture lesions; because it is a one-stage detector, its average detection time per image is 0.027 s, which makes it well suited to emergency rooms. A YOLOv8 model trained on augmented fracture scans obtained an average mAP of 75.3% [26] on the GRAZPEDWRI-DX dataset [27]. A modified YOLOv7 with an attention-based mechanism (the Squeeze-and-Excitation module) achieved a mAP of 86.2% on the FracAtlas dataset [28]. Implementations of YOLO-NAS (You Only Look Once-Neural Architecture Search) [29] and Scalable and Efficient Object Detection (EfficientDet) [30] trained on a dataset of 4736 hand fracture images reached an average mAP of 98.1% [31]. Although these models achieve high accuracy, they lack the precision to detect fine-grained features, lack faster real-time detection on large medical datasets, and are not flexible enough to be customized, especially with the addition of an attention mechanism.

There is a knowledge gap in evaluating the effectiveness of integrating attention mechanisms into convolutional neural network (CNN) frameworks. Many studies have concentrated solely on CNNs for fracture detection, notably the existing YOLO series algorithms. Although CNNs generally offer an effective balance between execution time and accuracy, they often struggle to identify which features are crucial and can overlook details in more intricate images. This limitation can lower accuracy on smaller objects, such as line fractures. By integrating an attention-based mechanism into CNNs, we can improve detail and object detection capabilities, thereby enhancing the performance of the YOLOv8 model and keeping it competitive with other advanced object detection models in fracture or feature detection.

This motivated us to address the knowledge gap by integrating an attention mechanism into the YOLOv8 models to improve their feature extraction capabilities. The Hybrid-Attention (HA) mechanism combines channel and spatial attention modules so that the network can focus on the most relevant features. Integrating attention mechanisms into YOLO models helps distinguish between relevant and irrelevant features, boosting performance on small, easy-to-miss objects and improving detection accuracy. The HA mechanism does not use a transformer; instead, it combines two types of attention: channel attention and spatial attention. Channel attention emphasizes the important channels in the feature maps, improving the model’s ability to capture critical features across channels. Spatial attention, on the other hand, highlights significant spatial locations, enhancing the network’s sensitivity to essential regions within the image. Together, they enhance feature representations in CNNs by emphasizing the relevant channels and spatial locations, which improves overall performance and accuracy, particularly in complex object detection tasks such as those handled by the YOLO series.

In this paper, we modify the YOLOv8 model architecture to integrate the HA mechanism, which combines spatial and channel attention, and thereby improve the model’s performance in detecting hand fractures on the FracAtlas dataset [32]. We compare the performance of the standard YOLOv8 models with our modified YOLOv8-HA. This hybrid approach enhances the model’s ability to focus on the most informative spatial regions and the most relevant feature channels of the input data.

The remainder of this paper is structured as follows: Section 2 describes the model architecture in detail, Section 3 describes the methods of our study, Section 4 presents and discusses the experimental results, and Section 5 summarizes the conclusions.

2. Model Architecture

The YOLOv8 architecture has three main components: the backbone, the neck, and the head. Figure 1 provides a detailed overview of each part, and the following paragraphs explain the design principles and functions of the various modules.

The backbone of the YOLOv8 model extracts features from the input images. It employs a Cross-Stage Partial (CSP) architecture, which divides the feature map into two paths [33]: the first path undergoes convolution operations, while the second bypasses them and is concatenated with the output of the first, enhancing learning capability and reducing computational overhead. The CSP technique improves gradient flow during training by creating shortcuts for gradients, which reduces the vanishing gradient problem. This problem occurs in deep neural network training when the gradients of the loss function shrink as they are backpropagated through the network layers, until they become too small for effective learning. Additionally, because CSP lets some feature maps bypass layers, it helps preserve distinctive features that contribute to the model’s ability to learn and recognize different patterns; if all feature maps were processed the same way, the model could miss important variations that would improve its performance. A significant change in YOLOv8 is the introduction of the C2f module, which merges concepts from the C3 module and the Efficient Layer Aggregation Networks (ELAN) used in YOLOv7 [34]. The C2f module comprises two ConvModules and a series of DarknetBottleneck units linked through split and concatenate operations, and each ConvModule consists of a convolution, batch normalization, and a SiLU activation function. This module enhances feature representation through cross-stage connections that combine shallow and deep features; this connectivity promotes partial feature reuse, ensuring that critical information is preserved throughout the network.
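The split-and-concatenate structure of C2f can be made concrete by tracing channel counts through one block. The sketch below assumes the layout of the public Ultralytics implementation as we understand it: a first 1 × 1 ConvModule producing twice a hidden width, a split into two halves, bottlenecks applied in sequence to the newest chunk with every intermediate output kept, and a second 1 × 1 ConvModule over the concatenation. The helper name and the hidden-width ratio `e = 0.5` are assumptions for illustration.

```python
def c2f_channel_trace(c_in: int, c_out: int, n: int, e: float = 0.5) -> dict:
    """Channel counts through a C2f block (hypothetical helper; layout assumed).

    c_in/c_out: block input/output channels, n: number of bottlenecks,
    e: hidden-width ratio (0.5 assumed).
    """
    hidden = int(c_out * e)
    # First 1x1 ConvModule (conv + BN + SiLU): c_in -> 2 * hidden channels.
    cv1_out = 2 * hidden
    # Split into two halves; the first half skips every bottleneck.
    chunks = [hidden, hidden]
    # Each bottleneck takes the newest chunk and appends its output (same width),
    # so shallow and progressively deeper features are all kept for concatenation.
    for _ in range(n):
        chunks.append(hidden)
    # Concatenate every chunk, then fuse back to c_out with a second 1x1 ConvModule.
    return {"in": c_in, "cv1_out": cv1_out, "chunks": chunks,
            "concat": sum(chunks), "out": c_out}


print(c2f_channel_trace(c_in=64, c_out=128, n=3))
```

The concatenation grows to (2 + n) times the hidden width before being fused back, which is how C2f keeps both the untouched half and every bottleneck output available to the next layer.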

By integrating the C2f module, YOLOv8 achieves richer gradient flow and more efficient learning, further facilitated by an architecture that minimizes the number of parameters. The number of blocks in Stages 1 through 4 has been set to 3, 6, 6, and 3, respectively, which reduces computational cost while maintaining high performance. Furthermore, the Spatial Pyramid Pooling-Fast (SPPF) module in Stage 4 increases inference speed, balancing learning capability and efficiency [35].
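The speed gain of SPPF can be illustrated with a small check. As we understand it, SPPF replaces the parallel 5 × 5, 9 × 9, and 13 × 13 max-pooling branches of the earlier SPP block with three chained 5 × 5 max pools (stride 1, size-preserving padding): two stacked 5 × 5 pools cover the same window as one 9 × 9 pool, and three cover 13 × 13, so the outputs match while each pool reuses the previous result. The pure-Python sketch below verifies this on a toy single-channel map; it omits the 1 × 1 convolutions around the pooling, and the function names are ours.

```python
def max_pool2d(fmap: list[list[float]], k: int) -> list[list[float]]:
    """k x k max pooling with stride 1 and padding k // 2 (output keeps the input size)."""
    h, w, r = len(fmap), len(fmap[0]), k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            # Padding acts like -inf: only in-bounds cells can win the max.
            window = [
                fmap[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            ]
            row.append(max(window))
        out.append(row)
    return out


def sppf(fmap, k=5):
    """SPPF-style pooling: chain three k x k pools and keep every intermediate output."""
    y1 = max_pool2d(fmap, k)
    y2 = max_pool2d(y1, k)
    y3 = max_pool2d(y2, k)
    return [fmap, y1, y2, y3]  # the real module concatenates these along channels


def spp(fmap, kernels=(5, 9, 13)):
    """SPP-style pooling: parallel pools with growing kernels on the same input."""
    return [fmap] + [max_pool2d(fmap, k) for k in kernels]


# Toy 16 x 16 map with varied values.
fmap = [[float((i * 7 + j * 13) % 17) for j in range(16)] for i in range(16)]
print(sppf(fmap) == spp(fmap))
```

The chained version performs three small pools instead of one small, one medium, and one large pool, which is the source of the "Fast" in its name.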

Figure 2 illustrates the architecture of the YOLOv8 detection algorithm, showing the hierarchical composition of its three main components: the Backbone, Neck, and Head. The backbone performs the initial feature extraction using the Cross-Stage Partial (CSP) architecture, which enhances learning capability at low computational cost. The ConvModule and C2f modules are used at various stages to propagate features effectively across layers. The C2 module in YOLOv8 is a CSP (Cross-Stage Partial) bottleneck with two convolutions and serves as a building block of the architecture. The C2f module is a faster implementation of the C2 module that improves the model’s execution speed while maintaining similar performance; this optimization is achieved through modifications to the original C2 module.

The neck of YOLOv8 plays a critical role in combining multi-scale features and merging outputs from different network layers to improve prediction accuracy. Although deep networks capture more features, they can lack precise location information for small objects because of large-scale convolution operations. YOLOv8 combines the Feature Pyramid Network (FPN) and Path Aggregation Network (PAN) architectures to mitigate this [24].
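The top-down (FPN) and bottom-up (PAN) paths of the neck can be traced with tensor shapes alone. The sketch below assumes a 640 × 640 input and backbone outputs at strides 8, 16, and 32 (often called P3 to P5); the channel widths are illustrative placeholders, and the hypothetical `fuse` function stands in for the C2f blocks that follow each concatenation. It shows why up-sampling and down-sampling are required: two maps can only be concatenated when their spatial sizes match.

```python
Shape = tuple[int, int, int]  # (channels, height, width)


def upsample(x: Shape) -> Shape:
    """Nearest-neighbour 2x up-sampling: spatial size doubles, channels unchanged."""
    c, h, w = x
    return (c, 2 * h, 2 * w)


def downsample(x: Shape, c_out: int) -> Shape:
    """Stride-2 3x3 convolution: spatial size halves, channels set to c_out."""
    _, h, w = x
    return (c_out, h // 2, w // 2)


def concat(a: Shape, b: Shape) -> Shape:
    """Channel-wise concatenation; only legal when spatial sizes match."""
    assert a[1:] == b[1:], f"spatial mismatch: {a} vs {b}"
    return (a[0] + b[0], a[1], a[2])


def fuse(x: Shape, c_out: int) -> Shape:
    """Stand-in for a C2f block mapping the concatenation to c_out channels."""
    return (c_out, x[1], x[2])


def fpn_pan(img: int = 640, widths=(256, 512, 1024)) -> list[Shape]:
    c3, c4, c5 = widths
    p3, p4, p5 = (c3, img // 8, img // 8), (c4, img // 16, img // 16), (c5, img // 32, img // 32)
    # Top-down (FPN): carry semantic information from deep to shallow levels.
    t4 = fuse(concat(upsample(p5), p4), c4)
    n3 = fuse(concat(upsample(t4), p3), c3)
    # Bottom-up (PAN): carry precise localization back to the deep levels.
    n4 = fuse(concat(downsample(n3, c3), t4), c4)
    n5 = fuse(concat(downsample(n4, c4), p5), c5)
    return [n3, n4, n5]


outs = fpn_pan(640)
print(outs, sum(h * w for _, h, w in outs))
```

Under these assumptions, the three output maps are 80 × 80, 40 × 40, and 20 × 20, giving 8400 prediction locations for the detection head.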

Drawing inspiration from YOLOv5 [36], the FPN in YOLOv8 up-samples features from higher layers to enrich the feature information in the lower layers. Conversely, the PAN down-samples features from the lower layers to the higher ones, ensuring that the upper layers retain detailed feature information. This approach lets the model make precise predictions across different scales. YOLOv8 adopts an improved version of this method, called FP-PAN (Feature Pyramid-Path Aggregation Network), which eliminates some convolution operations during up-sampling to reduce computational costs.

The head of the YOLOv8 model performs the final object detection, including classification and localization. YOLOv5 uses a coupled head, in which classification and detection are combined, whereas YOLOv8 uses a decoupled head architecture; separating the classification and regression heads improves detection precision and accuracy.

YOLOv8’s head eliminates the objectness branch, keeping only the classification and regression branches. Unlike traditional anchor-based methods, which rely on numerous predefined anchor boxes to locate objects, YOLOv8 employs an anchor-free approach: it determines the object center and estimates the bounding box dimensions relative to that center, allowing more accurate and flexible object localization. By combining a decoupled head with the anchor-free method, YOLOv8 achieves superior accuracy and efficiency in object detection, especially on complex datasets.
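Anchor-free decoding can be sketched under the common distance-to-edges formulation: every grid-cell center is a candidate object center, and the regression branch predicts the distances from that center to the four box edges. In YOLOv8, as we understand it, these distances are predicted as discrete distributions (Distribution Focal Loss) and reduced to expected values before decoding; that step is omitted here, and the helper names are hypothetical.

```python
def make_anchor_points(grid_h: int, grid_w: int, stride: int) -> list[tuple[float, float]]:
    """Cell centers of one feature level in input-image pixels (offset 0.5 = center)."""
    return [((x + 0.5) * stride, (y + 0.5) * stride)
            for y in range(grid_h) for x in range(grid_w)]


def decode_ltrb(point: tuple[float, float],
                ltrb: tuple[float, float, float, float],
                stride: int) -> tuple[float, float, float, float]:
    """Turn predicted (left, top, right, bottom) distances, in stride units, into x1y1x2y2."""
    cx, cy = point
    left, top, right, bottom = (d * stride for d in ltrb)
    return (cx - left, cy - top, cx + right, cy + bottom)


# Cell (x=10, y=5) on the stride-8 level of a 640 x 640 input (80 x 80 grid).
point = make_anchor_points(80, 80, 8)[5 * 80 + 10]
print(point, decode_ltrb(point, (2, 1, 3, 4), 8))
```

Because the box is expressed relative to a fixed cell center rather than to a set of predefined anchor shapes, no anchor sizes have to be tuned for a new dataset, such as small, thin fracture lines.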

3. Methods

3.1. Dataset

The dataset used to train the modified YOLOv8 model was FracAtlas [32], a musculoskeletal bone fracture dataset widely used in published work and manually annotated in COCO, VGG, YOLO, and Pascal VOC formats. It consists of 4083 manually annotated images, 717 of which depict fractures. The images are divided into fractured and non-fractured X-rays, with annotations for fracture classification and localization. Preprocessing involved several steps. First, the dataset was divided into two sets: a hand fracture-exclusive set and a full-body fracture set. Each set was then divided into three subsets, namely training, validation, and testing, as shown in Figure 3.

Each set consists of two subfolders, one for images and one for annotations. The training images are then resized to a manageable size and normalized, and the training data are augmented with techniques such as rotation, flipping, contrast adjustment, and noise reduction to improve the variety and quality of the images.

The dataset was divided into a training set of approximately 80% of the images, a validation set of 12%, and a testing set containing the remaining 8%. Each subset preserved a similar ratio of fracture to non-fracture images, promoting balanced training and evaluation. To refine the analysis further, hand fractures were then isolated from the full dataset, and separate training and validation splits were created for this subset to enable targeted model training in this category.
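A stratified split of this kind can be sketched as follows. Only the overall counts (4083 images, 717 with fractures) and the 80/12/8 proportions come from the text; the helper name, the seed, and the synthetic image IDs are hypothetical. Splitting each class separately is what keeps the fracture ratio similar across the three subsets.

```python
import random


def stratified_split(labels: dict[str, int], fractions=(0.80, 0.12, 0.08), seed=0):
    """Split image IDs into train/val/test while preserving the fracture ratio.

    labels: image_id -> 1 (fracture) or 0 (no fracture).
    fractions: train/val/test shares; the test set takes whatever remains.
    """
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for cls in (0, 1):
        ids = sorted(i for i, y in labels.items() if y == cls)
        rng.shuffle(ids)
        n_train = int(len(ids) * fractions[0])
        n_val = int(len(ids) * fractions[1])
        splits["train"] += ids[:n_train]
        splits["val"] += ids[n_train:n_train + n_val]
        splits["test"] += ids[n_train + n_val:]
    return splits


# Hypothetical IDs mimicking the FracAtlas counts: 717 fractured out of 4083.
labels = {f"img_{i:04d}": int(i < 717) for i in range(4083)}
splits = stratified_split(labels)
print({k: len(v) for k, v in splits.items()})
```

Because rounding is done per class, the subset sizes can differ by an image or two from a split computed on the whole dataset at once.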

The model was implemented in PyTorch 2.4.1 with CUDA 12.1, using Jupyter Notebook to write and edit Python 3.11 code locally. The YOLOv8 models were run on an NVIDIA RTX 4070 GPU, whose accelerated computing capabilities handled the intensive processing tasks. This setup provided a robust environment for training and fine-tuning the models, enabling efficient experimentation and rapid iteration with a budget of 1000 epochs for each test.

3.2. Modified Model Architecture

This section outlines our proposed method, which integrates a novel HA mechanism into the YOLOv8 algorithm to enhance fracture detection performance. We first trained YOLOv8 models of different sizes on the FracAtlas dataset, a robust and well-annotated X-ray dataset that supports the development and benchmarking of advanced machine learning models for fracture diagnosis. After evaluating these models, we introduced our HA mechanism to further refine their detection capabilities.

3.2.1. Channel Attention Mechanism

The Channel Attention mechanism enhances or suppresses specific channels within the feature maps, which is crucial for identifying the most important features across the different channels of the input data. The process begins by applying average and max pooling over the spatial dimensions of each channel, which generates channel-wise descriptors [37]:

\mathrm{AvgPool}(x) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} x_{ij} \qquad (1)

\mathrm{MaxPool}(x) = \max_{i,j} x_{ij} \qquad (2)

where H and W are the height and width of the feature map and x_{ij} is the element at position (i, j). These descriptors are then processed by a shared network of fully connected layers, typically implemented as convolution layers with a 1 × 1 kernel. The first layer, which compresses the channel dimension, is

f_1(x) = W_1 x + b_1 \qquad (3)

where f_1(x) is the output of the fully connected layer, W_1 is its weight matrix, b_1 is its bias vector, and x is its input; this is the standard affine operation in neural networks.

The output of the first layer is passed through the ReLU activation function, which introduces nonlinearity, where x is the input:

\mathrm{ReLU}(x) = \max(0, x) \qquad (4)

Next, the output is passed through a second fully connected layer that restores the channel dimension:

f_2(x) = W_2 x + b_2 \qquad (5)

Finally, the outputs of the average-pooled and max-pooled branches are summed and passed through the sigmoid function to produce the channel attention map:

M_c = \sigma\big( f_2(\mathrm{ReLU}(f_1(\mathrm{AvgPool}(x)))) + f_2(\mathrm{ReLU}(f_1(\mathrm{MaxPool}(x)))) \big) \qquad (6)

where M_c is the channel attention map, f_1 and f_2 are the shared fully connected layers with learned weights and biases, and σ is the sigmoid activation function, which limits the output to between 0 and 1. AvgPool(x) and MaxPool(x) are the pooling operations applied to the feature map x. The equation computes the channel attention map by passing the average- and max-pooled values through the shared fully connected layers with a ReLU activation between them, summing the two results, and applying the sigmoid function.

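The channel attention map is then multiplied with the original input feature maps to give the final output:

x^{\prime} = x \otimes M_c \qquad (7)

where ⊗ denotes element-wise multiplication, with each channel’s attention weight broadcast across all spatial positions of that channel.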

This final step rescales the original feature map with the channel attention map so that channels with higher attention values are emphasized in the output. The most informative channels are thus highlighted, improving the model’s ability to capture critical features.
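Equations (1)–(7) can be traced end to end with a small pure-Python sketch on nested lists shaped [C][H][W], so that it runs without a deep-learning framework. It is not the authors’ implementation: the hidden width of 2, the random weights, and all function names are assumptions for illustration, whereas in a trained model W_1, b_1, W_2, and b_2 are learned and the layers are usually 1 × 1 convolutions on tensors.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def linear(x: list[float], w: list[list[float]], b: list[float]) -> list[float]:
    """Fully connected layer f(x) = W x + b (Eqs. (3) and (5))."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def channel_attention(x, w1, b1, w2, b2):
    """Channel attention on x shaped [C][H][W]; returns (M_c, x') per Eqs. (1)-(7)."""
    # Eqs. (1)-(2): one average and one max descriptor per channel.
    avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    mx = [max(max(row) for row in ch) for ch in x]

    def mlp(v):
        # Shared MLP: f1 -> ReLU -> f2 (Eqs. (3)-(5)), same weights for both branches.
        hidden = [max(0.0, h) for h in linear(v, w1, b1)]
        return linear(hidden, w2, b2)

    # Eq. (6): sum the two branches, then squash to (0, 1) with the sigmoid.
    m_c = [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]
    # Eq. (7): rescale every channel by its attention weight.
    x_prime = [[[v * m_c[k] for v in row] for row in ch] for k, ch in enumerate(x)]
    return m_c, x_prime


rng = random.Random(0)
C, hidden, H, W = 4, 2, 3, 3  # hidden width 2 (reduction ratio 2) is an assumption
x = [[[rng.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(hidden)]
b1 = [0.0] * hidden
w2 = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(C)]
b2 = [0.0] * C
m_c, x_prime = channel_attention(x, w1, b1, w2, b2)
print([round(m, 3) for m in m_c])
```

Sharing one MLP between the two pooled descriptors keeps the parameter count small, and applying the sigmoid after the sum (rather than to one branch) lets both statistics influence every channel weight.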

3.2.2. Spatial Attention Mechanism

The Spatial Attention mechanism complements Channel Attention by identifying the most informative spatial regions within the feature maps [38]. This mechanism also starts with average and max pooling operations, which are applied along the channel axis to generate spatial descriptors. The descriptors are concatenated and passed through a convolutional layer, which processes the spatial information. The result is then passed through a sigmoid activation function to generate the spatial attention map. This attention map is multiplied with the input feature maps, highlighting the most important spatial regions and enhancing the network’s sensitivity to significant areas within the images. Focusing on relevant spatial locations ensures the model considers the crucial areas for accurate fracture detection.

The average and max pooling operations, applied along the channel axis to generate the spatial descriptors, are given by:

\mathrm{AvgPool}(x)_{ij} = \frac{1}{C} \sum_{k=1}^{C} x_{ijk} \qquad (8)

\mathrm{MaxPool}(x)_{ij} = \max_{k} x_{ijk} \qquad (9)

where C is the number of channels in the feature map and x_{ijk} is the value at position (i, j, k), with k indexing the channels. These operations compute the average and the maximum value at each spatial location (i, j) across all channels. The two descriptors are then concatenated:

x_{cat} = \mathrm{Concat}(\mathrm{AvgPool}(x), \mathrm{MaxPool}(x)) \qquad (10)

where x_{cat} is the concatenation of the average- and max-pooled descriptors along the channel axis, forming a combined two-channel representation. The concatenated descriptors are then passed through a convolutional layer, which processes the spatial information:

f(x_{cat}) = W * x_{cat} + b \qquad (11)

where W is the weight kernel of the convolutional layer, * denotes convolution, and b is the bias term. This operation slides the convolutional filter over the concatenated descriptors to extract the spatial attention features. The result is then passed through a sigmoid activation function to generate the spatial attention map:

M_s = \sigma(f(x_{cat})) \qquad (12)

where M_s is the spatial attention map, σ is the sigmoid activation function that limits the output to between 0 and 1, and f(x_{cat}) is the result of the convolution applied to the concatenated spatial descriptors. The spatial attention map is then multiplied with the channel-refined feature map:

x^{\prime\prime} = x^{\prime} \otimes M_s \qquad (13)

where x″ is the final output after applying spatial attention, x′ is the feature map after applying channel attention, M_s is the spatial attention map, and ⊗ denotes element-wise multiplication, with each spatial weight broadcast across all channels. This final step adjusts the feature map by emphasizing the relevant spatial regions.

3.2.3. Hybrid-Attention Module

The HA class is a critical component of our model (Figure 4), inheriting from the neural network blocks: Head, Neck, and Backbone. This design allows the module to be integrated seamlessly into larger neural network architectures. The constructor of the HA module takes d_model (the dimension of the model) as an input parameter and initializes the two attention mechanisms, Channel Attention and Spatial Attention. The forward method computes the attention maps and applies them to the input tensor x, first modulating the input with channel attention and then refining it further with spatial attention.
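The forward method described above can be sketched as a composition of the two multiplications in Eqs. (7) and (13). The point of the sketch is the ordering and the broadcasting: M_c holds one weight per channel, M_s holds one weight per spatial position, and the spatial map is computed from the channel-refined features x′ rather than from the raw input. The two `compute_*` arguments are hypothetical stand-ins for the channel and spatial attention branches; in the paper’s PyTorch implementation they are learned sub-modules whose code is not reproduced here.

```python
def apply_channel_map(x, m_c):
    """Eq. (7): m_c has one weight per channel, broadcast over every (i, j)."""
    return [[[v * m_c[k] for v in row] for row in ch] for k, ch in enumerate(x)]


def apply_spatial_map(x, m_s):
    """Eq. (13): m_s has one weight per (i, j), broadcast over every channel."""
    return [[[v * m_s[i][j] for j, v in enumerate(row)] for i, row in enumerate(ch)]
            for ch in x]


def hybrid_attention(x, compute_channel_map, compute_spatial_map):
    """Channel attention first, then spatial attention computed on the refined map."""
    x1 = apply_channel_map(x, compute_channel_map(x))
    return apply_spatial_map(x1, compute_spatial_map(x1))


# Usage with stand-in maps (hypothetical fixed values, not learned weights).
x = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
seen = []


def record_and_weight(t):
    seen.append(t)  # keep what the spatial branch received
    return [[1.0, 0.0], [0.0, 1.0]]


out = hybrid_attention(x, lambda t: [0.5, 1.0], record_and_weight)
print(out)
```

Computing M_s from x′ means the spatial branch already sees the channel re-weighting, which is the sequential channel-then-spatial arrangement the text describes.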

The DetectionModel class extends a base class to create a specific detection model, in this case YOLOv8, and we incorporated the HA module into it to enhance feature extraction. The constructor initializes the model from a configuration file and parameters such as the number of input channels (ch), the number of classes (nc), and the class names. During inference, the forward method first passes the input through the HA module, refining the image features along the channel and spatial dimensions. The output then passes through the main YOLOv8 architecture, consisting of the convolutional layers, batch normalization, and activation functions typical of YOLO models. Additional methods handle scaling and augmentation to support various input image sizes and data augmentation during training, ensuring the model’s robustness and versatility.

The OBB model is another specialized variant of YOLOv8, designed for detecting oriented bounding boxes (OBB), which are useful when objects are not axis-aligned and their boxes need to be rotated. Although we did not use this model, it introduces configurations for detecting objects in diverse orientations. We mention it briefly as a candidate for enhancement with our HA mechanism, enabling more accurate detection when object orientation varies within a single input, such as objects viewed from different angles.

4. Results and Discussion

This section presents the experimental results and evaluation of the YOLOv8 model with Hybrid-Attention (HA). The primary performance metric is the Mean Average Precision (mAP), which measures the model’s detection accuracy. We also evaluated the model using precision, recall, and F1-score to gain further insight into its performance.

4.1. Evaluation Metrics

The primary metric used in YOLOv8 to evaluate model performance is the mAP [39], computed as the mean of the Average Precision (AP) values across all object classes. Average precision is derived from the precision-recall curve, which plots precision against recall for different confidence thresholds. Intersection over Union (IoU) is another important metric in YOLOv8; it measures the overlap between the predicted bounding box and the ground truth bounding box. A higher IoU value indicates better localization, meaning that the predicted bounding box closely matches the ground truth [40]. YOLOv8 reports mAP@50 (mAP at an IoU threshold of 0.5) and mAP@50–95 (mAP averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05). Together, these metrics assess the model’s localization performance at different IoU thresholds.
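The two building blocks of these metrics, IoU and AP, can be sketched as follows, assuming boxes in (x1, y1, x2, y2) pixel coordinates and detections already ranked by confidence. AP is computed here with all-point interpolation of the precision envelope, which is one common convention; YOLOv8’s evaluator may interpolate differently (for example, on a fixed grid of recall points), so values from this sketch need not match its reported numbers exactly. The function names are ours.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(recall, precision):
    """Area under a precision-recall curve with all-point interpolation.

    recall must be non-decreasing (detections ranked by descending confidence).
    """
    r = [0.0] + list(recall) + [1.0]
    p = [1.0] + list(precision) + [0.0]
    # Precision envelope: make precision non-increasing from right to left.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas wherever recall increases.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


# IoU thresholds averaged by mAP@50-95: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.5 + 0.05 * k, 2) for k in range(10)]

print(iou((0, 0, 2, 2), (1, 1, 3, 3)), average_precision([0.5, 1.0], [1.0, 0.5]))
```

A detection counts as a true positive at a given threshold when its IoU with an unmatched ground-truth box reaches that threshold; mAP@50 averages the per-class AP at 0.5, and mAP@50–95 additionally averages over all ten thresholds.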

Precision is the ratio of true positive detections to all positive detections, that is, the fraction of detected objects that are correct. A high precision score indicates that the model’s detections are usually correct. Conversely, recall is the ratio of true positive detections to the total number of actual (ground truth) objects. It evaluates the model’s ability to find all the relevant objects in the dataset; high recall means the model detects most objects and misses very few [41].

The F1 score combines precision and recall into a single, balanced measure of the model’s performance. It is the harmonic mean of precision and recall, which makes it useful for weighing the trade-off between the two. A high F1 score indicates that the model achieves a good balance between precision and recall, ensuring accurate and comprehensive object detection.
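The three definitions above reduce to a few lines of arithmetic on detection counts. The counts in the usage line are hypothetical and not taken from this study.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = their harmonic mean."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical counts: 8 correct detections, 2 false alarms, 4 missed fractures.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Here precision is 0.8 and recall about 0.667, and the F1 score (about 0.727) falls below their arithmetic mean (about 0.733), because the harmonic mean is pulled toward the weaker of the two values.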

This study also explored the effectiveness of the Stochastic Gradient Descent (SGD) optimizer in the YOLOv8 model integrated with our HA mechanism. The HA mechanism significantly enhances feature representation, improving the model’s ability to focus on relevant features and thus its performance in object detection tasks. Although other optimizers, such as Adam or AdamW, offer adaptive learning rates that accelerate convergence, we found that SGD consistently outperformed them, as seen in Table 1.

The SGD optimizer uses a learning rate that, unlike those of adaptive optimizers, is not adjusted per parameter, and its simpler update mechanism provides more controlled and stable learning dynamics, which is beneficial when dealing with the added complexity of the Hybrid-Attention mechanism. Adaptive optimizers that include weight decay for regularization can sometimes produce overly aggressive updates, especially in networks with attention mechanisms, causing the model to converge prematurely. In contrast, SGD updates help prevent the model from settling into a sharp minimum, which is often associated with overfitting.
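To make the contrast between non-adaptive and adaptive step sizes concrete, the sketch below applies one update of each rule to a single scalar weight. The formulas follow the standard SGD-with-momentum and Adam update rules (as we understand them to be implemented in PyTorch’s `torch.optim.SGD` and `torch.optim.Adam`); the hyperparameter values and function names are illustrative assumptions, not this study’s training settings. The sketch shows only that Adam normalizes each step by a running estimate of the gradient scale, so its first step has a size close to `lr` whatever the gradient magnitude, whereas the SGD step scales with the gradient; it does not by itself demonstrate the overfitting argument above.

```python
import math


def sgd_momentum_step(w, g, state, lr=0.01, momentum=0.937, weight_decay=0.0005):
    """One SGD step with momentum and L2 weight decay (velocity kept in `state`)."""
    g = g + weight_decay * w
    state["v"] = momentum * state.get("v", 0.0) + g
    return w - lr * state["v"]


def adam_step(w, g, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step: per-parameter step size from bias-corrected gradient moments."""
    t = state["t"] = state.get("t", 0) + 1
    state["m"] = b1 * state.get("m", 0.0) + (1 - b1) * g
    state["v"] = b2 * state.get("v", 0.0) + (1 - b2) * g * g
    m_hat = state["m"] / (1 - b1 ** t)
    v_hat = state["v"] / (1 - b2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


for g in (1e-3, 10.0):
    w_sgd = sgd_momentum_step(1.0, g, {}, weight_decay=0.0)
    w_adam = adam_step(1.0, g, {})
    print(f"g={g:g}: SGD moves {1.0 - w_sgd:.6f}, Adam moves {1.0 - w_adam:.6f}")
```

For a gradient ten thousand times larger, the SGD step grows by the same factor, while the first Adam step stays near 0.001 in both cases.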

During training, we used YOLOv8 models pre-trained on the MS COCO dataset, a large-scale dataset designed for object detection and segmentation that contains over 330,000 images, more than 200,000 of them labeled, and around 1.5 million object instances across 80 object categories [42]. It provides detailed annotations, including segmentation masks, bounding boxes, object labels, and keypoints for human poses, making it a standard benchmark for evaluating computer vision algorithms. Figure 5 shows manually labeled and predicted images, demonstrating that our model successfully identified and detected arm and leg fractures.

Ultralytics suggests that training YOLOv8 models typically requires around 500 epochs. Because we started from pre-trained models, we set the maximum number of epochs to 1000, with an early stopping mechanism that terminated training if no significant improvement was observed for 200 epochs. Our experimental results showed that all models reached their best performance within the first 100 epochs, mainly between the 50th and 70th epochs.
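Patience-based early stopping of the kind described can be sketched as a loop that remembers the best epoch and stops once 200 epochs pass without a new best. The helper is hypothetical: it treats any increase as an improvement, whereas the built-in Ultralytics mechanism uses its own fitness score and criterion, which are not reproduced here. The fitness curve in the usage lines is invented to peak at epoch 60, inside the 50 to 70 range reported above.

```python
def early_stop_epoch(fitness: list[float], patience: int = 200,
                     max_epochs: int = 1000) -> tuple[int, int]:
    """Return (best_epoch, stop_epoch) for patience-based early stopping.

    Training stops once `patience` epochs pass without a new best fitness.
    """
    best, best_epoch = float("-inf"), 0
    for epoch, score in enumerate(fitness[:max_epochs]):
        if score > best:
            best, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, min(len(fitness), max_epochs) - 1


# Hypothetical fitness curve: rises until epoch 60, then slowly declines.
fitness = [min(e, 60) - 0.01 * max(0, e - 60) for e in range(1000)]
print(early_stop_epoch(fitness))
```

Under this assumption, a peak at epoch 60 with a patience of 200 ends training at epoch 260, far short of the 1000-epoch budget.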

4.2. Ablation Study and Quantitative Analysis

To demonstrate the effectiveness of the HA mechanism, we performed an ablation study focusing mainly on the YOLOv8m variant. The results showed that the m-model excelled at detecting hand fractures, achieving mAP scores above 0.7 for that class. Table 2 lists the per-class predictions of our HA method, showing that the mAP for hand fractures improved from 0.59 to 0.71 and highlighting the benefits of our modifications. When we combined the arm and leg classes, our HA model achieved a precision of 0.592, outperforming the standard YOLOv8 model, which reached only 0.476. These results demonstrate the ability of our HA mechanism to enhance detection accuracy.
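For reference, the absolute values reported above correspond to the following relative gains (our arithmetic on the stated numbers); the hand-fracture gain is in line with the roughly 20% mAP improvement stated in the abstract.

```latex
\frac{0.71 - 0.59}{0.59} = \frac{0.12}{0.59} \approx 0.203 \quad (\text{hand-fracture mAP: about } 20.3\% \text{ relative gain})

\frac{0.592 - 0.476}{0.476} = \frac{0.116}{0.476} \approx 0.244 \quad (\text{combined arm and leg precision: about } 24.4\% \text{ relative gain})
```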

We also observed a notable reduction in training time with the YOLOv8-HA model. With the SGD optimizer, training the original YOLOv8m model required 6791 s, compared with 5931 s for the YOLOv8m-HybridAttention model; with the Adam optimizer, training took 4979 s and 3708 s, respectively. These results suggest that the HA mechanism improves accuracy while reducing training time, without requiring additional computational resources. Older versions of YOLO were also tested for comparison and showed lower precision.
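The reported training times translate into the following absolute and relative savings. This short calculation uses only the numbers stated above. Because training used early stopping, each time covers the whole run, so part of the saving may reflect earlier convergence rather than cheaper per-epoch computation.

```python
# Wall-clock training times in seconds for YOLOv8m, as reported in the text
times = {
    "SGD": {"YOLOv8m": 6791, "YOLOv8m-HA": 5931},
    "Adam": {"YOLOv8m": 4979, "YOLOv8m-HA": 3708},
}


def time_reduction(baseline_s, modified_s):
    """Fraction of the baseline training time saved by the modified model."""
    return (baseline_s - modified_s) / baseline_s


for optimizer, t in times.items():
    saved = t["YOLOv8m"] - t["YOLOv8m-HA"]
    share = time_reduction(t["YOLOv8m"], t["YOLOv8m-HA"])
    print(f"{optimizer}: {saved} s saved ({share:.1%})")
```

This gives 860 s (about 12.7%) saved with SGD and 1271 s (about 25.5%) saved with Adam.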

We performed further quantitative analysis. The precision-confidence curve in Figure 6 shows the model's performance in detecting fractures. The curve plots precision, the ratio of true positive predictions to all positive predictions, as the confidence threshold changes. A detection is labeled positive only if its confidence score reaches the threshold, so lowering the threshold makes the model more lenient, whereas raising it makes the model stricter.
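A precision-confidence curve can be computed by re-evaluating precision over the detections that survive each threshold. The sketch below uses a hypothetical list of detections and assumes each one has already been marked as a true or false positive by IoU matching against the ground truth. It is not the evaluation code used to produce Figure 6.

```python
def precision_at(detections, threshold):
    """Precision of the detections whose confidence meets or exceeds the threshold.

    `detections` is a list of (confidence, is_true_positive) pairs, where
    is_true_positive has already been decided by IoU matching.
    """
    kept = [is_tp for conf, is_tp in detections if conf >= threshold]
    if not kept:
        return None  # precision is undefined when nothing is predicted
    return sum(kept) / len(kept)


# Hypothetical detections: (confidence, matched a ground-truth fracture?)
detections = [(0.95, True), (0.90, True), (0.82, True), (0.75, False),
              (0.60, True), (0.45, False), (0.30, False), (0.15, False)]

for thr in (0.0, 0.4, 0.6, 0.8):
    print(f"threshold {thr:.1f}: precision {precision_at(detections, thr):.2f}")
```

Raising the threshold discards the low-confidence false positives first, so precision climbs from 0.50 at a threshold of 0.0 to 1.00 at 0.8 in this toy example, which is the same qualitative shape described for Figure 6.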

At confidence thresholds between 0.0 and 0.4, precision is relatively low, indicating that the model makes many predictions, many of which are false positives. This happens because, at low thresholds, even uncertain candidates are accepted as valid detections, so more incorrect detections are mixed with correct ones. As the confidence threshold increases, precision rises. Between 0.6 and 0.8, precision exceeds 0.8, indicating that the model identifies actual fractures much more reliably while producing fewer false positives. This range could be the model's sweet spot for balancing precision and recall, ensuring that most predictions are correct.

Figure 7 shows several improvements of the HA model over the regular YOLOv8 model. First, it starts with a higher precision, around 0.6, at low confidence thresholds. This suggests that the spatial and channel attention mechanisms help the model identify fractures even when its confidence is low, which is essential for reducing false positives from the outset.

Furthermore, the HA model maintains more stable precision across a range of confidence thresholds. Between thresholds of 0.4 and 0.8, its precision consistently stays above 0.7, and in places above 0.8. This stability indicates that the attention modules improve reliability across confidence levels: whereas the precision of the standard YOLOv8 model fluctuates, the HA model performs consistently, with smaller swings in precision.

Another difference between the two models is that the HA model reaches a precision of 1.0 at a confidence threshold of 0.962, whereas the regular YOLOv8 model does so at 0.849. This suggests that the HA model is more cautious at the top of the confidence range, but it compensates with greater reliability at lower confidence thresholds.

While precision measures the correctness of the model's predictions, recall, the fraction of actual positives (ground-truth fractures) that the model detects, is equally important. The Recall-Confidence Curve plots recall as a function of the confidence threshold. As the threshold increases, recall typically decreases, because the model becomes more selective and may miss true positive detections.
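Recall differs from precision in that its denominator is the number of ground-truth objects, not the number of predictions. The hypothetical example below (six annotated fractures, detections already matched to the ground truth by IoU) shows why recall can only stay the same or fall as the threshold rises, and why it can stay below 1.0 even at a threshold of 0 when some fractures are never detected.

```python
def recall_at(detections, threshold, num_ground_truth):
    """Fraction of ground-truth fractures recovered by detections at or above the threshold."""
    true_positives = sum(1 for conf, is_tp in detections if conf >= threshold and is_tp)
    return true_positives / num_ground_truth


# Hypothetical data: 6 annotated fractures, of which only 4 are ever detected.
detections = [(0.92, True), (0.85, True), (0.70, False),
              (0.55, True), (0.40, True), (0.20, False)]
NUM_GT = 6

thresholds = (0.0, 0.5, 0.8, 0.9)
recalls = [recall_at(detections, thr, NUM_GT) for thr in thresholds]
for thr, rec in zip(thresholds, recalls):
    print(f"threshold {thr:.1f}: recall {rec:.2f}")
```

Raising the threshold can only remove true positives, never add them, so the curve is non-increasing: 0.67, 0.50, 0.33, and 0.17 in this example.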

In the Recall-Confidence Curve (Figure 8), recall is usually higher at low confidence thresholds because the model is less strict and detects more possible fractures. As the threshold rises, only higher-confidence predictions are retained, so some true positives are missed and recall drops. Because balancing precision and recall is challenging, the precision-confidence and recall-confidence curves should be considered together. As shown in Figure 8, at low confidence thresholds (around 0), recall is around 0.6–0.7. At this setting, the model is lenient and accepts many potential fractures, including false positives that lower its precision. In other words, the model detects more fractures at the cost of being less precise.

As the confidence threshold increases, recall steadily declines because the model becomes more selective about which detections it accepts. Between confidence thresholds of 0.4 and 0.6, recall is between 0.4 and 0.5, showing that the model misses more true fractures as it becomes stricter. This drop occurs because the model prioritizes precision over recall at high confidence levels, producing fewer predictions but with higher precision.

As Figure 9 shows, the HA model starts with a high recall, between 0.5 and 0.6, at low confidence thresholds. This shows that when the model is less selective, it detects many actual fractures. At these low thresholds, the HA mechanism ensures that a good number of actual fractures are detected, even though some detections may be false positives. This suggests that the attention modules help the model focus on relevant features, improving recall without sacrificing precision.

Compared with the standard YOLOv8 model, the HA model shows greater consistency at low and mid-range confidence thresholds. This is particularly important in applications such as fracture detection, where missing actual fractures can have serious clinical consequences. The consistent recall curve indicates reliable detection across thresholds, without the extreme trade-offs seen in models whose recall fluctuates more. The sharper drop at a confidence level of around 0.8 reflects the expected behavior as the model favors precision over recall. Even so, the HA model maintains a better balance than the standard YOLOv8 model.

Whereas recall measures the model's ability to find positive instances, the Precision-Recall Curve shows the trade-off between precision and recall across threshold values, which makes it useful for assessing predictions when the classes are imbalanced. As shown in Figure 10, the Precision-Recall Curve of the regular YOLOv8 model shows how performance shifts between making more predictions and making only accurate ones. The model starts with high precision at low recall, suggesting that most of its predictions are correct. However, as recall increases, precision drops, meaning that more false positives are included among the detections.
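A Precision-Recall curve is traced by ranking detections by confidence and lowering the threshold one detection at a time. The area under its precision envelope is the average precision (AP); averaging AP over classes, with detections matched at IoU ≥ 0.5, gives mAP 50. The sketch below uses all-point interpolation on hypothetical, pre-matched detections. Evaluation toolkits, including the one used for the figures here, may use a different interpolation scheme (for example, COCO-style 101-point sampling), so their values can differ slightly.

```python
def precision_recall_curve(detections, num_ground_truth):
    """Return (recalls, precisions), sweeping the threshold down the ranked detections."""
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    return recalls, precisions


def average_precision(recalls, precisions):
    """All-point interpolated AP: area under the precision envelope of the PR curve."""
    mrec = [0.0] + list(recalls) + [1.0]
    mpre = [0.0] + list(precisions) + [0.0]
    # Envelope: make precision non-increasing when read from right to left.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangles wherever recall increases (zero-width steps add nothing).
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


# Hypothetical detections for one class with 3 ground-truth fractures
detections = [(0.9, True), (0.8, False), (0.7, True), (0.6, False)]
recalls, precisions = precision_recall_curve(detections, num_ground_truth=3)
print(recalls, precisions)
print(f"AP = {average_precision(recalls, precisions):.4f}")
```

Here precision falls from 1.0 to 0.5 as recall grows, and the area under the envelope is 1/3 · 1.0 + 1/3 · 2/3 = 5/9 ≈ 0.556. A curve whose precision declines more gradually, as described for the HA model, encloses more area and therefore yields a higher AP.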

Figure 11 shows the Precision-Recall Curve of the HA YOLOv8 model, demonstrating an improved balance between precision and recall. The HA mechanism allows the model to maintain higher precision as recall increases, as evidenced by the less steep drop in precision. The Hybrid model keeps precision higher and steadier (from about 0.6) until recall reaches about 0.4, indicating that it detects fractures more effectively while maintaining high precision. Even as recall increases further, the decline in precision is more gradual than for the regular YOLOv8 model. This improved balance suggests that the HA mechanism makes the model more effective at fracture detection.

We compared the modified algorithm introduced in this study to other state-of-the-art object detection models for fracture detection, including popular frameworks and specialized models. A modified YOLOv2 achieved a mAP of 75.3% on a spinal fracture lesion dataset, excelling in time-sensitive environments with a detection speed of 0.027 s per image. Similarly, YOLOv8, trained on augmented fracture scans from the GRAZPEDWRI-DX dataset, achieved the same mAP. YOLOv7, enhanced with a Squeeze-and-Excitation module, reached a mAP of 86.2% on the FracAtlas dataset. Meanwhile, YOLO-NAS combined with EfficientDet delivered the best result, achieving a remarkable 98.1% mAP on a hand fracture dataset.

Our HA YOLOv8 model achieved an mAP 50 of 0.713 on hand fractures. Although this is lower than some of the models above, our model offers several key advantages: its balance of performance, improved speed, interpretability, and resource efficiency makes it well suited to clinical applications, where speed and detailed analysis are critical, even when image quality is suboptimal. Our results were obtained on a dataset with unique challenges, such as complex fracture patterns, challenging imaging conditions, and fewer training images than other datasets. Because the compared models were evaluated on different datasets of varying complexity, direct comparisons are difficult. More accurate models may have been trained on specialized datasets optimized for specific fracture types or on custom, manually curated datasets.

By integrating the HA mechanism, we improved the mAP 50 from 0.594 to 0.713, enhancing the model's ability to detect subtle or ambiguous fracture features. We encountered several challenges and limitations during training and testing before settling on YOLOv8 as the base algorithm for our HA mechanism. The main challenge was managing the trade-off between speed and accuracy while integrating the HA mechanism and testing several YOLO models. Another limitation was the lack of reliable, publicly available annotated datasets with low noise that would support real-time image processing without consuming enormous computational resources. In the future, we plan to integrate our proposed HA mechanism into emerging architectures, such as YOLOv10, and into capsule network-based algorithms, such as CapsNets [43] or TSPORTNet [44], which can simplify the overall architecture and use less energy and fewer resources while improving accuracy and precision. We will also test the model on various devices to assess its adaptability to different hardware, and we plan to extend this work beyond medical imaging to other domains.

5. Conclusions

In conclusion, the HA mechanism significantly enhanced the YOLOv8 model's feature representation capabilities. By attending to both channel-wise and spatial information, the model can focus on the features most relevant to object detection, leading to improved performance. This enhancement is particularly beneficial in medical image analysis, where detecting small and complex features is crucial. For arm fracture detection, the HA-modified YOLOv8 models achieved a 20% improvement in the mAP 50 score and a 24% increase in the F1 score. This gain in accuracy helps the HA model identify complex fractures that the base model misses. The HA-modified models also improve detection performance while training faster and without requiring additional hardware, computing, or energy resources, especially in hand fracture detection.

Author Contributions

Conceptualization, G.M., D.G. and S.G.T.; methodology, G.M. and D.G.; software, modeling, and visualization, G.M. and D.G.; validation, G.M. and D.G.; writing, G.M. and D.G.; supervision, D.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The dataset used for validation is available at https://doi.org/10.6084/m9.figshare.22363012 (accessed on 1 October 2023).

Acknowledgments

We thank Texas A&M International University (TAMIU) for its support in completing this project.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Chung, K.C.; Spilson, S.V. The Frequency and Epidemiology of Hand and Forearm Fractures in the United States. J. Hand Surg. Am. Ed. 2001, 26, 908–915.
Cooper, C.; Dennison, E.M.; Leufkens, H.G.M.; Bishop, N.; van Staa, T.P. Epidemiology of Childhood Fractures in Britain: A Study Using the General Practice Research Database. J. Bone Miner. Res. 2004, 19, 1976–1981.
Ellis, E. Management of Fractures Through the Angle of the Mandible. Oral Maxillofac. Surg. Clin. N. Am. 2009, 21, 163–174.
Umans, H.R.; Kaye, J.J. Longitudinal Stress Fractures of the Tibia: Diagnosis by Magnetic Resonance Imaging. Skelet. Radiol. 1996, 25, 319–324.
Xiong, C.; Xu, X.; Zhang, H.; Zeng, B. An Analysis of Clinical Values of MRI, CT and X-Ray in Differentiating Benign and Malignant Bone Metastases. Am. J. Transl. Res. 2021, 13, 7335.
Feydy, A.; Drapé, J.; Beret, E.; Sarazin, L.; Pessis, E.; Minoui, A.; Chevrot, A. Longitudinal Stress Fractures of the Tibia: Comparative Study of CT and MR Imaging. Eur. Radiol. 1998, 8, 598–602.
Hallas, P.; Ellingsen, T. Errors in Fracture Diagnoses in the Emergency Department—Characteristics of Patients and Diurnal Variation. BMC Emerg. Med. 2006, 6, 4.
Er, E.; Kara, P.H.; Oyar, O.; Unluer, E.E. Overlooked Extremity Fractures in the Emergency Department. Ulus. Travma Acil Cerrahi Derg. Turk. J. Trauma Emerg. Surg. TJTES 2013, 19, 25–28.
Ganatra, N. A Comprehensive Study of Applying Object Detection Methods for Medical Image Analysis. In Proceedings of the 2021 8th International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 17–19 March 2021; pp. 821–826.
Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 779–788.
Terven, J.; Córdova-Esparza, D.-M.; Romero-González, J.-A. A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716.
Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 580–587.
Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need; Cornell University Library: Ithaca, NY, USA, 2023.
Wu, B.; Xu, C.; Dai, X.; Wan, A.; Zhang, P.; Yan, Z.; Tomizuka, M.; Gonzalez, J.; Keutzer, K.; Vajda, P. Visual Transformers: Token-Based Image Representation and Processing for Computer Vision; Cornell University Library: Ithaca, NY, USA, 2020.
Zhang, S.; Zhu, X.; Lei, Z.; Wang, X.; Shi, H.; Li, S.Z. Detecting Face with Densely Connected Face Proposal Network. Neurocomputing 2018, 284, 119–127.
Guan, B.; Yao, J.; Zhang, G.; Wang, X. Thigh Fracture Detection Using Deep Learning Method Based on New Dilated Convolutional Feature Pyramid Network. Pattern Recognit. Lett. 2019, 125, 521–526.
Guan, B.; Zhang, G.; Yao, J.; Wang, X.; Wang, M. Arm Fracture Detection in X-Rays Based on Improved Deep Convolutional Neural Network. Comput. Electr. Eng. 2020, 81, 106530.
Rajpurkar, P.; Irvin, J.; Bagul, A.; Ding, D.; Duan, T.; Mehta, H.; Yang, B.; Zhu, K.; Laird, D.; Ball, R.L.; et al. MURA: Large Dataset for Abnormality Detection in Musculoskeletal Radiographs. arXiv 2017, arXiv:1712.06957.
Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
Ma, Y.; Luo, Y. Bone Fracture Detection Through the Two-Stage System of Crack-Sensitive Convolutional Neural Network. Inform. Med. Unlocked 2021, 22, 100452.
Krupa, A.J.D.; Dhanalakshmi, S.; Lai, K.W.; Tan, Y.; Wu, X. An IoMT Enabled Deep Learning Framework for Automatic Detection of Fetal QRS: A Solution to Remote Prenatal Care. J. King Saud. Univ.-Comput. Inf. Sci. 2022, 34, 7200–7211.
Thakur, M.; Kuresan, H.; Dhanalakshmi, S.; Lai, K.W.; Wu, X. Soft Attention Based DenseNet Model for Parkinson’s Disease Classification Using SPECT Images. Front. Aging Neurosci. 2022, 14, 908143.
Ultralytics YOLOv8. Available online: https://docs.ultralytics.com/models/yolov8/ (accessed on 12 September 2024).
Sha, G.; Wu, J.; Yu, B. Detection of Spinal Fracture Lesions Based on Improved Yolov2. In Proceedings of the 2020 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), Dalian, China, 27–29 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 235–238.
Ju, R.-Y.; Cai, W. Fracture Detection in Pediatric Wrist Trauma X-Ray Images Using YOLOv8 Algorithm. Sci. Rep. 2023, 13, 20077.
Nagy, E.; Janisch, M.; Hržić, F.; Sorantin, E.; Tschauner, S. A Pediatric Wrist Trauma X-Ray Dataset (GRAZPEDWRI-DX) for Machine Learning. Sci. Data 2022, 9, 222.
Zou, J.; Arshad, M.R. Detection of Whole Body Bone Fractures Based on Improved YOLOv7. Biomed. Signal Process. Control 2024, 91, 105995.
YOLO-NAS (Neural Architecture Search). Available online: https://docs.ultralytics.com/models/yolo-nas (accessed on 6 September 2024).
Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 10778–10787.
Medaramatla, S.C.; Samhitha, C.V.; Pande, S.D.; Vinta, S.R. Detection of Hand Bone Fractures in X-Ray Images Using Hybrid YOLO NAS. IEEE Access 2024, 12, 57661–57673.
Abedeen, I.; Rahman, M.A.; Prottyasha, F.Z.; Ahmed, T.; Chowdhury, T.M.; Shatabda, S. FracAtlas: A Dataset for Fracture Classification, Localization and Segmentation of Musculoskeletal Radiographs. Sci. Data 2023, 10, 521.
Wang, C.-Y.; Liao, H.-Y.M.; Wu, Y.-H.; Chen, P.-Y.; Hsieh, J.-W.; Yeh, I.-H. CSPNet: A New Backbone That Can Enhance Learning Capability of CNN. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1571–1580.
Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 7464–7475.
He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
YOLOv5. Available online: https://docs.ultralytics.com/models/yolov5 (accessed on 6 September 2024).
Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018.
Fu, H.; Song, G.; Wang, Y. Improved YOLOv4 Marine Target Detection Combined with CBAM. Symmetry 2021, 13, 623.
Padilla, R.; Netto, S.L.; da Silva, E.A.B. A Survey on Performance Metrics for Object-Detection Algorithms. In Proceedings of the 2020 International Conference on Systems, Signals and Image Processing (IWSSIP), Niteroi, Brazil, 1–3 July 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 237–242.
Rahman, M.A.; Wang, Y. Optimizing Intersection-Over-Union in Deep Neural Networks for Image Segmentation. In Advances in Visual Computing; Springer: Cham, Switzerland, 2016; pp. 234–244.
Boyd, K.; Eng, K.H.; Page, C.D. Area under the Precision-Recall Curve: Point Estimates and Confidence Intervals. In Machine Learning and Knowledge Discovery in Databases; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2013; pp. 451–466. ISBN 978-3-642-40993-6.
Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Computer Vision—ECCV 2014; Springer International Publishing: Cham, Switzerland, 2014; pp. 740–755. ISBN 3-319-10601-5.
Liu, Y.; Cheng, D.; Zhang, D.; Xu, S.; Han, J. Capsule Networks with Residual Pose Routing. IEEE Trans. Neural Netw. Learn. Syst. 2024, 1, 1–14.
Liu, Y.; Zhang, D.; Zhang, Q.; Han, J. Part-Object Relational Visual Saliency. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3688–3704.

Figure 1. The architecture of the YOLOv8 algorithm for real-time object detection. It consists of three main parts: the backbone for feature extraction, the neck for the fusion of features, and the head for localization and classification of features in the images.

Figure 2. Detailed illustration of the expansion of the YOLOv8 Algorithm Architecture, with modules present within the Backbone, Neck, and Head. They include C2f, ConvModule, DarknetBottleneck, and SPPF for faster and more efficient real-time object/feature detection from images.

Figure 3. Flowchart of the division of the dataset into training, validation, and testing sets.

Figure 4. Detailed illustration of the YOLOv8-HA model. The main alteration to the existing model is the HA module in the head (outlined in blue). The HA module includes two attention mechanisms: Channel Attention and Spatial Attention.

Figure 5. Examples of fracture detection on X-ray images from the FracAtlas dataset. (a) Manually labeled images, (b) Predicted images (indicated by 0).

Figure 6. Detailed illustration of the Precision-Confidence Curve of the YOLOv8 model with the input image size of 640.

Figure 7. Detailed illustration of the Precision-Confidence Curve of the YOLOv8-HA model with the input image size of 640.

Figure 8. Detailed illustration of the Recall-Confidence Curve of the YOLOv8 model with the input image size of 640.

Figure 9. Detailed illustration of the Recall-Confidence Curve of the YOLOv8-HA model with the input image size of 640.

Figure 10. Detailed illustration of the Precision-Recall Curve of the YOLOv8 model with the input image size of 640.

Figure 11. Detailed illustration of the Precision-Recall Curve of the YOLOv8-HA model with the input image size of 640.

Table 1. Validation results for the hand class on the FracAtlas dataset using the YOLOv8m model with the HA mechanism, comparing various optimizers.

Class	Size (Pixels)	Epochs	Batch Size	Params (M)	Optimizer	mAP 50
Hand	640 × 640	1000	32	25.9	Auto	0.637
Hand	640 × 640	1000	32	25.9	Adam	0.692
Hand	640 × 640	1000	32	25.9	AdamW	0.561
Hand	640 × 640	1000	32	25.9	SGD	0.711

Table 2. Model performance comparison of YOLOv8m to our YOLOv8-HA model.

Model	Precision (mAP 50)	Recall	F1	Time (s)
YOLOv8m (Arm)	0.594	0.491	0.538	6791
YOLOv8m-HA (Arm)	0.713	0.642	0.676	5931
YOLOv8m (Arm and Leg)	0.476	0.418	0.445	1763
YOLOv8m-HA (Arm and Leg)	0.592	0.473	0.526	1698

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
