AS-YOLO: A Novel YOLO Model with Multi-Scale Feature Fusion for Intracranial Aneurysm Recognition

Yang, Jun; Wang, Chen; Chen, Yang; Chen, Zhengkui; Tong, Jijun

doi:10.3390/electronics14081692


by Jun Yang ¹,², Chen Wang ³, Yang Chen ¹, Zhengkui Chen ¹ and Jijun Tong ²,³,*

¹ School of Computer Science and Technology, Zhejiang Sci-Tech University, Baiyang Street, Hangzhou 310018, China

² Provincial Key Laboratory for Research and Translation of Kidney Deficiency-Stasis-Turbidity Disease, Hangzhou 310006, China

³ School of Information Science and Engineering, Zhejiang Sci-Tech University, Baiyang Street, Hangzhou 310018, China

* Author to whom correspondence should be addressed.

Electronics 2025, 14(8), 1692; https://doi.org/10.3390/electronics14081692

Submission received: 27 February 2025 / Revised: 2 April 2025 / Accepted: 17 April 2025 / Published: 21 April 2025


Abstract

Intracranial aneurysm is a common clinical disease that seriously endangers the health of patients. In view of the shortcomings of existing intracranial aneurysm recognition methods in dealing with complex aneurysm morphologies, varying sizes, as well as multi-scale feature extraction and lightweight deployment, this study introduces an intracranial aneurysm detection framework, AS-YOLO, which is designed to enhance recognition precision while ensuring compatibility with lightweight device deployment. Built on the YOLOv8n backbone, this approach incorporates a cascaded enhancement module to refine representation learning across scales. In addition, a multi-stage fusion strategy was employed to facilitate efficient integration of cross-scale semantic features. Then, the detection head was improved by proposing an efficient depthwise separable convolutional aggregation detection head. This modification significantly lowers both the parameter count and computational burden without compromising recognition precision. Finally, the SIoU-based regression loss was employed, enhancing the bounding box alignment and boosting overall detection performance. Compared with the original YOLOv8, the proposed solution achieves higher recognition precision for aneurysm detection—boosting mAP@0.5 by 8.7% and mAP@0.5:0.95 by 4.96%. Meanwhile, the overall model complexity is effectively reduced, with a parameter count reduction of 8.21%. Incorporating multi-scale representation fusion and lightweight design, the introduced model maintains high detection accuracy and exhibits strong adaptability in environments with limited computational resources, including mobile health applications.

Keywords:

YOLOv8; aneurysm recognition; multi-scale features; Cascaded Fusion Network; lightweight

1. Introduction

Recently, deep learning technologies combined with modern imaging systems have emerged as key tools for diagnosing various cerebrovascular disorders, such as aneurysms, aortic dissections, and intracerebral hemorrhages. Pan et al. [1] used wavelet scattering transform to analyze multi-character electroencephalogram (EEG) signals. The team led by Li [2] reconstructed visual stimulus images through a deep visual representation model. Zhang et al. [3] verified the optimization effect of intracranial pressure monitoring on intracerebral hemorrhage surgery. These cross-disciplinary advancements provide methodological references for vascular imaging analysis.

An intracranial aneurysm is a pathological expansion of the cerebral arterial wall, typically resulting from either inherited vessel fragility or postnatal injury, with an estimated prevalence of 3% [4]. Intracranial aneurysms are a major cause of subarachnoid hemorrhage [5]. Aneurysm rupture can be life threatening, with a mortality rate reaching 32% [6]. After the first rupture, 8% to 32% of patients may die, with a disability and mortality rate exceeding 60% within one year and reaching 85% within two years [7,8]. Therefore, early diagnosis and treatment are crucial. Early detection of aneurysms is of great value for the secondary prevention of intracerebral hemorrhage (ICH), and intervention strategies based on intracranial pressure monitoring can reduce the mortality rate of ICH patients [3]. At present, the diagnosis of intracranial aneurysms commonly utilizes imaging techniques, such as computed tomography (CT), digital subtraction angiography (DSA), computed tomography angiography (CTA), and magnetic resonance angiography (MRA), among which DSA is widely acknowledged as the most accurate diagnostic modality [9].

Intracranial aneurysm detection methods are typically divided into traditional techniques and deep learning-based approaches. Early methods relied on handcrafted features. For example, Rahmany et al. [10] used the MSER algorithm to extract vascular structures from DSA images, and they then combined it with Zernike moments to identify aneurysm regions. However, due to the complexity of aneurysm morphology and location, traditional algorithms have limited generalization ability.

Deep learning recently showed strong effectiveness in analyzing medical images, making it an important tool for assisted diagnosis. The application of convolutional neural networks (CNNs) has driven the automation of intracranial aneurysm detection. Nakao et al. [11] proposed a detection method combining CNN with maximum intensity projection (MIP), while Claux et al. [12] adopted a dual-stage U-Net [13] for MRI-based aneurysm detection, achieving good results but with high computational costs. Mask R-CNN [14] has shown excellent performance in medical image analysis but is computationally complex, making real-time detection challenging. TransUNet [15], which integrates Transformer and U-Net, enhances the detection of small aneurysms but requires substantial computational resources, limiting its application in low-power devices. Therefore, while CNNs and their variants perform well in medical image detection, their high computational complexity remains a challenge for real-time detection and lightweight deployment.

In terms of general object detection, the YOLO series algorithms have gained widespread attention due to their efficient end-to-end detection performance. Qiu et al. [16] successfully utilized YOLOv5 to predict the bounding boxes of intracranial arterial stenosis in MRA images, demonstrating the feasibility of the YOLO series in medical image analysis. However, in the task of intracranial aneurysm detection, YOLO-based algorithms still face several challenges: (1) Limited multi-scale feature fusion capability, making it difficult to detect small or morphologically complex aneurysms. (2) High computational cost, restricting deployment on embedded devices or low-power computing terminals.

To tackle these difficulties, this article presents AS-YOLO, a lightweight algorithm specifically optimized for detecting intracranial aneurysms. The key contributions of this study are listed below.

Improved Cascade Fusion Network (CFNeXt): To enhance multi-scale feature fusion capability, this paper introduces the CFNeXt network, which replaces the C2F module in the YOLOv8 backbone with an improved CFocalNeXt module. This modification generates more hierarchical feature representations, improving the recognition of aneurysms of varying scales.
Multi-Level Feature Fusion Module (MLFF): To address the limitations of YOLOv8 in feature fusion, MLFF employs 3D convolution and scale-sequence feature extraction, integrating high-dimensional information from both deep and shallow feature maps. This significantly enhances feature fusion effectiveness, particularly in the detection of small aneurysms.
Efficient Depthwise Separable Convolutional Aggregation (EDSA) Detection Head: The multi-level symmetric compression structure in the YOLOv8 detection head has limitations in flexibility. This paper proposes the EDSA detection head, which allows for more adaptable processing of diverse feature representations and aneurysm size distributions, thereby improving detection speed while reducing computational overhead.
Improved SIoU Loss Function: To address the slow convergence of the CIoU loss in the regression process, this work introduces the SIoU loss function. By considering the angle of the vector between the centers of the predicted and ground truth bounding boxes, SIoU enhances bounding box alignment precision and accelerates training convergence.

The experimental findings indicate that AS-YOLO notably enhances multi-scale feature fusion, computational efficiency, and overall model compactness. AS-YOLO outperforms YOLOv8 with 3.51% higher accuracy and 8.7% higher mAP50, while reducing parameters by 8.21% on the DSA intracranial aneurysm dataset, thus offering a tradeoff between accuracy and lightweight deployment. The next section will provide a detailed description of the AS-YOLO algorithm’s foundation and innovations.

2. Materials and Methods

2.1. Baseline Method-YOLOv8

In January 2023, YOLOv8n (You Only Look Once v8-nano) was released by Ultralytics, and it was specifically designed for efficient object detection tasks. It inherits the core principles of the YOLO series [17,18,19,20,21,22], achieving fast and accurate object detection through a single-stage network structure. Compared with previous versions, such as YOLOv5 [23] and YOLOv7, YOLOv8n made improvements in network architecture, feature extraction, and efficiency optimization, making it particularly suitable for edge devices and real-time detection scenarios. Some new improved versions, such as YOLO11 and YOLOv10, have emerged in the academic community in the past two months. However, when this research was launched, YOLOv8 was the latest officially stable version available. To ensure the authority of the research benchmark and avoid confusion, we focused on the officially released YOLOv8 framework. In addition, as YOLO11 was developed by the same author as YOLOv8, there is little difference in their architectures.

YOLOv8n adopts an end-to-end design that is common in the YOLO family, enabling simultaneous object identification and localization within one forward propagation. Its architecture is divided into three functional modules: the Backbone, which captures primary visual features; the Neck, which integrates information across different scales; and the Head, which outputs object categories and corresponding bounding box coordinates.

The Backbone uses CSPDarknet [24] as its main structure, with CSPLayer_2Conv modules as fundamental units. Compared to YOLOv5’s C3 module, the C2F [25] module offers fewer parameters and superior feature extraction capability. Additionally, Bottleneck Block and SPPF modules enhance the feature extraction capacity. The Neck network sits between the Backbone and Head, focusing on feature fusion and enhancement. YOLOv8n introduces an improved feature pyramid structure within the Neck, effectively integrating feature information from different levels. This structure leverages FPN and PAN to enhance multi-scale feature fusion, especially for small objects, and it helps extract both fine-grained features and a global context, thereby improving the accuracy of object detection.

The Head network is the decision-making part of the object detection model. It uses feature maps of various sizes to obtain the object category and location, producing the final detection results. The model adopts a decoupled head with independent branches for detection and classification. Figure 1 illustrates the structure of the YOLOv8 network.

2.2. Proposed Method

This work adopts the lightweight YOLOv8n as the base framework and optimizes it for the DSA intracranial aneurysm dataset, yielding the improved AS-YOLO model. First, a lightweight Cascade Fusion Network (CFNeXt) is integrated into the Backbone, where the CFocalNeXt module replaces the original C2F module, improving multi-scale perception and making the model more capable of identifying aneurysms of diverse sizes.

Additionally, the Neck integrates a Multi-Level Feature Fusion (MLFF) module to strengthen cross-level feature aggregation, significantly enhancing detection performance for aneurysms at multiple scales. In the detection head, an efficient depthwise separable convolutional aggregation (EDSA) detection head is proposed. By applying asymmetric compression to features along different paths, this detection head optimizes feature utilization efficiency, reducing the model’s parameters and computation while enhancing both generalization capability and detection speed.

To enhance detection performance and convergence, the SIoU loss function is incorporated, refining bounding box accuracy. This adjustment ensures precise alignment, which is essential for detecting aneurysms. Figure 2 illustrates the updated YOLOv8n network structure.

2.2.1. Backbone Network CFNeXt

In complex medical image detection tasks for aneurysm recognition, the model has extremely high requirements for multi-scale feature fusion ability. It needs to accurately capture the multi-scale feature information of aneurysms of different sizes and shapes. When traditional object detection networks handle such tasks, the insufficient multi-scale feature fusion ability often limits their performance.

In the YOLOv8 network framework [25], although the C2F module has a certain ability in feature extraction and fusion, its limitations are obvious in the task of aneurysm recognition. Its structure is fixed, and adapting the feature fusion method to the characteristics of the large differences in the size and complex shapes of aneurysms is difficult. When dealing with small aneurysms, it may not be able to extract sufficient detailed features. When dealing with large aneurysms, it is difficult to accurately grasp the global features. Moreover, the C2F module has a high computational complexity. When processing large-scale medical image data, it results in low computational efficiency and increases the time cost of model training and inference.

To address these issues, and inspired by the Cascade Fusion Network [26], this study proposes a new backbone network, CFNeXt, in which the CFocalNeXt module replaces the original C2F module. CFNeXt consists of five parts, Stage0 to Stage4. Stage0 is the input convolutional layer, which comprises a standard convolution, a batch normalization layer, and a SiLU activation function [27], and it also downsamples the input image. Each of the subsequent stages has a downsampling convolutional layer and a CFocalNeXt module responsible for feature extraction. Relative to the input image, the downsampling factors of the five stages are 2, 4, 8, 16, and 32, and the numbers of output channels are 16, 32, 64, 128, and 256, respectively. Finally, the features extracted by CFNeXt are fused and fed into the SPPF pyramid pooling structure. The structure of CFNeXt is shown in Figure 3.
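The stage layout described above can be sketched as a small shape calculation. This is only an illustration: the 640 × 640 input size is taken from the training setup, and the helper name `cfnext_stage_shapes` is hypothetical rather than part of the authors’ code.

```python
def cfnext_stage_shapes(input_size=640,
                        strides=(2, 4, 8, 16, 32),
                        channels=(16, 32, 64, 128, 256)):
    """Return (stage name, channels, height, width) for Stage0..Stage4.

    Strides are the downsampling factors relative to the input image.
    """
    shapes = []
    for idx, (stride, ch) in enumerate(zip(strides, channels)):
        side = input_size // stride  # square input assumed
        shapes.append((f"Stage{idx}", ch, side, side))
    return shapes


for name, ch, h, w in cfnext_stage_shapes():
    print(f"{name}: {ch} x {h} x {w}")
# Stage0: 16 x 320 x 320
# Stage1: 32 x 160 x 160
# Stage2: 64 x 80 x 80
# Stage3: 128 x 40 x 40
# Stage4: 256 x 20 x 20
```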

The CFocalNeXt module enhances the feature representation ability while reducing the computational complexity, and its structure is shown in Figure 4. It first uses a standard convolutional module to extract feature information from the input feature map, and it then divides the extracted features into two branches:

One branch serves as a residual connection, preserving the original information flow;
The other branch feeds the feature map into the FocalNeXt module to further optimize feature representation.

Figure 4. The CFocalNeXt network is shown on the left, with the FocalNeXt architecture on the right.

The FocalNeXt feature focusing module incorporates two dilated convolutions [28] and two residual connections, and it is divided into two parts:

Multi-scale Feature Extraction
- A 7 × 7 convolution is used instead of the traditional 3 × 3 convolution to enhance local information aggregation capability.
- A depthwise separable convolution with dilation r = 3 expands the receptive field, balancing accuracy and computational cost, where r = 3 improves accuracy by 2.4% and reduces computational overhead by 5.2% FPS, as shown in Table 1.

In Equation (1), $DWC_{7 \times 7}$ denotes the 7 × 7 depthwise separable convolution operation, and $C_{1 \times 1}^{(k)}$ denotes the 1 × 1 convolution operation with k output channels. The activation functions are ReLU and GELU. Let the input be $X \in \mathbb{R}^{H \times W \times n}$; the output of the FocalNeXt module is then as follows:

$Output = X + C_{1 \times 1}^{(n)}(GELU(C_{1 \times 1}^{(4n)}(ReLU(DWC_{7 \times 7}(DWC_{7 \times 7}(X)))))).$

(1)

Lightweight FFN Structure
- LayerNorm (LN) [29] replaces BatchNorm (BN) [30] for better training and inference stability in small batch sizes. This replacement boosts mAP from 0.793 to 0.805 while keeping the inference speed nearly identical (98.2 vs. 98.1), as shown in Table 2.
- GELU activation is introduced between 1 × 1 convolution layers to enhance nonlinear representation capability.
- Channel expansion (×4) and compression (÷4) mechanisms are applied to reduce computational cost.

In summary, CFocalNeXt improves detection accuracy while maintaining high computational efficiency through multi-scale feature extraction and lightweight structural optimization.
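To illustrate why a dilated 7 × 7 convolution enlarges the receptive field, the following minimal sketch computes the effective kernel size and the receptive field of a stride-1 stack. It assumes, purely for illustration, that the first 7 × 7 depthwise convolution is undilated and the second uses r = 3; the exact arrangement in FocalNeXt is given by Figure 4, not by this code.

```python
def effective_kernel(kernel_size, dilation):
    """Spatial extent covered by one dilated convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def stacked_receptive_field(layers):
    """Receptive field of stride-1 layers given as (kernel_size, dilation)."""
    rf = 1
    for kernel_size, dilation in layers:
        rf += effective_kernel(kernel_size, dilation) - 1
    return rf


print(effective_kernel(7, 3))                     # 19
print(stacked_receptive_field([(7, 1), (7, 3)]))  # 25 (assumed FocalNeXt-like stack)
print(stacked_receptive_field([(3, 1), (3, 1)]))  # 5  (two plain 3 x 3 convolutions)
```

Dilation leaves the number of depthwise weights per channel unchanged (49 for a 7 × 7 kernel), which is why the receptive field grows without a matching growth in parameters.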

2.2.2. Multi-Level Feature Fusion Module

In aneurysm recognition and multi-scale object detection, the model needs a powerful multi-scale feature fusion ability to handle objects that differ significantly in size and shape in medical images. However, previous methods have shortcomings. The image pyramid structure extracts features at each scale separately, which limits feature interaction and hinders the model’s use of multi-scale data. The hierarchical pyramid structure of SSD [31] discards shallow high-resolution features, weakening small object detection. The Feature Pyramid Network (FPN) [32] fuses high- and low-level features, but its simple fusion method limits feature diversity. The Path Aggregation Network (PANet) [33] simply concatenates features and fails to fully explore the connections between them, so its detection performance is poor in scenes where small objects densely overlap.

This paper, inspired by ASF-YOLO [34], proposes a Multi-Level Feature Fusion Pyramid Network (MFFPN) and Multi-Level Feature Fusion (MLFF) module (Figure 5) to address this issue. The MLFF module integrates multi-scale features, formulates targeted strategies, and improves small, densely-overlapped object detection.

Let the input feature maps be $P_{2}/P_{3} \in \mathbb{R}^{C_{2/3} \times H_{2/3} \times W_{2/3}}$, $P_{3}/P_{4} \in \mathbb{R}^{C_{3/4} \times H_{3/4} \times W_{3/4}}$, and $P_{4}/P_{5} \in \mathbb{R}^{C_{4/5} \times H_{4/5} \times W_{4/5}}$. Here, C, H, and W denote the channels, height, and width, respectively. The final output feature map is $Y_{out} \in \mathbb{R}^{C_{out} \times H_{out} \times W_{out}}$. The structure of the MLFF network can be represented by the following composite function:

$Y_{out} = Conv_{final}(Conv_{3}(AdaptiveMaxPool(P_{4}/P_{5}) \oplus AvgPool(P_{4}/P_{5})) \oplus Conv_{2}(P_{3}/P_{4}) \oplus NNInterp(P_{2}/P_{3})).$

(2)

Here, ⊕ denotes concatenation, and AdaptiveMaxPool, AvgPool, and NNInterp denote adaptive max-pooling, average pooling, and nearest-neighbor interpolation, respectively.

For large feature maps, the Conv layer modifies the channels to 1C, ensuring minimal impact on concatenation and subsequent learning. A max + average pooling structure then downsamples, reducing spatial dimensions and providing translation invariance, thereby improving the network’s resilience to image transformations. As shown in Table 3, the hybrid pooling strategy offers higher accuracy (mAP0.5) than single pooling with similar computational costs, thus boosting detection performance.
For small feature maps, the Conv module first adjusts the channel count, and nearest neighbor interpolation [35] is then applied for upsampling. As shown in Table 4, this approach has a low computational cost and few parameters, and at 85.2 FPS it outperforms the other two methods, making it suitable for embedded deployment.
For medium feature maps, the channels are adjusted using a Conv convolution and then directly input into the MLFF module.

The three feature maps of varying sizes (large, medium, and small) are convolved once and concatenated along the channel axis. The formula for this calculation is as follows:

$P_{MLFF} = concat(p_{l+1}, p_{l}, p_{l-1}),$

(3)

where $p_{l+1}$, $p_{l}$, and $p_{l-1}$ represent the inputs corresponding to the feature maps of large, medium, and small sizes, respectively.

The MLFF module enhances the feature fusion ability for aneurysm recognition by aggregating the low-resolution deep features, features at the same level, and high-resolution shallow features from the backbone network.
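The resampling and concatenation steps can be sketched on tiny single-channel maps in plain Python. This is a simplified illustration under stated assumptions: it follows the prose description (the large map is downsampled by max and average pooling, the small map is upsampled by nearest-neighbor interpolation), uses a fixed factor of 2, and omits the Conv layers that adjust channel counts. The function names are hypothetical.

```python
def max_avg_pool2x2(fmap):
    """Downsample a 2D map by 2; return (max-pooled, average-pooled) maps."""
    max_map, avg_map = [], []
    for i in range(0, len(fmap) - 1, 2):
        max_row, avg_row = [], []
        for j in range(0, len(fmap[0]) - 1, 2):
            window = [fmap[i][j], fmap[i][j + 1],
                      fmap[i + 1][j], fmap[i + 1][j + 1]]
            max_row.append(max(window))
            avg_row.append(sum(window) / 4)
        max_map.append(max_row)
        avg_map.append(avg_row)
    return max_map, avg_map


def nearest_upsample2x(fmap):
    """Upsample a 2D map by 2 using nearest-neighbor interpolation."""
    out = []
    for row in fmap:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def mlff_concat(large, medium, small):
    """Bring all maps to the medium resolution and stack them as channels."""
    max_map, avg_map = max_avg_pool2x2(large)
    return [max_map, avg_map, medium, nearest_upsample2x(small)]


large = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
medium = [[1, 1], [1, 1]]
small = [[7]]
fused = mlff_concat(large, medium, small)
print(len(fused), len(fused[0]), len(fused[0][0]))  # 4 2 2
```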

2.2.3. Efficient Depthwise Separable Convolutional Aggregation Detection Head

YOLOv8 employs an efficient decoupled detection head, a structure that is widely used in the YOLO family. It separates responsibilities for regression and classification. The decoupled design was also proposed in YOLOX [36], aiming to alleviate the interference caused by combining classification and localization in a single head, as seen in the earlier coupled architectures.

The decoupled detection head was redesigned to account for the different loss-calculation complexity of each task. Adjusting the feature channels of the preceding layers to task-specific channel counts can cause different degrees of feature loss in each branch, because the final output dimensions of the branches differ. To achieve more accurate object localization and improve detection accuracy, this paper designs an efficient depthwise separable convolutional aggregation (EDSA) detection head, as shown in Figure 6.

As shown in Figure 6, a 3 × 3 depthwise separable convolution layer [37] replaces the standard Conv layer for feature extraction. With the original detection head, YOLOv8n has 3,006,038 parameters and a computation cost of 8.1 GFLOPs. With the EDSA detection head, these figures drop to 2,707,268 parameters and 6.9 GFLOPs, a reduction of 298,770 parameters and 1.2 GFLOPs. Ablation experiments confirm that, despite the reduced parameters and computation, detection accuracy slightly improves, making object localization and recognition more accurate.
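The parameter and GFLOPs figures quoted above can be checked with a few lines of arithmetic. The helper `relative_reduction` is a hypothetical name used only for this check.

```python
def relative_reduction(before, after):
    """Percentage reduction from `before` to `after`."""
    return (before - after) / before * 100.0


params_before, params_after = 3_006_038, 2_707_268
gflops_before, gflops_after = 8.1, 6.9

print(params_before - params_after)                               # 298770
print(round(relative_reduction(params_before, params_after), 2))  # 9.94
print(round(relative_reduction(gflops_before, gflops_after), 2))  # 14.81
```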

For the regression branch: The new regression branch structure first uses a standard Conv layer to adjust the channels from 64, 128, and 256 to 128, 128, and 128, respectively. Then, two depthwise separable convolution layers are consecutively used to extract feature information. The initial depthwise separable convolution uniformly sets the channels to 64, helping lower the parameter count and enhance computing performance. The second depthwise separable convolution layer is used to extract feature information and combine data from different channels, maintaining a certain degree of feature extraction capability. Finally, a Conv2d layer outputs the predicted coordinates.

For the classification branch: The new classification branch structure first uses a standard Conv layer to adjust the channels from 64, 128, and 256 to 2, 2, and 2, respectively, representing the number of object categories. Then, a depthwise separable convolution layer is used for feature extraction, reducing the number of parameters while improving computational efficiency. Finally, a Conv2d layer outputs the predicted object categories.

By using depthwise separable convolution layers in separate classification and bounding box regression branches, the EDSA detection head reduces the model’s parameters while improving detection accuracy, addressing the complexity of the regression and classification tasks in the original detection head. This is especially important for complex tasks, where it leads to more precise object localization and recognition.
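To make the parameter saving of depthwise separable convolution concrete, the sketch below compares weight counts for a standard 3 × 3 convolution and its depthwise separable counterpart. The 128 → 64 channel setting mirrors the regression branch described above; bias terms and normalization layers are ignored, so the numbers are illustrative rather than the exact counts of the EDSA head.

```python
def conv_params(c_in, c_out, k=3):
    """Weights in a standard k x k convolution (no bias)."""
    return k * k * c_in * c_out


def dsconv_params(c_in, c_out, k=3):
    """Weights in a depthwise k x k convolution plus a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


standard = conv_params(128, 64)
separable = dsconv_params(128, 64)
print(standard)                        # 73728
print(separable)                       # 9344
print(round(standard / separable, 2))  # 7.89
```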

2.2.4. Loss Function SIoU

YOLOv8 computes its loss in two components: classification and regression. It adopts BCE Loss for classification, while the regression branch applies Distribution Focal Loss (DFL) along with CIoU Loss.

The GIoU loss [38] normalizes coordinates using IoU and solves optimization issues when IoU equals 0. Based on GIoU, DIoU adds the center point distance but ignores aspect ratio. CIoU [39] improves regression accuracy by adding a penalty term and considering the aspect ratio between the predicted and ground truth boxes. The formula for the penalty term is as follows:

$L_{DIoU} = \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \alpha v,$

(4)

$L_{CIoU} = 1 - L_{IoU} + L_{DIoU}.$

(5)

Here, $L_{IoU}$ is the IoU of the two boxes, c is the diagonal length of the smallest box enclosing both boxes, and v is the parameter used to measure the aspect ratio consistency, which is defined as follows:

$v = \frac{4}{\pi^{2}} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right)^{2}.$

(6)

In the above, $\alpha$ is the parameter used for weighting, and its definition is as follows:

$\alpha = \frac{v}{1 - L_{IoU} + v}.$

(7)

Here, w and h denote the width and height of the predicted box; $w^{gt}$ and $h^{gt}$ are those of the ground truth; b and $b^{gt}$ are their center points; and $\rho$ is the Euclidean distance between them.
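Equations (4)–(7) can be written out as a short function. This is a minimal sketch, not the Ultralytics implementation: boxes are assumed to be (cx, cy, w, h) tuples, c is taken as the diagonal of the smallest enclosing box, and a small `eps` is added to the denominator of α to avoid division by zero for identical boxes.

```python
import math


def iou_xywh(box, gt):
    """IoU of two boxes given as (cx, cy, w, h)."""
    bx1, bx2 = box[0] - box[2] / 2, box[0] + box[2] / 2
    by1, by2 = box[1] - box[3] / 2, box[1] + box[3] / 2
    gx1, gx2 = gt[0] - gt[2] / 2, gt[0] + gt[2] / 2
    gy1, gy2 = gt[1] - gt[3] / 2, gt[1] + gt[3] / 2
    inter_w = max(0.0, min(bx2, gx2) - max(bx1, gx1))
    inter_h = max(0.0, min(by2, gy2) - max(by1, gy1))
    inter = inter_w * inter_h
    union = box[2] * box[3] + gt[2] * gt[3] - inter
    return inter / union


def ciou_loss(box, gt, eps=1e-9):
    """CIoU loss following Equations (4)-(7)."""
    cx, cy, w, h = box
    gx, gy, gw, gh = gt
    iou = iou_xywh(box, gt)
    rho2 = (gx - cx) ** 2 + (gy - cy) ** 2  # squared center distance
    enc_w = max(cx + w / 2, gx + gw / 2) - min(cx - w / 2, gx - gw / 2)
    enc_h = max(cy + h / 2, gy + gh / 2) - min(cy - h / 2, gy - gh / 2)
    c2 = enc_w ** 2 + enc_h ** 2            # squared enclosing-box diagonal
    v = 4 / math.pi ** 2 * (math.atan(gw / gh) - math.atan(w / h)) ** 2
    alpha = v / (1 - iou + v + eps)         # eps is an added safeguard
    return 1 - iou + rho2 / c2 + alpha * v


print(round(ciou_loss((0, 0, 2, 2), (1, 0, 2, 2)), 4))  # 0.7436
```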

To improve the bounding box alignment in aneurysm detection, the SIoU loss [40] is introduced, extending CIoU by considering the angle and direction of the boxes. It incorporates the angle between the predicted and ground truth bounding box vectors. This helps improve training speed and prediction accuracy by quickly moving the predicted box to the nearest axis, which is followed by regression of only one coordinate (X or Y). The angle penalty cost effectively reduces the total degrees of freedom of the loss.

The SIoU loss function consists of three parts.

The first part is the Angle Cost, which is defined as follows:

$\Lambda = 1 - 2\sin^{2}\left(\arcsin(x) - \frac{\pi}{4}\right),$

(8)

$x = \frac{c_{h}}{\sigma} = \sin(\alpha),$

(9)

$\sigma = \sqrt{(b_{c_{x}}^{gt} - b_{c_{x}})^{2} + (b_{c_{y}}^{gt} - b_{c_{y}})^{2}},$

(10)

$c_{h} = \max(b_{c_{y}}^{gt}, b_{c_{y}}) - \min(b_{c_{y}}^{gt}, b_{c_{y}}).$

(11)

The second part is the Distance Cost. Taking the Angle Cost into account, the Distance Cost is redefined as follows:

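$\Delta = \sum_{t = x, y} (1 - e^{-\gamma \rho_{t}}),$

(12)

$\rho_{x} = \left( \frac{b_{c_{x}}^{gt} - b_{c_{x}}}{c_{w}} \right)^{2}, \quad \rho_{y} = \left( \frac{b_{c_{y}}^{gt} - b_{c_{y}}}{c_{h}} \right)^{2},$

(13)

$\gamma = 2 - \Lambda.$

(14)

In Equation (13), $c_{w}$ and $c_{h}$ denote the width and height of the smallest box enclosing the predicted and ground truth boxes [40]; this $c_{h}$ is distinct from the vertical center offset defined in Equation (11). When $\alpha \to 0$, the contribution of the Distance Cost is greatly reduced. As the angle increases, $\gamma$ gives the distance term increasing priority.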

The third part is the Shape Cost, and its formula is as follows:

$\Omega = \sum_{t = w, h} (1 - e^{-\omega_{t}})^{\theta}, \quad \theta = 4,$

(15)

$\omega_{w} = \frac{|w - w^{gt}|}{\max(w, w^{gt})}, \quad \omega_{h} = \frac{|h - h^{gt}|}{\max(h, h^{gt})},$

(16)

where $\theta$ defines the Shape Cost, and its value is specific to the dataset (here, the aneurysm dataset). It controls how much attention is paid to the Shape Cost.

Finally, the SIoU loss function is as follows:

$L_{box} = 1 - IoU + \frac{\Delta + \Omega}{2}.$

(17)
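The three cost terms and the final loss in Equations (8)–(17) can be sketched as follows. Several details are assumptions rather than statements about the authors’ implementation: boxes are (cx, cy, w, h) tuples, the ratios in the Distance Cost are squared and normalized by the enclosing-box width and height as in the SIoU reference [40], the Shape Cost exponent is θ = 4, and `eps` guards against division by zero.

```python
import math


def siou_loss(box, gt, theta=4, eps=1e-9):
    """SIoU-style box loss for boxes given as (cx, cy, w, h)."""
    cx, cy, w, h = box
    gx, gy, gw, gh = gt

    # IoU of the two boxes
    inter_w = max(0.0, min(cx + w / 2, gx + gw / 2) - max(cx - w / 2, gx - gw / 2))
    inter_h = max(0.0, min(cy + h / 2, gy + gh / 2) - max(cy - h / 2, gy - gh / 2))
    inter = inter_w * inter_h
    iou = inter / (w * h + gw * gh - inter)

    # Width and height of the smallest enclosing box (assumed meaning of c_w, c_h in Eq. 13)
    enc_w = max(cx + w / 2, gx + gw / 2) - min(cx - w / 2, gx - gw / 2)
    enc_h = max(cy + h / 2, gy + gh / 2) - min(cy - h / 2, gy - gh / 2)

    # Angle Cost, Eqs. (8)-(11)
    sigma = math.hypot(gx - cx, gy - cy)
    center_dy = abs(gy - cy)                    # c_h of Eq. (11)
    x = min(1.0, center_dy / (sigma + eps))
    angle_cost = 1 - 2 * math.sin(math.asin(x) - math.pi / 4) ** 2

    # Distance Cost, Eqs. (12)-(14), with squared ratios
    gamma = 2 - angle_cost
    rho_x = ((gx - cx) / (enc_w + eps)) ** 2
    rho_y = ((gy - cy) / (enc_h + eps)) ** 2
    distance_cost = (1 - math.exp(-gamma * rho_x)) + (1 - math.exp(-gamma * rho_y))

    # Shape Cost, Eqs. (15)-(16)
    omega_w = abs(w - gw) / max(w, gw)
    omega_h = abs(h - gh) / max(h, gh)
    shape_cost = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta

    # Final loss, Eq. (17)
    return 1 - iou + (distance_cost + shape_cost) / 2


print(round(siou_loss((0, 0, 2, 2), (1, 0, 2, 2)), 4))  # 0.7663
```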

3. Results

3.1. Experimental Setup and Dataset Preparation

The model was trained with a batch size of 64 using the SGD optimizer, an initial learning rate of 0.01, a weight decay of 0.0005, and 300 training epochs.

This study employed an NVIDIA GeForce RTX 3090 GPU (24 GB) as the hardware platform. The software setup consisted of Ubuntu 20.04, PyTorch 1.9.0, Python 3.8, and CUDA 11.3 as the primary deep learning environment.

The dataset used consists of DSA images of intracranial aneurysms from the First Affiliated Hospital of Zhejiang University School of Medicine, involving 120 patients who were admitted in 2023. After screening, a total of 867 images were obtained. These images cover the various locations and morphologies of aneurysms. In the annotation process, we collaborated with professional neuro-interventional doctors. With the help of the LabelImg tool, the aneurysm regions were accurately marked with rectangular bounding boxes. The aneurysm lesions are classified into two types: sidewall aneurysms and bifurcation aneurysms, as shown in Figure 7. The former are mostly located on one side of the arterial wall, while the latter are commonly found at vascular bifurcations. Bifurcation aneurysms have a complex shape, making them more difficult to identify. Classifying and annotating these two types of aneurysms helps to improve the recognition ability of the model.

The dataset is split into training and test sets in an 8:2 ratio, with 694 and 173 images included in each set, respectively. The data of the same patient are only included in either the training set or the test set to prevent data leakage. Meanwhile, the stratified sampling method was adopted, which stratified the images according to the types and sizes of aneurysms to ensure that the distributions of the training set and the test set were consistent. When training the model, the input image size is set to 640 × 640.
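A patient-level split of this kind can be sketched as below. The record format (image name, patient ID), the file names, and the function name are hypothetical. The sketch only enforces that no patient appears in both sets; the stratification by aneurysm type and size described above is not implemented.

```python
import random
from collections import defaultdict


def split_by_patient(records, test_fraction=0.2, seed=0):
    """Split (image_name, patient_id) records so that patients never overlap."""
    by_patient = defaultdict(list)
    for image, patient in records:
        by_patient[patient].append(image)

    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)          # reproducible shuffle
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])

    train = [img for p in patients if p not in test_patients for img in by_patient[p]]
    test = [img for p in patients if p in test_patients for img in by_patient[p]]
    return train, test, test_patients


# Toy data: 10 hypothetical patients with 3 images each
records = [(f"p{p}_img{i}.png", p) for p in range(10) for i in range(3)]
train, test, test_patients = split_by_patient(records)
print(len(train), len(test))  # 24 6
```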

Besides preparing and splitting the dataset, we applied several data augmentation strategies to improve the model’s generalization ability and robustness. In terms of geometric transformations, we randomly rotated the images within the range of ±15° to simulate the differences in shooting angles during actual imaging, and we then scaled the images within the range of 0.8 to 1.2 times to enhance the model’s detection performance across various object sizes. During optical transformations, we added Gaussian noise with a standard deviation of $\sigma = 0.01$ to the images to simulate common noise in medical images, and we then adjusted the contrast at the same time to adapt to different imaging devices. In feature space augmentation, we randomly occluded regions of 16 × 16 pixels with a probability of 0.3, thereby enhancing the model’s detection ability in complex backgrounds.
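Two of these augmentations, Gaussian noise and random occlusion, can be sketched in plain Python on a grayscale image stored as nested lists. The sketch assumes pixel values normalized to [0, 1], clips noisy values to that range, and fills occluded regions with 0; none of these choices is specified in the text, and rotation, scaling, and contrast adjustment are omitted.

```python
import random


def add_gaussian_noise(image, sigma=0.01, rng=None):
    """Add zero-mean Gaussian noise and clip to [0, 1] (clipping is an assumption)."""
    rng = rng or random.Random()
    return [[min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in image]


def random_occlude(image, size=16, p=0.3, fill=0.0, rng=None):
    """With probability p, overwrite one random size x size region with `fill`."""
    rng = rng or random.Random()
    out = [list(row) for row in image]
    if rng.random() >= p:
        return out                      # no occlusion this time
    top = rng.randint(0, len(out) - size)
    left = rng.randint(0, len(out[0]) - size)
    for i in range(top, top + size):
        for j in range(left, left + size):
            out[i][j] = fill
    return out


image = [[0.5] * 64 for _ in range(64)]
noisy = add_gaussian_noise(image, rng=random.Random(0))
occluded = random_occlude(image, p=1.0, rng=random.Random(0))
print(sum(px == 0.0 for row in occluded for px in row))  # 256
```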

3.2. Experimental Evaluation Index

This study evaluated the model’s detection performance using the metrics of precision (P), recall (R), mAP, number of parameters (params), and FLOPs. Precision (P) measures the ratio of the true positive predictions to all predicted positives, while recall (R) indicates the proportion of detected positives out of actual positives. Mean average precision (mAP) offers a detailed assessment of the model’s performance across categories, effectively evaluating its detection capability in diverse scenarios. In the following formulas, TP denotes the number of correctly predicted bounding boxes, FP the number of false positives, FN the number of missed positives, AP is the average precision for each category, mAP is the mean of the AP values across categories, and k is the number of categories.

$P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN},$

(18)

$AP = \int_{0}^{1} P(R) \, dR, \quad mAP = \frac{1}{k} \sum_{i = 1}^{k} AP_{i}.$

(19)
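Equations (18) and (19) can be illustrated with a short sketch. The integral for AP is approximated here with the trapezoidal rule over sampled (recall, precision) points, which is an assumption: detection toolkits often use an interpolated precision envelope instead. The sample counts and curve points are made up for illustration.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from TP, FP, and FN counts (Eq. 18)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(recalls, precisions):
    """Area under the P-R curve via the trapezoidal rule; recalls sorted ascending."""
    area = 0.0
    for i in range(1, len(recalls)):
        width = recalls[i] - recalls[i - 1]
        area += width * (precisions[i] + precisions[i - 1]) / 2
    return area


def mean_average_precision(ap_per_class):
    """Mean of per-class AP values (Eq. 19)."""
    return sum(ap_per_class) / len(ap_per_class)


print(precision_recall(tp=8, fp=2, fn=2))                    # (0.8, 0.8)
print(average_precision([0.0, 0.5, 1.0], [1.0, 1.0, 0.5]))   # 0.875
print(round(mean_average_precision([0.834, 0.814]), 3))      # 0.824
```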

Additionally, we considered model parameters (params) and floating point operations (FLOPs) as key performance metrics. The number of parameters reflects the model’s complexity and memory usage. A smaller number of parameters generally indicates a more lightweight model, making it more suitable for deployment in resource-constrained environments. FLOPs, on the other hand, measure the model’s computational demand. Lower FLOPs suggest improved real-time performance. Considering these metrics together allows for a more comprehensive evaluation of the model, confirming its practicality and efficiency in real-world applications.

3.3. Data Analysis

This study evaluated the performance of the YOLOv8n and AS-YOLO models on the DSA intracranial aneurysm dataset. Detection results for each category are shown in Table 5, and example detections from the two algorithms are shown in Figure 8.

The experiments show that AS-YOLO raises mAP@0.5 from 0.775 to 0.843, a gain of 6.8 percentage points over the original YOLOv8 model (Table 5). For the two aneurysm categories, AS-YOLO reaches scores of 0.834 and 0.814, compared with 0.813 and 0.781 for the original algorithm. Furthermore, as shown in Figure 9, the P-R curves indicate that AS-YOLO improves performance across all categories. Particularly in the more challenging task of detecting sidewall-type targets, AS-YOLO exhibits stronger robustness and higher detection accuracy, which supports the benefit of the improved algorithm in complex scenarios.

Figure 10 compares heatmaps generated by YOLOv8 and AS-YOLO for intracranial aneurysm targets. The AS-YOLO heatmaps highlight the aneurysm regions more prominently, which indicates that the improved model focuses more precisely on targets and suppresses attention to the background.

To validate the contribution of each module, this study designed a series of ablation experiments. The YOLOv8n network was used as the baseline model, and the CFNeXt module, MLFF module, EDSA module, and SIoU loss were introduced individually and in combination to evaluate their impact on detection performance. Table 6 presents the ablation results for the bifurcation-type and sidewall-type detection tasks.

The experimental data support the following conclusions.

CFNeXt alone: By enhancing multi-scale feature extraction through cascade fusion, CFNeXt improves mAP@0.5 and mAP@0.5:0.95 from 0.815 and 0.444 to 0.860 and 0.452, respectively. It also streamlines the computational architecture and eliminates redundant calculations, reducing parameters from 3.006 M to 2.662 M and GFLOPs from 8.1 to 7.2.
CFNeXt + MLFF: Adding MLFF enables hybrid pooling and cross-scale interaction, enhancing the fusion of high-level and low-level features. Leveraging CFNeXt's rich multi-scale feature maps, MLFF further explores inter-scale feature relationships and optimizes feature representation. This synergy raises mAP@0.5 and mAP@0.5:0.95 to 0.865 and 0.465, respectively.
CFNeXt + MLFF + EDSA: EDSA filters and streamlines the features produced by CFNeXt and MLFF, removing redundant information and reducing computational complexity. Detection accuracy is essentially maintained (mAP@0.5 of 0.864 and mAP@0.5:0.95 of 0.463), while GFLOPs fall from 7.4 to 7.2.
CFNeXt + MLFF + EDSA + SIoU: The final integrated model achieves the best performance, with mAP@0.5 of 0.868 and mAP@0.5:0.95 of 0.468 at 2.759 M parameters and 7.1 GFLOPs. SIoU improves bounding box regression by refining the IoU-based loss. Building on the feature extraction and fusion provided by CFNeXt, MLFF, and EDSA, it further improves detection precision without adding computational cost, giving the best balance between performance and efficiency.
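Because both the SIoU loss and the mAP thresholds rest on the intersection-over-union of two boxes, a minimal sketch of the plain IoU computation is given here. It does not reproduce the additional penalty terms of SIoU defined in Section 2.2.4; the function name and the (x1, y1, x2, y2) corner format are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Plain intersection-over-union of two axis-aligned boxes.

    Boxes are assumed to be (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

Two 10 × 10 boxes offset by 5 pixels in each direction overlap in a 5 × 5 region, giving an IoU of 25/175 ≈ 0.143, well below the 0.5 threshold used for mAP@0.5.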

3.4. Comparative Analysis

We conducted additional experiments using the Brain-Tumor dataset from Ultralytics to test AS-YOLO's generalization on medical images. The dataset consists of MRI images and is mainly used for brain tumor detection. Compared with DSA images, these MRI images have richer texture features and more complex background information. The results are shown in Table 7.

On the Brain-Tumor MRI dataset, AS-YOLO increased mAP@0.5 by 3.6 percentage points (0.486 to 0.522) and precision by 19.4 percentage points (0.736 to 0.930), while reducing the number of parameters by 9.7% (3.006 M to 2.714 M). These results suggest that the architecture generalizes and adapts well to MRI images, opening possibilities for cross-modal medical image analysis.
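The gains quoted for Table 7 mix two kinds of change: an absolute difference for scores in [0, 1] and a relative change for the parameter count. The short sketch below, with hypothetical helper names, makes the distinction explicit using the Table 7 values.

```python
def absolute_gain_points(baseline, improved):
    """Difference between two scores in [0, 1], expressed in percentage points."""
    return (improved - baseline) * 100.0


def relative_change_percent(baseline, improved):
    """Change relative to the baseline value, expressed in percent."""
    return (improved - baseline) / baseline * 100.0


# Values from Table 7 (YOLOv8 baseline, AS-YOLO).
map50_points = absolute_gain_points(0.486, 0.522)          # about 3.6 points
precision_points = absolute_gain_points(0.736, 0.930)      # about 19.4 points
params_change = relative_change_percent(3.006, 2.714)      # about -9.7 %
map50_relative = relative_change_percent(0.486, 0.522)     # about 7.4 %
```

The same mAP@0.5 improvement is therefore 3.6 percentage points in absolute terms but about 7.4% relative to the baseline, so the convention should be stated whenever such figures are compared.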

This study conducted comparative experiments on a predefined dataset, evaluating AS-YOLO against mainstream object detection algorithms (Faster R-CNN [41], YOLOv3, YOLOv5, YOLOv6, and YOLOv8) and medical-specific models (Mask R-CNN, U-Net, and TransUNet). All of the models ran for 300 epochs under the same environment and dataset to ensure fair comparison. Table 8 shows the mAP@0.5 and mAP@0.5:0.95 for both bifurcation and sidewall detection tasks.

The results show that AS-YOLO achieved an mAP@0.5 of 0.868 in bifurcation-type aneurysm detection, the highest among the compared methods, and an mAP@0.5:0.95 of 0.468, highlighting its advantage in high-precision detection. Faster R-CNN, YOLOv3, and U-Net performed poorly on both metrics. TransUNet reached an mAP@0.5 of 0.834 for bifurcation-type aneurysms, which is close to that of AS-YOLO. For sidewall-type aneurysms, TransUNet achieved the highest mAP@0.5 (0.790), with AS-YOLO second at 0.760, while AS-YOLO achieved the highest mAP@0.5:0.95 (0.379). These results indicate that AS-YOLO improves detection accuracy and generalizes well in multi-scale object detection tasks.

To further evaluate the effectiveness of AS-YOLO, this study conducted a quantitative analysis of mainstream object detection algorithms (Table 9). AS-YOLO achieved the highest detection accuracy (an mAP@0.5 of 0.843 and an mAP@0.5:0.95 of 0.428) with a low parameter count and low computational cost. Its mAP@0.5 exceeds that of YOLOv3, YOLOv5, YOLOv6, YOLOv8, and YOLOv11 by 12.0, 8.5, 13.3, 6.8, and 7.1 percentage points, respectively. AS-YOLO also requires far fewer parameters and GFLOPs than medical-specific algorithms such as Mask R-CNN and TransUNet, giving it advantages in both detection accuracy and computational efficiency.

As shown in Table 9 and Figure 11, AS-YOLO has a small number of parameters and low computational complexity, making it well suited to deployment on embedded devices. In actual tests, its inference speed is 99.6 FPS, well above the 30.2 FPS of Mask R-CNN and the 24.3 FPS of TransUNet, thus meeting the requirements for real-time detection. The inference speed of AS-YOLO remains close to the 112.6 FPS of YOLOv8, while its GFLOPs are 12.3% lower (7.1 versus 8.1), which favors operation in low-power scenarios. AS-YOLO therefore achieves a good balance among detection accuracy, inference speed, and model size.
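The frame rates in Table 9 can also be read as per-frame latency. The sketch below performs this conversion under the assumption that frames are processed one at a time, so that latency is simply the reciprocal of the frame rate; the helper name `latency_ms` is hypothetical.

```python
def latency_ms(fps):
    """Per-frame latency in milliseconds, assuming sequential single-frame inference."""
    return 1000.0 / fps


# Frame rates taken from Table 9.
fps_by_model = {
    "AS-YOLO": 99.6,
    "YOLOv8": 112.6,
    "Mask R-CNN": 30.2,
    "TransUNet": 24.3,
}

latencies = {name: latency_ms(fps) for name, fps in fps_by_model.items()}

# Speed of AS-YOLO relative to the medical-specific models.
speedup_vs_mask_rcnn = fps_by_model["AS-YOLO"] / fps_by_model["Mask R-CNN"]
speedup_vs_transunet = fps_by_model["AS-YOLO"] / fps_by_model["TransUNet"]
```

Under this assumption, AS-YOLO needs about 10 ms per frame, compared with about 33 ms for Mask R-CNN and about 41 ms for TransUNet.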

Based on the experimental results and the detection results shown in Figure 12, AS-YOLO outperforms the other mainstream algorithms in overall detection accuracy while keeping computational cost low. It performs consistently in both bifurcation-type and sidewall-type detection tasks, meeting the requirements for intracranial aneurysm detection and balancing accuracy and speed. Its low parameter count and computational complexity suit embedded deployment, and its inference speed of 99.6 FPS supports real-time intracranial aneurysm detection.

4. Conclusions

This paper proposes AS-YOLO, a lightweight intracranial aneurysm detection algorithm built on an improved YOLOv8n model. The Cascade Fusion Network (CFNeXt) backbone enhances multi-scale feature extraction and improves recognition of aneurysms of different sizes. The MLFF module integrates shallow and deep feature information, which enhances detection performance for small aneurysms. An efficient depthwise separable convolutional aggregation (EDSA) detection head reduces computational complexity while maintaining detection accuracy. The SIoU loss function optimizes the alignment precision of the bounding boxes and accelerates training convergence.

AS-YOLO improves detection accuracy by 2.6%, reduces the parameter count by 0.247 M, and lowers the computational load by 1.0 GFLOPs compared to the original version. The algorithm performs well in detecting aneurysms of different sizes. Its lightweight design reduces model parameters and computational overhead, making it suitable for embedded devices, where it can achieve real-time detection with low latency and high accuracy even in resource-constrained environments.

Future work includes refining the model structure to boost generalization in more challenging medical contexts. We will also incorporate more datasets and clinical validations to improve detection accuracy and model efficiency while exploring more optimized embedded deployment solutions to support efficient inference in edge computing environments.

Author Contributions

Conceptualization, J.Y.; methodology, J.Y.; data curation, C.W.; formal analysis, Z.C.; investigation, Z.C.; resources, J.T.; funding acquisition, J.T.; writing—original draft preparation, J.Y.; supervision, Y.C.; project administration, Y.C. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Key Research and Development Program of Zhejiang Province (Project No. 2025C01135).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Pan, H.; Wang, Y.; Li, Z.; Chu, X.; Teng, B.; Gao, H. A complete scheme for multi-character classification using EEG signals from speech imagery. IEEE Trans. Biomed. Eng. 2024, 71, 2454–2462.
2. Pan, H.; Li, Z.; Fu, Y.; Qin, X.; Hu, J. Reconstructing visual stimulus representation from EEG signals based on deep visual representation model. IEEE Trans. Hum.-Mach. Syst. 2024, 54, 711–722.
3. Zhang, C.; Zhang, S.; Yin, Y.; Wang, L.; Li, L.; Lan, C.; Shi, J.; Jiang, Z.; Ge, H.; Li, X.; et al. Clot removAl with or without decompRessive craniectomy under ICP monitoring for supratentorial IntraCerebral Hemorrhage (CARICH): A randomized controlled trial. Int. J. Surg. 2024, 110, 4804–4809.
4. Chalouhi, N.; Hoh, B.L.; Hasan, D. Review of cerebral aneurysm formation, growth, and rupture. Stroke 2013, 44, 3613–3622.
5. Alwalid, O.; Long, X.; Xie, M.; Han, P. Artificial intelligence applications in intracranial aneurysm: Achievements, challenges, and opportunities. Acad. Radiol. 2022, 29 (Suppl. 3), S201–S214.
6. Heit, J.J.; Honce, J.M.; Yedavalli, V.S.; Baccin, C.E.; Tatit, R.T.; Copeland, K.; Timpone, V.M. RAPID Aneurysm: Artificial intelligence for unruptured cerebral aneurysm detection on CT angiography. J. Stroke Cerebrovasc. Dis. 2022, 31, 106690.
7. Wardlaw, J.M.; White, P.M. The detection and management of unruptured intracranial aneurysms. Brain 2000, 123, 205–221.
8. Menghini, V.V.; Brown, R.D., Jr.; Sicks, J.D.; O’Fallon, W.M.; Wiebers, D.O. Clinical manifestations and survival rates among patients with saccular intracranial aneurysms: Population-based study in Olmsted County, Minnesota, 1965 to 1995. Neurosurgery 2001, 49, 251–256.
9. van Amerongen, M.J.; Boogaarts, H.D.; de Vries, J.; Verbeek, A.L.; Meijer, F.J.; Prokop, M.; Bartels, R.H. MRA versus DSA for follow-up of coiled intracranial aneurysms: A meta-analysis. Am. J. Neuroradiol. 2014, 35, 1655–1661.
10. Rahmany, I.; Laajili, S.; Khlifa, N. Automated computerized method for the detection of unruptured cerebral aneurysms in DSA images. Curr. Med. Imaging 2018, 14, 771–777.
11. Nakao, T.; Hanaoka, S.; Nomura, Y.; Sato, I.; Nemoto, M.; Miki, S.; Abe, O. Deep neural network-based computer-assisted detection of cerebral aneurysms in MR angiography. J. Magn. Reson. Imaging 2017, 47, 948–953.
12. Claux, F.; Baudouin, M.; Bogey, C.; Rouchaud, A. Dense, deep learning-based intracranial aneurysm detection on TOF MRI using two-stage regularized U-Net. J. Neuroradiol. 2023, 50, 9–15.
13. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Proceedings of the 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III; Springer International Publishing: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
14. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
15. Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. TransUNet: Transformers make strong encoders for medical image segmentation. arXiv 2021, arXiv:2102.04306.
16. Qiu, J.; Tan, G.; Lin, Y.; Guan, J.; Dai, Z.; Wang, F.; Wu, R. Automated detection of intracranial artery stenosis and occlusion in magnetic resonance angiography: A preliminary study based on deep learning. Magn. Reson. Imaging 2022, 94, 105–111.
17. Redmon, J. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
18. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
19. Farhadi, A.; Redmon, J. Yolov3: An incremental improvement. In Computer Vision and Pattern Recognition; Springer: Berlin/Heidelberg, Germany, 2018.
20. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
21. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
22. Wang, C.Y.; Yeh, I.H.; Liao, H.Y.M. Yolov9: Learning what you want to learn using programmable gradient information. arXiv 2024, arXiv:2402.13616.
23. Ultralytics. Ultralytics YOLOv5 Architecture. 2023. Available online: https://docs.ultralytics.com/yolov5/tutorials/architecture_description (accessed on 24 April 2024).
24. Ma, N.; Zhang, X.; Zheng, H.T.; Sun, J. Shufflenet v2: Practical guidelines for efficient CNN architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 116–131.
25. Lou, H.; Duan, X.; Guo, J.; Liu, H.; Gu, J.; Bi, L.; Chen, H. DC-YOLOv8: Small-Size Object Detection Algorithm Based on Camera Sensor. Electronics 2023, 12, 2323.
26. Zhang, G.; Li, Z.; Li, J.; Hu, X. Cfnet: Cascade fusion network for dense prediction. arXiv 2023, arXiv:2302.06052.
27. Elfwing, S.; Uchibe, E.; Doya, K. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Netw. 2018, 107, 3–11.
28. Yu, F. Multi-scale context aggregation by dilated convolutions. arXiv 2015, arXiv:1511.07122.
29. Ba, J.L. Layer normalization. arXiv 2016, arXiv:1607.06450.
30. Ioffe, S. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv 2015, arXiv:1502.03167.
31. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
32. Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
33. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768.
34. Kang, M.; Ting, C.M.; Ting, F.F.; Phan, R.C.W. ASF-YOLO: A novel YOLO model with attentional scale sequence fusion for cell instance segmentation. Image Vis. Comput. 2024, 147, 105057.
35. Rukundo, O.; Cao, H. Nearest neighbor value interpolation. Int. J. Adv. Comput. Sci. Appl. 2012, 3, 25–30.
36. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430.
37. Zheng, F.; Chen, X.; Liu, W.; Li, H.; Lei, Y.; He, J.; Zhou, S. SMAFormer: Synergistic Multi-Attention Transformer for Medical Image Segmentation. arXiv 2024, arXiv:2409.00346.
38. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12993–13000.
39. Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 658–666.
40. Lang, X.; Ren, Z.; Wan, D.; Zhang, Y.; Shu, S. MR-YOLO: An improved YOLOv5 network for detecting magnetic ring surface defects. Sensors 2022, 22, 9897.
41. Ren, S. Faster R-CNN: Towards real-time object detection with region proposal networks. arXiv 2015, arXiv:1506.01497.

Figure 1. YOLOv8 network architecture.

Figure 2. AS-YOLO network architecture.

Figure 3. CFNeXt network architecture.

Figure 5. MLFF and MFFPN network architecture. (a) MLFF architecture. (b) MFFPN architecture.

Figure 6. Comparisons of the YOLOv8 and EDSA detection head structures. (a) YOLOv8 detection head architecture. (b) Efficient depthwise separable convolutional aggregation detection head architecture.

Figure 7. DSA images of sidewall-type and bifurcation-type aneurysms. (a) Sidewall aneurysms. (b) Bifurcation aneurysms.

Figure 8. Detection results in a complex scenario where an aneurysm is occluded by blood vessels. The detection accuracy of AS-YOLO is 28.1% higher than that of YOLOv8. This suggests that the architecture, together with the occlusion simulated during data augmentation, helps AS-YOLO capture additional features of occluded targets. (a) YOLOv8 detection result. (b) AS-YOLO detection result.

Figure 9. PR curve comparisons of YOLOv8 and AS-YOLO. (a) YOLOv8 PR curve. (b) AS-YOLO PR curve.

Figure 10. Heatmap comparisons of YOLOv8 and AS-YOLO. (a,c) Heatmaps of the same image, as generated by YOLOv8 and AS-YOLO, respectively. (b,d) Heatmaps of another image, as generated by YOLOv8 and AS-YOLO, respectively. (a) YOLOv8 Heatmap 1. (b) YOLOv8 Heatmap 2. (c) AS-YOLO Heatmap 1. (d) AS-YOLO Heatmap 2.

Figure 11. Performance comparison chart of each model.

Figure 12. Detection outcomes of the different methods on sidewall-type aneurysm images. Each subfigure visualizes how the respective method performs. AS-YOLO showed the most accurate detection. (a) Input image. (b) Faster R-CNN results. (c) YOLOv6 results. (d) YOLOv8 results. (e) TransUNet results. (f) AS-YOLO results.

Table 1. Model performance under different dilation factors.

Configuration	mAP0.5	GFLOPs	Params (M)	FPS
CFocalNeXt (r = 1)	0.786	6.8	2.581	103.6
CFocalNeXt (r = 3)	0.805	7.2	2.662	98.2
CFocalNeXt (r = 5)	0.812	7.5	2.796	94.5

Table 2. Comparisons of the model performance with BatchNorm and LayerNorm.

Configuration	mAP0.5	GFLOPs	Params (M)	FPS
BatchNorm	0.793	7.2	2.662	98.1
LayerNorm	0.805	7.2	2.662	98.2

Table 3. Experimental performances of the different pooling strategies.

Strategy	mAP0.5	GFLOPs	Params (M)	FPS
Max Pool	0.784	8.1	3.020	86.4
Avg Pool	0.781	8.0	3.000	86.7
Max + Avg (Ours)	0.788	8.3	3.0572	85.2

Table 4. Experimental performances of the different upsampling methods.

Methods	mAP0.5	GFLOPs	Params (M)	FPS
NNI	0.788	8.3	3.057	85.2
TC	0.794	9.5	3.350	78.6
AG	0.796	10.1	3.680	72.1

NNI, TC, and AG represent nearest neighbor interpolation, transposed convolution, and attention guidance, respectively.

Table 5. Detection result comparisons of YOLOv8n and AS-YOLO.

Method	Sidewall Precision	Bifurcation Precision	mAP0.5	mAP0.5:0.95
YOLOv8	0.813	0.781	0.775	0.403
AS-YOLO	0.834	0.814	0.843	0.423

Table 6. Results of the ablation experiments.

CFNeXt	MLFF	EDSA	SIoU	Sidewall mAP0.5	Sidewall mAP0.5:0.95	Bifurcation mAP0.5	Bifurcation mAP0.5:0.95	Params (M)	GFLOPs
				0.815	0.444	0.735	0.362	3.006	8.1
✓				0.860	0.452	0.750	0.375	2.662	7.2
	✓			0.835	0.455	0.741	0.373	3.057	8.3
		✓		0.813	0.440	0.727	0.355	2.707	6.9
			✓	0.825	0.452	0.741	0.369	3.006	8.1
✓	✓			0.865	0.465	0.755	0.377	2.714	7.4
✓		✓		0.820	0.450	0.738	0.366	2.706	6.8
✓	✓	✓		0.864	0.463	0.748	0.372	2.784	7.2
✓	✓	✓	✓	0.868	0.468	0.760	0.379	2.759	7.1

✓ indicates the module used.

Table 7. Results obtained on the Brain-Tumor dataset.

Method	GFLOPs	Parameters (M)	mAP0.5	P	R
YOLOv8	8.1	3.006	0.486	0.736	0.97
AS-YOLO	7.4	2.714	0.522	0.930	0.98

Table 8. Per-category comparison of AS-YOLO with other detection and segmentation models.

Type	Result	Faster R-CNN	YOLOv3	YOLOv5	YOLOv6	YOLOv8	Mask R-CNN	U-Net	TransUNet	AS-YOLO
Bifurcation	mAP0.5	0.736	0.764	0.792	0.793	0.815	0.817	0.786	0.834	0.868
Bifurcation	mAP0.5:0.95	0.384	0.401	0.431	0.436	0.444	0.446	0.426	0.454	0.468
Sidewall	mAP0.5	0.654	0.682	0.724	0.725	0.735	0.739	0.724	0.790	0.760
Sidewall	mAP0.5:0.95	0.344	0.356	0.369	0.361	0.362	0.366	0.364	0.368	0.379

Table 9. Overall comparison of model complexity, detection accuracy, and inference speed.

Method	Params (M)	GFLOPs	mAP0.5	mAP0.5:0.95	FPS
Faster R-CNN	29.269	124.4	0.695	0.364	34.3
YOLOv3	38.269	82.1	0.723	0.3875	63.5
YOLOv5	2.503	7.1	0.758	0.405	86.3
YOLOv6	4.234	11.8	0.710	0.404	104.2
YOLOv8	3.006	8.1	0.775	0.403	112.6
YOLOv11	2.857	7.7	0.772	0.398	112.4
Mask R-CNN	28.524	130.6	0.778	0.406	30.2
U-Net	7.838	55.3	0.755	0.395	42.0
TransUNet	32.623	153.7	0.812	0.411	24.3
AS-YOLO	2.759	7.1	0.843	0.428	99.6

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

