1. Introduction
Unmanned Aerial Vehicles (UAVs), which are aircraft operated remotely or controlled by autonomous flight programs, have gained widespread application across various fields such as military [1], communication [2], and logistics [3], owing to their flexibility and efficiency. However, with the proliferation of UAV technology, their misuse has increasingly become a critical security concern, manifesting in forms such as unauthorized incursions [4], privacy violations [5], and potential threats to critical infrastructure [6]. These escalating risks further underscore the importance and urgency of research into anti-UAV detection technologies.
The rapid advancement of deep learning technology has provided novel solutions for anti-UAV detection, with deep learning-based methods gradually becoming the mainstream research direction in this field. Recent studies have demonstrated significant progress in enhancing detection performance through attention mechanisms and efficient feature learning. For instance, Ge et al. [7] proposed Neural Attention Learning (NEAL), a gradient-driven method that refines attention maps to optimize detection without extra computational overhead, achieving significant improvements on COCO and VOC benchmarks. In applications requiring real-time processing, Chen et al. [8] introduced MANet, a multi-attention framework that directly processes compressed video streams to reduce latency, demonstrating its potential for resource-constrained UAV scenarios. Meanwhile, Phan et al. [9] addressed the challenge of limited supervision by incorporating structural attention into Transformers, showing that anatomical priors can significantly improve synthesis accuracy in unpaired learning settings, an insight applicable to UAV detection under data scarcity. These works collectively advance attention-based modeling for accuracy–efficiency trade-offs in dynamic environments.
In recent years, the YOLO [10] series of object detection algorithms has gained widespread adoption across various fields due to their outstanding efficiency and real-time performance. Academic research on YOLO-based hybrid methods has yielded a range of innovative optimization strategies and practical engineering solutions. Stefenon et al. [11] proposed a Hypertuned-YOLO approach that integrates genetic algorithm-based hyperparameter optimization with EigenCAM visual interpretation, successfully applying it to power distribution network fault localization. Experimental results demonstrate that this method achieves remarkable performance in insulator contamination detection, with an F1-score of 0.867 and mAP50 of 0.922. Singh et al. [12] developed a Pseudo-Prototypical Part Network (Ps-ProtoPNet) for insulator defect classification in high-voltage transmission lines. Through systematic comparisons of various YOLO variants, they ultimately adopted YOLOv8m as the detection framework, achieving exceptional performance with an mAP50 of 0.9950 and mAP50-95 of 0.9125. The latest research further explores the integration of YOLOv5 detection modules with Quasi-ProtoPNet classifiers, opening new technical pathways for insulator defect classification tasks [13].
Meanwhile, the YOLO series of algorithms has demonstrated broad potential for application in UAV object detection tasks. However, due to the characteristics of UAV targets, such as their small size and low pixel proportion, combined with the influence of complex background textures and similar disturbances, the detection accuracy of the YOLO series algorithms for UAV small targets is often suboptimal. To enhance the precision of the YOLO series models in detecting UAV small targets, existing approaches typically optimize the models by incorporating attention mechanisms or adding specialized detection heads tailored for small targets. For instance, Hu et al. [14] were the first to apply an improved YOLOv3 algorithm to the field of anti-UAV detection. Their algorithm introduced an additional feature map scale to predict target bounding boxes, thereby capturing more texture and contour information, effectively improving the detection capability for UAV small targets. Fang et al. [15] proposed the SEB-YOLOv8s model, which reconstructs the YOLOv8 architecture using SPD-Conv, replaces the original Faster Implementation of Cross-Stage Partial Bottleneck with 2 convolutions (C2f) module with the Attention-enhanced C2f (AttC2f) module, and optimizes the Neck part by integrating BiLevel Routing Attention. On the Anti-UAV dataset, the model achieved a high accuracy of 95.9%. Ma et al. [16] proposed a high-performance LA-YOLO network that integrated the SimAM attention mechanism and a fusion block with normalized Wasserstein distance into YOLOv5, significantly enhancing the model’s detection accuracy for UAVs in low-altitude backgrounds. Zamri et al. [17] developed the P2-YOLOv8n-ResCBAM model based on the YOLOv8n architecture, incorporating multiple attention mechanisms and a high-resolution small target detection head, which increased the mean average precision (mAP) from 90.3% to 92.6%. However, this also increased model complexity, resulting in a decrease in inference speed.
Moreover, the practical application scenarios of anti-UAV detection often require deploying detection models on embedded devices or mobile terminals with limited computational resources. However, the limitations of the YOLO series models in terms of parameter volume and computational complexity make it difficult for them to directly adapt to such resource-constrained environments. To address this, existing research typically achieves the lightweight optimization of YOLO series models by introducing lightweight network structures or employing techniques such as model pruning. For example, Niu et al. [18] utilized the lightweight image detection network MobileNetV3 to replace the original CSPDarknet53 as the backbone network in the YOLOv4 framework and integrated a Coordinate Attention (CA) module, significantly reducing the model’s parameter count while maintaining detection accuracy. Zhang et al. [19] optimized YOLOv3-SPP3 using channel pruning and shortcut layer pruning algorithms, combined with fine-tuning training, which substantially compressed the model size and improved detection speed, albeit at the cost of a reduced mean average precision (mAP). Feng et al. [20] proposed an efficient UAV detection method based on YOLOv5s, constructing a lightweight backbone network using the ShuffleNetV2 network and a Coordinate Attention mechanism, and designing a balanced neck network with a Bidirectional Feature Pyramid Network (BiFPN) and Ghost Convolution. This method reduced the model’s computational complexity (GFLOPs) from 16.0 to 2.2 and increased the frame rate (FPS) from 153 to 188, though the mAP experienced a slight drop of 1.1%.
The trade-off between detection accuracy and model lightweighting is challenging to balance, as improving one aspect often significantly impacts the other. Consequently, existing research tends to focus on either enhancing detection accuracy or achieving model lightweighting as a singular goal, often overlooking the importance of maintaining high detection performance while ensuring a compact model architecture. To address this issue, this paper proposes the IASL-YOLO model based on the YOLOv8s framework, which optimizes the Neck structure, introduces a novel localization loss function, and implements pruning techniques to simultaneously enhance detection accuracy and achieve lightweighting. The acronym IASL encapsulates its core innovations:
- IA (Improved AFPN): A novel C2f-Faster-EMA-enhanced Asymptotic Feature Pyramid Network (CFE-AFPN) is proposed to replace the Neck module in YOLOv8s. It strengthens multi-scale feature fusion for UAV object detection while maintaining lightweight advantages.
- S (SIoU Loss): The original Complete-IoU (CIoU) loss is replaced with SCYLLA-IoU (SIoU), improving localization accuracy without additional computational overhead. 
- L (LAMP Pruning): The Layer-Adaptive Sparsity for the Magnitude-Based Pruning (LAMP) algorithm is applied to eliminate redundant parameters while preserving detection performance. 
The remainder of this paper is organized as follows: Section 2 provides a detailed introduction to the proposed method; Section 3 describes the datasets used in the experiments; Section 4 presents the experimental results and conducts a comprehensive analysis of the findings; Section 5 discusses the experimental details and compares the results with existing studies; and finally, Section 6 concludes the paper and suggests directions for future research.
  2. Methods
The UAV target detection model proposed in this paper is developed by enhancing the YOLOv8 [21] framework. As an iterative version of the YOLO series, YOLOv8 introduces five hierarchical models: YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x. These models balance detection accuracy and inference speed by adjusting the network scale. YOLOv8 excels in anti-UAV detection tasks, leveraging its optimized multi-scale feature extraction capabilities and robust adaptability to complex environments, which are crucial for addressing the challenges of detecting small UAV targets. Its architecture, as illustrated in Figure 1, consists of three main components: the Backbone, Neck, and Head. To address the issues of insufficient detection accuracy and excessive model size in UAV small target detection scenarios, we propose the IASL-YOLO model, based on YOLOv8s as the baseline. The specific improvement strategies will be detailed in the subsequent sections.
In the Backbone, YOLOv8 adopts the CSPDarknet architecture, combined with the C2f module featuring a gradient shunting mechanism to achieve multi-scale feature extraction. It is further enhanced by the Spatial Pyramid Pooling-Fast (SPPF) module, which optimizes the contextual information aggregation capability of feature maps. Notably, the C2f module retains the advantage of cross-stage feature concatenation from the Cross-Stage Partial Bottleneck with 3 convolutions (C3) module in YOLOv5 [22], while integrating the multi-branch design philosophy of Efficient Layer Aggregation Networks (ELANs). By introducing additional branches and more flexible feature concatenation, the module significantly boosts the model’s ability to preserve spatial information. This enables YOLOv8 to capture the geometric structure and spatial distribution characteristics of small UAV targets amidst complex background interference, effectively suppressing such disturbances during recognition tasks.
In the Neck section, YOLOv8 utilizes a bidirectional feature pyramid network based on the Path Aggregation Network-Feature Pyramid Network (PAN-FPN) structure, achieving efficient multi-level feature fusion through two complementary pathways. The top–down pathway propagates high-level semantic information from deep features to shallow features, enhancing the semantic representation of shallow layers. Conversely, the bottom–up pathway conveys rich detail information from shallow features to deep features, refining the local details of high-level features. This fusion structure allows YOLOv8 to retain the fine features and critical details of small UAV targets, significantly improving their detection effectiveness in anti-UAV scenarios.
In the Head section, YOLOv8’s loss function comprises classification loss and localization loss. The classification loss employs the Binary Cross-Entropy (BCE) function to measure the discrepancy between predicted and true class probability distributions, ensuring accurate learning of class features. For localization, a combination of Distribution Focal Loss (DFL) [23] and CIoU [24] loss is used. This approach not only ensures precise bounding box localization for UAV targets but also optimizes the shape and position of predicted boxes, thereby enhancing detection performance for fast-moving or dynamically changing UAV targets.
  2.1. The Structure of IASL-YOLO
Although the YOLOv8 object detection algorithm can effectively detect small UAV targets, it still faces challenges such as insufficient model lightweighting and inadequate learning and fusion of multi-level features. To address these issues, this paper introduces a lightweight small UAV target detection model, IASL-YOLO, based on YOLOv8s as the baseline model. The network structure is illustrated in Figure 2. The specific improvements include the following aspects: Firstly, the PAN-FPN structure in the Neck part of YOLOv8 is replaced with the proposed CFE-AFPN network, which employs a progressive fusion strategy to better integrate non-adjacent level features. The C2f-Faster-EMA module within this network enhances the representation of local details and global contextual information while reducing computational complexity. Secondly, the CIoU localization loss function of YOLOv8 is replaced with the SIoU localization loss function to address the misalignment between predicted and ground truth bounding boxes. Lastly, the LAMP pruning algorithm is applied to prune the model, significantly reducing its complexity without compromising detection accuracy, thereby achieving further lightweighting.
  2.1.1. Improvement of the Neck
In UAV-based object detection tasks, the diversity of target scales imposes higher demands on the design of feature fusion networks. Although YOLOv8 employs the PAN-FPN structure in its Neck section, which enhances feature interaction between adjacent layers through top–down and bottom–up pathways, its support for feature transmission between non-adjacent layers is limited. This limitation may result in the loss of fine details for small targets in deeper layers or the dilution of semantic information for large targets in shallower layers, thereby affecting the overall multi-scale object detection performance.
To address this issue, this paper introduces the Asymptotic Feature Pyramid Network (AFPN) [25], enhances it further by integrating the C2f-Faster-EMA module, and ultimately proposes the CFE-AFPN network. This network significantly enhances UAV object detection performance by employing a progressive fusion strategy, an efficient multi-scale attention mechanism, and the incorporation of a 160 × 160 feature output layer. Additionally, by incorporating a single bottom–up feature fusion path and a design featuring cross-stage partial connections, the architecture significantly simplifies the network structure and effectively reduces model complexity. The structure of the CFE-AFPN network is shown in the Neck section of Figure 2.
The specific workflow of CFE-AFPN is as follows: In the initial stage of the network, CFE-AFPN fuses adjacent low-level features to reduce the semantic gap between them. As the network depth increases, CFE-AFPN progressively introduces higher-level features and integrates them with low-level features through a bottom–up progressive pathway, ultimately incorporating the highest-level features into the fusion process. After each feature fusion, CFE-AFPN employs the C2f-Faster-EMA module to further refine the learned features. Simultaneously, to address information conflicts between features of different levels, CFE-AFPN adopts the Adaptively Spatial Feature Fusion (ASFF) [26] network, which multiplies features from different levels by learnable coefficients, assigning spatial weights to features at various levels. This enables CFE-AFPN to adaptively retain effective information and achieve superior feature fusion. Finally, CFE-AFPN outputs four feature maps with different resolutions, each corresponding to a specific scale to meet the detection requirements of targets of varying sizes. Additionally, the newly introduced 160 × 160 output layer enhances the model’s capability to detect small UAV targets, thereby significantly improving its overall performance in anti-drone detection tasks.
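The spatially weighted fusion described above can be illustrated with a minimal sketch. It assumes single-channel feature maps that have already been resized to a common resolution, and it assumes that the per-pixel coefficients are normalized with a softmax across levels, as is commonly done in ASFF; the function names `softmax` and `asff_fuse` are hypothetical and are not taken from the proposed implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a short list of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def asff_fuse(levels, weight_logits):
    """Fuse same-sized H x W maps with per-pixel weights.

    levels:        list of H x W feature maps, one per pyramid level
    weight_logits: list of H x W maps of learnable logits, one per level
                   (assumption: softmax across levels at every pixel)
    """
    height, width = len(levels[0]), len(levels[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            weights = softmax([logit_map[i][j] for logit_map in weight_logits])
            fused[i][j] = sum(w * level[i][j] for w, level in zip(weights, levels))
    return fused


# Two levels with equal logits contribute equally at every pixel.
levels = [[[1.0, 2.0]], [[3.0, 4.0]]]
logits = [[[0.0, 0.0]], [[0.0, 0.0]]]
fused = asff_fuse(levels, logits)
```

Because the weights at each pixel sum to one, a level whose logit dominates at a given location effectively suppresses conflicting information from the other levels there.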
Notably, while retaining the feature fusion capabilities of AFPN, CFE-AFPN reduces computational costs and also improves detection performance by replacing the Residual Networks (ResNet) [27] residual units in the AFPN network with the C2f-Faster-EMA module. The structure of the C2f-Faster-EMA module is illustrated in Figure 3.
The C2f-Faster-EMA module begins by performing a convolution operation on the input feature map, followed by splitting the feature map into two branches. One branch directly participates in subsequent feature fusion, while the other branch is processed through multiple stacked FasterBlock modules. The outputs of all FasterBlock modules except the last one are also extracted and incorporated into the subsequent feature fusion process to capture more fine-grained features. The two branches are then fused, and the EMA mechanism is employed to enhance cross-spatial feature representation, ultimately generating multi-scale attention feature maps. By integrating the strengths of C2f, FasterNet [28], and the EMA [29] attention mechanism, the C2f-Faster-EMA module achieves a lightweight design while further improving the model’s detection performance.
We introduced the C2f structure to replace the original residual units in AFPN for further feature learning after feature fusion. C2f, a key component of YOLOv8, utilizes the cross-stage partial connection design of the Cross-Stage Partial Network (CSPNet) [30] to divide the input feature map into two parts: one part is passed directly to the subsequent stage, while the other part undergoes convolution before being fused with the first part. Compared to the residual units in ResNet, which require convolution on the entire feature map before adding it to the original feature map, this design in CSPNet significantly reduces computational redundancy and memory demands.
To further enhance model efficiency, we replaced the original stacked Bottleneck structure with the more lightweight FasterBlock in the C2f framework. FasterBlock, the core component of FasterNet, consists of a 3 × 3 partial convolution (PConv) layer and two 1 × 1 point convolutions. Specifically, PConv applies convolution to only 1/4 of the input feature channels, while the remaining channels are directly preserved and involved in subsequent feature fusion. This design reduces the computational load of PConv to 1/16 of that of a standard convolution, significantly alleviating the model’s computational burden.
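The 1/16 figure follows from the fact that the cost of a convolution scales with the product of its input and output channel counts. A minimal arithmetic sketch is given below, assuming cost is counted as multiply-accumulate operations with equal input and output channel counts; the helper names `conv_flops` and `pconv_flops` and the example feature map size are illustrative assumptions.

```python
def conv_flops(height, width, kernel, c_in, c_out):
    """Multiply-accumulate count of a standard k x k convolution."""
    return height * width * kernel * kernel * c_in * c_out


def pconv_flops(height, width, kernel, channels, ratio=0.25):
    """PConv convolves only `ratio` of the channels; the rest pass through."""
    c_partial = int(channels * ratio)
    return conv_flops(height, width, kernel, c_partial, c_partial)


# Example: a 40 x 40 feature map with 64 channels and a 3 x 3 kernel.
full = conv_flops(40, 40, 3, 64, 64)
partial = pconv_flops(40, 40, 3, 64)
reduction = full / partial  # (1/4)^2 of the channels product -> factor of 16
```

With a channel ratio of 1/4, both the input and output channel counts of the convolved slice shrink by a factor of four, so the cost shrinks by a factor of 4 × 4 = 16.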
Additionally, we incorporated the Efficient Multi-scale Attention (EMA) mechanism after feature concatenation in the C2f module, as illustrated in Figure 4, to notably improve small-target detection performance. The EMA mechanism reshapes partial channels into batch dimensions and combines them with a grouped processing strategy to achieve uniform distribution of semantic features across channels, effectively preserving channel information. Its core architecture consists of two parallel branches: one branch employs 1 × 1 convolution to capture global context information, while the other uses 3 × 3 convolution to extract multi-scale local features. By fusing the outputs of both branches through cross-spatial learning, EMA precisely captures pixel-level pairwise relationships, enhancing the global contextual representation of features. This design enables more efficient extraction of multi-scale features, significantly improving the detection capability for small UAV targets.
  2.1.2. Introduction of the SIoU Loss Function
In YOLOv8, the traditional localization loss function CIoU comprehensively considers the distance between the centers of the predicted and ground truth bounding boxes, their aspect ratios, and their overlapping regions. However, it still has a key limitation: it does not account for directional alignment between the two boxes. Specifically, when there is a significant angular deviation between the line connecting the centers of the predicted and ground truth boxes and the coordinate axes (horizontal or vertical directions), the regression process of the model requires additional adjustments for directional correction, which increases optimization difficulty and reduces convergence efficiency.
To address this issue more effectively, we introduce the SIoU [31] loss function. Compared to CIoU, SIoU incorporates an angle cost term to penalize directional misalignment between the predicted and ground truth boxes. This mechanism guides the model to prioritize directional alignment, thereby accelerating convergence and improving detection accuracy. The SIoU loss consists of four components: angle cost, distance cost, shape cost, and IoU cost. The physical meanings and calculation methods of each component are as follows:
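Angle Cost ($\Lambda$): Precisely quantifies the angular deviation between the predicted and ground truth boxes, ensuring accurate directional alignment.

$$\Lambda = 1 - 2\sin^{2}\left(\arcsin\left(\frac{c_h}{\sigma}\right) - \frac{\pi}{4}\right)$$

where $c_h$ is the height difference between the centers of the ground truth and predicted boxes, and $\sigma$ is the distance between their centers.

Distance Cost ($\Delta$): Measures the distance between the centers of the two boxes, optimizing positional regression.

$$\Delta = \sum_{t=x,y}\left(1 - e^{-\gamma\rho_t}\right), \quad \rho_x = \left(\frac{b_{c_x}^{gt} - b_{c_x}}{C_w}\right)^{2}, \quad \rho_y = \left(\frac{b_{c_y}^{gt} - b_{c_y}}{C_h}\right)^{2}, \quad \gamma = 2 - \Lambda$$

where $b_{c_x}^{gt}$ and $b_{c_x}$ are the x-coordinates of the centers of the ground truth and predicted boxes, $b_{c_y}^{gt}$ and $b_{c_y}$ are their y-coordinates, $C_h$ is the height of the minimum enclosing rectangle, $C_w$ is its width, and $t$ denotes $x$ or $y$.

Shape Cost ($\Omega$): Evaluates the difference in aspect ratios between the predicted and ground truth boxes to accurately capture the target’s morphological characteristics.

$$\Omega = \sum_{t=w,h}\left(1 - e^{-\omega_t}\right)^{\theta}, \quad \omega_w = \frac{\left|w - w^{gt}\right|}{\max\left(w, w^{gt}\right)}, \quad \omega_h = \frac{\left|h - h^{gt}\right|}{\max\left(h, h^{gt}\right)}$$

where $(w, h)$ and $(w^{gt}, h^{gt})$ represent the widths and heights of the predicted and ground truth boxes, respectively, $\theta$ controls the influence of the shape cost, and $t$ denotes $w$ or $h$.

IoU Cost: Measures the degree of overlap between the two boxes to ensure spatial consistency.

$$IoU = \frac{\left|b \cap b^{gt}\right|}{\left|b \cup b^{gt}\right|}$$

where $b$ is the predicted box and $b^{gt}$ is the ground truth box.

Finally, the SIoU loss function is obtained by integrating these four components, as shown in the following equation.

$$L_{SIoU} = 1 - IoU + \frac{\Delta + \Omega}{2}$$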
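The four components can be combined in a short sketch for a single pair of boxes. It assumes the commonly used SIoU formulation (loss = 1 − IoU + (distance cost + shape cost) / 2, with the angle cost entering through the distance cost), boxes given as (center x, center y, width, height), and a shape exponent of 4; the function name `siou_loss`, the default exponent, and the small `eps` stabilizer are assumptions for illustration rather than the settings used in this work.

```python
import math


def siou_loss(pred, gt, theta=4.0, eps=1e-9):
    """SIoU loss for two boxes in (cx, cy, w, h) format (illustrative sketch)."""
    px, py, pw, ph = pred
    gx, gy, gw, gh = gt

    # Corner coordinates of both boxes.
    p_x1, p_y1, p_x2, p_y2 = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    g_x1, g_y1, g_x2, g_y2 = gx - gw / 2, gy - gh / 2, gx + gw / 2, gy + gh / 2

    # IoU cost: overlap area divided by union area.
    inter_w = max(0.0, min(p_x2, g_x2) - max(p_x1, g_x1))
    inter_h = max(0.0, min(p_y2, g_y2) - max(p_y1, g_y1))
    inter = inter_w * inter_h
    iou = inter / (pw * ph + gw * gh - inter + eps)

    # Width and height of the minimum enclosing rectangle.
    enc_w = max(p_x2, g_x2) - min(p_x1, g_x1)
    enc_h = max(p_y2, g_y2) - min(p_y1, g_y1)

    # Angle cost: 0 when the centers are axis-aligned, 1 at 45 degrees.
    sigma = math.hypot(gx - px, gy - py) + eps
    sin_alpha = min(1.0, abs(gy - py) / sigma)
    angle = 1 - 2 * math.sin(math.asin(sin_alpha) - math.pi / 4) ** 2

    # Distance cost, modulated by the angle cost through gamma.
    gamma = 2 - angle
    rho_x = ((gx - px) / (enc_w + eps)) ** 2
    rho_y = ((gy - py) / (enc_h + eps)) ** 2
    distance = (1 - math.exp(-gamma * rho_x)) + (1 - math.exp(-gamma * rho_y))

    # Shape cost: relative width and height mismatch.
    omega_w = abs(pw - gw) / max(pw, gw)
    omega_h = abs(ph - gh) / max(ph, gh)
    shape = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta

    return 1 - iou + (distance + shape) / 2


perfect = siou_loss((0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0))
shifted = siou_loss((1.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0))
```

A perfectly aligned prediction yields a loss close to zero, while a shifted or misshapen prediction is penalized by the overlap, distance, and shape terms together.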
  2.1.3. Model Pruning Using the LAMP Algorithm
YOLOv8 has achieved remarkable breakthroughs in object detection performance and accuracy. However, the increased model complexity has led to a dramatic rise in computational load and parameter volume. This issue makes efficient deployment on resource-constrained embedded devices particularly challenging, especially for anti-UAV detection tasks, where real-time processing and lightweight models are crucial. To address these limitations, we introduce the LAMP [32] algorithm, which adaptively optimizes the model structure to enable lightweight deployment while maintaining detection accuracy, providing a viable solution for anti-UAV detection.
The implementation of model pruning involves three key steps: First, determine the speed-up ratio, which is the ratio of the computational load of the original model to that of the pruned model. Second, apply the LAMP pruning algorithm to perform the pruning operation. Finally, fine-tune the pruned model to recover any potential performance loss during the pruning process.
The core of the LAMP algorithm lies in its layer-adaptive scoring mechanism. This algorithm meticulously scores the weight tensors corresponding to fully connected layers and convolutional layers, dynamically pruning the connections with the smallest LAMP scores until the predefined speed-up ratio is achieved. By dynamically adjusting weight scores and integrating layer-wise optimization strategies, the LAMP algorithm effectively preserves the model’s core features while significantly reducing redundant computations. This approach greatly enhances computational resource efficiency, making the model more suitable for deployment in resource-constrained anti-UAV detection scenarios.
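The calculation formula for the LAMP score is as follows:

$$\mathrm{score}(u; W) = \frac{\left(W[u]\right)^{2}}{\sum_{v \geq u}\left(W[v]\right)^{2}}$$

Here, $W[u]$ represents the weight corresponding to the convolution kernel indexed by $u$, and it holds that $\left|W[u]\right| \leq \left|W[v]\right|$ when $u < v$, i.e., the weights are indexed in ascending order of magnitude.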
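The scoring rule can be sketched for flattened per-layer weight lists as follows. The sketch assumes the standard LAMP definition (squared weight divided by the sum of squares of all weights in the same layer with equal or larger magnitude) and, for simplicity, prunes a fixed fraction of individual weights globally rather than pruning until a target speed-up ratio is reached; the names `lamp_scores` and `lamp_prune` and the `keep_fraction` parameter are hypothetical.

```python
def lamp_scores(weights):
    """LAMP score of every weight in one layer (flattened list of floats)."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    scores = [0.0] * len(weights)
    suffix = 0.0
    # Walk from the largest magnitude down, accumulating the denominator.
    for i in reversed(order):
        suffix += weights[i] ** 2
        scores[i] = weights[i] ** 2 / suffix if suffix > 0 else 0.0
    return scores


def lamp_prune(layers, keep_fraction):
    """Return 0/1 masks after globally removing the lowest-scoring weights.

    layers: dict mapping a layer name to its flattened weights.
    keep_fraction: simplification standing in for the speed-up ratio target.
    """
    scored = [
        (score, name, index)
        for name, weights in layers.items()
        for index, score in enumerate(lamp_scores(weights))
    ]
    scored.sort()
    n_prune = int(len(scored) * (1 - keep_fraction))
    masks = {name: [1] * len(weights) for name, weights in layers.items()}
    for _, name, index in scored[:n_prune]:
        masks[name][index] = 0
    return masks


scores = lamp_scores([3.0, -4.0])
masks = lamp_prune({"conv_a": [3.0, -4.0], "conv_b": [0.1, 0.2]}, keep_fraction=0.5)
```

Because the largest weight of every layer always receives a score of 1, a layer with uniformly small weights (here `conv_b`) is not wiped out by global pruning, which is the layer-adaptive behavior the score is designed to provide.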
  4. Results
In this section, experiments are conducted based on the publicly available Anti-UAV dataset to validate the effectiveness of the proposed method. The experimental process encompasses several aspects: first, a detailed explanation of the experimental environment and hyperparameter configurations is provided; second, various metrics for evaluating model performance and lightweight design are introduced; next, the ablation experiment results are thoroughly analyzed and interpreted; finally, by comparing with baseline models and other different YOLOv8 models, the detection efficacy of the proposed method is further verified.
  4.1. Experimental Environment and Hyperparameter Settings
The experimental environment is set up on a Linux server equipped with four Nvidia Tesla K80 GPUs (48 GB of VRAM in total). The software versions used are as follows: Python 3.8.19, PyTorch 1.12.1, CUDA 10.2, and YOLOv8 8.3.12. Through systematic experimental validation, the following hyperparameters were selected: a batch size of 24, 300 training epochs, an input image size of 640 × 640, and an initial learning rate of 0.01. In addition, the proposed model adopts a speed-up ratio of 1.5 when applying the LAMP pruning algorithm. The rationale behind this setting will be discussed in detail in subsequent sections. The experiments are based on the YOLOv8s model, and Table 1 provides a detailed list of the key parameter configurations.
  4.2. Experimental Evaluation Metrics
For the models obtained during the training phase in this experiment, we adopted Precision, Recall, mAP50, and mAP50-95 as performance metrics for detection accuracy, along with Model Size, Parameters, and GFLOPs as lightweight metrics to evaluate the model efficiency. Additionally, FPS was employed as a key metric to measure the model’s inference speed.
Precision represents the proportion of actual positive samples among those predicted as positive by the model. Here, TP (True Positives) denotes the number of samples correctly predicted as drones, while FP (False Positives) denotes the number of samples incorrectly predicted as drones.

$$Precision = \frac{TP}{TP + FP}$$
Recall refers to the proportion of actual positive samples that are correctly predicted as positive by the model. Here, TP (True Positives) denotes the number of samples correctly predicted as drones, while FN (False Negatives) denotes the number of samples that the model failed to correctly predict as drones.

$$Recall = \frac{TP}{TP + FN}$$
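As a small worked example of these two definitions, the sketch below computes both metrics from raw counts. The counts are invented purely for illustration and are not results from the experiments; the zero-denominator guard is an added convention.

```python
def precision(tp, fp):
    """Share of predicted drones that are real drones."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of real drones that the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Illustrative counts: 90 correct detections, 10 false alarms, 30 misses.
p = precision(tp=90, fp=10)
r = recall(tp=90, fn=30)
```

In this example the detector is rarely wrong when it fires (precision 0.9) but misses a quarter of the drones (recall 0.75), which is why both metrics are reported together.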
mAP (Mean Average Precision) is a core evaluation metric in object detection tasks, used to measure the overall performance of a model in multi-class detection. It is calculated by taking the mean of the area under the precision–recall curve (also known as Average Precision, AP) for each class, comprehensively reflecting the model’s balanced performance between detection precision and recall:

$$AP = \int_{0}^{1} P(R)\,dR, \qquad mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i$$

where $P(R)$ is the precision at recall $R$ and $N$ is the number of classes. mAP50 (Mean Average Precision at IoU = 0.5) is the mAP obtained when a detection is counted as correct at an IoU threshold of 0.5. mAP50-95, on the other hand, averages the mAP over IoU thresholds ranging from 0.5 to 0.95 (in steps of 0.05), providing a stricter assessment of localization quality.
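The area under the precision–recall curve can be approximated from a ranked list of detections. The sketch below assumes all-point interpolation with a monotone precision envelope, which is one common convention (evaluation toolkits may instead sample the curve at fixed recall points); the function name `average_precision` and the toy detections are illustrative assumptions.

```python
def average_precision(detections, num_gt):
    """AP for one class at one IoU threshold.

    detections: list of (confidence, is_true_positive) pairs, where the
                true-positive flag was decided at the chosen IoU threshold.
    num_gt:     number of ground truth boxes for this class.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))

    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum rectangle areas under the envelope.
    ap, previous_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - previous_recall) * p
        previous_recall = r
    return ap


# IoU thresholds 0.50, 0.55, ..., 0.95 used when averaging for mAP50-95.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]

# Toy example: three detections, two ground truth drones.
ap50 = average_precision([(0.9, True), (0.8, False), (0.7, True)], num_gt=2)
```

Repeating the computation with the true-positive flags re-evaluated at each threshold in `IOU_THRESHOLDS` and averaging the results gives the mAP50-95 value for a single-class task.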
Model Size refers to the storage footprint of the saved model, measured in megabytes (MB), which indicates the amount of storage and bandwidth resources required for saving and transferring the model.
Parameters denotes the number of learnable weights in a model and reflects the complexity of the model.
GFLOPs (giga floating-point operations) refers to the computational load of a model, i.e., the number of floating-point operations required for a single forward pass, measured in billions of operations.
FPS (Frames Per Second) denotes the number of image frames processed by the model per second, measuring its inference speed.
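A minimal way to estimate FPS is to time repeated inference calls after a short warm-up. The sketch below is a generic timing harness under stated assumptions: `infer` stands in for any model call, the warm-up count is arbitrary, and the sketch does not reproduce the measurement protocol used in the experiments.

```python
import time


def measure_fps(infer, frames, warmup=2):
    """Average frames per second of `infer` over `frames`.

    infer:  callable taking one frame (placeholder for a real model call)
    frames: list of inputs to process
    warmup: number of untimed calls to stabilize caches and lazy setup
    """
    for frame in frames[:warmup]:
        infer(frame)
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed


# Dummy workload standing in for a detector.
fps = measure_fps(lambda frame: sum(range(1000)), list(range(50)))
```

For GPU inference, the device would additionally need to be synchronized before each timestamp is taken, since kernel launches return before the computation finishes.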
  4.3. Ablation Experiment
To validate the rationality of the IASL-YOLO model, this study conducted a series of ablation experiments on the Anti-UAV dataset. The detailed experimental results are presented in Table 2. Using the YOLOv8s model (Y) as the baseline, the experiments evaluated the individual and combined effects of the CFE-AFPN module (IA), the SIoU localization loss function (S), and the LAMP pruning algorithm (L). When independently introducing the CFE-AFPN module (Y+IA), the model’s detection performance improved significantly: precision, recall, and mAP50 increased by 2.0%, 6.1%, and 3.7%, respectively. In terms of model lightweighting, this module reduced model size by 37% and parameters by 39%. Despite a 7% increase in computational cost and a drop in FPS, the overall performance gain was substantial. The standalone introduction of the SIoU loss function (Y+S) resulted in modest performance improvements, while the LAMP pruning algorithm (Y+L) achieved slight gains in performance metrics alongside significant lightweighting and FPS improvements. In progressive experiments (Y→Y+IA→Y+IA+S→Y+IA+S+L), the results demonstrated that after integrating the CFE-AFPN module, the addition of SIoU further slightly enhanced performance metrics. Subsequently, applying the LAMP pruning algorithm maintained performance while achieving greater lightweighting, reducing model size, parameters, and computational cost to 5.3 MB, 2.4 M, and 19.9 GFLOPs, respectively, with FPS recovering to 51.1.
  4.4. Comparative Experiments with the Baseline Model
To validate the effectiveness of the proposed model in this paper, we conducted experiments on the Anti-UAV dataset and compared the IASL-YOLO model with the baseline model under the same experimental conditions. The experimental results are presented in Table 3. The results demonstrate that the IASL-YOLO model achieves significant improvements in detection performance, with precision increasing by 2.9%, recall by 6.8%, mAP50 by 3.9%, and mAP50-95 by 3.8%. In terms of model lightweighting, the model size is reduced by 75%, the number of parameters is decreased by 78%, and the computational load is reduced by 30%. While the proposed model exhibits a decrease in FPS, the significant improvements in detection accuracy and model lightweighting make it markedly superior to the YOLOv8s baseline model in overall performance.
To further evaluate the detection performance of the IASL-YOLO model, this study selected images from the Anti-UAV dataset featuring complex backgrounds at different times of day and varying UAV pixel ratios for comparative experiments. The experimental results demonstrate that the IASL-YOLO model exhibits significant advantages over the baseline model, YOLOv8s.
As illustrated in Figure 6, in images with a higher UAV pixel ratio, YOLOv8s is capable of identifying UAV targets but suffers from insufficient bounding box accuracy and missed detections. In contrast, the IASL-YOLO model not only provides more precise bounding boxes but also achieves a notable improvement in detection accuracy. In images with a lower UAV pixel ratio, YOLOv8s continues to struggle with missed detections and low precision, whereas the IASL-YOLO model not only accurately identifies targets but also delivers higher confidence scores.
Furthermore, as seen in the inference results in Figure 7, in complex backgrounds at different times of day, the IASL-YOLO model consistently outperforms YOLOv8s. Specifically, it significantly reduces the missed detection rate while enhancing detection accuracy. These experimental results comprehensively validate the superior performance of the IASL-YOLO model in UAV detection tasks.
We also conducted a performance comparison between YOLOv8s and our proposed model in multi-UAV target scenarios, as shown in Figure 8. The results demonstrate that our model outperforms YOLOv8s in multi-UAV detection tasks, effectively reducing missed detections and false positives while achieving significantly higher accuracy. The improvement is most pronounced for small-sized targets with low pixel proportions.
It is important to emphasize that the cases where the proposed model displayed confidence scores below 0.7 in the presented test samples occurred only under extreme testing conditions, such as scenes with severe background interference. These instances represent edge cases that fall well beyond the scope of conventional applications. Notably, in these highly challenging scenarios where the benchmark model YOLOv8s either failed to detect objects entirely or produced critically low confidence scores, our IASL-YOLO model consistently delivered reliable detection performance. This comparative result strongly confirms the model’s significant robustness advantage when handling edge cases.
  4.5. Comparative Experiments with Different YOLOv8 Models
To further validate the effectiveness of the drone target detection model proposed in this study, comparative experiments were conducted with the YOLOv8 series models, namely YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x. The experimental results are presented in Table 4. In terms of detection performance, the proposed model achieved the highest values in precision (96.8%), recall (88.2%), mAP50 (92.4%), and mAP50-95 (61.9%). Additionally, in terms of lightweight design, compared to YOLOv8n, the smallest model in the YOLOv8 series, the proposed model reduced the model size to 88.3% of YOLOv8n and the number of parameters to 75%. Meanwhile, in terms of inference speed, the FPS metric of the proposed model shows significant improvement over YOLOv8m, YOLOv8l, and YOLOv8x. In conclusion, the proposed model demonstrates optimal performance in balancing detection accuracy and lightweight design.
  6. Conclusions
To address the balance between accuracy and lightweight design in anti-drone detection, this study proposes a lightweight drone detection model named IASL-YOLO. The model first replaces the Neck part of YOLOv8s with the CFE-AFPN network, utilizing a bottom–up progressive feature fusion mechanism to reduce the model’s size while minimizing the semantic gap between non-adjacent hierarchical features. The integrated C2f-Faster-EMA module not only reduces computational load but also enhances the expressive power of local details and global features. Secondly, the SIoU localization loss function is employed in place of CIoU, effectively resolving the misalignment between predicted and ground truth bounding boxes, thereby significantly improving the model’s detection accuracy. Lastly, the LAMP pruning algorithm is applied to the model, substantially reducing its size, number of parameters, and computational complexity. Experiments demonstrate that the proposed model outperforms the original model and other state-of-the-art models on the Anti-UAV dataset. The design of this model offers a clear advantage in balancing detection performance and lightweight design in drone target detection, providing an efficient solution for anti-drone detection.
In future research, we aim to explore the application of knowledge distillation techniques to further optimize the model’s performance. By transferring knowledge from complex models to simplified ones, this technique can effectively enhance the accuracy and efficiency of pruned models. We plan to apply this technique to our detection model, with the goal of achieving a comprehensive improvement in performance.