Figure 1.
Overall architecture of BDNet. The model follows a single-stage detection pipeline composed of Input, Backbone, Neck, and Detection Head. The backbone integrates Hybrid Downsampling (HyDASE) and C3k2_MogaBlocks for efficient feature preservation and contextual aggregation. The neck employs A2C2f_FRFN modules for multi-scale feature refinement, while three detection heads output predictions across different scales.
Figure 2.
Architecture of the proposed HyDASE module. The block integrates hybrid pre-pooling, parallel convolutional and pooling branches, and a squeeze-and-excitation attention unit to achieve detail-preserving downsampling.
Figure 3.
Architecture of the proposed C3k2_MogaBlock module.
Figure 4.
Architecture of our proposed A2C2f_FRFN module. Multiple ABlock_FRFN units, each embedding an FRFN, are stacked within the A2C2f structure. The refined features are concatenated and projected through a lightweight fusion layer, ensuring enhanced feature selectivity while maintaining residual stability.
Figure 5.
Representative images from the BRVD dataset under varying densities, illumination, and weather conditions (day, night, fog, and rain).
Figure 6.
Instance distribution of the BRVD dataset across 13 vehicle categories.
Figure 7.
Representative samples from the VisDrone-DET2019 dataset showing aerial viewpoints and small-object challenges.
Figure 8.
Training and validation performance curves of BDNet on the BRVD dataset. The smooth convergence and stable loss trajectory indicate effective optimization and good generalization to the validation split.
Figure 9.
Precision–recall (PR) curves of BDNet and YOLO baseline models (YOLOv8n-YOLOv12n) on the BRVD validation set. BDNet maintains higher precision across the entire recall spectrum, evidencing robustness under dense and occluded traffic.
Figure 10.
Integrated comparison of precision–recall (PR) curves and mean Average Precision (m𝒜) metrics for BDNet and YOLO baseline models on the BRVD validation set. BDNet consistently outperforms YOLOv8n–YOLOv12n across multiple IoU thresholds, with more stable recall and stronger cross-threshold generalization.
Figure 11.
Qualitative detection results for BDNet and YOLO baselines on challenging BRVD scenarios, including night-time, fog, rain, and heterogeneous daytime traffic. BDNet recovers small and partially occluded vehicles, reduces background false positives, and provides more accurate bounding boxes.
Figure 12.
Grad-CAM comparisons on BRVD validation images. BDNet attends tightly to vehicle boundaries and discriminative parts under occlusion and scale variation, whereas baselines display broader, background-biased activations.
Figure 13.
Hierarchical feature evolution for YOLOv8n-YOLOv12n and BDNet on BRVD. BDNet exhibits structured, high-contrast representations at intermediate and final stages, while baselines show scattered or background-dominated activations.
Figure 14.
Normalized confusion matrices on the BRVD validation set. BDNet reduces inter-class confusion relative to YOLOv8n-YOLOv12n, particularly for visually similar vehicle types.
Figure 15.
Trade-off analysis (m𝒜 vs. Params and m𝒜 vs. GFLOPs) for the YOLOv12 family and BDNet variants. BDNet consistently achieves higher m𝒜 at equal or lower computational complexity, confirming its scalable efficiency across model sizes.
Figure 16.
Qualitative detection comparisons on the VisDrone-DET2019 validation set under diverse aerial conditions (daytime, fog/haze, urban top-view, and nighttime). Rows depict YOLOv8n, YOLOv9t, YOLOv10n, YOLOv11n, YOLOv12n, and BDNet. Red ellipses highlight regions where BDNet recovers missed small objects, suppresses false positives, and provides tighter localization.
Table 1.
Hyperparameter settings used for model training.
| Parameter | Value |
|---|---|
| Input image size | 640 × 640 |
| Batch size | 16 |
| Epochs | 300 |
| Optimizer | SGD |
| Momentum | 0.937 |
| Weight decay | 0.0005 |
| Initial learning rate (lr0) | 0.01 |
| Final learning rate (lrf) | 0.01 |
| Warmup epochs | 3 |
| Warmup momentum | 0.8 |
| Mosaic | 1 |
| HSV-Hue | 0.015 |
| HSV-Saturation | 0.7 |
| HSV-Value | 0.4 |
| Translation factor | 0.1 |
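
For readers reproducing the setup, the settings in Table 1 can be gathered into a single configuration. The sketch below is illustrative only. The key names (`imgsz`, `hsv_h`, `translate`, and so on) are assumed, following the Ultralytics-style naming that the table's `lr0`/`lrf` labels suggest. The linear decay, in which `lrf` is treated as a fraction of `lr0`, is likewise an assumption about the scheduler and is not stated in the table.

```python
# Hyperparameters copied from Table 1; the key names are assumed (Ultralytics-style).
TRAIN_CFG = {
    "imgsz": 640, "batch": 16, "epochs": 300, "optimizer": "SGD",
    "momentum": 0.937, "weight_decay": 0.0005, "lr0": 0.01, "lrf": 0.01,
    "warmup_epochs": 3, "warmup_momentum": 0.8, "mosaic": 1.0,
    "hsv_h": 0.015, "hsv_s": 0.7, "hsv_v": 0.4, "translate": 0.1,
}


def linear_lr(epoch: int, cfg: dict = TRAIN_CFG) -> float:
    """Post-warmup learning rate under an ASSUMED linear decay.

    lrf is interpreted as a fraction of lr0, so the schedule runs
    from lr0 at epoch 0 down to lr0 * lrf at the last epoch.
    """
    frac = max(1.0 - epoch / cfg["epochs"], 0.0)
    return cfg["lr0"] * (frac * (1.0 - cfg["lrf"]) + cfg["lrf"])


if __name__ == "__main__":
    for e in (0, 150, 300):
        print(e, linear_lr(e))
```

Under this reading the learning rate falls from 0.01 to 0.0001 over the 300 epochs. If `lrf` is instead an absolute final rate, the schedule would be constant at 0.01.
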
Table 2.
Per-class detection performance of BDNet on the BRVD validation set, reported in terms of precision, recall, 𝓕1-score, m𝒜50, m𝒜75, and m𝒜50-95.
| Class Name | 𝓟rec | 𝓡rec | 𝓕1-score | m𝒜50 | m𝒜75 | m𝒜50-95 |
|---|---|---|---|---|---|---|
| Auto-Rickshaw | 85.7 | 77.1 | 81.2 | 86.3 | 80.0 | 72.9 |
| Bicycle | 91.1 | 79.2 | 84.7 | 88.5 | 62.8 | 58.6 |
| Bus | 91.8 | 74.5 | 82.3 | 87.0 | 74.7 | 68.5 |
| Car | 91.6 | 77.3 | 83.7 | 87.4 | 74.1 | 69.4 |
| CNG | 92.2 | 81.1 | 86.3 | 89.7 | 79.1 | 72.0 |
| Covered Van | 89.3 | 75.7 | 81.9 | 86.7 | 79.7 | 73.3 |
| Easy Bike | 87.3 | 78.9 | 82.9 | 88.7 | 82.1 | 74.1 |
| Leguna | 84.4 | 61.2 | 71.0 | 74.9 | 69.0 | 62.7 |
| Motorcycle | 92.0 | 74.0 | 82.0 | 85.4 | 64.5 | 57.4 |
| Pickup | 88.7 | 72.1 | 79.5 | 85.2 | 78.6 | 71.1 |
| Rickshaw | 88.7 | 75.9 | 81.8 | 86.6 | 72.4 | 67.6 |
| Truck | 85.1 | 81.5 | 83.2 | 88.8 | 77.4 | 71.7 |
| Van | 81.9 | 70.3 | 75.7 | 81.1 | 61.6 | 55.2 |
| All (Mean) | 88.4 | 75.3 | 81.3 | 85.9 | 73.5 | 67.3 |
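
The 𝓕1-score column can be cross-checked against the precision and recall columns. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall. The helper name `f1_score` is ours, and the three rows are copied from Table 2.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from Table 2
ROWS = {
    "Auto-Rickshaw": (85.7, 77.1, 81.2),
    "Leguna": (84.4, 61.2, 71.0),
    "All (Mean)": (88.4, 75.3, 81.3),
}

if __name__ == "__main__":
    for name, (p, r, reported) in ROWS.items():
        print(f"{name:14s} computed={f1_score(p, r):.2f} reported={reported}")
```
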
Table 3.
Quantitative comparison of BDNet with YOLOv8n-YOLOv12n on the BRVD validation set. Metrics include precision, recall, 𝓕1-score, and mAP at multiple IoU thresholds, along with model size (M parameters), computational complexity (GFLOPs), and inference speed (FPS).
| Model | 𝓟rec | 𝓡rec | 𝓕1-score | m𝒜50 | m𝒜75 | m𝒜50-95 | Params (M) | GFLOPs | FPS (f/s) |
|---|---|---|---|---|---|---|---|---|---|
| YOLOv8n | 86.1 | 72.6 | 78.7 | 83.5 | 72.1 | 64.7 | 3.2 | 8.7 | 217.4 |
| YOLOv9t | 87.2 | 73.7 | 79.8 | 84.1 | 73.1 | 66.0 | 2.0 | 7.7 | 185.2 |
| YOLOv10n | 85.5 | 74.3 | 79.4 | 84.3 | 72.4 | 65.4 | 2.3 | 6.7 | 212.8 |
| YOLOv11n | 87.0 | 75.0 | 80.5 | 84.7 | 71.0 | 65.1 | 2.6 | 6.5 | 222.2 |
| YOLOv12n | 87.3 | 73.8 | 78.0 | 84.5 | 72.6 | 66.6 | 2.6 | 6.5 | 196.1 |
| BDNet (Ours) | 88.4 | 75.3 | 81.3 | 85.9 | 73.5 | 67.3 | 2.5 | 6.0 | 285.7 |
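
Throughput figures are often easier to compare as per-image latency. The sketch below converts the FPS column of Table 3 into milliseconds per frame and into a speed-up over YOLOv12n. It assumes FPS is measured at batch size 1, which the table does not state, and the helper names are ours.

```python
# FPS values copied from Table 3
FPS = {
    "YOLOv8n": 217.4, "YOLOv9t": 185.2, "YOLOv10n": 212.8,
    "YOLOv11n": 222.2, "YOLOv12n": 196.1, "BDNet": 285.7,
}


def latency_ms(fps: float) -> float:
    """Per-frame latency in milliseconds, assuming one image per forward pass."""
    return 1000.0 / fps


def speedup(model: str, reference: str = "YOLOv12n") -> float:
    """Throughput ratio of `model` relative to `reference`."""
    return FPS[model] / FPS[reference]


if __name__ == "__main__":
    for name, fps in FPS.items():
        print(f"{name:9s} {latency_ms(fps):5.2f} ms  x{speedup(name):.2f}")
```
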
Table 4.
Per-class m𝒜50 comparison between BDNet and YOLO baseline models (YOLOv8n–YOLOv12n) on the BRVD validation set.
| Class Name | YOLOv8n | YOLOv9t | YOLOv10n | YOLOv11n | YOLOv12n | BDNet (Ours) |
|---|---|---|---|---|---|---|
| Auto-Rickshaw | 84.3 | 85.2 | 84.7 | 84.9 | 84.7 | 86.3 |
| Bicycle | 85.7 | 85.7 | 86.1 | 87.5 | 85.8 | 88.5 |
| Bus | 83.8 | 83.5 | 85.3 | 85.2 | 85.3 | 87.0 |
| Car | 85.9 | 86.7 | 86.5 | 86.9 | 86.6 | 87.4 |
| CNG | 87.8 | 87.8 | 88.3 | 88.7 | 89.0 | 89.7 |
| Covered Van | 83.3 | 83.8 | 84.8 | 84.4 | 84.5 | 86.7 |
| Easy Bike | 88.6 | 88.3 | 88.0 | 88.0 | 87.2 | 88.7 |
| Leguna | 70.9 | 76.7 | 72.3 | 75.5 | 75.5 | 74.9 |
| Motorcycle | 84.4 | 83.2 | 84.6 | 85.0 | 84.4 | 85.4 |
| Pickup | 81.3 | 81.5 | 81.8 | 81.8 | 82.4 | 85.2 |
| Rickshaw | 84.7 | 84.7 | 85.2 | 85.2 | 85.4 | 86.6 |
| Truck | 85.3 | 86.2 | 87.1 | 87.0 | 87.3 | 88.8 |
| Van | 79.7 | 80.1 | 80.7 | 80.8 | 80.8 | 81.1 |
| All (Mean) | 83.5 | 84.1 | 84.3 | 84.7 | 84.5 | 85.9 |
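
A compact way to read Table 4 is to compare BDNet with the strongest baseline for each class rather than with each model in turn. The sketch below does this for the per-class rows. The function name `margin_over_best` is ours, and the values are copied from the table.

```python
BASELINES = ["YOLOv8n", "YOLOv9t", "YOLOv10n", "YOLOv11n", "YOLOv12n"]

# class -> (baseline scores in the order of BASELINES, BDNet score), from Table 4
TABLE4 = {
    "Auto-Rickshaw": ([84.3, 85.2, 84.7, 84.9, 84.7], 86.3),
    "Bicycle":       ([85.7, 85.7, 86.1, 87.5, 85.8], 88.5),
    "Bus":           ([83.8, 83.5, 85.3, 85.2, 85.3], 87.0),
    "Car":           ([85.9, 86.7, 86.5, 86.9, 86.6], 87.4),
    "CNG":           ([87.8, 87.8, 88.3, 88.7, 89.0], 89.7),
    "Covered Van":   ([83.3, 83.8, 84.8, 84.4, 84.5], 86.7),
    "Easy Bike":     ([88.6, 88.3, 88.0, 88.0, 87.2], 88.7),
    "Leguna":        ([70.9, 76.7, 72.3, 75.5, 75.5], 74.9),
    "Motorcycle":    ([84.4, 83.2, 84.6, 85.0, 84.4], 85.4),
    "Pickup":        ([81.3, 81.5, 81.8, 81.8, 82.4], 85.2),
    "Rickshaw":      ([84.7, 84.7, 85.2, 85.2, 85.4], 86.6),
    "Truck":         ([85.3, 86.2, 87.1, 87.0, 87.3], 88.8),
    "Van":           ([79.7, 80.1, 80.7, 80.8, 80.8], 81.1),
}


def margin_over_best(table: dict) -> dict:
    """Map each class to (best baseline name, BDNet score minus that baseline's score)."""
    out = {}
    for cls, (scores, ours) in table.items():
        best = max(scores)
        # index() returns the first baseline that reaches the best score
        out[cls] = (BASELINES[scores.index(best)], round(ours - best, 1))
    return out


if __name__ == "__main__":
    for cls, (name, diff) in margin_over_best(TABLE4).items():
        print(f"{cls:14s} best baseline: {name:9s} margin: {diff:+.1f}")
```

On these numbers BDNet leads in 12 of the 13 classes. The exception is Leguna, where YOLOv9t is ahead.
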
Table 5.
Performance comparison of BDNet with state-of-the-art object detection models on the BRVD dataset. Metrics include m𝒜50, m𝒜50-95, parameter count, GFLOPs, and FPS.
| Model | m𝒜50 | m𝒜50-95 | Params | GFLOPs | FPS (f/s) |
|---|---|---|---|---|---|
| SSD | 51.4 | 45.3 | 138.0 | 34.80 | 96.2 |
| Faster R-CNN | 65.1 | 60.1 | 41.2 | 292.3 | 122.0 |
| RT-DETR | 82.9 | 65.2 | 41.9 | 125.7 | 256.4 |
| LSOD-YOLO | 78.5 | 67.5 | 3.8 | 33.9 | 181.8 |
| SO-YOLOv8 | 78.2 | 61.4 | 69.8 | 263.0 | 166.7 |
| YOLO-FD | 65.9 | 30.3 | 12.0 | 52.8 | 208.3 |
| SD-YOLO-AWDNet | 81.5 | 65.1 | 3.7 | 8.3 | 204.1 |
| VP-YOLO | 82.8 | 67.3 | 66.8 | 129.7 | 232.6 |
| EL-YOLO | 60.5 | 42.5 | 1.1 | 6.7 | 185.2 |
| LVD-YOLO | 83.1 | 64.1 | 3.6 | 5.7 | 243.9 |
| MT-YOLO | 82.5 | 65.3 | 3.5 | 6.9 | 238.1 |
| BDNet (Our Model) | 85.9 | 67.3 | 2.5 | 6.0 | 285.7 |
Table 6.
Ablation study of BDNet on the BRVD validation set. Each configuration is evaluated in terms of precision (𝓟rec), recall (𝓡rec), m𝒜50, parameter count (Params), FLOPs, and inference speed in frames per second (FPS).
| Baseline | HyDASE | C3k2_MogaBlock | A2C2f_FRFN | 𝓟rec | 𝓡rec | m𝒜50 | Params | FLOPs (G) | FPS (f/s) |
|---|---|---|---|---|---|---|---|---|---|
| √ | | | | 87.3 | 73.8 | 84.5 | 2.6 | 6.5 | 196.1 |
| √ | √ | | | 88.0 | 75.0 | 85.3 | 2.3 | 5.6 | 256.4 |
| √ | | √ | | 88.1 | 74.4 | 84.9 | 2.5 | 6.2 | 250.0 |
| √ | | | √ | 86.8 | 74.8 | 84.7 | 2.7 | 6.3 | 243.9 |
| √ | √ | √ | | 88.5 | 75.3 | 85.8 | 2.3 | 5.8 | 263.2 |
| √ | √ | | √ | 88.5 | 75.2 | 85.6 | 2.5 | 5.8 | 277.8 |
| √ | | √ | √ | 88.3 | 75.5 | 85.5 | 2.7 | 6.4 | 270.3 |
| √ | √ | √ | √ | 88.4 | 75.3 | 85.9 | 2.5 | 6.0 | 285.7 |
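
The ablation is easiest to interpret as differences from the baseline row. The sketch below encodes the eight configurations of Table 6 and reports each column relative to the baseline. The `Run` container and the `delta_vs_baseline` helper are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Run(NamedTuple):
    modules: frozenset  # proposed modules enabled on top of the baseline
    prec: float
    rec: float
    map50: float
    params_m: float
    gflops: float
    fps: float


H, M, F = "HyDASE", "C3k2_MogaBlock", "A2C2f_FRFN"

# Rows copied from Table 6, in the same order
RUNS = [
    Run(frozenset(), 87.3, 73.8, 84.5, 2.6, 6.5, 196.1),
    Run(frozenset({H}), 88.0, 75.0, 85.3, 2.3, 5.6, 256.4),
    Run(frozenset({M}), 88.1, 74.4, 84.9, 2.5, 6.2, 250.0),
    Run(frozenset({F}), 86.8, 74.8, 84.7, 2.7, 6.3, 243.9),
    Run(frozenset({H, M}), 88.5, 75.3, 85.8, 2.3, 5.8, 263.2),
    Run(frozenset({H, F}), 88.5, 75.2, 85.6, 2.5, 5.8, 277.8),
    Run(frozenset({M, F}), 88.3, 75.5, 85.5, 2.7, 6.4, 270.3),
    Run(frozenset({H, M, F}), 88.4, 75.3, 85.9, 2.5, 6.0, 285.7),
]


def delta_vs_baseline(run: Run, base: Run = RUNS[0]) -> dict:
    """Difference of every numeric column relative to the baseline row."""
    numeric = Run._fields[1:]  # skip the 'modules' field
    return {name: round(getattr(run, name) - getattr(base, name), 1) for name in numeric}


if __name__ == "__main__":
    for run in RUNS[1:]:
        label = " + ".join(sorted(run.modules))
        print(f"{label:45s} {delta_vs_baseline(run)}")
```
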
Table 7.
Scale-wise comparison between YOLOv12 models and corresponding BDNet variants (n, s, m, l) on the BRVD validation set. Metrics include mean precision, recall, 𝓕1-score, and m𝒜 at IoU 0.50, 0.75, and 0.50–0.95, along with model size (Params) and computational cost (GFLOPs).
| Model | 𝓟rec | 𝓡rec | 𝓕1-score | m𝒜50 | m𝒜75 | m𝒜50-95 | Params | GFLOPs |
|---|---|---|---|---|---|---|---|---|
| YOLOv12n | 87.3 | 73.8 | 78.0 | 84.5 | 72.6 | 66.6 | 2.6 | 6.5 |
| BDNetn | 88.4 | 75.3 | 81.3 | 85.9 | 73.5 | 67.3 | 2.5 | 6.0 |
| YOLOv12s | 91.6 | 80.1 | 85.5 | 88.3 | 76.3 | 70.2 | 9.3 | 21.4 |
| BDNets | 88.5 | 81.1 | 84.6 | 89.3 | 78.5 | 71.3 | 9.0 | 19.5 |
| YOLOv12m | 91.2 | 84.4 | 87.7 | 90.3 | 80.4 | 72.2 | 20.2 | 67.5 |
| BDNetm | 89.1 | 84.2 | 86.6 | 91.2 | 82.2 | 74.4 | 17.3 | 55.7 |
| YOLOv12l | 91.9 | 84.2 | 87.9 | 90.6 | 79.2 | 75.2 | 26.4 | 88.9 |
| BDNetl | 91.0 | 83.1 | 86.9 | 91.5 | 82.9 | 75.7 | 23.8 | 77.3 |
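
To quantify the efficiency side of Table 7, the paired rows can be turned into relative reductions in parameters and GFLOPs, alongside the change in m𝒜50-95. The sketch below uses values copied from the table, and the helper `pct_reduction` is a name we introduce.

```python
# scale -> ((YOLOv12 Params, GFLOPs, mAP50-95), (BDNet Params, GFLOPs, mAP50-95)), from Table 7
PAIRS = {
    "n": ((2.6, 6.5, 66.6), (2.5, 6.0, 67.3)),
    "s": ((9.3, 21.4, 70.2), (9.0, 19.5, 71.3)),
    "m": ((20.2, 67.5, 72.2), (17.3, 55.7, 74.4)),
    "l": ((26.4, 88.9, 75.2), (23.8, 77.3, 75.7)),
}


def pct_reduction(reference: float, value: float) -> float:
    """Relative reduction of `value` with respect to `reference`, in percent."""
    return 100.0 * (reference - value) / reference


if __name__ == "__main__":
    for scale, (yolo, bdnet) in PAIRS.items():
        print(
            f"{scale}: params -{pct_reduction(yolo[0], bdnet[0]):.1f}%, "
            f"GFLOPs -{pct_reduction(yolo[1], bdnet[1]):.1f}%, "
            f"mAP50-95 {bdnet[2] - yolo[2]:+.1f}"
        )
```
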
Table 8.
Quantitative performance comparison between BDNet and YOLOv8-YOLOv12 baselines on the VisDrone-DET2019 validation set. All models are trained on BRVD and directly evaluated on VisDrone-DET2019 without fine-tuning.
| Model | 𝓟rec | 𝓡rec | 𝓕1-score | m𝓐𝓟50 | m𝓐𝓟75 | m𝓐𝓟50-95 | Params | GFLOPs | FPS (f/s) |
|---|---|---|---|---|---|---|---|---|---|
| YOLOv8n | 38.9 | 30.6 | 33.4 | 29.6 | 25.6 | 17.8 | 3.2 | 8.7 | 39.2 |
| YOLOv9t | 42.2 | 31.3 | 35.1 | 31.3 | 26.3 | 17.9 | 2.0 | 7.7 | 49.3 |
| YOLOv10n | 39.0 | 30.8 | 33.7 | 29.5 | 25.1 | 17.0 | 2.3 | 6.7 | 59.9 |
| YOLOv11n | 42.9 | 31.4 | 35.5 | 31.4 | 24.8 | 17.6 | 2.6 | 6.5 | 82.0 |
| YOLOv12n | 40.0 | 30.2 | 33.6 | 29.3 | 25.4 | 16.5 | 2.6 | 6.5 | 87.7 |
| BDNet (Ours) | 43.6 | 32.6 | 36.6 | 31.9 | 25.9 | 17.9 | 2.5 | 6.0 | 104.2 |
Table 9.
Per-class m𝒜50 comparison between BDNet and YOLOv8-YOLOv12 models on the VisDrone-DET2019 validation set.
| Class Name | YOLOv8n | YOLOv9t | YOLOv10n | YOLOv11n | YOLOv12n | BDNet (Ours) |
|---|---|---|---|---|---|---|
| Pedestrian | 30.6 | 31.9 | 30.6 | 32.9 | 30.2 | 34.0 |
| People | 24.1 | 25.6 | 25.9 | 26.8 | 24.8 | 28.0 |
| Bicycle | 5.7 | 7.4 | 6.6 | 6.1 | 5.5 | 7.5 |
| Car | 73.1 | 74.3 | 73.2 | 75.0 | 73.6 | 75.1 |
| Van | 36.1 | 37.6 | 33.3 | 36.5 | 34.8 | 37.3 |
| Truck | 24.2 | 26.3 | 25.1 | 24.9 | 24.0 | 26.6 |
| Tricycle | 18.5 | 21.6 | 17.7 | 19.6 | 18.4 | 20.4 |
| Awning-tricycle | 10.3 | 9.9 | 10.0 | 12.1 | 11.1 | 10.4 |
| Bus | 41.3 | 44.0 | 40.3 | 45.1 | 38.0 | 42.9 |
| Motor | 31.6 | 34.0 | 32.8 | 34.6 | 32.2 | 36.9 |
| All | 29.6 | 31.3 | 29.5 | 31.4 | 29.3 | 31.9 |
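
The "All" row of Table 9 appears to be the unweighted mean of the ten per-class values. That reading is an assumption, and the sketch below checks it for two of the columns using values copied from the table. The helper name `macro_mean` is ours.

```python
# Per-class values copied from Table 9 (Pedestrian ... Motor), two columns only
PER_CLASS = {
    "YOLOv12n": [30.2, 24.8, 5.5, 73.6, 34.8, 24.0, 18.4, 11.1, 38.0, 32.2],
    "BDNet": [34.0, 28.0, 7.5, 75.1, 37.3, 26.6, 20.4, 10.4, 42.9, 36.9],
}
REPORTED_ALL = {"YOLOv12n": 29.3, "BDNet": 31.9}


def macro_mean(values: list) -> float:
    """Unweighted mean over classes (every class counts equally)."""
    return sum(values) / len(values)


if __name__ == "__main__":
    for model, values in PER_CLASS.items():
        print(f"{model:9s} mean={macro_mean(values):.2f} reported={REPORTED_ALL[model]}")
```
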