Adaptive Edge-Aware Detection with Lightweight Multi-Scale Fusion
Abstract
1. Introduction
1. Addressing geometric constraints in edge extraction: To overcome the limitations of traditional Sobel operators, which are restricted to fixed scales and discrete orientations, we propose the Variable Sobel Compact Inverted Block (VSCIB). By dynamically adjusting kernel size and orientation, this module significantly enhances the model’s ability to capture multi-scale and multi-directional edge features.
2. Optimizing feature fusion efficiency: Targeting the detail loss and redundancy caused by standard pooling operations, we introduce the Spatial Pyramid Shared Convolution (SPSC). This module replaces pooling with shared dilated convolutions, effectively preserving multi-scale information while avoiding unnecessary computational overhead.
3. Reducing downsampling loss: To mitigate severe semantic information loss during feature map reduction, we design the Efficient Downsampling Convolution (EDC). Utilizing a dual-branch architecture to balance compression and preservation, it substantially reduces parameter count and GFLOPs without compromising detection performance.
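To make the idea of a variable kernel size concrete, the sketch below builds Sobel-style kernels of any odd size from binomial smoothing taps and a single differencing step. This is one common generalization of the 3×3 Sobel operator and is not necessarily the construction used inside VSCIB; the helper names `convolve_1d` and `sobel_kernel` are illustrative assumptions.

```python
def convolve_1d(a, b):
    """Full discrete convolution of two integer sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def sobel_kernel(size):
    """Build a size x size horizontal-gradient Sobel-style kernel.

    Assumption: binomial smoothing along rows and a smoothed central
    difference along columns (size must be odd and at least 3).
    """
    if size < 3 or size % 2 == 0:
        raise ValueError("size must be an odd integer >= 3")
    smooth = [1]
    for _ in range(size - 1):
        smooth = convolve_1d(smooth, [1, 1])   # binomial low-pass taps
    deriv = [1]
    for _ in range(size - 2):
        deriv = convolve_1d(deriv, [1, 1])     # pre-smoothing of the derivative
    deriv = convolve_1d(deriv, [-1, 1])        # one differencing step
    # Outer product: each row is the derivative taps scaled by a smoothing tap.
    return [[s * d for d in deriv] for s in smooth]


for row in sobel_kernel(5):
    print(row)
```

For `size=3` this reproduces the familiar 3×3 horizontal Sobel kernel; larger sizes widen the support while keeping every row zero-sum.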
2. Related Works
2.1. You Only Look Once (YOLO)-11
2.2. Edge Feature Extraction
2.3. Spatial Pyramid Feature Fusion
3. Proposed Approach
3.1. Baseline
3.2. Variable Sobel Compact Inverted Block
| Algorithm 1 Optimized Algorithm for Sobel Kernel Rotation |
| Input: base kernel, rotation angle |
| Output: rotated kernel |
| 1: Locate the kernel center |
| 2: Initialize the output kernel |
| 3: for each pixel in the output kernel do |
| 4: Compute source coordinates relative to center |
| 5: Round the source coordinates to the nearest neighbor |
| 6: if both source coordinates lie inside the kernel then |
| 7: Copy the source value to the output pixel |
| 8: end if |
| 9: end for |
| 10: return the rotated kernel |
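A minimal Python sketch of the nearest-neighbor rotation outlined in Algorithm 1 follows. The function name `rotate_kernel`, the use of degrees, the zero initialization, and the sign convention of the angle are assumptions made for illustration; the original symbols of the algorithm are not reproduced here.

```python
import math


def rotate_kernel(kernel, angle_deg):
    """Rotate a square kernel about its center with nearest-neighbor sampling."""
    k = len(kernel)
    center = (k - 1) / 2                      # kernel center
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    rotated = [[0] * k for _ in range(k)]     # output starts at zero (assumed)
    for i in range(k):                        # output row
        for j in range(k):                    # output column
            y, x = i - center, j - center     # coordinates relative to center
            # Inverse mapping: find where this output pixel comes from.
            src_x = cos_t * x + sin_t * y
            src_y = -sin_t * x + cos_t * y
            si = round(src_y + center)        # nearest neighbor
            sj = round(src_x + center)
            if 0 <= si < k and 0 <= sj < k:   # skip sources outside the kernel
                rotated[i][j] = kernel[si][sj]
    return rotated


sobel_x = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
for row in rotate_kernel(sobel_x, 90):
    print(row)
```

Under this convention, rotating the 3×3 horizontal-gradient kernel by 90° gives the vertical-gradient kernel, and any output pixel whose source falls outside the kernel stays at zero.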
- Synergistic Design Integration. The architectural logic of our framework relies on the complementary relationship between feature enhancement and structural efficiency. While the VSCIB module significantly improves the network’s sensitivity to high-frequency edge information and semantic context, this enriched representation risks degradation if subjected to standard, lossy downsampling operations (such as Max Pooling or strided convolutions). Furthermore, the computational investment made in the VSCIB for precise edge extraction necessitates a subsequent reduction in spatial redundancy to maintain overall model lightness. Consequently, the design transitions to the Efficient Downsampling Convolution (EDC). This module serves as the structural counterpart to VSCIB, ensuring that the integrity of the extracted edge features is preserved during dimensionality reduction while effectively offsetting computational costs through its streamlined dual-branch architecture.
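The saving that a dual-branch downsampling layer can offer over a single strided convolution can be estimated with simple weight counting. The layout below is hypothetical (input channels split in half, one branch with a 3×3 stride-2 convolution, the other with parameter-free pooling followed by a 1×1 convolution); the text does not specify the internal structure of EDC, so this is only an illustration of the principle.

```python
def conv_params(c_in, c_out, k):
    """Number of weights in a k x k convolution without bias."""
    return c_in * c_out * k * k


def strided_conv_params(c_in, c_out, k=3):
    """Standard downsampling: one k x k stride-2 convolution on all channels."""
    return conv_params(c_in, c_out, k)


def dual_branch_params(c_in, c_out, k=3):
    """HYPOTHETICAL dual-branch layout, not the published EDC design.

    Branch A: k x k stride-2 convolution on half of the channels.
    Branch B: pooling (no weights) followed by a 1 x 1 convolution on the rest.
    """
    half_in, half_out = c_in // 2, c_out // 2
    return conv_params(half_in, half_out, k) + conv_params(half_in, half_out, 1)


c_in, c_out = 128, 256
standard = strided_conv_params(c_in, c_out)
dual = dual_branch_params(c_in, c_out)
print(standard, dual, round(dual / standard, 3))
```

With these assumed branches the layer keeps roughly 28% of the weights of the single strided convolution, because each branch sees only half of the input and output channels.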
3.3. Efficient Downsampling Convolution
- Synergy between Compression and Contextualization. The architectural flow progresses from efficient spatial compression to robust semantic expansion. While the EDC module successfully reduces computational redundancy and minimizes information loss during downsampling, the resulting feature maps require a mechanism to capture long-range dependencies and multi-scale context to ensure accurate detection. The design therefore integrates the Spatial Pyramid Shared Convolution (SPSC) immediately following the downsampling stages. This arrangement creates a critical synergy: EDC provides a lightweight, information-rich foundation, allowing the SPSC module to focus exclusively on maximizing the receptive field without the burden of processing spatially redundant data. Furthermore, by utilizing a shared-weight strategy, SPSC aligns with the parameter-saving philosophy of EDC, ensuring that the network’s depth translates into semantic richness rather than computational overhead.
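The shared-weight idea can be illustrated in one dimension: the same kernel is applied at several dilation rates, so the receptive field grows while the number of learned weights stays fixed. The dilation rates, the kernel values, and the function names below are assumptions chosen for illustration rather than the configuration of SPSC.

```python
def dilated_conv1d(signal, weights, dilation):
    """Zero-padded, same-length 1-D dilated cross-correlation."""
    k = len(weights)
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for t, w in enumerate(weights):
            j = i + (t - k // 2) * dilation   # taps spread apart by the dilation
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilation):
    """Span covered by one dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


shared_weights = [1.0, 2.0, 1.0]              # one kernel reused by every branch
rates = (1, 3, 5)                             # hypothetical dilation rates
impulse = [0.0] * 5 + [1.0] + [0.0] * 5

branches = [dilated_conv1d(impulse, shared_weights, d) for d in rates]
shared_param_count = len(shared_weights)
separate_param_count = len(shared_weights) * len(rates)

for d, branch in zip(rates, branches):
    print(d, receptive_field(len(shared_weights), d), branch)
print(shared_param_count, separate_param_count)
```

Each branch responds to the impulse at progressively wider offsets, yet all three reuse the same three weights instead of learning nine.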
3.4. Spatial Pyramid Shared Convolution (SPSC)
4. Experiments
4.1. Introduction of Datasets
4.2. Evaluation Indicators
4.3. Experimental Environments
4.4. Comparative Experiments on VSCIB
4.5. Comparative Experiments on EDC
4.6. Comparative Experiments on SPSC
4.7. Comparative Experiments with Other Models
4.8. Ablation Study and Synergy Analysis
4.8.1. Logic of Module Synergy
1. Complementary Perception (VSCIB + SPSC): The VSCIB acts as a front-end feature enhancer, explicitly capturing high-frequency edge details. Since standard pooling layers tend to discard these fragile features, the SPSC serves as a mid-stage preserver. By replacing pooling with shared dilated convolutions, SPSC expands the receptive field while explicitly preserving the sharp edge features extracted by VSCIB, ensuring geometric priors are propagated to the detection head.
2. Efficiency–Accuracy Trade-off (EDC + VSCIB): The rotatable kernels in VSCIB inevitably incur computational costs. To counterbalance this, the EDC acts as a global efficiency regulator. Using a lightweight dual-branch architecture for downsampling, EDC significantly reduces the parameter count and GFLOPs in the backbone and head. This creates a “computational budget” that allows us to afford the sophisticated VSCIB module without compromising real-time performance.
3. Holistic Optimization: The combination aims for a non-linear performance boost where EDC enables the speed, VSCIB provides the precision, and SPSC ensures feature integrity across scales.
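One way to examine the combined effect is to compare the gain of the full model with the sum of the single-module gains. The sketch below does this arithmetic using the baseline, single-module, and full-model values reported in this article's comparison and ablation tables; the dictionary layout and function names are illustrative assumptions.

```python
# Values copied from the article's tables (baseline is the unmodified model).
BASELINE = {"mAP50": 81.4, "mAP50-95": 61.4, "params_M": 2.6, "gflops": 6.5}
SINGLE_MODULE = {
    "VSCIB": {"mAP50": 82.0, "mAP50-95": 62.7},
    "EDC": {"mAP50": 81.4, "mAP50-95": 61.8},
    "SPSC": {"mAP50": 82.5, "mAP50-95": 62.6},
}
FULL_MODEL = {"mAP50": 83.2, "mAP50-95": 63.9, "params_M": 2.4, "gflops": 5.8}


def gain(result, metric):
    """Absolute improvement over the baseline, in percentage points."""
    return round(result[metric] - BASELINE[metric], 1)


def additive_gain(metric):
    """Sum of the gains obtained by each module on its own."""
    return round(sum(gain(r, metric) for r in SINGLE_MODULE.values()), 1)


def relative_change(metric):
    """Relative change of the full model against the baseline, in percent."""
    return 100.0 * (FULL_MODEL[metric] - BASELINE[metric]) / BASELINE[metric]


for metric in ("mAP50", "mAP50-95"):
    print(metric, additive_gain(metric), gain(FULL_MODEL, metric))
for metric in ("params_M", "gflops"):
    print(metric, "%.1f%%" % relative_change(metric))
```

On these figures the combined mAP50 gain (1.8 points) is slightly above the sum of the single-module gains (1.7 points), the combined mAP50-95 gain (2.5 points) is below the corresponding sum (2.9 points), and parameters and GFLOPs fall by about 7.7% and 10.8%.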
4.8.2. Quantitative Analysis
4.9. Qualitative Analysis
1. Spatial Constraint: In the baseline heatmaps (Columns c-e), activation regions are diffuse and often “spill over” into the background. This indicates that standard convolutions rely heavily on texture correlations, which are continuous across boundaries.
2. Edge Guidance: In contrast, our model’s heatmaps (Column b) display sharp cutoffs at object boundaries. For example, on the diver’s body, the high activation strictly follows the limb edges. This phenomenon demonstrates that the VSCIB module effectively introduces a geometric constraint: the strong gradient response at the edges acts as a barrier, confining the semantic features within the object’s instance mask.
3. Background Suppression: By prioritizing structural edges over textural noise, the model effectively suppresses irrelevant background activations. This is crucial in complex scenes where background textures (e.g., water ripples or street lights) often trigger false positives in baseline models.
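The notion of a strong gradient response confined to object boundaries can be seen on a toy image. The sketch below applies a plain 3×3 horizontal Sobel kernel (not the full VSCIB module) to a synthetic step edge; the image values and the helper name `correlate_valid` are assumptions for illustration.

```python
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]


def correlate_valid(image, kernel):
    """2-D cross-correlation without padding (output shrinks by kernel size - 1)."""
    kh, kw = len(kernel), len(kernel[0])
    rows = len(image) - kh + 1
    cols = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[a][b] * image[r + a][c + b]
                for a in range(kh) for b in range(kw))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Toy scene: flat background (0) on the left, flat object (10) on the right.
image = [[0, 0, 0, 0, 10, 10, 10, 10] for _ in range(5)]
response = correlate_valid(image, SOBEL_X)
for row in response:
    print(row)
```

The response is zero inside both flat regions and nonzero only in the two columns that straddle the boundary, which is the localized behavior that an edge-guided feature can exploit.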

4.10. Robustness Analysis in Extreme Scenarios
1. Severe Blur (Row 1): The baseline does not detect the holothurian due to diffuse gradients in the turbid water. Our method successfully localizes it, validating that VSCIB’s explicit geometric prior captures faint edges better than implicit adaptive methods.
2. Background Clutter (Row 2): In the snowy scene, the baseline misses the fine-grained skis. Our model, utilizing VSCIB’s multi-directional perception, accurately isolates the thin, elongated structure from the chaotic background.
3. Low Lighting and Occlusion (Row 3): The baseline misses the bicycle and the handbag (red box) in the dark street. Our method recovers these targets, which we attribute to the synergy of VSCIB’s noise suppression (its smoothing operator) and SPSC’s expanded contextual receptive field.
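How much a smoothing step can damp pixel noise follows from a standard result: for independent, equal-variance noise, a normalized filter scales the variance by the sum of its squared weights. The sketch below evaluates this for binomial taps such as the `[1, 2, 1]` smoothing half of a Sobel kernel; whether VSCIB's smoothing operator has exactly this form is an assumption.

```python
def noise_variance_gain(weights):
    """Variance scaling of i.i.d. noise after a normalized weighted average."""
    total = sum(weights)
    normalized = [w / total for w in weights]
    return sum(w * w for w in normalized)


for taps in ([1, 2, 1], [1, 4, 6, 4, 1]):
    print(taps, noise_variance_gain(taps))
```

Under the independence assumption, the three-tap filter leaves 37.5% of the noise variance and the five-tap filter about 27%, so wider smoothing suppresses more noise at the cost of blurring fine detail.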

5. Discussion
5.1. Performance and Generalizability
5.2. Theoretical Comparisons
1. Boundary Ambiguity: Attention mechanisms rely on implicit feature matching. In scenarios with severe motion blur or low contrast (e.g., turbid underwater scenes), feature correlations become weak, often leading to “oversmoothed” boundaries. In contrast, our VSCIB module introduces an explicit geometric prior via rotatable Sobel operators. This imposes a structural constraint that forces the network to lock onto high-frequency gradient changes, offering superior boundary adherence where learned attention maps might fail to converge.
2. Occlusion Handling: Transformers excel at handling occlusion via global self-attention but suffer from quadratic computational cost ($O(N^2)$ in the number of tokens $N$). Our framework addresses occlusion through the SPSC module, which employs dilated convolutions to expand the receptive field. This allows the model to capture sufficient surrounding context to infer occluded objects with linear complexity ($O(N)$), achieving a pragmatic balance between contextual robustness and real-time latency.
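The difference between quadratic and linear growth can be checked with a simple count. The sketch below compares the number of pairwise attention scores with the number of kernel taps evaluated by a few dilated 3×3 branches as the feature map grows; channel dimensions are left out, and the branch count and function names are assumptions for illustration.

```python
def attention_score_entries(height, width):
    """Pairwise scores in global self-attention over height * width tokens."""
    tokens = height * width
    return tokens * tokens


def conv_kernel_taps(height, width, kernel_size=3, branches=3):
    """Kernel taps evaluated per channel by `branches` dilated convolutions.

    Dilation spreads the taps apart but does not change how many there are.
    """
    return height * width * kernel_size * kernel_size * branches


for side in (20, 40, 80):
    print(side, attention_score_entries(side, side), conv_kernel_taps(side, side))
```

Doubling the side length of the feature map multiplies the attention count by 16 but the convolution count by only 4.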
5.3. Limitations and Failure Case Analysis
- Sensitivity to Adversarial Perturbations. While our method demonstrates strong robustness against natural environmental noise (e.g., underwater turbidity, low-light ISO noise) due to the smoothing operator in VSCIB, its resilience against adversarial attacks remains an open research question. Unlike random natural noise, adversarial perturbations are often meticulously calculated to invert gradient directions with imperceptible texture changes. Since VSCIB explicitly relies on gradient computation, it may be theoretically more susceptible to such gradient-targeted attacks than purely semantic-based networks. We acknowledge that no specific adversarial defense mechanisms (e.g., adversarial training) were integrated into this study.
- Failure Cases in High-Frequency Textures. Although the VSCIB includes a low-pass filtering mechanism to suppress noise, it may encounter difficulties in scenarios with extremely dense, high-frequency repetitive patterns (e.g., dense rain streaks, camouflage nets, or chain-link fences). In these cases, the texture frequency may overlap with the edge frequency of the target objects. If the texture intensity is strong, the Sobel operators might interpret these patterns as structural edges, leading to false positives or fragmented detection boxes. Future iterations will need to explore adaptive frequency-domain filtering to better distinguish between structural edges and high-frequency textural noise.
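The texture failure mode can be reproduced in one dimension with the differencing part of a Sobel kernel. In the sketch below, a single object boundary and a dense stripe pattern of the same contrast are passed through a central difference; the signals, the threshold, and the function names are assumptions for illustration.

```python
def central_difference(row):
    """Response of the [-1, 0, 1] derivative taps (no padding)."""
    return [row[i + 2] - row[i] for i in range(len(row) - 2)]


def strong_fraction(responses, threshold):
    """Share of positions whose absolute response exceeds the threshold."""
    strong = sum(1 for r in responses if abs(r) > threshold)
    return strong / len(responses)


edge_row = [0, 0, 0, 0, 10, 10, 10, 10]      # one true object boundary
stripe_row = [0, 0, 10, 10, 0, 0, 10, 10]    # dense repetitive texture

edge_response = central_difference(edge_row)
stripe_response = central_difference(stripe_row)
print(edge_response, strong_fraction(edge_response, 5))
print(stripe_response, strong_fraction(stripe_response, 5))
```

The stripes produce responses of the same magnitude as the true boundary at every position, so a gradient magnitude alone cannot separate the two cases, which is consistent with the failure mode described above.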
5.4. Future Work
1. Adversarial Robustness: Investigating the integration of adversarial training strategies to fortify the edge-aware mechanisms against synthetic perturbations and gradient-based attacks.
2. Architecture Optimization: Combining Neural Architecture Search (NAS) to further reduce computational complexity and parameters.
3. Model Compression: Exploring pruning and quantization for deployment on extreme edge devices (e.g., mobile or embedded systems).
4. Cross-Domain Adaptability: Validating the generalization ability of Edge Aware-YOLO in other domains where edge features are critical, such as medical image analysis (e.g., lesion segmentation) and remote sensing.
5. Integration with Heterogeneous Architectures: Extending the application of the proposed modules (VSCIB, EDC, SPSC) beyond the YOLO framework to other mainstream backbones (e.g., ResNet, Swin Transformer) and detection heads (e.g., Faster R-CNN, RetinaNet) to comprehensively verify their architectural universality and plug-and-play capability.
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Bogdoll, D.; Nitsche, M.; Zöllner, J.M. Anomaly detection in autonomous driving: A survey. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 4488–4499.
2. Dos Reis, D.H.; Welfer, D.; De Souza Leite Cuadros, M.A.; Gamarra, D.F.T. Mobile robot navigation using an object recognition software with RGBD images and the YOLO algorithm. Appl. Artif. Intell. 2019, 33, 1290–1305.
3. Zeng, F.; Dong, B.; Zhang, Y.; Wang, T.; Zhang, X.; Wei, Y. Motr: End-to-end multiple-object tracking with transformer. In Computer Vision—ECCV 2022; Springer: Cham, Switzerland, 2022; pp. 659–675.
4. Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
5. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
6. Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: A simple and strong anchor-free object detector. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 1922–1933.
7. Wang, C.; He, W.; Nie, Y.; Guo, J.; Liu, C.; Wang, Y.; Han, K. Gold-YOLO: Efficient object detector via gather-and-distribute mechanism. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2023; Volume 36, pp. 51094–51112.
8. Xu, X.; Jiang, Y.; Chen, W.; Huang, Y.; Zhang, Y.; Sun, X. Damo-yolo: A report on real-time object detection design. arXiv 2022, arXiv:2211.15444.
9. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
10. Li, C.; Li, L.; Geng, Y.; Jiang, H.; Cheng, M.; Zhang, B.; Ke, Z.; Xu, X.; Chu, X. Yolov6 v3.0: A full-scale reloading. arXiv 2023, arXiv:2301.05586.
11. Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 764–773.
12. Guo, M.H.; Lu, C.Z.; Liu, Z.N.; Cheng, M.M.; Hu, S.M. Visual attention network. Comput. Vis. Media 2023, 9, 733–752.
13. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848.
14. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
15. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258.
16. Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. arXiv 2015, arXiv:1511.07122.
17. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2012; Volume 25.
18. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
19. Jocher, G. Yolov11. GitHub Repository. 2024. Available online: https://github.com/ultralytics/ultralytics/tree/main (accessed on 18 January 2026).
20. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
21. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. Yolox: Exceeding yolo series in 2021. arXiv 2021, arXiv:2107.08430.
22. Lyu, C.; Zhang, W.; Huang, H.; Zhou, Y.; Wang, Y.; Liu, Y.; Zhang, S.; Chen, K. Rtmdet: An empirical study of designing real-time object detectors. arXiv 2022, arXiv:2212.07784.
23. Wang, C.Y.; Liao, H.Y.M.; Yeh, I.H. Designing network design strategies through gradient path analysis. arXiv 2022, arXiv:2211.04800.
24. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768.
25. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
26. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
27. Wang, C.Y.; Yeh, I.H.; Mark Liao, H.Y. Yolov9: Learning what you want to learn using programmable gradient information. In Computer Vision—ECCV 2024; Springer: Cham, Switzerland, 2024; pp. 1–21.
28. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. Yolov10: Real-time end-to-end object detection. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2024; Volume 37, pp. 107984–108011.
29. Chen, Y.; Dai, X.; Liu, M.; Chen, D.; Yuan, L.; Liu, Z. Dynamic convolution: Attention over convolution kernels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11030–11039.
30. Liu, S.; Huang, D.; Wang, Y. Receptive field block net for accurate and fast object detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 385–400.
31. Fu, C.; Liu, R.; Fan, X.; Chen, P.; Fu, H.; Yuan, W.; Zhu, M.; Luo, Z. Rethinking general underwater object detection: Datasets, challenges, and solutions. Neurocomputing 2023, 517, 243–256.
32. Ren, M.; Zhang, X.; Zhi, X.; Wei, Y.; Feng, Z. An annotated street view image dataset for automated road damage detection. Sci. Data 2024, 11, 407.
33. Gochoo, M. Safety Helmet Wearing Dataset. Mendeley Data. 2021. Available online: https://github.com/njvisionpower/Safety-Helmet-Wearing-Dataset (accessed on 17 December 2019).
34. Everingham, M.; Eslami, S.A.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes challenge: A retrospective. Int. J. Comput. Vis. 2015, 111, 98–136.
35. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in context. In Computer Vision—ECCV 2014; Springer: Cham, Switzerland, 2014; pp. 740–755.
36. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2015; Volume 28.
37. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
38. Zhang, S.; Chi, C.; Yao, Y.; Lei, Z.; Li, S.Z. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9759–9768.
39. Jocher, G. Yolov8. GitHub Repository. 2023. Available online: https://github.com/ultralytics/ultralytics/tree/main (accessed on 18 January 2026).

| Datasets | Images | | | Data Split | | | Challenges | | |
|---|---|---|---|---|---|---|---|---|---|
| | Lab. | Cat. | Bac. | Train | Test | Total | HLE. | CC. | CO. |
| RUOD [31] | 74,903 | 10 | G | 9800 | 4200 | 14,000 | ✔ | ✔ | ✔ |
| SVRDD [32] | 20,804 | 7 | G | 6000 | 1000 | 7000 | ✔ | ||
| SHWD [33] | 120,558 | 2 | S | 6064 | 1517 | 7581 | |||
| VOC2007+2012 [34] | 40,058 | 20 | G | 16,551 | 4952 | 21,503 | ✔ | ||
| MS COCO2017 [35] | 846,684 | 80 | G | 118,287 | 5000 | 123,287 | ✔ | ||

| Models | Precision (%) | Recall (%) | mAP50 (%) | mAP50-95 (%) | Model Size (MB) | Param (M) | GFLOPs |
|---|---|---|---|---|---|---|---|
| C3k2 | 80.7 | 74.4 | 81.4 | 61.4 | 5.3 | 2.6 | 6.5 |
| C2f | 80.4 | 74.0 | 81.1 | 60.6 | 6.0 | 2.9 | 6.7 |
| C3 | 79.5 | 73.5 | 80.5 | 59.8 | 5.0 | 2.4 | 6.3 |
| VSCIB (ours) | 80.9 | 74.8 | 82.0 | 62.7 | 5.5 | 2.6 | 6.6 |

| Models | Precision (%) | Recall (%) | mAP50 (%) | mAP50-95 (%) | Model Size (MB) | Param (M) | GFLOPs |
|---|---|---|---|---|---|---|---|
| Conv | 80.7 | 74.4 | 81.4 | 61.4 | 5.3 | 2.6 | 6.5 |
| DWConv | 80.0 | 73.7 | 80.7 | 60.7 | 4.2 | 1.7 | 4.7 |
| SCDown | 80.9 | 73.8 | 80.9 | 60.8 | 4.2 | 1.8 | 5.3 |
| ADown | 79.4 | 73.1 | 79.4 | 57.7 | 4.4 | 2.0 | 4.5 |
| EDC (ours) | 80.6 | 75.1 | 81.4 | 61.8 | 4.6 | 2.2 | 5.7 |

| Models | Precision (%) | Recall (%) | mAP50 (%) | mAP50-95 (%) | Model Size (MB) | Param (M) | GFLOPs |
|---|---|---|---|---|---|---|---|
| SPPF | 80.7 | 74.4 | 81.4 | 61.4 | 5.3 | 2.6 | 6.5 |
| SPP | 80.5 | 74.2 | 81.2 | 61.2 | 5.2 | 2.6 | 6.5 |
| ASPP | 79.0 | 72.7 | 79.8 | 60.6 | 7.6 | 4.6 | 8.1 |
| RFB | 82.0 | 73.6 | 82.0 | 62.1 | 5.9 | 2.8 | 6.6 |
| SPSC (ours) | 80.6 | 75.4 | 82.5 | 62.6 | 5.8 | 2.7 | 6.5 |

| Models | RUOD mAP50 (%) | RUOD mAP50-95 (%) | SVRDD mAP50 (%) | SVRDD mAP50-95 (%) | SHWD mAP50 (%) | SHWD mAP50-95 (%) | Pascal VOC mAP50 (%) | Pascal VOC mAP50-95 (%) | MS COCO mAP50 (%) | MS COCO mAP50-95 (%) | Param (M) | GFLOPs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Faster-RCNN [36] | 81.8 | 57.5 | 64.9 | 36.3 | 90.9 | 59.2 | 73.2 | 53.4 | 43.9 | 21.9 | 41.14 | 63.3 |
| RetinaNet [37] | 79.3 | 54.5 | 60.4 | 30.9 | 85.4 | 63.5 | 76.4 | 55.9 | 48.4 | 27.2 | 36.17 | 39.7 |
| FCOS | 79.5 | 54.0 | 60.2 | 32.2 | 85.8 | 63.9 | 78.4 | 58.7 | 56.1 | 37.4 | 31.84 | 38.8 |
| ATSS [38] | 80.3 | 56.9 | 63.6 | 34.9 | 89.4 | 57.8 | 79.8 | 58.6 | 55.6 | 38.3 | 31.89 | 38.8 |
| YOLOv5n | 81.4 | 64.8 | 61.7 | 35.8 | 91.3 | 57.8 | 79.3 | 58.1 | 52.4 | 37.4 | 2.5 | 7.1 |
| YOLOv8n [39] | 84.7 | 67.4 | 63.5 | 36.8 | 92.2 | 60.1 | 80.4 | 60.1 | 52.1 | 37.3 | 3.2 | 8.7 |
| YOLOv10n | 84.6 | 69.8 | 59.7 | 35.8 | 91.6 | 58.4 | 80.1 | 60.5 | 53.5 | 38.5 | 2.3 | 6.7 |
| YOLOv11n | 84.9 | 67.9 | 63.1 | 37.5 | 92.2 | 59.5 | 81.4 | 61.4 | 54.9 | 39.5 | 2.6 | 6.5 |
| Ours | 85.8 | 68.4 | 65.1 | 39.5 | 92.9 | 60.9 | 83.2 | 63.9 | 56.3 | 40.5 | 2.4 | 5.8 |

| Modules | VSCIB | EDC | SPSC | Precision (%) | Recall (%) | mAP50 (%) | mAP50-95 (%) | Model Size (MB) | Param (M) | GFLOPs | FPS |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A (Baseline) | | | | 80.7 | 74.4 | 81.4 | 61.4 | 5.3 | 2.6 | 6.5 | 115 |
| B | ✔ | | | 80.9 | 74.8 | 82.0 | 62.7 | 5.5 | 2.6 | 6.6 | 105 |
| C | ✔ | ✔ | | 81.4 | 74.5 | 82.1 | 63.0 | 4.8 | 2.2 | 5.8 | 124 |
| D | ✔ | | ✔ | 81.5 | 74.8 | 82.5 | 63.2 | 5.8 | 2.8 | 6.5 | 108 |
| E | | ✔ | ✔ | 80.6 | 75.1 | 82.5 | 62.7 | 4.9 | 2.3 | 5.7 | 128 |
| F | | ✔ | | 82.2 | 73.0 | 81.4 | 61.8 | 4.6 | 2.2 | 5.7 | 132 |
| G | | | ✔ | 80.6 | 75.4 | 82.5 | 62.6 | 5.6 | 2.7 | 6.5 | 112 |
| H (Ours) | ✔ | ✔ | ✔ | 80.3 | 76.6 | 83.2 | 63.9 | 5.2 | 2.4 | 5.8 | 119 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Pan, X.; Xiong, K.; Li, J. Adaptive Edge-Aware Detection with Lightweight Multi-Scale Fusion. Electronics 2026, 15, 449. https://doi.org/10.3390/electronics15020449

