Sparse-Gated RGB-Event Fusion for Small Object Detection in the Wild
Abstract
1. Introduction
- We propose the TMAF module, which fuses event streams at multiple temporal scales to enhance the feature saliency of small moving objects while suppressing noise interference (a hedged multi-scale event accumulation sketch follows this list);
- We introduce the SNGAF module, a sparse-gated fusion mechanism inspired by mixture-of-experts (MoE) models [26], which adaptively integrates RGB and event features to accommodate dynamic input characteristics (a noisy top-k gating sketch follows this list);
- We present RGBE-UAV, the first publicly available dataset tailored for small moving object detection using RGB-Event modalities, featuring a wide range of lighting conditions and environments;
- We achieve state-of-the-art (SOTA) detection performance on both the RGBE-UAV and DSEC-MOD [9] datasets, validating the effectiveness of our approach through extensive quantitative and qualitative evaluations.
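As referenced in the first contribution above, the following is a minimal, illustrative PyTorch sketch of the multi-temporal-scale idea behind TMAF: events are accumulated into per-polarity count frames over three window lengths (15/30/40 ms, the best combination in the temporal-scale ablation of Section 4.4.1), and the per-scale features are fused with a simple channel-attention gate. The helper name `events_to_count_frame`, the layer sizes, and the squeeze-and-excitation-style gate are assumptions for illustration only, not the authors' TMAF implementation.

```python
import torch
import torch.nn as nn


def events_to_count_frame(events, t_ref, window_ms, height, width):
    """Accumulate events (N, 4) = (x, y, t_us, polarity) that fall inside
    [t_ref - window, t_ref] into a 2-channel (neg/pos polarity) count frame."""
    window_us = window_ms * 1000.0
    keep = (events[:, 2] >= t_ref - window_us) & (events[:, 2] <= t_ref)
    ev = events[keep]
    frame = torch.zeros(2, height, width)
    pol = (ev[:, 3] > 0).long()                      # 0 = negative, 1 = positive
    frame.index_put_((pol, ev[:, 1].long(), ev[:, 0].long()),
                     torch.ones(ev.shape[0]), accumulate=True)
    return frame


class MultiScaleEventFusion(nn.Module):
    """Encode count frames built at several temporal windows and fuse them
    with a learned, input-dependent attention weighting (illustrative only)."""

    def __init__(self, in_ch=2, feat_ch=16, windows_ms=(15, 30, 40)):
        super().__init__()
        self.windows_ms = windows_ms
        self.encoders = nn.ModuleList(
            nn.Sequential(nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU())
            for _ in windows_ms)
        self.attn = nn.Sequential(                   # squeeze-and-excitation-style gate
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(feat_ch * len(windows_ms), len(windows_ms), 1),
            nn.Softmax(dim=1))

    def forward(self, events, t_ref, height, width):
        frames = [events_to_count_frame(events, t_ref, w, height, width).unsqueeze(0)
                  for w in self.windows_ms]          # each (1, 2, H, W)
        feats = [enc(f) for enc, f in zip(self.encoders, frames)]
        weights = self.attn(torch.cat(feats, dim=1)) # (1, n_scales, 1, 1)
        return sum(weights[:, i:i + 1] * feats[i] for i in range(len(feats)))
```

The intuition the sketch tries to capture: short windows preserve the crisp silhouette of a fast, small target, longer windows integrate more events and suppress isolated noise, and the attention weights let the network trade these off per input.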
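Likewise, for the second contribution, here is a minimal sketch of sparse noisy top-k gating in the spirit of the sparsely-gated mixture-of-experts layer [26], applied to RGB-event feature fusion. The class name `SparseNoisyGatedFusion`, the expert design, the channel width, and k = 2 are hypothetical choices for illustration; the authors' SNGAF module is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseNoisyGatedFusion(nn.Module):
    """Fuse RGB and event feature maps through a small bank of fusion experts,
    of which only the top-k are activated per sample (noisy top-k gating [26])."""

    def __init__(self, channels=64, num_experts=4, k=2):
        super().__init__()
        self.k = k
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Conv2d(2 * channels, channels, 3, padding=1), nn.ReLU())
            for _ in range(num_experts))
        self.w_gate = nn.Linear(2 * channels, num_experts, bias=False)
        self.w_noise = nn.Linear(2 * channels, num_experts, bias=False)

    def forward(self, rgb_feat, evt_feat):
        x = torch.cat([rgb_feat, evt_feat], dim=1)   # (B, 2C, H, W)
        pooled = x.mean(dim=(2, 3))                  # global descriptor, (B, 2C)
        logits = self.w_gate(pooled)
        if self.training:                            # input-dependent noise, training only
            logits = logits + torch.randn_like(logits) * F.softplus(self.w_noise(pooled))
        top_val, top_idx = logits.topk(self.k, dim=1)
        gates = torch.zeros_like(logits).scatter(1, top_idx, F.softmax(top_val, dim=1))
        fused = 0.0
        for i, expert in enumerate(self.experts):
            weight = gates[:, i]                     # zero for samples that skip expert i
            if weight.max() > 0:                     # run only experts someone selected
                fused = fused + weight.view(-1, 1, 1, 1) * expert(x)
        return fused                                 # (B, C, H, W)
```

On dummy inputs of shape (B, 64, H, W) for each modality, the module returns a fused (B, 64, H, W) map in which only the two top-scoring experts contribute per sample; the input-dependent noise encourages more balanced expert utilization during training.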
2. Related Work
2.1. Event Processing
2.2. RGB-Event Fusion for Object Detection
2.3. RGB-Event Object Detection Dataset
3. Method
3.1. Network Overview
3.2. Temporal Multi-Scale Attention Fusion Module for Event Processing
3.3. Sparse Noisy Gated Attention Fusion Module for RGB-Event Fusion
4. Experiments
4.1. Dataset
4.2. Experiment Settings
4.3. Main Results
4.3.1. Quantitative Comparison
4.3.2. Qualitative Comparison
4.4. Ablation Studies
4.4.1. Temporal Scale Selection Analysis
4.4.2. Module Contribution Analysis
4.4.3. Expert Configuration Impact
4.4.4. Modality Complementarity Validation
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| RGB | Red–Green–Blue |
| UAV | Unmanned Aerial Vehicle |
| DVS | Dynamic Vision Sensor |
| DSEC | A Stereo Event Camera Dataset for Driving Scenarios |
| MOD | Moving Object Detection |
| CNN | Convolutional Neural Network |
| FPN | Feature Pyramid Network |
| MVSEC | Multi-Vehicle Stereo Event Camera Dataset |
| MoE | Mixture-of-Experts |
| TMAF | Temporal Multi-Scale Attention Fusion |
| SNGAF | Sparse Noisy Gated Attention Fusion |
| MLP | Multilayer Perceptron |
| SOTA | State of the Art |
| mAP | Mean Average Precision |
| IoU | Intersection over Union |
| Params | Number of Parameters |
| FLOPs | Number of Floating-Point Operations |
| FP | False Positive |
| TP | True Positive |
| FN | False Negative |
References
- Cheng, G.; Yuan, X.; Yao, X.; Yan, K.; Zeng, Q.; Xie, X.; Han, J. Towards large-scale small object detection: Survey and benchmarks. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 13467–13488. [Google Scholar] [CrossRef]
- Wang, X.; Wang, A.; Yi, J.; Song, Y.; Chehri, A. Small object detection based on deep learning for remote sensing: A comprehensive review. Remote Sens. 2023, 15, 3265. [Google Scholar] [CrossRef]
- Rashed, H.; Ramzy, M.; Vaquero, V.; El Sallab, A.; Sistu, G.; Yogamani, S. Fusemodnet: Real-time camera and lidar based moving object detection for robust low-light autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019. [Google Scholar] [CrossRef]
- Wu, Z.; Gobichettipalayam, S.; Tamadazte, B.; Allibert, G.; Paudel, D.P.; Demonceaux, C. Robust rgb-d fusion for saliency detection. In Proceedings of the 2022 International Conference on 3D Vision (3DV), Prague, Czech Republic, 12–16 September 2022; pp. 403–413. [Google Scholar] [CrossRef]
- Zhen, W.; Scherer, S. Estimating the localizability in tunnel-like environments using LiDAR and UWB. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 4903–4908. [Google Scholar] [CrossRef]
- Chen, N.; Xiao, C.; Dai, Y.; He, S.; Li, M.; An, W. Event-based Tiny Object Detection: A Benchmark Dataset and Baseline. arXiv 2025, arXiv:2506.23575. [Google Scholar] [CrossRef]
- Gallego, G.; Delbrück, T.; Orchard, G.; Bartolozzi, C.; Taba, B.; Censi, A.; Leutenegger, S.; Davison, A.J.; Conradt, J.; Daniilidis, K.; et al. Event-Based Vision: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 154–180. [Google Scholar] [CrossRef] [PubMed]
- Rebecq, H.; Ranftl, R.; Koltun, V.; Scaramuzza, D. High speed and high dynamic range video with an event camera. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 1964–1980. [Google Scholar] [CrossRef] [PubMed]
- Zhou, Z.; Wu, Z.; Boutteau, R.; Yang, F.; Demonceaux, C.; Ginhac, D. RGB-Event Fusion for Moving Object Detection in Autonomous Driving. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 7808–7815. [Google Scholar] [CrossRef]
- Cao, J.; Zheng, X.; Lyu, Y.; Wang, J.; Xu, R.; Wang, L. Chasing Day and Night: Towards Robust and Efficient All-Day Object Detection Guided by an Event Camera. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13–17 May 2024; pp. 9026–9032. [Google Scholar] [CrossRef]
- Mondal, A.; Giraldo, J.H.; Bouwmans, T.; Chowdhury, A.S. Moving object detection for event-based vision using graph spectral clustering. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 876–884. [Google Scholar] [CrossRef]
- Sironi, A.; Brambilla, M.; Bourdis, N.; Lagorce, X.; Benosman, R. HATS: Histograms of averaged time surfaces for robust event-based object classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1731–1740. [Google Scholar] [CrossRef]
- Zhu, A.Z.; Yuan, L.; Chaney, K.; Daniilidis, K. EV-FlowNet: Self-supervised optical flow estimation for event-based cameras. arXiv 2018, arXiv:1802.06898. [Google Scholar] [CrossRef]
- Gehrig, D.; Rebecq, H.; Gallego, G.; Scaramuzza, D. EKLT: Asynchronous photometric feature tracking using events and frames. Int. J. Comput. Vis. 2020, 128, 601–618. [Google Scholar] [CrossRef]
- Manderscheid, J.; Sironi, A.; Bourdis, N.; Migliore, D.; Lepetit, V. Speed invariant time surface for learning to detect corner points with event-based cameras. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 10245–10254. [Google Scholar] [CrossRef]
- Bardow, P.; Davison, A.J.; Leutenegger, S. Simultaneous optical flow and intensity estimation from an event camera. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 884–892. [Google Scholar] [CrossRef]
- Zhu, A.Z.; Yuan, L.; Chaney, K.; Daniilidis, K. Unsupervised Event-Based Optical Flow Using Motion Compensation. In Proceedings of the Computer Vision—ECCV 2018 Workshops, Munich, Germany, 8–14 September 2018; Leal-Taixé, L., Roth, S., Eds.; Springer: Cham, Switzerland, 2019; pp. 711–714. [Google Scholar] [CrossRef]
- Rebecq, H.; Ranftl, R.; Koltun, V.; Scaramuzza, D. Events-To-Video: Bringing Modern Computer Vision to Event Cameras. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 3852–3861. [Google Scholar] [CrossRef]
- Zhu, A.Z.; Yuan, L.; Chaney, K.; Daniilidis, K. Unsupervised event-based learning of optical flow, depth, and egomotion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 989–997. [Google Scholar] [CrossRef]
- Tomy, A.; Paigwar, A.; Mann, K.S.; Renzaglia, A.; Laugier, C. Fusing event-based and rgb camera for robust object detection in adverse conditions. In Proceedings of the 2022 International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; pp. 933–939. [Google Scholar] [CrossRef]
- Sun, L.; Sakaridis, C.; Liang, J.; Jiang, Q.; Yang, K.; Sun, P.; Ye, Y.; Wang, K.; Gool, L.V. Event-based fusion for motion deblurring with cross-modal attention. In Proceedings of the European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2022; pp. 412–428. [Google Scholar] [CrossRef]
- Tulyakov, S.; Gehrig, D.; Georgoulis, S.; Erbach, J.; Gehrig, M.; Li, Y.; Scaramuzza, D. Time lens: Event-based video frame interpolation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 16155–16164. [Google Scholar] [CrossRef]
- Lin, G.; Han, J.; Cao, M.; Zhong, Z.; Zheng, Y. Event-guided frame interpolation and dynamic range expansion of single rolling shutter image. In Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, ON, Canada, 29 October–3 November 2023; pp. 3078–3088. [Google Scholar] [CrossRef]
- Zhou, W.; Guo, Q.; Lei, J.; Yu, L.; Hwang, J.N. ECFFNet: Effective and consistent feature fusion network for RGB-T salient object detection. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 1224–1235. [Google Scholar] [CrossRef]
- Gao, W.; Liao, G.; Ma, S.; Li, G.; Liang, Y.; Lin, W. Unified information fusion network for multi-modal RGB-D and RGB-T salient object detection. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 2091–2106. [Google Scholar] [CrossRef]
- Shazeer, N.; Mirhoseini, A.; Maziarz, K.; Davis, A.; Le, Q.; Hinton, G.; Dean, J. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv 2017, arXiv:1701.06538. [Google Scholar] [CrossRef]
- Jiang, B.; Li, Z.; Asif, M.S.; Cao, X.; Ma, Z. Token-Based Spatiotemporal Representation of the Events. In Proceedings of the ICASSP 2024—2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea, 14–19 April 2024; pp. 5240–5244. [Google Scholar] [CrossRef]
- Xie, B.; Deng, Y.; Shao, Z.; Xu, Q.; Li, Y. Event Voxel Set Transformer for Spatiotemporal Representation Learning on Event Streams. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 13427–13440. [Google Scholar] [CrossRef]
- Peng, Y.; Zhang, Y.; Xiong, Z.; Sun, X.; Wu, F. GET: Group Event Transformer for Event-Based Vision. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–6 October 2023; pp. 6015–6025. [Google Scholar] [CrossRef]
- Xu, F.; Yu, L.; Wang, B.; Yang, W.; Xia, G.S.; Jia, X.; Qiao, Z.; Liu, J. Motion deblurring with real events. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 2583–2592. [Google Scholar] [CrossRef]
- Zhou, C.; Teng, M.; Han, J.; Liang, J.; Xu, C.; Cao, G.; Shi, B. Deblurring low-light images with events. Int. J. Comput. Vis. 2023, 131, 1284–1298. [Google Scholar] [CrossRef]
- Yao, B.; Deng, Y.; Liu, Y.; Chen, H.; Li, Y.; Yang, Z. Sam-event-adapter: Adapting segment anything model for event-rgb semantic segmentation. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13–17 May 2024; pp. 9093–9100. [Google Scholar] [CrossRef]
- Kachole, S.; Huang, X.; Naeini, F.B.; Muthusamy, R.; Makris, D.; Zweiri, Y. Bimodal SegNet: Fused instance segmentation using events and RGB frames. Pattern Recognit. 2024, 149, 110215. [Google Scholar] [CrossRef]
- Devulapally, A.; Khan, M.F.F.; Advani, S.; Narayanan, V. Multi-modal fusion of event and rgb for monocular depth estimation using a unified transformer-based architecture. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 2081–2089. [Google Scholar] [CrossRef]
- Zhu, P.; Sun, Y.; Cao, B.; Hu, Q. Task-customized mixture of adapters for general image fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 7099–7108. [Google Scholar] [CrossRef]
- Cao, B.; Sun, Y.; Zhu, P.; Hu, Q. Multi-modal gated mixture of local-to-global experts for dynamic image fusion. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 23555–23564. [Google Scholar] [CrossRef]
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in context. In Proceedings of the European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755. [Google Scholar] [CrossRef]
- Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes (voc) challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef]
- Gehrig, M.; Aarents, W.; Gehrig, D.; Scaramuzza, D. DSEC: A Stereo Event Camera Dataset for Driving Scenarios. IEEE Robot. Autom. Lett. 2021, 6, 4947–4954. [Google Scholar] [CrossRef]
- Zhu, A.Z.; Thakur, D.; Özaslan, T.; Pfrommer, B.; Kumar, V.; Daniilidis, K. The Multivehicle Stereo Event Camera Dataset: An Event Camera Dataset for 3D Perception. IEEE Robot. Autom. Lett. 2018, 3, 2032–2039. [Google Scholar] [CrossRef]
- Hu, Y.; Liu, S.C.; Delbruck, T. v2e: From Video Frames to Realistic DVS Events. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Nashville, TN, USA, 19–25 June 2021; pp. 1312–1321. [Google Scholar] [CrossRef]
- Wang, X.; Li, J.; Zhu, L.; Zhang, Z.; Chen, Z.; Li, X.; Wang, Y.; Tian, Y.; Wu, F. Visevent: Reliable object tracking via collaboration of frame and event flows. IEEE Trans. Cybern. 2023, 54, 1997–2010. [Google Scholar] [CrossRef] [PubMed]
- Wang, X.; Wang, S.; Tang, C.; Zhu, L.; Jiang, B.; Tian, Y.; Tang, J. Event stream-based visual object tracking: A high-resolution benchmark dataset and a novel baseline. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 19248–19257. [Google Scholar] [CrossRef]
- Magrini, G.; Becattini, F.; Pala, P.; Del Bimbo, A.; Porta, A. Neuromorphic Drone Detection: An Event-RGB Multimodal Approach. In Proceedings of the Computer Vision—ECCV 2024 Workshops; Del Bue, A., Canton, C., Pont-Tuset, J., Tommasi, T., Eds.; Springer: Cham, Switzerland, 2025; pp. 259–275. [Google Scholar] [CrossRef]
- Li, Y.; Li, X.; Li, Y.; Zhang, Y.; Dai, Y.; Hou, Q.; Cheng, M.M.; Yang, J. Sm3det: A unified model for multi-modal remote sensing object detection. arXiv 2024, arXiv:2412.20665. [Google Scholar] [CrossRef]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar] [CrossRef]
- Tian, Z.; Shen, C.; Chen, H.; He, T. Fcos: Fully convolutional one-stage object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9627–9636. [Google Scholar] [CrossRef]
- Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable Transformers for End-to-End Object Detection. In Proceedings of the International Conference on Learning Representations, Vienna, Austria, 4 May 2021. [Google Scholar] [CrossRef]
- Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. Yolov10: Real-time end-to-end object detection. Adv. Neural Inf. Process. Syst. 2024, 37, 107984–108011. [Google Scholar] [CrossRef]
| Dataset | AVG Object Scale | MS | MT | OE | NE | UE | Year |
|---|---|---|---|---|---|---|---|
| EventVOT [43] | pixels | ✓ | × | × | ✓ | ✓ | 2024 |
| VisEvent [42] | pixels | ✓ | × | × | ✓ | ✓ | 2023 |
| NeRDD [44] | pixels | × | ✓ | × | ✓ | ✓ | 2024 |
| DSEC-MOD [9] | pixels | × | ✓ | × | ✓ | ✓ | 2023 |
| RGBE-UAV | pixels | ✓ | ✓ | ✓ | ✓ | ✓ | 2025 |
| Modality | Method | Publication | Params | FLOPs | RGBE-UAV mAP50 | RGBE-UAV mAP75 | DSEC-MOD mAP50 | DSEC-MOD mAP75 |
|---|---|---|---|---|---|---|---|---|
| RGB-only | RetinaNet [46] | ICCV2017 | 36.33M | 61.37G | 0.7304 | 0.1622 | 0.3074 | 0.1964 |
| RGB-only | FCOS [47] | ICCV2019 | 32.29M | 61.62G | 0.7019 | 0.1764 | 0.2943 | 0.2011 |
| RGB-only | Deformable DETR [48] | ICLR2021 | 40.80M | 60.20G | 0.7865 | 0.1824 | 0.3901 | 0.2259 |
| RGB-only | YOLOv10 [49] | NeurIPS2024 | 16.58M | 64.50G | 0.8680 | 0.2240 | 0.4550 | 0.2759 |
| Event-only | RetinaNet [46] | ICCV2017 | 36.33M | 61.37G | 0.3903 | 0.0311 | 0.3366 | 0.1358 |
| Event-only | FCOS [47] | ICCV2019 | 32.29M | 61.62G | 0.3859 | 0.0204 | 0.3171 | 0.1336 |
| Event-only | Deformable DETR [48] | ICLR2021 | 40.80M | 60.20G | 0.4323 | 0.0324 | 0.3521 | 0.1433 |
| Event-only | YOLOv10 [49] | NeurIPS2024 | 16.58M | 64.50G | 0.5780 | 0.0435 | 0.3602 | 0.1682 |
| RGB-Event Fusion | Early-Fusion | — | 33.52M | 167.20G | 0.8974 | 0.1864 | 0.4718 | 0.2632 |
| RGB-Event Fusion | FPN-Fusion [20] | ICRA2022 | 59.85M | 198.40G | 0.9186 | 0.2583 | 0.5729 | 0.3636 |
| RGB-Event Fusion | RENet [9] | ICRA2023 | 87.74M | 273.98G | 0.9456 | 0.2795 | 0.5890 | 0.3977 |
| RGB-Event Fusion | EOLO [10] | ICRA2024 | 106.93M | 327.79G | 0.9333 | 0.3046 | 0.6681 | 0.4352 |
| RGB-Event Fusion | Ours | — | 60.77M | 198.41G | 0.9557 | 0.3159 | 0.6810 | 0.4531 |
| Method | Over (mAP50) | Over (mAP75) | Normal (mAP50) | Normal (mAP75) | Under (mAP50) | Under (mAP75) |
|---|---|---|---|---|---|---|
| Early-Fusion | 0.8294 | 0.1617 | 0.9585 | 0.2229 | 0.8438 | 0.2160 |
| FPN-Fusion [20] | 0.8458 | 0.1929 | 0.9687 | 0.3070 | 0.9067 | 0.2720 |
| RENet [9] | 0.8534 | 0.1855 | 0.9737 | 0.2722 | 0.9372 | 0.2679 |
| EOLO [10] | 0.8407 | 0.1941 | 0.9848 | 0.3203 | 0.9370 | 0.2772 |
| Ours | 0.8601 | 0.2209 | 0.9884 | 0.3395 | 0.9419 | 0.3046 |
| Combination | Scale 1 (ms) | Scale 2 (ms) | Scale 3 (ms) | mAP50 | mAP75 |
|---|---|---|---|---|---|
| #1 | 5 | 10 | 20 | 0.9285 | 0.2853 |
| #2 | 10 | 20 | 30 | 0.9391 | 0.2944 |
| #3 | 15 | 25 | 35 | 0.9490 | 0.3089 |
| #4 | 20 | 30 | 40 | 0.9525 | 0.3126 |
| #5 | 30 | 35 | 40 | 0.9453 | 0.3094 |
| #6 | 15 | 30 | 40 | 0.9557 | 0.3159 |
| Baseline | TMAF | SNGAF w/o Noise | SNGAF | mAP50 | mAP75 |
|---|---|---|---|---|---|
| ✓ | | | | 0.8768 | 0.2453 |
| ✓ | ✓ | | | 0.9186 | 0.2583 |
| ✓ | | ✓ | | 0.9312 | 0.2895 |
| ✓ | | | ✓ | 0.9393 | 0.2982 |
| ✓ | ✓ | ✓ | | 0.9475 | 0.3110 |
| ✓ | ✓ | | ✓ | 0.9557 | 0.3159 |
| 1CMAFE | 2CMAFE | SNGAF | mAP50 | mAP75 |
|---|---|---|---|---|
| ✓ | | | 0.9397 | 0.2742 |
| | ✓ | | 0.9282 | 0.3131 |
| | | ✓ | 0.9557 | 0.3159 |
| RGB Modality | Event Modality | mAP50 | mAP75 |
|---|---|---|---|
| ✓ | | 0.9074 | 0.2948 |
| | ✓ | 0.4519 | 0.0243 |
| ✓ | ✓ | 0.9557 | 0.3159 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).