EPCNet: Implementing an ‘Artificial Fovea’ for More Efficient Monitoring Using the Sensor Fusion of an Event-Based and a Frame-Based Camera
Abstract
1. Introduction
1.1. Event Camera
1.2. Design Overview
- Perform stereo calibration between the event camera and the RGB camera to determine the relative extrinsic transformation between the two cameras.
- Detect motion in the event camera by clustering the events.
- Project the detected clusters onto the RGB image plane using the extrinsic transformation, thus determining the region of interest (ROI) in the image.
- Crop the RGB image based on ROIs.
- Run object detection on the cropped images (a minimal sketch of the clustering and projection steps follows this list).
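To make the pipeline concrete, the following is a minimal sketch of the clustering and projection steps. It is illustrative rather than the authors' implementation: it assumes events from a short time window are available as an (N, 2) array of pixel coordinates, uses DBSCAN as a stand-in clustering method, and projects cluster bounding boxes with a rotation-only (far-field) approximation; `K_event`, `K_rgb`, `R`, and all numeric parameters are placeholders supplied by the calibration step.

```python
# Illustrative sketch of the event-clustering and ROI-projection steps.
# Assumptions (not from the paper): events from a short time window are given
# as an (N, 2) array of pixel coordinates; DBSCAN is used as the clustering
# method; the projection uses a rotation-only (far-field) approximation,
# i.e. the inter-camera translation is neglected.
import numpy as np
from sklearn.cluster import DBSCAN


def cluster_events(event_xy: np.ndarray, eps: float = 5.0, min_samples: int = 20):
    """Group spatially dense events into motion clusters and return their boxes."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(event_xy)
    boxes = []
    for lbl in set(labels) - {-1}:               # label -1 is DBSCAN noise
        pts = event_xy[labels == lbl]
        x0, y0 = pts.min(axis=0)
        x1, y1 = pts.max(axis=0)
        boxes.append((x0, y0, x1, y1))
    return boxes


def project_point(pt, K_event, K_rgb, R):
    """Map one event-camera pixel onto the RGB image plane (rotation-only model)."""
    uv1 = np.array([pt[0], pt[1], 1.0])
    ray = np.linalg.inv(K_event) @ uv1           # back-project to a viewing ray
    uvw = K_rgb @ (R @ ray)                      # rotate and re-project
    return uvw[:2] / uvw[2]


def boxes_to_rgb_rois(boxes, K_event, K_rgb, R, margin: float = 10.0):
    """Project event-cluster boxes into RGB pixel coordinates, padded by a margin."""
    rois = []
    for x0, y0, x1, y1 in boxes:
        corners = [project_point(c, K_event, K_rgb, R)
                   for c in ((x0, y0), (x1, y0), (x0, y1), (x1, y1))]
        xs, ys = zip(*corners)
        rois.append((min(xs) - margin, min(ys) - margin,
                     max(xs) + margin, max(ys) + margin))
    return rois
```

In practice, the projected ROIs would then be clipped to the RGB frame boundaries before being passed to the detector or classifier.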
1.3. Contributions of This Paper
- A comprehensive review of RGB and event camera sensor fusion in the literature.
- An adaptation of common camera calibration techniques that facilitates the calibration of an event-based camera with a frame-based camera using common automated software.
- A novel foveal vision framework that combines event-based and frame-based cameras to achieve the following:
  - Enhanced small object detection through selective high-resolution processing (78% accuracy improvement over baseline).
  - Reduced computational complexity via the EPCNet architecture, using the event camera as a region proposer with a lightweight classifier (40% lower latency than YOLO while maintaining comparable mAP); a minimal propose-then-classify sketch follows this list.
- A proposed system design that leverages the power efficiency of an event camera in a real-time monitoring system suitable for embedded applications.
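As an illustration of the propose-then-classify idea, the sketch below applies a small classifier only to the RGB crops proposed by the event camera instead of running a full-frame detector. The ResNet-18 backbone, the 224 × 224 input size, and the class list are assumptions made for this example and are not claimed to be the published EPCNet architecture.

```python
# Illustrative sketch of the propose-then-classify idea: classify only the
# RGB crops proposed by the event camera. The backbone (ResNet-18), the
# 224 x 224 input size, and the class names are assumptions for this example,
# not the published EPCNet architecture. The model is assumed to be trained.
import torch
import torchvision
from torchvision import transforms

CLASSES = ["car", "person", "rider"]            # example classes

# Lightweight classifier: small backbone with a replaced 3-way head.
model = torchvision.models.resnet18(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, len(CLASSES))
model.eval()

preprocess = transforms.Compose([
    transforms.ToPILImage(),                    # expects an HxWxC uint8 array
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])


@torch.no_grad()
def classify_rois(rgb_frame, rois):
    """Run the lightweight classifier on each event-proposed ROI crop."""
    results = []
    for x0, y0, x1, y1 in rois:
        x0, y0 = max(int(x0), 0), max(int(y0), 0)   # clamp to the frame
        crop = rgb_frame[y0:int(y1), x0:int(x1)]
        if crop.size == 0:                          # skip ROIs outside the frame
            continue
        logits = model(preprocess(crop).unsqueeze(0))
        results.append(((x0, y0, int(x1), int(y1)), CLASSES[int(logits.argmax())]))
    return results
```

Because each crop is small and only moving regions are processed, the per-frame compute is dominated by the number of proposals rather than the full RGB resolution.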
2. Related Works
2.1. Event Camera
2.2. Event–RGB Sensor Fusion
2.3. Foveal Vision
2.4. Summary
3. Methods
3.1. Calibration
- Create a flashing checkerboard GIF and display it on a large screen.
- Mount the RGB camera on an apparatus beside the event camera and ensure the apparatus is stable and secure.
- Record video of the flashing checkerboard using both cameras simultaneously, and repeat for 10–20 different camera angles.
- Extract a single frame from each video, and extract an image from each frame.
- Process the images to improve clarity and ensure the image sizes are equal.
- Calibrate the cameras using calibration software to determine the intrinsic and extrinsic parameters of each camera (a minimal OpenCV-based sketch follows this list).
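As a rough illustration of the final step, a checkerboard-based stereo calibration with OpenCV might look like the sketch below. It assumes matched pairs of grayscale images (one reconstructed from the event stream, one from the RGB camera) that have already been resized to a common resolution; the board dimensions and square size are placeholders, and this is not necessarily the calibration software used in the paper.

```python
# Sketch of checkerboard-based stereo calibration with OpenCV (illustrative).
# Assumes matched pairs of grayscale images of the same flashing checkerboard,
# already resized to a common resolution. Board size and square size are
# placeholders.
import cv2
import numpy as np

BOARD = (9, 6)          # inner corners per row/column (assumed)
SQUARE_SIZE = 0.05      # checkerboard square size in metres (assumed)

# 3D coordinates of the checkerboard corners in the board's own frame.
objp = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2) * SQUARE_SIZE


def calibrate_pair(event_imgs, rgb_imgs):
    """Estimate intrinsics for each camera and the extrinsics between them."""
    obj_pts, ev_pts, rgb_pts = [], [], []
    for ev, rgb in zip(event_imgs, rgb_imgs):
        ok_ev, c_ev = cv2.findChessboardCorners(ev, BOARD)
        ok_rgb, c_rgb = cv2.findChessboardCorners(rgb, BOARD)
        if ok_ev and ok_rgb:                    # keep only pairs seen by both
            obj_pts.append(objp)
            ev_pts.append(c_ev)
            rgb_pts.append(c_rgb)

    size = event_imgs[0].shape[::-1]            # (width, height) of grayscale images
    # Per-camera intrinsic calibration.
    _, K_ev, d_ev, _, _ = cv2.calibrateCamera(obj_pts, ev_pts, size, None, None)
    _, K_rgb, d_rgb, _, _ = cv2.calibrateCamera(obj_pts, rgb_pts, size, None, None)
    # Stereo calibration for the extrinsic rotation R and translation T.
    _, _, _, _, _, R, T, _, _ = cv2.stereoCalibrate(
        obj_pts, ev_pts, rgb_pts, K_ev, d_ev, K_rgb, d_rgb, size,
        flags=cv2.CALIB_FIX_INTRINSIC)
    return K_ev, d_ev, K_rgb, d_rgb, R, T
```

The flashing checkerboard provides intensity changes that the event camera can see, so standard corner detection can be run on the reconstructed event images just as on the RGB frames.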
3.1.1. Calibration Software
3.1.2. Translation of Points
3.1.3. Average Extrinsic Parameters
3.2. Fovea Camera Implementation
3.2.1. Data Processing
3.2.2. Clustering & Projection
3.3. Object Detection
3.3.1. Performance Measurement
3.3.2. YOLO Algorithm
3.3.3. Proof-of-Concept Implementation
3.3.4. Data Annotation
3.4. Event-Region Proposal and Classification Network (EPCNet)
3.4.1. Design Overview
3.4.2. Classification Model
3.4.3. Training Results
4. Results
4.1. Model Accuracy
4.2. Network Latency
5. Discussion
5.1. Projection
5.2. Detection & Classification
5.3. Proposed Design
6. Limitations
6.1. Initialization
6.2. Time Synchronization
6.3. EPCNet Performance
6.4. Bounding Box Accuracy and IoU Thresholds
7. Conclusions & Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Appendix A.1. Calibration
- Event camera images were cropped to 640 × 338 pixels to match the aspect ratio of the RGB camera, and the RGB images were downscaled to match this resolution, maintaining their aspect ratio of 1.9 (a minimal sketch of this step follows this list).
- Additional angles were recorded to cover the entire FOV of the cameras and were added to the original set of calibration images.
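A minimal sketch of this resolution-matching step, assuming OpenCV, 640 × 480 event frames, 4096 × 2160 RGB frames, and a centre crop (the exact crop position is an assumption):

```python
# Sketch of the resolution-matching step described above (illustrative).
# Crops the 640 x 480 event frame to 640 x 338 (aspect ratio ~1.9) and
# downscales the 4096 x 2160 RGB frame to the same size.
import cv2

TARGET_W, TARGET_H = 640, 338


def match_resolutions(event_frame, rgb_frame):
    """Return an (event, rgb) pair with identical resolution and aspect ratio."""
    h = event_frame.shape[0]
    top = (h - TARGET_H) // 2                      # centre crop (assumed)
    event_out = event_frame[top:top + TARGET_H, :TARGET_W]
    rgb_out = cv2.resize(rgb_frame, (TARGET_W, TARGET_H),
                         interpolation=cv2.INTER_AREA)
    return event_out, rgb_out
```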
Appendix A.2. EPCNet
References
| | Event Camera | RGB Camera |
|---|---|---|
| Model | PROPHESEE ONBOARD (Paris, France) | Blackfly S (Teledyne FLIR, Arlington, VA, USA) |
| Spatial Resolution (pixels) | 640 × 480 | 4096 × 2160 |
| Temporal Resolution (fps) | >5000 | 42 |
| Dynamic Range (dB) | >120 | <50 |
| Power Consumption (W) | 0.026 | 3 |
| Pixel Pitch (µm) | 15 | 3.45 |
| Class | Our Dataset: Train | Our Dataset: Val | Our Dataset: Test | Our Dataset: Total | BDD: Train | BDD: Val | BDD: Test | BDD: Total | Total: Train | Total: Val | Total: Test |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Rider | 2467 | 352 | 703 | 3522 | 720 | 103 | 207 | 1030 | 3187 | 455 | 910 |
| Person | 2573 | 352 | 703 | 3628 | 1075 | 154 | 307 | 1536 | 3648 | 506 | 1010 |
| Car | 2573 | 352 | 703 | 3628 | 1400 | 200 | 400 | 2000 | 3973 | 552 | 1103 |
| Metric | Car | Person | Rider | Average |
|---|---|---|---|---|
| Precision | 98.37 | 96.54 | 94.18 | 96.36 |
| Recall | 98.02 | 94.40 | 97.06 | 96.50 |
| Actual \ Predicted | Car | Person | Rider |
|---|---|---|---|
| Car | 1087 | 13 | 5 |
| Person | 14 | 977 | 21 |
| Rider | 8 | 45 | 857 |
| Class | COCO AP (FF) | COCO AP (Crop) | COCO AP (EPCNet) | AP50 (FF) | AP50 (Crop) | AP50 (EPCNet) | AP75 (FF) | AP75 (Crop) | AP75 (EPCNet) |
|---|---|---|---|---|---|---|---|---|---|
| Car | 47.54 | 23.55 | 23.71 | 55.62 | 34.05 | 61.93 | 53.88 | 23.59 | 12.01 |
| Person | 45.16 | 59.73 | 5.75 | 76.15 | 84.19 | 21.76 | 45.05 | 69.08 | 1.12 |
| Bicycle | 21.08 | 57.38 | 5.11 | 39.33 | 70.61 | 20.50 | 22.86 | 67.00 | 0.03 |
| Average | 37.93 | 46.89 | 11.52 | 57.03 | 62.95 | 34.73 | 40.60 | 53.22 | 4.38 |
| Class | IoU 10% (FF) | IoU 10% (Crop) | IoU 10% (EPCNet) | IoU 25% (FF) | IoU 25% (Crop) | IoU 25% (EPCNet) | IoU 40% (FF) | IoU 40% (Crop) | IoU 40% (EPCNet) |
|---|---|---|---|---|---|---|---|---|---|
| Car | 55.62 | 39.98 | 77.42 | 55.62 | 39.64 | 75.95 | 55.62 | 37.33 | 71.65 |
| Person | 78.27 | 85.49 | 61.18 | 78.06 | 85.33 | 51.38 | 78.03 | 84.94 | 34.88 |
| Bicycle | 45.67 | 72.25 | 58.87 | 45.46 | 71.99 | 49.54 | 44.88 | 70.72 | 31.12 |
| Average | 59.85 | 65.91 | 65.83 | 59.71 | 65.65 | 58.96 | 59.51 | 64.33 | 45.89 |
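For reference, the thresholds in the table above use the standard intersection-over-union criterion between a predicted box and a ground-truth box:

```latex
% Standard intersection-over-union between a predicted box B_p and a
% ground-truth box B_gt; a detection counts as a true positive at
% threshold t when IoU >= t (t = 0.10, 0.25, 0.40 in the table above).
\mathrm{IoU}(B_p, B_{gt}) \;=\; \frac{\lvert B_p \cap B_{gt} \rvert}{\lvert B_p \cup B_{gt} \rvert}
```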