Hardware-Aware Neural Architecture Search for Real-Time Video Processing in FPGA-Accelerated Endoscopic Imaging
Abstract
1. Introduction
2. Related Work
2.1. Technologies for Low-Power Medical Imaging
- Cell Abstraction and Power–Temperature Closed-Loop Model: FPGA resources (LUTs, BRAM, DSP) are normalized into unified cell units, and a power–temperature formula for the surgical handpiece is established (a minimal sketch follows this list).
- Latency-Balancing Hard Constraint: Interstage pipeline latency variance is constrained to prevent frequency degradation caused by deep pipelines.
- Medical Task-Specific Operator Library: For CFA demosaicing, depthwise separable convolution is prioritized, achieving 38.2 dB CPSNR under a 90-cell constraint.
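To make the cell abstraction and the power–temperature loop concrete, the following minimal Python sketch normalizes a hypothetical resource budget into cells and checks it against a 50 °C handpiece limit. The cell weights follow the Xilinx coefficients reported later in this paper; the power-per-cell and thermal constants (watts_per_cell, t_ambient, r_th) are illustrative values roughly fitted to the reported cell/power/temperature measurements, not the authors' exact formula.
| XILINX_CELL_WEIGHTS = {"LUT6": 1.5, "FF": 0.5, "BRAM18K": 20.0, "DSP48": 10.0} |
|  |
| def cell_cost(resource_counts, weights=XILINX_CELL_WEIGHTS): |
|     """Normalize heterogeneous FPGA resources into unified cell units.""" |
|     return sum(weights[r] * n for r, n in resource_counts.items()) |
|  |
| def handpiece_temperature(cells, watts_per_cell=0.031, t_ambient=23.0, r_th=9.4): |
|     """Estimate handpiece temperature as T = T_ambient + R_th * P, with P ≈ cells * watts_per_cell (illustrative).""" |
|     power = cells * watts_per_cell        # estimated power draw (W) |
|     return t_ambient + r_th * power       # steady-state temperature (°C) |
|  |
| # Example: a hypothetical 90-cell design checked against the 50 °C clinical limit |
| design = {"LUT6": 20, "FF": 40, "BRAM18K": 1, "DSP48": 2}   # hypothetical usage |
| cells = cell_cost(design)                                   # = 90 cells |
| assert handpiece_temperature(cells) <= 50.0                 # ≈ 49 °C, within the limit |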
2.2. Advances in Hardware-Aware Neural Architecture Search
2.3. FPGA Acceleration Practices in Medical Imaging
3. Our Approach
3.1. Problem Formulation
3.2. Unified Co-Exploration Framework Overview
3.3. Hardware-Aware NAS Framework
3.3.1. FPGA-Centric Search Space Initialization with Medical Constraints
- (1) Hardware Constraint Definition and FPGA-Friendly Operator Library
- (2) Load-balanced Pipeline Stage Partitioning
3.3.2. Multi-Task Joint Optimization for Endoscopic Imaging in NAS Framework
| search_space = { |
|     "edge_conv": {"kernel": [3, 5], "dilation": [1, 2]},  # Preserves edge details |
|     "bilateral_filter": {"sigma": 0.8},  # Suppresses false color |
|     "skip_connect": {}  # Avoids oversmoothing |
| } |
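For illustration only, one way a search controller could draw a candidate operator from this dictionary is uniform random sampling; this sampling strategy and the helper below are assumptions, not the controller actually used in the paper.
| import random |
|  |
| def sample_candidate(search_space): |
|     """Pick one operator and one value for each list-valued hyperparameter.""" |
|     op = random.choice(list(search_space)) |
|     params = {k: (random.choice(v) if isinstance(v, list) else v) |
|               for k, v in search_space[op].items()} |
|     return op, params |
|  |
| # e.g. ("edge_conv", {"kernel": 5, "dilation": 1}) or ("skip_connect", {}) |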

3.3.3. Hardware/Software Co-Exploration with Dynamic Medical Constraints
- (1) Fast exploration (FE): Hardware efficiency first.
- (2) Slow exploration (SE): Accuracy optimization (a brief sketch of the two-phase interplay follows this list).
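As a hedged sketch of how the two phases could interact, the snippet below first filters candidates with a cheap hardware-cost estimate (FE) and then spends expensive accuracy evaluation only on the survivors (SE); the function names, cell budget, and ranking criterion are illustrative assumptions rather than the paper's exact procedure.
| def co_explore(candidates, estimate_cells, evaluate_accuracy, cell_budget=90, top_k=5): |
|     """Two-phase co-exploration: fast hardware filtering, then slow accuracy ranking.""" |
|     # Fast exploration (FE): keep only candidates within the cell budget |
|     feasible = [c for c in candidates if estimate_cells(c) <= cell_budget] |
|     # Slow exploration (SE): rank the survivors by (expensive) accuracy evaluation |
|     return sorted(feasible, key=evaluate_accuracy, reverse=True)[:top_k] |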
3.4. FPGA-Centric Optimization Components
3.4.1. Cell-Abstracted Resource Modeling with Thermal Constraints
3.4.2. Latency Balancing via Projected Gradient Descent
| Algorithm 1 Projected Gradient Descent |
| def projected_gradient_descent(Δt_init, η=0.01, max_iter=100, ε=1e-6): |
|     Δt = Δt_init  # Initial interstage delay vector |
|     for k in range(max_iter): |
|         # 1. Compute gradient of the objective f(Δt) = max(Δt) − min(Δt) |
|         grad = compute_gradient(Δt)  # Gradient calculation (see below) |
|         # 2. Gradient descent update: Δt_new = Δt − η · grad |
|         Δt_new = Δt - η * grad |
|         # 3. Projection onto the feasible set: Δt = Proj(Δt_new) |
|         Δt = project_to_feasible_set(Δt_new) |
|         # 4. Convergence check: stop early once ||grad|| < ε |
|         if norm(grad) < ε: |
|             break |
|     return Δt |
- Δt_init: Initial interstage latency vector [Δt_1, Δt_2, ..., Δt_m];
- η: Learning rate controlling the step size;
- ε: Convergence tolerance on the gradient norm;
- compute_gradient(): Computes the (sub)gradient of max(Δt) − min(Δt);
- project_to_feasible_set(): Projects the updated latencies onto the convex constraint set;
- i* = argmax_i Δt_i (index of the stage with the maximum delay);
- j* = argmin_j Δt_j (index of the stage with the minimum delay);
- e_k denotes the standard basis vector (1 at position k, 0 elsewhere).
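Below is a minimal, runnable NumPy version of Algorithm 1. The subgradient e_i* − e_j* follows the definitions above; the box-projection bounds and the example step size are illustrative assumptions, since the paper's exact feasible set is not reproduced here.
| import numpy as np |
|  |
| def compute_gradient(dt): |
|     """Subgradient of f(Δt) = max(Δt) − min(Δt), i.e. e_i* − e_j*.""" |
|     grad = np.zeros_like(dt) |
|     grad[np.argmax(dt)] += 1.0   # i*: stage with the maximum delay |
|     grad[np.argmin(dt)] -= 1.0   # j*: stage with the minimum delay |
|     return grad |
|  |
| def project_to_feasible_set(dt, lo=1.0, hi=10.0): |
|     """Illustrative projection: clip each stage delay to the box [lo, hi] ns.""" |
|     return np.clip(dt, lo, hi) |
|  |
| def projected_gradient_descent(dt_init, eta=0.01, max_iter=100, eps=1e-6): |
|     dt = np.asarray(dt_init, dtype=float) |
|     for _ in range(max_iter): |
|         grad = compute_gradient(dt)                     # 1. subgradient of max − min |
|         dt = project_to_feasible_set(dt - eta * grad)   # 2.-3. descent step + projection |
|         if np.linalg.norm(grad) < eps:                  # 4. stopping rule of Algorithm 1 |
|             break |
|     return dt |
|  |
| # Example: the four-stage delays used in the experiments (8.0, 3.0, 5.0, 4.0 ns) |
| dt = projected_gradient_descent([8.0, 3.0, 5.0, 4.0], eta=0.1, max_iter=20) |
| print(dt, dt.max() / dt.min())   # disparity ratio shrinks from 2.67 toward the balanced regime |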
3.4.3. Real-Time Constrained NAS Space
3.4.4. Experimental Validation in Medical Scenarios
3.5. FPGA-Optimized Multi-Task Implementation Summary
- Cell-based Resource Unification: FPGA heterogeneous resources are normalized into unified cell units, enabling cross-platform quantification of resource consumption and mathematical modeling of hardware constraints.
- Pipeline Balancing Constraints: Interstage latency variation is constrained to prevent frequency degradation caused by deep pipelines, significantly improving throughput. The average hardware utilization reached 96.15% under these constraints.
- Medical Scenario Adaptation: Under low power constraints, the architecture achieves low latency with limited cells, meeting real-time requirements for endoscopy.
- Power–Temperature Model: A unified model that integrates the surgical handpiece’s heat dissipation area, power consumption, and temperature to ensure clinical safety.
- HDR Compression: NAS-derived parameters configure piecewise linear mapping operators that dynamically compress the high dynamic range while preserving tissue textures in low-light endoscopy (a software sketch follows this list).
- CFA Demosaicing: The same parameters drive edge-sensitive convolution kernels, suppressing zipper artifacts and reducing false color ratios in Bayer pattern reconstruction.
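As a software reference for the HDR compression bullet above, the sketch below reproduces the piecewise linear mapping listed in Appendix A; in the deployed design the breakpoints and gains would be the NAS-derived parameters rather than these fixed constants.
| import numpy as np |
|  |
| def hdr_piecewise_map(pixel): |
|     """Piecewise linear HDR compression with the Appendix A breakpoints (0.3, 0.7).""" |
|     p = np.asarray(pixel, dtype=float) |
|     return np.where(p < 0.3, 0.5 * p,                      # dark segment: mild gain |
|            np.where(p < 0.7, 0.3 + 0.7 * (p - 0.3),        # mid-tone segment |
|                              0.8 + 0.2 * (p - 0.7)))       # highlight compression |
|  |
| # Example: compress a normalized (0–1) endoscopic frame |
| frame = np.random.rand(480, 640) |
| compressed = hdr_piecewise_map(frame) |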
4. Experimental Results
4.1. Experimental Design
- 1. Precise Annotations: Frame-level labels for bleeding points, vascular malformations, and tumor locations, validated by three gastroenterologists (κ-coefficient > 0.85).
- 2. Technical Challenges:
- HDR Compression: Scenes with extreme illumination variance (>100 dB dynamic range) due to fluid occlusion and tissue reflectivity.
- CFA Artifacts: Bayer pattern demosaicing complications under low-light conditions (SNR < 10 dB) inducing zipper effects in 23% of frames.
- 3. Acquisition: Available at endovis.grand-challenge.org under the CC-BY-NC-SA 4.0 license (Dataset ID: EV-WCE-2021).
4.2. Experimental Comparison of the Intermediate Process Before and After Optimization
4.3. Comparison with Existing Methods
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| NAS | Neural architecture search |
| FPGA | Field-programmable gate array |
| GPU | Graphics processing unit |
| CPU | Central processing unit |
| CPSNR | Color peak signal-to-noise ratio |
| HDR | High dynamic range |
| LUT | Look-up table |
Appendix A
| -- Bayer-to-RGB pipeline (edge-adaptive interpolation) |
| process(clk) |
| begin |
|   if rising_edge(clk) then |
|     -- Green channel interpolation (gradient-based) |
|     if abs(grad_h) > abs(grad_v) then |
|       G_out <= (G1 + G2) / 2; -- Horizontal interpolation |
|     else |
|       G_out <= (G3 + G4) / 2; -- Vertical interpolation |
|     end if; |
|     -- R/B channel reconstruction |
|     R_out <= R_raw * 1.8; -- Hemorrhage enhancement gain |
|   end if; |
| end process; |
| // Piecewise linear mapping (0.5 ms latency) |
| assign HDR_out = (pixel_in < 0.3) ? 0.5 * pixel_in : |
|                  (pixel_in < 0.7) ? 0.3 + 0.7 * (pixel_in - 0.3) : |
|                                     0.8 + 0.2 * (pixel_in - 0.7); |
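For readers without an HDL background, the following NumPy rendering mirrors the gradient-based green interpolation above. Because G1–G4 are not defined in the appendix, the sketch follows the usual edge-adaptive convention of averaging along the direction with the smaller gradient, and the 3×3 window indexing is an assumption.
| import numpy as np |
|  |
| def interp_green(window): |
|     """Edge-adaptive green interpolation at the center of a 3x3 window whose |
|     center sample is red or blue and whose four neighbours are green.""" |
|     g_left, g_right = float(window[1, 0]), float(window[1, 2]) |
|     g_up, g_down = float(window[0, 1]), float(window[2, 1]) |
|     grad_h = abs(g_left - g_right)   # horizontal gradient |
|     grad_v = abs(g_up - g_down)      # vertical gradient |
|     if grad_h > grad_v: |
|         return (g_up + g_down) / 2.0      # strong horizontal change: interpolate vertically |
|     return (g_left + g_right) / 2.0       # otherwise interpolate horizontally |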
References
- Lee, S.; González-Montiel, L.; Figueira, A.C.; Medina-Pérez, G.; Fernández-Luqueño, F.; Aguirre-Álvarez, G.; Pérez-Soto, E.; Pérez-Ríos, S.; Campos-Montiel, R.G. Thermal-aware power optimization for endoscopic surgical devices. Appl. Sci. 2022, 12, 7892. [Google Scholar] [CrossRef]
- Wei, X.; Zhou, M.; Kwong, S.; Zhang, L.; Wang, Y.; Liu, J.; Chen, H.; Li, K.; Sun, T. Reinforcement learning-based QoE-oriented dynamic adaptive streaming framework. Inf. Sci. 2021, 569, 786–803. [Google Scholar] [CrossRef]
- Shen, Y.; Feng, Y.; Fang, B.; Zhou, M.; Kwong, S.; Qiang, B.H. DSRPH: Deep semantic-aware ranking preserving hashing for efficient multi-label image retrieval. Inf. Sci. 2020, 539, 145–156. [Google Scholar] [CrossRef]
- Zhang, Y.; Zhou, M.; Kwong, S.; Liu, J.; Wang, H.; Chen, L.; Li, X. Miniaturized heat dissipation design for handheld medical instruments. Appl. Sci. 2023, 13, 2105. [Google Scholar] [CrossRef]
- Wei, X.; Zhou, M.; Kwong, S.; Zhang, L.; Wang, Y.; Liu, J.; Chen, H. A hybrid control scheme for 360-degree dynamic adaptive video streaming over mobile devices. IEEE Trans. Mob. Comput. 2021, 21, 3428–3442. [Google Scholar] [CrossRef]
- Chen, H.; Zhang, L.; Wang, Y.; Liu, J.; Zhou, M.; Kwong, S.; Li, X. FPGA-based real-time HDR compression for low-light endoscopic imaging. Appl. Sci. 2021, 11, 10834. [Google Scholar] [CrossRef]
- Cheng, S.; Song, J.; Zhou, M.; Li, Y.; Wang, H.; Zhang, L. Ef-detr: A lightweight transformer-based object detector with an encoder-free neck. In IEEE Transactions on Industrial Informatics; IEEE: New York, NY, USA, 2024. [Google Scholar]
- Zhou, M.; Wei, X.; Wang, S.; Kwong, S.; Fong, C.K.; Wong, P.H.; Yuen, W.Y. Global rate-distortion optimization-based rate control for HEVC HDR coding. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 4648–4662. [Google Scholar] [CrossRef]
- Zhang, W.; Zhou, M.; Ji, C.; Sui, X.; Bai, J. Cross-frame transformer-based spatio-temporal video super-resolution. IEEE Trans. Broadcast. 2022, 68, 359–369. [Google Scholar] [CrossRef]
- Bartlett, A.; Gullickson, R.G.; Singh, R.; Ro, S.; Omaye, S.T. The Link between Oral and Gut Microbiota in Inflammatory Bowel Disease and a Synopsis of Potential Salivary Biomarkers. Appl. Sci. 2020, 10, 6421. [Google Scholar] [CrossRef]
- Zhao, D.; Lu, Q.; Su, R.; Li, Y.; Zhao, M. Light Harvesting and Optical-Electronic Properties of Two Quercitin and Rutin Natural Dyes. Appl. Sci. 2019, 9, 2567. [Google Scholar] [CrossRef]
- Mou, E.; Wang, H.; Chen, X.; Li, Z.; Zhong, L.; Xia, S. Low-light Endoscopic Image Enhancement for Healthcare Electronics Using Efficient Multiscale Selective Fusion. IEEE Trans. Consum. Electron. 2025. [Google Scholar] [CrossRef]
- Zhou, M.; Zhang, Y.; Li, B.; Lin, X. Complexity correlation-based CTU-level rate control with direction selection for HEVC. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 2017, 13, 1–23. [Google Scholar] [CrossRef]
- Cai, H.; Gan, C.; Wang, L.; Zhang, C.; Han, S. Once-for-all: Train one network and specialize it for efficient deployment. In Proceedings of the International Conference on Learning Representations (ICLR), Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
- Zhou, M.; Wu, X.; Wei, X.; Xiang, T.; Fang, B.; Kwong, S. Low-light enhancement method based on a Retinex model for structure preservation. IEEE Trans. Multimed. 2023, 26, 650–662. [Google Scholar] [CrossRef]
- Song, J.; Zhou, M.; Luo, J.; Pu, H.; Feng, Y.; Wei, X.; Jia, W. Boundary-Aware Feature Fusion with Dual-Stream Attention for Remote Sensing Small Object Detection. In IEEE Transactions on Geoscience and Remote Sensing; IEEE: Piscataway, NJ, USA, 2024. [Google Scholar]
- Yan, J.; Zhang, B.; Zhou, M.; Campbell-Valois, F.X.; Siu, S.W. A deep learning method for predicting the minimum inhibitory concentration of antimicrobial peptides against Escherichia coli using Multi-Branch-CNN and Attention. msystems 2023, 8, e00345-23. [Google Scholar] [CrossRef]
- Liao, X.; Wei, X.; Zhou, M.; Zhang, Y.; Wang, H.; Chen, L.; Li, Q. Image quality assessment: Measuring perceptual degradation via distribution measures in deep feature spaces. IEEE Trans. Image Process. 2024, 33, 4044–4059. [Google Scholar] [CrossRef]
- Liu, L.; Jia, X.; Liu, J.; Tian, Q. Joint Demosaicing and Denoising With Self Guidance. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 2237–2246. [Google Scholar] [CrossRef]
- Yang, B.; Zhang, X.; Zhang, J.; Luo, J.; Zhou, M.; Pi, Y. EFLNet: Enhancing feature learning network for infrared small target detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5906511. [Google Scholar] [CrossRef]
- Zhou, M.; Shen, W.; Wei, X.; Luo, J.; Jia, F.; Zhuang, X.; Jia, W. Blind image quality assessment: Exploring content fidelity perceptibility via quality adversarial learning. In International Journal of Computer Vision; Springer: Berlin/Heidelberg, Germany, 2025; pp. 1–17. [Google Scholar]
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.-C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
- Zhou, M.; Zhao, X.; Luo, F.; Luo, J.; Pu, H.; Xiang, T. Robust rgb-t tracking via adaptive modality weight correlation filters and cross-modality learning. ACM Trans. Multimed. Comput. Commun. Appl. 2023, 20, 1–20. [Google Scholar] [CrossRef]
- Shen, W.; Zhou, M.; Luo, J.; Li, Z.; Kwong, S. Graph-represented distribution similarity index for full-reference image quality assessment. IEEE Trans. Image Process. 2024, 33, 3075–3089. [Google Scholar] [CrossRef]
- Xu, Y.; Xie, Y.; Zhang, L.; Chen, S.; Wang, K. DNA: Differentiable network-accelerator co-search. IEEE Micro 2020, 40, 7–15. [Google Scholar]
- Li, Y.L.; Feng, Y.; Zhou, M.L.; Xiong, X.C.; Wang, Y.H.; Qiang, B.H. DMA-YOLO: Multi-scale object detection method with attention mechanism for aerial images. Vis. Comput. 2024, 40, 4505–4518. [Google Scholar] [CrossRef]
- Zhou, M.; Wei, X.; Ji, C.; Xiang, T.; Fang, B. Optimum quality control algorithm for versatile video coding. IEEE Trans. Broadcast. 2022, 68, 582–593. [Google Scholar] [CrossRef]
- Zhou, M.; Wang, H.; Wei, X.; Zhang, Y.; Chen, L.; Li, Q.; Liu, S. HDIQA: A hyper debiasing framework for full reference image quality assessment. IEEE Trans. Broadcast. 2024, 70, 545–554. [Google Scholar] [CrossRef]
- Zhou, Z.; Zhou, M.; Luo, J.; Zhang, Y.; Wang, H.; Chen, L. VideoGNN: Video Representation Learning via Dynamic Graph Modelling. ACM Trans. Multimed. Comput. Commun. Appl. 2025. [Google Scholar] [CrossRef]
- Gao, T.; Sheng, W.; Zhou, M.; Fang, B.; Luo, F.; Li, J. Method for fault diagnosis of temperature-related mems inertial sensors by combining Hilbert–Huang transform and deep learning. Sensors 2020, 20, 5633. [Google Scholar] [CrossRef] [PubMed]
- Zhang, Y.; Liu, X.; Chen, H.; Wang, K.; Zhou, M.; Li, Y. Bayer pattern optimization for low-light endoscopic imaging. IEEE Trans. Med. Imaging 2021, 40, 2056–2069. [Google Scholar]
- Liu, G.; Rao, P.; Chen, X.; Li, Y.; Jiang, H. Efficient Polarization Demosaicking Via Low-Cost Edge-Aware and Inter-Channel Correlation. IEEE Photonics J. 2025, 17, 1–11. [Google Scholar] [CrossRef]
- Wang, Q.; Li, Y.; Zhang, L.; Chen, H.; Zhou, M.; Liu, X. Deep tone mapping network for endoscopic image enhancement. IEEE J. Biomed. Health Inform. 2020, 24, 3457–3468. [Google Scholar]
- Zhou, M.; Leng, H.; Fang, B.; Zhang, Y.; Liu, S. Low-light image enhancement via a frequency-based model with structure and texture decomposition. ACM Trans. Multimed. Comput. Commun. Appl. 2023, 19, 1–23. [Google Scholar] [CrossRef]
- Kim, T.; Kim, D.; Lee, S.; Kim, Y.; Yang, J. Power consumption analysis of GPU-accelerated endoscopic image processing. IEEE Trans. Biomed. Circuits Syst. 2022, 16, 522–533. [Google Scholar]
- IEC 60601-2-18:2023; Medical Electrical Equipment—Part 2-18: Requirements for Endoscopic Systems. IEC: Geneva, Switzerland, 2023.
- Sze, V.; Chen, Y.-H.; Yang, T.-J.; Emer, J.S. Efficient processing of deep neural networks: A tutorial and survey. Proc. IEEE 2017, 105, 2295–2329. [Google Scholar] [CrossRef]
- Han, S.; Mao, H.; Dally, W.J. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. In Proceedings of the International Conference on Learning Representations (ICLR), San Juan, Puerto Rico, 2–4 May 2016. [Google Scholar]
- Lang, S.; Liu, X.; Zhou, M.; Zhang, Y.; Wang, C. A full-reference image quality assessment method via deep meta-learning and conformer. IEEE Trans. Broadcast. 2023, 70, 316–324. [Google Scholar] [CrossRef]
- Guo, T.; Peng, S.; Li, Y.; Zhang, L.; Wang, H.; Chen, Z. Community-based social recommendation under local differential privacy protection. Inf. Sci. 2023, 639, 119002. [Google Scholar] [CrossRef]
- Liao, X.; Wei, X.; Zhou, M.; Zhang, Y.; Wang, H. Full-reference image quality assessment: Addressing content misalignment issue by comparing order statistics of deep features. IEEE Trans. Broadcast. 2023, 70, 305–315. [Google Scholar] [CrossRef]
- Wei, X.; Zhou, M.; Jia, W. Toward Low-Latency and High-Quality Adaptive 360 Streaming. IEEE Trans. Ind. Inform. 2022, 19, 6326–6336. [Google Scholar] [CrossRef]
- Wu, J.; Leng, C.; Wang, Y.; Li, Q.; Cheng, J.; Guo, Y. Mixed-precision quantization for CNN inference on edge devices. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 2970–2984. [Google Scholar]
- Wei, X.; Zhou, M.; Wang, H.; Yang, H.; Chen, L.; Kwong, S. Recent advances in rate control: From optimization to implementation and beyond. IEEE Trans. Circuits Syst. Video Technol. 2023, 34, 17–33. [Google Scholar] [CrossRef]
- Chang, X.; Pan, H.; Zhang, D.; Sun, Q.; Lin, W. A Memory-Optimized and Energy-Efficient CNN Acceleration Architecture Based on FPGA. In Proceedings of the 2019 IEEE 28th International Symposium on Industrial Electronics (ISIE), Vancouver, BC, Canada, 12–14 June 2019; pp. 2137–2141. [Google Scholar] [CrossRef]
- Chen, L.; Wang, S.; Zhang, Y.; Zhou, M.; Li, Q.; Liu, X. StreamArch: A memory-efficient streaming architecture for real-time image processing. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 1347–1361. [Google Scholar]
- Zhang, Z.; Wang, K.; Li, Y.; Zhou, M.; Chen, L. FPGA acceleration of convolutional layers via loop unrolling and pipelining. IEEE Trans. Very Large Scale Integr. Syst. 2021, 29, 1240–1253. [Google Scholar]
- Rastegari, M.; Ordóñez, V.; Redmon, J.; Farhadi, A. XNOR-Net: ImageNet classification using binary convolutional neural networks. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016; pp. 525–542. [Google Scholar]
- Zhou, M.; Zhang, Y.; Li, B.; Chen, L.; Wang, H.; Liu, S. Complexity-based intra frame rate control by jointing inter-frame correlation for high efficiency video coding. J. Vis. Commun. Image Represent. 2017, 42, 46–64. [Google Scholar] [CrossRef]
- Zhao, L.; Shang, Z.; Tan, J.; Chen, H.; Wang, R. Siamese networks with an online reweighted example for imbalanced data learning. Pattern Recognit. 2022, 132, 108947. [Google Scholar] [CrossRef]
- Zhou, M.; Hu, H.M.; Zhang, Y. Region-based intra-frame rate-control scheme for high efficiency video coding. In Proceedings of the Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific, Siem Reap, Cambodia, 9–12 December 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 1–4. [Google Scholar]
- Xian, W.; Zhou, M.; Fang, B.; Chen, L.; Wang, H. Spatiotemporal feature hierarchy-based blind prediction of natural video quality via transfer learning. IEEE Trans. Broadcast. 2022, 69, 130–143. [Google Scholar] [CrossRef]
- Cai, H.; Zhu, L.; Han, S. ProxylessNAS: Direct neural architecture search on target task and hardware. arXiv 2018, arXiv:1812.00332. [Google Scholar]
- Wei, X.; Li, J.; Zhou, M.; Wang, X. Contrastive distortion-level learning-based no-reference image-quality assessment. Int. J. Intell. Syst. 2022, 37, 8730–8746. [Google Scholar] [CrossRef]
- Zhang, Z.; Li, Y.; Wang, K.; Zhou, M.; Chen, L. HotNAS: Thermal-aware neural architecture search for edge devices. In Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), Boston, MA, USA, 1–3 August 2022; pp. 1–6. [Google Scholar]
- Xian, W.; Zhou, M.; Fang, B.; Chen, L.; Zhang, Y.; Wang, H. A content-oriented no-reference perceptual video quality assessment method for computer graphics animation videos. Inf. Sci. 2022, 608, 1731–1746. [Google Scholar] [CrossRef]
- Guo, Q.; Zhou, M. Progressive domain translation defogging network for real-world fog images. IEEE Trans. Broadcast. 2022, 68, 876–885. [Google Scholar] [CrossRef]
- Zhou, M.; Han, S.; Luo, J.; Zhang, Y.; Wang, H. Transformer-Based and Structure-Aware Dual-Stream Network for Low-Light Image Enhancement. ACM Trans. Multimed. Comput. Commun. Appl. 2025, 21, 1–24. [Google Scholar] [CrossRef]
- Zhou, M.; Li, J.; Wei, X.; Zhang, Y.; Wang, H.; Chen, L. AFES: Attention-Based Feature Excitation and Sorting for Action Recognition. In IEEE Transactions on Consumer Electronics; IEEE: Piscataway, NJ, USA, 2025. [Google Scholar]
- Shen, W.; Zhou, M.; Chen, Y.; Zhang, Y.; Wang, H.; Li, Q. Image Quality Assessment: Investigating Causal Perceptual Effects with Abductive Counterfactual Inference. In Proceedings of the Computer Vision and Pattern Recognition Conference, Nashville TN, USA, 11–15 June 2025; pp. 17990–17999. [Google Scholar]
- Lan, X.; Xian, W.; Zhou, M.; Zhang, Y.; Wang, H.; Chen, L. No-Reference Image Quality Assessment: Exploring Intrinsic Distortion Characteristics via Generative Noise Estimation with Mamba. In IEEE Transactions on Circuits and Systems for Video Technology; IEEE: Piscataway, NJ, USA, 2025. [Google Scholar]
- Zheng, Z.; Zhou, M.; Shang, Z.; Tan, J.; Chen, L.; Wang, H. GAANet: Graph Aggregation Alignment Feature Fusion for Multispectral Object Detection. In IEEE Transactions on Industrial Informatics; IEEE: Piscataway, NJ, USA, 2025. [Google Scholar]
- Dasika, G.; Sethia, A.; Robby, V.; Mudge, T.; Mahlke, S. MEDICS: Ultra-portable processing for medical image reconstruction. In Proceedings of the 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT), Vienna, Austria, 10–15 September 2010; pp. 181–192. [Google Scholar]
- Guo, Z.; Wang, Q.; Chen, H.; Zhou, M.; Zhang, L.; Li, Y. Lightweight network design for endoscopic image demosaicking via accuracy-oriented NAS. IEEE Trans. Med. Imaging 2023, 42, 2105–2116. [Google Scholar]
- Li, Y.; Zhou, M.; Wang, K.; Zhang, L.; Chen, H.; Wang, H. FPGA-NAS: Bridging the gap between neural architecture search and FPGA acceleration. IEEE Trans. Comput. 2022, 71, 2056–2069. [Google Scholar]
- Zhou, M.; Li, Y.; Yang, G.; Zhang, Y.; Wang, H.; Chen, L. COFNet: Contrastive Object-aware Fusion using Box-level Masks for Multispectral Object Detection. In IEEE Transactions on Multimedia; IEEE: Piscataway, NJ, USA, 2025. [Google Scholar]
- Zhang, Q.; Li, Y.; Wang, K.; Zhou, M.; Zhang, L.; Chen, H.; Wang, H. Co-exploration of neural architectures and hardware accelerators for real-time edge intelligence. Nat. Mach. Intell. 2021, 3, 1067–1078. [Google Scholar]
- Zhang, Y.; Wang, K.; Li, Y.; Zhou, M.; Chen, L. Fixed-hardware NAS: Limitations in adaptive FPGA acceleration. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2021, 40, 1532–1545. [Google Scholar]
- Li, Q.; Zhang, Z.; Zhou, M.; Shang, Z.; Tan, J.; Zhao, L. Beyond accuracy: Multi-objective neural architecture search for edge devices. IEEE Micro 2022, 42, 58–67. [Google Scholar]
- Lang, S.; Zhou, M.; Wei, X.; Zhang, Y.; Wang, H.; Chen, L. Image Quality Assessment: Exploring the Similarity of Deep Features via Covariance-Constrained Spectra. In IEEE Transactions on Broadcasting; IEEE: Piscataway, NJ, USA, 2025. [Google Scholar]
- Wang, R.; Li, Y.; Zhang, Q.; Zhou, M.; Chen, H.; Wang, H. Accelerating hardware-aware NAS via parallelizable search for medical edge deployment. IEEE Trans. Biomed. Circuits Syst. 2023, 17, 210–223. [Google Scholar]
- Mittal, S. A survey of FPGA-based accelerators for convolutional neural networks. Neural Netw. 2020, 128, 364–381. [Google Scholar] [CrossRef]
- Sze, V.; Chen, Y.-H.; Yang, T.-J.; Emer, J.S. Hardware for machine learning: Challenges and opportunities. IEEE Micro 2020, 40, 17–25. [Google Scholar]
- Rahman, S.; Chen, H.; Wang, K.; Li, Y.; Zhou, M.; Liu, X. FPGA-based convolutional and fully connected layer separation for real-time medical imaging. IEEE Trans. Biomed. Circuits Syst. 2021, 15, 243–256. [Google Scholar]
- Gao, T.; Sheng, W.; Zhou, M.; Fang, B.; Zheng, L. MEMS inertial sensor fault diagnosis using a cnn-based data-driven method. Int. J. Pattern Recognit. Artif. Intell. 2020, 34, 2059048. [Google Scholar] [CrossRef]
- Chen, H.; Zhang, Y.; Liu, X.; Zhou, M.; Wang, K.; Li, Q. FPGA mapping of CFA demosaicing and HDR compression for endoscopic video enhancement. IEEE J. Biomed. Health Inform. 2021, 25, 2240–2251. [Google Scholar]
- Cheong, L.S.; Qian, K.; Seah, H.S.; Soo, K.C.; Lin, F.; Patricia, T. FPGA Implementation of Real-time Fluorescence Endoscopy Imaging Algorithms. In Proceedings of the 2006 International Conference on Biomedical and Pharmaceutical Engineering, Singapore, 11–14 December 2006; pp. 73–78. [Google Scholar]
- Rahman, S.; Wang, K.; Chen, H.; Li, Y.; Zhou, M.; Liu, X. StreamSep: Convolutional and fully connected layer separation for real-time endoscopic video processing on FPGA. IEEE Trans. Ind. Inform. 2023, 19, 9321–9332. [Google Scholar]
- Chen, L.; Wang, S.; Zhang, Z.; Zhou, M.; Li, Y.; Liu, X. PingPongNet: Latency-masked data transmission for endoscopic imaging with load imbalance analysis. J. Real-Time Image Process. 2022, 19, 589–602. [Google Scholar]
- Simunic, D. Thermal Management of Electronic Devices in Medical Equipment: A Review. IEEE Rev. Biomed. Eng. 2020, 13, 245–259. [Google Scholar]
- Yan, J.; Zhang, B.; Zhou, M.; Kwok, H.F.; Siu, S.W. Multi-Branch-CNN: Classification of ion channel interacting peptides using parallel convolutional neural networks. bioRxiv 2021. [Google Scholar] [CrossRef]
- Jiang, W.; Yang, L.; Sha, E.H.; Zhuge, Q.; Gu, S.; Dasgupta, S.; Shi, Y.; Hu, J. Hardware/Software Co-Exploration of Neural Architectures. arXiv 2020, arXiv:1907.04650v2. [Google Scholar] [CrossRef]
- JESD51-2; Integrated Circuits Thermal Test Method Environmental Conditions—Natural Convection (Still Air). JEDEC Solid State Technology Association: Arlington, VA, USA, 1995.
- Kim, J.; Park, H.; Lee, S. Dynamic Double-Buffering for Latency-Hidden Streaming in FPGA-Accelerated Endoscopic Video Processing. IEEE Trans. Biomed. Circuits Syst. 2023, 17, 1124–1137. [Google Scholar]
- Parikh, N.; Boyd, S. Proximal algorithms. Found. Trends Optim. 2014, 1, 127–239. [Google Scholar] [CrossRef]
- Kwan, C.; Larkin, J. Demosaicing of Bayer and CFA 2.0 Patterns for Low Lighting Images. Electronics 2019, 8, 1444. [Google Scholar] [CrossRef]
- Tan, W.; Xu, C.; Lei, F.; Fang, Q.; An, Z.; Wang, D.; Han, J.; Qian, K.; Feng, B. An Endoscope Image Enhancement Algorithm Based on Image Decomposition. Electronics 2022, 11, 1909. [Google Scholar] [CrossRef]
- Chen, H.; Zhang, Y.; Wang, K.; Liu, X.; Zhou, M.; Li, Y. FPGA-Accelerated Piecewise Linear HDR Compression for Low-Light Endoscopic Surgery. IEEE Trans. Biomed. Circuits Syst. 2023, 17, 1450–1464. [Google Scholar]






| Platform | Power (W) | Temperature (°C) | Latency (ms) | Frame Rate (fps) | CPSNR (dB) | Efficiency (fps/W) |
|---|---|---|---|---|---|---|
| NVIDIA Jetson AGX | 18.2 | 65 | 12.3 | 81.3 | 38.8 | 4.47 |
| Xilinx Zynq (FPGA) | 2.8 | 48 | 8.1 | 123.5 | 38.2 | 44.1 |
| ARM Cortex-A72 | 4.5 | 54 | 41.7 | 24.0 | 36.5 | 5.33 |
| TI TDA4VM (DSP) | 5.2 | 52 | 28.9 | 34.6 | 37.1 | 6.65 |
| SiFive U740 (RISC-V) | 3.1 | 49 | 62.4 | 16.0 | 35.8 | 5.16 |
| Vendor | Resource Type | Property | Coefficient | Note |
|---|---|---|---|---|
| Xilinx | 1 LUT6 | Logic | 1.5 Cell | Programmable combinational logic |
| | 1 Flip-Flop (FF) | Logic | 0.5 Cell | Constructs and registers sequential logic |
| | 1 BRAM (18 Kb) | Store | 20 Cell | Implements FIFO, RAM, ROM, etc. |
| | 1 DSP48 Slice | Calculate | 10 Cell | Efficiently executes multiply–accumulate (MAC) operations |
| | 1 MMCM | Clock | 15 Cell | Mixed-mode clock manager (MMCM) |
| | 1 SERDES | Interface | 25 Cell | Used for high-speed serial interfaces |
| Intel (Altera) | 1 ALM | Logic | 1.8 Cell | Adaptive logic module (ALM) |
| | 1 Register | Logic | 0.5 Cell | Similar to a Flip-Flop; functions as a storage unit |
| | 1 M20K Block | Store | 22 Cell | Functionally equivalent to Xilinx BRAM |
| | 1 DSP Block | Calculate | 12 Cell | Digital signal processing (DSP) slice |
| | 1 PLL | Clock | 14 Cell | Phase-locked loop (PLL) for clock generation and management |
| | 1 Transceiver | Interface | 28 Cell | High-speed transceivers for serial protocols |
| Lattice | 1 LUT4 | Logic | 1.0 Cell | 4-input look-up table (LUT) |
| | 1 PFU | Logic | 4.0 Cell | Programmable functional unit supporting multiple resources |
| | 1 EBR | Store | 18 Cell | Embedded block RAM (EBR) for storage |
| | 1 sysDSP Block | Calculate | 9 Cell | Dedicated DSP computing unit |
| | 1 PLL | Clock | 13 Cell | Phase-locked loop (PLL) for clock management |
| Design Objective | Optimization Formula | Constraint Condition |
|---|---|---|
| Maximize Throughput | min(T_throughput) = min(max(T_stage)) | k ≤ k_max (resource upper bound) |
| Minimize First Latency | min(L_first) = min(k · T_stage) | T_stage ≥ T_clk,min (process limit) |
| Control Resource Consumption | min(αk + βk + γk) | T_stage ≤ T_clk |
| Task | FPGA Resources | Latency (ms) |
|---|---|---|
| CFA Demosaicing | 40 Cells | 3.2 |
| HDR Compression | 20 Cells | 1.8 |
| Method | CPSNR (dB) | SSIM | False Color Pixel Ratio | Power (W) |
|---|---|---|---|---|
| Bilinear Interpolation | 34.2 | 0.82 | 12.3% | 1.5 |
| GPU-HDR (Tone Mapping) | 37.1 | 0.86 | 8.7% | 4.2 |
| Our NAS Scheme | 38.2 | 0.89 | 4.1% | 2.8 |
| Resource Type | Xilinx (Cell) | Intel (Cell) | Lattice (Cell) | Calibration Basis |
|---|---|---|---|---|
| 1 LUT | 1.0 | 1.0 | 1.0 | Base Unit |
| 1 FF | 0.5 | 0.6 | 0.4 | Power Ratio |
| 1 BRAM (18 Kb) | 15 | 18 | 12 | β_j/β_LUT |
| 1 DSP | 10 | 12 | 8 | MAC Energy Efficiency |
| 1 PLL | 8 | 10 | 6 | Clock Mgmt Complexity |
| 1 SERDES | 20 | 25 | 15 | I/O Speed Weight |
| Metric | Unconstrained Scheme | Constrained Scheme (Δt_ratio ≤ 1.5) | Improvement |
|---|---|---|---|
| Max Frequency | 125 MHz | 192 MHz | +53.6% |
| Throughput (FPS) | 19.6 | 30.2 | +54.1% |
| Resource Utilization (LUT) | 81% | 76% | −6.2% |
| Method | Δt_ratio | Throughput (FPS) |
|---|---|---|
| Hardware-Aware NAS (Fixed Pipeline) | 2.1 | 19.6 |
| Co-Exploration (Ours) | 1.3 | 30.2 |
| Iteration Step | Δt_1 | Δt_2 | Δt_3 | Δt_4 | Disparity Ratio (max/min) |
|---|---|---|---|---|---|
| 0 | 8.0 | 3.0 | 5.0 | 4.0 | 2.67 |
| 10 | 6.2 | 4.1 | 5.0 | 4.7 | 1.51 |
| 20 | 5.8 | 4.3 | 5.0 | 4.9 | 1.35 |
| Variable | Range of Values | Testing Purpose |
|---|---|---|
| Max Cell Count | 70/80/90/100 Cells | Validate the accuracy of the power–temperature model. |
| Pipeline Stages (M) | 2/4/6 stages | Analyze the impact of the number of stages on throughput and latency balance. |
| Parallelism | 1/2/4/8 channels | Explore the trade-off between computational resources and accuracy. |
| Metric | Before Optimization (Fixed Pipeline) | After Optimization (Latency-Balancing Constraint) | Improvement |
|---|---|---|---|
| Interstage Latency Variation (max/min) | 2.1 (8 ns vs. 3.8 ns) | 1.3 (5.2 ns vs. 4.0 ns) | 38.1% |
| Maximum Frequency (MHz) | 125 (Limited by the 8 ns stage) | 192 | +53.6% |
| Throughput (FPS) | 19.6 | 30.2 | +54.1% |
| Cell Count | Theoretical Power (W) | Measured Temperature (°C) | Temperature Prediction Error |
|---|---|---|---|
| 70 | 2.1 | 42.3 | +0.2 °C |
| 80 | 2.5 | 45.1 | −0.4 °C |
| 90 | 2.8 | 47.8 | +0.3 °C |
| 100 | 3.2 | 52.6 (Exceeded) | −0.2 °C |
| Parallelism | CPSNR (dB) | LUT Utilization | Frame Rate (FPS) |
|---|---|---|---|
| 1 | 38.5 | 42% | 18.3 |
| 2 | 38.2 | 58% | 26.7 |
| 4 | 38.0 | 76% | 30.2 |
| 8 | 37.8 | 93% (Exceeded) | 32.1 |
| Method | CPSNR (dB) | Power (W) | Temp. (°C) | Latency (ms) |
|---|---|---|---|---|
| Manually Designed Model (VGG8) | 38.5 | 4.1 | 55 (Exceeded) | 22.4 |
| GPU-NAS (ENAS) | 38.8 | 18.2 | 65 (Exceeded) | 12.3 |
| FPGA-NAS (Fixed Pipeline) | 37.9 | 3.0 | 46 | 51.2 |
| Proposed Method | 38.2 | 2.8 | 48 | 8.1 |
| Method | LUT Utilization | BRAM Utilization | Throughput (FPS) |
|---|---|---|---|
| FPGA-NAS (Ref. [38]) | 81% | 78% | 19.6 |
| HASIC (Ref. [42]) | 76% | 72% | 28.3 |
| Proposed Method | 76% | 68% | 30.2 |
| Method | Low-Light CPSNR (dB) | Motion Blur SSIM | Thermal Safety (≤50 °C) |
|---|---|---|---|
| Traditional Interpolation (Bilinear) | 34.2 | 0.82 | Pass |
| GPU-HDR (Tone Mapping) | 37.1 | 0.86 | Fail (65 °C) |
| Proposed Method | 38.2 | 0.89 | Pass (48 °C) |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

