Compression-Efficient Feature Extraction Method for a CMOS Image Sensor
Abstract
1. Introduction
- We present a six-channel binary feature data format consisting of three channels of luminance signals and three channels of horizontal edge signals. This format effectively captures target features across various contrast levels, achieving a superior trade-off between object recognition accuracy and data size compared to previous edge-only methods.
- We propose an effective compression method using run length encoding (RLE) tailored to the characteristics of the six-channel binary feature data. By allocating an appropriate bit length separately to the runs of 0s and to the runs of 1s in each feature channel, this approach reduces the six-channel data from 460,800 bits to 214,281 bits (Vref2 = 125 mV) or 172,595 bits (Vref2 = 250 mV). Because RLE is lossless, this reduction in transmitted data comes without any loss of image recognition accuracy.
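
The tailored RLE described above can be illustrated with a minimal Python sketch. It encodes one binary row as alternating runs that start with a run of 0s, storing each 0s run in a `b0`-bit field and each 1s run in a `b1`-bit field, mirroring the per-channel bit allocations evaluated in the compression-rate table (for example, 6-bit 0s runs and 2-bit 1s runs for edge channels). The function names, the convention of always starting with a 0s run, and the rule for a run that overflows its field (emit the maximum length, then a zero-length run of the other symbol) are assumptions for illustration, not the authors' bitstream format.

```python
from typing import List, Tuple

Run = Tuple[int, int]  # (run length, field width in bits)


def rle_encode(bits: List[int], b0: int, b1: int) -> List[Run]:
    """Encode a binary row as alternating runs, starting with a run of 0s.

    0s runs are stored in b0-bit fields and 1s runs in b1-bit fields.
    A run longer than its field allows is split by emitting the maximum
    length followed by a zero-length run of the other symbol (assumed rule).
    """
    if b0 < 1 or b1 < 1:
        raise ValueError("field widths must be at least 1 bit")
    if any(b not in (0, 1) for b in bits):
        raise ValueError("input must contain only 0s and 1s")
    max_len = {0: (1 << b0) - 1, 1: (1 << b1) - 1}
    width = {0: b0, 1: b1}
    runs: List[Run] = []
    symbol, i, n = 0, 0, len(bits)
    while i < n:
        length = 0
        # Count consecutive copies of `symbol`, capped at the field maximum.
        while i < n and bits[i] == symbol and length < max_len[symbol]:
            length += 1
            i += 1
        runs.append((length, width[symbol]))
        symbol ^= 1  # the next run always holds the other symbol
    return runs


def rle_decode(runs: List[Run]) -> List[int]:
    """Expand alternating runs (starting with 0s) back into a binary row."""
    out: List[int] = []
    symbol = 0
    for length, _ in runs:
        out.extend([symbol] * length)
        symbol ^= 1
    return out


def encoded_bits(runs: List[Run]) -> int:
    """Total size of the encoded row in bits."""
    return sum(w for _, w in runs)


if __name__ == "__main__":
    # A sparse, edge-like row: long 0s runs separated by short 1s runs.
    row = [0] * 40 + [1] * 2 + [0] * 30 + [1] + [0] * 7
    runs = rle_encode(row, b0=6, b1=2)
    assert rle_decode(runs) == row  # RLE is lossless
    print(runs)  # [(40, 6), (2, 2), (30, 6), (1, 2), (7, 6)]
    print(encoded_bits(runs), "of", len(row), "bits")  # 22 of 80 bits
```

Short 1s fields suit sparse edge channels, where 1s runs are brief, while wide 0s fields avoid splitting long background runs. Each overflow costs an extra zero-length run, which is why the bit allocation is worth tuning per channel.
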
2. Proposed Feature Data Format
2.1. Overview of Proposed Image Recognition System
2.2. Six-Channel Binary Feature Separation and Binarization
2.2.1. Quantization for Feature Data
2.2.2. Separation and Binarization for Feature Data
2.3. Run Length Encoding (RLE) for Feature Data
3. Simulation Results
3.1. Six-Channel Binary Feature Data
3.2. Compression Rate of Tailored RLE
3.3. Data Size and Object Detection Accuracy
3.4. Evaluation on Image Classification
4. Discussion
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| CMOS | Complementary Metal–Oxide–Semiconductor |
| RGB | Red, Green, and Blue |
| QVGA | Quarter Video Graphics Array |
| ADC | Analog-to-Digital Converter |
| FLOPs | Floating-Point Operations |
| IoU | Intersection over Union |
Appendix A. Training Hyperparameters
| Hyperparameter | Value |
|---|---|
| Epochs | 400 |
| Optimizer | SGD |
| Initial learning rate | 0.01 |
| Final learning rate | 0.001 |
| Momentum | 0.937 |
| Weight decay | 0.0005 |
| **Warm-up** | |
| Warm-up epochs | 3 |
| Warm-up momentum | 0.8 |
| Warm-up bias learning rate | 0.1 |
| **Loss Gains** | |
| Box loss gain | 0.05 |
| Class loss gain | 0.3 |
| Objectness loss gain | 0.7 |
| **Data Augmentation** | |
| HSV hue | 0.015 |
| HSV saturation | 0.7 |
| HSV value | 0.4 |
| Translation | 0.2 |
| Scale | 0.9 |
| Horizontal flip | 0.5 |
| Mosaic | 1.0 |
| MixUp | 0.15 |
| Copy & Paste | 0.15 |
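
The table gives the end points of the learning-rate schedule and the warm-up settings, but not the shape of the decay. The sketch below shows one way these values could combine at a (fractional) epoch. The cosine decay from 0.01 to 0.001 and the linear warm-up ramps (weight learning rate rising from 0, bias learning rate falling from 0.1, momentum rising from 0.8 to 0.937 over the first three epochs) are assumptions for illustration, not the authors' training code; `base_lr` and `schedule` are hypothetical names.

```python
import math

# Values from the Appendix A table.
EPOCHS = 400
LR_INITIAL, LR_FINAL = 0.01, 0.001
MOMENTUM = 0.937
WARMUP_EPOCHS = 3
WARMUP_MOMENTUM = 0.8
WARMUP_BIAS_LR = 0.1


def base_lr(epoch: float) -> float:
    """Assumed cosine decay: LR_INITIAL at epoch 0, LR_FINAL at epoch EPOCHS."""
    t = min(max(epoch / EPOCHS, 0.0), 1.0)
    return LR_FINAL + (LR_INITIAL - LR_FINAL) * (1.0 + math.cos(math.pi * t)) / 2.0


def schedule(epoch: float) -> dict:
    """Learning rates and momentum at a (fractional) epoch, with linear warm-up."""
    lr = base_lr(epoch)
    if epoch < WARMUP_EPOCHS:
        f = epoch / WARMUP_EPOCHS  # warm-up progress in [0, 1)
        return {
            "weight_lr": f * lr,  # ramps up from 0
            "bias_lr": WARMUP_BIAS_LR + f * (lr - WARMUP_BIAS_LR),  # ramps down from 0.1
            "momentum": WARMUP_MOMENTUM + f * (MOMENTUM - WARMUP_MOMENTUM),
        }
    return {"weight_lr": lr, "bias_lr": lr, "momentum": MOMENTUM}


if __name__ == "__main__":
    for epoch in (0, 1.5, 3, 200, 400):
        s = schedule(epoch)
        print(epoch, {k: round(v, 5) for k, v in s.items()})
```

A linear decay would fit the same two end points equally well; the table alone does not distinguish between them.
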
Appendix B. Learning Curves

References
1. Government of Japan; Cabinet Office. Society 5.0. Available online: https://www8.cao.go.jp/cstp/english/society5_0/index.html (accessed on 5 November 2023).
2. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
3. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
4. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25.
5. Tan, M.; Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; PMLR: Cambridge, MA, USA, 2019; pp. 6105–6114.
6. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
7. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; IEEE: New York, NY, USA, 2009; pp. 248–255.
8. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part V 13; Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755.
9. Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
10. Kuznetsova, A.; Rom, H.; Alldrin, N.; Uijlings, J.; Krasin, I.; Pont-Tuset, J.; Kamali, S.; Popov, S.; Malloci, M.; Kolesnikov, A.; et al. The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale. Int. J. Comput. Vis. 2020, 128, 1956–1981.
11. Chowdhery, A.; Warden, P.; Shlens, J.; Howard, A.; Rhodes, R. Visual wake words dataset. arXiv 2019, arXiv:1906.05721.
12. Young, C.; Omid-Zohoor, A.; Lajevardi, P.; Murmann, B. A data-compressive 1.5/2.75-bit log-gradient QVGA image sensor with multi-scale readout for always-on object detection. IEEE J. Solid-State Circuits 2019, 54, 2932–2946.
13. Yoneda, S.; Negoro, Y.; Kobayashi, H.; Nei, K.; Takeuchi, T.; Oota, M.; Kawata, T.; Ikeda, T.; Yamazaki, S. Image Sensor Capable of Analog Convolution for Real-time Image Recognition System Using Crystalline Oxide Semiconductor FET. In Proceedings of the 2019 International Image Sensor Workshop (IISW 2019), Snowbird, UT, USA, 23–27 June 2019; pp. 322–325.
14. Finateu, T.; Niwa, A.; Matolin, D.; Tsuchimoto, K.; Mascheroni, A.; Reynaud, E.; Mostafalu, P.; Brady, F.; Chotard, L.; LeGoff, F.; et al. 5.10 A 1280 × 720 back-illuminated stacked temporal contrast event-based vision sensor with 4.86 μm pixels, 1.066 GEPS readout, programmable event-rate controller and compressive data-formatting pipeline. In Proceedings of the 2020 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 16–20 February 2020; IEEE: New York, NY, USA, 2020; pp. 112–114.
15. Suh, Y.; Choi, S.; Ito, M.; Kim, J.; Lee, Y.; Seo, J.; Jung, H.; Yeo, D.H.; Namgung, S.; Bong, J.; et al. A 1280 × 960 dynamic vision sensor with a 4.95-μm pixel pitch and motion artifact minimization. In Proceedings of the 2020 IEEE International Symposium on Circuits and Systems (ISCAS), Sevilla, Spain, 10–21 October 2020; IEEE: New York, NY, USA, 2020; pp. 1–5.
16. Lichtsteiner, P.; Posch, C.; Delbruck, T. A 128 × 128 120 dB 30 mW asynchronous vision sensor that responds to relative intensity change. In Proceedings of the 2006 IEEE International Solid-State Circuits Conference, Digest of Technical Papers, San Francisco, CA, USA, 6–9 February 2006; IEEE: New York, NY, USA, 2006; pp. 2060–2069.
17. Son, B.; Suh, Y.; Kim, S.; Jung, H.; Kim, J.S.; Shin, C.W.; Park, K.; Lee, K.; Park, J.M.; Woo, J.; et al. 4.1 A 640 × 480 dynamic vision sensor with a 9 µm pixel and 300 Meps address-event representation. In Proceedings of the 2017 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 5–9 February 2017; pp. 66–67.
18. Okura, S.; Otani, A.; Itsuki, K.; Kitazawa, Y.; Yamamoto, K.; Osuka, Y.; Morikaku, Y.; Yoshida, K. A study on a feature extractable CMOS image sensor for low-power image classification system. In Proceedings of the 2023 International Image Sensor Workshop, Crieff, UK, 21–25 May 2023.
19. Morikaku, Y.; Ujiie, R.; Morikawa, D.; Shima, H.; Yoshida, K.; Okura, S. On-Chip Data Reduction and Object Detection for a Feature-Extractable CMOS Image Sensor. Electronics 2024, 13, 4295.
20. Kuroda, K.; Morikaku, Y.; Osuka, Y.; Iegaki, R.; Ujiie, R.; Shima, H.; Yoshida, K.; Okura, S. [Paper] Lightweight Object Detection Model for a CMOS Image Sensor with Binary Feature Extraction. ITE Trans. Media Technol. Appl. 2026, 14, 102–109.
21. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 7464–7475.
22. Sato, M.; Akebono, S.; Yasuoka, K.; Kato, E.; Tsuruta, M.; Takano, C.; Ota, K.; Haraguchi, K.; Watanabe, M.; Fujii, G.; et al. A 0.8 μm 32Mpixel Always-On CMOS Image Sensor with Windmill-Pattern Edge Extraction and On-Chip DNN. IEEE Solid-State Circuits Lett. 2025, 8, 353–356.
23. Lee, S.; Yun, Y.C.; Heu, S.M.; Lee, K.H.; Lee, S.J.; Lee, K.; Moon, J.; Lim, H.; Jang, T.; Song, M.; et al. The Design of a Computer Vision Sensor Based on a Low-Power Edge Detection Circuit. Sensors 2025, 25, 3219.
24. Okumura, S.; Morikaku, Y.; Osuka, Y.; Ujiie, R.; Morikawa, D.; Shima, H.; Okura, S. Feature Extractable CMOS Image Sensor Pixel with RGB to Grayscale Conversion. In Proceedings of the 2023 IEEE International Meeting for Future of Electron Devices, Kansai (IMFEDK), Kyoto, Japan, 16–17 November 2023; IEEE: New York, NY, USA, 2023; pp. 1–2.
25. Takayanagi, I.; Yoshimura, N.; Mori, K.; Matsuo, S.; Tanaka, S.; Abe, H.; Yasuda, N.; Ishikawa, K.; Okura, S.; Ohsawa, S.; et al. An over 90 dB intra-scene single-exposure dynamic range CMOS image sensor using a 3.0 μm triple-gain pixel fabricated in a standard BSI process. Sensors 2018, 18, 203.
26. Matsubara, I.; Ujiie, R.; Morikawa, D.; Shima, H.; Okura, S. A Scalable Single-Slope ADC for a Feature-Extractable CMOS Image Sensor. In Proceedings of the 2025 International Technical Conference on Circuits/Systems, Computers, and Communications (ITC-CSCC), Seoul, Republic of Korea, 7–10 July 2025; IEEE: New York, NY, USA, 2025; pp. 1–4.
27. Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27–28 October 2019; pp. 1314–1324.
28. Banbury, C.; Njor, E.; Garavagno, A.M.; Mazumder, M.; Stewart, M.; Warden, P.; Kudlur, M.; Jeffries, N.; Fafoutis, X.; Reddi, V.J. Wake Vision: A tailored dataset and benchmark suite for TinyML computer vision applications. arXiv 2024, arXiv:2405.00892.
29. Verdant, A.; Guicquero, W.; Coriat, D.; Moritz, G.; Royer, N.; Thuries, S.; Mollard, A.; Teil, V.; Desprez, Y.; Monnot, G.; et al. A 450 μW @ 50 fps Wake-Up Module Featuring Auto-Bracketed 3-Scale Log-Corrected Pattern Recognition and Motion Detection in a 1.5 Mpix 8T Global Shutter Imager. In Proceedings of the 2024 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), Honolulu, HI, USA, 16–20 June 2024; IEEE: New York, NY, USA, 2024; pp. 1–2.
30. Omid-Zohoor, A.; Ta, D.; Murmann, B. PASCALRAW: Raw image database for object detection. Stanf. Digit. Repos. 2014, 2, 4.
| <1:0> | |||
|---|---|---|---|
| 00 | 0 | 0 | 0 |
| 01 | 1 | 0 | 0 |
| 10 | 1 | 1 | 0 |
| 11 | 1 | 1 | 1 |
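
Read row by row, the 2-bit table above is a thermometer code: output k is 1 exactly when the code is at least k. The sketch below reproduces the table and applies the same mapping to a small array of 2-bit codes to produce three binary planes. The function names are hypothetical, and treating this table as the step that yields the three binary luminance channels is an assumption consistent with, but not spelled out in, this section.

```python
from typing import List

Plane = List[List[int]]


def thermometer(code: int, bits: int = 2) -> List[int]:
    """Map an unsigned code to (2**bits - 1) binary outputs.

    Output k (1-based) is 1 when code >= k, reproducing the <1:0> table:
    00 -> 000, 01 -> 100, 10 -> 110, 11 -> 111.
    """
    levels = 1 << bits
    if not 0 <= code < levels:
        raise ValueError(f"code {code} out of range for {bits} bits")
    return [1 if code >= k else 0 for k in range(1, levels)]


def split_into_planes(codes: Plane, bits: int = 2) -> List[Plane]:
    """Turn a 2-D array of codes into one binary plane per thermometer output."""
    n_out = (1 << bits) - 1
    planes: List[Plane] = [[[0] * len(row) for row in codes] for _ in range(n_out)]
    for y, row in enumerate(codes):
        for x, code in enumerate(row):
            for k, bit in enumerate(thermometer(code, bits)):
                planes[k][y][x] = bit
    return planes


if __name__ == "__main__":
    for code in range(4):
        print(format(code, "02b"), thermometer(code))
    print(split_into_planes([[0, 3], [1, 2]]))
```

Each output plane is the code thresholded at a different level, so every plane is itself a binary image that can be transmitted and compressed independently.
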
| <2:0> | Type 1 | | Type 2 | |
|---|---|---|---|---|
| 000 | 0 | 1 | 1 | 1 |
| 001 | 0 | 0 | 1 | 1 |
| 010 | 0 | 0 | 0 | 1 |
| 011 | 0 | 0 | 0 | 0 |
| 100 | 0 | 0 | 0 | 0 |
| 101 | 1 | 0 | 0 | 0 |
| 110 | 1 | 0 | 1 | 0 |
| 111 | 1 | 1 | 1 | 0 |
| Model | Channel Width | Backbone ELAN Depth | Neck ELAN Depth | #Param. |
|---|---|---|---|---|
| YOLOv7 | 100% | 6 | 6 | 36.9 M |
| Ours | 50% | 6 | 3 | 5.9 M |
| YOLOv7-tiny | 50% | 4 | 4 | 6.2 M |
| Vref2 | Logical Operation | AP50 | APL50 |
|---|---|---|---|
| 125 mV | type 1 | 36.6% | 62.1% |
| | type 2 | 36.6% | 61.7% |
| 250 mV | type 1 | 34.5% | 59.1% |
| | type 2 | 35.6% | 60.7% |
| Data | Allocated Bits (0s Runs) | Allocated Bits (1s Runs) | Comp. Rate (Each Channel) | Comp. Rate (Average) |
|---|---|---|---|---|
| Luminance signals | 6-bit | 7-bit | 49.1% | 41.1% |
| | 7-bit | 6-bit | 47.4% | |
| | 8-bit | 6-bit | 26.8% | |
| Edge signals (Vref2 = 125 mV) | 6-bit | 2-bit | 59.9% | 51.9% |
| | 7-bit | 2-bit | 36.6% | |
| | 6-bit | 2-bit | 59.1% | |
| Edge signals (Vref2 = 250 mV) | 7-bit | 2-bit | 37.5% | 33.8% |
| | 7-bit | 2-bit | 27.0% | |
| | 7-bit | 2-bit | 36.9% | |
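
The averages in this table can be cross-checked against the compressed sizes reported for the two Vref2 settings. The sketch assumes that "Comp. rate" means compressed size divided by uncompressed size (lower is better) and that each of the six binary channels holds 320 × 240 = 76,800 bits, which matches the 460,800-bit uncompressed six-channel size; both are inferences from the tables rather than stated definitions. Under these assumptions, the mean rate over the three luminance channels and the first (second) edge-signal group reproduces the 214,281-bit (172,595-bit) compressed size for Vref2 = 125 mV (250 mV) to within the rounding of the per-channel percentages.

```python
from statistics import mean

CHANNEL_BITS = 320 * 240     # assumed QVGA binary channel: 76,800 bits
RAW_BITS = 6 * CHANNEL_BITS  # 460,800 bits for six uncompressed channels

# Per-channel compression rates (%) copied from the table.
LUMINANCE = [49.1, 47.4, 26.8]
EDGE_GROUP_1 = [59.9, 36.6, 59.1]  # first edge-signal group
EDGE_GROUP_2 = [37.5, 27.0, 36.9]  # second edge-signal group

# Compressed six-channel sizes reported for Vref2 = 125 mV and 250 mV.
REPORTED = {"125 mV": 214_281, "250 mV": 172_595}


def estimated_bits(rates_percent):
    """Estimate compressed size from per-channel rates of equal-sized channels."""
    return RAW_BITS * mean(rates_percent) / 100


if __name__ == "__main__":
    groups = [("luminance", LUMINANCE), ("edge 1", EDGE_GROUP_1), ("edge 2", EDGE_GROUP_2)]
    for name, rates in groups:
        print(f"{name}: mean rate {mean(rates):.1f}%")
    for label, edge in [("125 mV", EDGE_GROUP_1), ("250 mV", EDGE_GROUP_2)]:
        est = estimated_bits(LUMINANCE + edge)
        print(f"{label}: estimated {est:,.0f} bits vs reported {REPORTED[label]:,} bits")
```

Because the channels are assumed to be equal in size, the six-channel rate is simply the mean of the six per-channel rates, which is also the mean of the two group averages.
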
| Bit Depth | 8-Bit | 3-Bit | 1-Bit |
|---|---|---|---|
| Data size (bits) | 22,118,400 | 8,294,400 | 2,764,800 |
| AP50 | 48.1% | 45.3% | 31.6% |
| APL50 | 58.7% | 55.5% | 40.7% |
| Bit Depth | 8-Bit | 3-Bit | 1-Bit | 1-Bit + Comp. |
|---|---|---|---|---|
| Data size (bits) | 1,382,400 | 518,400 | 172,800 | 110,765 |
| AP50 | 44.9% | 39.8% | 27.6% | 27.6% |
| APL50 | 62.4% | 58.4% | 44.7% | 44.7% |
| Vref2 | 125 mV | 125 mV + Comp. | 250 mV | 250 mV + Comp. |
|---|---|---|---|---|
| Data size (bits) | 460,800 | 214,281 | 460,800 | 172,595 |
| AP50 | 36.6% | 36.6% | 35.6% | 35.6% |
| APL50 | 62.1% | 62.1% | 60.7% | 60.7% |
| Bit Depth | 8-Bit | 3-Bit | 1-Bit |
|---|---|---|---|
| Data size (bits) | 22,118,400 | 8,294,400 | 2,764,800 |
| Accuracy (VWW) | 92.8% | 92.3% | 87.9% |
| Accuracy (WV) | 86.3% | 85.5% | 80.8% |
| Feature Type | Edge Signal | Edge Signal | Edge Signal | Six-Channel Binary Feature Data | Six-Channel Binary Feature Data |
|---|---|---|---|---|---|
| Data format | 8-bit [19] | 3-bit [19] | 1-bit [20] | Vref2 = 125 mV + Comp. | Vref2 = 250 mV + Comp. |
| Data size (bits) | 1,382,400 | 518,400 | 110,765 | 214,281 | 172,595 |
| Accuracy (VWW) | 88.9% | 87.6% | 84.3% | 89.4% | 88.9% |
| Accuracy (WV) | 81.0% | 80.8% | 76.1% | 81.3% | 80.6% |
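
The trade-off between data size and accuracy in this table can be made concrete by asking which formats are Pareto-optimal, meaning that no other format is both no larger and at least as accurate, and strictly better in one of the two. The Pareto criterion and the helper names below are an illustrative framing rather than a metric reported here; the numbers are copied from the table, and the six-channel columns are labeled by their data sizes, which equal the compressed Vref2 = 125 mV and 250 mV sizes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Format:
    name: str
    bits: int   # transmitted data size in bits
    vww: float  # accuracy on VWW (%)
    wv: float   # accuracy on WV (%)


FORMATS = [
    Format("Edge 8-bit [19]", 1_382_400, 88.9, 81.0),
    Format("Edge 3-bit [19]", 518_400, 87.6, 80.8),
    Format("Edge 1-bit [20]", 110_765, 84.3, 76.1),
    Format("Six-channel, 125 mV + Comp.", 214_281, 89.4, 81.3),
    Format("Six-channel, 250 mV + Comp.", 172_595, 88.9, 80.6),
]


def dominates(a: Format, b: Format, metric: str) -> bool:
    """True if `a` is no larger and no less accurate than `b`, and better in one."""
    acc_a, acc_b = getattr(a, metric), getattr(b, metric)
    return a.bits <= b.bits and acc_a >= acc_b and (a.bits < b.bits or acc_a > acc_b)


def pareto_front(formats: List[Format], metric: str) -> List[str]:
    """Names of formats that no other format dominates on (bits, metric)."""
    return [f.name for f in formats if not any(dominates(g, f, metric) for g in formats)]


if __name__ == "__main__":
    for metric in ("vww", "wv"):
        print(metric, pareto_front(FORMATS, metric))
    base = FORMATS[0].bits
    for f in FORMATS[3:]:
        print(f"{f.name}: {base / f.bits:.2f}x smaller than 8-bit edge data")
```

Under this criterion, both six-channel settings and the 1-bit edge format lie on the front for VWW and WV, while the 8-bit and 3-bit edge formats are dominated. The six-channel data are 6.45× and 8.01× smaller than the 8-bit edge data at equal or higher VWW accuracy.
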
| | This Work | [12] | [13] | [22] | [29] |
|---|---|---|---|---|---|
| Feature extraction | Luminance and Edge (Analog) | HOG (Analog) | Conv. (Analog) | Windmill Edge (Analog) | Conv. (Digital) |
| Pixel structure | 5T | 4T | 4T1C | 4T | 8T2C |
| RGB color image | Available | Not available | Available | Available | Available |
| Validation data | COCO/VWW/WV | PASCAL VOC [9], PASCALRAW [30] | N/A | Face/Hand | Face |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Kuroda, K.; Osuka, Y.; Iegaki, R.; Ujiie, R.; Shima, H.; Yoshida, K.; Okura, S. Compression-Efficient Feature Extraction Method for a CMOS Image Sensor. Sensors 2026, 26, 962. https://doi.org/10.3390/s26030962
Chicago/Turabian Style: Kuroda, Keiichiro, Yu Osuka, Ryoya Iegaki, Ryuichi Ujiie, Hideki Shima, Kota Yoshida, and Shunsuke Okura. 2026. "Compression-Efficient Feature Extraction Method for a CMOS Image Sensor" Sensors 26, no. 3: 962. https://doi.org/10.3390/s26030962

