iDocChip: A Configurable Hardware Accelerator for an End-to-End Historical Document Image Processing
Abstract
1. Introduction
- Algorithmic optimizations for the anyOCR system are presented that improve the accuracy of historical document digitization by 3.8 percentage points (from 76.3% to 80.1%).
- A new hardware-software partitioning scheme is presented for the optimized anyOCR algorithm.
- A heterogeneous hardware-software architecture is designed and implemented based on the new partitioning scheme.
- A custom hardware accelerator based on the new architecture is realized on a Zynq-7045 FPGA.
- The novel accelerator is compared to optimized anyOCR implemented on multiple computing platforms, including low-power CPUs.
- It is demonstrated that the iDocChip system outperforms the original anyOCR software running on an i7-4790T in terms of both runtime and energy efficiency.
2. Related Works
2.1. Cross-Platform Comparisons
2.2. End-to-End OCR Systems
2.3. End-to-End OCR Hardware Architectures
3. The anyOCR Algorithm
3.1. Binarization
3.2. Text and Image Segmentation
3.3. Text Line Extraction
3.4. Text Line Recognition
4. iDocChip Background
4.1. Binarization
4.2. Text and Image Segmentation
4.3. Text Line Extraction
4.4. Text Line Recognition
4.5. The anyOCR System vs. Separate iDocChip Components
5. Algorithmic Optimizations and Hardware-Software Partitioning for the End-to-End iDocChip
5.1. Binarization
5.1.1. Binarization: G-1 Operations
5.1.2. Binarization: G-2 Operations
5.1.3. Binarization: G-3 Operations
5.1.4. Binarization: G-4 Operations
5.2. Text and Image Segmentation
5.2.1. Text and Image Segmentation: G-1 Operations
5.2.2. Text and Image Segmentation: G-2 Operations
5.2.3. Text and Image Segmentation: G-3 Operations
5.2.4. Text and Image Segmentation: G-4 Operations
5.3. Text Line Extraction
5.3.1. Text Line Extraction: G-1 Operations
5.3.2. Text Line Extraction: G-2 Operations
5.3.3. Text Line Extraction: G-3 Operations
5.3.4. Text Line Extraction: G-4 Operations
5.4. Text Line Recognition
5.4.1. Text Line Recognition: G-2 Operations
5.4.2. Text Line Recognition: G-1 Operations
6. iDocChip Hardware Architecture
6.1. Hardware Architectures of Preprocessing Operations
6.1.1. Pixel-Based Computations
6.1.2. Window-Based Computations
6.1.3. Computations with Irregular Memory Access
6.2. Hardware Architectures of Text Line Recognition
7. Experimental Setup and Results
7.1. The iDocChip Hardware Accelerator
7.2. Comparisons between the Hardware and Software Implementations
7.2.1. Software Optimizations
7.2.2. Energy Consumption
7.3. Results and Discussion
8. Conclusions
Author Contributions
Funding
Conflicts of Interest
Abbreviations
ASIC | application-specific integrated circuit |
AXI | advanced extensible interface |
Bi-LSTM | bidirectional long short-term memory |
BRAM | block random-access memory |
CC | connected component |
CCA | connected component analysis |
CCL | connected component labeling |
CER | character error rate |
CMOS | complementary metal oxide semiconductor |
CNN | convolutional neural network |
CPU | central processing unit |
CTC | connectionist temporal classification |
DIBCO | Document Image Binarization Competition |
DMA | direct memory access |
DRAM | dynamic random-access memory |
EDT | Euclidean distance transform |
ELM | extreme learning machine |
FPGA | field-programmable gate array |
FPS | frames per second |
GPIO | general-purpose input/output |
GPU | graphics processing unit |
IP | intellectual property |
LSTM | long short-term memory |
MD-LSTM | multidimensional long short-term memory |
OCR | optical character recognition |
PBB | percentile-based binarization |
PL | programmable logic |
PS | processing system |
SE | structuring element |
SIPO | serial-in parallel-out |
SoC | system-on-chip |
SVM | support-vector machine |
TDP | thermal design power |
References
- PenPower. Available online: http://www.penpowerinc.com (accessed on 28 July 2021).
- Scanning Pens. Available online: https://www.scanningpens.com/ (accessed on 28 July 2021).
- Scanmarker. Available online: https://scanmarker.com/ (accessed on 28 July 2021).
- Ectaco C-Pen. Available online: https://www.ectaco.com/cpen-30/ (accessed on 28 July 2021).
- IRISPen. Available online: https://www.irislink.com/EN-US/c1870/Compare-IRIS-digital-pens.aspx (accessed on 28 July 2021).
- C-PEN. Available online: https://cpen.com/ (accessed on 28 July 2021).
- Google Cloud Vision OCR. Available online: https://cloud.google.com/vision/docs/ocr (accessed on 28 July 2021).
- Microsoft Computer Vision. Available online: https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/ (accessed on 28 July 2021).
- ABBYY Cloud OCR. Available online: https://www.abbyy.com/cloud-ocr-sdk/ (accessed on 28 July 2021).
- CloudOCR. Available online: https://cloudocr.com/ (accessed on 28 July 2021).
- Forbes-FPGA Chip on iPhone 7. Available online: https://www.forbes.com/sites/aarontilley/2016/10/17/iphone-7-fpga-chip-artificial-intelligence/?sh=6fbb634d3c69 (accessed on 28 July 2021).
- Vuzix Glass OCR. Available online: https://www.vuzix.com/appstore/app/glass-ocr-for-m300 (accessed on 28 July 2021).
- ORCAM OCR Device to Wear on Glasses. Available online: https://www.orcam.com/en/media/life-changing-optical-character-recognition-glasses/ (accessed on 28 July 2021).
- Envision Glasses. Available online: https://www.letsenvision.com/envision-glasses (accessed on 28 July 2021).
- eSight. Available online: https://esighteyewear.com/ (accessed on 28 July 2021).
- ABBYY. Available online: https://www.abbyy.com/en-eu/ (accessed on 28 July 2021).
- Omnipage. Available online: https://www.kofax.com/Products/omnipage?source=nuance (accessed on 28 July 2021).
- OCRopus. Available online: https://github.com/ocropus/ocropy (accessed on 28 July 2021).
- Tesseract. Available online: https://github.com/tesseract-ocr (accessed on 28 July 2021).
- Bukhari, S.S.; Kadi, A.; Jouneh, M.A.; Mir, F.M.; Dengel, A. anyOCR: An Open-Source OCR System for Historical Archives. In Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan, 9–15 November 2017; IEEE: Piscataway, NJ, USA, 2017; Volume 1, pp. 305–310.
- Narragonien-Digital. Available online: http://www.narragonien-digital.de/exist/home.html (accessed on 28 July 2021).
- Kallimachos. Available online: http://kallimachos.de/kallimachos/index.php/Projektbeschreibung (accessed on 28 July 2021).
- German Research Centre for Artificial Intelligence (DFKI). Available online: https://www.dfki.de/web/news/detail/News/any-ocr/ (accessed on 28 July 2021).
- University of Würzburg. Available online: https://www.uni-wuerzburg.de/aktuelles/einblick/single/news/narrenschi/ (accessed on 28 July 2021).
- Narrenschiff. Available online: http://kallimachos.de/kallimachos/index.php/Narragonien (accessed on 28 July 2021).
- Rybalkin, V.; Bukhari, S.S.; Ghaffar, M.M.; Ghafoor, A.; Wehn, N.; Dengel, A. iDocChip: A Configurable Hardware Architecture for Historical Document Image Processing: Percentile Based Binarization. In Proceedings of the ACM Symposium on Document Engineering 2018, Halifax, NS, Canada, 28–31 August 2018; ACM: New York, NY, USA, 2018; p. 24.
- Tekleyohannes, M.K.; Rybalkin, V.; Ghaffar, M.M.; Varela, J.A.; Wehn, N.; Dengel, A. iDocChip: A Configurable Hardware Architecture for Historical Document Image Processing. Int. J. Parallel Program. 2021, 49, 253–284.
- Tekleyohannes, M.K.; Rybalkin, V.; Ghaffar, M.M.; Wehn, N.; Dengel, A. iDocChip-A Configurable Hardware Architecture for Historical Document Image Processing: Text Line Extraction. In Proceedings of the 2019 International Conference on ReConFigurable Computing and FPGAs (ReConFig), Cancun, Mexico, 9–11 December 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–8.
- Rybalkin, V.; Wehn, N.; Yousefi, M.R.; Stricker, D. Hardware architecture of bidirectional long short-term memory neural network for optical character recognition. In Proceedings of the Conference on Design, Automation & Test in Europe, Lausanne, Switzerland, 27–31 March 2017; European Design and Automation Association: Leuven, Belgium, 2017; pp. 1394–1399.
- Tekleyohannes, M.K.; Rybalkin, V.; Bukhari, S.S.; Ghaffar, M.M.; Varela, J.A.; Wehn, N.; Dengel, A. iDocChip—A Configurable Hardware Architecture for Historical Document Image Processing: Multiresolution Morphology-based Text and Image Segmentation. In Proceedings of the 6th International Embedded Systems Symposium (IESS), Friedrichshafen, Germany, 9–11 September 2019.
- Brugger, C.; Dal’Aqua, L.; Varela, J.A.; De Schryver, C.; Sadri, M.; Wehn, N.; Klein, M.; Siegrist, M. A quantitative cross-architecture study of morphological image processing on CPUs, GPUs, and FPGAs. In Proceedings of the 2015 IEEE Symposium on Computer Applications & Industrial Electronics (ISCAIE), Langkawi, Malaysia, 12–14 April 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 201–206.
- Qasaimeh, M.; Denolf, K.; Lo, J.; Vissers, K.; Zambreno, J.; Jones, P.H. Comparing Energy Efficiency of CPU, GPU and FPGA Implementations for Vision Kernels. In Proceedings of the 2019 IEEE International Conference on Embedded Software and Systems (ICESS), Las Vegas, NV, USA, 2–3 June 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–8.
- Page, A.; Mohsenin, T. An efficient & reconfigurable FPGA and ASIC implementation of a spectral Doppler ultrasound imaging system. In Proceedings of the 2013 IEEE 24th International Conference on Application-Specific Systems, Architectures and Processors, Washington, DC, USA, 5–7 June 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 198–202.
- Jiang, S.; He, D.; Yang, C.; Xu, C.; Luo, G.; Chen, Y.; Liu, Y.; Jiang, J. Accelerating mobile applications at the network edge with software-programmable fpgas. In Proceedings of the IEEE INFOCOM 2018-IEEE Conference on Computer Communications, Honolulu, HI, USA, 15–19 April 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 55–62.
- Bonamy, R.; Bilavarn, S.; Muller, F.; Duhem, F.; Heywood, S.; Millet, P.; Lemonnier, F. Energy efficient mapping on manycore with dynamic and partial reconfiguration: Application to a smart camera. Int. J. Circuit Theory Appl. 2018, 46, 1648–1662.
- Xilinx, Inc. Zynq®-7000 All Programmable SoC. Available online: https://www.xilinx.com/products/silicon-devices/soc/zynq-7000.html (accessed on 27 June 2021).
- Baidu’s Apollo Driverless Platform. Available online: https://www.electronicdesign.com/markets/automotive/article/21119589/xilinx-soc-fpga-powers-baidus-apollo-driverless-platform (accessed on 28 July 2021).
- Topic Embedded Systems. Available online: https://topic.nl/en/products (accessed on 28 July 2021).
- AXIOM Beta: A Professional Digital Cinema Camera. Available online: https://apertus.org/axiom (accessed on 28 July 2021).
- Ishikawa, S.N.; Takahashi, T.; Watanabe, S.; Narukage, N.; Miyazaki, S.; Orita, T.; Takeda, S.; Nomachi, M.; Fujishiro, I.; Hodoshima, F. High-speed X-ray imaging spectroscopy system with Zynq SoC for solar observations. Nucl. Instrum. Methods Phys. Res. Sect. A Accel. Spectrom. Detect. Assoc. Equip. 2018, 912, 191–194.
- Mata-Carballeira, Ó.; Gutiérrez-Zaballa, J.; del Campo, I.; Martínez, V. An FPGA-Based Neuro-Fuzzy Sensor for Personalized Driving Assistance. Sensors 2019, 19, 4011.
- Guo, K.; Sui, L.; Qiu, J.; Yu, J.; Wang, J.; Yao, S.; Han, S.; Wang, Y.; Yang, H. Angel-Eye: A complete design flow for mapping CNN onto embedded FPGA. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2017, 37, 35–47.
- Afroge, S.; Ahmed, B.; Mahmud, F. Optical character recognition using back propagation neural network. In Proceedings of the 2016 2nd International Conference on Electrical, Computer & Telecommunication Engineering (ICECTE), Rajshahi, Bangladesh, 8–10 December 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–4.
- Wei, T.C.; Sheikh, U.; Ab Rahman, A.A.H. Improved optical character recognition with deep neural network. In Proceedings of the 2018 IEEE 14th International Colloquium on Signal Processing & Its Applications (CSPA), Parkroyal, Malaysia, 9–10 March 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 245–249.
- Nasien, D.; Haron, H.; Yuhaniz, S.S. Support Vector Machine (SVM) for English handwritten character recognition. In Proceedings of the 2010 Second International Conference on Computer Engineering and Applications, Bali Island, Indonesia, 19–21 March 2010; IEEE: Piscataway, NJ, USA, 2010; Volume 1, pp. 249–252.
- Lavanya, K.; Bajaj, S.; Tank, P.; Jain, S. Handwritten digit recognition using hoeffding tree, decision tree and random forests—A comparative approach. In Proceedings of the 2017 International Conference on Computational Intelligence in Data Science (ICCIDS), Chennai, India, 2–3 June 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–6.
- Ilmi, N.; Budi, W.T.A.; Nur, R.K. Handwriting digit recognition using local binary pattern variance and K-Nearest Neighbor classification. In Proceedings of the 2016 4th International Conference on Information and Communication Technology (ICoICT), Shanghai, China, 22–23 December 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–5.
- Sampath, A.; Gomathi, N. Decision tree and deep learning based probabilistic model for character recognition. J. Cent. South Univ. 2017, 24, 2862–2876.
- Younis, K.S.; Alkhateeb, A.A. A new implementation of deep neural networks for optical character recognition and face recognition. In Proceedings of the New Trends in Information Technology, Amman, Jordan, 25–27 April 2017; pp. 157–162.
- Srivastava, S.; Priyadarshini, J.; Gopal, S.; Gupta, S.; Dayal, H.S. Optical character recognition on bank cheques using 2D convolution neural network. In Applications of Artificial Intelligence Techniques in Engineering; Springer: Berlin/Heidelberg, Germany, 2019; pp. 589–596.
- Das, T.; Tripathy, A.K.; Mishra, A.K. Optical character recognition using artificial neural network. In Proceedings of the 2017 International Conference on Computer Communication and Informatics (ICCCI), Oxford, UK, 26–28 July 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–4.
- Moysset, B.; Kermorvant, C.; Wolf, C.; Louradour, J. Paragraph text segmentation into lines with recurrent neural networks. In Proceedings of the 2015 13th International Conference on Document Analysis and Recognition (ICDAR), Tunis, Tunisia, 23–26 August 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 456–460.
- Murdock, M.; Reid, S.; Hamilton, B.; Reese, J. ICDAR 2015 competition on text line detection in historical documents. In Proceedings of the 2015 13th International Conference on Document Analysis and Recognition (ICDAR), Tunis, Tunisia, 23–26 August 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 1171–1175.
- Kundu, S.; Paul, S.; Bera, S.K.; Abraham, A.; Sarkar, R. Text-line extraction from handwritten document images using GAN. Expert Syst. Appl. 2020, 140, 112916.
- Breuel, T.M.; Ul-Hasan, A.; Al-Azawi, M.A.; Shafait, F. High-performance OCR for printed English and Fraktur using LSTM networks. In Proceedings of the 2013 12th International Conference on Document Analysis and Recognition, Washington, DC, USA, 25–28 August 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 683–687.
- Singh, B.M.; Sharma, R.; Mittal, A.; Ghosh, D. Parallel implementation of Souvola’s binarization approach on GPU. Int. J. Comput. Appl. 2011, 32, 28–33.
- Chen, X.; Lin, L.; Gao, Y. Parallel nonparametric binarization for degraded document images. Neurocomputing 2016, 189, 43–52.
- Singh, B.M.; Sharma, R.; Mittal, A.; Ghosh, D. Parallel implementation of Otsu’s binarization approach on GPU. Int. J. Comput. Appl. 2011, 32, 16–21.
- Soua, M.; Kachouri, R.; Akil, M. GPU parallel implementation of the new hybrid binarization based on Kmeans method (HBK). J. Real-Time Image Process. 2018, 14, 363–377.
- Westphal, F.; Grahn, H.; Lavesson, N. Efficient document image binarization using heterogeneous computing and parameter tuning. Int. J. Doc. Anal. Recognit. (IJDAR) 2018, 21, 41–58.
- Sultana, A.; Meenakshi, M. Design and development of fpga based adaptive thresholder for image processing applications. In Proceedings of the 2011 IEEE Recent Advances in Intelligent Computational Systems, Trivandrum, India, 22–24 September 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 633–637.
- Rybalkin, V.; Wehn, N. When Massive GPU Parallelism Ain’t Enough: A Novel Hardware Architecture of 2D-LSTM Neural Network. In Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Seaside, CA, USA, 23–25 February 2020; pp. 111–121.
- Kumar, A.; Rastogi, P.; Srivastava, P. Design and FPGA Implementation of DWT, Image Text Extraction Technique. Procedia Comput. Sci. 2015, 57, 1015–1025.
- Bai, X.; Shi, B.; Zhang, C.; Cai, X.; Qi, L. Text/non-text image classification in the wild with convolutional neural networks. Pattern Recognit. 2017, 66, 437–446.
- Vignesh, O.; Mangalam, H.; Gayathri, S. FPGA architecture for text extraction from images. Clust. Comput. 2019, 22, 12137–12146.
- Sanni, K.; Garreau, G.; Molin, J.L.; Andreou, A.G. FPGA implementation of a Deep Belief Network architecture for character recognition using stochastic computation. In Proceedings of the 2015 49th Annual Conference on Information Sciences and Systems (CISS), Baltimore, MD, USA, 18–20 March 2015; pp. 1–5.
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
- Zho, H.; Zhu, G.; Peng, Y. A RMB optical character recognition system using FPGA. In Proceedings of the 2016 IEEE International Conference on Signal and Image Processing (ICSIP), Beijing, China, 13–15 August 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 539–542.
- De Oliveira Junior, L.A.; Barros, E. An fpga-based hardware accelerator for scene text character recognition. In Proceedings of the 2018 IFIP/IEEE International Conference on Very Large Scale Integration (VLSI-SoC), Verona, Italy, 8–10 October 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 125–130.
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In International Conference On Medical Image Computing and Computer-Assisted Intervention; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
- Pratikakis, I.; Zagoris, K.; Barlas, G.; Gatos, B. ICDAR2017 competition on document image binarization (DIBCO 2017). In Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan, 9–15 November 2017; IEEE: Piscataway, NJ, USA, 2017; Volume 1, pp. 1395–1403.
- Bezmaternykh, P.V.; Ilin, D.A.; Nikolaev, D.P. U-Net-bin: Hacking the document image binarization contest. Comput. Opt. 2019, 43, 825–832.
- Karpinski, R.; Belaïd, A. Combination of Two Fully Convolutional Neural Networks for Robust Binarization. In Asian Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2018; pp. 509–524.
- Huang, X.; Li, L.; Liu, R.; Xu, C.; Ye, M. Binarization of degraded document images with global-local U-Nets. Optik 2020, 203, 164025.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
- Wagner, R.A.; Fischer, M.J. The string-to-string correction problem. J. ACM (JACM) 1974, 21, 168–173.
- Bailey, D.G.; Johnston, C.T. Single pass connected components analysis. In Proceedings of the Image and Vision Computing, Hamilton, New Zealand, 5–7 December 2007; pp. 282–287.
- Bailey, D.G. Design for Embedded Image Processing on FPGAs; John Wiley & Sons: Hoboken, NJ, USA, 2011.
- Ma, N.; Bailey, D.G.; Johnston, C.T. Optimised single pass connected components analysis. In Proceedings of the 2008 International Conference on Field-Programmable Technology, Taipei, Taiwan, 7–10 December 2008; IEEE: Piscataway, NJ, USA, 2008; pp. 185–192.
- Klaiber, M.J. A Parallel and Resource-Efficient Single Lookup Connected Components Analysis Architecture for Reconfigurable Hardware. Ph.D. Thesis, Universität Stuttgart, Stuttgart, Germany, 2016.
- Spagnolo, F.; Perri, S.; Corsonello, P. An efficient hardware-oriented single-pass approach for connected component analysis. Sensors 2019, 19, 3055.
- Tekleyohannes, M.; Sadri, M.; Weis, C.; Wehn, N.; Klein, M.; Siegrist, M. An advanced embedded architecture for connected component analysis in industrial applications. In Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE), Lausanne, Switzerland, 27–31 March 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 734–735.
- Tekleyohannes, M.K.; Weis, C.; Wehn, N.; Klein, M.; Siegrist, M. A Reconfigurable Accelerator for Morphological Operations. In Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Vancouver, BC, Canada, 21–25 May 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 186–193.
- Multi-Dimensional Image Processing (Scipy.Ndimage). Available online: https://docs.scipy.org/doc/scipy-0.14.0/reference/ndimage.html (accessed on 27 June 2021).
Binarization Method | Model Size [Mparams] | “Narrenschiff” Character-Level Accuracy [%] |
---|---|---|
State-of-the-art U-Net [74] | 93.1 | 75.05 |
Low-complexity U-Net [73] | 0.76 | 74.73 |
Hand-tuned percentile-based | - | 76.30 |
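The idea behind the percentile-based row above can be illustrated with a minimal sketch. It assumes a simplified, global variant: the dark and bright reference levels are taken as two percentiles of all pixel intensities, the image is stretched between them, and the result is thresholded. The function names `percentile` and `binarize`, the percentile values, and the threshold are hypothetical choices for illustration; the anyOCR binarization itself involves further steps (for example skew handling and percentile filtering) that are not reproduced here.

```python
def percentile(values, q):
    """Return the q-th percentile (0-100) of values, picking an element of the sorted list."""
    ordered = sorted(values)
    idx = int(q / 100 * (len(ordered) - 1))
    return ordered[idx]


def binarize(image, lo_pct=5, hi_pct=90, threshold=0.5):
    """Binarize a grayscale image given as a list of rows of intensities.

    Assumption: lo_pct, hi_pct and threshold are illustrative defaults,
    not the hand-tuned values used in the paper.
    Returns 1 for bright (paper) pixels and 0 for dark (ink) pixels.
    """
    flat = [p for row in image for p in row]
    lo = percentile(flat, lo_pct)   # estimate of the ink level
    hi = percentile(flat, hi_pct)   # estimate of the paper level
    span = max(hi - lo, 1e-9)       # avoid division by zero on flat images
    result = []
    for row in image:
        out_row = []
        for p in row:
            # Stretch to [0, 1] and clip values outside the percentile range.
            normalized = min(max((p - lo) / span, 0.0), 1.0)
            out_row.append(1 if normalized > threshold else 0)
        result.append(out_row)
    return result


if __name__ == "__main__":
    page = [[200, 210, 30], [205, 40, 220]]
    print(binarize(page))
```

Because the two reference levels come from the image's own intensity distribution, the same threshold can be reused across pages with different overall brightness, which is the main appeal of percentile-based normalization.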
Modification Type | Modified Operations | Accuracy |
---|---|---|
Original anyOCR | - | 76.30% |
BIN-1 | After relocating skew angle computation | 76.42% |
BIN-2 | After relocating image rotation operation | 76.54% |
BIN-3 | After relocating high- and low-score calculations | 76.51% |
BIN-4 | After replacing spline interpolations with nearest-neighbor interpolation | 76.65% |
BIN-5 | After increasing zoom value | 76.34% |
BIN-6 | After modifying percentile filter operations | 76.22% |
TISEG-1 | After using alternate hole-fill operation instead of morphological reconstruction | 76.24% |
TLEXT-1 | After replacing find-maximum-pixel computations with constant thresholds | 75.23% |
TLEXT-2 | After replacing the topological sort algorithm with a quicksort operation | 79.13% |
TLEXT-3 | After modifying the propagate label algorithm | 79.24% |
TLEXT-4 | After adding a reduction operation and an expansion operation | 80.10% |
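The arithmetic behind the table above can be made explicit with a short sketch. It assumes the rows are cumulative, meaning each accuracy was measured with all earlier modifications still applied; the table does not state this explicitly. The names `BASELINE`, `STEPS`, `step_deltas`, and `cumulative_gain` are introduced here for illustration only.

```python
# Accuracies copied from the table; BASELINE is the original anyOCR accuracy.
BASELINE = 76.30
STEPS = [
    ("BIN-1", 76.42), ("BIN-2", 76.54), ("BIN-3", 76.51), ("BIN-4", 76.65),
    ("BIN-5", 76.34), ("BIN-6", 76.22), ("TISEG-1", 76.24), ("TLEXT-1", 75.23),
    ("TLEXT-2", 79.13), ("TLEXT-3", 79.24), ("TLEXT-4", 80.10),
]


def step_deltas(baseline, steps):
    """Return (name, change in percentage points) for each step.

    Assumption: the steps are cumulative, so each change is measured
    against the row directly above it.
    """
    deltas = []
    previous = baseline
    for name, accuracy in steps:
        deltas.append((name, round(accuracy - previous, 2)))
        previous = accuracy
    return deltas


def cumulative_gain(baseline, steps):
    """Return the overall change from the baseline to the last step."""
    return round(steps[-1][1] - baseline, 2)


if __name__ == "__main__":
    for name, delta in step_deltas(BASELINE, STEPS):
        print(f"{name}: {delta:+.2f} pp")
    print(f"total: {cumulative_gain(BASELINE, STEPS):+.2f} pp")
```

Read this way, several modifications trade a small amount of accuracy for a more hardware-friendly operation, and the overall gain of 3.8 percentage points comes mostly from the text line extraction changes.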
OCR System | High-Resolution Images, Accuracy [%] | Low-Resolution Images, Accuracy [%] |
---|---|---|
Cloud Vision OCR, Google | 76.32 | 76.39 |
iDocChip OCR | 80.10 | 79.82 |
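As a rough illustration of how a character-level accuracy like the figures above can be obtained, the sketch below computes the edit distance between a ground-truth line and a recognized line with the Wagner-Fischer dynamic program, then converts it to a percentage. The definition accuracy = 100 × (1 − CER), with CER normalized by the ground-truth length, is an assumption made for this example, and the names `edit_distance` and `character_accuracy` are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance via the Wagner-Fischer dynamic program (two rows of memory)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (0 if ref_char == hyp_char else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_accuracy(reference, hypothesis):
    """Return accuracy in percent, assuming accuracy = 100 * (1 - CER).

    CER is taken as edit distance divided by the ground-truth length;
    this normalization is an assumption, not a definition from the paper.
    """
    if not reference:
        raise ValueError("reference text must not be empty")
    cer = edit_distance(reference, hypothesis) / len(reference)
    return 100.0 * (1.0 - cer)


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(round(character_accuracy("narrenschiff", "narrenschif"), 2))
```

With this normalization the value can drop below zero when the recognized text contains many spurious insertions, so some evaluations clip the result at 0.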
Pipeline | LUT | FF | BRAM 36 Kb | DSP |
---|---|---|---|---|
Total of previous works | 109,701 (51%) | 101,179 (24%) | 248 (46%) | 99 (11%) |
End-to-end OCR | 201,895 (93%) | 323,067 (74%) | 512 (94%) | 129 (15%) |
Available | 218,600 | 437,200 | 545 | 900 |
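The percentages in the resource table can be reproduced from the raw counts. The sketch below assumes the percentages are rounded up to the next integer, which is inferred from the numbers rather than stated in the table; the dictionary and function names are hypothetical. Integer arithmetic is used so that exact ratios such as 99/900 are not pushed up by floating-point error.

```python
# Resource counts copied from the table.
AVAILABLE = {"LUT": 218600, "FF": 437200, "BRAM36": 545, "DSP": 900}
PREVIOUS_WORKS = {"LUT": 109701, "FF": 101179, "BRAM36": 248, "DSP": 99}
END_TO_END = {"LUT": 201895, "FF": 323067, "BRAM36": 512, "DSP": 129}


def utilization_percent(used, available):
    """Return used/available in percent, rounded up (assumed rounding rule)."""
    return -(-used * 100 // available)  # integer ceiling division


def utilization_table(design, available=AVAILABLE):
    """Map each resource type to its rounded-up utilization percentage."""
    return {name: utilization_percent(count, available[name])
            for name, count in design.items()}


if __name__ == "__main__":
    print("previous works:", utilization_table(PREVIOUS_WORKS))
    print("end-to-end OCR:", utilization_table(END_TO_END))
```

Under this rounding rule the computed values match every percentage listed in the table, with LUT and BRAM utilization of the end-to-end design both above 90%.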
Platform | Num. Cores | Threads per Core | Total Threads | Freq. [GHz] | Tested on: Python Baseline | Tested on: Python Optimized | Tested on: C++ (ST) | Tested on: C++ (MT) |
---|---|---|---|---|---|---|---|---|
i7 4790T | 4 | 2 | 8 | 2.7 | ✓ | ✓ | ✓ | ✓ |
Atom C2758 | 8 | 1 | 8 | 2.4 | ✓ | ✓ | ✓ | |
Cortex A53 | 4 | 1 | 4 | 1.5 | ✓ | ✓ | ✓ | |
Cortex A9 | 2 | 1 | 2 | 0.8 | ✓ | ✓ | ✓ | |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Tekleyohannes, M.K.; Rybalkin, V.; Ghaffar, M.M.; Varela, J.A.; Wehn, N.; Dengel, A. iDocChip: A Configurable Hardware Accelerator for an End-to-End Historical Document Image Processing. J. Imaging 2021, 7, 175. https://doi.org/10.3390/jimaging7090175