A Review of Document Image Enhancement Based on Document Degradation Problem
Abstract
1. Introduction
2. Document Image Degradation Problems
2.1. Background Texture
2.2. Page Smudging
2.3. Handwriting Fading
2.4. Poor Lighting Conditions
2.5. Watermark Removal
2.6. Deblur
3. Document Image Enhancement Methods
3.1. Background Texture Problem
3.2. Page Smudging Problem
3.3. Handwriting Fading Problem
3.4. The Problem of Poor Lighting Conditions
3.5. Watermark Removal Problem
3.6. Deblur Problem
4. Datasets and Metrics
4.1. Datasets
1. Jung’s dataset [44]
2. Real Document Shadow Removal Dataset (RDSRD) [17]
3. Blurry document images (BMVC Text) dataset [55]
4. Bickley diary [67]
5. SMADI (Synchromedia Multispectral Ancient Document Images Dataset) [68]
6. DIBCO and H-DIBCO
7. DocImgEN
8. DocImgCN
4.2. Metrics
- F-Measure (FM) [71]: The harmonic mean of precision and recall, a common evaluation criterion in the field of information retrieval (IR) that is often used to evaluate classification models. It is computed as FM = (2 × Precision × Recall) / (Precision + Recall).
- Distance Reciprocal Distortion (DRD) [71]: Measures the visual distortion of a binarized document image. It is computed as DRD = (Σ_k DRD_k) / NUBN, where DRD_k is the distortion of the k-th flipped pixel, weighted by the reciprocal of its distance to the disagreeing pixels in its 5 × 5 ground-truth neighborhood, and NUBN is the number of non-uniform 8 × 8 blocks in the ground-truth image.
- Pseudo-F-Measure: Introduced in Ref. [81], it replaces recall and precision with pseudo-recall and pseudo-precision, which weight errors by their distance to the character boundaries in the ground-truth (GT) image. Pseudo-recall additionally accounts for the local stroke width in the output image, while pseudo-precision extends its evaluation area to the stroke width of the connected components in the GT image.
- Peak signal-to-noise ratio (PSNR) [82]: The ratio between the maximum possible power of a signal and the power of the corrupting noise that affects the fidelity of its representation. Since many signals have a wide dynamic range, PSNR is usually expressed on a logarithmic decibel (dB) scale: PSNR = 10 · log10(MAX² / MSE), where MAX is the maximum possible pixel value and MSE is the mean squared error. In image processing, PSNR is mainly used to quantify the reconstruction quality of images and videos affected by lossy compression.
- Character error rate (CER) [83]: Computed from the Levenshtein distance, i.e., the minimum number of character-level operations required to transform the ground-truth (reference) text into the OCR output text: CER = (S + D + I) / N, where S, D, and I are the numbers of character substitutions, deletions, and insertions, and N is the number of characters in the reference text.
- Word error rate (WER) [84]: To align the recognized word sequence with the reference word sequence, certain words must be substituted, deleted, or inserted; the total number of these substituted, deleted, or inserted words, divided by the total number of words in the reference sequence, is the WER: WER = (S + D + I) / N, with S, D, I, and N counted at the word level.
- Structural similarity (SSIM) [85]: The structural similarity index defines structural information, from the perspective of image composition, as an attribute that reflects the structure of objects in a scene independently of luminance and contrast, and it models distortion as a combination of three factors: luminance, contrast, and structure. Given a patch x from one image and the corresponding patch y from another, SSIM(x, y) = ((2 μ_x μ_y + c_1)(2 σ_xy + c_2)) / ((μ_x² + μ_y² + c_1)(σ_x² + σ_y² + c_2)), where μ, σ², and σ_xy denote the patch means, variances, and covariance, and c_1, c_2 are small stabilizing constants.
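As a concrete illustration, the following minimal NumPy sketch implements three of these metrics: FM on boolean foreground masks, PSNR, and Levenshtein-based CER. It is an unofficial sketch, not the DIBCO evaluation tool, and the function names are our own.

```python
import numpy as np

def f_measure(pred, gt):
    """FM = harmonic mean of precision and recall; pred/gt are boolean
    masks where True marks foreground (text) pixels."""
    tp = np.logical_and(pred, gt).sum()
    precision = tp / pred.sum() if pred.sum() else 0.0
    recall = tp / gt.sum() if gt.sum() else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def psnr(img, ref, max_val=255.0):
    """PSNR in dB: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((np.asarray(img, float) - np.asarray(ref, float)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(max_val ** 2 / mse)

def cer(reference, hypothesis):
    """CER = Levenshtein distance / number of reference characters."""
    m, n = len(reference), len(hypothesis)
    d = np.zeros((m + 1, n + 1), dtype=int)
    d[:, 0], d[0, :] = np.arange(m + 1), np.arange(n + 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = reference[i - 1] != hypothesis[j - 1]
            d[i, j] = min(d[i - 1, j] + 1,         # deletion
                          d[i, j - 1] + 1,         # insertion
                          d[i - 1, j - 1] + cost)  # substitution or match
    return d[m, n] / m
```

DRD, pseudo-F-Measure, WER, and SSIM follow the same pattern but need the extra machinery described above (distance weights, stroke-width estimation, word alignment, windowed statistics) and are omitted for brevity.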
4.3. Experiment
Degradation | Dataset | Method | FM | Fps | PSNR | DRD |
---|---|---|---|---|---|---|
background texture | DIBCO and H-DIBCO | Otsu | 74.22 | 76.99 | 14.54 | 30.36 |
 | | Niblack | 41.12 | 41.57 | 6.67 | 91.23 |
 | | Sauvola | 79.12 | 82.95 | 16.07 | 8.61 |
 | | Bezmaternykh’s UNet | 89.29 | 90.53 | 21.32 | 3.29 |
 | | FD-Net | 95.25 | 96.65 | 22.84 | 1.22 |
page smudge | | Vo’s DSN | 88.04 | 90.81 | 18.94 | 4.47 |
 | | Bhowmik’s GiB | 83.16 | 87.72 | 16.72 | 8.82 |
 | | Gallego’s SAE | 79.22 | 81.12 | 16.09 | 9.75 |
 | | Zhao’s cGAN | 87.45 | 88.87 | 18.81 | 5.56 |
 | | Peng’s woConvCRF | 86.09 | 87.40 | 18.99 | 4.83 |
handwriting fading | H-DIBCO 2018 | Bhunia [35] | 59.25 | 59.18 | 11.80 | 9.56 |
 | | Xiong [36] | 88.34 | 90.37 | 19.11 | 4.93 |
 | | DP-LinkNet | 95.99 | 96.85 | 22.71 | 1.09 |
 | | Suh [40] | 84.95 | 91.58 | 17.04 | 16.86 |
 | | DocEnTr | 90.59 | 93.97 | 19.46 | 3.35 |
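The two classical baselines at the top of the table, Otsu [19] (global threshold) and Sauvola [20] (local threshold), can be sketched in a few lines of NumPy. This is an illustrative re-implementation, not the evaluation code behind the table; the window size and the parameters k and R are common defaults, not values tuned for DIBCO.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's global threshold: maximize the between-class variance of the
    foreground/background split of the gray-level histogram."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    total, sum_all = gray.size, float(np.dot(np.arange(256), hist))
    w0, sum0, best_t, best_var = 0, 0.0, 0, -1.0
    for t in range(256):
        w0 += hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def sauvola_binarize(gray, window=15, k=0.2, R=128.0):
    """Sauvola's local threshold T = m * (1 + k * (s / R - 1)), where m and s
    are the mean and standard deviation of the window around each pixel."""
    g = np.asarray(gray, dtype=np.float64)
    pad = window // 2
    gp = np.pad(g, pad, mode="reflect")
    # Integral images (with a leading zero row/column) for fast window sums.
    ii = np.pad(np.cumsum(np.cumsum(gp, axis=0), axis=1), ((1, 0), (1, 0)))
    ii2 = np.pad(np.cumsum(np.cumsum(gp ** 2, axis=0), axis=1), ((1, 0), (1, 0)))
    h, w = g.shape
    y0 = np.arange(h)[:, None]
    x0 = np.arange(w)[None, :]
    y1, x1 = y0 + window, x0 + window
    area = float(window * window)
    m = (ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]) / area
    sq = (ii2[y1, x1] - ii2[y0, x1] - ii2[y1, x0] + ii2[y0, x0]) / area
    s = np.sqrt(np.maximum(sq - m ** 2, 0.0))
    t = m * (1.0 + k * (s / R - 1.0))
    return np.where(g > t, 255, 0).astype(np.uint8)  # background=255, text=0
```

The gap between these histogram/statistics-based baselines and the learned models in the table (UNet, FD-Net, DP-LinkNet, DocEnTr) reflects exactly the degradations this survey covers: a single global or window-local threshold cannot separate faint strokes from textured backgrounds or bleed-through.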
4.4. Problems and Development Direction
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Su, B.; Lu, S.; Tan, C.L. Robust document image binarization technique for degraded document images. IEEE Trans. Image Process. 2012, 22, 1408–1417.
- Sulaiman, A.; Omar, K.; Nasrudin, M.F. Degraded historical document binarization: A review on issues, challenges, techniques, and future directions. J. Imaging 2019, 5, 48.
- Chen, X.; He, X.; Yang, J.; Wu, Q. An effective document image deblurring algorithm. In Proceedings of the CVPR 2011, Washington, DC, USA, 20–25 June 2011; pp. 369–376.
- Kligler, N.; Katz, S.; Tal, A. Document enhancement using visibility detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 19–23 June 2018; pp. 2374–2382.
- Mesquita, R.G.; Mello, C.A.; Almeida, L. A new thresholding algorithm for document images based on the perception of objects by distance. Integr. Comput.-Aided Eng. 2014, 21, 133–146.
- Hedjam, R.; Cheriet, M. Historical document image restoration using multispectral imaging system. Pattern Recognit. 2013, 46, 2297–2312.
- Lu, D.; Huang, X.; Sui, L. Binarization of degraded document images based on contrast enhancement. Int. J. Doc. Anal. Recognit. 2018, 21, 123–135.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Anvari, Z.; Athitsos, V. A pipeline for automated face dataset creation from unlabeled images. In Proceedings of the 12th ACM International Conference on PErvasive Technologies Related to Assistive Environments, New York, NY, USA, 5–7 June 2019; pp. 227–235.
- Lin, W.-A.; Chen, J.-C.; Castillo, C.D.; Chellappa, R. Deep density clustering of unconstrained faces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 19–23 June 2018; pp. 8128–8137.
- Schroff, F.; Kalenichenko, D.; Philbin, J. FaceNet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 815–823.
- Gu, S.; Zuo, W.; Guo, S.; Chen, Y.; Chen, C.; Zhang, L. Learning dynamic guidance for depth image enhancement. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3769–3778.
- Guo, C.; Li, C.; Guo, J.; Loy, C.C.; Hou, J.; Kwong, S.; Cong, R. Zero-reference deep curve estimation for low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1780–1789.
- Singh, G.; Mittal, A. Various image enhancement techniques—A critical review. Int. J. Innov. Sci. Res. 2014, 10, 267–274.
- Lin, Y.-H.; Chen, W.-C.; Chuang, Y.-Y. BEDSR-Net: A deep shadow removal network from a single document image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 12905–12914.
- Hansen, P.C.; Nagy, J.G.; O’Leary, D.P. Deblurring Images: Matrices, Spectra, and Filtering; SIAM: Philadelphia, PA, USA, 2006.
- Otsu, N. A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66.
- Sauvola, J.; Pietikäinen, M. Adaptive document image binarization. Pattern Recognit. 2000, 33, 225–236.
- Niblack, W. An Introduction to Digital Image Processing; Strandberg Publishing Company: København, Denmark, 1985.
- Westphal, F.; Grahn, H.; Lavesson, N. Efficient document image binarization using heterogeneous computing and parameter tuning. Int. J. Doc. Anal. Recognit. 2018, 21, 41–58.
- Jana, P.; Ghosh, S.; Bera, S.K.; Sarkar, R. Handwritten document image binarization: An adaptive K-means based approach. In Proceedings of the 2017 IEEE Calcutta Conference (CALCON), Kolkata, India, 2–3 December 2017; pp. 226–230.
- Howe, N.R. Document binarization with automatic parameter tuning. Int. J. Doc. Anal. Recognit. (IJDAR) 2013, 16, 247–258.
- Rani, U.; Kaur, A.; Josan, G. A New Contrast Based Degraded Document Image Binarization. In Cognitive Computing in Human Cognition; Springer: Berlin/Heidelberg, Germany, 2020; pp. 83–90.
- Bezmaternykh, P.V.; Ilin, D.A.; Nikolaev, D.P. U-Net-bin: Hacking the document image binarization contest. Computer Optics 2019, 43, 825–832.
- Xiong, W.; Yue, L.; Zhou, L.; Wei, L.; Li, M. FD-Net: A Fully Dilated Convolutional Network for Historical Document Image Binarization. In Proceedings of the Chinese Conference on Pattern Recognition and Computer Vision (PRCV), Beijing, China, 29 October–1 November 2021; pp. 518–529.
- Vo, Q.N.; Kim, S.H.; Yang, H.J.; Lee, G. Binarization of degraded document images based on hierarchical deep supervised network. Pattern Recognit. 2018, 74, 568–586.
- Bhowmik, S.; Sarkar, R.; Das, B.; Doermann, D. GiB: A Game theory Inspired Binarization technique for degraded document images. IEEE Trans. Image Process. 2018, 28, 1443–1455.
- Calvo-Zaragoza, J.; Gallego, A.-J. A selectional auto-encoder approach for document image binarization. Pattern Recognit. 2019, 86, 37–47.
- Zhao, J.; Shi, C.; Jia, F.; Wang, Y.; Xiao, B. Document image binarization with cascaded generators of conditional generative adversarial networks. Pattern Recognit. 2019, 96, 106968.
- Peng, X.; Wang, C.; Cao, H. Document binarization via multi-resolutional attention model with DRD loss. In Proceedings of the 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, 20–25 September 2019; pp. 45–50.
- Souibgui, M.A.; Kessentini, Y. DE-GAN: A conditional generative adversarial network for document enhancement. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 1180–1191.
- Jia, F.; Shi, C.; He, K.; Wang, C.; Xiao, B. Degraded document image binarization using structural symmetry of strokes. Pattern Recognit. 2018, 74, 225–240.
- Bhunia, A.K.; Bhunia, A.K.; Sain, A.; Roy, P.P. Improving document binarization via adversarial noise-texture augmentation. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 2721–2725.
- Xiong, W.; Zhou, L.; Yue, L.; Li, L.; Wang, S. An enhanced binarization framework for degraded historical document images. EURASIP J. Image Video Process. 2021, 2021, 13.
- Xiong, W.; Jia, X.; Yang, D.; Ai, M.; Li, L.; Wang, S. DP-LinkNet: A convolutional network for historical document image binarization. KSII Trans. Internet Inf. Syst. (TIIS) 2021, 15, 1778–1797.
- Chaurasia, A.; Culurciello, E. LinkNet: Exploiting encoder representations for efficient semantic segmentation. In Proceedings of the 2017 IEEE Visual Communications and Image Processing (VCIP), St. Petersburg, FL, USA, 10–13 December 2017; pp. 1–4.
- Zhou, L.; Zhang, C.; Wu, M. D-LinkNet: LinkNet with pretrained encoder and dilated convolution for high resolution satellite imagery road extraction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–23 June 2018; pp. 182–186.
- Suh, S.; Kim, J.; Lukowicz, P.; Lee, Y.O. Two-stage generative adversarial networks for binarization of color document images. Pattern Recognit. 2022, 130, 108810.
- Souibgui, M.A.; Biswas, S.; Jemni, S.K.; Kessentini, Y.; Fornés, A.; Lladós, J.; Pal, U. DocEnTr: An end-to-end document image enhancement transformer. arXiv 2022, arXiv:2201.10252.
- Bako, S.; Darabi, S.; Shechtman, E.; Wang, J.; Sunkavalli, K.; Sen, P. Removing shadows from images of documents. In Proceedings of the Asian Conference on Computer Vision, Taipei, Taiwan, 20–24 November 2016; pp. 173–183.
- Wang, J.; Li, X.; Yang, J. Stacked conditional generative adversarial networks for jointly learning shadow detection and shadow removal. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 19–23 June 2018; pp. 1788–1797.
- Jung, S.; Hasan, M.A.; Kim, C. Water-filling: An efficient algorithm for digitized document shadow removal. In Proceedings of the Asian Conference on Computer Vision, Perth, Australia, 2–6 December 2018; pp. 398–414.
- Chen, Z.; Long, C.; Zhang, L.; Xiao, C. CANet: A context-aware network for shadow removal. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 4743–4752.
- Liu, Z.; Yin, H.; Mi, Y.; Pu, M.; Wang, S. Shadow removal by a lightness-guided network with training on unpaired data. IEEE Trans. Image Process. 2021, 30, 1853–1865.
- Gangeh, M.J.; Tiyyagura, S.R.; Dasaratha, S.V.; Motahari, H.; Duffy, N.P. Document enhancement system using auto-encoders. In Proceedings of the Workshop on Document Intelligence at NeurIPS 2019, Vancouver, BC, Canada, 14 December 2019.
- Jemni, S.K.; Souibgui, M.A.; Kessentini, Y.; Fornés, A. Enhance to read better: A multi-task adversarial network for handwritten document image enhancement. Pattern Recognit. 2022, 123, 108370.
- Liu, Y.; Guo, M.; Zhang, J.; Zhu, Y.; Xie, X. A novel two-stage separable deep learning framework for practical blind watermarking. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; pp. 1509–1517.
- Jiang, P.; He, S.; Yu, H.; Zhang, Y. Two-stage visible watermark removal architecture based on deep learning. IET Image Process. 2020, 14, 3819–3828.
- Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv 2014, arXiv:1411.1784.
- Mao, X.; Li, Q.; Xie, H.; Lau, R.Y.; Wang, Z.; Paul Smolley, S. Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2794–2802.
- Liu, Y.; Zhu, Z.; Bai, X. WDNet: Watermark-decomposition network for visible watermark removal. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2021; pp. 3685–3693.
- Ge, S.; Xia, Z.; Fei, J.; Sun, X.; Weng, J. A Robust Document Image Watermarking Scheme using Deep Neural Network. arXiv 2022, arXiv:2202.13067.
- Hradiš, M.; Kotera, J.; Zemčík, P.; Šroubek, F. Convolutional neural networks for direct text deblurring. In Proceedings of the BMVC, Swansea, UK, 7–10 September 2015.
- Pan, J.; Sun, D.; Pfister, H.; Yang, M.-H. Blind image deblurring using dark channel prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1628–1636.
- Zhu, J.-Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232.
- Nah, S.; Hyun Kim, T.; Mu Lee, K. Deep multi-scale convolutional neural network for dynamic scene deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3883–3891.
- Kupyn, O.; Budzan, V.; Mykhailych, M.; Mishkin, D.; Matas, J. DeblurGAN: Blind motion deblurring using conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 19–23 June 2018; pp. 8183–8192.
- Takano, N.; Alaghband, G. SRGAN: Training dataset matters. arXiv 2019, arXiv:1903.09922.
- Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 214–223.
- Lee, H.; Jung, C.; Kim, C. Blind deblurring of text images using a text-specific hybrid dictionary. IEEE Trans. Image Process. 2019, 29, 710–723.
- Lu, B.; Chen, J.-C.; Chellappa, R. Unsupervised domain-specific deblurring via disentangled representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 10225–10234.
- Liu, J.; Tan, J.; He, L.; Ge, X.; Hu, D. Blind image deblurring via local maximum difference prior. IEEE Access 2020, 8, 219295–219307.
- Neji, H.; Hamdani, T.; Halima, M.; Nogueras-Iso, J.; Alimi, A.M. Blur2Sharp: A GAN-based model for document image deblurring. Int. J. Comput. Intell. Syst. 2021, 14, 1315–1321.
- Gonwirat, S.; Surinta, O. DeblurGAN-CNN: Effective Image Denoising and Recognition for Noisy Handwritten Characters. IEEE Access 2022, 10, 90133–90148.
- Deng, F.; Wu, Z.; Lu, Z.; Brown, M.S. BinarizationShop: A user-assisted software suite for converting old documents to black-and-white. In Proceedings of the 10th Annual Joint Conference on Digital Libraries, Gold Coast, Australia, 21–25 June 2010; pp. 255–258.
- Hedjam, R.; Nafchi, H.Z.; Moghaddam, R.F.; Kalacska, M.; Cheriet, M. ICDAR 2015 contest on multispectral text extraction (MS-TEx 2015). In Proceedings of the 2015 13th International Conference on Document Analysis and Recognition (ICDAR), Tunis, Tunisia, 23–26 August 2015; pp. 1181–1185.
- Gatos, B.; Ntirogiannis, K.; Pratikakis, I. ICDAR 2009 document image binarization contest (DIBCO 2009). In Proceedings of the 2009 10th International Conference on Document Analysis and Recognition, Barcelona, Spain, 26–29 July 2009; pp. 1375–1382.
- Pratikakis, I.; Gatos, B.; Ntirogiannis, K. ICDAR 2011 Document Image Binarization Contest (DIBCO 2011). In Proceedings of the International Conference on Document Analysis & Recognition, Beijing, China, 18–21 September 2011.
- Pratikakis, I.; Gatos, B.; Ntirogiannis, K. ICDAR 2013 document image binarization contest (DIBCO 2013). In Proceedings of the 2013 12th International Conference on Document Analysis and Recognition, Washington, DC, USA, 25–28 August 2013; pp. 1471–1476.
- Pratikakis, I.; Zagoris, K.; Barlas, G.; Gatos, B. ICDAR2017 competition on document image binarization (DIBCO 2017). In Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan, 9–15 November 2017; pp. 1395–1403.
- Pratikakis, I.; Zagoris, K.; Karagiannis, X.; Tsochatzidis, L.; Mondal, T.; Marthot-Santaniello, I. ICDAR 2019 Competition on Document Image Binarization (DIBCO 2019). In Proceedings of the 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, 20–25 September 2019.
- Pratikakis, I.; Gatos, B.; Ntirogiannis, K. H-DIBCO 2010: Handwritten document image binarization competition. In Proceedings of the 2010 12th International Conference on Frontiers in Handwriting Recognition, Washington, DC, USA, 16–18 November 2010; pp. 727–732.
- Pratikakis, I.; Gatos, B.; Ntirogiannis, K. ICFHR 2012 competition on handwritten document image binarization (H-DIBCO 2012). In Proceedings of the 2012 International Conference on Frontiers in Handwriting Recognition, Bari, Italy, 18–20 September 2012; pp. 817–822.
- Ntirogiannis, K.; Gatos, B.; Pratikakis, I. ICFHR2014 competition on handwritten document image binarization (H-DIBCO 2014). In Proceedings of the 2014 14th International Conference on Frontiers in Handwriting Recognition, Crete, Greece, 1–4 September 2014; pp. 809–813.
- Pratikakis, I.; Zagoris, K.; Barlas, G.; Gatos, B. ICFHR2016 handwritten document image binarization contest (H-DIBCO 2016). In Proceedings of the 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), Shenzhen, China, 23–26 October 2016; pp. 619–623.
- Pratikakis, I.; Zagori, K.; Kaddas, P.; Gatos, B. ICFHR 2018 Competition on Handwritten Document Image Binarization (H-DIBCO 2018). In Proceedings of the 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), Niagara Falls, NY, USA, 5–8 August 2018.
- IEEE/IET Electronic Library. Available online: https://ieeexplore.ieee.org/Xplore/home.jsp (accessed on 13 February 2023).
- China National Knowledge Infrastructure. Available online: https://www.cnki.net/ (accessed on 13 February 2023).
- Ntirogiannis, K.; Gatos, B.; Pratikakis, I. Performance evaluation methodology for historical document image binarization. IEEE Trans. Image Process. 2012, 22, 595–609.
- Lu, H.; Kot, A.C.; Shi, Y.Q. Distance-reciprocal distortion measure for binary document images. IEEE Signal Process. Lett. 2004, 11, 228–231.
- Bazzi, I.; Schwartz, R.; Makhoul, J. An omnifont open-vocabulary OCR system for English and Arabic. IEEE Trans. Pattern Anal. Mach. Intell. 1999, 21, 495–504.
- Klakow, D.; Peters, J. Testing the correlation of word error rate and perplexity. Speech Commun. 2002, 38, 19–28.
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
Dataset | Task | No. of Images | Real vs. Synthetic |
---|---|---|---|
Jung’s dataset | poor lighting conditions | 159 | real |
RDSRD | poor lighting conditions | 540 | real |
BMVC Text | deblurring | 3M train/35K test | synthetic |
Bickley diary | multiple | 7 | real |
SMADI | multiple | 240 | synthetic |
DIBCO and H-DIBCO | multiple | 10, 10 | real |
DocImgEN | watermark removal | 10k train/10k validation/10k test | synthetic |
DocImgCN | watermark removal | 230k train/10k validation/10k test | synthetic |
Degradation | Method | Jung’s dataset PSNR | Jung’s dataset SSIM | RDSRD PSNR | RDSRD SSIM |
---|---|---|---|---|---|
poor lighting conditions | Bako [42] | 23.70 | 0.9015 | 28.24 | 0.8664 |
 | ST-CGAN | 23.71 | 0.9046 | 30.31 | 0.9016 |
 | Kligler [4] | 24.45 | 0.8332 | 22.53 | 0.7056 |
 | Jung [44] | 28.49 | 0.9108 | 14.45 | 0.7054 |
 | BEDSR-Net | 27.23 | 0.9115 | 33.48 | 0.9084 |
Degradation | Dataset | Method | PSNR | SSIM | CER |
---|---|---|---|---|---|
deblurring | BMVC Text | Hradiš [55] | 30.6 | 0.98 | 7.2 |
 | | Pan [56] | 21.84 | 0.93 | 35.3 |
 | | Zhu [57] | 19.57 | 0.89 | 53.0 |
 | | Nah [58] | 22.27 | 0.92 | 50.6 |
 | | Lu [63] | 22.56 | 0.95 | 10.1 |
Degradation | Problems or Future Developments |
---|---|
background texture | The main challenge lies in color document images: most current research still focuses on grayscale text images, where results are already good, while model algorithms applied to color document images can lose text content because of color differences. Methods aimed at color document images therefore need to be investigated. Text images also include both handwritten and printed images, and comparisons between these two types of datasets are still rarely discussed and deserve further study. |
page smudge | Character recognition is very challenging when a document image suffers from bleed-through or ghosting (ink from the reverse side of the page shows through and interferes with the text on the target page) and when an RGB image contains a variety of ink colors and intensities that vary over time. Current methods are less effective on low-contrast document images with bleed-through, so models targeted at this kind of problem still need to be proposed. |
handwriting fading | Although handwriting fading reduces document readability and OCR accuracy, fading ranges from severe to minor, and minor fading is still relatively easy to recognize. This type of degradation mostly concerns handwritten text images. For severely faded documents, many studies do not publish their datasets, so datasets for this problem still need to be produced. |
the poor lighting conditions | The problem of overexposure under uneven illumination, i.e., too much flash from cameras and similar devices during digitization, has been studied less [54,55], and no suitable dataset exists for it. Exposure handling in document image enhancement therefore still needs sufficient research in the future. |
watermark removal | There is little research on watermark removal for document images, mainly because differences between languages and scripts limit how well watermark removal techniques transfer. For example, English text is composed from 26 letters, whereas Chinese characters are two-dimensional square characters, which leaves less exploitable information redundancy in Chinese text. Additionally, most current methods apply only to documents with black text and a single watermark color. Since watermark removal also touches on the integrity of the original content and intellectual property issues, it must be handled carefully. |
deblurring | Blurred-text datasets are mainly generated synthetically by models, which is not enough for real scenarios: real scenes are diverse, and many situations are not covered. The number of blurred-text datasets is still insufficient, and Chinese datasets are especially scarce. The limited scenario coverage of existing datasets also prevents algorithmic models from reaching their full potential. |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhou, Y.; Zuo, S.; Yang, Z.; He, J.; Shi, J.; Zhang, R. A Review of Document Image Enhancement Based on Document Degradation Problem. Appl. Sci. 2023, 13, 7855. https://doi.org/10.3390/app13137855