Article

A Quality, Size and Time Assessment of the Binarization of Documents Photographed by Smartphones

by Rodrigo Bernardino 1, Rafael Dueire Lins 1,2,3,* and Ricardo da Silva Barboza 3

1 Centro de Informática, Universidade Federal de Pernambuco, Recife 50.670-901, PE, Brazil
2 Departamento de Computação, Universidade Federal Rural de Pernambuco, Recife 55.815-060, PE, Brazil
3 Coordenação de Engenharia da Computação, Escola Superior de Tecnologia, Universidade do Estado do Amazonas, Manaus 69.410-000, AM, Brazil
* Author to whom correspondence should be addressed.
J. Imaging 2023, 9(2), 41; https://doi.org/10.3390/jimaging9020041
Submission received: 19 November 2022 / Revised: 10 January 2023 / Accepted: 15 January 2023 / Published: 13 February 2023

Abstract: Smartphones with an in-built camera are omnipresent today in the lives of over eighty percent of the world’s population. They are very often used to photograph documents. Document binarization is a key process in many document processing platforms. This paper assesses the quality, file size and time performance of sixty-eight binarization algorithms using five different versions of the input images. The evaluation dataset is composed of deskjet, laser and offset printed documents, photographed using six widely used mobile devices with the strobe flash off and on, under two different angles and in four shots with small variations in position. Besides that, this paper also pinpoints, per device, the algorithms that may provide the best visual quality-time, document transcription accuracy-time, and size-time trade-offs. Furthermore, an indication is also given of the “overall winner” that would be the algorithm of choice if one had to embed a single algorithm in a smartphone application.

1. Introduction

The number of smartphone users in the world today is over 6.6 billion (Source: https://www.bankmycell.com/blog/how-many-phones-are-in-the-world, last visited on 29 December 2022), which means that over 83% of the world’s population owns a smartphone. The omnipresence of smartphones with in-built cameras has made most people (91%) take photos with smartphones, while only 7% use digital cameras and 2% use tablets. According to forecast figures by Ericsson and the Radicati Group reported on that same website, that percentage is expected to grow from 91% in 2022 to 94% in 2026. Consumers see the quality of the camera as a key factor in choosing a smartphone model. Thus, since cameras became the most significant selling point of smartphones, manufacturers have been putting much effort into improving their quality. At first, they paid more attention to the number of megapixels a smartphone camera could pack. In the last few years, smartphone manufacturers have opted to add more cameras to their phones to improve photo quality and optical zoom functionality while keeping the device thin. Each camera has a lens that can yield either a wide shot or a zoomed-in shot. Some phones have additional black-and-white cameras for increased light sensitivity, while others provide depth information. Data from the different cameras can be combined into a clear photo with seemingly shallow depth of field and good low-light capability.
Taking photos of documents with smartphone cameras, a practice that started almost two decades ago [1,2,3,4], has become widespread today. It is extremely simple and saves photocopying costs, allowing the document image to be easily stored and shared over computer networks. However, smartphone cameras were designed to take family and landscape photos or to make videos of such subjects, and were not targeted at document image acquisition. Smartphone document images have several problems that make them challenging to process: the resolution and illumination are uneven, there are perspective distortions, and there is often interference from external light sources [4]. Even the in-built strobe flash may add further difficulties if activated by the user or automatically. Besides all that, the standard file format used by smartphone cameras to save the images is JPEG, which inserts the JPEG noise [5], a light white noise added to prevent two pixels of the same color from appearing next to each other. This noise makes the final image more pleasant to the human eye glancing at a landscape or family photo, but it also means a loss of sharpness in a document image, bringing difficulties to any further processing.
The conversion of a color image into its black-and-white version is called thresholding or binarization. It is a key step in the pipeline of many document processing systems, including document content recovery [6]. The binarization of scanned document images is far from being a simple task, as physical noises [7], such as paper aging, stains, fungi, folding marks, etc., and back-to-front interference [8] increase the complexity of the task. In the case of scanned documents, recent document binarization competitions [9,10] show that no single binarization algorithm is efficient for all types of text document images. Their performance depends on a wide number of factors: the digitization device, the image resolution, the kind of physical noise in the document [7], the way the document was printed, typed or handwritten, the age of the document, etc. Besides that, those competitions showed that the time complexity of the algorithms also varies widely, making some of them unsuitable for use in any document processing pipeline. Thus, instead of pointing to an overall best, those competitions pointed out the top quality-time algorithms in several categories of documents.
The binarization of photographed documents is far more complex than that of scanned ones since, as already mentioned above, the resolution and illumination are uneven, among several other problems. Besides that, each smartphone model has different camera features. The first competition to assess the quality and time of the binarization of smartphone camera-acquired text documents, the type of document most often photographed, comparing new algorithms with previously published and more classical ones, was [11]. In 2021, that same competition was held again with several new competitors and devices [12].
Binary images are also much smaller than their color counterparts, thus their use may save storage space and network bandwidth [13]. This means that assessing the resulting image file size using a lossless compression scheme is also relevant for the comparison of binarization algorithms. Besides that, the binary image may be the key to generating colored synthetic images, which are visually indistinguishable from the original document whenever printed or visualized on a screen [14]. Run-length encoding [15] of the sequences of black and white pixels is the key to several schemes for compressing monochromatic images. Suppose the binarization process leaves salt-and-pepper noise in the final image, sometimes imperceptible to the human eye. In that case, that noise will break the sequences of similar pixels, degrading the performance of the image compression scheme. Indirectly, the compressed file size can thus also be read as a measure of the quality of the monochromatic image. The third venue [16] of the ACM DocEng Competition on the binarization of photographed documents assessed five new algorithms along with sixty-four previously published ones, and it was possibly the first time the size of the monochromatic image was considered in the assessment of binarization algorithms.
Reference [17] shows that feeding binarization algorithms with the individual red, green and blue (RGB) channels, instead of the whole image, may yield a better quality two-tone image, besides saving processing time. This paper largely widens the scope of [16], in which, due to the restricted time to produce the final report, it was impossible to process and assess the quality, time, and file size of the almost 350 binarization schemes. Besides that, also due to processing time limitations, the file-size assessment in [16] ranked the algorithms by the quality of the optical character recognition (OCR) transcription, measured by the Levenshtein distance to the ground-truth text. In contrast, here, the algorithms are ranked according to a new image quality measure introduced in this paper, possibly a more adequate one.
The recent paper [18] presents a methodology to pinpoint which binarization algorithm would provide the best quality-time trade-off either for printing or for OCR transcription. It also proposes an overall winner, should one have to choose a single algorithm to be embedded in applications on a given smartphone model. The present paper also makes such choices for each of the smartphones assessed.

2. Materials and Methods

Six different models of smartphones from three different manufacturers, all widely used today, were used in this assessment. Their camera specifications are described in Table 1. Their in-built strobe flash was set on and off to acquire images of offset, laser, and deskjet printed text documents, photographed in four shots with small variations in position and moment, to allow for different interfering light sources. The document images captured with the six devices were grouped into two separate datasets:
  • Dataset 1: created for the 2022 DocEng contest [16], the photos were taken with the devices Samsung N10+ (Note 10+) (Samsung Electronics, Suwon-si, South Korea) and Samsung S21U (Ultra 5G) (Samsung Electronics, Suwon-si, South Korea). It has challenging images with natural and artificial light sources and with strong shadows;
  • Dataset 2: created for the 2021 DocEng contest [19], the photos were taken with the devices Motorola G9 (Motorola Mobility, Chicago, IL, USA), Samsung A10S (Samsung Electronics, Suwon-si, South Korea), Samsung S20 (Samsung Electronics, Suwon-si, South Korea) and Apple iPhone SE 2 (Apple Inc., Cupertino, CA, USA). It also has challenging images, but they are less complex than those in Dataset 1.
The test images were incorporated into the IAPR (International Association for Pattern Recognition) DIB—Document Image Binarization platform (https://dib.cin.ufpe.br, accessed on 17 January 2023), which focuses on document binarization. It encompasses several datasets of images of historical, bureaucratic, and ordinary documents, which were handwritten, machine-typed, offset, laser, and ink-jet printed, both scanned and photographed, several of them with their corresponding ground-truth images. Besides being a document repository, the DIB platform encompasses a synthetic document image generator, which allows the user to create over 5.5 million documents with different features. As already mentioned, reference [17] shows that binarization algorithms, in general, yield images of different quality when fed with the color, grayscale-converted, and R, G, and B channel versions of the input. Here, 68 classical and recently published binarization algorithms are fed with the five versions of the input image, totaling 340 different binarization schemes. The complete list of the algorithms used is presented in Table 2, along with a short description and the approach followed in each of them.
The quality of the final monochromatic image is the most important assessment criterion. Once one has the top-quality images, one may consider the mean size of the monochromatic files and the mean time elapsed by each of the assessed algorithms over the dataset. This paper proposes a novel quality measure for photographed document images called $P_L$; it is a combination of the previously proposed $P_{err}$ [70] and $[L_{dist}]$ [11] measures. Two quality measures were used to evaluate the quality of the binarization algorithms: $[L_{dist}]$ and $P_L$.

2.1. The Quality Measure of the Proportion of Pixels ($P_{err}$)

Assessing image quality of any kind is a challenging task. The quality of photographed documents is particularly hard to evaluate as the image resolution is uneven; it strongly depends on the features of the device and the distance between the document and the camera, and it even suffers from perspective distortion. Creating a ground-truth (GT) binary image for each photographed document would require a non-viable, paramount effort. An alternative method [70] was used: the paper sheet or book page is scanned at 300 dpi, binarized with several algorithms, visually inspected, and manually selected and retouched to provide the best possible binary image of that scanned document, which generates the reference proportion of black pixels for that document image. The $P_{err}$ measure compares the proportion of black-to-white pixels in the scanned and photographed binary documents, as described in Equation (1):
$P_{err} = \mathrm{abs}(PB_{bin} - PB_{GT}),$  (1)
where $PB = 100 \times (B/N)$ is the proportion of black pixels in the image, $B$ is the total number of black pixels and $N$ is the total number of pixels in the image. Thus, $PB_{bin}$ is the proportion of black pixels in the binary image and $PB_{GT}$ is the proportion of black pixels in the scanned ground-truth image.
In order to provide a fair assessment, the photographed image must meet several requirements. The resolution of the output document photo must be close to 300 dpi (corresponding to the scanned one). To meet such a requirement, the camera should have a resolution of around 12 Mpixels and the document should fill nearly the whole photographed image; the photo must be cropped to remove any remaining border. Here, the cropping is done manually, as the focus is to assess specifically the binarization algorithms. Figure 1 describes the preparation of the images and gives an example of the $P_{err}$ calculation. The $P_{err}$ measure was used in the last DocEng contests [11,12,16] to evaluate the quality of the binary images for printing and human reading.
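For clarity, a minimal sketch of how the $P_{err}$ computation of Equation (1) can be implemented is shown below, assuming the photographed image has already been cropped and binarized and that the retouched 300 dpi scan is available; the file names and the 0/255 pixel convention are illustrative, not taken from the dataset.

```python
# Minimal sketch of the P_err measure of Equation (1), assuming 0 = black and
# 255 = white pixels; file names below are illustrative placeholders.
import numpy as np
from PIL import Image

def black_pixel_proportion(binary: np.ndarray) -> float:
    """PB = 100 * B / N: percentage of black pixels in a binary image."""
    return 100.0 * np.count_nonzero(binary == 0) / binary.size

def p_err(photo_bin: np.ndarray, scan_gt_bin: np.ndarray) -> float:
    """P_err = abs(PB_bin - PB_GT), Equation (1)."""
    return abs(black_pixel_proportion(photo_bin) - black_pixel_proportion(scan_gt_bin))

def load_binary(path: str) -> np.ndarray:
    """Load an image as a 0/255 binary array (thresholding at mid-gray)."""
    gray = np.array(Image.open(path).convert("L"))
    return np.where(gray < 128, 0, 255).astype(np.uint8)

# Example usage (paths are placeholders):
# print(p_err(load_binary("photo_binarized.png"), load_binary("scan_ground_truth.png")))
```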

2.2. Normalized Levenshtein Distance ($[L_{dist}]$)

The second quality measure is the Optical Character Recognition (OCR) correctness rate measured by $[L_{dist}]$ [11], which is the Levenshtein [71] distance normalized by the number of characters in the text. Google Vision OCR was used to obtain the machine-transcribed text. It is important to note that Google Vision automatically detects the input language and applies dictionary-based post-processing, which cannot be deactivated. The Levenshtein distance, here denoted by $L_{dist}$, expresses the number of character insertions, deletions and replacements that would be necessary to convert the recognized text into the manually transcribed reference text for each image. Thus, $L_{dist}$ depends on the length of the text and cannot be used as an absolute value to measure the performance across different documents. In [11], a normalized version of $L_{dist}$ was proposed, calculated as:
$[L_{dist}] = \dfrac{\#char - L_{dist}}{\#char},$
where $\#char$ is the number of characters in the reference text.
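The sketch below illustrates the computation of $[L_{dist}]$ with a plain dynamic-programming Levenshtein distance; the OCR transcription itself (obtained with Google Vision in this paper) is assumed to be already available as a string.

```python
# Minimal sketch of the normalized Levenshtein measure [L_dist]; the reference
# text is the manual transcription and ocr_text the machine-recognized one.

def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and replacements."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def normalized_l_dist(ocr_text: str, reference_text: str) -> float:
    """[L_dist] = (#char - L_dist) / #char, with #char the reference length."""
    n_char = len(reference_text)
    return (n_char - levenshtein(ocr_text, reference_text)) / n_char
```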
The DocEng 2022 binarization competition for photographed documents presented a new challenging dataset in which complex shaded areas were introduced. Although the $P_{err}$ quality measure worked well whenever the shaded area was more uniformly distributed, in those more complex multi-shaded documents, some algorithms may concentrate pixels around some characters (e.g., by dilation) while completely removing other parts of the document. This could generate an image that has the same proportion of black pixels as the ground-truth and a clear background with no evident noise, but whose text is unreadable. Taking, for instance, an example image taken with the Apple iPhone SE2 of a deskjet printed document with the strobe flash off (Figure 2a), the algorithm with the closest black pixel proportion would be DiegoPavan fed with the original color image. The result is presented in Figure 2b. Note that even the remaining dilated letters are nearly unreadable, giving a $[L_{dist}]$ of nearly zero, meaning almost no text was transcribed. The $P_{err}$ close to zero means the proportion of black pixels is very close to the ground-truth.
If one ignores $P_{err}$ and only sorts the results by $[L_{dist}]$, the most recommended algorithm would be dSLR, having the original color image as input. The result of such binarization is presented in Figure 2c for the same image. Nearly all the text was successfully transcribed ($[L_{dist}]$ close to 1.0); however, there is a large noisy area in the bottom-left corner, which did not significantly affect the transcription only because of the large margins of the document. Such noise was generated by a shadow of the mobile phone and could not be detected by the $[L_{dist}]$ measure, but checking $P_{err}$ it is clear that a large amount of noise is present. A printed document usually has nearly 5% of text pixels (in this image, it was 3.77%), thus a difference of 8.79 from the ground-truth is a large one. If one just wanted to transcribe the text, it could be enough to use such an algorithm for that image; however, if the margins were smaller or the binarized document were to be printed, such a large noise blob would be unacceptable.

2.3. Pixel Proportion and Levenshtein Measure ($P_L$)

In order to obtain the best OCR quality while providing visually pleasant human-readable binary document images, a new quality measure is proposed here:
$P_L = [L_{dist}] \times (100 - P_{err}).$
Applying such a new measure to the already presented examples of document images yields $P_L = 5.69$ for DiegoPavan-C and $P_L = 84.82$ for dSLR-C, while the best algorithm according to the proposed quality measure, Yasin-R, yields $P_L = 90.22$. The corresponding image is presented in Figure 2d, and it has a better overall visual quality and OCR transcription rate, although the dSLR algorithm is an order of magnitude faster than the other two algorithms.
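A minimal sketch combining the two measures into $P_L$ could read as follows, reusing the hypothetical `p_err` and `normalized_l_dist` helpers sketched in the previous subsections.

```python
# Minimal sketch of the proposed P_L measure, composing the previous helpers.
def p_l(l_dist_norm: float, perr: float) -> float:
    """P_L = [L_dist] * (100 - P_err)."""
    return l_dist_norm * (100.0 - perr)

# e.g. p_l(normalized_l_dist(ocr_text, reference_text), p_err(bin_img, gt_img))
```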

2.4. TIFF Group 4 Compression Rate ($CR_{G4}$)

This work also assessed the size of the monochromatic image files compressed using the Tag Image File Format Group 4 (TIFF_G4) with run-length encoding (RLE), a new quality measure for monochromatic images recently introduced in [16]. Such a compression scheme is part of the Facsimile (FAX) recommendation and was implemented in most FAX systems at a time when transmission resources were scarce. The TIFF_G4 file format is possibly the most efficient lossless compression scheme for binary images [5]. One central part of such an algorithm is the application of run-length encoding [15]. Thus, the less salt-and-pepper noise present in the binary image, the longer the sequences of same-color bits, yielding a smaller TIFF_G4 file, which requires less bandwidth for network transmission and less storage space for archiving. The compression rate is denoted by $CR_{G4}$ and is calculated as:
$CR_{G4} = 100 \times \dfrac{S_{G4}}{S_{PNG}},$
where $S_{G4}$ denotes the size of the compressed TIFF_G4 file and $S_{PNG}$ is the size of the Portable Network Graphics (PNG) compressed file with compression level 4. It is important to remark that such a measure should not be used as an isolated quality measure, but only to re-rank the algorithms with the best $P_L$, as it provides a secondary, fine-grained quality measure.
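The following sketch, based on Pillow, illustrates how $CR_{G4}$ can be obtained by writing the same binary image as a Group 4 compressed TIFF and as a PNG at compression level 4; the temporary file names are illustrative, and the exact byte counts depend on the encoders actually used by the authors.

```python
# Minimal sketch of the CR_G4 size measure for a binary (mode "1") Pillow image.
import os
from PIL import Image

def compression_rate_g4(binary_img: Image.Image, workdir: str = ".") -> float:
    """CR_G4 = 100 * S_G4 / S_PNG."""
    tif_path = os.path.join(workdir, "tmp_g4.tif")
    png_path = os.path.join(workdir, "tmp.png")
    binary_img.save(tif_path, compression="group4")   # CCITT Group 4 TIFF
    binary_img.save(png_path, compress_level=4)       # PNG, compression level 4
    return 100.0 * os.path.getsize(tif_path) / os.path.getsize(png_path)

# Usage (file name is a placeholder):
# print(compression_rate_g4(Image.open("binarized.png").convert("1")))
```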

2.5. Processing Time Evaluation

The viability of using a binarization algorithm in a document processing pipeline depends not only on the quality of the final image, but also on the processing time elapsed by the algorithm and the maximum amount of memory claimed during the process. To the best of the authors’ knowledge, the first assessment of binarization algorithms to take the average processing time into account was [9]. The assessed algorithms were implemented by their authors using several programming languages and operating systems, running on different platforms; thus, the processing time figures presented here provide the order of magnitude of the time elapsed for binarizing the whole dataset. The training times for the AI-based algorithms were not computed. Two processing devices were used:
  • Device 1 (CPU algorithms): Intel(R) Core(TM) i7-10750H CPU @ 2.60 GHz, with 32 GB RAM and a GPU GeForce GTX 1650 4 GB.
  • Device 2 (GPU algorithms): Intel(R) Core(TM) i9-9900K CPU @ 3.60 GHz, with 64 GB RAM and a GPU NVIDIA GeForce RTX 2080 Ti 12 GB.
The algorithms were implemented using two operating systems and different programming languages, some targeting specific hardware platforms such as GPUs:
  • Device 1, Windows 10 (version 1909), Matlab (version 9.4): Akbari_1, Akbari_2, Akbari_3, CLD, CNW, ElisaTV, Ergina-Global, Ergina-Local, Gattal, Ghosh, HBUT, Howe, iNICK, Jia-Shi, Lu-Su, Michalak, MO1, MO2, MO3, Yasin;
  • Device 1, Linux Pop!_OS (version 20.10): Bataineh, Bernsen, Bradley, Calvo-Zaragoza, daSilva-Lins-Rocha, DiegoPavan, Huang, Intermodes, ISauvola, IsoData, Johannsen-Bille, Kapur-SW, Li-Tam, Mean, Mello-Lins, MinError, Minimum, Moments, Niblack, Nick, Otsu, Percentile, Pun, RenyEntropy, Sauvola, Shanbhag, Singh, Su-Lu, Triangle, Vahid22, WAN, Wolf, Wu-Lu, Yen, YinYang, YinYang21, YinYang22;
  • Device 2, Linux Pop!_OS (version 22.04): DE-GAN, DeepOtsu, DilatedUNet, Doc-DLinkNet, Doc-UNet, DPLinkNet, HuangBCD, HuangUnet, Robin, Vahid, Yuleny.
The algorithms were executed on different operating systems (OS), but on the same hardware. For those that could be executed on both OS types, the processing times for each OS were measured and no significant difference was noticed, as expected from previous experimentation [11]. The mean processing time was used in the analysis. As already mentioned, the primary purpose is to provide the order of magnitude of the processing time elapsed.
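A minimal sketch of how such per-algorithm mean times can be collected (for algorithms callable from a single environment) is given below; the `algorithms` mapping and the image list are illustrative placeholders, not the multi-language harness actually used in this assessment.

```python
# Minimal sketch of collecting the mean wall-clock processing time per algorithm.
import time
from statistics import mean

def mean_processing_times(algorithms: dict, images: list) -> dict:
    """Mean time (seconds) of each binarization callable over a dataset."""
    results = {}
    for name, binarize in algorithms.items():
        elapsed = []
        for image in images:
            start = time.perf_counter()
            binarize(image)
            elapsed.append(time.perf_counter() - start)
        results[name] = mean(elapsed)
    return results
```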

2.6. Quality, Space and Time Evaluation

For each of the six devices studied, this paper assesses the performance of the 340 binarization schemes listed, applied to photographed documents with the strobe flash on and off, in two different ways:
  • Best quality-time and compression: applies the ranking by summation, followed by sorting by processing time, but clustering by device and observing the compression rate for the top-rated algorithms.
  • Image-specific best quality-time: makes use of $P_L$ and $[L_{dist}]$. The ranking is performed by first sorting according to the quality measure and, when the quality results are the same, by processing time. This is illustrated in Figure 3.
The ranking by summation applied to binarization was first used in the series of Document Image Binarization Competitions (DIBCO) [72] and has since been used in many subsequent competitions and assessments [9]. Figure 4 presents a visual description of this criterion. First, the algorithms are ranked in the context of each image individually; then the ranking positions are summed across the images, composing the score for each algorithm. The final ranking is determined by sorting the algorithms by this score, and the global mean over all images is presented to provide a quantitative overall ordering.
Sorting directly by the mean of the quality measure gives less precise results, as one seeks here the algorithm that most frequently appears at the top of the ranking, which does not necessarily mean that it has the best quality all the time. In the example of Figure 4, if one sorted by the $[L_{dist}]$ mean alone, the Li-Tam algorithm would be top-ranked, as for Image 2 its $[L_{dist}]$ is higher than that of most of the other algorithms, raising its mean value. However, it only appears as the top algorithm for that single image. For most images, Moments is better ranked, indicating that for any given image in such a dataset, Moments may provide better results.
The simple mean sorting method is applicable to the first way of assessing the algorithms, as the aggregated images have very similar features (capturing device and print type). As for the second way, the different printing types are aggregated to give an overall result for each device, increasing the variability and making the ranking summation more appropriate.
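A minimal sketch of the ranking-by-summation criterion is given below; the quality values are illustrative only, and ties are left unbroken here, whereas in the assessment they are resolved by processing time as in Figure 3.

```python
# Minimal sketch of ranking by summation: each algorithm is ranked per image,
# the rank positions are summed, and the lowest total score wins.
from collections import defaultdict

def ranking_summation(scores: dict) -> list:
    """scores[image][algorithm] holds a quality value (higher is better).
    Returns (algorithm, summed rank) pairs sorted from best to worst."""
    totals = defaultdict(int)
    for per_image in scores.values():
        ranked = sorted(per_image, key=per_image.get, reverse=True)
        for position, algorithm in enumerate(ranked, start=1):
            totals[algorithm] += position
    return sorted(totals.items(), key=lambda item: item[1])

# Example with two images and three algorithms (values are illustrative only):
# ranking_summation({
#     "img1": {"Moments-R": 0.98, "Li-Tam-R": 0.90, "Otsu-R": 0.85},
#     "img2": {"Moments-R": 0.91, "Li-Tam-R": 0.95, "Otsu-R": 0.80},
# })
```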

3. Choosing the Best Channel

The recent paper [17] showed that there may be a quality difference between feeding a binarization algorithm with the original color image, its grayscale equivalent (using the luminance formula), or the red, green or blue channel. That fact is important, as having one of the input channels yield the best-quality result saves processing space and, consequently, processing time, while the grayscale image demands extra processing time, which may be significant for the faster algorithms. Ideally, one would analyze the best channel for each different type of image; however, for the sake of simplicity, in this study, only the input channel which provided the best $P_L$ summation ranking was chosen for each algorithm. In several cases, the quality results for the red or blue channels and for the color image were nearly equal. In some other cases, providing a single channel actually increased the final quality, and the channel that most often provided better quality was the red channel. Thus, whenever an algorithm yields similar quality results with the full color image and with one of the channels as input, the red channel is chosen, as that often means less processing time and space.
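As an illustration, the sketch below produces the five input versions (C, L, R, G and B) of a photographed document using Pillow; note that Pillow’s grayscale conversion applies the ITU-R 601 luma weights, which is assumed here to stand for the luminance formula mentioned above.

```python
# Minimal sketch: the five input versions fed to each binarization algorithm.
from PIL import Image

def input_versions(path: str) -> dict:
    """Return the C, L, R, G and B versions of a photographed document."""
    color = Image.open(path).convert("RGB")
    r, g, b = color.split()
    return {
        "C": color,               # original colour image
        "L": color.convert("L"),  # luminance (ITU-R 601 weights in Pillow)
        "R": r,
        "G": g,
        "B": b,
    }
```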
Six of the best-ranked algorithms are presented in Table 3 with their respective average $P_L$ and the score of the ranking by summation, stressing that the lower the score, the better the algorithm. The algorithm by Singh was one of the few for which the blue channel offered better results. Among the best algorithms, Sauvola was the one with the greatest difference between applying a single channel and the original color image.

4. Results

For each device model, with the in-built strobe flash on and off, the binarization algorithms were evaluated in two contexts: clustering by the specific image characteristics, and aggregating the whole dataset (global evaluation). In all results, the letter after the original algorithm name indicates the version of the image used: R—red; G—green; B—blue; L—luminance; C—original color image. The mean processing time was taken to evaluate the order of magnitude of the time complexity of the algorithms, thus minor time differences are not relevant to this study. The grayscale conversion time was not considered here.
Table 4 presents the results for each device using the ranking-by-summation strategy. YinYang22 and Michalak21a are often among the top 5 for any of the tested devices. For the Samsung Note 10+, only HuangUNet presented a significant improvement using a single channel other than red. For the Samsung S21 Ultra 5G, ElisaTV presented good results compared to recent efficient algorithms such as YinYang22. For the Motorola G9, Michalak21a would be recommended either with flash on or off, due to its high quality and low processing time. For the Samsung A10S, Michalak21a would also be the one recommended. For the Samsung S20, even the most classical algorithm (Otsu) could properly binarize photos taken with flash on. It is important to notice that Dataset 2 has less complex images than Dataset 1. For the Apple iPhone SE 2 with flash on, which also used Dataset 2, Otsu again appeared as recommended.
The detailed results for each device are presented in Table 5, Table 6, Table 7 and Table 8. The quality-time criterion was used (Figure 3), as the variation in image characteristics is lower, and thus the standard deviation is small enough to allow a fair assessment. It is important to remark that the standard deviation (SD) of $[L_{dist}]$ for the Laser and Deskjet datasets was, for all the top 5 and nearly all the other algorithms, approximately 0.04, and for the book dataset it was 0.01, being in some cases close to zero. Only for the devices Samsung S21 Ultra 5G and Samsung Note 10+ was there a more significant variation, with the standard deviation ranging from 0.1 to 0.3. Those results demonstrate that the top five algorithms for all test datasets provide excellent binarization results for OCR in general.
The $P_L$ standard deviation was higher due to a higher variation of the $P_{err}$ measure, which is part of it. For all devices, the SD for the Deskjet and Laser datasets was approximately 4.00, while for the book dataset it was under 1 for the devices Motorola G9, Samsung S20 and Samsung A10S, and between 1 and 3 for the devices Samsung Note 10+, Apple iPhone SE 2 and Samsung S21 Ultra 5G. The overall quality perceived by visually inspecting the resulting images produced by the top-ranked algorithms is good.
In order to choose the most suitable algorithm for a specific application, the first thing to consider is the intrinsic characteristics of the printing, as different types of ink and printing methods imply entirely different recommendations, as shown in the tables of results. If the document was printed with a deskjet device, it is recommended to check whether the strobe flash should be on or off prior to the image acquisition. After that, the binarization algorithm with the best quality-time balance must be applied. If an application has no significant time constraint, but the quality is so crucial that even a small amount of lost information is not acceptable, one should choose the top-quality algorithm. However, if the image binarization is part of an embedded application, its processing time is a crucial factor, thus the best quality-time trade-off must be chosen.
Two quality measures were used to support the decision for two types of applications: OCR transcription, and printing, archiving or transmission through computer networks. For the first application (OCR transcription), the $[L_{dist}]$ measure should be used, as it does not take into account the visual quality, but only the OCR precision, pointing to the algorithms with the best chance of providing the best transcription possible. For the second application, the visual quality is also important, thus the $P_L$ measure is used, which allows the choice of the best algorithm for OCR transcription and, at the same time, for printing or transmitting.
In general, keeping the strobe flash on or off does not imply any significant difference in the quality of the best-ranked algorithms; however, in most cases, the set of recommended algorithms varies across the devices. For instance, using the Samsung S21 Ultra 5G, the algorithms recommended for deskjet printed documents are similar whether one keeps the flash on or off, but they are completely different for book offset-printed documents. The same happens for most other devices, using either the $[L_{dist}]$ or the $P_L$ measure, when comparing different setups. This fact highlights the importance of considering as many algorithms as possible, as in some cases an algorithm that offers excellent results with one configuration may have totally different results with a different set of capturing conditions, devices and setup.
In the results tables for the $[L_{dist}]$ measure, the first red line represents the performance of applying the original color image directly to Google Vision OCR without prior binarization. In most cases, the results are equivalent to the performance of providing a binary image. However, for the Motorola G9 and Apple iPhone SE 2, no OCR output is given for most of the captured images. The standard deviation in all those cases was nearly zero, which means there were almost no results for the images. This shows that general-purpose OCR engines can be greatly improved when provided with a clean binary image.
In several cases, the recommended algorithms for OCR ($[L_{dist}]$) match the recommendations using the $P_L$ measure with the same input channel or a different one. For instance, using Wolf-R to binarize laser documents with flash off captured by the Samsung S21 Ultra 5G yields not only excellent OCR results, but also good visual quality images. If one checks the example binary image for that algorithm in Figure 5b, it is possible to observe how well this algorithm performed, generating a clear binary image with nearly no noise.
It is remarkable how classical global algorithms such as Otsu, dSLR and WAN were quality-time top-ranked, but only with the in-built strobe flash on. This happened because the flash was sufficient to diminish the shadows and allow those global algorithms to work well, and it highlights that very simple and fast algorithms can still be used for uniform images, even if photographed in different places and by different smartphones.
Figure 5 and Figure 6 present some example images. For each input color image, one of the most recommended algorithms is used, according to the global ranking of Table 4. The cropped portion of each image shows the critical regions where shadows and the flash light reflection can be noticed. For nearly all images, an almost perfect binary image was generated. Only in Figure 5c is it possible to see some noise due to the strong flash light reflected on the laser printed page. The laser printing process creates a surface that reflects more light than other types of printing, thus even in the color image some pixels inside the text stroke are very close to the background ones, making it almost impossible to generate a perfect binary image. No algorithm tested here did better than that, which highlights a possible problem to be solved by future proposals.

5. Conclusions

Document binarization is a key step in many document processing pipelines, demanding both quality and time performance. This paper analyses the performance of 68 binarization algorithms on images acquired using six different models of smartphones from three different manufacturers, all widely used today. The quality, size and processing time of the binarization algorithms are assessed. A novel quality measure is proposed that combines the Levenshtein distance with the overall visual quality of the binary image. The mean compression rate of the TIFF_G4 file with RLE compression was also analyzed; it also provides a quality indication, as the quantity of salt-and-pepper noise in the final image degrades the file compression performance.
The results were presented from two perspectives: a detailed evaluation considering the device, the in-built strobe flash state (on or off), and the printing technology (deskjet, laser, or offset); and a device-based evaluation considering the visual quality and the compressed binary image file size.
Several conclusions may be drawn from the presented results:
  • Keeping the strobe flash on or off may not imply a better quality image, but one needs to make the right choice of binarization algorithm in order to obtain the best monochromatic image.
  • The ranking order is nearly completely different across all the different possible setups, which reinforces the claim that no binarization algorithm is good for all document images.
  • The quality of the images yielded by the top-rated algorithms on the offset-printed documents (book) dataset is almost perfect when considering the OCR transcription precision.
  • In several cases, as for the Apple iPhone SE 2, some global algorithms had the best performance. They are much faster than the newer algorithms and, in some rare cases, even generate cleaner images (better $P_L$).
  • Even when not at the top of the rank, newer algorithms such as the Michalak or YinYang algorithms and their variants are dominant in the results. It is important to stress that they were developed targeting photographed documents, while most of the other algorithms, above all the global ones, were developed aiming at scanned document images.
  • If the compression rate is a priority, YinYang22, with any of the input versions of the image, would be the most recommended algorithm overall, as it offers the best compression rates while maintaining high quality.
  • If processing time is a priority, Michalak21a with the red channel would be the most recommended algorithm overall, as it requires a small processing time, comparable to that of the classical algorithms, while providing high-quality binary images.
  • This paper also shows that the $P_L$ measure provides a better overall quality evaluation of binarization algorithms.
  • Analyzing the TIFF_G4 compression rate with RLE has also proved valuable as, on several occasions, two algorithms provided similar quality results, but one was up to two times more efficient under this compression scheme.
  • None of the tested algorithms could perfectly binarize the regions of the laser-printed documents in which the strobe flash (whenever on) created strong noise in the central region of the image, which suggests that such a setup should be avoided when photographing laser printed documents.
The recent paper [18] changes the outlook from the document to the device: if one had to embed one binarization algorithm in an application handling document images, which would it be? That algorithm would have to be light and fast enough to yield a good quality-space-time performance. Following that approach and looking at Table 4, one could recommend the following algorithms for each device:
Samsung Note 10+: 
YinYang22-R, Yasin-R, Michalak-R or HuangUNet-B.
Samsung S21 Ultra 5G: 
ElisaTV-R, YinYang22-R, Michalak21a-R or Singh-B.
Motorola G9: 
Michalak21a-R, Michalak-R, YinYang-R, ElisaTV-R.
Samsung A10S: 
Michalak21a-R, YinYang22-R, Wolf-R, Singh-B.
Samsung S20: 
Michalak21c-R, Michalak-R, YinYang22-R, YinYang-R.
Apple iPhone SE 2: 
Yasin-R, YinYang22-R, YinYang21-R, Singh-B.
No doubt the list above may suffer variations, as visual inspection carries some degree of subjectivity and the time performances are of around the same order of magnitude.
The authors of this paper recently became aware of reference [73], in which the authors look at the impact of color-to-gray conversion algorithms on binarization. Besides the binarization performance of the CIE Y (International Commission on Illumination luminance channel) color-to-gray conversion algorithm (assessed here), reference [73] looks at five other algorithms. It proposes two new schemes focusing on the quality of the final monochromatic image and makes a global assessment on scanned documents. The analysis of the performance of such color-to-gray conversion algorithms on photographed documents is left for future work.
Another important point left as a line for further work is setting the in-built strobe flash to auto mode, which means that the device itself decides, depending mostly on the quantity of light in the environment, whether the flash will be activated or not.

Author Contributions

Conceptualization: R.D.L.; methodology: R.D.L.; writing—original draft preparation: all authors; writing—review and editing: all authors; funding acquisition: R.B. and R.D.L. All authors have read and agreed to the published version of the manuscript.

Funding

The research reported in this paper was partly sponsored by The MEC Essay Project—Automatically Assessing Handwritten Essays in Portuguese from the Ministry of Education of Brazil and the RD&I project Callidus Academy signed between the Universidade do Estado do Amazonas (UEA) and Callidus Indústria through the Lei de Informática/SUFRAMA. Rafael Dueire Lins was also partly sponsored by CNPq—Brazil.

Data Availability Statement

The results presented here made use of the IAPR (International Association on Pattern Recognition) DIB—Document Image Binarization dataset, available at: https://dib.cin.ufpe.br, accessed on 17 January 2023.

Acknowledgments

The authors are grateful to all researchers who made the code for their binarization algorithms available.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Doermann, D.; Liang, J.; Li, H. Progress in Camera-Based Document Image Analysis. In Proceedings of the Seventh International Conference on Document Analysis and Recognition, Edinburgh, UK, 6 August 2003; Volume 1, pp. 606–616. [Google Scholar] [CrossRef]
  2. Silva, A.R.G.; Lins, R.D. Background Removal of Document Images Acquired Using Portable Digital Cameras. In Image Analysis and Recognition; Springer: Berlin/ Heidelberg, Germany, 2005; Volume 3656, pp. 278–285. [Google Scholar] [CrossRef]
  3. Lins, R.D.; Silva, G.E.; Gomes e Silva, A.R. Assessing and Improving the Quality of Document Images Acquired with Portable Digital Cameras. In Proceedings of the Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), Curitiba, Brazil, 23–26 September 2007; Volume 2, pp. 569–573. [Google Scholar] [CrossRef]
  4. Silva, G.P.; Lins, R.D. PhotoDoc: A Toolbox for Processing Document Images Acquired Using Portable Digital Cameras. In Proceedings of the CBDAR 2007, Curitiba, Brazil, 22 September 2007; pp. 107–114. [Google Scholar]
  5. Lins, R.D.; Avila, B.T. A New Algorithm for Skew Detection in Images of Documents. Int. Conf. Image Anal. Recognit. 2004, 3212, 234–240. [Google Scholar] [CrossRef]
  6. Godse, S.P.; Nimbhore, S.; Shitole, S.; Katke, D.; Kasar, P. Recovery of Badly Degraded Document Images Using Binarization Technique. Int. J. Sci. Res. Publ. 2014, 4, 433–438. [Google Scholar]
  7. Lins, R.D. A Taxonomy for Noise in Images of Paper Documents—The Physical Noises. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Heidelberg/Berlin, Germany, 2009; Volume 5627, pp. 844–854. [Google Scholar] [CrossRef]
  8. Dueire Lins, R.; Guimarães Neto, M.; França Neto, L.; Galdino Rosa, L. An environment for processing images of historical documents. Microprocess. Microprogram. 1994, 40, 939–942. [Google Scholar] [CrossRef]
  9. Lins, R.D.; Kavallieratou, E.; Smith, E.B.; Bernardino, R.B.; de Jesus, D.M. ICDAR 2019 Time-Quality Binarization Competition. In Proceedings of the 2019 15th IAPR International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, 20–25 September 2019; pp. 1539–1546. [Google Scholar] [CrossRef]
  10. Lins, R.D.; Bernardino, R.B.; Smith, E.B.; Kavallieratou, E. ICDAR 2021 Competition on Time-Quality Document Image Binarization. In Proceedings of the ICDAR 2021 Competition on Time-Quality Document Image Binarization, Lausanne, Switzerland, 5–10 September 2021; pp. 708–722. [Google Scholar] [CrossRef]
  11. Lins, R.D.; Simske, S.J.; Bernardino, R.B. DocEng’2020 Time-Quality Competition on Binarizing Photographed Documents. In Proceedings of the ACM Symposium on Document Engineering, DocEng, San Jose, CA, USA, 1 October 2020; ACM: New York, NY, USA, 2020; pp. 1–4. [Google Scholar] [CrossRef]
  12. Lins, R.D.; Simske, S.J.; Bernardino, R.B. Binarisation of Photographed Documents Image Quality and Processing Time Assessment. In Proceedings of the 21st ACM Symposium on Document Engineering, Limerick, Ireland, 24–27 August 2021; Volume 1, pp. 1–6. [Google Scholar] [CrossRef]
  13. Mello, C.A.B.; Lins, R.D. Image Segmentation of Historical Documents. Visual 2000, 2000, 30. [Google Scholar]
  14. Mello, C.A.B.; Lins, R.D. Generation of Images of Historical Documents by Composition. In Proceedings of the 2002 ACM Symposium on Document Engineering, McLean, VA, USA, 8–9 November 2002; pp. 127–133. [Google Scholar] [CrossRef]
  15. Robinson, A.; Cherry, C. Results of a Prototype Television Bandwidth Compression Scheme. Proc. IEEE 1967, 55, 356–364. [Google Scholar] [CrossRef]
  16. Lins, R.D.; Bernardino, R.B.; Barboza, R.d.S.; Simske, S.J. Binarization of Photographed Documents Image Quality, Processing Time and Size Assessment. In Proceedings of the 22nd ACM Symposium on Document Engineering, San Jose, CA, USA, 20–23 September 2022; pp. 1–10. [Google Scholar] [CrossRef]
  17. Lins, R.D.; Bernardino, R.B.; da Silva Barboza, R.; Lins, Z.D. Direct Binarization a Quality-and-Time Efficient Binarization Strategy. In Proceedings of the 21st ACM Symposium on Document Engineering; ACM: New York, NY, USA, 2021; Volume 1, pp. 1–4. [Google Scholar] [CrossRef]
  18. Lins, R.D.; Bernardino, R.B.; Barboza, R.; Oliveira, R. The Winner Takes It All: Choosing the “Best” Binarization Algorithm for Photographed Documents. In Document Analysis Systems; Springer International Publishing: La Rochelle, France, 2022; Volume 13237, pp. 48–64. [Google Scholar] [CrossRef]
  19. Lins, R.D.; Simske, S.J.; Bernardino, R.B. DocEng’2021 Time-Quality Competition on Binarizing Photographed Documents. In Proceedings of the ACM Symposium on Document Engineering, Limerick, Ireland, 24–27 August 2021. [Google Scholar]
  20. Doyle, W. Operations Useful for Similarity-Invariant Pattern Recognition. J. ACM 1962, 9, 259–267. [Google Scholar] [CrossRef]
  21. Zack, G.W.; Rogers, W.E.; Latt, S.A. Automatic Measurement of Sister Chromatid Exchange Frequency. J. Histochem. Cytochem. 1977, 25, 741–753. [Google Scholar] [CrossRef] [PubMed]
  22. Otsu, N. A Threshold Selection Method from Gray-Level Histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66. [Google Scholar] [CrossRef]
  23. Velasco, F.R. Thresholding Using the Isodata Clustering Algorithm; Technical Report; University of Maryland: College Park, MD, USA, 1979. [Google Scholar] [CrossRef]
  24. Pun, T. Entropic Thresholding, a New Approach. Comput. Graph. Image Process. 1981, 16, 210–239. [Google Scholar] [CrossRef] [Green Version]
  25. Johannsen, G.; Bille, J. A Threshold Selection Method Using Information Measures. In Proceedings of the International Conference on Pattern Recognition, Munich, Germany, 19–22 October 1982; pp. 140–143. [Google Scholar]
  26. Kapur, J.; Sahoo, P.; Wong, A. A New Method for Gray-Level Picture Thresholding Using the Entropy of the Histogram. Comput. Vision Graph. Image Process. 1985, 29, 140. [Google Scholar] [CrossRef]
  27. Tsai, W.H. Moment-Preserving Thresolding: A New Approach. Comput. Vision Graph. Image Process. 1985, 29, 377–393. [Google Scholar] [CrossRef]
  28. Niblack, W. An Introduction to Digital Image Processing; Strandberg Publishing Company: Birkeroed, Denmark, 1985; pp. 115–116. [Google Scholar]
  29. Bernsen, J. Dynamic Thresholding of Gray-Level Images. In Proceedings of the International Conference on Pattern Recognition, Paris, France, 27 October 1986; pp. 1251–1255. [Google Scholar]
  30. Kittler, J.; Illingworth, J. Minimum Error Thresholding. Pattern Recognit. 1986, 19, 41–47. [Google Scholar] [CrossRef]
  31. Glasbey, C. An Analysis of Histogram-Based Thresholding Algorithms. Graph. Model. Image Process. 1993, 55, 532–537. [Google Scholar] [CrossRef]
  32. Shanbhag, A.G. Utilization of Information Measure as a Means of Image Thresholding. CVGIP Graph. Model. Image Process. 1994, 56, 414–419. [Google Scholar] [CrossRef]
  33. Huang, L.K.; Wang, M.J.J. Image Thresholding by Minimizing the Measures of Fuzziness. Pattern Recognit. 1995, 28, 41–51. [Google Scholar] [CrossRef]
  34. Yen, J.C.; Chang, F.J.C.S.; Yen, J.C.; Chang, F.J.; Chang, S. A New Criterion for Automatic Multilevel Thresholding. IEEE Trans. Image Process. 1995, 4, 370–378. [Google Scholar] [CrossRef] [PubMed]
  35. Sahoo, P.; Wilkins, C.; Yeager, J. Threshold Selection Using Renyi’s Entropy. Pattern Recognit. 1997, 30, 71–84. [Google Scholar] [CrossRef]
  36. Sauvola, J.; Seppanen, T.; Haapakoski, S.; Pietikainen, M. Adaptive Document Binarization. In Proceedings of the Fourth International Conference on Document Analysis and Recognition, Ulm, Germany, 18–20 August 1997; Volume 1, pp. 147–152. [Google Scholar] [CrossRef]
  37. Li, C.; Tam, P. An Iterative Algorithm for Minimum Cross Entropy Thresholding. Pattern Recognit. Lett. 1998, 19, 771–776. [Google Scholar] [CrossRef]
  38. Lu, W.; Songde, M.; Lu, H. An Effective Entropic Thresholding for Ultrasonic Images. In Proceedings of the Fourteenth International Conference on Pattern Recognition, Brisbane, Australia, 20–20 August 1998; Volume 2, pp. 1552–1554. [Google Scholar]
  39. Wolf, C.; Doermann, D. Binarization of Low Quality Text Using a Markov Random Field Model. In Proceedings of the 2002 International Conference on Pattern Recognition, Quebec City, QC, Canada, 11–15 August 2002; Volume 3, pp. 160–163. [Google Scholar] [CrossRef]
  40. Hadjadj, Z.; Meziane, A.; Cherfa, Y.; Cheriet, M.; Setitra, I. ISauvola: Improved Sauvola’s Algorithm for Document Image Binarization; Springer: Berlin/Heidelberg, Germany, 2016; Volume 3212, pp. 737–745. [Google Scholar] [CrossRef]
  41. Kavallieratou, E. A Binarization Algorithm Specialized on Document Images and Photos. ICDAR 2005, 2005, 463–467. [Google Scholar] [CrossRef]
  42. Kavallieratou, E.; Stathis, S. Adaptive Binarization of Historical Document Images. Proc. Int. Conf. Pattern Recognit. 2006, 3, 742–745. [Google Scholar] [CrossRef]
  43. Prewitt, J.M.S.; Mendelsohn, M.L. The Analysis of Cell Images. Ann. N. Y. Acad. Sci. 2006, 128, 1035–1053. [Google Scholar] [CrossRef]
  44. Silva, J.M.M.; Lins, R.D.; Rocha, V.C. Binarizing and Filtering Historical Documents with Back-to-Front Interference. In Proceedings of the 2006 ACM Symposium on Applied Computing, Dijon, France, 23–27 April 2006; pp. 853–858. [Google Scholar] [CrossRef]
  45. Bradley, D.; Roth, G. Adaptive Thresholding Using the Integral Image. J. Graph. Tools 2007, 12, 13–21. [Google Scholar] [CrossRef]
  46. Khurshid, K.; Siddiqi, I.; Faure, C.; Vincent, N. Comparison of Niblack Inspired Binarization Methods for Ancient Documents. In Proceedings of the SPIE 7247, Document Recognition and Retrieval XVI, San Jose, CA, USA, 19 January 2009; p. 72470U. [Google Scholar] [CrossRef]
  47. Barney Smith, E.H.; Likforman-Sulem, L.; Darbon, J. Effect of Pre-Processing on Binarization. In Proceedings of the SPIE, Document Recognition and Retrieval XVII, San Jose, CA, USA, 18 January 2010; Volume 7534, p. 75340H. [Google Scholar] [CrossRef]
  48. Lu, S.; Su, B.; Tan, C.L. Document Image Binarization Using Background Estimation and Stroke Edges. Int. J. Doc. Anal. Recognit. (IJDAR) 2010, 13, 303–314. [Google Scholar] [CrossRef]
  49. Bataineh, B.; Abdullah, S.N.H.S.; Omar, K. An Adaptive Local Binarization Method for Document Images Based on a Novel Thresholding Method and Dynamic Windows. Pattern Recognit. Lett. 2011, 32, 1805–1813. [Google Scholar] [CrossRef]
  50. Singh, T.R.; Roy, S.; Singh, O.I.; Sinam, T.; Singh, K.M. A New Local Adaptive Thresholding Technique in Binarization. IJCSI Int. J. Comput. Sci. Issues 2011, 8, 271–277. [Google Scholar] [CrossRef]
  51. Howe, N.R. Document Binarization with Automatic Parameter Tuning. Int. J. Doc. Anal. Recognit. (IJDAR) 2013, 16, 247–258. [Google Scholar] [CrossRef]
  52. Su, B.; Lu, S.; Tan, C.L. Robust Document Image Binarization Technique for Degraded Document Images. IEEE Trans. Image Process. 2013, 22, 1408–1417. [Google Scholar] [CrossRef]
  53. Saddami, K.; Munadi, K.; Muchallil, S.; Arnia, F. Improved Thresholding Method for Enhancing Jawi Binarization Performance. In Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan, 9–15 November 2017; Volume 1, pp. 1108–1113. [Google Scholar] [CrossRef]
  54. Saddami, K.; Afrah, P.; Mutiawani, V.; Arnia, F. A New Adaptive Thresholding Technique for Binarizing Ancient Document. In Proceedings of the 2018 Indonesian Association for Pattern Recognition International Conference (INAPR), Jakarta, Indonesia, 7–8 September 2018; pp. 57–61. [Google Scholar] [CrossRef]
  55. Zhou, L.; Zhang, C.; Wu, M. D-Linknet: Linknet with Pretrained Encoder and Dilated Convolution for High Resolution Satellite Imagery Road Extraction. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar] [CrossRef]
  56. Gattal, A.; Abbas, F.; Laouar, M.R. Automatic Parameter Tuning of K-Means Algorithm for Document Binarization. In Proceedings of the 7th International Conference on Software Engineering and New Technologies—ICSENT, Hammamet, Tunisie, 26–28 December 2018; ACM Press: New York, NY, USA, 2018; pp. 1–4. [Google Scholar] [CrossRef]
  57. Jia, F.; Shi, C.; He, K.; Wang, C.; Xiao, B. Degraded Document Image Binarization Using Structural Symmetry of Strokes. Pattern Recognit. 2018, 74, 225–240. [Google Scholar] [CrossRef]
  58. Mustafa, W.A.; Abdul Kader, M.M.M. Binarization of Document Image Using Optimum Threshold Modification. J. Phys. Conf. Ser. 2018, 1019, 012022. [Google Scholar] [CrossRef] [Green Version]
  59. Akbari, Y.; Britto, A.S., Jr.; Al-Maadeed, S.; Oliveira, L.S. Binarization of Degraded Document Images Using Convolutional Neural Networks Based on Predicted Two-Channel Images. In Proceedings of the International Conference on Document Analysis and Recognition, Sydney, NSW, Australia, 20–25 September 2019. [Google Scholar]
  60. Saddami, K.; Munadi, K.; Away, Y.; Arnia, F. Effective and Fast Binarization Method for Combined Degradation on Ancient Documents. Heliyon 2019, 5, e02613. [Google Scholar] [CrossRef]
  61. Calvo-Zaragoza, J.; Gallego, A.J. A Selectional Auto-Encoder Approach for Document Image Binarization. Pattern Recognit. 2019, 86, 37–47. [Google Scholar] [CrossRef]
  62. He, S.; Schomaker, L. DeepOtsu: Document Enhancement and Binarization Using Iterative Deep Learning. Pattern Recognit. 2019, 91, 379–390. [Google Scholar] [CrossRef]
  63. Michalak, H.; Okarma, K. Fast Binarization of Unevenly Illuminated Document Images Based on Background Estimation for Optical Character Recognition Purposes. J. Univers. Comput. Sci. 2019, 25, 627–646. [Google Scholar]
  64. Michalak, H.; Okarma, K. Improvement of Image Binarization Methods Using Image Preprocessing with Local Entropy Filtering for Alphanumerical Character Recognition Purposes. Entropy 2019, 21, 562. [Google Scholar] [CrossRef]
  65. Michalak, H.; Okarma, K. Adaptive Image Binarization Based on Multi-layered Stack of Regions. In Proceedings of the Computer Analysis of Images and Patterns, Salerno, Italy, 3–5 September 2019; Springer International Publishing: Cham, Switzerland, 2019; Volume 11679, pp. 281–293. [Google Scholar]
  66. Souibgui, M.A.; Kessentini, Y. DE-GAN: A Conditional Generative Adversarial Network for Document Enhancement. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 1180–1191. [Google Scholar] [CrossRef]
  67. Bera, S.K.; Ghosh, S.; Bhowmik, S.; Sarkar, R.; Nasipuri, M. A Non-Parametric Binarization Method Based on Ensemble of Clustering Algorithms. Multimed. Tools Appl. 2021, 80, 7653–7673. [Google Scholar] [CrossRef]
  68. Xiong, W.; Zhou, L.; Yue, L.; Li, L.; Wang, S. An Enhanced Binarization Framework for Degraded Historical Document Images. EURASIP J. Image Video Process. 2021, 2021, 13. [Google Scholar] [CrossRef]
  69. Xiong, W.; Yue, L.; Zhou, L.; Wei, L.; Li, M. FD-Net: A Fully Dilated Convolutional Network for Historical Document Image Binarization. In Pattern Recognition and Computer Vision; Ma, H., Wang, L., Zhang, C., Wu, F., Tan, T., Wang, Y., Lai, J., Zhao, Y., Eds.; Springer International Publishing: Cham, Switzerland, 2021; Volume 13019, pp. 518–529. [Google Scholar] [CrossRef]
  70. Lins, R.D.; Bernardino, R.B.; de Jesus, D.M.; Oliveira, J.M. Binarizing Document Images Acquired with Portable Cameras. In Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan, 9–15 November 2017; pp. 45–50. [Google Scholar] [CrossRef]
  71. Levenshtein, V.I. Binary Codes Capable of Correcting Deletions, Insertions, and Reversals. Sov. Phys. Dokl. 1966, 10, 707–710. [Google Scholar]
  72. Gatos, B.; Ntirogiannis, K.; Pratikakis, I. ICDAR 2009 Document Image Binarization Contest (DIBCO 2009). In Proceedings of the 2009 10th International Conference on Document Analysis and Recognition, Barcelona, Spain, 26–29 July 2009; pp. 1375–1382. [Google Scholar] [CrossRef]
  73. Hedjam, R.; Nafchi, H.Z.; Kalacska, M.; Cheriet, M. Influence of Color-to-Gray Conversion on the Performance of Document Image Binarization: Toward a Novel Optimization Problem. IEEE Trans. Image Process. 2015, 24, 3637–3651. [Google Scholar] [CrossRef]
Figure 1. $P_{err}$ measure example (GT: ground-truth, bin: binary).
Figure 2. Comparison between the different measures: $P_L$, $[L_{dist}]$, $P_{err}$. For each case, the full image is shown on top and an example region below, where the red boxes indicate the crop position for the example region. (a) Original image; (b) Ranking by $P_{err}$ only, DiegoPavan-C binarized image; (c) Ranking by $[L_{dist}]$ only, dSLR-C binarized image; (d) Ranking by the $P_L$ measure, Yasin-R binarized image.
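The transcription-based measure [L_dist] compared in Figure 2 builds on the Levenshtein edit distance [71] between the OCR output of a binarized image and the ground-truth text. The sketch below assumes a simple normalization in which 1.0 means a perfect transcription, consistent with the values reported in Tables 7 and 8; the exact normalization used in the paper may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions) [71]."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def transcription_similarity(ocr_text: str, ground_truth_text: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means the OCR output matches the ground truth exactly.

    Only one plausible normalization of the Levenshtein distance; the paper's
    [L_dist] measure is not necessarily defined this way.
    """
    if not ocr_text and not ground_truth_text:
        return 1.0
    dist = levenshtein(ocr_text, ground_truth_text)
    return 1.0 - dist / max(len(ocr_text), len(ground_truth_text))
```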
Figure 3. Example of ranking by the quality-time criteria. The algorithms are first sorted by quality ([L_dist]) and then by time. The red and blue boxes highlight that the first two algorithms have the same quality results and thus are sorted separately from the other four.
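The quality-time rule of Figure 3 amounts to an ordinary two-key sort: quality first, processing time as a tie-breaker. The sketch below is a minimal illustration of that rule; the Result type and the numbers in the usage example are hypothetical.

```python
from typing import NamedTuple

class Result(NamedTuple):
    algorithm: str
    quality: float   # e.g., [L_dist] or P_L; higher is better
    time_s: float    # processing time in seconds; lower is better

def rank_quality_time(results: list[Result]) -> list[Result]:
    """Order results by quality first; algorithms with identical quality are ordered by time."""
    return sorted(results, key=lambda r: (-r.quality, r.time_s))

# Toy usage (hypothetical numbers):
ranked = rank_quality_time([
    Result("Alg-A", 0.971, 3.4),
    Result("Alg-B", 0.971, 0.2),   # ties with Alg-A on quality, wins on time
    Result("Alg-C", 0.968, 0.1),
])
```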
Figure 4. Example of sorting by the ranking summation criterion. The algorithm marked in red (Moments-R) is the overall best according to this criterion.
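The ranking summation criterion of Figure 4 adds up, for each algorithm, its position in several individual rankings and declares the smallest total the overall winner. The function below is a schematic reading of that rule; the input format (one ordered list of algorithm names per evaluation scenario, best first) is an assumption made for illustration.

```python
from collections import defaultdict

def ranking_summation(rankings: list[list[str]]) -> list[tuple[str, int]]:
    """Sum each algorithm's position over several rankings; the lowest total wins.

    `rankings` holds one ordering (best first) per evaluation scenario.
    Returns (algorithm, total) pairs sorted from best to worst overall.
    """
    totals: dict[str, int] = defaultdict(int)
    for ranking in rankings:
        for position, algorithm in enumerate(ranking, start=1):
            totals[algorithm] += position
    return sorted(totals.items(), key=lambda item: item[1])
```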
Figure 5. Dataset 1 example images. The red boxes indicate the crop region for the zoomed example next to each image. (a) Samsung Note 10+, book offset page, strong natural light, flash off with strong shadow, binarized by HuangUNet-B; (b) Samsung S21, laser printed, artificial light, medium shadow, flash off, binarized by Wolf-R; (c) Same as (b), but with flash on and binarized by YinYang22-R.
Figure 6. Dataset 2 example images. The red boxes indicate the crop region for the zoomed example next to each image. (a) Apple iPhone SE 2, book offset page, artificial light, flash off with medium shadow; (b) Samsung S20, deskjet printed, artificial light, medium shadow, flash off; (c) Same as (b), but with flash on; note that on deskjet-printed pages the flash reflection does not interfere with the photo.
Table 1. Summary of the devices' camera specifications.
Specification | Samsung N10+ | Samsung S21U | Moto. G9 Plus | Samsung A10 | Samsung S20 | iPhone SE2
Megapixels | 16 | 12 | 12 | 13 | 12 | 12
Aperture | F 1.5–2.4 | F 1.5 | F 1.8 | F 1.9 | F 1.8 | F 1.8
Sensor size | 1/2.55 inch | 1/1.8 inch | 1/1.73 inch | - | 1/2.55 inch | 1/3 inch
Pixel size | - | 1.4 μm | 1.4 μm | - | 1.4 μm | 1.4 μm
Release year | 2019 | 2021 | 2020 | 2020 | 2020 | 2020
Camera count | 3 | 4 | 4 | 2 | 3 | 1
Table 2. Tested binarization algorithms.
Method | Year | Category | Description
Percentile [20] | 1962 | Global threshold | Based on partial sums of the histogram levels
Triangle [21] | 1977 | Global threshold | Based on the most and least frequent gray levels
Otsu [22] | 1979 | Global threshold | Maximizes the between-cluster variance of pixel intensity
IsoData [23] | 1980 | Global threshold | IsoData clustering algorithm applied to the image histogram
Pun [24] | 1981 | Global threshold | Defines an anisotropy coefficient related to the asymmetry of the histogram
Johannsen-Bille [25] | 1982 | Global threshold | Minimizes a formula based on the image entropy
Kapur-SW [26] | 1985 | Global threshold | Maximizes a formula based on the image entropy
Moments [27] | 1985 | Global threshold | Aims to preserve the moments of the input picture
Niblack [28] | 1985 | Local threshold | Based on the window mean and standard deviation
Bernsen [29] | 1986 | Local threshold | Uses local image contrast to choose the threshold
MinError [30] | 1986 | Global threshold | Minimum error threshold
Mean [31] | 1993 | Global threshold | Mean of the grayscale levels
Shanbhag [32] | 1994 | Global threshold | Improves Kapur-SW by viewing the two pixel classes as fuzzy sets
Huang [33] | 1995 | Global threshold | Minimizes the measures of fuzziness
Yen [34] | 1995 | Global threshold | Multilevel threshold based on a maximum correlation criterion
RenyEntropy [35] | 1997 | Global threshold | Uses Renyi's entropy similarly to the Kapur-SW method
Sauvola [36] | 1997 | Local threshold | Improvement on Niblack
Li-Tam [37] | 1998 | Global threshold | Minimum cross entropy
Wu-Lu [38] | 1998 | Global threshold | Minimizes the difference between the entropy of the object and the background
Mello-Lins [13] | 2000 | Global threshold | Uses Shannon entropy to determine the global threshold; possibly the first to properly handle back-to-front interference
Wolf [39] | 2002 | Local threshold | Improvement on Sauvola with global normalization
ISauvola [40] | 2004 | Local threshold | Uses image contrast in combination with Sauvola's binarization
Ergina-Global [41] | 2005 | Global threshold | Average color value and histogram equalization
Ergina-Local [42] | 2006 | Local threshold | Detects where to apply local thresholding after applying a global one
Intermodes [43] | 2006 | Global threshold | Smooths the histogram until only two local maxima remain
Minimum [43] | 2006 | Global threshold | Variation of the Intermodes algorithm
dSLR [44] | 2006 | Global threshold | Uses Shannon entropy to find a global threshold
Bradley [45] | 2007 | Local threshold | Adaptive thresholding using the integral image of the input
Nick [46] | 2009 | Local threshold | Adapts Niblack based on the global mean
ElisaTV [47] | 2010 | Local threshold | Background estimation and subtraction
Lu-Su [48] | 2010 | Edge based | Local thresholding near edges after background removal
Bataineh [49] | 2011 | Local threshold | Based on local and global statistics
Singh [50] | 2011 | Global threshold | Uses the integral sum image prior to local mean calculation
Howe [51] | 2013 | CRF Laplacian | Unary term and pairwise Canny-based term
Su-Lu [52] | 2013 | Edge based | Canny edges using local contrast
iNICK [53] | 2017 | Local threshold | Adaptively sets k in the Nick method based on the global standard deviation
CNW [54] | 2018 | Local threshold | Combination of Niblack's and Wolf's algorithms
DocDLinkNet [55] | 2018 | Deep Learning | D-LinkNet architecture with document image patches
Gattal [56] | 2018 | Clustering | Automatic parameter tuning of the K-Means algorithm
Jia-Shi [57] | 2018 | Edge based | Detects the symmetry of stroke edges
Robin (https://github.com/masyagin1998/robin, accessed on 17 January 2023) | 2018 | Edge based | U-net model trained with several datasets
WAN [58] | 2018 | Global threshold | Improves Sauvola's method by shifting up the threshold
Akbari_1 [59] | 2019 | Deep Learning | SegNet network architecture fed by multichannel images (wavelet sub-bands)
Akbari_2 [59] | 2019 | Deep Learning | Variation of Akbari_1 with multiple networks
Akbari_3 [59] | 2019 | Deep Learning | Variation of Akbari_1 where fewer channels are used
CLD [60] | 2019 | Local threshold | Contrast enhancement followed by adaptive thresholding and artifact removal
Calvo-Zaragoza [61] | 2019 | Deep Learning | Fully convolutional encoder–decoder (FCN) with residual blocks
DeepOtsu [62] | 2019 | Deep Learning | Neural networks learn the degradations and global Otsu generates the binarization map
DocUNet [9] | 2019 | Deep Learning | Hybrid pyramid U-Net convolutional network fed with morphological bottom-hat transform enhanced document images
Michalak21a [63] | 2019 | Image Processing | Downsamples the image to remove low-frequency information and applies Otsu
Michalak21b [64] | 2019 | Image Processing | Equalizes illumination and contrast, applies morphological dilation and Bradley's method
Michalak21c [65] | 2019 | Local threshold | Average brightness corrected by two parameters to apply a local threshold
Michalak [63] | 2019 | Image Processing | Downsamples the image to remove low-frequency information and applies Otsu
Yasin [9] | 2019 | Image Processing | Gradient descent optimization followed by Otsu thresholding
Yuleny [9] | 2019 | Shallow ML | An XGBoost classifier trained with features generated from the Otsu, Niblack, Sauvola, Su and Howe algorithms
DiegoPavan [66] | 2020 | Deep Learning | Downscales the image to feed a DE-GAN network
DilatedUNet [11] | 2020 | Deep Learning | Downsamples to smooth the image and uses a dilated convolutional layer to correct the feature-map spatial resolution
YinYang [11] | 2020 | Image Processing | Detects the background with the median of small overlapping windows, extracts it and applies Otsu
YinYang21 [11] | 2020 | Image Processing | A faster and more effective version of the YinYang algorithm
DE-GAN [66] | 2020 | Deep Learning | Uses a conditional generative adversarial network
Gosh [67] | 2021 | Clustering | Clustering applied to a superset of the foreground estimated by Niblack's algorithm
HuangBCD [10] | 2021 | Deep Learning | BCD-UNet-based model to binarize and combine image patches
HuangUNet [10] | 2021 | Deep Learning | UNet-based model to binarize and combine image patches
Vahid [10] | 2021 | Deep Learning | A pixel-wise segmentation model based on ResNet50-UNet
HBUT [68] | 2021 | Image Processing | Morphological operations using a minimum entropy-based stroke width transform and Laplacian energy-based segmentation
DPLinkNet [69] | 2021 | Deep Learning | Fully dilated convolutional network using atrous convolutions
Vahid22 [16] | 2022 | Deep Learning | Pixel-wise segmentation combining a CNN with a transformer model
YinYang22 [16] | 2022 | Image Processing | Uses the maximum color occurrence to detect and subtract the background, then normalizes and applies Otsu
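Several of the methods in Table 2 are refinements of Otsu's global threshold [22], which maximizes the between-class variance of the pixel intensities. The NumPy sketch below illustrates that criterion for an 8-bit grayscale image; it is provided for orientation only and is not the implementation evaluated in this paper.

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Global threshold maximizing between-class variance, as summarized for Otsu [22] in Table 2.

    `gray` is assumed to be an 8-bit (uint8) grayscale image.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                    # class-0 probability for each candidate threshold
    mu = np.cumsum(prob * np.arange(256))      # cumulative mean intensity
    mu_total = mu[-1]
    # Between-class variance; degenerate thresholds (an empty class) yield 0/0 and are zeroed out.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b, nan=0.0, posinf=0.0, neginf=0.0)
    return int(np.argmax(sigma_b))

def binarize(gray: np.ndarray) -> np.ndarray:
    """Foreground (ink) = pixels darker than or equal to the Otsu threshold."""
    return gray <= otsu_threshold(gray)
```

Local methods such as Niblack [28] and Sauvola [36], also listed in Table 2, replace this single global search by a threshold computed from the mean and standard deviation of a window around each pixel.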
Table 3. Example of the choice of channel for some of the best-performing algorithms.
Team | Best Channel | Best Channel: Score / Mean P_L | Color Image: Score / Mean P_L | Luminance: Score / Mean P_L
Michalak21a | Red | 632 / 96.10 | 817 / 96.11 | 727 / 96.16
YinYang22 | Red | 649 / 93.03 | 825 / 93.42 | 687 / 93.42
Singh | Blue | 658 / 96.14 | 846 / 95.42 | 694 / 94.98
Wolf | Red | 635 / 94.53 | 844 / 93.09 | 687 / 95.07
Sauvola | Red | 644 / 93.37 | 897 / 90.37 | 650 / 93.03
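Table 3 indicates that feeding a single channel (often the red one) to an algorithm can outperform using the full color image or the luminance. The sketch below extracts those candidate inputs from an RGB photograph; the ITU-R BT.601 luminance weights are an assumption made here for illustration, and the channel suffixes appended to the algorithm names in Tables 4–8 (e.g., -R, -B, -L, -C) appear to indicate which image version was used.

```python
import numpy as np

def candidate_channels(rgb: np.ndarray) -> dict[str, np.ndarray]:
    """Grayscale candidates a binarization algorithm could be fed with, as compared in Table 3.

    `rgb` is an H x W x 3 uint8 array. The luminance weights below are the
    common ITU-R BT.601 coefficients; the paper may use a different conversion.
    """
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    luminance = (0.299 * r + 0.587 * g + 0.114 * b).astype(np.uint8)
    return {"R": r, "G": g, "B": b, "L": luminance}
```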
Table 4. Overall results by capturing device sorted according to the ranking summation criterion.
FLASH OFF | FLASH ON
Rank | Algorithm | Score | P_L | CR_G4 | Time (s) | Algorithm | Score | P_L | CR_G4 | Time (s)
Dataset 1
Samsung Note 10+
1 | HuangUNet-B | 245 | 96.46 | 75.22% | 58.67 | YinYang22-R | 261 | 96.43 | 79.99% | 5.85
2 | YinYang22-R | 263 | 96.25 | 80.25% | 6.50 | HuangUNet-B | 266 | 96.37 | 74.79% | 58.05
3 | Yasin-R | 263 | 96.18 | 65.60% | 1.90 | ElisaTV-R | 315 | 95.79 | 47.36% | 8.82
4 | iNICK-R | 266 | 96.11 | 49.26% | 3.46 | HuangBCD-R | 321 | 96.04 | 74.88% | 249.90
5 | Michalak-R | 283 | 96.22 | 49.17% | 0.06 | Yasin-R | 329 | 95.65 | 64.91% | 1.76
Samsung S21 Ultra 5G
1 | ElisaTV-R | 235 | 96.30 | 47.81% | 10.38 | YinYang22-R | 273 | 91.36 | 80.20% | 5.54
2 | YinYang22-R | 243 | 96.13 | 80.05% | 6.36 | Michalak21a-R | 276 | 95.98 | 48.40% | 0.04
3 | Yasin-R | 265 | 95.95 | 65.02% | 1.78 | Singh-B | 285 | 95.45 | 76.03% | 0.34
4 | Michalak21a-R | 269 | 91.51 | 48.02% | 0.05 | Nick-R | 286 | 95.26 | 76.07% | 0.16
5 | Singh-B | 289 | 94.34 | 75.68% | 0.32 | ElisaTV-R | 310 | 95.74 | 48.06% | 10.07
Dataset 2
Motorola G9
1 | Michalak21a-R | 218 | 96.92 | 47.51% | 0.05 | Gattal-R | 138 | 97.23 | 63.09% | 53.09
2 | ElisaTV-R | 230 | 96.75 | 45.83% | 12.47 | Michalak21a-R | 150 | 97.26 | 47.83% | 0.05
3 | Michalak-R | 230 | 96.88 | 47.51% | 0.05 | YinYang-R | 164 | 97.23 | 78.48% | 1.81
4 | YinYang21-R | 231 | 96.83 | 69.14% | 1.71 | ElisaTV-R | 181 | 97.18 | 47.18% | 12.21
5 | Michalak21c-R | 231 | 96.90 | 46.71% | 1.48 | YinYang21-R | 214 | 97.12 | 69.33% | 1.64
Samsung A10S
1 | YinYang22-R | 232 | 97.08 | 80.84% | 4.63 | Wolf-R | 140 | 97.24 | 75.19% | 0.16
2 | Michalak21a-R | 247 | 97.03 | 44.06% | 0.03 | Singh-B | 147 | 97.23 | 75.19% | 0.24
3 | Michalak-R | 248 | 97.01 | 44.13% | 0.03 | Yasin-R | 149 | 97.26 | 62.78% | 1.30
4 | Michalak21c-R | 265 | 96.99 | 44.07% | 0.84 | Michalak21a-R | 155 | 97.17 | 44.03% | 0.03
5 | YinYang21-R | 282 | 96.85 | 66.65% | 1.08 | Nick-R | 174 | 97.21 | 75.11% | 0.11
Samsung S20
1 | Michalak21c-R | 199 | 97.00 | 47.97% | 1.09 | Gattal-R | 170 | 97.20 | 63.78% | 52.14
2 | Michalak-R | 216 | 96.86 | 48.16% | 0.04 | Otsu-R | 189 | 97.11 | 75.93% | 0.02
3 | Michalak21a-R | 230 | 96.88 | 48.13% | 0.04 | YinYang-R | 210 | 97.08 | 77.29% | 1.42
4 | Bradley-R | 251 | 96.82 | 76.34% | 0.29 | YinYang22-R | 226 | 97.13 | 81.39% | 5.07
5 | YinYang-R | 266 | 96.82 | 78.03% | 1.45 | Li-Tam-R | 246 | 97.04 | 75.89% | 0.12
Apple iPhone SE 2
1 | Yasin-R | 156 | 95.44 | 63.18% | 1.59 | Otsu-R | 192 | 97.03 | 75.11% | 0.01
2 | Sauvola-R | 162 | 96.93 | 75.49% | 0.14 | YinYang22-R | 211 | 96.94 | 81.19% | 5.29
3 | Singh-B | 163 | 96.94 | 75.47% | 0.23 | Yasin-R | 229 | 96.89 | 62.80% | 1.40
4 | YinYang22-R | 167 | 96.87 | 81.32% | 5.51 | YinYang21-R | 235 | 96.88 | 67.15% | 1.14
5 | Nick-R | 173 | 96.90 | 75.46% | 0.14 | Gattal-R | 235 | 96.88 | 62.28% | 51.36
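The CR_G4 column of Table 4 is a size-related figure for the binarized output. The Pillow sketch below shows one plausible way such a figure could be obtained, assuming CR_G4 relates the size of the binary image compressed with CCITT Group 4 (TIFF) to a reference size; both the function names and that normalization are assumptions made for illustration, not the paper's exact procedure.

```python
import io
from PIL import Image

def g4_size_bytes(binary: Image.Image) -> int:
    """Size in bytes of the binarized page stored as a CCITT Group 4 compressed TIFF."""
    buffer = io.BytesIO()
    binary.convert("1").save(buffer, format="TIFF", compression="group4")
    return buffer.getbuffer().nbytes

def compression_rate(binary: Image.Image, reference_size_bytes: int) -> float:
    """Compressed size as a percentage of a reference size (an assumed reading of CR_G4)."""
    return 100.0 * g4_size_bytes(binary) / reference_size_bytes
```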
Table 5. Summary of results with the P_L measure and flash OFF, sorted according to the quality-time criteria.
DESKJET | LASER | BOOK
Rank | Algorithm | P_L | Time (s) | Algorithm | P_L | Time (s) | Algorithm | P_L | Time (s)
Dataset 1—Flash OFF
Samsung Note 10+
1 | iNICK-R | 96.47 | 3.48 | Sauvola-R | 96.59 | 0.19 | Vahid22-C | 98.41 | 29.22
2 | Sauvola-R | 96.07 | 0.19 | Nick-R | 96.58 | 0.19 | HuangUNet-B | 98.18 | 50.22
3 | Yasin-R | 95.99 | 1.77 | iNICK-R | 96.57 | 3.49 | CNW-R | 97.97 | 3.60
4 | Nick-R | 95.88 | 0.19 | Yasin-R | 96.50 | 1.94 | DPLinkNet-C | 97.87 | 9.10
5 | Singh-B | 95.78 | 0.40 | ElisaTV-R | 96.50 | 11.66 | DocDLink-C | 97.81 | 7.01
Samsung S21 Ultra 5G
1 | Sauvola-R | 96.59 | 0.19 | Wolf-R | 96.75 | 0.26 | Michalak-R | 97.78 | 0.04
2 | iNICK-R | 95.89 | 3.43 | Nick-R | 96.54 | 0.19 | CNW-R | 97.75 | 3.37
3 | Wolf-R | 95.81 | 0.25 | Singh-B | 96.45 | 0.38 | ElisaTV-R | 97.65 | 8.73
4 | Singh-B | 95.66 | 0.37 | Yasin-R | 96.22 | 1.85 | Vahid22-C | 97.45 | 29.14
5 | Nick-R | 95.62 | 0.18 | iNICK-R | 96.14 | 3.49 | Jia-Shi-R | 97.44 | 18.45
Dataset 2—Flash OFF
Motorola G9
1 | Nick-R | 96.20 | 0.21 | YinYang21-R | 96.52 | 1.67 | Michalak21b-R | 99.10 | 3.13
2 | iNICK-R | 95.63 | 3.53 | YinYang-R | 96.51 | 1.74 | Michalak21c-R | 99.06 | 1.48
3 | YinYang21-R | 95.56 | 1.73 | iNICK-R | 96.46 | 3.50 | CNW-R | 99.01 | 3.55
4 | Singh-B | 95.48 | 0.51 | Nick-R | 96.34 | 0.20 | Michalak-R | 98.99 | 0.05
5 | Yasin-R | 95.44 | 2.13 | Michalak21a-R | 96.28 | 0.05 | DPLinkNet-C | 98.86 | 11.86
Samsung A10S
1 | Sauvola-R | 96.31 | 0.12 | YinYang22-R | 96.70 | 4.59 | ISauvola-R | 99.14 | 0.31
2 | Singh-B | 96.23 | 0.26 | ElisaTV-R | 96.55 | 7.39 | Michalak21c-R | 98.97 | 0.84
3 | Nick-R | 96.15 | 0.12 | YinYang-R | 96.51 | 1.08 | Michalak-R | 98.80 | 0.03
4 | Yasin-R | 95.90 | 1.30 | Michalak21a-R | 96.41 | 0.03 | Vahid22-C | 98.80 | 17.47
5 | iNICK-R | 95.80 | 3.27 | YinYang21-R | 96.36 | 1.04 | WAN-R | 98.77 | 0.78
Samsung S20
1 | Nick-R | 96.10 | 0.15 | YinYang-R | 96.10 | 1.41 | Michalak21c-R | 99.10 | 1.04
2 | Singh-B | 95.83 | 0.34 | Michalak21c-R | 96.07 | 1.14 | DocUNet-L | 99.07 | 45.50
3 | iNICK-R | 95.63 | 3.35 | Michalak21a-R | 95.98 | 0.04 | Michalak-R | 99.06 | 0.04
4 | Yasin-R | 95.31 | 1.63 | Bradley-R | 95.98 | 0.31 | ISauvola-R | 99.05 | 0.38
5 | YinYang-R | 95.19 | 1.37 | Michalak-R | 95.95 | 0.04 | Bradley-R | 99.04 | 0.28
Apple iPhone SE 2
1 | Yasin-R | 95.51 | 1.67 | Yasin-R | 96.65 | 1.60 | Singh-B | 98.70 | 0.17
2 | Nick-R | 95.40 | 0.14 | YinYang22-R | 96.52 | 6.02 | YinYang21-R | 98.66 | 1.11
3 | Sauvola-R | 95.35 | 0.15 | ElisaTV-R | 96.50 | 7.38 | Sauvola-R | 98.59 | 0.12
4 | YinYang22-R | 95.31 | 5.76 | Nick-R | 96.37 | 0.16 | Wolf-R | 98.53 | 0.17
5 | iNICK-R | 95.30 | 3.31 | Sauvola-R | 96.28 | 0.16 | Nick-R | 98.42 | 0.12
Table 6. Summary of results with the P_L measure and flash ON, sorted according to the quality-time criteria.
DESKJET | LASER | BOOK
Rank | Algorithm | P_L | Time (s) | Algorithm | P_L | Time (s) | Algorithm | P_L | Time (s)
Dataset 1—Flash ON
Samsung Note 10+
1 | Sauvola-R | 96.25 | 0.19 | YinYang22-R | 96.69 | 6.35 | HuangUNet-B | 97.62 | 48.25
2 | Yasin-R | 96.07 | 1.98 | ElisaTV-R | 96.68 | 11.88 | Calvo-Z-R | 97.59 | 1.26
3 | Nick-R | 96.01 | 0.19 | Yasin-R | 96.65 | 1.82 | DocDLink-C | 97.29 | 6.55
4 | Singh-B | 95.94 | 0.37 | Sauvola-R | 96.60 | 0.20 | DocUNet-L | 97.27 | 39.87
5 | Yen-CC-C | 95.92 | 0.16 | YinYang21-R | 96.52 | 1.55 | Vahid22-C | 97.24 | 27.96
Samsung S21 Ultra 5G
1 | Nick-R | 96.11 | 0.18 | Singh-B | 96.66 | 0.41 | HuangBCD-R | 98.12 | 202.48
2 | Singh-B | 96.09 | 0.40 | Nick-R | 96.58 | 0.18 | WAN-R | 97.78 | 0.87
3 | Wolf-R | 95.68 | 0.25 | Michalak21a-R | 96.02 | 0.05 | HuangUNet-B | 97.65 | 47.00
4 | Michalak21a-R | 95.27 | 0.05 | Yasin-R | 95.97 | 1.91 | CNW-R | 97.62 | 3.35
5 | Yasin-R | 95.27 | 1.80 | YinYang21-R | 95.91 | 1.55 | DocDLink-C | 97.48 | 6.28
Dataset 2—Flash ON
Motorola G9
1 | Sauvola-R | 96.66 | 0.22 | Nick-R | 96.74 | 0.20 | Michalak21a-R | 99.29 | 0.05
2 | Nick-R | 96.08 | 0.21 | YinYang-R | 96.62 | 1.69 | ElisaTV-R | 99.28 | 11.42
3 | Singh-B | 95.81 | 0.49 | Gattal-R | 96.60 | 53.34 | Bradley-R | 99.24 | 0.35
4 | Wolf-R | 95.57 | 0.29 | Singh-B | 96.58 | 0.45 | Michalak21c-R | 99.15 | 1.30
5 | YinYang-R | 95.56 | 1.83 | YinYang21-R | 96.44 | 1.59 | Michalak-R | 99.06 | 0.05
Samsung A10S
1 | Sauvola-R | 96.23 | 0.12 | Nick-R | 96.40 | 0.11 | Wolf-R | 99.46 | 0.16
2 | Yasin-R | 95.68 | 1.25 | Yasin-R | 96.38 | 1.27 | Michalak21c-R | 99.41 | 0.80
3 | ElisaTV-R | 95.62 | 5.95 | YinYang-R | 96.18 | 1.05 | Michalak21a-R | 99.35 | 0.03
4 | Nick-R | 95.56 | 0.12 | Wolf-R | 96.12 | 0.16 | Singh-B | 99.32 | 0.23
5 | Singh-B | 95.56 | 0.25 | Singh-B | 96.12 | 0.25 | YinYang22-R | 99.20 | 4.47
Samsung S20
1 | Shanbhag-R | 96.36 | 0.13 | Sauvola-R | 96.67 | 0.16 | Ergina_L-L | 99.42 | 0.56
2 | Nick-R | 95.77 | 0.15 | Yasin-R | 96.66 | 1.59 | Michalak21c-R | 99.36 | 0.95
3 | Singh-B | 95.57 | 0.33 | Otsu-R | 96.57 | 0.02 | Michalak21a-R | 99.35 | 0.04
4 | Gattal-R | 95.30 | 52.04 | YinYang22-R | 96.51 | 5.27 | Bradley-R | 99.35 | 0.26
5 | Sauvola-R | 95.26 | 0.16 | Gattal-R | 96.49 | 52.64 | Ergina_G-L | 99.28 | 0.42
Apple iPhone SE 2
1 | ElisaTV-R | 96.11 | 3.18 | Otsu-R | 96.57 | 0.02 | YinYang21-R | 98.74 | 1.09
2 | Gattal-R | 95.93 | 51.76 | Nick-R | 96.55 | 0.15 | Ergina_G-L | 98.60 | 0.36
3 | Li-Tam-R | 95.87 | 0.12 | ElisaTV-R | 96.54 | 4.07 | YinYang-R | 98.58 | 1.34
4 | Nick-R | 95.83 | 0.15 | Singh-B | 96.53 | 0.26 | Ergina_L-L | 98.56 | 0.49
5 | Singh-B | 95.79 | 0.26 | YinYang22-R | 96.51 | 5.51 | YinYang22-R | 98.56 | 4.26
Table 7. Summary of results with the [L_dist] measure and flash OFF, sorted according to the quality-time criteria. Note that Google Vision (rank 0) is not a binarization algorithm, but an OCR platform.
DESKJET | LASER | BOOK
Rank | Algorithm | [L_dist] | Time (s) | Algorithm | [L_dist] | Time (s) | Algorithm | [L_dist] | Time (s)
Dataset 1—Flash OFF
Samsung Note 10+
0 | Google Vision | 0.971 | - | Google Vision | 0.971 | - | Google Vision | 0.984 | -
1 | HuangUNet-B | 0.971 | 64.271 | HuangUNet-B | 0.971 | 64.329 | iNICK-R | 0.990 | 3.421
2 | Michalak-R | 0.970 | 0.051 | Michalak-R | 0.970 | 0.051 | Vahid22-C | 0.990 | 29.224
3 | Nick-R | 0.970 | 0.188 | Michalak21a-R | 0.970 | 0.052 | Singh-B | 0.988 | 0.255
4 | Sauvola-R | 0.970 | 0.194 | Nick-R | 0.970 | 0.188 | Yasin-R | 0.986 | 1.967
5 | Bradley-R | 0.970 | 0.352 | Singh-B | 0.970 | 0.408 | HuangUNet-B | 0.986 | 50.216
Samsung S21 Ultra 5G
0 | Google Vision | 0.971 | - | Google Vision | 0.971 | - | Google Vision | 0.982 | -
1 | Jia-Shi-R | 0.971 | 22.391 | Wolf-R | 0.971 | 0.259 | Niblack-C | 0.988 | 0.133
2 | Wolf-R | 0.970 | 0.254 | CNW-R | 0.971 | 3.506 | ElisaTV-R | 0.986 | 8.726
3 | ISauvola-R | 0.970 | 0.453 | Jia-Shi-R | 0.971 | 22.470 | Michalak-R | 0.985 | 0.038
4 | WAN-R | 0.970 | 1.209 | Nick-R | 0.970 | 0.187 | Bradley-R | 0.984 | 0.266
5 | Michalak21c-R | 0.970 | 1.328 | Robin-L | 0.970 | 0.979 | WAN-R | 0.984 | 0.913
Dataset 2—Flash OFF
Motorola G9
0 | Google Vision | 0.000 | - | Google Vision | 0.000 | - | Google Vision | 0.001 | -
1 | Bradley-R | 0.968 | 0.401 | iNICK-R | 0.970 | 3.503 | WAN-R | 0.997 | 1.226
2 | CNW-R | 0.968 | 3.595 | ISauvola-R | 0.969 | 0.491 | CNW-R | 0.997 | 3.547
3 | YinYang22-R | 0.968 | 6.636 | YinYang21-R | 0.969 | 1.672 | Jia-Shi-R | 0.997 | 23.597
4 | Michalak21a-R | 0.967 | 0.055 | CNW-R | 0.969 | 3.578 | Michalak21a-R | 0.996 | 0.050
5 | Michalak-R | 0.967 | 0.056 | YinYang22-R | 0.969 | 6.486 | Singh-B | 0.996 | 0.391
Samsung A10S
0 | Google Vision | 0.970 | - | Google Vision | 0.971 | - | Google Vision | 0.995 | -
1 | dSLR-R | 0.971 | 0.030 | YinYang22-R | 0.969 | 4.588 | Michalak21a-R | 0.996 | 0.033
2 | WAN-R | 0.970 | 0.795 | CNW-R | 0.968 | 3.240 | ISauvola-R | 0.996 | 0.308
3 | ISauvola-R | 0.969 | 0.294 | Vahid22-C | 0.968 | 16.820 | WAN-R | 0.996 | 0.776
4 | Michalak21c-R | 0.969 | 0.849 | Vahid-B | 0.968 | 17.314 | Michalak21c-R | 0.996 | 0.838
5 | YinYang21-R | 0.969 | 1.050 | Michalak21a-R | 0.967 | 0.032 | ElisaTV-R | 0.996 | 5.948
Samsung S20
0 | Google Vision | 0.971 | - | Google Vision | 0.971 | - | Google Vision | 0.995 | -
1 | ISauvola-R | 0.970 | 0.376 | Michalak21c-R | 0.968 | 1.141 | Nick-R | 0.996 | 0.147
2 | YinYang22-R | 0.970 | 5.789 | CNW-R | 0.968 | 3.441 | WAN-R | 0.996 | 0.973
3 | Vahid22-C | 0.970 | 21.839 | Vahid22-C | 0.968 | 22.565 | DE-GAN-G | 0.996 | 3.334
4 | WAN-R | 0.969 | 1.032 | Michalak-R | 0.967 | 0.043 | CNW-R | 0.996 | 3.410
5 | Michalak21c-R | 0.969 | 1.103 | Bradley-R | 0.967 | 0.307 | ElisaTV-R | 0.996 | 8.087
Apple iPhone SE 2
0 | Google Vision | 0.804 | - | Google Vision | 0.000 | - | Google Vision | 0.990 | -
1 | Ergina_G-L | 0.972 | 0.409 | Otsu-R | 0.971 | 0.017 | WAN-R | 0.991 | 0.798
2 | Gattal-R | 0.972 | 50.697 | WAN-R | 0.971 | 1.027 | CNW-R | 0.991 | 3.416
3 | Otsu-R | 0.971 | 0.015 | DPLinkNet-C | 0.971 | 9.845 | Singh-B | 0.990 | 0.173
4 | Li-Tam-R | 0.971 | 0.105 | Vahid-B | 0.971 | 22.857 | Bradley-R | 0.990 | 0.214
5 | Moments-R | 0.970 | 0.106 | Gattal-R | 0.971 | 50.781 | ISauvola-R | 0.990 | 0.312
Table 8. Summary of results with the [L_dist] measure and flash ON, sorted according to the quality-time criteria. Note that Google Vision (rank 0) is not a binarization algorithm, but an OCR platform.
DESKJET | LASER | BOOK
Rank | Algorithm | [L_dist] | Time (s) | Algorithm | [L_dist] | Time (s) | Algorithm | [L_dist] | Time (s)
Dataset 1—Flash ON
Samsung Note 10+
0 | Google Vision | 0.971 | - | Google Vision | 0.971 | - | Google Vision | 0.984 | -
1 | DocDLink-C | 0.971 | 8.926 | Michalak21b-R | 0.970 | 3.230 | Nick-R | 0.984 | 0.134
2 | DPLinkNet-C | 0.971 | 12.102 | Yasin-R | 0.969 | 1.822 | YinYang22-R | 0.983 | 5.227
3 | Jia-Shi-R | 0.971 | 23.264 | Vahid-B | 0.969 | 29.386 | Calvo-Z-R | 0.981 | 1.256
4 | DilatedUNet-G | 0.971 | 36.097 | HuangUNet-B | 0.969 | 65.967 | HuangUNet-B | 0.981 | 48.253
5 | Michalak-R | 0.970 | 0.049 | Akbari_3-L | 0.969 | 79.356 | WAN-R | 0.979 | 0.890
Samsung S21 Ultra 5G
0 | Google Vision | 0.971 | - | Google Vision | 0.971 | - | Google Vision | 0.983 | -
1 | ISauvola-R | 0.971 | 0.434 | Vahid-B | 0.969 | 27.036 | HuangBCD-R | 0.987 | 202.484
2 | Michalak21a-R | 0.970 | 0.049 | Singh-B | 0.968 | 0.414 | Michalak21a-R | 0.982 | 0.037
3 | WAN-R | 0.970 | 1.183 | Nick-R | 0.967 | 0.181 | Singh-B | 0.982 | 0.245
4 | CNW-R | 0.970 | 3.502 | Michalak21c-R | 0.967 | 1.318 | WAN-R | 0.982 | 0.865
5 | DocDLink-C | 0.970 | 8.442 | Vahid22-C | 0.967 | 38.140 | HuangUNet-B | 0.982 | 47.002
Dataset 2—Flash ON
Motorola G9
0 | Google Vision | 0.000 | - | Google Vision | 0.000 | - | Google Vision | 0.001 | -
1 | Michalak21a-R | 0.971 | 0.055 | Michalak21a-R | 0.970 | 0.053 | Vahid-B | 0.997 | 26.296
2 | Bataineh-R | 0.971 | 0.153 | Michalak-R | 0.970 | 0.053 | Yen-CC-C | 0.996 | 0.170
3 | Nick-R | 0.971 | 0.209 | Bataineh-R | 0.970 | 0.147 | Singh-B | 0.996 | 0.360
4 | Sauvola-R | 0.971 | 0.216 | ISauvola-R | 0.970 | 0.478 | Ergina_G-L | 0.996 | 0.562
5 | Bradley-R | 0.971 | 0.396 | WAN-R | 0.970 | 1.314 | WAN-R | 0.996 | 1.201
Samsung A10S
0 | Google Vision | 0.967 | - | Google Vision | 0.971 | - | Google Vision | 0.997 | -
1 | ElisaTV-R | 0.970 | 5.952 | Michalak21a-R | 0.968 | 0.032 | Michalak21a-R | 0.998 | 0.034
2 | HuangBCD-R | 0.970 | 171.542 | Michalak-R | 0.968 | 0.032 | Nick-R | 0.998 | 0.115
3 | dSLR-R | 0.969 | 0.025 | Bradley-R | 0.968 | 0.218 | WAN-R | 0.998 | 0.754
4 | Moments-R | 0.969 | 0.026 | Singh-B | 0.968 | 0.254 | Jia-Shi-R | 0.998 | 15.750
5 | Michalak21a-R | 0.969 | 0.032 | YinYang22-R | 0.968 | 4.308 | HuangUNet-B | 0.998 | 39.811
Samsung S20
0 | Google Vision | 0.967 | - | Google Vision | 0.971 | - | Google Vision | 0.997 | -
1 | Nick-R | 0.970 | 0.154 | ISauvola-R | 0.970 | 0.362 | Otsu-R | 0.997 | 0.014
2 | ISauvola-R | 0.970 | 0.372 | YinYang22-R | 0.970 | 5.271 | dSLR-R | 0.997 | 0.098
3 | CNW-R | 0.970 | 3.419 | Bataineh-R | 0.969 | 0.111 | Li-Tam-R | 0.997 | 0.098
4 | YinYang22-R | 0.970 | 5.221 | Jia-Shi-R | 0.969 | 20.096 | Wolf-R | 0.997 | 0.186
5 | Triangle-C | 0.969 | 0.148 | Vahid22-C | 0.969 | 21.402 | Bradley-R | 0.997 | 0.257
Apple iPhone SE 2
0 | Google Vision | 0.638 | - | Google Vision | 0.000 | - | Google Vision | 0.987 | -
1 | WAN-R | 0.971 | 0.992 | ISauvola-R | 0.969 | 0.347 | YinYang21-R | 0.991 | 1.087
2 | Otsu-R | 0.970 | 0.016 | WAN-R | 0.969 | 0.958 | Michalak21b-R | 0.991 | 2.254
3 | Michalak-R | 0.970 | 0.041 | DE-GAN-G | 0.969 | 3.181 | DE-GAN-G | 0.991 | 2.860
4 | Bataineh-R | 0.970 | 0.114 | YinYang22-R | 0.969 | 5.508 | Vahid22-C | 0.991 | 16.958
5 | Moments-R | 0.970 | 0.122 | DocDLink-C | 0.969 | 7.026 | Li-Tam-R | 0.990 | 0.034