FADIT: Fast Document Image Thresholding

Abstract: We propose a fast document image thresholding method (FADIT) and evaluate it against two classic methods to demonstrate its effectiveness. We put forward two assumptions: (1) the probabilities of occurrence of the text and background grayscales are ideally two constants, and (2) a pixel with a low grayscale has a high probability of being classified as text, and a pixel with a high grayscale has a high probability of being classified as background. With the two assumptions, a new criterion function is applied to document image thresholding in the Bayesian framework. The effectiveness of the method is borne out by quantitative metrics as well as qualitative comparisons with state-of-the-art methods.


Introduction
Document image analysis (DIA) is an important area of pattern recognition [1]. Degraded document image restoration has become an active research topic [2][3][4]. Restoration is difficult for document images suffering from degradations such as faded ink, bleed-through, folding marks, show-through, and uneven illumination.
Image thresholding can be applied as a preprocessing step in document image restoration because it is simple to implement and document images have relatively constant contrast [5]. The role of thresholding is to remove as much of the background as possible without modifying the text. Two of the most popular techniques for this purpose are Otsu's method [6] and Kittler et al.'s method [7]. A comparative study of image thresholding concludes that Kittler et al.'s method performs better than the others [8]. Kittler et al.'s optimum threshold is obtained from a criterion function related to the average Bayesian classification error probability. Otsu's method is the other thresholding technique that produces relatively good results [9,10]. Otsu's threshold is determined by maximizing a discriminant function, but it tends to split the larger class when the sizes of object and background are unequal [11]. These well-known global thresholding methods have been widely applied to document image analysis [12][13][14][15][16].
Besides these global thresholding techniques, there are many local thresholding methods, which apply different thresholds in different spatial regions. Sauvola et al. propose a sliding window-based method for document image thresholding [17]. Sauvola et al.'s method performs well [8] but has a high computational cost. A grid-based adaptive method has been proposed to speed it up [18]. With the grid-based technique, spatial discontinuity in the local gray-level information is avoided by interpolating over the entire image to yield a smooth surface, and the performance of Sauvola et al.'s method is greatly improved [18]. The performance of these local thresholding methods depends heavily on the window or grid size and hence on the character stroke width.
This paper presents a new thresholding technique called fast document image thresholding (FADIT) that selects the threshold by using a new criterion function under two assumptions. The two assumptions are derived from the relation between, and the properties of, the text and background of document images, and are intended to improve the performance of thresholding. When a threshold is determined by a classifier, the pixels with grayscales lower than the threshold are classified as text and the others as background, so we assume that (1) both the text and the background are uniform with constant probabilities and (2) pixels with lower grayscales have a larger probability of being classified as text and pixels with higher grayscales have a larger probability of being classified as background.
FADIT has been benchmarked against classic methods and has shown better efficiency and performance. Besides comparing FADIT with the two classic methods, we also conduct experiments to compare it with some state-of-the-art methods. Additionally, we apply the grid-based technique to FADIT to realize a grid-based FADIT scheme. The comparisons, based on two performance measures, indicate the efficiency of FADIT and grid-based FADIT. Experimental results show that FADIT performs better than the other methods in terms of visual quality and quantitative evaluation.

Bayesian Framework
Suppose that the grayscale of an image is defined in the range [0, L − 1], and the grayscale value is denoted by n. If an image is classified by the threshold τ into two classes denoted i (n ≤ τ) and j (n > τ), the probability of occurrence of each grayscale n in the image is given by
$$P(n) = P(n \mid i, \tau)\,P(i, \tau) + P(n \mid j, \tau)\,P(j, \tau). \qquad (1)$$
For a given grayscale n, P(n) is a constant determined by the image. The actual conditional probabilities P(n|i, τ) and P(n|j, τ), as well as P(i, τ) and P(j, τ), are easily calculated once the threshold τ is determined. Kittler et al. [7] assumed that each of the two components P(n|i, τ) and P(n|j, τ) is normally distributed. If the assumed P(n|i, τ) and P(n|j, τ) are substituted into Equation (1), the equation returns a new probability P′(n, τ), which is no longer equal to the actual P(n). P′(n, τ) depends on the threshold τ, whereas P(n) is a constant determined by the image.
Because finding the τ that minimizes the Bayesian classification error probability is equivalent to maximizing the correct classification probability, Kittler et al. [7] adopt the average correct performance as the criterion function, Equation (2) [7,19]. Kittler et al.'s method not only considers the Bayes minimum error probability but also tries to approximate the actual P(n) through P′(n, τ). The maximum of Equation (2) over the threshold τ therefore has a second interpretation as the maximum correlation of P(n) and P′(n, τ), where the column vector P(n) is [P(0), P(1), ..., P(L − 1)]^T and the column vector P′(n, τ) is [P′(0, τ), P′(1, τ), ..., P′(L − 1, τ)]^T.
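The decomposition of P(n) into class priors and class-conditional probabilities can be checked numerically. The following Python sketch splits a toy grayscale distribution at a threshold and verifies that the empirical class-conditional probabilities, weighted by the class priors, add back to P(n). The function name `class_mixture` and the toy values are assumptions made for illustration and do not come from the paper.

```python
import numpy as np


def class_mixture(P, tau):
    """Split P(n) at threshold tau into class priors and class-conditional parts."""
    P = np.asarray(P, dtype=float)
    in_i = np.arange(P.size) <= tau           # class i: n <= tau
    prior_i = P[in_i].sum()                   # P(i, tau)
    prior_j = P[~in_i].sum()                  # P(j, tau)
    # Empirical class-conditional probabilities; an empty class is left as zeros.
    cond_i = np.where(in_i, P, 0.0) / prior_i if prior_i > 0 else np.zeros_like(P)
    cond_j = np.where(~in_i, P, 0.0) / prior_j if prior_j > 0 else np.zeros_like(P)
    return prior_i, prior_j, cond_i, cond_j


# Toy distribution over L = 5 gray levels (illustrative values only).
P = np.array([0.1, 0.3, 0.0, 0.2, 0.4])
prior_i, prior_j, cond_i, cond_j = class_mixture(P, tau=1)

# With the empirical conditionals, the two-class mixture reproduces P(n).
mixture = cond_i * prior_i + cond_j * prior_j
```

If the empirical conditionals were replaced by fitted normal densities, as in Kittler et al.'s model, the same weighted sum would in general differ from P(n); that modeled quantity is what the text calls the new probability obtained from Equation (1).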
The first line of Equation (2) shows that the criterion is the mathematical expectation of the correct classification probability. With Bayes' theorem [20], the posterior probabilities take the form P(i|n, τ) ∝ P(n|i, τ)P(i, τ) and P(j|n, τ) ∝ P(n|j, τ)P(j, τ).
Thus, the criterion function Equation (2) can also be written in terms of the posterior probabilities P(i|n, τ) and P(j|n, τ). In the next section, we propose two assumptions for the probability P(n) and the posterior probabilities P(i|n, τ) and P(j|n, τ) to obtain a new, simple criterion function.

Criterion Function
Suppose that we select a threshold τ ∈ [0, L − 1] and use it to threshold the document image into two classes, i and j, where i consists of all the pixels with grayscales in the range [0, τ] and j consists of the pixels with grayscales in the range [τ + 1, L − 1]. In a document image, the dark text belongs to class i and the bright background belongs to class j. In this paper, we propose two assumptions for document images:
1. Because both the dark text and the bright background are uniform in an ideal document image, we assume that the probabilities of the grayscales of the text and the background are two constants, P_i(τ) and P_j(τ), respectively, and that the probabilities of all other grayscales are 0. This means that once an image is segmented with threshold τ, the image contains only pure text and pure background. Under this assumption, a degraded document image is easy to restore because Equation (3) implies that the actual probability tries to approximate the probability of the ideal document image. As Equation (2) is obtained from the maximum correct classification probability, P_i(τ) and P_j(τ) are given by the cumulative sums
$$P_i(\tau) = \sum_{n=0}^{\tau} P(n), \qquad (7)$$
$$P_j(\tau) = \sum_{n=\tau+1}^{L-1} P(n),$$
and the probabilities satisfy the constraint P_i(τ) + P_j(τ) = 1.
2. A lower threshold gives text pixels with lower grayscales, and a lower grayscale has a larger posterior probability of being classified as text. The lower the grayscale of a pixel, the larger the probability that it is classified as dark text, and the higher the grayscale, the larger the probability that it is classified as bright background. We therefore assume that the posterior probability of the dark text, P(i|n, τ), is a decreasing function f−(τ) of the threshold τ and that the posterior probability of the bright background, P(j|n, τ), is an increasing function f+(τ). The probabilities satisfy the constraint f−(τ) + f+(τ) = 1.
Based on the two assumptions, a new criterion function is given by
$$C(\tau) = P_i(\tau)\,f_-(\tau) + P_j(\tau)\,f_+(\tau),$$
which is obtained by substituting P(n), P(i|n, τ) and P(j|n, τ) into Equation (6): most values of P(n) are 0 except P_i(τ) and P_j(τ), and f−(τ) and f+(τ) are used in place of P(i|n, τ) and P(j|n, τ), respectively.
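The threshold search implied by the two assumptions can be sketched in Python as follows. The sketch reads the criterion as C(τ) = P_i(τ) f−(τ) + P_j(τ) f+(τ) with f−(τ) = 1 − f+(τ), which is an interpretation of the text above. The increasing function is passed in by the caller, and the logistic curve used in the example is a purely hypothetical stand-in, not the function used by FADIT. The names `select_threshold` and `logistic_f_plus` are likewise illustrative.

```python
import numpy as np


def select_threshold(P, f_plus):
    """Return the tau maximizing C(tau) = P_i(tau) f-(tau) + P_j(tau) f+(tau)."""
    P = np.asarray(P, dtype=float)
    taus = np.arange(P.size)
    P_i = np.cumsum(P)            # text probability: cumulative sum over n <= tau
    P_j = 1.0 - P_i               # background probability
    fp = f_plus(taus)             # assumed increasing posterior for background
    fm = 1.0 - fp                 # decreasing posterior for text
    C = P_i * fm + P_j * fp
    return int(np.argmax(C)), C


def logistic_f_plus(center, width):
    """HYPOTHETICAL increasing function, used only to make the sketch runnable."""
    return lambda tau: 1.0 / (1.0 + np.exp(-(tau - center) / width))


# Toy histogram: 20% dark text at gray level 50, 80% bright background at 200.
P = np.zeros(256)
P[50], P[200] = 0.2, 0.8
mu = float(np.dot(np.arange(256), P))          # image mean
tau_opt, C = select_threshold(P, logistic_f_plus(center=mu / 2, width=10.0))
```

With this toy input the selected threshold falls between the text level and the background level, so the two populations are separated. Any monotonically increasing function with values in [0, 1] can be substituted for the stand-in.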

Posterior Probability Function
Generally, two facts hold for a document image:
1. The mean µ of an image measures its average intensity, and the average intensity of a document image is mainly determined by the bright background.
2. If a document image is degraded, the text is still dark but the background tends to become tarnished, so the background of a degraded document image is not very bright.
Based on these two facts, the posterior probability functions f−(τ) and f+(τ) need to have the properties of the curves shown in Figure 1. We use the decreasing function of the following form, where µ is the mean of the image.
The increasing function has the form of, After a number of tests, g+(τ) is given by
The function f−(τ) decreases quickly and the function f+(τ) increases quickly. Given two degraded document images with different means, the background of the larger-mean image is brighter than that of the smaller-mean image, so a pixel with a high intensity in the smaller-mean image is more likely to be classified as background than a pixel with the same gray value in the larger-mean image. For a pixel with a high intensity, the curve of f+(τ) for a smaller-mean image (right side of curve B) should therefore always lie above the curve of f+(τ) for a larger-mean image (right side of curve D). According to Equation (10), the curves of f−(τ) have the opposite relationship. Consequently, the crossing point of f−(τ) and f+(τ) increases as µ increases.

Speed-up Algorithms
Suppose that an image is denoted by I and its grayscale variable by n. The threshold τ is the optimization variable, and its optimal value τ_opt is obtained by optimizing a criterion function.
The image histogram H(n) and the total pixel number N are computed from the image to obtain the probability P(n) of occurrence of each grayscale n,
$$P(n) = \frac{H(n)}{N}.$$
The global mean of the image is calculated by
$$\mu = \sum_{n=0}^{L-1} n\,P(n).$$
Using the threshold τ, the probability P_i(τ) of the text assigned to class i is given by Equation (7). The probability of the background assigned to class j is given by
$$P_j(\tau) = 1 - P_i(\tau).$$
The criterion function C(τ) is easily computed, and finding its maximum is a simple algorithm, as shown in Algorithm 1.
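The histogram statistics described above can be computed in a few lines. The following Python sketch assumes an 8-bit grayscale image stored as a NumPy integer array; the function name `histogram_statistics` and the tiny example image are illustrative assumptions.

```python
import numpy as np


def histogram_statistics(image, levels=256):
    """Compute P(n), the global mean, and the cumulative class probabilities."""
    image = np.asarray(image)
    H = np.bincount(image.ravel(), minlength=levels)   # histogram H(n)
    N = image.size                                     # total pixel number
    P = H / N                                          # P(n) = H(n) / N
    mu = float(np.dot(np.arange(levels), P))           # global mean
    P_i = np.cumsum(P)                                 # P_i(tau) for every tau
    P_j = 1.0 - P_i                                    # P_j(tau) = 1 - P_i(tau)
    return P, mu, P_i, P_j


# Tiny example: two black pixels and four white pixels.
image = np.array([[0, 0, 255], [255, 255, 255]], dtype=np.uint8)
P, mu, P_i, P_j = histogram_statistics(image)
```

Because P_i(τ) is a running sum, all L candidate thresholds are covered by a single pass over the histogram, which is the property the speed-up relies on.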
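Otsu's method [6] selects the threshold τ that maximizes the between-class variance σ_b^2, defined as
$$\sigma_b^2(\tau) = P_i(\tau)\,[\mu_i(\tau) - \mu]^2 + P_j(\tau)\,[\mu_j(\tau) - \mu]^2,$$
where µ_i and µ_j are the mean grayscales of the pixels assigned to class i and class j, respectively. The between-class variance has been rewritten to obtain a fast algorithm [21] as
$$\sigma_b^2(\tau) = \frac{[\mu\,P_i(\tau) - \mu_i^c(\tau)]^2}{P_i(\tau)\,[1 - P_i(\tau)]},$$
where µ_i^c is the cumulative mean grayscale up to threshold τ, given by
$$\mu_i^c(\tau) = \sum_{n=0}^{\tau} n\,P(n). \qquad (20)$$
Thus, the fast Otsu's algorithm is given by Algorithm 2.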
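A minimal Python sketch of the cumulative-sum form of Otsu's method is given below. It assumes the usual fast formulation, σ_b^2(τ) = [µ P_i(τ) − µ_i^c(τ)]^2 / (P_i(τ)[1 − P_i(τ)]), and is not a transcription of Algorithm 2. The function name `otsu_threshold` and the toy histogram are illustrative.

```python
import numpy as np


def otsu_threshold(P):
    """Maximize the between-class variance using cumulative sums only."""
    P = np.asarray(P, dtype=float)
    n = np.arange(P.size)
    P_i = np.cumsum(P)                 # class-i probability up to tau
    mu_c = np.cumsum(n * P)            # cumulative mean up to tau
    mu = mu_c[-1]                      # global mean
    denom = P_i * (1.0 - P_i)
    sigma_b2 = np.zeros_like(P)
    valid = denom > 1e-12              # skip thresholds with an empty class
    sigma_b2[valid] = (mu * P_i[valid] - mu_c[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b2)), sigma_b2


# Toy histogram with two equally likely gray levels, 50 and 200.
P = np.zeros(256)
P[50], P[200] = 0.5, 0.5
tau_otsu, sigma_b2 = otsu_threshold(P)
```

For this toy input every threshold from 50 to 199 gives the same variance, and `np.argmax` returns the first of them.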
The criterion function of Kittler et al.'s method [7] depends on σ_i^2(τ) and σ_j^2(τ), the variances of the grayscales of the pixels assigned to class i and class j, respectively. Before the variances can be calculated with Equations (22) and (23), the class means µ_i and µ_j must be calculated from the cumulative means µ_i^c(τ) and µ_j^c(τ), where µ_j^c(τ) is defined analogously to Equation (20). Thus, Kittler et al.'s algorithm is given by Algorithm 3.
The effectiveness of a thresholding algorithm depends strongly on the statistical characteristics of the image, and FADIT uses only the probability rather than the mean and variance, as shown in Table 1.
Table 1. Comparison of FADIT, Otsu's, and Kittler et al.'s algorithms. "✓" denotes that the corresponding item must be calculated, while "×" denotes that it need not be.
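For comparison with the probability-only criterion, the following Python sketch implements minimum error thresholding in its commonly quoted form, J(τ) = 1 + 2[P_i ln σ_i + P_j ln σ_j] − 2[P_i ln P_i + P_j ln P_j], which is minimized over τ. This form is an assumption taken from the general literature on Kittler et al.'s method, not from the equations of this paper, and the variances are obtained from cumulative second moments rather than from explicit sums around the class means. The function name `kittler_threshold` is illustrative.

```python
import numpy as np


def kittler_threshold(P, eps=1e-12):
    """Minimum error thresholding under a two-Gaussian model (sketch)."""
    P = np.asarray(P, dtype=float)
    n = np.arange(P.size)
    P_i = np.cumsum(P)
    P_j = 1.0 - P_i
    m1_i = np.cumsum(n * P)            # cumulative first moment of class i
    m1_j = m1_i[-1] - m1_i             # first moment of class j
    m2_i = np.cumsum(n * n * P)        # cumulative second moment of class i
    m2_j = m2_i[-1] - m2_i

    ok = (P_i > eps) & (P_j > eps)     # both classes must be non-empty
    var_i = np.zeros_like(P)
    var_j = np.zeros_like(P)
    var_i[ok] = m2_i[ok] / P_i[ok] - (m1_i[ok] / P_i[ok]) ** 2
    var_j[ok] = m2_j[ok] / P_j[ok] - (m1_j[ok] / P_j[ok]) ** 2
    ok &= (var_i > eps) & (var_j > eps)

    J = np.full(P.size, np.inf)
    # 2 * ln(sigma) is written as ln(sigma^2).
    J[ok] = (1.0
             + P_i[ok] * np.log(var_i[ok]) + P_j[ok] * np.log(var_j[ok])
             - 2.0 * (P_i[ok] * np.log(P_i[ok]) + P_j[ok] * np.log(P_j[ok])))
    return int(np.argmin(J)), J


# Synthetic bimodal histogram: text around 60, background around 180.
n = np.arange(256)
P = 0.3 * np.exp(-0.5 * ((n - 60) / 10.0) ** 2) \
    + 0.7 * np.exp(-0.5 * ((n - 180) / 10.0) ** 2)
P /= P.sum()
tau_kittler, J = kittler_threshold(P)
```

The sketch needs first and second moments in addition to the probabilities, which illustrates why this criterion requires more per-threshold statistics than one based on probabilities alone.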

Grid-Based FADIT
As the sliding window-based method has a high computational complexity and the block-based method suffers from block effects, the grid-based method is a compromise between the two. The grid-based method can greatly improve the performance of the original Sauvola et al.'s method [18].
A parameter of the grid-based method, the scale s, is suggested to be set to an odd number [18], and the grid step s_G is set to (s − 1)/2. The grid step means that s_G × s_G pixels overlap between a grid and one of its neighboring grids.
The threshold in each sliding grid is calculated and stored into a matrix. The threshold matrix is interpolated by a differential calculus approach to yield a smooth surface [18,22]. The details of the grid-based implementation have been given in Moghaddam et al.'s paper [18] and the code can be found at the website of Mathworks (http://www.mathworks.com/matlabcentral/fileexchange/27808).
We obtain the threshold in each sliding grid with Algorithm 1, and the threshold matrix is interpolated to the size of the input image. In the same way, Kittler et al.'s algorithm can also use the grid-based technique.
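The grid-based scheme can be sketched in Python as follows. The sketch makes several assumptions that are not taken from the paper or from Moghaddam et al.'s code: grid cells of size s × s are centered at multiples of the grid step, plain bilinear interpolation replaces the differential calculus approach of [18,22], and any global thresholding routine is passed in as the hypothetical argument `threshold_fn`.

```python
import numpy as np


def grid_threshold_surface(image, s, threshold_fn):
    """Per-pixel threshold surface from thresholds on overlapping s x s cells."""
    image = np.asarray(image)
    rows, cols = image.shape
    step = max((s - 1) // 2, 1)        # grid step s_G = (s - 1) / 2 for odd s
    half = s // 2
    centers_r = np.arange(0, rows, step)
    centers_c = np.arange(0, cols, step)

    # One threshold per grid cell, stored in a small threshold matrix.
    T = np.empty((centers_r.size, centers_c.size))
    for a, r in enumerate(centers_r):
        for b, c in enumerate(centers_c):
            cell = image[max(r - half, 0):r + half + 1,
                         max(c - half, 0):c + half + 1]
            T[a, b] = threshold_fn(cell)

    # Bilinear interpolation of the threshold matrix up to the image size:
    # first along columns, then along rows.
    along_cols = np.array([np.interp(np.arange(cols), centers_c, row)
                           for row in T])
    surface = np.array([np.interp(np.arange(rows), centers_r, col)
                        for col in along_cols.T]).T
    return surface


# Usage with a stand-in thresholding routine (the cell mean).
image = np.full((20, 30), 100.0)
surface = grid_threshold_surface(image, s=9, threshold_fn=np.mean)
binary = image > surface               # True marks background pixels
```

In practice `threshold_fn` would be a global method applied to the pixels of one cell; the cell mean is used here only so that the example runs without further dependencies.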

Experimental Results
The first experiment is conducted to demonstrate that the FADIT method outperforms the classic methods (see Section 4.2).
The second experiment is conducted to demonstrate that the grid-based FADIT scheme achieves the state-of-the-art thresholding performance (see Section 4.3).

Experimental Setup
The effectiveness of FADIT has been tested with several images (see the first column of Figure 2) from the DIBCO dataset. These images suffer from different degradations, which makes thresholding a difficult task. The dataset also comprises the corresponding ground-truth images (see the last column of Figure 2), which are used to qualitatively compare the results of different methods.
We use peak signal-to-noise ratio (PSNR) and misclassification error (ME) [23] to quantitatively evaluate the different algorithms. A larger PSNR indicates that the test image is more similar to the original (ground-truth) image. The ME varies from 0 for a perfectly classified image to 1 for a totally wrongly binarized image [8].
For the two-class segmentation problem, ME can be expressed as
$$\mathrm{ME} = 1 - \frac{|B_O \cap B_T| + |F_O \cap F_T|}{|B_O| + |F_O|},$$
where B_O and F_O denote the background and foreground of the original (ground-truth) image, B_T and F_T denote the background and foreground pixels of the test image, and | • | is the cardinality of the set •.
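Both measures are straightforward to compute for binary images. The following Python sketch assumes the standard definition of ME, one minus the fraction of pixels on which the test and ground-truth images agree, and the usual PSNR definition 10 log10(peak^2 / MSE). The function names and the `peak` parameter are illustrative, and whether the reported PSNR values use 0/1 or 0/255 images is not stated here, so `peak` is left to the caller.

```python
import numpy as np


def misclassification_error(truth_fg, test_fg):
    """ME for two foreground masks (True = foreground/text)."""
    truth_fg = np.asarray(truth_fg, dtype=bool)
    test_fg = np.asarray(test_fg, dtype=bool)
    agree_bg = np.count_nonzero(~truth_fg & ~test_fg)   # |B_O intersect B_T|
    agree_fg = np.count_nonzero(truth_fg & test_fg)     # |F_O intersect F_T|
    return 1.0 - (agree_bg + agree_fg) / truth_fg.size  # |B_O| + |F_O| = size


def psnr(truth, test, peak=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    diff = np.asarray(truth, dtype=float) - np.asarray(test, dtype=float)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))


# One of four pixels is misclassified.
truth = np.array([[1, 1, 0, 0]])
test = np.array([[1, 0, 0, 0]])
me = misclassification_error(truth, test)
quality = psnr(truth, test, peak=1.0)
```

For 0/1 images the mean squared error equals ME, so with `peak=1.0` the two measures carry the same information on different scales.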

FADIT Compared with Classic Algorithms
In this section, experiments are conducted to demonstrate the effectiveness of FADIT by comparing it with two classic thresholding methods, Otsu's method and Kittler et al.'s method. The experimental results are shown in Figure 2.
In Otsu's method, the between-class variance can also be written as
$$\sigma_b^2(\tau) = P_i(\tau)\,P_j(\tau)\,[\mu_i(\tau) - \mu_j(\tau)]^2. \qquad (28)$$
Maximizing the criterion function Equation (28) favors large values of both the term P_i(τ)P_j(τ) and the term [µ_i(τ) − µ_j(τ)]^2. The term P_i(τ)P_j(τ) attains its maximum when the object and the background contain the same number of pixels. The other term, [µ_i(τ) − µ_j(τ)]^2, makes the criterion function similar to linear discriminant analysis. Although Otsu's method is popular in image thresholding, it is not suitable for degraded document images. Kittler et al.'s method selects the global threshold corresponding to the minimum thresholding error based on the assumption that each class is normally distributed, which limits its application because local regions do not always obey a normal distribution. The FADIT method is based on two reasonable assumptions that follow directly from the inherent characteristics of degraded document images, so it obtains good thresholding results.
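The product form of Otsu's between-class variance discussed in this section, P_i P_j [µ_i − µ_j]^2, agrees with the cumulative form used by the fast algorithm. A short derivation is sketched below in LaTeX, assuming P_i + P_j = 1 and writing the class means in terms of the cumulative mean µ_i^c; the dependence on τ is dropped for brevity.

```latex
% Assumptions: P_i + P_j = 1, \mu is the global mean, \mu_i^c is the
% cumulative mean up to \tau, so that
%   \mu_i = \mu_i^c / P_i   and   \mu_j = (\mu - \mu_i^c) / P_j .
\begin{align*}
\mu_i - \mu_j
  &= \frac{\mu_i^c}{P_i} - \frac{\mu - \mu_i^c}{P_j}
   = \frac{\mu_i^c P_j - (\mu - \mu_i^c) P_i}{P_i P_j}
   = \frac{\mu_i^c - \mu P_i}{P_i P_j}, \\
P_i P_j \, [\mu_i - \mu_j]^2
  &= \frac{[\mu P_i - \mu_i^c]^2}{P_i P_j}
   = \frac{[\mu P_i - \mu_i^c]^2}{P_i \, [1 - P_i]} .
\end{align*}
```

The last expression involves only the cumulative probability and the cumulative mean, which is why the threshold search needs no per-class recomputation.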
For image 1, the result of FADIT retains the least shadow on the right side, and the difference can be easily seen in Figure 2b [...] (Figure 2l,m), and in them dark blocks totally cover the text.
We quantitatively evaluate the different methods on the five images and list the results in Table 2. For every image, the FADIT method obtains the largest PSNR and the smallest ME, which means that the objective quantitative evaluation is consistent with the subjective visual quality of the thresholding results. Meanwhile, the FADIT algorithm runs much faster than Otsu's and Kittler et al.'s algorithms.

Grid-based FADIT Compared with Other Algorithms
The grid-based FADIT method is compared with the Sauvola method [17], the grid-based Sauvola method [18], and the GLGM (gray-level and gradient-magnitude) histogram method [24]. We also present the results of the grid-based Kittler et al.'s method. Moghaddam et al. [18] conclude that the grid-based Sauvola et al.'s method is better than the grid-based Otsu's method, so we do not show the grid-based Otsu method in this section.
In all of the grid-based methods, s_G is set as a function of r and c, the numbers of rows and columns of the image, respectively. The image results of the mentioned methods are shown in Figure 3.
Unlike Otsu's, Kittler et al.'s, or the FADIT method, the Sauvola method selects a threshold for each pixel and thus obtains a threshold matrix of the same size as the input image. Every threshold is calculated in the window centered on the corresponding pixel, so the algorithm runs quite slowly; the running times can be seen in Figure 3. From Figure 3a,f,k,p,u, we can see that the Sauvola method successfully extracts the characters but cannot obtain a good result for the whole image because the background contains so much black noise. Moghaddam et al. [18] have demonstrated that the grid-based technique greatly improves the results of Sauvola et al.'s method, which can be seen in Figure 3b,g,l,q,v. We therefore apply the grid-based technique to Kittler et al.'s method and to FADIT. Figure 3 shows that the grid-based technique indeed improves the results of both. However, the grid-based Kittler et al.'s method still suffers from the problem seen in Figure 2m (see Figure 3m), while the grid-based FADIT method obtains results that are all quite close to the ground-truth images. The GLGM method is a novel thresholding method that selects a global threshold based on the GLGM histogram; it obtains satisfying results on natural images [24] but not on document images (see Figure 3x).
The quantitative evaluation results of the different methods on the five images are listed in Table 3. According to the metrics, the grid-based FADIT method outperforms the above methods.

Conclusions
Segmentation of text from degraded document images is a very challenging task. Experimentally, FADIT is compared against Otsu's and Kittler et al.'s thresholding techniques. Although image thresholding has been popular in image segmentation for over 50 years, the classic Otsu's and Kittler et al.'s methods still produce relatively better results than others [8]. FADIT obtains better segmentation results than the two classic methods and has a lower computational complexity. FADIT is a nonparametric technique that finds the optimal threshold by optimizing a new criterion function, and it is a simple and robust algorithm for document image thresholding. Experimental results show that FADIT properly segments text from the background and that its results are quite close to the ground-truth images.
Since it does well in binary segmentation, FADIT can be applied to some specific applications, e.g., saliency detection. Most saliency detection tasks involve a very clear object in an image that is captured with the object in focus. FADIT can be used to produce reference results for training a neural network in an unsupervised way. Furthermore, FADIT can be used to generate pseudo labels for weakly labeled segmentation tasks. Because FADIT is good at segmenting binary targets and many medical image segmentation tasks only need to segment a specific organ, FADIT can be applied to medical images, at least to obtain pseudo labels for training a neural network.
Author Contributions: Y.M. did the experimental results and wrote the manuscript. Y.Z. discussed with Y.M. and gave many useful suggestions. All authors have read and agreed to the published version of the manuscript.
Funding: This work was supported by the 13th Five-year Informatization Plan of the Chinese Academy of Sciences, Grant Nos. XXH13506 and XXH13505-220, and the Data sharing fundamental program for Construction of the National Science and Technology Infrastructure Platform (Y719H71006). Technical support was provided by the National Cryosphere and Desert Scientific Data Center of China.

Conflicts of Interest: The authors declare no conflicts of interest.