Learning-Based Proof of the State-of-the-Art Geometric Hypothesis on Depth-of-Field Scaling and Shifting Influence on Image Sharpness

Abstract: Today, we capture and store images at a scale that was never possible before. However, huge numbers of degraded and blurred images are captured unintentionally or by mistake. In this paper, we propose a geometrical hypothesis stating that blurring occurs by shifting or scaling the depth of field (DOF). The validity of the hypothesis is proved by an independent method based on depth estimation from a single image. The image depth is modeled with respect to its edges to extract amplitude comparison ratios between the generated blurred images and the sharp/blurred images. Blurred images are generated by a stepwise variation in the standard deviation of the Gaussian filter estimate in the improved model. This process acts as a virtual image recording used to mimic the recording of several image instances. A historical documentation database is used to validate the hypothesis, to classify sharp images from blurred ones, and to distinguish different blur types. The experimental results show that distinguishing unintentionally blurred images from non-blurred ones by a comparison of their depth of field is applicable.


Introduction
The availability and ease of use of cameras has caused the mass production of photos, videos, and multimedia. It is often more interesting to capture the moment than to think about the quality of the captured content. Generally, post-sorting captured photos based on their quality seems affordable and accessible.
However, gathering hundreds and hundreds of images makes the task almost impossible, and a qualitative automatic selection method becomes an immediate necessity. The blurring effect is one of the conventional image quality degradations, and the recent increasing interest [1][2][3] in classifying captured images is evidence of such necessity.
The image quality degradation due to unintentional blurring is caused by (a) a sudden displacement of the camera or the object at the instant of image capturing, known as the motion blur effect, or (b) a sudden change in the adjusted shooting distance, known as the defocus blur effect. Unfortunately, these conditions are not rare and occur all too often; e.g., handshaking or object movement can happen at any time during the capturing. Generally, motion or defocus blurring is caused by external or internal incidents related to the camera, respectively. External incidents are physical happenings outside the camera, and internal incidents are related to unexpected scene changes in conflict with the camera parameter settings. In this paper, we study the unintentional blur problem. A hypothesis based on geometrical optics is put forward, explaining that blurring occurs either by shifting or by scaling the depth of field. We prove the validity of the hypothesis using historical documentation images in which there are two different depth surfaces. A method independent of geometrical optics is used to detect the depth surfaces from a single image. The DOF of each image is estimated from the position of the depth surfaces. We also show the feasibility of detecting unintentionally blurred images from non-blurred ones by comparing their depth of field. The paper is organized into eight sections, including the present section. We review previous works in Section 2. The relation between blurring and DOF based on geometrical optics is considered in Section 3. The hypothesis is presented in Section 4. Furthermore, the object depth modeling is presented in Section 5. The blur classification is investigated in Section 6. Experimental results are shown in Section 7. Finally, we discuss our approach and conclude the paper in Section 8.

Related Work
Measuring edge elongation caused by blurring effects has played a significant role in the analysis of blurred images. In previous approaches, the blurring effect was detected and estimated by measuring the blur extent of edges [4] or by fitting the gradient magnitude to a normal distribution along the edge direction, where the standard deviation of the distribution and the gradient magnitude were used as the blur measure [5]. Zhao et al. [6] presented a defocus blur estimation using a transformer encoder and an edge detection method. They proposed a hybrid architecture of convolutional neural networks with an edge-guided aggregation module and a feature fusion module for defocus blur detection. Li et al. [7] investigated a defocus blur detection method to detect blurred areas in images. To address the problem of uneven pixel distribution at the edges of defocused regions, they deliberately separated the main labels into prior tokens, including a structure body region and an edge transfer detail region. Almustofa et al. [8] investigated blur detection algorithms, including support vector machine filters, focus measure thresholding, and convolutional neural networks, on blurred images.
The estimation of the blur filter and the latent unblurred image relies on blind image deconvolution methods [9][10][11]. This type of estimation tries to solve a severely ill-posed problem. Most recently proposed image deblurring methods assume a spatially invariant blur, i.e., that all pixels in the input image are blurred by the same point spread function (PSF). The partial blur problem was considered in some methods by assuming a blur kernel or with the help of user interaction [12][13][14]. For all these methods, the deblurring is successful only if the PSF is correctly reconstructed. However, in practice, blind deconvolution usually performs unsatisfactorily, even under restrictive assumptions on image and kernel structures. This problem becomes even more significant when the partial blurring effect in images is considered. Therefore, blind deconvolution methods are not appropriate for general blur detection in terms of efficiency and accuracy, especially for handling images in a large database.
In photography, the low-depth-of-field technique focuses the camera only on an object of interest. In the methods presented in [15,16], the object of interest was extracted automatically. However, the computed low-depth-of-field images contained out-of-focus backgrounds, making the methods inappropriate for general blur detection. Datta et al. [17] extended the previous idea to image auto-segmentation as an application of blur analysis. They generated the low-depth-of-field images by calculating an indicator defined by the ratio of high-frequency wavelet coefficients of the central regions to those of the whole image. Accordingly, the method simply assumed that low-depth-of-field images contained focused objects near the image center and out-of-focus objects in the surrounding pixels of the image. Thus, the method also did not suit general-purpose blur analysis.
The most relevant research to this work is probably related to depth recovery from motion and defocus blurring effects. Li et al. [18] proposed a learning framework for motion and defocus deblurring networks. Their networks were trained to remove object blur as a by-product. Keshri et al. [19] presented depth recovery with a single-camera scanner by applying focus blur and changing the aperture number. Their proposed model performed well with both sharp and blurred images in computational depth estimation up to a range of 3.3 m, regardless of whether the image was in focus or out of focus. Nazir et al. [20] suggested a deep convolutional neural network to estimate depth and deblur images. Kumar et al. [21] presented a novel technique to generate a more accurate depth map for dynamic scenes using a combination of defocus and motion cues. The combination was performed by keeping the parameters of the defocus edge points aligned in the motion direction and estimating the camera parameters with the help of motion and defocus relations. The proposed technique rectified and corrected errors in the depth map caused by moving objects and inaccurate defocus blur and motion estimation. Using a patch-pooled set of feature maps, Anwar, Hayder, and Porikli [22] presented a depth estimation method from a single image based on a novel deep convolutional neural framework. Moreover, they computationally reconstructed an all-focus image, removing the blur and achieving synthetic refocusing from the same image. The significant difference of their method from existing ones was the convolutional estimation of depth from a defocused image and the incorporation of the resulting depth map in deblurring. In contrast, in this work, we apply the knowledge of geometric optics to the edges of images, blurred or normal, to find an estimated depth net of edges as potential seeds to recover the depth in the whole image. Then, by analyzing the depth data, blurred and normal images are classified.

Relation between Blurring and DOF Based on Geometrical Optics
To understand the origin of the blurring problem, we need to focus on the most crucial part of the capturing process: the projection of a scene by a lens onto a camera sensor. Figure 1 shows the principle of such a projection using a thin lens model [23], where the camera has a focal length of f. When an object stands at the focus distance d_f, i.e., the shooting distance, the image sensor captures the object as it is at the distance d_0; see Figure 1a. However, at any other distance, e.g., d_1 or d_2, the image sensor captures a deformed and blurred shape of the object.
In the thin lens model, a circular imprint of the blurring effect, known as the circle of confusion (CoC), is used to measure the blurriness. The distance C in Figure 1b,c represents the diameter of such a CoC, and it is calculated by

D_CoC = C = f² |d_i − d_f| / (N d_i (d_f − f)), (1)

where d_i is the object distance and N is the relative aperture or f-stop number. Figure 2 shows the D_CoC as a function of the object distance d_i, where the distance d_f is 800 mm, the f-stop number N is 8, 16, or 22, and the focal length f is f_0 = 50 mm or f_1 = 52 mm. The figure shows that the diameter D_CoC is a non-linear function of the object distance.
The function monotonically decreases or increases for object distances smaller or bigger than the focus distance, respectively. The expansion tolerance of the CoC while capturing images is a subjective issue, and the conjugate of the CoC in object space is typically bigger than the CoC due to the lens magnification, as shown in Figure 3. When the conjugate of the CoC is small in relation to the lens aperture size, it yields:

DOF ≈ 2 N C d_f² / f². (2)

Using Equation (1) in Equation (2) yields:

DOF ≈ 2 d_f² |d_i − d_f| / (d_i (d_f − f)). (3)

Thus, a successful non-blurred image capturing is constrained by Equation (3); a more detailed calculation of the DOF can be found in Appendix A. In unintentional image degradation, i.e., caused by motion or defocus blur, the focal length f does not change, i.e., the appropriate focal length is chosen intentionally as part of the camera setting. However, the focus distance d_f, as a result of the internal camera setting, can conflict with scene changes. Thus, the DOF constraint for such degradation cases, expressed in Equation (3), is a function of two parameters: the focus distance d_f and the distance to the in-focus plane d_i.
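The behaviour of the CoC described above can be checked numerically. The function below is a minimal sketch using the standard thin-lens expression for the CoC diameter in terms of d_i, d_f, f, and N (our reading of Equation (1)); all lengths are in millimetres:

```python
def coc_diameter(d_i, d_f, f, N):
    """Circle-of-confusion diameter in the thin-lens model.

    d_i: object distance, d_f: focus distance, f: focal length,
    N: f-stop number; all lengths in the same unit (here mm).
    """
    return f ** 2 * abs(d_i - d_f) / (N * d_i * (d_f - f))

print(coc_diameter(800, 800, 50, 8))  # 0.0: an in-focus object has no blur
# Moving the object away from d_f grows the CoC, and a larger f-stop
# number (smaller aperture) shrinks it:
print(coc_diameter(1000, 800, 50, 8) > coc_diameter(1000, 800, 50, 22))  # True
```

Sweeping d_i with the parameter values listed for Figure 2 (d_f = 800 mm, N ∈ {8, 16, 22}, f ∈ {50, 52} mm) reproduces the non-linear, V-shaped dependence on object distance.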

Hypothesis on the Cause of Blurring Effects
We discussed earlier how a successful non-blurred image is captured: the object must be in the range of the DOF, where the depth of field is calculated by Equation (3). We argued that the DOF for unintentional image degradation cases is a function of two parameters: the focus distance d_f and the distance to the in-focus plane d_i. Figure 4 demonstrates typical DOF values calculated using Equation (3) for a focal length of 200 mm, where d_f and d_i are varied between 700 and 1200 mm and between 300 and 3000 mm, respectively. In the figure, the relations of the DOF to d_i for certain d_f values of 850, 1000, and 1150 mm are shown in green, red, and black, respectively. These relations for a focal length of 200 mm are shown in Figure 5. In that figure, for certain d_i, the DOF changes with respect to d_f. The scaling concept can be found in Appendix A (Figure A2). The relation of the DOF to d_i for a d_f value of 1000 mm for different focal lengths is shown in Figure 6. In the figure, as the distance d_i changes, for any focal length, the DOF value changes monotonically and non-linearly. The translation shifting concept can be found in Appendix A (Figure A3).

Object Depth Modelling
In the discrimination of objects in a scene in relation to their distance from the camera, the contour of objects has a significant role. The projection of such contours on an image sensor results in edges in the captured digital image. The orthogonal projection of a contour, represented as an edge in an image, is generally modeled as:

s(x) = A u(x) + B, (4)

where u(x) is a step function, and A and B are the amplitude and offset of the edge, respectively. This simplified model carries no information about the object distance from the camera. If we consider light to be represented by a complex plane wave traveling from any object towards the lens and then the image sensor, then each point on each of these object planes acts like a spherical wave source interfering constructively or destructively with every other spherical wave in the planes beyond the object plane. According to the Huygens-Fresnel principle, the phases change in any given plane, i.e., an interference pattern is generated in any given plane. Through a lens, the Fourier transform of the eventually captured image on the image sensor is generated, in which the phase contains the information of objects from their respective object planes. Hence, it can be argued that as long as object planes are at different distances from the lens, the phase of the Fourier transform is related to the object distances.
Blanchet et al. [24] showed that the sharpness of image edges is related to phases and their coherency, and that by a decrease in such coherency, the blurring effect appears. They also showed that the degree of the blurring effect can be modeled by phase incoherency, which has a Gaussian distribution. Thus, to improve the model in Equation (4), we assume the step edge can undergo a variational blurring effect in relation to the variation in the phase incoherency, which in its turn is related to the distance variation in the object plane. Let us assume such a function is:

g(x) = (1 / (√(2π) σ_m)) exp(−x² / (2σ_m²)), (5)

where σ_m = f(d_i, d_f) = d_i − d_f, i.e., the standard deviation varies by the two parameters d_i and d_f. Thus, the improved model can be written as:

s_m(x) = g(x) ⊗ (A u(x) + B), (6)

where ⊗ represents the convolution operation. Here, the origin of x is assumed to be at the focus distance position, i.e., d_f in Figure 1. f(d_i, d_f) represents the variation in the object distance from the focal plane. By increasing the object distance from the focal plane, the blurring effect increases; i.e., σ_m in g(x) increases. When a point of the object is not in focus, see Figure 1, its image on the image plane is no longer a point but a circular patch with a certain radius that defines the amount of defocus associated with the depth of the point in the scene. On the other hand, the defocusing process can be modeled as I(x) = ∫ f(y) h(x, y) dy, where x denotes the 2D space coordinates, f(x) is the focused image of the scene, and h is the space-varying PSF. Here, h(x) is given by a circularly symmetric 2D Gaussian function with standard deviation σ, where σ is a function of the depth at a given point, i.e., σ = f(d_i, d_f). The depth is thus associated with the two possible variables d_i and d_f.
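The blurred step-edge model above has a convenient closed form, which illustrates how the measurable edge amplitude decays with the blur standard deviation. The following is a minimal sketch (the values of A, B, and σ are arbitrary choices for illustration):

```python
import numpy as np
from math import erf, sqrt, pi

def blurred_step_edge(x, A, B, sigma):
    # Closed form of (A*u(x) + B) convolved with a unit-area Gaussian of
    # standard deviation sigma: A * Phi(x / sigma) + B, where Phi is the
    # standard normal CDF.
    phi = np.array([0.5 * (1.0 + erf(v / (sigma * sqrt(2.0)))) for v in x])
    return A * phi + B

x = np.linspace(-5.0, 5.0, 2001)
for sigma in (0.5, 1.0, 2.0):
    edge = blurred_step_edge(x, A=1.0, B=0.2, sigma=sigma)
    peak = np.gradient(edge, x).max()
    # The gradient peak equals A / (sqrt(2*pi) * sigma): doubling the
    # blur halves the measured edge amplitude.
    print(round(sigma, 2), round(peak, 3))
```

The 1/σ decay of the gradient peak is what makes the amplitude ratios introduced below usable as a relative depth cue.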
The increase in the blurring effect in its turn causes a non-linear increase in D_CoC. Thus, it yields:

σ_m = f(D_CoC), (7)

where f(.) represents a mathematical function. The effect of the lens, assuming a Gaussian PSF p(x) with standard deviation σ_l, on the captured contour can additionally improve the model as:

s_c(x) = p(x) ⊗ g(x) ⊗ (A u(x) + B). (8)

Generally, the edges in digital images are detected by gradient calculations. Thus, for captured edges as modeled in Equation (8), it yields:

∇s_c(x) = A (p ⊗ g)(x), (9)

where p ⊗ g is a Gaussian whose variance is σ_l² + σ_m², since the variances of convolved Gaussians add. It is notable to consider when the object position is at the focal distance. In that case, σ_m = 0 and g(x) reduces to the identity of the convolution in Equation (8). Then, the gradient of edges for such objects is calculated as:

∇s_f(x) = A p(x), (10)

with the peak value

∇s_f(0) = A / (√(2π) σ_l). (11)

The detected edges which are contaminated by any tolerable blurring effect (i.e., when the object distance is still in the range of the DOF) or non-tolerable blurring effect (i.e., when the object distance is outside the range of the DOF due to unintentional motion or defocus blur) can be compared to the same detected edges in the focal plane as:

∇s_c(0) / ∇s_f(0) = (A_c / A_f) · σ_l / √(σ_l² + σ_m²). (12)

Equation (12) indicates an important result which can be used for detecting any blurring effect. The amplitude part of the equation shows the ratio of amplitudes (i.e., due to possible intensity changes) on an edge before and after any blurring effect. However, obtaining the equation assumes that the same edge is detected in and out of the focal plane positions, which is a difficult task in practice. Thus, to use the mentioned result and solve the practical issue, let us consider an estimation of g(x) in Equation (8) as:

ĝ(x) = (1 / (√(2π) σ_v)) exp(−x² / (2σ_v²)), (13)

which is a variable Gaussian model modified by varying its standard deviation, σ_v. Let us assume the change in σ_v is linear, monotonic, and in a limited optional range, i.e., σ_v = f(d_v) = d_v, where the range of d_v is chosen optionally. Now, assume we have a database of images containing sharp and blurred images caused by unintentional motion or defocus blurring effects. If the database images are captured by the same capturing device and have different content, some pairs of sharp and blurred images of the database can be used in a training procedure to classify blurred images, even without the existence of their sharp counterparts, as follows. For the training set of images, the edges and the intensity values along the edges (i.e., the amplitude) of each image can be detected and computed, respectively. Using the amplitude part of Equation (12) and substituting ĝ(x) instead of g(x), for a sharp image, it yields:

T_dv = √(σ_l² + σ_v²) / σ_l, (14)

where T_dv is a measurable value representing the ratio of intensity values of the pixels which are along the edges in the sharp and the generated blurred images. As long as the T_dv values are obtained as a consequence of ĝ(x), the values represent the relative distances (i.e., the relative depth) between the generated blurred and the sharp image.

Since even sharp images contain edges of objects at different distances from the lens, i.e., object distances in the range of the DOF, different depths for the edges are obtained. To obtain a maximum range of depth, by varying d_v, a certain generated blurred image is found where, e.g., σ_v² = σ_t² and T_dv = T, i.e., a maximum value among the computed T_dv values. Then, from Equation (14), we have:

σ_l² = σ_t² / (T² − 1). (15)

In the same way, a blurred image from the database can be compared to the generated blurred image as:

P_dv = √(σ_l² + σ_m² + σ_v²) / √(σ_l² + σ_m²). (16)

Here as well, different depths are detected using a maximum depth range by varying σ_v, where a certain generated blurred image is found at, e.g., σ_v² = σ_p² and P_dv = P, i.e., a maximum value among the computed P_dv values. Then, from Equation (16), we obtain:

σ_m² = σ_p² / (P² − 1) − σ_l². (17)

By having σ_l² from Equation (15), Equation (17) is used to obtain the standard deviation of the blurring kernel which causes the most blurring effect from the unintentional motion or defocus blur.
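Under the reading that convolving Gaussian kernels adds their variances, the maximal ratios T and P invert directly to the variances of the lens PSF and of the unintentional blurring kernel. The following numeric sketch uses hypothetical standard deviations and reflects our reading of Equations (15) and (17):

```python
import math

# Hypothetical ground-truth standard deviations of the lens PSF and
# of the unintentional blurring kernel (chosen for illustration only).
sigma_l_true, sigma_m_true = 0.6, 1.1

# Sharp image: at the end of the sigma_v sweep, sigma_v^2 = sigma_t^2 and
# the amplitude ratio reaches T = sqrt(sigma_l^2 + sigma_t^2) / sigma_l.
sigma_t = 0.75
T = math.sqrt(sigma_l_true**2 + sigma_t**2) / sigma_l_true
sigma_l2 = sigma_t**2 / (T**2 - 1.0)       # inverting the ratio (Equation (15))
print(round(math.sqrt(sigma_l2), 6))       # recovers sigma_l

# Blurred image: the pre-existing blur has total variance
# sigma_l^2 + sigma_m^2, and the maximal ratio P is reached at sigma_p.
sigma_p = 0.75
base = sigma_l_true**2 + sigma_m_true**2
P = math.sqrt(base + sigma_p**2) / math.sqrt(base)
sigma_m2 = sigma_p**2 / (P**2 - 1.0) - sigma_l2   # Equation (17)
print(round(math.sqrt(sigma_m2), 6))       # recovers sigma_m
```

The round trip recovers both standard deviations exactly, which is what allows the blurring kernel of an unintentionally blurred image to be estimated without its sharp counterpart.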
The pseudocode of the presented algorithm that summarizes the main computational steps is shown in Algorithm 1.

Estimating the Full Depth Map of Images
A sharp or blurred image generally contains edges. A "Canny" edge detector is used to obtain the magnitude of the edges. Equation (13) is implemented on the image in which (a) a variable Gaussian model ĝ(x) is generated with a stepwise variation in the standard deviation σ_v from 0.5 to 0.75 with 0.05 in each step, and (b) the image is filtered with each Gaussian model generated in (a). The amplitude ratio between the generated blurred images and the image, according to Equation (14) for a sharp image or Equation (16) for an unintentionally blurred image, is calculated, which results in different T_dv and P_dv values, respectively. By finding the maximum of the T_dv and P_dv values, the maximum depth ranges T and P are obtained, respectively. A sparse depth map of the edges is computed by quantizing the amplitude ratio values in the found depth range. A joint bilateral filtering is applied on the sparse depth map to refine inaccurate estimates [25]. The obtained sparse depth map of edges is then propagated to the entire image to obtain a full depth map using the matting Laplacian method [26].
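The steps above can be sketched as follows. This is a simplified stand-in: a Sobel gradient magnitude with a fixed threshold replaces the Canny detector, and the joint bilateral refinement and matting-Laplacian propagation are omitted:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def sparse_edge_depth(image, sigmas=np.arange(0.5, 0.7501, 0.05)):
    # Edge magnitude (stand-in for the Canny detector used in the text).
    grad = np.hypot(sobel(image, axis=0), sobel(image, axis=1))
    edges = grad > 0.5 * grad.max()          # crude edge mask
    ratios = []
    for s in sigmas:                         # stepwise sigma_v sweep
        blurred = gaussian_filter(image, s)  # virtual re-capture
        bgrad = np.hypot(sobel(blurred, axis=0), sobel(blurred, axis=1))
        ratios.append(grad[edges] / np.maximum(bgrad[edges], 1e-6))
    ratios = np.stack(ratios)                # one row per sigma_v value
    # The maximum ratio over the sweep bounds the depth range; the
    # quantized ratios form the sparse depth map on the edge pixels.
    return edges, ratios.max(axis=0)

# Toy "document": a bright rectangle on a dark background.
img = np.zeros((64, 64))
img[16:48, 16:48] = 1.0
edges, depth = sparse_edge_depth(img)
print(edges.sum() > 0, float(depth.mean()) > 1.0)
```

On edge pixels the blurred gradient is always weaker than the original, so the ratios exceed one and grow with σ_v, mirroring the role of T_dv and P_dv.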

Object Depth Estimation from Images
A full depth map of an image is used to arrange a vector data array, v, of all depth values. Then, a new vector, v_1, is obtained by sorting the elements of v. The histogram of v_1 is calculated and used as a feature vector, v_Feature, which represents the relative estimated depth of the objects in the whole image.
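This feature extraction can be sketched directly; the number of histogram bins is our choice, as it is not specified in the text:

```python
import numpy as np

def depth_feature(depth_map, n_bins=32):
    v = np.asarray(depth_map).ravel()   # vector v of all depth values
    v1 = np.sort(v)                     # sorted depth vector v_1
    hist, _ = np.histogram(v1, bins=n_bins, range=(v1[0], v1[-1]))
    return hist / hist.sum()            # normalized histogram feature

# Toy depth data with two dominant object planes (document + background).
depth = np.concatenate([np.full(600, 0.2), np.full(400, 0.8)])
f = depth_feature(depth)
print(f.shape, round(float(f.max()), 2))  # (32,) 0.6
```

For a document scene, two tall bins dominate the histogram, matching the two-plane structure observed later in Figure 10.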

Blur Classification
The classification between "sharp" and "blurred" images was achieved using a probabilistic RUSBoost classification approach. RUSBoost is an algorithm for handling the class imbalance problem in data with discrete class labels [27]. It uses a combination of RUS (random undersampling) and the standard boosting procedure AdaBoost [28] to model the minority class by removing majority-class samples. We used a support vector machine (SVM) as the weak learner for boosting in the approach. The SVM determines a hyperplane in the high-dimensional feature space of v_Feature, i.e., the relative estimated depth values in an image. The best hyperplane is derived by maximizing the margin, i.e., the minimal distance from the hyperplane to the data. The obtained trained RUSBoost model was then validated on the test set of images to find the class prediction score of each image. Following the method presented in [29] to obtain binary outputs, a sigmoid model was used to achieve a posterior probability P(class | input) on the prediction results of RUSBoost.
The classifications between "sharp" and "defocus-blurred" images and between "sharp" and "motion-blurred" images were achieved using the probabilistic RUSBoost classification approach mentioned above for each of the classification frameworks.
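A minimal sketch of the undersample-then-boost idea is shown below. It uses scikit-learn's AdaBoost with its default decision-stump weak learners on synthetic data, whereas the paper boosts SVM weak learners and calibrates the scores with a sigmoid model:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)

# Imbalanced toy data standing in for v_Feature vectors:
# 90 "sharp" (class 0) versus 10 "blurred" (class 1) samples.
X = np.vstack([rng.normal(0.0, 1.0, (90, 8)), rng.normal(1.5, 1.0, (10, 8))])
y = np.array([0] * 90 + [1] * 10)

def random_undersample(X, y, rng):
    # RUS step: drop majority-class samples until the classes balance.
    minority = np.flatnonzero(y == 1)
    majority = rng.choice(np.flatnonzero(y == 0), minority.size, replace=False)
    idx = np.concatenate([minority, majority])
    return X[idx], y[idx]

Xb, yb = random_undersample(X, y, rng)
clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(Xb, yb)
print(Xb.shape[0])                       # 20 samples after balancing
print(round(clf.score(X, y), 2))
```

The undersampling keeps every minority sample, so the boosted model is not swamped by the majority class; the imbalanced-learn library offers a ready-made `RUSBoostClassifier` that integrates the resampling into each boosting round.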

Experimental Results
In this section, our results are presented.

Database of Images
An image database of historical documents was used, which consisted of 874 high-resolution color images. When capturing the images, due to unintentional motion or defocus blur, some of the images were recorded as partially or fully blurred. There were 22 motion-blurred and 15 defocus-blurred images in the database, respectively. The database also sometimes contained both the sharp and blurred images of the same document. An expert description of the type of degradation was also available in the database. Figure 7 shows a typical example of a sharp and a blurred image from the database.

Object Depth Detection from Images
An image from the database was resized, preprocessed by intensity adjustment of each color channel, and converted to a grayscale image. Then, object depths were estimated according to Section 5.2. Figure 8 shows a typical example of a full depth map of a sharp and a blurred image on the left and right sides of the figure, respectively. It should be noted that the size of the document images was quite large. We used the original image size for training, and the time to obtain a depth image was around 7 s. After training, we did not need to use the original-size images. By reducing the image size by a factor of 10 (i.e., to 10% of the original size), the computation for obtaining a depth image took around 1.4 s.
In the figure, the Jet color map was used, where dark blue and dark red indicate the nearest and furthest distances from the camera, respectively. A typical example of a v_1 representation of depth data is shown in Figure 9, where the data related to the images in Figure 8 are used. A typical example of a histogram calculation on v_1 depth data is shown in Figure 10, where the data related to Figure 8 are used. Figure 10 shows that the related images consist mainly of two dominant object planes. Since the image database consists of captured historical documents, our results from other images in the database also show that there are two dominant object planes in the images.

Verification of the Hypothesis
In the database, there are several pairs of sharp and blurred images of the same document. The feature vectors, v_Feature, of each such pair of images are estimated according to Section 5.2. Some examples of the two types of blurring classes (blurring effects caused by unintentional defocusing and motion blur) can be found in Appendix A.

Performance of Classification
The dataset included 827 sharp images and 89 blurred ones, including 47 with mixed blur, 22 with motion blur, and 15 with defocus blur. Three classifications were performed: (a) between sharp and mixed-blur images, (b) between sharp and motion-blur images, and (c) between sharp and defocus-blur images, where the level of imbalance in the classifications was 5.38%, 2.59%, and 1.78%, respectively. A single class was selected to be the positive class in each classification, while the remaining classes were combined to make up the negative class. All classifications were performed using tenfold cross-validation.
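The reported imbalance levels are consistent with the positive-class share of each two-class subset (the 827 sharp images plus one blur type); a quick check:

```python
def imbalance_level(n_blur, n_sharp=827):
    # Positive-class percentage of the two-class subset.
    return 100.0 * n_blur / (n_blur + n_sharp)

for n in (47, 22, 15):  # mixed, motion, and defocus blur counts
    print(round(imbalance_level(n), 2))  # 5.38, 2.59, 1.78
```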

For each classification, the data set was split into ten partitions, nine of which were used to train the model, while the remaining partition was used to test the model. This process was repeated ten times so that each partition acted as test data once. In addition, ten independent runs of this procedure were performed to eliminate any biasing that may have occurred during the random partitioning process. We use four quantities to evaluate the results: the true positive rate (TPrate), which is the number of true positives divided by the total number of positives in the dataset; the true negative rate (TNrate), which is the number of true negatives divided by the total number of negatives in the dataset; the false positive rate (FPrate), which is the number of false positives divided by the total number of negatives in the dataset; and the false negative rate (FNrate), which is the number of false negatives divided by the total number of positives in the dataset. Table 1 shows the classification results. The classification accuracy was calculated as the mean of the true positive rate and true negative rate. According to Table 1, the classification accuracies between sharp and mixed blur, between sharp and motion blur, and between sharp and defocus blur were 0.9506, 0.9464, and 0.9638, respectively.
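The evaluation quantities and the accuracy definition can be sketched as follows; the confusion counts are hypothetical and chosen only to illustrate the computation:

```python
def rates(tp, fn, tn, fp):
    tp_rate = tp / (tp + fn)            # true positive rate
    tn_rate = tn / (tn + fp)            # true negative rate
    accuracy = (tp_rate + tn_rate) / 2  # mean of the two rates, as in the text
    return tp_rate, tn_rate, accuracy

# Hypothetical counts: 44 of 47 blurred and 790 of 827 sharp images correct.
tpr, tnr, acc = rates(tp=44, fn=3, tn=790, fp=37)
print(round(tpr, 4))  # 0.9362
print(round(acc, 4))
```

Averaging TPrate and TNrate, rather than pooling all predictions, keeps the accuracy from being dominated by the large sharp class.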

Discussion and Conclusions
The use of image databases for different applications in industrial, educational, and medical problems is more relevant than ever due to the ease of image capturing and storage. Enormous numbers of degraded images are captured unintentionally or by mistake, and in the long run, there is a need to clean large quantities of data from unwanted images.
In this paper, we studied the unintentional motion or out-of-focus blur problem. The cause of blurring effects was expressed as a hypothesis based on geometrical optics. We showed that the unintentional blur caused a shifting or scaling of the DOF, which in turn resulted in a motion or defocus blur in the captured images. We proved the validity of the hypothesis by an independent method used to compare the sharp and blurred images.
In Appendix A, Figure A4 shows the result of the independent method, and Figures A2 and A3 show the hypothesis principles. In the independent method, we calculated the depth from a single image. We showed that an optimal range for a virtual DOF can be estimated by several virtual recaptures of the image. The recapturing of the image was achieved by using a Gaussian model for the lens. We argued that such a recapturing process has a strong effect on the amplitude of edges. Therefore, the ratio of the amplitudes of edges for two virtually recaptured images, in the range of the virtual DOF, was argued to be a significant parameter in the estimation of the relative depth on the edges. We showed how a network of such edges could be propagated to the whole image to generate a depth image. The histogram of depth values in the depth image was used as a feature parameter for blur classification. A database of historical documentation was used for the verification of the hypothesis and blur classification. The use of documentation images was shown to be very useful in the simplification of a scene into its two major planes, the document and its background. The orientation of the two planes in the sharp and blurred images was easy to detect and compare; see Figure 10. In the classification of the blur effects of the database, we faced a common problem in big data: the imbalance of classes. The numbers of blurred images with mixed blur, motion blur, and defocus blur were 47, 22, and 15, whereas the number of sharp images was 827. We used a probabilistic RUSBoost classification approach to solve the classification problem. The results of the blur classification are presented in Table 1. It should be noted that in the classification between sharp and mixed-blur images, 44 of 47 possible blurred images were correctly classified; between sharp and motion blur, 20 of 22 possible images; and between sharp and defocus blur, 14 of 15 possible images. This indicates that the statistical results are sensitive to the number of blurred images, and there is a need to examine the methodology on more extensive databases which include a larger number of blurred images.

Figure 1 .
Figure 1. Scene projection using a thin lens model with object distance (a) equal to d_f, (b) less than d_f, (c) greater than d_f.

Figure 2 .
Figure 2. Relation between D_CoC and object distance.

Figure 3 .
Figure 3. Relation between the CoC shown as C and the conjugate CoC shown as C/M, as well as distance regions in image space, around the CoC, and in object space, around the conjugate CoC. According to Gauss's ray construction, there are two distance regions around the CoC in image space and around the conjugate of the CoC in object space where the image plane and object can change their positions and still make it possible to capture a non-blurred image. These distance regions in image space and object space are asymmetrical and known as the depth of focus and depth of field (DOF), respectively; see the blue regions in Figure 3. The D_CoC is shown by C, and the diameter of the conjugate of the CoC is presented by C/M in Figure 3, where M is the magnification from image space to object space.

Figure 4 .
Figure 4. The relation between depth of field, focus distance, and distance to the in-focus plane for the focal length of 200 mm. The green, red, and black lines are calculated for distances to the in-focus plane of 850, 1000, and 1150 mm.

Figure 5 .
Figure 5. Relations between depth of field and distance to the in-focus plane for certain focus distance values of 850, 1000, and 1150 mm.

Figure 6 .
Figure 6. Relations between depth of field and distance to the in-focus plane for different focal length values with a focus distance value of 1000 mm.

Figure 7 .
Figure 7. A typical example of a sharp (left) image and a blurred (right) image in the used database.

Figure 8 .
Figure 8. A typical example of a full depth map of a sharp (left) image and a blurred (right) image.

Figure 9 .
Figure 9. A typical example of a v_1 representation of depth data. The blue and red lines represent the data from the sharp and blurred image, respectively.

Figure 10 .
Figure 10. A typical example of a histogram calculation on v_1 depth data. The blue and red lines represent the data from the sharp and blurred image, respectively.

Figure A2 .
Figure A2. Observation of DOF scaling when the focus distance, d_f, varies and the distance to the in-focus plane, d_i, remains the same.

Figure A3. Figure A4.
Figure A3. Observation of DOF shifting when the distance to the in-focus plane, d_i, varies and the focus distance, d_f, remains the same.

Table 1 .
Classification results between sharp and mixed blur, motion blur, and defocused blur images.