Mathematical Principles of Object 3D Reconstruction by Shape-from-focus Methods

Conflicts of Interest: The authors declare no conflict of interest. The funder has had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abstract

Shape-from-Focus (SFF) methods have been developed for about twenty years. They are able to obtain the shape of a 3D object from a series of partially focused images. The plane to which the microscope or camera is focused intersects the 3D object in a contour line. Due to the wave properties of light and the finite resolution of the output device, the image can be considered sharp not only on this contour line, but also in a certain interval of height: the zone of sharpness. SFF methods are able to identify these focused parts, to compose a fully focused 2D image and to reconstruct the 3D profile of the surface to be observed.


Introduction
Three-dimensional reconstruction of general surfaces has an important role in a number of fields: the morphological analysis of fracture surfaces, for example, reveals information on the mechanical properties of natural or construction materials.
There are several techniques capable of producing digital three-dimensional (3D) replicas of solid surfaces. In mechanical engineering, contacting electronic profilometers can be used to determine digital two-dimensional (2D) profiles to be combined into 3D surface profiles, see [1,2] for example. The contacting mode of atomic force microscopes actually belongs to this mechanical category [3]. In addition to the mechanical tools, optical devices exist in diverse modifications [2]: light section microscopy [4,5], coherence scanning interferometry [6], speckle metrology [7], stereo projection [8], photogrammetry [9] and various types of light measurement of profiles [10], to mention some of them.
3D laser scanning techniques are among other ways of obtaining 3D data. They have also been tested in some rock engineering projects, such as 3D digital fracture mapping [11][12][13].
These devices are, however, not of universal use with each of them having its own technical limitations [14,15]. Very rough surfaces, for instance, can hardly be measured by atomic force microscopes working in the nano-regions. On the other hand, it is possible to measure plane surfaces with microscopically small irregularities using a microscopic sectional technique with confocal microscopes [16][17][18][19]. Confocal microscopes, however, are not always suitable for technical purposes due to the small size of their visual fields (maximal visual field is about 2 cm [20][21][22][23]).
The present paper will summarize the existing methods of 3D reconstruction of objects by the Shape-from-Focus (SFF) method. This is a method for recovering depth from an image series of the same object taken with different focus settings, referred to as a multifocal image. It consists of these steps:
a) Data acquisition (confocal microscope in the standard mode, CCD camera or standard camera) - Sec. 2
b) Image registration (if necessary) - Sec. 3
c) Choice of the focusing criteria - Sec. 4
d) The 2D and 3D reconstructions - Sec. 5
See [24] for one of the first papers on this subject.

Data Acquisition

Parallel Projection
In technical practice, the confocal microscope serves as a standard instrument for imaging microscopic three-dimensional surfaces. It has a very small depth of the optical field, with its advanced hardware being capable of removing non-sharp points from the images. The points of the object situated close to the focal plane can be seen as sharp points. The parts lying further above or beneath the focal plane (out of the sharpness zone) are invisible, being represented as black regions if the confocal mode is on. In this way, a so-called optical cut is obtained. In the non-confocal (standard) mode, the areas lying outside the sharpness zone are displayed as blurred, as they would be with a standard camera. With a confocal microscope or CCD camera, one can assume that the field of view is small and the projection used is parallel. In this case, all images are provided in the field of view with identical sizes and the corresponding pixels have the same coordinates in the separate partially focused images (see Figure 1). However, the confocal microscope with its small visual angle is hardly suitable for technical purposes due to the small size of its visual field (the maximal visual field is about 2 cm [5,20,21,24]).

Central Projection
The same output (Figure 1b) can be obtained by the classical microscope or (in a wider field) a common camera. The difference between a microscope or CCD camera and a standard camera is given by the central projection, which varies the scaling of partial images in a series, and, further, by the non-sharp regions, which are displayed by the classic camera but missing when taken by a confocal microscope in the confocal mode (see [25] for more information). The different image scalings, however, require subsequent corrections (including shifts and rotations).

Multifocal Image
To create a 2D or 3D reconstruction, it is necessary to obtain a series of images of an identical object, each of them with different focusing and with each object point being focused in one of the images (in the ideal case); this is referred to as a multifocal image. For the acquisition of a large multifocal image, the camera must be mounted on a stand so that it can be moved in the direction approximately orthogonal to the surface with a controlled step - see Figure 2. Different transformations and different sharp parts must be identified and composed into a 2D or 3D model.

Image Registration
With a confocal microscope or CCD camera, we can assume that the field of view is small and the projection used is parallel. In this case, all images are provided in the field of view with identical sizes and the corresponding pixels have identical coordinates in the separate partially focused images. However, this assumption does not hold for larger samples; then, the angle of the projection lines is not negligible and the view fields (and the coordinates of the corresponding pixels) are clearly different for each image (see Figures 2 and 3, taken from [26]). Before reconstruction, all geometric transformations in the image series must be identified so that they can be eliminated. The images are assumed to be geometrically similar. Generally, similarity is achieved by composing rotation, scale-change, shift, and axial symmetry. Axial symmetry is not possible in this particular case.
If we consider scale-change only, assuming that the image size is proportional to the camera shift (see Figure 4a), the different image scaling can be obtained using elementary mathematics. This approach was used in [25]. After this initial elementary reconstruction, a subsequent 3D reconstruction is the one shown in Figure 5. Huge artefacts caused by inaccurate registration had to be blurred by brutal low-pass filters, with much useful high-frequency information lost.
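The elementary scaling mentioned above can be sketched as follows (an illustrative computation under a pinhole central-projection assumption; the function name and parameters are ours, not from [25]): the image size is inversely proportional to the object distance, so moving the camera away by a known step shrinks the image by a computable factor.

```python
def relative_scale(z0, dz, k):
    """Scale of the k-th image relative to the first one, assuming central
    projection: image size is inversely proportional to object distance.
    z0: object distance for the first image, dz: camera step, k: image index."""
    return z0 / (z0 + k * dz)

# moving the camera away from 100 to 110 distance units shrinks the image
s = relative_scale(100.0, 5.0, 2)
```

In practice this factor would be applied as a rescaling of the k-th image before the partial images are compared pixel by pixel.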
In practice, the situation may be more sophisticated. The images may differ not just in the scale used but in the content displayed as well (different parts being focused in different images). Due to mechanical inaccuracies, the step along the z-axis may not be fully constant, and the images can also be mutually shifted along the x- or y-axis or rotated. Image registration is also complicated by the non-planarity of samples (see Figure 4 on the right). Therefore, sophisticated pre-processing of the image series may be necessary. A suitable tool for this is the Fourier transform and phase correlation.

A continuous standard Fourier transform of a function $f(x)\colon \mathbb{R} \to \mathbb{C}$ is the function

$$F(\xi) = \int_{-\infty}^{\infty} f(x)\, e^{-i x \xi}\, \mathrm{d}x$$

(provided that this integral exists and is finite).

Function $F$ is also referred to as the Fourier spectrum of function $f$. It is possible to obtain the function $f$ from its Fourier spectrum by the inverse Fourier transform.

A continuous standard inverse Fourier transform of a function $F(\xi)\colon \mathbb{R} \to \mathbb{C}$ is the function

$$f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\xi)\, e^{i x \xi}\, \mathrm{d}\xi$$

(provided that this integral exists and is finite). The two-dimensional transform of $f(x; y)$ and the inverse transform of $F(\xi; \eta)$ are defined analogously, with the kernel $e^{\mp i(x\xi + y\eta)}$ and the factor $1/4\pi^2$ in the inverse. The Fast Fourier Transform (FFT) algorithm can be employed to calculate the discrete Fourier transform - see [28] for more information.
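The discrete counterpart of the transform pair above can be sketched with the FFT (an illustrative numpy sketch; the variable names are ours): applying the inverse transform to the spectrum recovers the original image up to rounding error.

```python
import numpy as np

# Discrete 2D Fourier transform via the FFT and its inverse: for any image f,
# the inverse transform applied to the spectrum reproduces f.
rng = np.random.default_rng(0)
f = rng.random((16, 16))

F = np.fft.fft2(f)        # Fourier spectrum of f
f_back = np.fft.ifft2(F)  # inverse transform

# the imaginary part of f_back is numerical noise; the real part equals f
assert np.allclose(f_back.real, f)
```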

Discrete two-dimensional Fourier Transform and Inverse Transform
The discrete Fourier transform used for image registration is applied to functions that are, or are assumed to be, periodic. Generally, an image need not have the same values on opposite edges; thus, by periodizing an image, the resulting function may have jumps at the edges of the original image. Such jumps are often the most contrasted structures in the function and may lead to incorrect registration. Therefore, such edges must be removed from the images used for the shift estimation. This is done by multiplying the image by a suitable function referred to as a window function. Its values must equal zero or almost zero at the image edges and one on a large part of the image. Primarily, the Gaussian and Hanning window functions can be used, in rectangular or circular variants.
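A circular Hanning window can be sketched as follows (one common variant, assuming the formula w = (1 + cos(πr/R))/2 inside radius R; the paper's exact definition may differ, and the function name is ours):

```python
import numpy as np

def circular_hanning(n, R=None):
    """Circular Hanning window on an n x n grid: w = (1 + cos(pi*r/R)) / 2
    for r <= R and zero outside, where r is the distance from the centre."""
    if R is None:
        R = n / 2
    y, x = np.mgrid[0:n, 0:n]
    r = np.hypot(x - (n - 1) / 2, y - (n - 1) / 2)
    return np.where(r <= R, 0.5 * (1 + np.cos(np.pi * r / R)), 0.0)

w = circular_hanning(64)
# near one at the centre of the image, zero in the corners
```

Multiplying an image by such a window suppresses the artificial jumps at the edges before the FFT is computed.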

δ-distribution
A one-dimensional δ-distribution δ(x) is a limit of a function sequence δ_n(x); n ∈ ℕ, for which a) the integral of δ_n over ℝ equals one for every n and b) lim_{n→∞} δ_n(x) = 0 for every x ≠ 0. Since the members of the sequence used here integrate to one (see [29] for proof), condition a) is fulfilled. Condition b) holds as well, and the limit δ(x) = lim_{n→∞} δ_n(x) is therefore the (one-dimensional) δ-distribution. The expanded unitary signal with its Fourier transform is illustrated in Figure 6.

Phase Correlation
Image processing requires the images to be transformed so that the structures studied are at the same position in all of them. Finding the transformation is done by image registration. In some applications, it is possible to assume a shift only while, in others, shift, rotation and scale change (i.e. similarity), general linear transformation or even general transformations may all be present.
The methods used for registration depend on the expected transformation and on the image structures. Some methods find the corresponding structures or points in the images and determine a global transformation by measuring their positions [29-31]. For these methods to be applicable, the structures must be clearly visible. Other, correlation-based, methods work with the image as a whole. The phase correlation has proved to be a powerful tool (not only) for the registration of partially focused images. For functions $f, g$, it is defined as

$$Z_{f,g}(x; y) = \mathcal{F}^{-1}\left\{ \frac{\mathcal{F}\{f\}(\xi; \eta)\, \overline{\mathcal{F}\{g\}(\xi; \eta)}}{\left|\mathcal{F}\{f\}(\xi; \eta)\right|\, \left|\mathcal{F}\{g\}(\xi; \eta)\right|} \right\}$$

with its modification being

$$Z^{H,p,q}_{f,g}(x; y) = \mathcal{F}^{-1}\left\{ \frac{H(\xi; \eta)\, \mathcal{F}\{f\}(\xi; \eta)\, \overline{\mathcal{F}\{g\}(\xi; \eta)}}{\left(\left|\mathcal{F}\{f\}(\xi; \eta)\right| + p\right) \left(\left|\mathcal{F}\{g\}(\xi; \eta)\right| + q\right)} \right\}$$

where the bar denotes complex conjugation, $H(\xi; \eta)$ is a bounded real function such that $H(\xi; \eta) = H(-\xi; -\eta)$ and $p, q > 0$ are arbitrary constants. It is not difficult to prove that, for real functions $f, g$, the phase-correlation function is real [31]. This is very useful since the extremes of the phase-correlation function can be searched for.

Identical Images
Let f̃ be the infinite periodic expansion of an image f, and let g̃ be f̃ shifted by a vector (x₀; y₀). The phase correlation of f̃ with g̃ is then zero everywhere except at the point (x₀; y₀). This is the principal idea of phase correlation: rather than searching for a shift between two images, we can just find the only non-zero point in a matrix. If the images are not identical (up to a shift), i.e., if the images are not ideal, the phase-correlation function is more complex, but it still has a global maximum at the point whose coordinates correspond to the shift vector.
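The shift-detection principle can be sketched numerically (an illustrative numpy sketch of the regularised definition above; the function name and the values of p, q are ours): for a circularly shifted copy of an image, the global maximum of the phase-correlation matrix sits exactly at the shift vector.

```python
import numpy as np

def phase_correlation(f, g, p=1e-3, q=1e-3):
    """Phase-correlation function of images f, g with regularising
    constants p, q > 0 (a sketch of the modified definition in the text)."""
    F = np.fft.fft2(f)
    G = np.fft.fft2(g)
    cross = F * np.conj(G) / ((np.abs(F) + p) * (np.abs(G) + q))
    return np.real(np.fft.ifft2(cross))

# a periodic test image and its circular shift by (5, 3)
rng = np.random.default_rng(1)
f = rng.random((64, 64))
g = np.roll(f, shift=(5, 3), axis=(0, 1))

Z = phase_correlation(g, f)
peak = np.unravel_index(np.argmax(Z), Z.shape)
# the global maximum of Z lies at the shift vector (5, 3)
```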

Rotated Images
The phase-correlation function can also be used for estimating the image rotation and rescale. Let g be the function f rotated by an angle θ and shifted in its arguments. Their Fourier spectra and amplitude spectra are then related as follows: the shift results in a phase shift only, and the amplitude spectra are rotated in the same way as the original functions. A crucial step here is the transformation of the amplitude spectra into polar coordinates to obtain functions A_f; A_g: ℝ⁺ × ⟨0; 2π) → ℝ⁺ such that A_g(ρ; φ) = A_f(ρ; φ + θ). The rotation about an unknown centre has thus been transformed into a shift. This shift is estimated by the standard phase correlation (see the previous paragraph); after a reverse rotation by the angle measured, the shift (x₀; y₀) is then measured in another computation of the phase correlation.
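The key spectral property used here - a shift changes only the phase of the spectrum, never its amplitude - can be verified directly (an illustrative numpy check; variable names are ours):

```python
import numpy as np

# The amplitude spectra of an image and of its circular shift are identical:
# a shift enters the Fourier spectrum only as a phase factor.
rng = np.random.default_rng(2)
f = rng.random((32, 32))
g = np.roll(f, shift=(4, 7), axis=(0, 1))

assert np.allclose(np.abs(np.fft.fft2(f)), np.abs(np.fft.fft2(g)))
```

This is why rotation (and later scale) can be estimated from the amplitude spectra alone, with the shift recovered in a second step.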

Scaled Images
Let g be the function f rotated by an angle θ, shifted and scaled by a factor α in its arguments. In the logarithmic-polar coordinates of the amplitude spectra, both the rotation and the scale change are transformed into a shift. The unknown angle θ and the unknown factor α can therefore be estimated by means of the phase correlation applied to the amplitude spectra in the logarithmic-polar coordinates. After reverse rotating the function g by the estimated angle and scaling it by the factor α, the shift vector (x₀; y₀) can be estimated by means of the standard phase correlation.
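The logarithmic-polar resampling of an amplitude spectrum can be sketched as follows (an illustrative sketch; the sampling grid, resolution and function name are our choices, not the paper's):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def to_log_polar(a, n_rho=64, n_phi=64):
    """Resample a centred 2D amplitude spectrum (e.g. after fftshift) into
    logarithmic-polar coordinates, where rotation and uniform scaling of
    the original image become shifts along the two axes."""
    h, w = a.shape
    cy, cx = (h - 1) / 2, (w - 1) / 2
    rho = np.linspace(0, np.log(min(cy, cx)), n_rho)       # log-radius
    phi = np.linspace(0, 2 * np.pi, n_phi, endpoint=False)  # angle
    r = np.exp(rho)[:, None]
    y = cy + r * np.sin(phi)[None, :]
    x = cx + r * np.cos(phi)[None, :]
    return map_coordinates(a, [y, x], order=1, mode='nearest')

spec = np.abs(np.fft.fftshift(
    np.fft.fft2(np.random.default_rng(3).random((64, 64)))))
lp = to_log_polar(spec)
```

Applying the standard phase correlation to two such resampled spectra yields the rotation angle and the logarithm of the scale factor as the components of the detected shift.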

Multifocal Registration
Let {f₁; f₂; …; f_n} be the image series to be registered, with image f₁ acquired by means of the biggest angle of view. This image will not be transformed or, formally, it will be transformed by the identity mapping into the image f₁*. Now we must find the transform f₂ → f₂* to obtain an image f₂* which differs from f₁* only in the focused and blurred parts. In the same way, the transforms f₃ → f₃*; …; f_n → f_n* must be found.

After multiplying both images f_k; f_{k−1}* by the chosen window function, the rotation angle θ and the scale factor α are determined by the method described above. Then, image f_k is rotated by the angle −θ and scaled by the factor α⁻¹ to compensate for the rotation and scale-change found by the phase correlation, creating image f̂_k. Between images f̂_k and f_{k−1}*, only the shift and the different focused and blurred parts remain. Now we can apply the phase correlation to find the shift (x₀; y₀), shifting image f̂_k by the vector (−x₀; −y₀) to compensate for the shift and creating image f_k* which differs from f_{k−1}* only in the focused and blurred parts.

Focusing Criteria
The detectors of blurred areas (sometimes referred to as focusing criteria or sharpness detectors) can be based on different principles. Probably the first attempts to carry out non-confocal reconstructions date back to the 1970s and 1980s [33][34][35][36].
Tenenbaum [37] developed the gradient magnitude maximization method for optimizing the focus quality using the sharpness of edges. Jarvis [38] proposed a sum-modulus-difference computed by summing the first intensity differences between neighbouring pixels along a scan-line, using it as a focus quality benchmark. Schlag et al. [39] implemented and tested various self-focusing algorithms. Krotkov [40] evaluated and compared the performance of different focus criterion functions. In [41], he also proposed a method for estimating the depth of an image area. Pentland [42] suggested evaluating the image blur to determine the depth of image points. Grossmann [43] proposed estimating the depth of edge points by analyzing the blur of the edges due to defocusing. Darrell and Wohn [29] developed a depth-from-focus method by which an image sequence can be obtained through varying the focus level, using Laplacian and Gaussian pyramids to calculate the depth. Subbarao [30] suggested changing the intrinsic camera parameters to recover the depth map of a scene. Ohta et al. [31] and Kaneda et al. [44] used images corresponding to different focus levels to obtain a single image of high focus quality.
As follows from the above, the statistical range or the variance of the standard Fourier transform of a certain neighbourhood of pixel [i; j] can serve as a sharpness detector (focusing criterion). A neighbourhood of pixel [i; j] is a square of n × n pixels with n equalling ten to twenty. In the case of the standard Fourier transform, the FFT algorithm is used; since it requires a square whose side is a power of two, such a side is used in this case [25,26]. However, because the standard Fourier transform is burdened with jumps along the square edges, it is necessary to use a suitable window function as in Section 3.2. For this reason, the cosine Fourier transform is preferable. It is obtained from the standard Fourier transform applied to an even extension of the neighbourhood to be processed, as illustrated in Figure 8. This extension eliminates jumps on the edges. Since the sine frequencies in (5) are zeroed, there is no need to apply a window function. Whereas the low frequencies in the amplitude spectrum detect the blurred parts of the image, the very high ones represent mere noise. Therefore, a suitable weight must be assigned to each frequency when the sharpness detector is calculated. In our software, the following detectors may be used:
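A minimal sharpness detector of this family can be sketched with the local intensity variance (an illustrative sketch, not one of the paper's detectors (26)-(28); function names and the neighbourhood size are ours). Defocus blur suppresses high frequencies and hence the variance within a neighbourhood:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_variance(img, n=16):
    """Variance of intensities in the n x n neighbourhood of every pixel,
    usable as a simple sharpness detector: blur suppresses local variance."""
    mean = uniform_filter(img, size=n)
    mean_sq = uniform_filter(img * img, size=n)
    return np.maximum(mean_sq - mean * mean, 0.0)

rng = np.random.default_rng(4)
sharp = rng.random((64, 64))
blurred = uniform_filter(sharp, size=5)   # stand-in for defocus blur

# the detector responds more strongly on the sharp image
assert local_variance(sharp).mean() > local_variance(blurred).mean()
```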

2D and 3D Reconstructions
A 2D reconstruction consists in composing a new image that contains only the focused parts of a registered multifocal image. The registration and detection of the focused parts were described in Sections 3 and 4. The principal deficiency of most of the current methods of 3D reconstruction is that they assume the profile height at a point to be precisely determined by the value of the given sharpness detector. Based on this unrealistic hypothesis, these values are then interpolated; parabolic interpolation [16,34] or Gaussian interpolation [35,36] is used.
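The parabolic interpolation used by those current methods can be sketched as follows (the classical three-point vertex formula for equally spaced samples; the function name is ours):

```python
def parabola_vertex(z, v):
    """Vertex of the parabola through three points (z[i], v[i]) with equal
    spacing of z: the sub-step estimate of the height where focus peaks."""
    (z0, z1, z2), (v0, v1, v2) = z, v
    h = z1 - z0
    return z1 + 0.5 * h * (v0 - v2) / (v0 - 2.0 * v1 + v2)

# a symmetric focus peak around z = 1 gives exactly 1
zv = parabola_vertex([0.0, 1.0, 2.0], [1.0, 3.0, 1.0])
```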
This conclusion, however, is false. We can use the series {F_k}; k = 1; 2; …; n to assess the height of the pixel (i; j). Being of a random rather than deterministic nature, the series cannot be interpolated, but must be processed by a statistical method. One such method is regression analysis, which, however, is rather complicated. Direct calculation of the mean value is much easier.
For each pixel (i; j), virtually infinitely many probability distribution functions p_k can be constructed using different exponents applied to the terms F_k of the series:
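The mean-value height estimate can be sketched as follows (our reading of the construction: the focus measures F_k at a pixel, raised to an exponent r, are normalised into a probability distribution p_k over the focal-plane heights z_k, and the height is the mean value; the function name and the value of r are ours):

```python
import numpy as np

def pixel_height(F, z, r=4.0):
    """Height estimate at one pixel: normalise the focus measures F_k
    (raised to exponent r) into a probability distribution p_k over the
    focal-plane heights z_k and return the mean value sum(p_k * z_k)."""
    F = np.asarray(F, dtype=float)
    z = np.asarray(z, dtype=float)
    p = F**r / np.sum(F**r)
    return np.sum(p * z)

# the focus measure peaks near z = 2.0, so the estimate lies close to it
z = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
F = np.array([0.1, 0.5, 3.0, 0.6, 0.1])
h = pixel_height(F, z)
```

Larger exponents r concentrate the distribution around the sharpest image; smaller ones average over more of the series.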

Data Acquisition
The following data have been used for the purposes of this paper:
• A series of fifteen partially focused images of blue marble - Olympus camera
• A series of eight partially focused limestone images - Canon camera
• A series of thirty partially focused images of the surface of a hydrated cement paste - the Olympus confocal microscope in standard mode
Figure 9 shows the photos of Figure 2, this time displayed in supplementary pseudo-colours. If these images were totally identical, then the arithmetic mean of the blue-green image on the left and the orange image on the right would be "perfectly gray". The arithmetic mean of these images is shown in Figure 10a. Clearly, the components of this mean value are very different: the values of the orange image are bigger in the yellow parts of the mean value while the values of the blue-green image are bigger in the blue-violet parts.

Image Registration
(a) (b) Figure 9: The first (a) and the fifteenth (b) image of a series of fifteen images of blue marble (see Figure 3) displayed in supplementary pseudo-colours. The software used has been written by the author. Taken from [26].

In Figure 10b, the same construction was used after the registration. The very low colour saturation of the arithmetic mean (of the images) testifies to a very good conformity. Of course, the arithmetic mean cannot be "perfectly gray" in this particular case because the original images also differ in the sharp parts.
(a) (b) Figure 10: The arithmetic mean of the images of Figure 9: before registration (a), after registration (b). The software written by the author. Taken from [26].

Table 1 summarizes the indicated and applied transforms in the separate images of blue marble (see Figures 2, 9 and 10). All transformations, detected with a sub-pixel precision, are listed with a precision of one thousandth of a pixel. Obviously, the scaling plays the most important role; the shifts, however, cannot be neglected either. The rotation angle between the first and the last images is larger than five arcminutes, that is, about one pixel on the image periphery (the data resolution used is 1180 by 885 pixels). Such a transformation is marginal in this particular case.

Focusing Criteria
As implied by (26)-(28) and Figure 9, one of the detectors increases the high frequencies (right bottom corner of the squares in Figure 9) too much, which means that it is very noise-sensitive. Too sharp cuts removing the low and high frequencies are among the disadvantages of another detector. Although visually minuscule, the differences between the remaining detectors do exist. For their quantification, the Root Mean Square Error and related characteristics were computed; their values are summarized in Table 2 for the data from Figure 3. Note that these characteristics take their ideal values for a pair of identical surfaces.
A similar summary is given by Table 3 for the present reconstruction methods: parabolic interpolation of the echelon approximations, and in Table 4 for the parabolic, hyperbolic and Gaussian interpolation of the same echelon approximation.

2D and 3D Reconstructions
We can see the optical cuts detected in the data of Figure 2 using focusing criterion (28) in Figure 11, and a 2D reconstruction (sharp 2D image) of the same data using the same criterion in Figure 12. The echelon approximation is a simple method for constructing a rough 3D model of the object, where all points belonging to the same optical cut have the same height: the height of the corresponding zone of sharpness - see Figure 13.

Figure 11: Optical cuts detected on a multifocal image of limestone (two images in the series - see Figure 3). The software written by the first author of this paper. Taken from [45].

We can also generalize the notion of low-pass filters used in image processing. They can be used for smoothing an echelon approximation. The smoothed approximation looks much better than the echelon one - see Figure 14.

In Figures 15 and 16, we can see reconstructions of the limestone and blue marble samples by data registration according to Section 3, with focusing criterion (28) and profile height calculations (29) and (30). We can compare a single pore of hydrated Portland cement paste reconstructed by the Olympus factory software (Figure 17) and the same pore reconstructed using focusing criterion (28) and probability distribution functions (29) and (30) (Figure 18).

Figure 15: A 3D reconstruction of the limestone sample by data registration according to Section 4, with focusing criterion 5.3 and profile height calculations 6.1 and 6.2 (compare with Figure 5). The software written by the first author. Taken from [45].
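The echelon approximation and its low-pass smoothing can be sketched as follows (an illustrative sketch; the function names, the uniform filter and the synthetic stack are ours): every pixel receives the height of the optical cut in which its focus measure is maximal, and the resulting staircase surface is then smoothed.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def echelon_heights(focus_stack, z_levels):
    """Echelon approximation: every pixel gets the height of the optical
    cut (focal plane) in which its focus measure is maximal."""
    k = np.argmax(focus_stack, axis=0)      # index of the sharpest image
    return np.asarray(z_levels, dtype=float)[k]

def smoothed_heights(focus_stack, z_levels, size=5):
    """Echelon approximation smoothed by a simple low-pass filter."""
    return uniform_filter(echelon_heights(focus_stack, z_levels), size=size)

# synthetic stack of 4 focus-measure images: rows 16k..16k+15 are sharpest
# in image k, so the echelon heights form four horizontal steps
stack = np.zeros((4, 64, 64))
for k in range(4):
    stack[k, 16 * k:16 * (k + 1), :] = 1.0
h = echelon_heights(stack, z_levels=[0.0, 1.0, 2.0, 3.0])
h_smooth = smoothed_heights(stack, z_levels=[0.0, 1.0, 2.0, 3.0])
```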

Figure 16:
A 3D reconstruction of the blue marble sample by data registration according to Section 4, with focusing criterion (28) and profile height calculations (29) and (30). The software written by the first author. Taken from [45].

Figure 17: 47 optical cuts with a vertical stepping of 1.2 µm. Olympus LEXT 1000 again, non-confocal mode. A 3D reconstruction by data registration according to Section 4, focusing criterion (28) and profile height calculations (29) and (30). The software written by the first author. Taken from [27].

Conclusion
The SFF method based on the Fourier transform can provide correct 3D replicas of rough surfaces. In the case of small samples, a qualified user of this method can obtain results similar to or even better than reconstructions from a confocal microscope. For larger objects, 3D scanners and similar significantly more expensive devices can be simulated by sophisticated mathematical instruments and advanced programming techniques.