Three-Dimensional Reconstruction of Light Field Based on Phase Similarity

Light field imaging plays an increasingly important role in three-dimensional (3D) reconstruction because it can quickly capture four-dimensional (angular and spatial) information about a scene. In this paper, a 3D reconstruction method for the light field based on phase similarity is proposed to increase the accuracy of depth estimation and to widen the applicability of the epipolar plane image (EPI). The light field camera was calibrated to obtain the relationship between disparity and depth, and the projector calibration was removed to make the experimental procedure more flexible. A disparity estimation algorithm based on phase similarity was then designed to improve the reliability and accuracy of the disparity calculation: phase information was used instead of the structure tensor, and morphological processing was used to denoise and optimize the disparity map. Finally, 3D reconstruction of the light field was realized by combining the disparity information with the calibrated relationship. The experimental results showed that, compared with the ground truth of the measured objects, the reconstruction standard deviations of the two objects were 0.3179 mm and 0.3865 mm, respectively. Compared with the traditional EPI method, our method not only makes EPI perform well in single-texture or blurred-texture scenes but also maintains good reconstruction accuracy.


Introduction
Light field imaging, as a new imaging technology, has become a research hotspot. It simultaneously records the direction and intensity of light, which allows a light field camera to obtain multiple viewing angles with a single exposure [1,2]. Therefore, light field imaging can be applied to three-dimensional (3D) reconstruction tasks such as physical measurement [3,4], depth estimation [5,6], and intelligent detection [7,8].
Depth estimation is one of the most important research topics in 3D reconstruction [9]. At present, there are two main ways to estimate depth from light field data. One is multi-view image matching, and the other is epipolar plane image (EPI) analysis based on structural information.
The multi-view image matching method requires all sub-aperture images to be matched to obtain the scene disparity. Yu et al. explored the geometry of light in three-dimensional space to improve the accuracy of stereo matching, but the method did not perform well when the disparity of each pixel was small [10]. A matching method based on principal component analysis was proposed to realize multi-view stereo reconstruction [11]. An occlusion prediction framework for light field images proved effective in occluded scenes and was able to identify occluded edges [12]. Tao et al. estimated the depth of the scene using both defocus and matching cues and then added shading cues to improve detail recovery [13,14]. However, this increased the computational load and degraded real-time performance. In addition, a spatially varying bidirectional reflectance distribution function invariant theory was derived to recover 3D shape and estimate the depth of non-Lambertian surfaces [15]. Although multi-view images make full use of the information in the whole scene, they inevitably lead to heavy and time-consuming computation.
The other method is EPI analysis based on structural information, which obtains the slope of the line structure corresponding to each pixel, where the slope corresponds to the disparity. A structure-tensor-based EPI method was used to calculate the depth of the scene, with the EPI lines extracted and their slopes calculated under a total variation framework [16]. A method based on scene confidence was designed to improve the accuracy of EPI depth estimation, and it leveraged the coherence of light fields to achieve 3D reconstruction [17]. In addition, sheared EPI analysis, in which EPIs are transformed with several shear values before the structure tensor analysis, was shown to estimate accurate disparities even from non-dense light fields [18]. However, EPI methods based on the structure tensor depend heavily on the complexity of the scene texture and perform poorly in single-texture scenes. These methods still need further optimization for regions with similar or missing textures.
3D reconstruction with structured light is regarded as a mainstream measuring technology because of its high accuracy and speed [19]. Sinusoidal fringes are projected onto the object surface, and the phase information modulated by the surface is recovered by the multi-frequency heterodyne method. In this case, there is a unique correspondence between the phase value and the pixel in the same scene. Structured light has already been used in the light field to realize 3D measurement [20]. However, that work still relied on camera and projector calibration, so the system calibration was complicated by the multi-view nature of the light field camera. Besides, the phase modulation of the two-step phase shift was weak, which made accurate 3D reconstruction difficult. Consequently, we introduce structured light fringe projection into the light field EPI calculation so that each pixel in the scene is encoded with phase information, and the four-step phase-shifting algorithm is used to make the phases independent of each other.
In this paper, a 3D reconstruction method for the light field based on phase similarity is proposed to make EPI perform well in single-texture scenes. The system calibration of the structured light method is eliminated, and no secondary calibration of the projector is needed. According to the light field imaging model, the light field camera is calibrated with Zhang's calibration method to obtain the relationship between disparity and depth. This relationship lays the foundation for the 3D reconstruction of the light field. A disparity estimation algorithm based on phase similarity is then designed to improve the reliability and accuracy of the disparity calculation, with phase information replacing the structure tensor. Morphological processing is used to denoise and optimize the disparity map to improve its accuracy. The 3D reconstruction of the light field is realized by combining the disparity information based on phase similarity with the linear relationship obtained from calibration. Overall, our method not only makes EPI perform well in single-texture or blurred-texture scenes but also maintains good reconstruction accuracy.
The rest of this paper is arranged as follows. Section 2 introduces the principle of our method and our hardware implementation. Section 3 describes the imaging model of the light field camera and its calibration method. Section 4 presents the principle of light field 3D reconstruction based on phase similarity and explains it in detail. The experimental results are shown in Section 5, and Section 6 summarizes the work and discusses further research.

Light Field Imaging and EPI Principle
On the basis of the pinhole model of a traditional camera, a micro-lens array (MLA) is added between the main lens and the image sensor. The incident light is converged onto the MLA by the main lens and then re-imaged onto the image sensor. Therefore, the light field camera can record four-dimensional angular and spatial information simultaneously. The accepted model of a light field camera is a geometric model based on two parallel planes [21]. Generally, a light ray, defined as a straight line, passes through the main lens plane (s, t) and the micro-lens plane (x, y) to form a number of pixel points, where point P is projected onto the two planes at fixed coordinates, as presented in Figure 1a. The main lens plane (s, t) provides the angular resolution of the scene, and the micro-lens plane (x, y) provides the spatial resolution of the scene; hence the light field can be represented as L(s, t, x, y). The light field L(s, t, x, y) can be simply understood as a function of the light space.
In particular, when the light space is limited to a two-dimensional plane, the light field can be represented as L_{t*,y*} = L(s, t*, x, y*). Other restrictions can be defined in the same way. For example, L_{s*,t*} is a sub-aperture image of a particular angle in the light field image. Figure 1b shows 7 × 7 sub-aperture images extracted from the 4D light field data, in which the size of each sub-aperture image is 512 pixels × 512 pixels. L_{s*,x*} and L_{t*,y*} are referred to as EPIs, and an EPI can be understood as a horizontal or vertical two-dimensional slice of the 4D light field data, as shown in Figure 1c.
Light field cameras contain multi-angle information and there is a linear relationship between the change of view and the projection coordinates on the EPI plane. The rate of change depends on the depth of the projected scene point, which is also called disparity. This feature makes EPI present a unique structure and ensures that a projection point is a straight line in EPI.
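The slicing described above can be illustrated with a minimal NumPy sketch. It assumes the decoded light field is stored as an array indexed `lf[s, t, x, y]`, which mirrors the notation L(s, t, x, y) but is not necessarily the layout produced by any particular decoder. The function names and the synthetic light field are hypothetical and serve only to show that a scene point traces a straight line in the EPI.

```python
import numpy as np

def sub_aperture(lf, s_star, t_star):
    """Sub-aperture image L_{s*,t*}: fix both angular coordinates."""
    return lf[s_star, t_star]

def horizontal_epi(lf, t_star, y_star):
    """Horizontal EPI L_{t*,y*}(s, x): fix t and y, keep s and x."""
    return lf[:, t_star, :, y_star]

def vertical_epi(lf, s_star, x_star):
    """Vertical EPI L_{s*,x*}(t, y): fix s and x, keep t and y."""
    return lf[s_star, :, x_star, :]

def make_synthetic_lf(S=7, T=7, X=32, Y=32, disparity=1.0):
    """Toy light field (assumption): a 1D pattern that shifts by
    `disparity` pixels along x for each step in the view index s."""
    s = np.arange(S) - S // 2
    x = np.arange(X)
    pattern = np.sin(0.3 * (x[None, None, :, None]
                            - disparity * s[:, None, None, None]))
    return np.broadcast_to(pattern, (S, T, X, Y)).copy()

lf = make_synthetic_lf(disparity=1.0)
epi = horizontal_epi(lf, t_star=3, y_star=10)
print(epi.shape)  # one row per view s, one column per pixel x
```

In this toy example, each row of `epi` is the row above it shifted by one pixel, so every scene point lies on a straight line whose slope encodes the disparity.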

The Principle of 3D Reconstruction of Light Field Based on Phase Similarity
Our proposed 3D reconstruction method of light field based on phase similarity is mainly divided into two parts: the calibration part and the disparity calculation part, as shown in Figure 2.
The calibration part is to establish a linear relationship between disparity and depth. The linear relationship between disparity and depth can be deduced by analyzing the light field imaging model. Depth is obtained by Zhang's calibration method by capturing the chessboard from different positions [22], and the disparity is obtained by using the EPI method to calculate the information of chessboard feature points. Finally, the Levenberg-Marquardt algorithm [23] is used to improve the reliability of the results.

The disparity calculation part uses the principle that phase presents similarity in EPI. First, sinusoidal fringes with different phase values are projected onto the object surface, and the multi-frequency heterodyne method is used to unwrap the phase-coded scenes. Then, the pixels of the middle row or column are selected as the target points in the EPI of the structured light field. The points with the highest similarity to the phase of the target pixel are searched for in the other rows (or columns), and those points are fitted to a straight line. The slope of the line corresponds to the disparity. The disparity maps of the scene can be accurately acquired from the entire light field data in a similar way.
Based on the above, 3D reconstruction of the light field is realized by combining the accurate disparity map based on phase similarity with the linear relationship deduced from the light field imaging model.

Calibration
In the traditional pinhole camera model, the Gauss imaging theorem gives
$$\frac{1}{u} + \frac{1}{v} = \frac{1}{f} \qquad (1)$$
where u represents the distance between the scene point P_0 and the main lens, v represents the distance between the imaging plane and the main lens, and f is the focal length of the main lens. Traditional camera calibration calculates the correspondence between the world coordinates and the image coordinates. Here, the calibration of the light field camera obtains the mapping between disparity and depth. Figure 3 shows the light field imaging model [24], and the relationship between a scene point P(x_c, z_c) and its image point in the MLA plane can be represented as Equation (2), where s represents the distance from the main lens sub-aperture to the optical center O_c, and x_m is the distance from the MLA's center to the position where the light passes through the MLA plane. The principle of sub-aperture imaging is shown in Figure 4.
The hexagonal macro pixels in the light field raw data are shown in Figure 4a. Figure 4b shows the principle of sub-aperture image extraction, in which the pixels in different macro pixels are arranged in order. Unlike traditional images, the light field raw data consists of hexagonal macro pixels, and each macro pixel corresponds to the area of the image sensor that is covered by one micro-lens. Sub-aperture images are extracted in a fixed sequence from each macro pixel of the light field raw data. In general, (m, n) is used to represent the sub-aperture area on the main lens, and the distance D between two adjacent sub-aperture areas can be expressed as Equation (3), where q is the size of the pixels on the image sensor and b is the distance between the MLA and the image sensor. The relationship between s and the pixel index m can be expressed as Equation (4). Equation (5) represents two adjacent sub-apertures, so Equation (2) can be rewritten as Equation (6). Subtracting the top equation from the bottom one in Equation (6) yields Equation (7), where d is the distance between the centers of adjacent micro-lenses and ∆x is the disparity between two adjacent sub-aperture views of scene point P, which can be calculated from the EPI. It should be noted that ∆x is independent of l, which means that the light field disparity values are the same for any two adjacent sub-aperture images. Equation (8) describes a linear relationship between the reciprocal of the scene depth, 1/z_c, and the disparity ∆x. According to the disparity calculation relationship, Equation (7) can be rewritten as Equation (9).
Sensors 2021, 21, 7734
Therefore, the central sub-aperture image captured by the light field camera is equivalent to the image captured by a traditional camera, so the main lens parameters of the light field camera can be calibrated with Zhang's calibration method by capturing a chessboard at different positions. Finally, the non-linear Levenberg-Marquardt (LM) optimization algorithm is used to refine the fitted linear equation. It is insensitive to over-parameterization and can effectively deal with redundant parameters, so LM minimization effectively improves the reliability of the linear relationship.
The system calibration in the structured light method is eliminated, and there is no need to conduct secondary calibration for the projector. Instead, the relationship between disparity and depth can be obtained through camera calibration, which can quickly realize 3D reconstruction.
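As a rough illustration of how the calibrated relationship can be used, the following sketch fits a line between the disparity ∆x and the reciprocal depth 1/z_c and then inverts it to recover depth. The model form ∆x = k/z_c + c, the coefficient names `k` and `c`, and the synthetic data are assumptions made for this example. The fit uses ordinary least squares from NumPy as a simple stand-in; it is not the Levenberg-Marquardt refinement described above.

```python
import numpy as np

def fit_disparity_depth(disparity, depth):
    """Least-squares fit of disparity = k * (1 / depth) + c.
    Returns the assumed coefficients (k, c)."""
    inv_depth = 1.0 / np.asarray(depth, dtype=float)
    k, c = np.polyfit(inv_depth, np.asarray(disparity, dtype=float), 1)
    return k, c

def depth_from_disparity(disparity, k, c):
    """Invert the fitted line: depth = k / (disparity - c)."""
    return k / (np.asarray(disparity, dtype=float) - c)

# Synthetic calibration samples (hypothetical values, in mm and pixels).
true_k, true_c = 50.0, -0.1
depth_samples = np.linspace(300.0, 600.0, 9)
disparity_samples = true_k / depth_samples + true_c

k, c = fit_disparity_depth(disparity_samples, depth_samples)
recovered = depth_from_disparity(disparity_samples, k, c)
print(k, c)
```

Because the model is linear in 1/z_c, a closed-form fit is sufficient for a sketch. A non-linear refinement would matter mainly when additional parameters are estimated jointly.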

Disparity Calculation
In the conventional EPI calculation of the light field, the line direction at each pixel is obtained by calculating the image gradient and the structure tensor, and the line slope then corresponds to the disparity of that pixel in the sub-aperture image. However, when the measured object has missing or similar textures, the line structure in the EPI may no longer be clear, which makes it difficult to directly calculate the slope of the line for each pixel in the scene.
The light field sub-aperture images can be regarded as views of the object from different angles [25]. According to phase measurement profilometry (PMP), the phase values of the same target point modulated at different angles should be the same, which makes the phase information present similarity along the corresponding straight line in the EPI. Therefore, the EPI method based on phase information can replace the structure tensor method.
For example, to determine the slope at a single pixel (s*, x*) in an EPI (s, x) of the central-view structured light field, the positions of the points in the other rows (or columns) whose phase values are most similar to that of the target point are recorded. The slope of the line at (s*, x*) is then obtained by linear fitting of these points, as shown in Figure 5. The slope at every other point is obtained in the same way, which gives the disparity map. The detailed implementation steps are as follows:
Step 1: Project sinusoidal fringes onto the object surface. The phase information is combined with the phase-shifting method to encode the object.
Step 2: Acquisition of fringe images. The frequency of the sinusoidal fringes should be selected appropriately because of the limited resolution and frame rate of the light field camera that captures them.
Step 3: Phase unwrapping. The multi-frequency heterodyne method is used to unwrap the phase of the 4D light field data.
Step 4: Searching and recording the highest phase similarity. In the phase-encoded EPI, the positions of the points in the other rows (or columns) whose phase values are most similar to that of the target point are recorded.
Step 5: Fitting a straight line. According to the position indices of these points, linear fitting is used to fit them to a straight line.
Step 6: Disparity calculation based on phase similarity. The line slope is calculated, and the disparity is obtained by taking the inverse of the slope. The accurate disparity map is acquired by traversing the entire phase-encoded light field data.
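Steps 4 to 6 can be sketched for a single target pixel as follows. This is a minimal illustration, not the authors' implementation. It assumes an unwrapped phase EPI stored as `phase_epi[s, x]` with the central view in the middle row, a fixed search window, and a linear sub-pixel refinement between neighboring pixels; the names `match_position`, `epi_disparity`, and `radius` are hypothetical. The code fits x as a function of the view offset, so the fitted slope dx/ds is the disparity directly, which is the inverse of the EPI line slope ds/dx mentioned in Step 6.

```python
import numpy as np

def match_position(row, target, x0, radius):
    """Find where `row` is most similar to the target phase near x0,
    refined to sub-pixel accuracy by linear interpolation."""
    lo = max(0, x0 - radius)
    hi = min(len(row), x0 + radius + 1)
    k = lo + int(np.argmin(np.abs(row[lo:hi] - target)))
    # Refine using the neighbor that brackets the target phase, if any.
    for j in (k - 1, k + 1):
        if 0 <= j < len(row) and (row[j] - target) * (row[k] - target) < 0:
            return k + (j - k) * (target - row[k]) / (row[j] - row[k])
    return float(k)

def epi_disparity(phase_epi, x_star, radius=5):
    """Disparity (pixels per view) at pixel x_star of the central row."""
    n_views = phase_epi.shape[0]
    center = n_views // 2
    target = phase_epi[center, x_star]
    offsets = np.arange(n_views) - center
    xs = [match_position(phase_epi[s], target, x_star, radius)
          for s in range(n_views)]
    slope, _ = np.polyfit(offsets, xs, 1)  # x = slope * offset + x_star
    return slope

# Synthetic phase EPI (assumption): phase grows linearly with x and each
# view is shifted by 0.5 pixel relative to its neighbor.
s = np.arange(7) - 3
x = np.arange(64)
phase = 0.2 * (x[None, :] - 0.5 * s[:, None])
print(epi_disparity(phase, x_star=30))
```

Repeating `epi_disparity` for every pixel of every horizontal and vertical EPI would yield a full disparity map. The search radius limits the largest disparity that can be recovered.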
Our proposed method can effectively reduce the complexity of the operation. It is worth noting that although the system calibration step is omitted, the obtained disparity map still needs to be filtered and denoised. The opening operation in morphological processing consists of an erosion followed by a dilation, and it can effectively eliminate small isolated noise points in the image and smooth object edges without changing the object shape [26]. Here, the morphological operation is applied to the disparity information to eliminate the small noise points and small black areas separated from the object. The clear edge contour of the object is retained while the noise is removed, which supports the 3D reconstruction of the light field.
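The opening operation can be sketched in plain NumPy as an erosion followed by a dilation with a square structuring element. The kernel size, the padding choice, and the function names are assumptions made for this example. The paper does not state which structuring element or library was used.

```python
import numpy as np

def _window_reduce(img, k, reduce_fn, pad_value):
    """Apply reduce_fn over every k x k neighborhood of a 2D array."""
    img = np.asarray(img, dtype=float)
    r = k // 2
    padded = np.pad(img, r, mode="constant", constant_values=pad_value)
    h, w = img.shape
    shifted = [padded[i:i + h, j:j + w] for i in range(k) for j in range(k)]
    return reduce_fn(shifted, axis=0)

def erode(img, k=3):
    """Grayscale erosion: neighborhood minimum (borders padded with +inf)."""
    return _window_reduce(img, k, np.min, np.inf)

def dilate(img, k=3):
    """Grayscale dilation: neighborhood maximum (borders padded with -inf)."""
    return _window_reduce(img, k, np.max, -np.inf)

def opening(img, k=3):
    """Morphological opening: erosion followed by dilation."""
    return dilate(erode(img, k), k)

# Toy disparity map (hypothetical): one 5 x 5 object and one isolated
# noise pixel that the opening should remove.
disparity_map = np.zeros((10, 10))
disparity_map[2:7, 2:7] = 1.0
disparity_map[8, 8] = 1.0
cleaned = opening(disparity_map, k=3)
print(cleaned[8, 8], cleaned[2:7, 2:7].min())
```

A kernel larger than the noise blobs but smaller than the object keeps the object contour while removing the isolated points.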

Hardware Implementation
According to the principle of light field imaging and EPI, we designed and built a 3D reconstruction system based on a structured light field. It consisted of a Lytro Illum light field camera (San Francisco, CA, USA) and a Light Crafter 4500 projector, as shown in Figure 6. The light field camera had an angular resolution of 15 × 15 and a spatial resolution of 625 pixels × 433 pixels; the Light Crafter 4500 was made by Texas Instruments and had a resolution of 1140 pixels × 912 pixels. The computer ran Windows 10 (64 bit) on an Intel(R) Core(TM) i9-9900K CPU @ 3.60 GHz. LFToolbox designed by Stanford was used to decode the 4D light field data L(s, t, x, y) and obtain the multi-view information of the light field [27].

Experimental Results
In our experiment, the 3D reconstruction system based on the light field had been shown in Figure 6. The purpose of the calibration was to fit the mapping relationship between disparity and depth according to the disparity and depth values at feature points of the chessboard. The depth information of each checkerboard corner is obtained based on Zhang's calibration method by taking different positions of the checkerboard, and then the disparity values of the checkerboard corner are calculated based on the EPI principle. Nine kinds of postures were collected from a chessboard with a corner size of 24.5 × 24.5 mm and a corner number of 9 × 7. The light field camera and the posture distribution of the chessboard were shown in Figure 7. The data showed that the center distance d of two adjacent micro-lenses was 0.01732 mm and the size q of a single pixel on the image sensor was 0.0014 mm. To achieve the purpose of calibration, EPI could be obtained in the decoded light field data, the disparity and depth values at each feature point were calculated, and the mapping relationship between disparity and depth was fitted as shown in Figure 8. The relationship between disparity and depth was obtained as follows:

Experimental Results
The 3D reconstruction system based on the light field used in our experiment is shown in Figure 6. The purpose of the calibration was to fit the mapping between disparity and depth from the disparity and depth values at the feature points of a chessboard. The depth of each chessboard corner was obtained with Zhang's calibration method by imaging the chessboard at different positions, and the disparity of each corner was then calculated based on the EPI principle. Nine poses were captured of a chessboard with a corner size of 24.5 × 24.5 mm and 9 × 7 corners. The light field camera and the pose distribution of the chessboard are shown in Figure 7. The center distance d of two adjacent micro-lenses was 0.01732 mm, and the size q of a single pixel on the image sensor was 0.0014 mm. For calibration, EPIs were extracted from the decoded light field data, the disparity and depth values at each feature point were calculated, and the mapping between disparity and depth was fitted as shown in Figure 8. The fit gave a linear relationship between disparity and depth, referred to below as the calibrated linear equation.
Given the resolution and frame rate of the light field camera, fringes with frequencies of 15, 12, and 10 were suitable and were used in our experiment. We omitted the projector calibration to make the experimental procedure more flexible. Horizontal and vertical fringes were applied to different objects to demonstrate the applicability of the proposed method, and the experimental results are shown in Figure 9. The sinusoidal fringes were projected onto the object surface, and the images modulated by the surface, as captured by the light field camera, are shown in Figure 9a,e. A four-step phase-shifting method was then applied to the captured structured light field data, and the resulting wrapped phase images are shown in Figure 9b,f. The multi-frequency heterodyne method was used to unwrap the phase, as shown in Figure 9c,g. On the basis of the structured light field, EPI technology was used to process the light field data, and the phase-similarity-based method was used instead of the structure tensor to calculate the slope of the straight lines in the EPI. Phase encoding ensured a unique correspondence between each phase value and a pixel in the scene, and morphological processing was used to denoise and optimize the disparity map to improve its accuracy. The disparity maps obtained with the proposed phase-similarity method are shown in Figure 9d,h. The depth of the measured object was then obtained by substituting the disparity map into the calibrated linear equation.
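Two of the steps described above, the four-step phase-shifting calculation and the final substitution of disparity into the calibrated linear equation, can be illustrated with a minimal Python sketch. The fringe model I_k = A + B cos(φ + kπ/2), the function names, and all numerical values (including the slope and intercept of the toy calibration line) are assumptions made for illustration; they are not the phase-shift convention, code, or calibration data of this paper.

```python
import numpy as np


def wrapped_phase_four_step(i0, i1, i2, i3):
    """Wrapped phase in [-pi, pi] from four phase-shifted fringe images.

    Assumed model: I_k = A + B*cos(phi + k*pi/2), k = 0..3, which gives
    I3 - I1 = 2B*sin(phi) and I0 - I2 = 2B*cos(phi).
    """
    return np.arctan2(i3 - i1, i0 - i2)


def fit_disparity_to_depth(disparity, depth):
    """Least-squares line depth = k*disparity + c from calibration samples."""
    k, c = np.polyfit(disparity, depth, deg=1)
    return k, c


def disparity_to_depth(disparity_map, k, c):
    """Apply the calibrated line to every pixel of a disparity map."""
    return k * np.asarray(disparity_map, dtype=float) + c


# Synthetic fringe signal: 15 periods across one image row.
x = np.linspace(0.0, 1.0, 256, endpoint=False)
true_phase = 2.0 * np.pi * 15 * x
frames = [0.5 + 0.4 * np.cos(true_phase + n * np.pi / 2) for n in range(4)]
phi = wrapped_phase_four_step(*frames)

# Hypothetical calibration samples (made-up numbers, not the paper's data).
cal_disparity = np.array([-0.6, -0.3, 0.0, 0.3, 0.6])
cal_depth = 120.0 * cal_disparity + 300.0  # mm, exactly linear toy data
k, c = fit_disparity_to_depth(cal_disparity, cal_depth)

# Map a small toy disparity map to depth with the fitted line.
depth_map = disparity_to_depth([[0.1, 0.2], [0.3, 0.4]], k, c)
```

The wrapped phase returned here still contains 2π jumps; in the experiment these are removed by multi-frequency heterodyne unwrapping, which this sketch does not cover.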
In our experiments, two different objects were measured, and the reconstruction results are shown in Figure 10. The traditional EPI method produced aliasing and noise in areas with similar textures, which led to large anomalies, as shown in Figure 10a,c. Figure 10b,d shows the 3D reconstruction results of our proposed method. The reconstructions from our method are complete and have smooth surfaces without spurious noise or burrs, which reflects the accuracy of our method. The standard deviations of the reconstruction results relative to the ground truth of the two measured objects were 0.3179 mm and 0.3865 mm, respectively. The comparative analysis results are presented in Table 1. The experimental results therefore demonstrate the feasibility and reliability of our proposed method.
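As a hedged illustration of how a standard deviation relative to ground truth might be computed, the sketch below takes the point-wise difference between reconstructed and reference depths and reports its statistics. The exact error definition used for Table 1 is not given in the text, so the point-wise residual, the function name `error_statistics`, and the toy values are all assumptions.

```python
import numpy as np


def error_statistics(reconstructed, ground_truth):
    """Mean, standard deviation, and RMSE of the depth residuals (same units as input).

    Non-finite residuals (e.g. NaN where no depth was reconstructed) are ignored.
    """
    rec = np.asarray(reconstructed, dtype=float)
    ref = np.asarray(ground_truth, dtype=float)
    err = rec - ref
    err = err[np.isfinite(err)]
    return {
        "mean": float(err.mean()),
        "std": float(err.std()),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
    }


# Toy data in mm (hypothetical): a flat reference surface and a noisy
# reconstruction with one missing point.
ground_truth = np.full(5, 10.0)
reconstructed = np.array([10.1, 9.9, 10.1, 9.9, np.nan])
stats = error_statistics(reconstructed, ground_truth)
```

Reporting the mean alongside the standard deviation helps separate a constant offset (for example, from the calibration) from the scatter of the reconstruction itself.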

Discussion and Conclusions
In this paper, a novel light field 3D reconstruction method based on phase similarity is proposed to increase the accuracy of depth estimation and the applicability of EPI. In the calibration part, system calibration is removed and only the light field camera needs to be calibrated, which simplifies the operation. The linear relationship between disparity and depth is obtained from the feature points of the chessboard, based on the imaging model and the calibration method of the light field camera. The calibration results and the fitted relationship verify each other; the relationship not only reflects the mapping between disparity and depth in the scene but also allows some parameters of the light field camera to be deduced, such as the distance v and the distance b between the MLA and the sensor, according to the formula derived from the calibration. The object is then encoded by sinusoidal fringes, and accurate disparity maps are calculated from the scene phase information, followed by morphological processing for optimization and denoising. 3D reconstruction of the light field based on phase similarity is thus realized by combining the calibrated linear relation with the accurate disparity map. Owing to the uniqueness and accuracy of the phase information, phase similarity in our method is more accurate than the structure tensor in the traditional EPI method. However, although this method performs well in scenes with a single texture or similar textures, it has difficulty achieving accurate reconstruction for objects with complex morphology or small size, such as portrait sculptures and small workpieces, because the difference between the modulated phase information and the original phase value is small. Hence, future work needs to focus on improving the resolution of the light field camera so that it can capture more subtle phase changes and improve the applicability of the algorithm.
Data Availability Statement: The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.