Figure 1.
An illustration of the different components of the proposed method. First, the phase space representation of the input hologram is computed. Then, a neural network predicts phase space masks, which are back-projected to extract regions of interest. Finally, the proposed refinement algorithm extracts the final depth estimate.
Figure 2.
Illustration of the two cases that can be observed during ROI extraction. The first case, shown at the top of the figure with a scene composed of three segments, is a perfect mapping that produces diamond shapes encapsulating the scene points. The second case, shown at the bottom of the figure, is the more common mapping of larger scene segments, where the diamonds are hidden due to the lack of space between scene points. In both cases, the iterative method produces relevant results.
Figure 3.
Illustration of the ROI refinement method. The initial ground truth 2D scene is composed of a curved line, and its phase space representation is a band with a very large thickness and a slight slope. The region of interest, colored in blue, completely encompasses the scene points; however, it is not the minimal set of points that generates the same representation as the scene points. From all possible lines fully embedded inside the ROI, the red line is selected as the optimal candidate: it produces the same phase space representation as the scene points and represents the final 1D depth estimate of the scene.
Figure 4.
An illustrative example that motivates the use of a buffer in the proposed algorithm. Given the input phase space representation, the ROIs, colored in blue, are extracted. The extracted regions of interest encapsulate all the scene points, colored in red, and are completely disjoint in 2D space; however, their phase space representations are not disjoint. A first candidate line, colored in green, which is fully embedded in one of the regions of interest and contributes the most to the input mask, is selected. Since no buffer is used, the contribution of the selected line is directly removed from the input phase space representation, and the resulting representation is mapped back into 2D space. The resulting ROI encapsulates only two objects, and no ROI is associated with the third object, which shares similar phase space points with the candidate line.
Figure 5.
Result of applying the iterative method to decompose the phase space into sets of straight lines. The proposed algorithm provides a minimal set of lines that, when mapped into phase space, produce the same representation as the input. Each 2D line is represented in phase space by a band whose slope and thickness are given by the depth and the length of the 2D line segment, respectively. Even though the 2D lines are spatially disjoint in 2D space, their phase space representations can intersect.
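The iterative decomposition can be sketched as a greedy selection loop: repeatedly pick the candidate line whose phase space band explains the most still-unexplained pixels of the target mask, then remove its contribution. The following is a minimal toy sketch under simplifying assumptions (binary masks, precomputed candidate bands, and no buffer); the function name `greedy_decompose` is illustrative and not the paper's implementation.

```python
import numpy as np

def greedy_decompose(target, candidates, max_iter=10):
    """Greedily select candidate line masks that cover the target
    phase space mask, removing each selection's contribution."""
    remaining = target.astype(bool).copy()
    selected = []
    for _ in range(max_iter):
        # Gain = number of still-unexplained pixels each candidate covers.
        gains = [np.logical_and(remaining, c).sum() for c in candidates]
        best = int(np.argmax(gains))
        if gains[best] == 0:
            break  # nothing left to explain
        selected.append(best)
        remaining &= ~candidates[best].astype(bool)
    return selected, remaining
```

Note that removing a selected contribution outright is exactly the behavior that Figure 4 warns about when bands overlap, which is why the actual method keeps a buffer of removed contributions.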
Figure 6.
Illustration of the geometric and visual interpretation of the 4D phase space transform of a 2D hologram. The 4D phase space transform represents the propagation value along rays whose orientation is determined by the two frequency coordinates and whose position is determined by x and y. By fixing the values of one position–frequency pair, the resulting 2D matrix can be interpreted as the phase space transform of the 1D hologram produced by the intersection points between an oriented 2D plane (whose orientation and position are determined by the selected frequency value and x, respectively) and the 3D scene.
Figure 7.
An illustrative example showing the initially extracted ROIs, the refined ROIs, and the final estimate produced by the DFF method using a fixed-size ROI of size 31 centered on the refined ROIs, at different viewing angles. The extracted ROIs can be used to pre-localize the scene in 3D space: they define an appropriate reconstruction interval and contain the pixels of the reconstruction volume on which the focus measure is applied.
Figure 8.
An illustrative example of the importance of color blending. The segmentation masks computed for each color channel contain small defects, which lead to a poor extraction of the regions of interest because the points lying on the segmentation defects are not selected. By unifying the results produced by the different color channels, imperfections related to segmentation performance can be bridged, and a greater number of points are included in the extracted regions of interest.
Figure 9.
An illustrative figure of the three experimental configurations: (a) the depth value is searched over the whole space; (b) regions of interest are used to delimit the search interval; (c) the regions of interest are refined, and a fixed-size interval centered on the predicted optimal representation line is used as the search interval.
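The difference between searching the whole reconstruction volume (configuration (a)) and restricting the search to a fixed-size interval around an initial estimate (configuration (c)) can be sketched as a masked per-pixel argmax over precomputed focus scores. This is a simplified illustration, not the paper's implementation; `dff_depth`, `center`, and `half_width` are hypothetical names.

```python
import numpy as np

def dff_depth(focus_volume, center=None, half_width=None):
    """Per-pixel depth as the index maximizing the focus measure.
    focus_volume: (D, H, W) focus scores per reconstruction depth.
    center: (H, W) initial depth estimate; half_width: half interval size.
    If center/half_width are given, the argmax is restricted to a
    fixed-size interval around the initial estimate."""
    D = focus_volume.shape[0]
    if center is None or half_width is None:
        return focus_volume.argmax(axis=0)  # search the whole space
    depths = np.arange(D)[:, None, None]
    inside = np.abs(depths - center[None]) <= half_width
    masked = np.where(inside, focus_volume, -np.inf)
    return masked.argmax(axis=0)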
Figure 10.
Error norm with respect to the selected size of the new ROI region around the refined ROI used in the DFF method, for the five different scenes. The optimal representation lines computed from the extracted regions of interest using the iterative method are relatively close to the scene points but lack curvature, which limits the overall performance of the algorithm. By applying the DFF method on a fixed-size interval centered on the computed lines, the curvature can be recovered, resulting in a finer depth estimate that is closer to the scene points and therefore yields a low error norm.
Figure 11.
Result of the application of the iterative method on 2D holograms.
Figure 12.
An illustrative figure of the trade-off between the choice of the search interval size and the accuracy of the focus measure, on a 1D slice of the Cars scene. The initial estimate given by the optimal representation lines is relatively close to the scene points, with the exception of some regions located at the bottom of the scene, where the distance between the estimated points and the ground truth is quite large. With a very small interval size, the distance of the badly estimated points is not strongly reduced; only the points close to the initial estimate can be well refined. With a large interval size, the badly estimated points begin to approach the ground truth value. However, since the focus measure is not sensitive enough to the focus change, the initially well-estimated points diverge strongly from the initial estimate, producing a poor final result.
Table 1.
Scene object sizes along the X/Y/Z axes, in centimeters.
| Scene | Object 1 | Object 2 | Object 3 | Object 4 | Object 5 |
|---|---|---|---|---|---|
| Piano | 0.044/0.054/0.052 | 0.023/0.017/0.01 | | | |
| Table | 0.039/0.026/0.038 | 0.018/0.037/0.016 | 0.016/0.037/0.018 | 0.018/0.037/0.016 | 0.016/0.037/0.018 |
| Woods | 0.03/0.045/0.03 | 0.021/0.046/0.023 | 0.034/0.032/0.028 | 0.025/0.042/0.025 | 0.031/0.042/0.022 |
| Dice | 0.02/0.017/0.021 | 0.028/0.029/0.034 | 0.023/0.025/0.028 | 0.0512/0.0512 | |
| Cars | 0.085/0.014/0.052 | 0.082/0.016/0.052 | 0.049/0.025/0.082 | 0.046/0.03/0.073 | |
Table 2.
Segmentation metrics for the test set (piano, table, woods) and the validation set (cars, dice).
| Metric | Piano | Table | Woods | Cars | Dice |
|---|---|---|---|---|---|
| Dice | 0.97 | 0.97 | 0.98 | 0.94 | 0.94 |
| Jaccard | 0.94 | 0.94 | 0.96 | 0.89 | 0.88 |
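The two segmentation metrics reported above are the standard overlap measures for binary masks; a minimal sketch of their common definitions (not necessarily the exact evaluation code used in the paper):

```python
import numpy as np

def dice_coefficient(pred, gt):
    """Dice = 2*|A ∩ B| / (|A| + |B|) for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def jaccard_index(pred, gt):
    """Jaccard = |A ∩ B| / |A ∪ B| for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union
```

Both scores equal 1 for a perfect segmentation; Dice is always at least as large as Jaccard, consistent with the rows above.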
Table 3.
Abbreviations of focus measure operators used in the experiments.
| Focus Operator | Abbr. | Focus Operator | Abbr. |
|---|---|---|---|
| Variance of Laplacian [18] | LAPV | Image contrast [19] | CONT |
| Variance of wavelet coefficients [20] | WAVV | Ratio of the wavelet coefficients [20] | WAVR |
| Gray level variance [21] | GLVA | Normalized gray level variance [22] | GLVN |
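Three of the listed operators are simple enough to sketch from their common textbook definitions; the exact implementations of [18,21,22] may differ in kernel and normalization choices, so this is illustrative only.

```python
import numpy as np
from scipy import ndimage

def lapv(img):
    """LAPV: variance of the Laplacian response (sharp images
    produce strong second derivatives, hence high variance)."""
    return ndimage.laplace(img.astype(float)).var()

def glva(img):
    """GLVA: plain gray-level variance of the image."""
    return img.astype(float).var()

def glvn(img):
    """GLVN: gray-level variance normalized by the squared mean
    (epsilon guards against division by zero on dark images)."""
    img = img.astype(float)
    return img.var() / (img.mean() ** 2 + 1e-12)
```

In a DFF pipeline, such operators are evaluated on each reconstructed depth slice, and the depth maximizing the response is retained per pixel.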
Table 4.
The obtained error norm for the test set (piano, table, woods) and the validation set (cars, dice).
| | Piano | Table | Woods | Cars | Dice |
|---|---|---|---|---|---|
| **Without ROI** | | | | | |
| LAPV | 28.80 | 42.61 | 71.57 | 85.65 | 68.34 |
| WAVV | 18.27 | 43.09 | 12.28 | 34.87 | 17.72 |
| WAVR | 14.99 | 32.84 | 15.89 | 31.96 | 30.03 |
| GLVN | 14.87 | 38.54 | 15.97 | 34.82 | 26.65 |
| GLVA | 14.30 | 39.26 | 12.09 | 33.06 | 18.53 |
| CONT | 22.67 | 54.47 | 42.88 | 59.94 | 74.01 |
| **With ROI** | | | | | |
| LAPV | 26.22 | 38.67 | 29.50 | 83.80 | 15.82 |
| WAVV | 11.36 | 23.78 | 25.37 | 31.97 | 23.61 |
| WAVR | 8.63 | 23.68 | 25.04 | 46.17 | 14.73 |
| GLVN | 6.84 | 20.86 | 13.55 | 32.42 | 9.72 |
| GLVA | 11.26 | 23.00 | 28.62 | 41.23 | 22.24 |
| CONT | 24.86 | 35.46 | 36.28 | 65.55 | 29.56 |
| **ROI Refinement** | | | | | |
| 11 | 9.28 | 11.04 | 10.42 | 25.58 | 5.31 |
| 21 | 7.44 | 10.22 | 8.65 | 24.84 | 4.05 |
| 31 | 6.33 | 10.03 | 7.53 | 24.28 | 3.8 |
| 41 | 5.65 | 10.32 | 6.97 | 24.18 | 3.93 |
| 51 | 4.94 | 10.79 | 6.90 | 24.03 | 4.37 |
| 61 | 4.86 | 11.08 | 6.76 | 24.13 | 4.98 |
| 71 | 4.73 | 11.48 | 6.68 | 22.87 | 5.83 |
| 81 | 4.81 | 12.10 | 6.57 | 21.72 | 6.68 |
| 91 | 4.88 | 13.00 | 6.38 | 20.73 | 7.54 |
Table 5.
Computation time, in seconds, for the final depth map with and without ROI.
| | Piano | Table | Woods | Cars | Dice |
|---|---|---|---|---|---|
| **Without ROI** | | | | | |
| LAPV | 3.89 | 4.10 | 4.92 | 5.91 | 4.97 |
| WAVV | 11.80 | 13.00 | 14.29 | 17.84 | 14.35 |
| WAVR | 6.87 | 6.88 | 7.78 | 9.20 | 7.28 |
| GLVN | 4.65 | 4.82 | 5.31 | 6.42 | 5.35 |
| GLVA | 3.64 | 4.19 | 4.70 | 5.75 | 4.50 |
| CONT | 2.90 | 3.16 | 3.46 | 4.30 | 3.51 |
| **With ROI** | | | | | |
| LAPV | 1.18 | 1.17 | 1.46 | 2.98 | 1.23 |
| WAVV | 2.58 | 2.38 | 3.39 | 8.64 | 2.54 |
| WAVR | 1.37 | 1.29 | 1.82 | 3.85 | 1.35 |
| GLVN | 1.26 | 1.31 | 1.62 | 3.09 | 1.43 |
| GLVA | 1.12 | 1.10 | 1.44 | 2.71 | 1.19 |
| CONT | 0.87 | 0.87 | 1.09 | 2.05 | 0.92 |