A Review of Depth and Normal Fusion Algorithms

Geometric surface information such as depth maps and surface normals can be acquired by various methods such as stereo light fields, shape from shading and photometric stereo techniques. We compare several algorithms that combine depth with surface normal information in order to reconstruct a refined depth map. The reasons for performance differences are examined from the perspective of alternative formulations of surface normals for depth reconstruction. We review and analyze methods in a systematic way. Based on our findings, we introduce a new generalized fusion method, which is formulated as a least squares problem and outperforms previous methods in the depth error domain by introducing a novel normal weighting that performs closer to the geodesic distance measure. Furthermore, a novel method is introduced based on Total Generalized Variation (TGV), which further outperforms previous approaches in terms of the geodesic normal distance error while maintaining comparable quality in the depth error domain.


Introduction
Measuring the depth of a scene accurately is essential for many tasks, including applications in industrial environments, object recognition and security assurance. Usually the depth is measured by stereo cameras, structure from motion (SfM), time of flight (ToF) sensors, or light field cameras. These methods yield accurate absolute depth maps but lack detail in high-frequency depth structures. The reason for this lies in the dependency on the presence of structural information in the image as well as in the analysis routine, which is usually based on hypothesis testing and therefore limited in range and step sizes. Stereo matching methods include correlation based techniques [1], semi global matching [2,3], block based matching [4,5] and stereo matching for micro array cameras [6]. Several methods were introduced to retrieve depth information from light field data using epipolar image slope analysis [7], structure tensors [8], fine-to-coarse approaches [9,10] and line consistency metrics [11]. Structure from motion and time of flight techniques were presented in [12][13][14], respectively. In contrast to depth based approaches, methods that recover surface normals, such as photometric stereo [15], show high-frequency details but lack an absolute depth reference. Shape from shading was used to retrieve surface normals by [16]. A robust normal reconstruction using photometric stereo information with a Markov Random Field (MRF) was introduced in [17]. Methods have also been presented which retrieve surface normals from a calibrated stereo setup. This can be achieved, e.g., by estimating the homography between two matched patches [18,19] or by using the affine transform data between two projections [20], in addition to the reconstruction of a sparse depth map retrieved from a stereo correspondence analysis.
A learning-based method using a tandem of convolutional neural networks to estimate depth and surface normals from image data simultaneously was introduced in [21]. Combining depth and surface normal data allows precise depth reconstructions for low- as well as high-frequency components in the depth map.
Depth maps and surface normals were previously combined in various ways. Shape from shading was used under general illumination in [22]; photometric stereo normals were incorporated in [23][24][25]. Another approach was presented in [26], where the tangent plane of the given normals was projected into the measured normal field. This normal constraint was previously used in several algorithms (e.g., [27][28][29][30]). The method described in [31] uses a standard depth constraint and forces the Laplacian of the optimal solution to be in the proximity of the derivative of the given normals. In [32] a depth and photometric stereo fusion algorithm was introduced, which uses an additional Laplacian smoothing term and adaptive pixel-wise weighting parameters to preserve surface discontinuities. A Laplacian smoothing term was also added in [33]. An extended penalty is chosen in [34], where the normal is enforced to be close to the normal from the initial depth map, while 2nd-order spherical harmonics are used to constrain the normals according to the observed shading in the input image, a smoothness function enforces the similarity of 1st-order neighbors and an additional term constrains the normals to unit length. A method to refine depth by photometric stereo information using RGB-D cameras was introduced in [35], where an energy function optimizes for the depth, smoothness, shading and temporal aliasing of a scene. Surface normals from polarization cues were used to enhance the depth map in [36]: in an iterative process, the depth is refined with corrected surface normal information and a depth fidelity constraint, which enforces consistency between the surface from normals and accurate regions in the depth map. An original approach for inferring surface normals from light field data, as well as a hybrid setup combining depth maps with surface normals using a block coordinate descent algorithm, was demonstrated in [37].
Even though several methods to combine surface depth with surface orientation data previously emerged, a thorough analysis and classification of the properties of those approaches was missing. In this paper, our first main contribution comprises an in-depth comparison of several variational methods using depth maps and surface orientation data, as well as a classification and evaluation of weighting terms for surface orientations using gradients or surface normals. We analyze orientation weighting terms of common methods and explain their differences with respect to the geodesic distance weighting. We show that methods which behave closer to this natural surface normal weighting term show a better performance, especially in regions with steep depth edges. Based on these findings we introduce our second main contribution, a new generalized formulation of a previously introduced method [26], which outperforms other methods regarding the error in the depth domain. Our third main contribution is a novel gradient-based approach, which uses TGV and outperforms other methods in the domain of the geodesic error of the resulting normals.

Depth and Surface Normal Cues
At present, 3D models are used for a wide range of analysis tasks. Depth models are constructed by acquisition devices using stereo systems, light field cameras, time of flight (ToF) sensors, or other range scanning techniques. Common methods show a high precision in the absolute depth measurement, but a low quality in fine relative details. These errors are major obstructions for tasks such as finding defects in objects. Measuring the normal fields of objects using methods such as photometric stereo [15] or shape from shading [38] allows the reconstruction of surfaces with highly precise local details. On the downside, those methods show errors in the low frequency domain and therefore result in a low absolute depth accuracy.
Combining depth maps with surface normal information allows an exact 3D reconstruction both in absolute depth and fine surface details. This can be achieved by optimizing energy functions with variational methods, where the solution is penalized for deviating from the depth model and from the surface normals. In state-of-the-art techniques, the surface normal component is either used directly or converted to gradient information, where the x- and y-components can be treated independently.
Such an independent treatment can be beneficial for applications where data components are missing, as for example line-scanners [39].

Notations and Preliminaries
In this section, we introduce the essential notations used throughout this paper. By default we assume discretized surface structures of the size of M × N pixels. In order to access the image locations, we define the following index set:

I = {(i, j) : 1 ≤ i ≤ M, 1 ≤ j ≤ N}. (1)

The discrete depth map of our scene is scalar valued in each pixel and defined as follows:

Z ∈ R M×N , Z i,j ∈ R, ∀i, j ∈ I. (2)

Variables with a bold font refer to surface structures where each pixel is vector valued. Hence, the surface gradient field G in x- and y-direction is defined as:

G ∈ R M×N×2 , G i,j = (G i,j,x , G i,j,y ). (3)

The gradient of the depth map is computed using standard finite differences, and the gradient operator in x- and y-direction ∇ : R M×N → R M×N×2 is given by:

(∇Z) i,j = ((∇ x Z) i,j , (∇ y Z) i,j ), with (∇ x Z) i,j = Z i+1,j − Z i,j and (∇ y Z) i,j = Z i,j+1 − Z i,j . (4)

The surface normal field is defined as follows:

N ∈ R M×N×3 , N i,j = (N i,j,x , N i,j,y , N i,j,z ). (5)

By definition, we have |N i,j | 2 = 1. The relation between the surface gradient estimation and the surface normals is defined for all i, j ∈ I as follows:

G i,j,x = −N i,j,x /N i,j,z , G i,j,y = −N i,j,y /N i,j,z . (6)

Furthermore, we are using two specific surface tangent vectors, which are aligned with the x- and y-vector respectively and defined as follows:

T i,j = (T i,j,x , T i,j,y ), (7)
T i,j,x = (1, 0, (∇ x Z) i,j ) and (8)
T i,j,y = (0, 1, (∇ y Z) i,j ). (9)
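The pointwise relation between normals and gradients in Equation (6), together with its inverse, can be sketched in a few lines of NumPy (a minimal illustration with our own function names, not code from the paper):

```python
import numpy as np

def normals_to_gradients(N, eps=1e-8):
    """Map a unit normal field N of shape (M, N, 3) to the gradient
    field G of shape (M, N, 2) via G_x = -N_x/N_z, G_y = -N_y/N_z."""
    Nz = np.maximum(N[..., 2], eps)  # guard against division by zero
    return np.stack([-N[..., 0] / Nz, -N[..., 1] / Nz], axis=-1)

def gradients_to_normals(G):
    """Inverse mapping: N = (-G_x, -G_y, 1), normalized to unit length."""
    ones = np.ones(G.shape[:2])
    V = np.stack([-G[..., 0], -G[..., 1], ones], axis=-1)
    return V / np.linalg.norm(V, axis=-1, keepdims=True)
```

The two mappings are mutual inverses wherever N z > 0, i.e., for surfaces that face the camera.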

Depth and Surface Normal Fusion Algorithms
In this paper, we analyze state-of-the-art methods in a systematic way and introduce two novel approaches. The described hybrid depth and surface normal methods are categorized in terms of their penalty functions. State-of-the-art approaches use similar depth penalty terms but differ in the surface orientation weighting and regularization. We organize the described methods in two categories, (i) gradient-based and (ii) normal-based approaches, as well as with respect to their treatment of flat and steep surface regions. While the presented methods show similar behaviors in flat areas, they differ in the penalization of steep regions. We show the quadratic penalty functions of the methods presented in Figures 1 and 2 for lateral and polar deviations, which are illustrated in Figure 3. We will show that the behavior of the energy function for different inclination angles correlates with the quantitative and qualitative depth reconstruction performance of each individual method. We argue that the geodesic distance function shows the most natural behavior, with the favorable property of penalizing the distance of the normal orientation independently of the steepness of the edges. Due to this ideal behavior, we use this function to evaluate other distance measures in Section 5. Without loss of generality, we assume a dense depth and normal map for the algorithms described in this paper. In case of sparse input data, we suggest an extension based on Poisson surface reconstruction [40].
An overview of the presented methods is given in Table 1. The first method we present in Section 4.2 is the construction of depth from only the gradient surface orientation information (i.e., no prior information about the absolute depth is being used). Using only the surface orientation for the depth reconstruction results in large-scale low-frequency errors (and therefore depth offsets). Later we overcome this problem by introducing an additional depth constraint in all following methods.
Second, we introduce the gradient-based approach with a depth constraint formulated as a least squares problem in Section 4.3. The respective contour plot of the orientation distance measure shows a strong penalization of steep edges in contrast to flat surface regions. It is easy to see (Figure 1) that the same angular deviation due to noisy normals may generate errors in the gradient domain ranging from small to arbitrarily large, depending on the inclination angle. For demonstrative purposes, we additionally show two extensions of the gradient-based method with regularization terms. One enforces gradient-based smoothing. The other one enforces smoothness with a Laplacian term and can be used for the reconstruction with sparse depth and surface normal data.
Third, the method of Heber [41] for combining depth with surface orientations is shown in Section 4.4, which scales the given normal by the length of the optimized gradient.
Fourth, a review of the method of Nehab [26] is given in Section 4.5, which reprojects the tangents of the optimized surface onto the given normal.
Fifth, as one of our main contributions in this paper, we introduce a new penalty function in the form of a generalized method of Nehab in Section 4.6. The novel parametrization moves the penalty function closer to the geodesic normal energy and hence penalizes deviations in steep edges and flat regions more equally.
Last, we introduce our second main contribution, a novel Total Generalized Variation (TGV) model in Section 4.7, which penalizes the distance of the gradients of the surface orientation and gives significantly improved reconstruction results. The reason for this lies in the decoupling of the gradient through the TGV term.

Geodesic Distance
In the 3D space, the geodesic distance is the most natural surface normal penalty as distances between surface normals are weighted equally, independent of the surface slope angle with respect to the observer. Therefore it is used in this paper as a comparison measure for the evaluation in Section 5.
The geodesic distance d i,j is defined as the inverse cosine of the point-wise dot product between the given normal N̂ ∈ R M×N×3 and the estimated solution normal field N ∈ R M×N×3 as follows:

d i,j = arccos(⟨N̂ i,j , N i,j ⟩), ∀i, j ∈ I, (10)

where ⟨·, ·⟩ denotes the standard dot product, which is defined as ⟨a, b⟩ = ∑ n i=1 a i b i = ||a|| ||b|| cos(φ), and φ describes the angle between a and b. We can formulate the distance in Equation (10) by utilizing the surface gradient estimation ∇Z ∈ R M×N×2 for the surface normal N with the relations shown in Equation (6) as follows:

d i,j = arccos( (−N̂ i,j,x (∇ x Z) i,j − N̂ i,j,y (∇ y Z) i,j + N̂ i,j,z ) / √(1 + (∇ x Z)² i,j + (∇ y Z)² i,j ) ). (11)

[Figure 2 caption fragment: ...the generalized method of Nehab with r = 0.5 (middle row) as well as r = 1.6 (bottom row). A more natural weighting, which is in the proximity of the geodesic distance, can be achieved by the adaption of the parameter r. A well balanced distance measure such as with r = 1.6 has the potential of performing better in steep regions (high inclination angle) while preserving performance in the flat regions. Colors of the contours indicate the respective penalty values obtained for different combinations of the inclination and deviation angles. Note that the range is clipped between 0 (blue) and 1 (yellow). See Figure 3 for the explanation of the parametrization used.]

The surface orientation weighting of the geodesic distance is illustrated in the contour plot in the first row of Figure 1. The first column shows the polar deviation of the coordinates and the second column shows the lateral deviation, parameterized by the inclination angle α and the deviation angle β, as illustrated in Figure 3. For the following methods, a balanced weighting over all inclination angles, both for the polar and lateral deviation, is favored, as provided by the geodesic distance.
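As a reference implementation of Equation (10), the pointwise geodesic distance between two unit normal fields is straightforward (a minimal sketch; the clipping guards against floating-point dot products marginally outside [−1, 1]):

```python
import numpy as np

def geodesic_distance(N_hat, N):
    """Point-wise geodesic distance arccos(<N_hat, N>) between two unit
    normal fields of shape (M, N, 3); the result is in radians."""
    dot = np.sum(N_hat * N, axis=-1)
    return np.arccos(np.clip(dot, -1.0, 1.0))  # clip guards rounding noise
```

Identical fields yield zero distance; orthogonal normals yield π/2, independent of the surface slope.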

Gradient-Based Method with Surface Orientation Constraint Only
As typical for photometric stereo methods, depth can be partly recovered from surface orientations only. In order to provide a complete context for the methods considered in this paper, we show here a method that uses solely gradient-based data. Gradient-based methods have previously been frequently utilized for depth reconstruction (e.g., [16,[42][43][44][45][46][47]). Given a gradient field Ĝ ∈ R M×N×2 , we calculate the surface gradients for the estimated depth map Z in x- (∇ x Z) and y-direction (∇ y Z) respectively. Combining the relations from Equation (6), surface normals and gradients are related by

N i,j = (−G i,j,x , −G i,j,y , 1) / |(−G i,j,x , −G i,j,y , 1)| 2 ,

hence the surface gradients are given as:

(∇ x Z) i,j = −N i,j,x /N i,j,z , (∇ y Z) i,j = −N i,j,y /N i,j,z , (12)

and the given gradient fields correspond to the surface normals by:

Ĝ i,j,x = −N̂ i,j,x /N̂ i,j,z (13) and Ĝ i,j,y = −N̂ i,j,y /N̂ i,j,z (14)

in x- and y-direction respectively. Our goal is to compute a depth map Z such that

∇Z ≈ Ĝ. (15)

The comparison of the resulting penalty between the measured and the given gradients is illustrated in Figure 4a. Since Equation (15) is an overdetermined system of linear equations, it can be solved as the following least squares problem:

min Z ||∇Z − Ĝ|| 2 2 , (16)

whose global minimizer Z min satisfies the first order sufficient optimality condition:

∇*(∇Z min − Ĝ) = 0, (17)

where ∇* : R M×N×2 → R M×N denotes the adjoint of the ∇ operator, with ∇* = (∇* x , ∇* y ). We compute the minimizer using a standard conjugate gradient method.
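This least squares integration can be sketched compactly, assuming forward differences with Neumann-style boundaries and a hand-rolled conjugate gradient (the operator ∇*∇ is only positive semi-definite here, so the depth is recovered up to an additive constant; function names are ours):

```python
import numpy as np

def grad(Z):
    """Forward-difference gradient; maps (M, N) to (M, N, 2)."""
    G = np.zeros(Z.shape + (2,))
    G[:-1, :, 0] = Z[1:, :] - Z[:-1, :]
    G[:, :-1, 1] = Z[:, 1:] - Z[:, :-1]
    return G

def grad_adj(G):
    """Adjoint operator of grad; maps (M, N, 2) to (M, N)."""
    D = np.zeros(G.shape[:2])
    D[1:, :] += G[:-1, :, 0]; D[:-1, :] -= G[:-1, :, 0]
    D[:, 1:] += G[:, :-1, 1]; D[:, :-1] -= G[:, :-1, 1]
    return D

def solve_cg(A, b, iters=500, tol=1e-14):
    """Plain conjugate gradient for the (semi-)definite system A(x) = b."""
    x = np.zeros_like(b); r = b - A(x); p = r.copy(); rs = np.sum(r * r)
    for _ in range(iters):
        Ap = A(p)
        alpha = rs / max(np.sum(p * Ap), 1e-30)
        x = x + alpha * p; r = r - alpha * Ap
        rs_new = np.sum(r * r)
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p; rs = rs_new
    return x

def depth_from_gradients(G_hat, iters=500):
    """Solve min_Z ||grad(Z) - G_hat||^2 via the normal equations."""
    return solve_cg(lambda Z: grad_adj(grad(Z)), grad_adj(G_hat), iters)
```

The recovered surface matches the given gradients, while the absolute depth offset remains undetermined, illustrating the low-frequency ambiguity discussed next.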
It is well known that reconstructing the depth using only surface normal data usually results in errors in the low frequency domain. In the past, this has been improved by different approaches, such as introducing additional boundary conditions [44]. We resolve this problem by hybrid depth and surface normal formulations which proved very efficient in finding an accurate surface reconstruction.

Gradient-Based Method
In this section, we discuss a hybrid gradient-based method which reconstructs depth using a gradient-based algorithm (similar to the one from Section 4.2), extended by the use of the given initial depth Ẑ. Here too, the gradient of the estimated depth map ∇Z is forced to be in the proximity of a measured gradient Ĝ in x- and y-direction. Therefore, we formulate an overdetermined system of equations as follows:

Z i,j ≈ Ẑ i,j , (∇ x Z) i,j ≈ Ĝ i,j,x , (∇ y Z) i,j ≈ Ĝ i,j,y , ∀i, j ∈ I. (18)

The corresponding least squares problem is given as:

min Z λ||Z − Ẑ|| 2 2 + ||∇Z − Ĝ|| 2 2 , (19)

where λ > 0 is used to balance between the depth and the orientation constraints. The global optimizer Z min is found by a standard conjugate gradient method, with the optimality condition given as follows:

λ(Z min − Ẑ) + ∇*(∇Z min − Ĝ) = 0. (20)

The contour plots of the surface orientation penalty function corresponding to the gradient-based method are shown in Figure 1. Note that with this method, the penalty is notably stronger for deviations in steep regions than in flat regions.
For demonstrative purposes, we introduce an extension of the gradient-based method in Equation (19) with a regularization term. We use a Laplace 2nd-order term which is driven by the derivative of the given gradient field. We formulate the following least squares problem:

min Z λ||Z − Ẑ|| 2 2 + ||∇Z − Ĝ|| 2 2 + λ R ||∆Z − ∇*Ĝ|| 2 2 , (21)

where λ R > 0 balances the regularization, ∇*Ĝ can be decomposed into (∇* x Ĝ x , ∇* y Ĝ y ) and ∆ denotes the Laplace operator, which is defined as follows:

∆Z = ∇*∇Z = ∇* x ∇ x Z + ∇* y ∇ y Z. (22)

Such a gradient-based regularization can be applied to all presented methods. The presented gradient-based approaches require dense depth and surface orientation data. A possibility to cope with sparse data is an additional smoothness assumption coupled with pixel-wise weighting parameters. Hence, we add another term to Equation (19) with a Laplacian smoothness assumption as follows:

min Z λ 1 ||Z − Ẑ|| 2 2 + λ 2 ||∇Z − Ĝ|| 2 2 + λ 3 ||∆Z|| 2 2 . (23)

The weighting parameters λ 1 and λ 2 can be given a priori by the confidence of a data point and λ 3 by the inverse confidence. In case of stereo or light field methods for depth reconstruction, the parameters can be assessed based on the matching confidence. Unknown points would have a confidence of zero. The same extension for sparse data is applicable to all following methods. Therefore, without loss of generality, we discuss the weighting of the surface orientation term with a focus on dense depth and surface orientation data without an additional smoothness assumption.
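The basic hybrid solve for Equation (19) amounts to one symmetric positive definite linear system; a minimal conjugate gradient sketch (our own implementation, not the authors' code):

```python
import numpy as np

def grad(Z):
    """Forward-difference gradient; maps (M, N) to (M, N, 2)."""
    G = np.zeros(Z.shape + (2,))
    G[:-1, :, 0] = Z[1:, :] - Z[:-1, :]
    G[:, :-1, 1] = Z[:, 1:] - Z[:, :-1]
    return G

def grad_adj(G):
    """Adjoint of grad; maps (M, N, 2) to (M, N)."""
    D = np.zeros(G.shape[:2])
    D[1:, :] += G[:-1, :, 0]; D[:-1, :] -= G[:-1, :, 0]
    D[:, 1:] += G[:, :-1, 1]; D[:, :-1] -= G[:, :-1, 1]
    return D

def fuse_depth_gradients(Z_hat, G_hat, lam=0.1, iters=500):
    """Minimize lam*||Z - Z_hat||^2 + ||grad(Z) - G_hat||^2 by solving
    the SPD normal equations with conjugate gradient."""
    A = lambda Z: lam * Z + grad_adj(grad(Z))
    b = lam * Z_hat + grad_adj(G_hat)
    x = np.zeros_like(b); r = b - A(x); p = r.copy(); rs = np.sum(r * r)
    for _ in range(iters):
        Ap = A(p)
        alpha = rs / max(np.sum(p * Ap), 1e-30)
        x = x + alpha * p; r = r - alpha * Ap
        rs_new = np.sum(r * r)
        if rs_new < 1e-14:
            break
        p = r + (rs_new / rs) * p; rs = rs_new
    return x
```

Unlike the orientation-only variant, the depth term makes the system strictly positive definite, so the absolute depth offset is pinned to the initial map.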

The Method of Heber
A hybrid variational refinement model was described by Heber [41], where an initial rough depth Ẑ is given and refined with surface normal information:

min Z λ||Z − Ẑ|| 2 2 + ||(−∇Z, 1) − |(−∇Z, 1)| 2 ⊙ N̂|| 2 2 , (27)

where |·| 2 defines the vector of pointwise 2-norms and the symbol ⊙ denotes the operator for the point-wise product, also known as the Hadamard product. This method is conceptually similar to the gradient-based method described in Section 4.3; here too, the same depth constraint ensures a result in the proximity of an initial depth solution Ẑ. However, the surface orientation constraint used in Heber's method exploits the given normal field N̂ directly instead of the gradient field Ĝ. Here, the normalization of ∇Z by division by the length of the vector is avoided by multiplying the term on one side by |(−∇Z, 1)| 2 , which leads to a convex problem [41]. An illustration of the comparison of the measured ∇Z and the given normal N̂ is shown in Figure 4b. The given normal is scaled to the length of the measured gradient vector (−∇Z, 1). The contour map in Figure 1 demonstrates the normal weighting of Heber's energy term. Here, similar to the gradient-based method, a deviation in steep edges is penalized more than a deviation in flat regions. Also, a different weighting applies depending on whether the estimated gradients are steeper or flatter than the given values.
As the energy function from Equation (27) is convex and differentiable, this algorithm can be solved by an (accelerated) gradient descent method or a (fast) proximal gradient approach. For our evaluation, we used a plain gradient descent approach. Nesterov [48] proposed an accelerated gradient descent method with a simple weighted gradient step, followed by an additional sliding step based on the last estimation. A fast proximal gradient method has an additional extrapolation step compared to the proximal gradient method; an example is the Fast Iterative Shrinkage Thresholding Algorithm (FISTA) [49] (see Appendix A.1).
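For a small grid, Heber's energy from Equation (27) can be written down directly and minimized with a generic descent routine. The sketch below uses SciPy's L-BFGS-B with numerical gradients instead of the plain gradient descent used in the paper, purely for brevity; all function names are ours:

```python
import numpy as np
from scipy.optimize import minimize

def forward_grads(Z):
    """Forward differences of the depth map in x- and y-direction."""
    Gx = np.zeros_like(Z); Gy = np.zeros_like(Z)
    Gx[:-1, :] = Z[1:, :] - Z[:-1, :]
    Gy[:, :-1] = Z[:, 1:] - Z[:, :-1]
    return Gx, Gy

def heber_energy(z, Z_hat, N_hat, lam):
    """lam*||Z - Z_hat||^2 + ||(-grad Z, 1) - |(-grad Z, 1)|_2 * N_hat||^2."""
    Z = z.reshape(Z_hat.shape)
    Gx, Gy = forward_grads(Z)
    V = np.stack([-Gx, -Gy, np.ones_like(Z)], axis=-1)  # (-grad Z, 1)
    s = np.linalg.norm(V, axis=-1, keepdims=True)       # pointwise 2-norms
    return lam * np.sum((Z - Z_hat) ** 2) + np.sum((V - s * N_hat) ** 2)

def refine_heber(Z_hat, N_hat, lam=0.5):
    """Descend on the convex energy, starting from the initial depth."""
    res = minimize(heber_energy, Z_hat.ravel(), args=(Z_hat, N_hat, lam),
                   method="L-BFGS-B")
    return res.x.reshape(Z_hat.shape)
```

With consistent inputs (normals derived from the same discrete gradients as the depth), the ground truth attains zero energy, which is a convenient sanity check for any implementation of this energy.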

The Method of Nehab
The method of Nehab [26] combines depth and surface normals by solving a system of linear equations, consisting of depth and surface orientation constraints. This method is similar to the gradient-based method described in Section 4.3, only the surface normal information is leveraged in a different way, making different trade-offs between flat and steep gradients.
In this method, the surface normal constraint optimizes the sum of squared projections of a set of surface tangent vectors T, as defined in Equation (7) through Equation (9), of the reconstructed surface Z onto the given normal field N̂. This surface normal penalty has been adopted previously in several approaches (e.g., [21,[27][28][29][30]). The projection is illustrated in Figure 4c. Note that the lowest penalty 0 is reached when the tangent vector is precisely orthogonal to the given normal vector N̂. We first consider the formulation as described by Nehab [26] by formulating an overdetermined linear system of sparse equations:

⟨N̂ i,j , T i,j,x ⟩ = N̂ i,j,x + N̂ i,j,z (∇ x Z) i,j ≈ 0,
⟨N̂ i,j , T i,j,y ⟩ = N̂ i,j,y + N̂ i,j,z (∇ y Z) i,j ≈ 0, ∀i, j ∈ I, (28)

which leads to the following least squares problem:

min Z λ||Z − Ẑ|| 2 2 + (1 − λ)( ||N̂ x + N̂ z ⊙ ∇ x Z|| 2 2 + ||N̂ y + N̂ z ⊙ ∇ y Z|| 2 2 ), (29)

where the parameter λ ∈ [0, 1] weights the influence of the initial depth and the given normals. The global optimizer Z min is calculated with a standard conjugate gradient method, with the optimality condition given as follows:

λ(Z min − Ẑ) + (1 − λ)( ∇* x (N̂ z ⊙ (N̂ x + N̂ z ⊙ ∇ x Z min )) + ∇* y (N̂ z ⊙ (N̂ y + N̂ z ⊙ ∇ y Z min )) ) = 0. (30)

The weighting of the surface normal information is explained in more detail in Figure 4c and further illustrated in a contour plot in Figure 2. The polar deviation to the given surface orientation in steep and flat regions is penalized similarly to the method of Heber (see Figure 1, bottom row, left). However, the lateral deviations are weighted equally, which is the same behavior as the ideal geodesic distance function (see Figure 1, top row, right).
Note that the approach of Nehab, as described in Equation (29), corresponds to the gradient-based method with an additional local N z -weighting applied to both sides of the surface normal constraint of Equation (18). This can be shown by utilizing the equivalences defined in Equations (13) and (14) as follows:

N̂ i,j,x = −Ĝ i,j,x N̂ i,j,z (31) and N̂ i,j,y = −Ĝ i,j,y N̂ i,j,z , (32)

and formulating a corresponding overdetermined linear system of equations:

N̂ i,j,z ((∇ x Z) i,j − Ĝ i,j,x ) ≈ 0, N̂ i,j,z ((∇ y Z) i,j − Ĝ i,j,y ) ≈ 0, ∀i, j ∈ I. (33)

Compared with the gradient-based method, the N z -weighting inherent to Nehab's method improves the behavior of the penalty function by weakening the influence of regions with steep edges. The rationale behind the N z -weighting follows from the fact that flat regions exhibit a higher value of N z (close to 1) while steep regions receive low N z values (close to 0). Such a local weighting helps prevent the over-penalization in steep regions as observed in the gradient-based method.
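The tangent-projection solve can be sketched via its first-order optimality condition; for brevity this sketch uses a single weight λ (absorbing the (1 − λ) factor of Equation (29)) and our own helper names:

```python
import numpy as np

def dx(Z):
    G = np.zeros_like(Z); G[:-1, :] = Z[1:, :] - Z[:-1, :]; return G

def dy(Z):
    G = np.zeros_like(Z); G[:, :-1] = Z[:, 1:] - Z[:, :-1]; return G

def dx_adj(G):
    D = np.zeros_like(G)
    D[1:, :] += G[:-1, :]; D[:-1, :] -= G[:-1, :]
    return D

def dy_adj(G):
    D = np.zeros_like(G)
    D[:, 1:] += G[:, :-1]; D[:, :-1] -= G[:, :-1]
    return D

def refine_nehab(Z_hat, N_hat, lam=0.1, iters=500):
    """Conjugate gradient on the normal equations of the constraint
    N_z * dZ + N_xy ~ 0 plus the quadratic depth term."""
    Nx, Ny, Nz = N_hat[..., 0], N_hat[..., 1], N_hat[..., 2]
    A = lambda Z: lam * Z + dx_adj(Nz**2 * dx(Z)) + dy_adj(Nz**2 * dy(Z))
    b = lam * Z_hat - dx_adj(Nz * Nx) - dy_adj(Nz * Ny)
    x = np.zeros_like(b); r = b - A(x); p = r.copy(); rs = np.sum(r * r)
    for _ in range(iters):
        Ap = A(p); alpha = rs / max(np.sum(p * Ap), 1e-30)
        x = x + alpha * p; r = r - alpha * Ap
        rs_new = np.sum(r * r)
        if rs_new < 1e-14:
            break
        p = r + (rs_new / rs) * p; rs = rs_new
    return x
```

Note how the squared N z weight appears in the system operator, which is exactly the local down-weighting of steep regions discussed above.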

Generalized Nehab
In order to allow the normal weighting to reach a closer proximity to the geodesic distance, we propose a generalization of the method of Nehab. We extend the concept of the gradient N z -weighting as explained in Equation (33) by introducing an additional exponent r that controls the influence of the N z -weighting such that:

N̂ r i,j,z ((∇ x Z) i,j − Ĝ i,j,x ) ≈ 0, N̂ r i,j,z ((∇ y Z) i,j − Ĝ i,j,y ) ≈ 0, ∀i, j ∈ I. (34)

We optimize the corresponding least squares problem with a standard conjugate gradient method:

min Z λ||Z − Ẑ|| 2 2 + ||N̂ r z ⊙ (∇ x Z − Ĝ x )|| 2 2 + ||N̂ r z ⊙ (∇ y Z − Ĝ y )|| 2 2 , (35)

where we find a global optimizer Z min that satisfies the following optimality condition:

λ(Z min − Ẑ) + ∇* x (N̂ 2r z ⊙ (∇ x Z min − Ĝ x )) + ∇* y (N̂ 2r z ⊙ (∇ y Z min − Ĝ y )) = 0. (36)

The influence of varying r on the behavior of the surface orientation penalty function is shown in Figure 2. It can be seen that with r = 0.5 the generalized method of Nehab penalizes steeper slopes more strongly than the original method of Nehab (equivalent to r = 1) and more weakly than the gradient-based method (equivalent to r = 0). On the other hand, with r = 1.6 the proposed method exhibits a more tolerant behavior towards steeper inclination angles than the original method of Nehab, resulting in a global behavior that is reasonably close to the geodesic penalty in both the polar and lateral deviation directions. The value of the parameter r might still be further optimized.
Regarding the optimal choice of r, we have found that r = 1.6 behaves in the vicinity of the geodesic distance. If the structure of the data shows strong noise on edges steeper than approximately 50 degrees, a lower choice of r can be considered.
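The effect of the exponent r on the penalty shape can be probed with a toy one-dimensional model of the weighted constraint (our own illustration of the weighting, parameterized like Figures 1-3 by an inclination angle α and a polar deviation β):

```python
import numpy as np

def polar_penalty(alpha, beta, r):
    """Quadratic penalty of the N_z^r-weighted gradient constraint for a
    surface of inclination alpha (radians) whose measured normal deviates
    by beta in the polar direction."""
    g_true = np.tan(alpha)         # slope of the estimated (true) surface
    g_meas = np.tan(alpha + beta)  # slope implied by the deviated normal
    nz = np.cos(alpha + beta)      # z-component of the measured unit normal
    return (nz ** r * (g_true - g_meas)) ** 2
```

For r = 0 this reduces to the plain gradient penalty, which grows without bound at steep inclinations; r = 1 reproduces the N z -weighting of Nehab, and r ≈ 1.6 flattens the penalty further towards the geodesic behavior.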

Total Generalized Variation
In this section, we introduce another novel method to reconstruct a refined surface with a hybrid depth and surface normal approach, based on Total Generalized Variation (TGV) as introduced in [50]. TGV is an extension of Total Variation (TV), a very popular and efficient regularization technique currently used in many image processing applications, which is, however, known for producing staircasing artifacts in slope regions of the solution. In contrast, TGV overcomes the staircasing problem by allowing solutions of higher order. Our formulation is a gradient-based approach, which restricts the surface to be close to an initial depth solution Ẑ with a quadratic penalty, while enforcing the auxiliary gradient field G to be in the proximity of the given gradient field Ĝ, propagated through the TGV penalty. Thereby, we simultaneously reconstruct the surface Z and the auxiliary gradient field G, such that our discrete model is given as follows:

min Z,G α 1 ||∇Z − G|| 1 + α 0 ||∇G|| 1 + (λ Z /2)||Z − Ẑ|| 2 2 + (λ G /2)||G − Ĝ|| 2 2 , (37)

where the gradient operator ∇ : R M×N×2 → R M×N×4 computes finite differences and ∇G can be decomposed into (∇G x , ∇G y ), where ∇ was defined in Equation (4). Therefore, the first and second order components of our TGV regularization have the following form:

TGV(Z, G) = α 1 ||∇Z − G|| 1 + α 0 ||∇G|| 1 . (38)

The function in Equation (37) consists of a depth and an orientation constraint as well as the TGV regularization terms. The depth constraint (i.e., the third term of Equation (37)) enforces the depth solution Z to stay in the vicinity of our noisy initial depth map Ẑ. The orientation constraint (i.e., the fourth term of Equation (37)) enforces G to stay in the proximity of the given surface gradient estimation Ĝ. The TGV regularization part comprises a first and a second order term (first two terms of Equation (37)). The latter represents the TV component and penalizes the l 1 -norm of the second order gradient field, which forces the gradient field G to be piecewise constant.
The former enforces the approximated gradient of the depth map ∇Z to stay in the proximity of the auxiliary gradient field G also through an l 1 penalty.
Given our primal problem in Equation (37), we formulate a primal-dual (PD) problem, which belongs to the class of saddle-point problems, as follows:

min x max y ⟨Kx, y⟩ + H(x) − F*(y), (39)

where H describes the depth and orientation constraints, F defines the TGV component, with its convex conjugate F*, and the linear operator K is defined as:

K = [ ∇ −I ; 0 ∇ ]. (40)

Our primal variable is denoted as x = (Z, G) ∈ R M×N×3 and y = (y 1 , y 2 ) ∈ R M×N×6 represents the dual variable, where y 1 and y 2 hold the dual variables of the first and the second term in Equation (37) respectively, with y 1 holding two components, one for each of ∇ x and ∇ y , while y 2 holds four components, one for each of ∇ x and ∇ y applied to G x and G y . For F(x) = α||x|| p , where || · || p is an l p -norm, we calculate the convex conjugate as follows:

F*(y) = δ ||·|| q ≤α (y), with 1/p + 1/q = 1, (41)

which is the indicator function of the polar ball. Therefore, F* and H are defined as follows:

F*(y) = δ ||·|| 2,∞ ≤α 1 (y 1 ) + δ ||·|| 2,∞ ≤α 0 (y 2 ), (42)

and

H(x) = (λ Z /2)||Z − Ẑ|| 2 2 + (λ G /2)||G − Ĝ|| 2 2 . (43)

An optimal solution to our hybrid formulation is found with the PD algorithm, as described in Appendix A.2. For a more detailed description of the PD algorithm, see [51].
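A compact sketch of the primal-dual iteration for the model in Equation (37) follows; step sizes, weights and the quadratic-prox form are our own choices for illustration, and the projections realize the indicator functions F*:

```python
import numpy as np

def grad_s(Z):                      # scalar field (M, N) -> (M, N, 2)
    G = np.zeros(Z.shape + (2,))
    G[:-1, :, 0] = Z[1:, :] - Z[:-1, :]
    G[:, :-1, 1] = Z[:, 1:] - Z[:, :-1]
    return G

def div_s(P):                       # negative adjoint of grad_s
    D = np.zeros(P.shape[:2])
    D[:-1, :] += P[:-1, :, 0]; D[1:, :] -= P[:-1, :, 0]
    D[:, :-1] += P[:, :-1, 1]; D[:, 1:] -= P[:, :-1, 1]
    return D

def grad_v(G):                      # vector field (M, N, 2) -> (M, N, 4)
    return np.concatenate([grad_s(G[..., 0]), grad_s(G[..., 1])], axis=-1)

def div_v(Q):                       # negative adjoint of grad_v
    return np.stack([div_s(Q[..., :2]), div_s(Q[..., 2:])], axis=-1)

def proj_ball(y, alpha):            # pointwise projection onto |y|_2 <= alpha
    n = np.linalg.norm(y, axis=-1, keepdims=True)
    return y / np.maximum(1.0, n / alpha)

def tgv_fuse(Z_hat, G_hat, a1=1.0, a0=0.5, lam_z=1.0, lam_g=10.0,
             iters=1500, sigma=0.2, tau=0.2):
    """Primal-dual iteration for
    min_{Z,G} a1*||grad Z - G||_1 + a0*||grad G||_1
              + (lam_z/2)*||Z - Z_hat||^2 + (lam_g/2)*||G - G_hat||^2."""
    Z, G = Z_hat.copy(), G_hat.copy()
    Zb, Gb = Z.copy(), G.copy()
    y1 = np.zeros(Z.shape + (2,))
    y2 = np.zeros(Z.shape + (4,))
    for _ in range(iters):
        # dual ascent, then projection onto the polar balls (F*)
        y1 = proj_ball(y1 + sigma * (grad_s(Zb) - Gb), a1)
        y2 = proj_ball(y2 + sigma * grad_v(Gb), a0)
        # primal descent with K*y = (-div y1, -y1 - div y2),
        # followed by the proximal step of the quadratic data terms H
        Zn = (Z + tau * div_s(y1) + tau * lam_z * Z_hat) / (1 + tau * lam_z)
        Gn = (G + tau * (y1 + div_v(y2)) + tau * lam_g * G_hat) / (1 + tau * lam_g)
        # extrapolation (over-relaxation) for the next dual update
        Zb, Gb = 2 * Zn - Z, 2 * Gn - G
        Z, G = Zn, Gn
    return Z
```

The step sizes satisfy sigma*tau*||K||^2 <= 1 for this K, which is the standard convergence condition of the PD algorithm.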

Evaluation
To evaluate all considered algorithms we need two data structures, namely an initial depth map Ẑ and surface orientation information such as surface normals N̂ or gradients Ĝ. Initial depth maps can be provided by stereo or light field correspondence analysis or other depth scanning methods. An estimate of the surface orientation is calculated using, e.g., photometric stereo or polarization imaging. The discussed hybrid algorithms differ mainly in how the surface orientation information is taken into account. In particular, the penalty functions associated with the surface orientation constraints vary significantly between methods, as illustrated in Figure 1 through Figure 4c. We proposed two novel methods in Sections 4.6 and 4.7, namely the generalized method of Nehab and the TGV approach. The former balances the surface orientation information with an improved weighting function within the least squares framework. The latter uses a TGV regularization term combined with a gradient-based approach.

Qualitative and Quantitative Evaluation
In order to perform a comprehensive quantitative evaluation of the above mentioned methods, we use synthetic evaluation data that comprises full ground truth (GT) information. The initial depth Ẑ for the synthetic evaluation data is generated from a ground truth depth map Ẑ GT , which is quantized after adding noise:

Ẑ i,j = [ k (Ẑ GT,i,j + η i,j ) ] / k, (44)

where η denotes the additive noise. In this study, we considered a normally distributed additive noise, as this type of noise simulates well the behavior of matching errors of stereo or light field methods combined with image sensor noise. The noise used in our evaluations has a maximum amplitude of 7% of the depth range. The constant k defines the number of discretization steps and [·] rounds to the nearest integer number n ∈ Z. These discretization artifacts attempt to simulate the output of the discrete regularized correspondence analysis, which is often applied in real-world scenarios (e.g., as described in [37]).
For the orientation constraints, we assume surface normals N̂ derived from the ground truth normals N̂ GT by adding a normally distributed noise with a maximum amplitude of 23% of the normal range in the spherical coordinate system. In Figure 5, examples of the evaluation data Ẑ and N̂ as well as the corresponding ground truth depth Ẑ GT are shown. All 3D datasets used for the evaluations were taken from the Stanford 3D scanning repository [52] and rendered with POV-Ray [53].
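The synthetic depth degradation can be sketched as follows; this is our reading of the protocol, and the noise scale and step count are illustrative parameters rather than the exact values used in the paper:

```python
import numpy as np

def make_noisy_depth(Z_gt, noise_frac=0.07, k=64, seed=7):
    """Add Gaussian noise scaled to a fraction of the depth range, then
    quantize the result into k discrete depth steps."""
    rng = np.random.default_rng(seed)
    z_min, z_max = Z_gt.min(), Z_gt.max()
    depth_range = z_max - z_min
    # noise_frac bounds the noise amplitude; sigma chosen so that ~3 sigma
    # stays within that bound (an assumption on our part)
    noisy = Z_gt + rng.normal(0.0, noise_frac * depth_range / 3.0, Z_gt.shape)
    step = depth_range / k
    return z_min + np.round((noisy - z_min) / step) * step
```

The quantization step reproduces the staircase-like artifacts of discrete correspondence analysis that the fusion methods are expected to remove.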
Qualitative comparisons of the methods described in Section 4 are presented in Figures 6 and 7. The left column shows the color coded depth reconstructions as delivered by different methods. The middle two columns show the vertical and horizontal depth profiles as marked in Figure 5c. In each of these plots, the red, gray and black lines indicate the reconstructed depth Z, the initial depth Ẑ and the ground truth depth Ẑ GT , respectively. The error maps showing signed distances to the ground truth depth are provided in the right column. The corresponding geodesic distances from the ground truth surface normals N̂ GT for each method are displayed in Figure 8. Results of the quantitative evaluations are shown in Table 2 for gradient-based methods and in Table 3 for normal-based methods. For the evaluation we used three datasets from [52]: Buddha (shown in the qualitative analysis), Dragon, and Armadillo. The tables hold fractional precisions of two digits in the depth evaluation and a higher fractional precision of four digits in the normal evaluation due to the different value ranges. The average is calculated using 4 digits after the decimal point; rounding errors can cause differences in the last digit. The MSE distance in the depth domain is calculated as the quadratic distance between the ground truth depth and the depth result:

d Z = (1/(MN)) ∑ i,j∈I (Ẑ GT,i,j − Z i,j )² .

The geodesic distance is calculated as described in Section 4.1 by using the following equation:

d N = (1/(MN)) ∑ i,j∈I arccos(⟨N̂ GT,i,j , N i,j ⟩).
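The two error measures d Z and d N can be implemented directly; the normals for d N are obtained from the depth maps via N = (−∇Z, 1)/|(−∇Z, 1)| 2 (a sketch with our own helper names):

```python
import numpy as np

def mse_depth(Z, Z_gt):
    """Mean squared depth error d_Z."""
    return np.mean((Z - Z_gt) ** 2)

def normals_from_depth(Z):
    """Discrete normals N = (-grad Z, 1) / |(-grad Z, 1)|_2
    using forward differences."""
    Gx = np.zeros_like(Z); Gy = np.zeros_like(Z)
    Gx[:-1, :] = Z[1:, :] - Z[:-1, :]
    Gy[:, :-1] = Z[:, 1:] - Z[:, :-1]
    V = np.stack([-Gx, -Gy, np.ones_like(Z)], axis=-1)
    return V / np.linalg.norm(V, axis=-1, keepdims=True)

def mean_geodesic_error(Z, Z_gt):
    """Average geodesic normal distance d_N between two depth maps."""
    dot = np.sum(normals_from_depth(Z) * normals_from_depth(Z_gt), axis=-1)
    return np.mean(np.arccos(np.clip(dot, -1.0, 1.0)))
```

Because d N is computed from local gradients, it is sensitive to high-frequency detail, while d Z is dominated by absolute depth offsets; this explains why the two rankings below differ.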
As can be seen in Figures 7 (1st row) and 8b, using surface orientation information alone in the gradient-based formulation provides visually pleasing detail reconstructions (d_N = 0.2365), though it performs worst in terms of the absolute distance to the ground truth depth, with an average MSE of d_Z = 66.68. In Figures 7 (2nd row) and 8c, one can see that adding a depth constraint to the same gradient-based formulation improves the result drastically (d_Z = 2.04 and d_N = 0.2951). Nevertheless, this method still performs poorly around steep edges. Note that each of the demonstrated methods could be used to reconstruct surfaces from surface normals alone by dropping the depth term. The method of Nehab, shown in Figures 7 (3rd row) and 8d, improves on the previous methods by exploiting a better surface orientation weighting strategy (d_Z = 0.14, d_N = 0.2532). These methods are optimized using a least squares solver. The evaluation of Heber's method is shown in Figures 7 (4th row) and 8e. The method is optimized using gradient descent and reaches average results of d_Z = 1.79 and d_N = 0.3049, which is significantly worse than the method of Nehab. The results of our generalized method of Nehab with a parametrization r = 1.6 are shown in Figures 6 (1st row) and 8f. This method improves robustness against noise in the surface normals over the standard method of Nehab and outperforms all other evaluated methods in the absolute depth error domain with d_Z = 0.11. As its normal weighting is closer to the geodesic penalty than that of the method of Nehab, this approach reaches an improved normal error of d_N = 0.2442. This method is optimized using a least squares solver. Our novel TGV method, shown in Figures 6 (2nd row) and 8g, provides by far the best normal accuracy of d_N = 0.0666 and performs among the best in the depth domain (d_Z = 0.20). It is optimized with a primal-dual algorithm.
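The general shape of such a gradient-based least squares fusion can be sketched as follows, reduced to 1D for brevity. This is an illustrative toy formulation, not the exact objective of any of the reviewed methods: a noisy depth estimate z_hat is combined with target gradients g (e.g. derived from surface normals) by minimizing lam * ||z - z_hat||^2 + ||D z - g||^2, where D is a forward-difference operator and lam is an assumed weighting scalar.

```python
import numpy as np

def fuse_depth_gradients(z_hat, g, lam=0.1):
    """Toy 1D depth/gradient fusion solved as a stacked least squares problem."""
    n = len(z_hat)
    # Forward difference operator D of shape (n-1, n)
    D = np.zeros((n - 1, n))
    for i in range(n - 1):
        D[i, i] = -1.0
        D[i, i + 1] = 1.0
    # Stack the depth constraint and the gradient constraint into one system
    A = np.vstack([np.sqrt(lam) * np.eye(n), D])
    b = np.concatenate([np.sqrt(lam) * z_hat, g])
    z, *_ = np.linalg.lstsq(A, b, rcond=None)
    return z
```

Dropping the depth term (lam = 0) reproduces the integration-from-gradients-only behavior discussed above: the solution is then only determined up to a constant offset, which is exactly why that variant lacks an absolute depth reference.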
For completeness, we additionally show the results of two regularized gradient methods in Table 3. First, we regularize with the smoothness term shown in Equation (23), which can be used for sparse data. In our dense case, where we used scalars for the weighting parameters, the normal accuracy shows a minor improvement due to smoothing, at the cost of a weaker depth accuracy. Second, we regularize with the gradient, as shown in Equation (21). With this we reach an improvement in both the depth and normal accuracy. These regularization terms could be combined with all presented methods, but an exhaustive evaluation would exceed the scope of this paper. Note that our novel TGV method shows a significantly better performance in both accuracy measures.
As can be seen in Tables 2 and 3, the two best performing methods are our generalized method of Nehab with r = 1.6 (normal based) and our novel TGV method (gradient-based). We show the convergence of both in Figure 9 with respect to the depth error defined in Equation (48) and the normal error defined in Equation (49). While the generalized Nehab method converges in both terms after approximately 25 iterations, the TGV method settles at around the same iteration step with a slightly worse depth error, but continues to substantially reduce the normal error. Note that a gradient-based formulation was chosen to demonstrate the favorable properties of our novel TGV approach; it significantly improves the results compared with the standard gradient-based method. Alternatively, for even better performance, other penalty functions, such as that of the generalized method of Nehab, can be chosen; this will be a matter of future research.

Table 2. Gradient-based: quantitative evaluation of the distance of the optimized depth values to the ground truth depth. The evaluations were performed on objects from the Stanford database [52], which were rendered using POV-Ray [53]. Evaluated are the MSE to the ground truth depth and the geodesic distance to the ground truth normals.

Table 3. Normal based: quantitative evaluation of the distance of the optimized depth values to the ground truth depth. The evaluations were performed on objects from the Stanford database [52], which were rendered using POV-Ray [53]. Evaluated are the MSE to the ground truth depth and the geodesic distance to the ground truth normals.

(a) Generalized Nehab r = 1.6 (ours) (b) TGV (ours) Figure 9. Convergence analysis of the two winning methods: the generalized Nehab with r = 1.6 (a) and the TGV (b). Distances to the ground truth depth and surface normals are shown after each iteration, plotted with a logarithmic y-axis for better visibility.
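The TGV fusion model itself is too involved for a short snippet, but the class of primal-dual iterations used to optimize such models can be illustrated on plain 1D total variation denoising, min_z ||D z||_1 + (1/(2*lam)) * ||z - f||^2. This is a hedged sketch of the optimizer type only: the TV model, step sizes and lam are our simplifications, not the paper's TGV formulation.

```python
import numpy as np

def tv_denoise_1d(f, lam=1.0, iters=200):
    """Chambolle-Pock style primal-dual iteration for 1D TV denoising."""
    n = len(f)
    z = f.copy()
    z_bar = f.copy()
    p = np.zeros(n - 1)        # dual variable associated with D z
    tau = sigma = 0.5          # step sizes with tau * sigma * ||D||^2 <= 1
    for _ in range(iters):
        # Dual ascent step, then projection onto the l-infinity unit ball
        p = np.clip(p + sigma * np.diff(z_bar), -1.0, 1.0)
        # Primal descent step: apply -D^T p (a discrete divergence) ...
        div = np.concatenate([[p[0]], np.diff(p), [-p[-1]]])
        z_new = z + tau * div
        # ... then the proximal operator of the quadratic data term
        z_new = (z_new + (tau / lam) * f) / (1.0 + tau / lam)
        # Over-relaxation of the primal variable
        z_bar = 2.0 * z_new - z
        z = z_new
    return z
```

The same algorithmic skeleton (dual projection, primal prox, over-relaxation) carries over to the TGV case, where the dual and primal spaces are larger.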
We demonstrated the described algorithms with different surface orientation weightings on a real world example, as shown in Figure 10. This object was acquired with a multi-line scanner, the hybrid light field and photometric stereo acquisition framework described in [37]. In this very specific case, only one normal direction is available. One example of how to deal with this data in the gradient-based approach is to weight the gradient vector in Equation (23) with zero in the missing direction. The specifics of the multi-line-scanning environment are beyond the scope of this paper and are a matter of future research.
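The zero-weighting idea can be sketched as follows. This is purely illustrative: `Dx`, `Dy` stand for the two directional derivative operators of a stacked least squares system, `gx`, `gy` for the corresponding target gradients, and `w` for assumed per-direction weights; none of these names come from the paper.

```python
import numpy as np

def weighted_gradient_rows(Dx, Dy, gx, gy, w=(1.0, 0.0)):
    """Stack gradient constraints, scaling each direction by its weight.

    Setting a weight to zero removes that direction's influence on the
    solution, e.g. the unobserved cross-track direction of a line scanner.
    """
    A = np.vstack([w[0] * Dx, w[1] * Dy])
    b = np.concatenate([w[0] * gx, w[1] * gy])
    return A, b
```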

Conclusions
We presented a review and classification of methods combining depth and surface orientation data (normals or gradients) in order to reach an improved surface depth estimation. State-of-the-art methods differ mostly in the formulation of the surface orientation constraint (see Section 4) and in the capabilities of the method-specific solvers.
We illustrated the differences between various formulations of the surface orientation constraint and explained the performance discrepancies. Furthermore, we used our findings to introduce a generalization of the method of Nehab (see Section 4.6) that significantly outperforms other methods in terms of absolute depth accuracy. Additionally, we introduced a novel method based on TGV (see Section 4.7), which outperforms all other methods in the surface normal domain and shows competitive depth accuracy. While our generalized Nehab method converges faster (see Figure 9) and gives the most accurate result in the depth domain, our TGV based approach refines the surface orientation further and converges to the most accurate orientation result while maintaining high accuracy in the depth domain.
Further research will include specialization to line-scanning algorithms, adaptation of the TGV weighting and computational acceleration. With specialized hybrid algorithms that fit data from line scanning sensors, we will determine solutions from incomplete surface orientation data. The surface orientation constraint of our TGV formulation is currently gradient-based; plugging in another formulation with a better balanced normal weighting could improve the results even further and will be a matter of future research. Furthermore, we will focus on computational acceleration of the proposed algorithms, exploiting their inherent structure to achieve efficient parallelization.
If ∇F is Lipschitz continuous with a constant L and a fixed step size λ_k = λ ∈ (0, 1/L] is used, the method converges in O(1/k). Accelerated proximal gradient methods add an extrapolation step to the algorithm and are denoted as follows: y_{k+1} = x_k + w_k (x_k − x_{k−1}), x_{k+1} ← prox_{λ_k H}(y_{k+1} − λ_k ∇F(y_{k+1})), where λ_k denotes the step size and w_k an extrapolation parameter given by w_k = (t_k − 1)/t_{k+1}, with t_1 = 1 and t_{k+1} = (1 + √(1 + 4 t_k²))/2. FISTA is one example of such an accelerated proximal gradient method and is described in detail in [49]. In [41], the FISTA algorithm is used to optimize the method of Heber, which we discuss in Section 4.4.
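The accelerated iteration above can be sketched for a concrete pair F, H, here F(x) = ½‖Ax − b‖² and H(x) = µ‖x‖₁, whose proximal operator is soft-thresholding. This is a generic FISTA-style sketch under those assumptions, not the implementation used in [41]:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fista(A, b, mu, iters=100):
    """Accelerated proximal gradient for 0.5*||Ax - b||^2 + mu*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of grad F
    lam = 1.0 / L                      # fixed step size in (0, 1/L]
    x = np.zeros(A.shape[1])
    x_prev = x.copy()
    t = 1.0
    for _ in range(iters):
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        w = (t - 1.0) / t_next         # extrapolation parameter
        y = x + w * (x - x_prev)       # extrapolation step
        grad = A.T @ (A @ y - b)
        x_prev, x = x, soft_threshold(y - lam * grad, lam * mu)
        t = t_next
    return x
```

With µ = 0 the proximal step is the identity and the scheme reduces to Nesterov-accelerated gradient descent on the quadratic term.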