Article

Total Differential Photometric Mesh Refinement with Self-Adapted Mesh Denoising

1 School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China
2 School of Geomatics and Urban Spatial Informatics, Beijing University of Civil Engineering and Architecture, Beijing 102616, China
3 School of Computer Science, Hubei University of Technology, Wuhan 430068, China
4 Wuhan Tianjihang Information Technology Co., Ltd., Wuhan 430010, China
* Author to whom correspondence should be addressed.
Photonics 2023, 10(1), 20; https://doi.org/10.3390/photonics10010020
Submission received: 26 October 2022 / Revised: 20 December 2022 / Accepted: 21 December 2022 / Published: 24 December 2022

Abstract

Variational mesh refinement is a crucial step in multiview 3D reconstruction. Existing algorithms focus either on recovering mesh details or on suppressing noise; approaches that consider both are lacking. To address this limitation, we propose a new variational mesh refinement method named total differential mesh refinement (TDR), which comprises two main improvements. First, the traditional partial-differential photo-consistency gradient used in variational mesh refinement is replaced by the proposed total-differential photo-consistency gradient. By taking the photo-consistency correlation between adjacent pixels into account, our method achieves more effective photo-consistency convergence than traditional approaches. Second, we introduce the bilateral normal filter with a novel self-adaptive mesh denoising strategy into the variational mesh refinement. This strategy maintains a balance between detail preservation and effective denoising via a zero-normalized cross-correlation (ZNCC) map. Various experiments demonstrate that our method is superior to traditional variational mesh refinement approaches in both accuracy and denoising effect. Moreover, compared with meshes generated by open-source and commercial software (Context Capture), our meshes are more detailed, regular, and smooth.

1. Introduction

A well-established pipeline of image-based 3D reconstruction mainly includes structure from motion (SFM), multiview stereo (MVS), surface reconstruction, mesh refinement, and texture mapping. In this pipeline, the surface reconstruction step produces an initial coarse 3D mesh that may lack details and contain noise due to occlusion and non-Lambertian materials. Photometric stereo [1,2] and mesh refinement are popular methods for reconstructing high-quality mesh shapes. Photometric stereo recovers pixel-wise surface normals of a fixed scene under varying shading cues and is widely used in the industrial field [3,4,5]. The mesh refinement method evolves the initial mesh toward fine details using multiview images. In this paper, we focus on mesh refinement for large-scale reconstruction. Variational mesh refinement is the most commonly used refinement method; it improves mesh details and accuracy by iteratively updating all vertex positions to maximize the photo-consistency between images [6]. However, due to its isotropic regularization term, this method tends to smooth sharp structures and cannot sufficiently remove excessive mesh noise in texture-less and non-Lambertian regions.
To solve this problem, Li et al. [7] used a content-aware mesh denoising approach as a regularization term for mesh refinement, which was effective in suppressing mesh noise while preserving sharp features. However, without the assistance of image information, some distinct errors and noises of the mesh may be wrongly identified as sharp features and preserved.
Other studies aimed to improve the accuracy of the variational mesh refinement: some researchers selected the best image pairs to refine the mesh [8]. Blaha et al. [9] and Romanoni et al. [10] used semantic information to improve mesh accuracy. However, the improvement is limited when there are few images and little semantic information. In addition, existing variational mesh refinement methods calculate the gradient of each pixel independently and do not consider the photo-consistency of neighboring pixels.
In this study, a total differential mesh refinement (TDR) approach was developed to address the abovementioned problems. First, we propose the total-differential photo-consistency gradient (TDPG) to replace the partial-differential photo-consistency gradient (PDPG) calculation in the variational mesh refinement approach. The TDPG considers the influence of the gradients of adjacent pixels so that the photo-consistency converges more effectively than with the PDPG under gradient descent. Second, we incorporate bilateral normal filtering [11] into the variational mesh refinement to improve the denoising and edge-preserving capabilities. A self-adaptive mesh denoising strategy is adopted to balance detail preservation and effective denoising. Specifically, the zero-normalized cross-correlation (ZNCC), which measures the photo-consistency in the image domain, is transferred to the mesh vertices to form a ZNCC map indicative of the uncertainty of the mesh vertices. The denoising gradients are then adaptively weighted according to their ZNCC values. Thus, our method can remove mesh noise while preserving the details of the mesh. An overview of our method is shown in Figure 1.
Our contributions are as follows:
We proposed the TDPG method, which considers the partial derivatives of all pixels in the neighborhood, drives the photo-consistency error to a lower level, and yields a finely detailed mesh model.
We introduced the bilateral normal filtering [11] to the variational mesh refinement and adopted the self-adaptive mesh denoising strategy that utilized a ZNCC map to guide mesh denoising. This strategy enabled effective denoising while preserving mesh details (Section 2.3).
We used photo-consistency information to guide mesh denoising, which provided a new idea for the study of feature-preserving denoising.

2. Methodology

We enhanced the variational mesh refinement from two aspects. First, the total-differential photo-consistency gradient (TDPG) calculation was proposed for more effective convergence of the photo-consistency. Second, bilateral normal filtering was utilized in mesh refinement for mesh denoising, flattening planes, and sharpening edges. In order to avoid the loss of details that may be caused by denoising, we proposed the self-adaptive mesh denoising strategy. In this section, we first briefly introduce the variational mesh refinement approach (Section 2.1). Then, we present our mesh refinement method (TDR), including TDPG calculation (Section 2.2) and self-adaptive mesh denoising (Section 2.3).

2.1. Preliminaries on Variational Mesh Refinement

The variational mesh refinement approach was introduced by Pons et al. [12] and extended by Vu et al. [13]. This method minimizes the photo-consistency error between image pairs by iteratively refining the mesh vertices. For an image pair $I_p$ and $I_q$, image $I_q$ can be projected onto the mesh $S$ and then reprojected to image $I_p$ to form a predicted image $I_{pq}^S$ [7,14]. The photo-consistency between the predicted image and the reference image measures the correctness of the mesh, and the goal of variational mesh refinement is to minimize this photo-consistency error over all image pairs.
The energy function is expressed as:
$$E(S) = E_{photo}(S) + E_{regularization}(S) \quad (1)$$
where $S$ is the mesh surface, $E(S)$ is the total energy function, and $E_{regularization}(S)$ enforces the smoothness of the surface. $E_{photo}(S)$ is defined as:
$$E_{photo}(S) = \sum_{p,q} \int_{\Omega_{pq}^S} M_{zncc}\big[I_p, I_{pq}^S\big](x_i)\, dx_i \quad (2)$$
where $M_{zncc}\big[I_p, I_{pq}^S\big](x_i)$ is the ZNCC measurement between images $I_p$ and $I_{pq}^S$ at pixel $x_i$, and $\Omega_{pq}^S$ is the map of the reprojection from image $I_p$ to image $I_q$ via the surface.
To minimize $E_{photo}(S)$, the gradient is calculated using the chain rule (see [13]):
$$g_{photo}(V) = \frac{dE_{photo}(S)}{dV} = \sum_{p,q} \int_{\Omega_{pq}^S} \phi(X_i)\, \partial M(x_i)\, DI_q(x_j)\, D\Pi_q(X_i)\, \frac{d_i}{N^T d_i}\, N\, dx_i \quad (3)$$
where $V$ denotes all mesh vertices and $g_{photo}(V)$ is the vertex gradient induced by the photo-consistency term in each iteration. $M$ is an abbreviation for $M_{zncc}\big[I_p, I_{pq}^S\big]$, and $\partial M(x_i)$ is the photo-consistency gradient at pixel $x_i$. $X_i$ is the intersection of $S$ with the ray from the camera center of $I_p$ through $x_i$, and $\phi(X_i)$ is the barycentric coordinate weight of $X_i$. $\Pi_q$ represents the projection from the world to $I_q$, $x_j = \Pi_q(X_i)$, and $DI_q(x_j)$ is the image gradient at $x_j$; $D$ denotes the Jacobian matrix of a function. $d_i$ is the vector joining the camera center of $I_p$ and the point $X_i$, and $N$ is the outward surface normal at $X_i$.
For mesh regularization, this method adds thin plate energy [15] that penalizes mesh bending to prevent excessive bending of the mesh and excessive deviation of the gradient flow.
$$E_{regularization}(S) = \int_S \big(k_1^2 + k_2^2\big)\, dS \quad (4)$$
where $k_1$ and $k_2$ are the principal curvatures of the mesh. A linear combination of the Laplacian and Bi-Laplacian operators minimizes this energy [15].
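For illustration only, the following Python sketch shows one simple way to form such a Laplacian/Bi-Laplacian smoothing direction with a uniform ("umbrella") operator; it is our own simplification, not the implementation of [13,15], and the mesh representation (vertex array plus per-vertex one-ring index lists) and the weights w1, w2 are assumptions.

```python
import numpy as np

def uniform_laplacian(values, neighbors):
    """Umbrella operator: mean of the one-ring values minus the value at the vertex."""
    lap = np.zeros_like(values)
    for i, ring in enumerate(neighbors):       # neighbors[i]: indices of the one-ring of vertex i
        if ring:
            lap[i] = values[ring].mean(axis=0) - values[i]
    return lap

def smoothing_gradient(vertices, neighbors, w1=0.5, w2=0.5):
    """Illustrative mix of Laplacian and Bi-Laplacian terms used as a smoothing direction (cf. Equation (4))."""
    lap = uniform_laplacian(vertices, neighbors)
    bilap = uniform_laplacian(lap, neighbors)  # Laplacian of the Laplacian field
    return w1 * lap - w2 * bilap
```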

2.2. Total-Differential Photo-Consistency Gradient Calculation

A key step in variational mesh refinement is the calculation of the vertex gradient (Equation (3)), which determines how the vertices are updated to minimize the photo-consistency error. In the baseline method [13], the photo-consistency gradient ($\partial M$ in Equation (3)) is calculated separately at each pixel, and the potential contradictions between pixel gradients are ignored. In our approach, we propose to calculate the photo-consistency gradient of each pixel with consideration of the surrounding related pixels. To distinguish the two methods, we denote the baseline gradient by $\partial M_{PDPG}$ and ours by $\partial M_{TDPG}$. The following describes the differences between the two methods.
ZNCC is a ubiquitously used photo-consistency measurement in variational mesh refinement, and its formula is as follows:
$$M_{zncc}(x_i) = \frac{1}{n} \sum_{m=1}^{n} \frac{(a_m - \mu_A)(b_m - \mu_B)}{\sigma_A \sigma_B}, \qquad M_{zncc}(x_i) \in [-1, 1] \quad (5)$$
where $x_i$ is the position of a pixel in the reference image and $M_{zncc}(x_i)$ is the ZNCC measurement at $x_i$, while $A$ and $B$ are two image patches of equal size taken from the reference and predicted images at $x_i$ (see Figure 2). $a_m$ and $b_m$ ($m \in [1, n]$) denote the $m$-th pixel in $A$ and $B$, respectively, and $n$ is the number of pixels in each image patch; $\mu_A/\mu_B$ and $\sigma_A/\sigma_B$ are the means and standard deviations of the pixel values in $A$ and $B$, respectively.
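Equation (5) can be evaluated directly from two patches; the following minimal NumPy sketch (our own illustration, with hypothetical names) follows the formula above.

```python
import numpy as np

def zncc(patch_a, patch_b, eps=1e-8):
    """Zero-normalized cross-correlation between two equally sized image patches (Equation (5))."""
    a = patch_a.astype(np.float64).ravel()
    b = patch_b.astype(np.float64).ravel()
    a_c = a - a.mean()
    b_c = b - b.mean()
    denom = a.std() * b.std() + eps            # eps guards against constant (texture-less) patches
    return float(np.mean(a_c * b_c) / denom)
```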
$A$ and $B$ are two image patches from the reference and predicted images at $x_i$. The black lines indicate that the two image patches are combined to calculate the ZNCC value. The arrows represent the differential of the ZNCC with respect to the pixels in $B$. The right part of Figure 2 shows that the ZNCC is jointly calculated from $a_m$ and $b_m$. To minimize the ZNCC error, the partial differential only changes the center pixel, while the total differential changes all the pixels in $B$.
The ZNCC error is defined as $1 - M_{zncc}$. The variational mesh refinement [13] changes the center pixel value to reduce this error, and the partial derivative is calculated as [12,14]:
$$\partial M_{PDPG}(x_i) = \frac{\partial M_{zncc}(x_i)}{\partial b_{center}} = \frac{1}{n} \left( \frac{a_{center} - \mu_A}{\sigma_A \sigma_B} - \frac{M_{zncc}(x_i)\,(b_{center} - \mu_B)}{\sigma_B^2} \right) \quad (6)$$
where $\partial M_{PDPG}(x_i)$ denotes the PDPG at position $x_i$, and $a_{center}/b_{center}$ are the center pixels of image patches $A$ and $B$.
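A corresponding sketch of Equation (6), again our own illustrative Python rather than the authors' code, evaluates the PDPG for the center pixel of two patches:

```python
import numpy as np

def pdpg(patch_a, patch_b, eps=1e-8):
    """Partial derivative of the ZNCC with respect to the center pixel of patch B (Equation (6))."""
    a = patch_a.astype(np.float64)
    b = patch_b.astype(np.float64)
    n = a.size
    mu_a, mu_b = a.mean(), b.mean()
    sig_a, sig_b = a.std() + eps, b.std() + eps
    m = np.mean((a - mu_a) * (b - mu_b)) / (sig_a * sig_b)   # ZNCC of the two patches
    ci, cj = a.shape[0] // 2, a.shape[1] // 2                # location of the center pixel
    return (1.0 / n) * ((a[ci, cj] - mu_a) / (sig_a * sig_b)
                        - m * (b[ci, cj] - mu_b) / sig_b ** 2)
```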
On the one hand, the PDPG only calculates the partial derivative with respect to the center pixel. As shown in Figure 2, the ZNCC is determined by all patch pixels, so the total differential should be considered. On the other hand, the PDPG is calculated independently in each image patch. As shown in Figure 3, pixel $x_i$ is contained in nine image patches, but its gradient is determined only by the center patch. The gradient calculated in this way is expected to reduce the ZNCC error at $x_i$; however, it may increase the ZNCC error of the neighboring pixels, which is not conducive to the convergence of the photo-consistency over the entire image.
In our method, the photo-consistency gradient of a pixel is jointly determined by all image patches that contain it. Based on this idea, we propose the TDPG calculation method:
$$\partial M_{TDPG}(x_i) = \sum_{m=1}^{n} W_{gd}\big(d(x_i^{b_m}, x_i)\big)\, \frac{\partial M_{zncc}(x_i^{b_m})}{\partial b_{center}} \quad (7)$$
$$\frac{\partial M_{zncc}(x_i^{b_m})}{\partial b_{center}} = \frac{1}{n} \left( \frac{a_m - \mu_{A_m}}{\sigma_{A_m} \sigma_{B_m}} - \frac{M_{zncc}(x_i^{b_m})\,(b_m - \mu_{B_m})}{\sigma_{B_m}^2} \right) \quad (8)$$
where $\partial M_{TDPG}(x_i)$ is the TDPG at $x_i$, $x_i^{b_m}$ is the position of $b_m$ in the image patch of $x_i$ (see Figure 3a), $d(x_i^{b_m}, x_i)$ is the Euclidean distance between $x_i^{b_m}$ and $x_i$, and $W_{gd}$ is the Gaussian weight. $\partial M_{zncc}(x_i^{b_m})/\partial b_{center}$ is the partial derivative of the ZNCC at $x_i^{b_m}$ with respect to $b_{center}$. $A_m$ and $B_m$ are the two image patches from the reference and predicted images at $x_i^{b_m}$, and $\mu_{A_m}/\mu_{B_m}$ and $\sigma_{A_m}/\sigma_{B_m}$ are the means and standard deviations of the pixel values in $A_m$ and $B_m$, respectively. The equation shows that the gradient of each pixel is jointly determined by the partial derivatives of all pixels within the image patch and that pixels closer to $x_i$ have a more significant impact.
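To make the accumulation in Equations (7) and (8) concrete, the following illustrative Python sketch (our own; the patch size, Gaussian sigma, and function names are assumptions, and image borders are ignored) computes the TDPG of one pixel by looping over every patch that contains it:

```python
import numpy as np

def gaussian_weight(dist, sigma=1.5):
    """Assumed form of the Gaussian distance weight W_gd."""
    return np.exp(-(dist ** 2) / (2.0 * sigma ** 2))

def tdpg(ref_img, pred_img, y, x, half=2, sigma=1.5, eps=1e-8):
    """Total-differential photo-consistency gradient at pixel (y, x) (Equations (7)-(8)).
    Every patch that contains (y, x) contributes the partial derivative of its ZNCC
    with respect to the pixel value at (y, x), weighted by the distance to the patch center."""
    n = (2 * half + 1) ** 2
    grad = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            cy, cx = y + dy, x + dx                           # center of a neighboring patch
            a = ref_img[cy - half:cy + half + 1, cx - half:cx + half + 1].astype(np.float64)
            b = pred_img[cy - half:cy + half + 1, cx - half:cx + half + 1].astype(np.float64)
            mu_a, mu_b = a.mean(), b.mean()
            sig_a, sig_b = a.std() + eps, b.std() + eps
            m = np.mean((a - mu_a) * (b - mu_b)) / (sig_a * sig_b)
            a_i, b_i = ref_img[y, x], pred_img[y, x]          # the pixel of interest inside this patch
            d_partial = (1.0 / n) * ((a_i - mu_a) / (sig_a * sig_b)
                                     - m * (b_i - mu_b) / sig_b ** 2)
            grad += gaussian_weight(np.hypot(dy, dx), sigma) * d_partial
    return grad
```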
Figure 4 shows the difference between the convergence process of the PDPG and TDPG on the entire image. We found that TDPG converges after ~25 iterations while PDPG takes ~35 iterations (Figure 4b). TDPG achieves a more effective convergence in that it considers the partial derivative of all pixels in the neighborhood and increases the area affected by the gradient, which thereby facilitates photo-consistency convergence on the entire image. Furthermore, TDPG also yielded a much lower ZNCC error than PDPG. The local magnification area in Figure 4c,d shows that the TDPG method makes the predicted image closer to the reference image in the iterative process.

2.3. Self-Adaptive Mesh Denoising

Although the photo-consistency gradient improves accuracy and enriches the mesh details, noise and errors in the initial mesh cannot be effectively removed, especially in the texture-less and non-Lambertian regions. The mesh regularization of variational mesh refinement is a combination of the Laplacian and Bi-Laplacian operation [15], an isotropic and one-step mesh denoising method. The one-step property means that the method cannot effectively remove the mesh noise in limited iterations, and the isotropic property makes it hard to retain high-frequency details [11,16]. Therefore, we propose to use the two-step and anisotropic bilateral normal filtering as a regularization term for mesh refinement. However, directly applying the bilateral normal filtering with its strong mesh deformation capability is inappropriate because the small mesh details that are difficult to distinguish from noise will be erased.
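For readers unfamiliar with bilateral normal filtering [11], the following heavily simplified Python sketch illustrates its two-step structure (filter the face normals, then update the vertices); it is our own illustration rather than the implementation used in this paper, and the kernel widths, the inclusion of each face in its own neighborhood, and the face-adjacency representation are assumptions.

```python
import numpy as np

def face_geometry(V, F):
    """Per-face unit normals, areas, and centroids of a triangle mesh (V: (n,3) floats, F: (m,3) ints)."""
    e1, e2 = V[F[:, 1]] - V[F[:, 0]], V[F[:, 2]] - V[F[:, 0]]
    cross = np.cross(e1, e2)
    area = 0.5 * np.linalg.norm(cross, axis=1)
    normal = cross / (np.linalg.norm(cross, axis=1, keepdims=True) + 1e-12)
    return normal, area, V[F].mean(axis=1)

def bilateral_normal_filter(V, F, face_adj, sigma_s, sigma_r=0.35,
                            normal_iters=20, vertex_iters=10):
    """Two-step scheme in the spirit of [11]: (1) iteratively smooth face normals with spatial and
    normal-difference kernels, (2) move vertices to agree with the filtered normals.
    face_adj[i] lists the neighboring faces of face i (assumed to include i itself)."""
    N, A, C = face_geometry(V, F)
    for _ in range(normal_iters):
        N_new = np.empty_like(N)
        for i, nbrs in enumerate(face_adj):
            ws = np.exp(-np.sum((C[nbrs] - C[i]) ** 2, axis=1) / (2 * sigma_s ** 2))
            wr = np.exp(-np.sum((N[nbrs] - N[i]) ** 2, axis=1) / (2 * sigma_r ** 2))
            n = ((A[nbrs] * ws * wr)[:, None] * N[nbrs]).sum(axis=0)
            N_new[i] = n / (np.linalg.norm(n) + 1e-12)
        N = N_new
    V = V.copy()
    vert_faces = [[] for _ in range(len(V))]
    for k, f in enumerate(F):
        for v in f:
            vert_faces[v].append(k)
    for _ in range(vertex_iters):
        _, _, C = face_geometry(V, F)          # recompute centroids after each vertex pass
        for i, fks in enumerate(vert_faces):
            if fks:
                step = sum(N[k] * np.dot(N[k], C[k] - V[i]) for k in fks) / len(fks)
                V[i] = V[i] + step
    return V
```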
We utilized the image ZNCC metric, which indicates the mesh accuracy, to guide mesh denoising. For an image pair $I_p$ and $I_q$, let $x_i^{p,q}$ be the position of a pixel in image $I_p$. The ray formed by the camera center of $I_p$ and $x_i^{p,q}$ intersects the mesh at the 3D position $X_i^{p,q}$. The ZNCC value of face $f_k$ can be calculated from all the image pairs in which it is visible (Figure 5):
$$C(f_k) = \begin{cases} \dfrac{\sum_{p,q} \sum_{X_i^{p,q} \in f_k} \alpha_{p,q}^{vis}(X_i^{p,q})\, M_{zncc}(x_i^{p,q})}{\sum_{p,q} \sum_{X_i^{p,q} \in f_k} \alpha_{p,q}^{vis}(X_i^{p,q})}, & \sum_{p,q} \sum_{X_i^{p,q} \in f_k} \alpha_{p,q}^{vis}(X_i^{p,q}) \neq 0 \\[2ex] 0, & \sum_{p,q} \sum_{X_i^{p,q} \in f_k} \alpha_{p,q}^{vis}(X_i^{p,q}) = 0 \end{cases} \quad (9)$$
where $C(f_k)$ represents the ZNCC value of face $f_k$, and $\alpha_{p,q}^{vis}(X_i^{p,q})$ indicates whether the 3D point $X_i^{p,q}$ is simultaneously visible in image $I_p$ and image $I_q$: if it is visible, $\alpha_{p,q}^{vis}(X_i^{p,q}) = 1$; otherwise, $\alpha_{p,q}^{vis}(X_i^{p,q}) = 0$. $X_i^{p,q} \in f_k$ denotes that $X_i^{p,q}$ lies on face $f_k$, and $M_{zncc}(x_i^{p,q})$ is the ZNCC value at $x_i^{p,q}$. Then, $C(f_k)$ is transferred from the mesh faces to the mesh vertices to form a ZNCC map:
$$C(V_i) = \frac{\sum_{k \in N(V_i)} A(f_k)\, C(f_k)}{\sum_{k \in N(V_i)} A(f_k)} \quad (10)$$
in which $V_i$ is a vertex of the mesh, $C(V_i)$ represents the ZNCC value of the mesh vertex $V_i$, $N(V_i)$ is the one-ring face neighborhood of $V_i$, and $A(f_k)$ is the area of face $f_k$.
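The face-to-vertex transfer of Equation (10) is a simple area-weighted average; a minimal illustrative sketch (our own naming and data layout) is:

```python
import numpy as np

def vertex_zncc_map(face_zncc, face_area, vert_faces):
    """Area-weighted transfer of per-face ZNCC values to the mesh vertices (Equation (10)).
    vert_faces[i] lists the one-ring faces of vertex i."""
    C_v = np.zeros(len(vert_faces))
    for i, fks in enumerate(vert_faces):
        if fks:
            w = face_area[fks]
            C_v[i] = np.dot(w, face_zncc[fks]) / (w.sum() + 1e-12)
    return C_v
```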
The ZNCC maps of different meshes are shown in Figure 6. We found that noisy areas of the mesh have a low $C(V_i)$ because of the error in the mesh shape, whereas in areas with a high $C(V_i)$, the mesh shape conforms to the multiview photo-consistency. We therefore use a $C(V_i)$-weighted denoising gradient to achieve self-adaptive mesh denoising:
$$g_{regularization}(V_i) = \big(1 - C(V_i)\big) \cdot g_{bilateral}(V_i) \quad (11)$$
where $g_{regularization}(V_i)$ is the regularization gradient at each vertex, and $g_{bilateral}(V_i)$ is the bilateral denoising gradient at each vertex as described in [11].
It is worth noting that general mesh denoising methods only remove noise based on geometric information. In contrast, this paper adaptively applies a denoising gradient based on the photo-consistency metric, which is conducive to removing significant errors in the initial mesh (see Figure 1).
Finally, our TDR combines the photometric gradient and the regularization gradient via the weight $\beta$:
$$g_{TDR}(V) = g_{photo}^{TDR}(V) + \beta\, g_{regularization}(V) \quad (12)$$
where $g$ is an abbreviation for gradient, and $g_{photo}^{TDR}(V)$ is the photometric gradient of our TDR method in each iteration, obtained by replacing $\partial M(x_i)$ with $\partial M_{TDPG}(x_i)$ in Equation (3).
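Putting Equations (11) and (12) together, the per-vertex update direction can be sketched as follows (illustrative code with assumed array shapes; C_v holds the vertex ZNCC map and g_bilateral the per-vertex bilateral denoising gradients):

```python
import numpy as np

def tdr_vertex_gradient(g_photo_tdr, g_bilateral, C_v, beta=0.2):
    """Self-adaptive TDR update direction per vertex (Equations (11)-(12)):
    the bilateral denoising gradient is damped where the ZNCC map indicates the surface
    already agrees with the images, then combined with the photometric term."""
    g_reg = (1.0 - C_v)[:, None] * g_bilateral   # strong denoising only where ZNCC is low
    return g_photo_tdr + beta * g_reg
```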

2.4. Initialization and Implementation Details

We implemented the variational mesh refinement method [13], in which ray tracing is used to calculate the projection between images, the image patch size is 5 × 5, and $\beta$ is set to 0.2; these parameters are the same as those used in [13]. The Gaussian weights $W_{gd}$ are normalized according to the image patch size. The mesh denoising scheme is the local scheme of the bilateral normal filtering algorithm, with the normal iterations and vertex iterations set to 20 and 10, respectively. To solve the nonconvex problem, we utilize the L-BFGS optimization algorithm [17,18]. The parameter settings of the TDR method are the same as those of the variational mesh refinement. Our algorithm is implemented in C++. All experiments were conducted on a single PC with an Intel(R) Core(TM) i7-8700 CPU (6 cores, 12 threads), 64 GB RAM, and an Nvidia RTX 2070 GPU.
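As an illustration of how such a gradient-based refinement can be driven by L-BFGS (the paper's C++ implementation is not shown here; this is a hypothetical Python driver using SciPy's L-BFGS-B), one could write:

```python
import numpy as np
from scipy.optimize import minimize

def refine(V0, energy_and_gradient, max_iter=50):
    """Hypothetical driver: minimize the photo-consistency energy over all vertex coordinates
    with L-BFGS. `energy_and_gradient` maps a flattened (3N,) vertex vector to (E, dE/dV),
    e.g. the TDR energy and gradient described above."""
    res = minimize(energy_and_gradient, V0.ravel(), jac=True,
                   method="L-BFGS-B", options={"maxiter": max_iter})
    return res.x.reshape(V0.shape)
```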
We used a variety of mainstream meshes as the initial inputs, including the OpenMVS [19] mesh (reconstructed by its built-in surface reconstruction function), the COLMAP [20] mesh, and the CMPMVS mesh [21]. We used the OpenMVS mesh as the initial mesh by default, as its faces are uniformly sized and the scene reconstruction is complete.

3. Experiments

3.1. Dataset and Evaluation Metrics

Dataset: We use six datasets that cover different types of scenes, including UAV (unmanned aerial vehicle) scenes, close-range scenes, and simulation scenes. Information about these datasets is given in Table 1.
(1) Tanks and Temples [22] is a benchmark for image-based 3D reconstruction. The image sequences come from video streams. We picked the Family, Francis, Horse, and Panther data for close-range scene evaluation.
(2) ETH3D [23] is a benchmark for multiview stereo (MVS) evaluation. It provides ultrahigh-resolution images registered to 3D laser scan point clouds. We picked its facade, delivery_area, relief, and relief_2 data for close-range scene evaluation.
(3) BlendedMVS [24] is a large-scale simulated MVS dataset. It provides ground-truth meshes and rendered images. We selected three outdoor scenes captured by UAVs, namely, UAV_Scene1, UAV_Scene2, and UAV_Scene3, for UAV scene evaluation.
(4) Custom simulation dataset. We picked the computer graphics (CG) mesh model Joyful [25] as the ground-truth mesh and used the same lighting to render 70 images from fixed perspectives using Blender [26].
(5) The EPFL dataset [27] provides two ground-truth meshes captured by LIDAR sensors, namely, Herz-Jesu-P8 and Fountain-P11, together with images registered to the meshes.
(6) Personal Collection Dataset. We collected multiview images from the internet and natural scenes for qualitative evaluation.
Evaluation Metrics: Similar to [22,23], we use the shortest distance from a point to the surface to evaluate the precision of a mesh. Let $I$ be the input mesh to be evaluated and $R$ the reference mesh. For a vertex $i \in I$, its distance to the reference mesh is defined as $d_{i \to R}$. These distances can be aggregated to define the accuracy of the input mesh $I$ for any distance threshold $d$:
$$P(d) = \frac{100}{|I|} \sum_{i \in I} \big[\, d_{i \to R} < d \,\big] \quad (13)$$
where $[\cdot]$ is the Iverson bracket and $|I|$ is the number of vertices of mesh $I$.
Similarly, for a reference mesh vertex $r \in R$, its distance to the input mesh is defined as $d_{r \to I}$. The completeness of the input mesh for any distance threshold $d$ is defined as:
$$C(d) = \frac{100}{|R|} \sum_{r \in R} \big[\, d_{r \to I} < d \,\big] \quad (14)$$
Accuracy and completeness can be combined to calculate the F-score:
$$F(d) = \frac{2\, P(d)\, C(d)}{P(d) + C(d)} \quad (15)$$
The F-score is the harmonic mean of the accuracy and completeness at threshold $d$. The threshold $d$ varies according to the scale of each dataset. Moreover, we also use the mean of $d_{i \to R}$ over all input vertices as the mean-accuracy metric and the mean of $d_{r \to I}$ over all reference vertices as the mean-completeness metric.
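These three metrics are straightforward to compute once the point-to-mesh distances are available; a short illustrative sketch (our own, assuming the distance arrays are computed elsewhere) is:

```python
import numpy as np

def precision_completeness_fscore(d_input_to_ref, d_ref_to_input, d):
    """Accuracy, completeness and F-score at threshold d (Equations (13)-(15)),
    given per-vertex point-to-mesh distances."""
    P = 100.0 * np.mean(np.asarray(d_input_to_ref) < d)
    C = 100.0 * np.mean(np.asarray(d_ref_to_input) < d)
    F = 2.0 * P * C / (P + C) if (P + C) > 0 else 0.0
    return P, C, F
```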

3.2. Comparison with the Baseline Method

3.2.1. Performance on the UAV Dataset

In urban scenes, the 3D reconstruction of structures (such as planes and edges) is the critical point. BlendedMVS provides urban meshes that are difficult to capture fully with a laser scanner. We selected UAV_Scene1–UAV_Scene3 from BlendedMVS for the experiments. Figure 7 shows the visual results. Compared with the baseline method [13], the mesh refined by our TDR approach is sharper in the edge regions (see the red boxes in Figure 7a) because we utilize bilateral normal filtering with its edge-preserving effect. In addition, our method recovers better mesh detail than the baseline method (see the green box in Figure 7b) because our TDPG converges better in terms of photo-consistency. Furthermore, within a limited number of iterations, the Laplace operator used in the baseline method does not remove the undulations in the initial mesh (see the blue box in Figure 7c). In contrast, our method makes the surface flatter owing to the stronger denoising ability of bilateral normal filtering.
Table 2 shows the quantitative results on the BlendedMVS dataset. Since the captured images are far from the objects, we set the cutoff distance $d$ to 0.05 m. Table 2 shows that both the baseline method and our TDR method improve the precision of the initial mesh, but the improvement brought by our method is much more evident. The error ($d_{i \to R}$) distributions in Figure 7 show that, compared with the baseline method, the accuracy improvement brought by our TDR method is concentrated in the flat areas. This indicates that the proposed TDR method has a stronger ability to regularize the mesh than the baseline method.
In summary, the TDR method can reconstruct sharp features and flat planes even under poor initial conditions, making our method very suitable for reconstructing urban scenes.

3.2.2. Performance on the Close-Range Dataset

For the close-range dataset, we test our method on the recently published close-range MVS datasets, i.e., ETH3D and Tanks and Temples.
ETH3D dataset: This dataset has ultrahigh-resolution images and provides laser scan point clouds with registered images. We use Poisson surface reconstruction [28] to obtain the reference mesh. Figure 8 shows the results of our experiment. Compared with the baseline method, our TDR method, which utilizes the TDPG, recovers better mesh details (see the red ellipses in Figure 8). Figure 8c,d show that the initial mesh has substantial noise in texture-less regions. The baseline method, using an isotropic denoising method, cannot effectively remove this noise while retaining sharp edges, whereas our TDR method achieves both (see the blue rectangles in Figure 8). The results of the accuracy evaluation are shown in Table 3. Since the captured images are close to the objects, we set $d$ to 0.005 m. Table 3 shows that the proposed TDR algorithm achieves the best results in terms of almost all quantitative metrics.
Tanks and Temples dataset: This dataset does not provide image poses registered with the reference point cloud; therefore, the results of this experiment are evaluated qualitatively. Compared with the baseline algorithm, the results of the proposed TDR method have finer details (the red boxes in Figure 9), flatter planes (the green boxes in Figure 9), and sharper edges (the blue boxes in Figure 9). This demonstrates the effectiveness of the two improvements proposed in this paper.

3.3. Discussion

3.3.1. Ablation Experiment

We evaluated the effectiveness of the two improvements in TDR on the simulated CG dataset (see Figure 10) through ablation experiments. We tested four configurations: (1) w/o TDPG: using PDPG and self-adaptive bilateral regularization; (2) w/o BI: using TDPG and Laplace regularization; (3) w/o ZNCC weighting: using TDPG and bilateral regularization without ZNCC weighting; (4) full TDR: using TDPG and self-adaptive bilateral regularization. Figure 11 shows the results. There are large pits in the face region of the initial mesh due to a sizeable texture-less area in the rendered images (Figure 11). The meshes using bilateral regularization (b–d) do not show this error. Comparing (b) with (e) in Figure 11, we found that TDPG yields a more detailed and accurate result than PDPG. Comparing (d) with (e), the ZNCC-weighted strategy succeeds in preserving mesh details (see the red box in Figure 11d).

3.3.2. The Influence of Initial Meshes

To discuss the impact of initial meshes on the TDR algorithm, the widely used CMPMVS, OpenMVS, and COLMAP meshes were chosen as initial meshes for comparison. We also added the baseline method [13] for comparison. The baseline algorithm generated the CMPMVS_Vu, OpenMVS_Vu, and COLMAP_Vu meshes; the TDR method generated the CMPMVS_TDR, OpenMVS_TDR, and COLMAP_TDR meshes.
In this section, we use the EPFL dataset [27] for evaluation. This benchmark is designed for mesh evaluation; therefore, we test our results using the evaluation metric proposed in the benchmark. First, the reference mesh and the input mesh are projected into the same image, and the residual of the depth at each pixel is calculated. Then, the occupancy rates of the residuals from $3\sigma$ ($\sigma$ = 1.1 mm) to 10 times $3\sigma$ are counted. Finally, the occupancy rates of the residuals over all images are averaged to obtain the final occupancy distribution histogram and occupancy density map, as shown in Figure 12. In addition, the weighted average of the residual distribution histogram yields the accuracy metric, and the proportion of residuals smaller than $30\sigma$ is counted as the completeness metric [27], as shown in Table 4.
Table 4 and Figure 13 show that, regardless of the initial mesh, the precision improvement brought by the TDR algorithm is much more significant than that of the baseline algorithm. For both TDR and baseline methods, the accuracy and completeness of the refined meshes are high if the initial meshes are accurate, and vice versa. The reason is that mesh refinement is a nonconvex problem and has a certain dependence on the initial value when using the gradient descent method to solve it.
Figure 13 shows the visual results for Herz-Jesu-P8. Even when handling a very poor initial mesh, TDR still reconstructs a good result (Figure 13a). All initial meshes have considerable noise in the door region (the blue boxes in Figure 13), and the TDR method outperforms the baseline method in denoising this area. More notably, the proposed algorithm, despite its strong regularization term, retains the details of the human sculpture well (the red boxes in Figure 13). This is attributed to the self-adaptive weighted denoising strategy, which effectively combines the photo-consistency gradient and the mesh denoising gradient, with the denoising gradient mainly applied to the noisy areas.
In other words, due to the stronger denoising ability, the TDR method can handle a worse initial mesh model.

3.3.3. Running Times Evaluation

In this section, we discuss the running time of our 3D reconstruction system and TDR algorithm on datasets EPFL and Tanks and Temples.
Our 3D reconstruction system comprises the SFM, MVS, mesh reconstruction, and mesh refinement steps. The SFM step uses COLMAP (CUDA version) with default parameters, and the MVS and mesh reconstruction steps use OpenMVS with default parameters. The proposed TDR algorithm is used in the mesh refinement stage with no special performance optimization. Table 5 shows the processing times of the 3D reconstruction system. The SFM, MVS, mesh reconstruction, and mesh refinement time ratios are about 16%, 25%, 4%, and 55%, respectively. Although the mesh refinement step evolves the initial mesh to high quality with fine details, it costs the most time in the 3D reconstruction system.
Table 6 shows the time consumption of the critical steps in the TDR and baseline methods. The two improvements of our method correspond to the computation of $\partial M$ and $g_{regularization}$, respectively. On the one hand, the PDPG method can use the integral image [29] to accelerate the calculation, whereas the TDPG method cannot. On the other hand, PDPG calculates one partial derivative per pixel, whereas TDPG calculates 25 partial derivatives for the 5 × 5 image patch. As a result, the running time of computing $\partial M$ with TDPG is 8~13 times that with PDPG on the experimental data. For the regularization term $g_{regularization}$, the baseline method takes less than 1 s on all experimental data. Our TDR method utilizes the bilateral normal filter, which increases the time consumption, but this term does not exceed 2% of the total time. Overall, the running time of our method is 1.5~2 times that of the baseline method, mainly due to the increased time for TDPG.

3.4. Comparison with Open Source and Commercial Software

In this section, we compare our method with representative open-source software (OpenMVS and COLMAP), commercial software (Context Capture [30]), and the baseline method [13]. Figure 14 visually compares the reconstructed 3D models on the Personal Collection Dataset. OpenMVS reconstructs the overall shapes, although some fine details are lost (the roof in the House data) and noise covers the smooth surfaces (all enlarged parts in Figure 14). The COLMAP mesh is similar to the OpenMVS mesh, with slightly more details (the roof in the House data), but it fails in texture-less regions (the white wall in the House data). The Context Capture mesh also fails in texture-less regions (the white wall in the House data) and is over-smoothed, which causes detail loss (the roof in the House data). The method of [13] enhances the details of the initial mesh to a certain extent (the roof in the House data), but the mesh noise is still not eliminated, and the edges are not sharpened (the white wall in the House data). In contrast, owing to the strong denoising ability brought by the bilateral normal filter, our mesh result on the Woodcarving data is the smoothest and the house edges are the sharpest, while the TDPG recovers the most details.

4. Conclusions

This study proposed a new mesh refinement approach coupling total differential photometric mesh refinement with self-adapted mesh denoising. On the one hand, the traditional PDPG in variational mesh refinement is replaced by the TDPG. The TDPG considers the neighboring pixels and increases the area affected by the gradient, which results in more effective convergence of the photo-consistency and thus increases the details and accuracy of the mesh. On the other hand, the self-adaptive denoising strategy provides a framework for image-guided mesh denoising: the intensity of the denoising gradient is adaptively adjusted according to the multiview ZNCC metric, which facilitates the removal of significant errors in the initial mesh while preserving mesh details. Experiments on different scenes and comparisons with open-source and commercial software were conducted, and the refined meshes were evaluated in terms of both accuracy and completeness. The results showed that our method outperforms current variational mesh refinement methods, is comparable to and in some cases better than commercial software, and produces the most detailed, accurate, and regular meshes. In the future, we plan to run our method on the GPU and to explore the fusion of subpixel sampling and photometric stereo technology with mesh refinement.

Author Contributions

Methodology, Y.Q.; writing—original draft preparation, Y.Q.; writing—review and editing, Q.Y., J.Y., T.X. and F.D. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Ningbo Key Research and Development Project (No. 20201ZDYF020236) and the Hubei Key Research and Development Project (No. 2022BAA035).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The Tanks And Temples Dataset can be obtained from https://www.tanksandtemples.org/ (accessed on 20 December 2022). The ETH3D Dataset can be obtained from https://www.eth3d.net/ (accessed on 20 December 2022). The EPFL Dataset can be obtained from https://icwww.epfl.ch/multiview/denseMVS.html (accessed on 20 December 2022). The BlendedMVS dataset can be obtained from https://github.com/YoYo000/BlendedMVS (accessed on 20 December 2022). The CG Simulation and Personal Collection datasets are available from the corresponding author upon reasonable request.

Acknowledgments

The authors are grateful to the providers of the Tanks and Temples dataset, the Strecha dataset, and the three computer graphics mesh models (Joyful, Armadillo, and Happy_vrip). We would also like to thank the researchers who published the open-source code or programs used to generate the CMPMVS, OpenMVS, and COLMAP meshes in our experiments.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Ju, Y.; Shi, B.; Jian, M.; Qi, L.; Dong, J.; Lam, K.-M. NormAttention-PSN: A High-frequency Region Enhanced Photometric Stereo Network with Normalized Attention. Int. J. Comput. Vis. 2022, 130, 3014–3034.
2. Yang, J.; Ding, B.; He, Z.; Pan, G.; Cao, Y.; Cao, Y.; Zheng, Q. ReDDLE-Net: Reflectance Decomposition for Directional Light Estimation. Photonics 2022, 9, 656.
3. Ju, Y.; Jian, M.; Guo, S.; Wang, Y.; Zhou, H.; Dong, J. Incorporating Lambertian priors into surface normals measurement. IEEE Trans. Instrum. Meas. 2021, 70, 1–13.
4. Ju, Y.; Peng, Y.; Jian, M.; Gao, F.; Dong, J. Learning conditional photometric stereo with high-resolution features. Comput. Vis. Media 2022, 8, 105–118.
5. Liu, Y.; Ju, Y.; Jian, M.; Gao, F.; Rao, Y.; Hu, Y.; Dong, J. A deep-shallow and global–local multi-feature fusion network for photometric stereo. Image Vis. Comput. 2022, 118, 104368.
6. Romanoni, A.; Matteucci, M. Facetwise Mesh Refinement for Multi-View Stereo. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 6794–6801.
7. Li, Z.; Wang, K.; Zuo, W.; Meng, D.; Zhang, L. Detail-Preserving and Content-Aware Variational Multi-View Stereo Reconstruction. IEEE Trans. Image Process. 2016, 25, 864–877.
8. Romanoni, A.; Matteucci, M. Mesh-based camera pairs selection and occlusion-aware masking for mesh refinement. Pattern Recognit. Lett. 2019, 125, 364–372.
9. Blaha, M.; Rothermel, M.; Oswald, M.R.; Sattler, T.; Richard, A.; Wegner, J.D.; Pollefeys, M.; Schindler, K. Semantically Informed Multiview Surface Refinement. In Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 3839–3847.
10. Romanoni, A.; Ciccone, M.; Visin, F.; Matteucci, M. Multi-view Stereo with Single-View Semantic Mesh Refinement. In Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), Venice, Italy, 22–29 October 2017; pp. 706–715.
11. Zheng, Y.; Fu, H.; Au, O.K.; Tai, C.L. Bilateral normal filtering for mesh denoising. IEEE Trans. Vis. Comput. Graph. 2011, 17, 1521–1530.
12. Pons, J.-P.; Keriven, R.; Faugeras, O. Multi-view stereo reconstruction and scene flow estimation with a global image-based matching score. Int. J. Comput. Vis. 2007, 72, 179–193.
13. Vu, H.H.; Labatut, P.; Pons, J.P.; Keriven, R. High accuracy and visibility-consistent dense multiview stereo. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 889–901.
14. Li, S.; Siu, S.Y.; Fang, T.; Quan, L. Efficient multi-view surface refinement with adaptive resolution control. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Cham, Switzerland, 2016; pp. 349–364.
15. Kobbelt, L.; Campagna, S.; Vorsatz, J.; Seidel, H.-P. Interactive multi-resolution modeling on arbitrary meshes. In Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques, Orlando, FL, USA, 19–24 July 1998; pp. 105–114.
16. Zhang, H.; Wu, C.; Zhang, J.; Deng, J. Variational mesh denoising using total variation and piecewise constant function space. IEEE Trans. Vis. Comput. Graph. 2015, 21, 873–886.
17. Byrd, R.H.; Lu, P.H.; Nocedal, J.; Zhu, C.Y. A Limited Memory Algorithm for Bound Constrained Optimization. SIAM J. Sci. Comput. 1995, 16, 1190–1208.
18. Zhu, C.Y.; Byrd, R.H.; Lu, P.H.; Nocedal, J. Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization. ACM Trans. Math. Softw. 1997, 23, 550–560.
19. Cernea, D. OpenMVS: Open Multiple View Stereovision. Available online: https://github.com/cdcseacave/openMVS/ (accessed on 20 December 2022).
20. Schonberger, J.L.; Frahm, J.-M. Structure-from-motion revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4104–4113.
21. Jancosek, M.; Pajdla, T. Multi-View Reconstruction Preserving Weakly-Supported Surfaces. In Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, USA, 20–25 June 2011.
22. Knapitsch, A.; Park, J.; Zhou, Q.-Y.; Koltun, V. Tanks and temples: Benchmarking large-scale scene reconstruction. ACM Trans. Graph. 2017, 36, 1–13.
23. Schöps, T.; Schönberger, J.L.; Galliani, S.; Sattler, T.; Schindler, K.; Pollefeys, M.; Geiger, A. A Multi-View Stereo Benchmark with High-Resolution Images and Multi-Camera Videos. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
24. Yao, Y.; Luo, Z.; Li, S.; Zhang, J.; Ren, Y.; Zhou, L.; Fang, T.; Quan, L. BlendedMVS: A large-scale dataset for generalized multi-view stereo networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1790–1799.
25. Kim, K.; Torii, A.; Okutomi, M. Multi-View Inverse Rendering under Arbitrary Illumination and Albedo. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2016; pp. 750–767.
26. Blender. Version v2.93.4 (Software). Available online: https://www.blender.org/ (accessed on 20 December 2022).
27. Strecha, C.; Von Hansen, W.; Van Gool, L.; Fua, P.; Thoennessen, U. On benchmarking camera calibration and multi-view stereo for high resolution imagery. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 24–26 June 2008; pp. 1–8.
28. Kazhdan, M.; Chuang, M.; Rusinkiewicz, S.; Hoppe, H. Poisson surface reconstruction with envelope constraints. In Computer Graphics Forum; Wiley Online Library: Hoboken, NJ, USA, 2020; Volume 39, pp. 173–182.
29. Facciolo, G.; Limare, N.; Meinhardt-Llopis, E. Integral images for block matching. Image Process. Line 2014, 4, 344–369.
30. ContextCapture. Version v4.4.9.516 (Software). 2020. Available online: https://www.bentley.com/en/products/brands/contextcapture (accessed on 20 December 2022).
Figure 1. The flow diagram for the proposed TDR algorithm.
Figure 2. Schematic diagram of the minimization of the ZNCC error.
Figure 3. The difference between PDPG and TDPG.
Figure 4. An example of the convergence process of PDPG and TDPG for the entire image. (a) shows the reference (left) and predicted image (right), and (b) shows the photo-consistency convergence curves of the PDPG and TDPG. (c,d) show the changes in the predicted image at different iterations for PDPG and TDPG, respectively.
Figure 5. The schematic diagram for the calculation of the ZNCC value of face $f_k$. The red triangle represents $f_k$. All the image pairs visible to $f_k$ are considered for the ZNCC calculation, indicated as images from Camera 1 to 4 in this figure.
Figure 6. ZNCC maps of different meshes. (a–d) are the initial meshes, and (e–h) are the corresponding ZNCC maps.
Figure 7. The mesh results (odd rows) and the error ($d_{i \to R}$) distributions of the meshes (even rows) on the BlendedMVS dataset.
Figure 8. Visual comparison of results on the ETH3D dataset.
Figure 9. Visual comparison of results on the Tanks and Temples dataset.
Figure 10. The simulated CG dataset. The three columns are the mesh, cameras, and rendered images, from left to right.
Figure 11. The visualization result and the accuracy metric of the Joyful data.
Figure 12. The residual occupancy density maps (a–f) and occupancy distribution histograms (g,h) of all meshes.
Figure 13. Visualization of the results of all methods on Herz-Jesu-P8.
Figure 14. Visual comparison on the Personal Collection Dataset.
Table 1. Introduction to the datasets used in this study.

| Dataset | Name | Image Size | Number of Images | Initial Mesh | Image Acquisition |
|---|---|---|---|---|---|
| Tanks and Temples | Family | 1920 × 1080 | 153 | OpenMVS | Handheld |
| | Francis | 1920 × 1080 | 302 | OpenMVS | Handheld |
| | Horse | 1920 × 1080 | 151 | OpenMVS | Handheld |
| | Panther | 1920 × 1080 | 314 | OpenMVS | Handheld |
| ETH3D | delivery_area | 6048 × 4032 | 44 | OpenMVS | Handheld |
| | facade | 6048 × 4032 | 76 | OpenMVS | Handheld |
| | relief | 6048 × 4032 | 31 | OpenMVS | Handheld |
| | relief_2 | 6048 × 4032 | 31 | OpenMVS | Handheld |
| BlendedMVS | UAV_Scene1 | 2048 × 1536 | 77 | OpenMVS | Rendered |
| | UAV_Scene2 | 2048 × 1536 | 125 | OpenMVS | Rendered |
| | UAV_Scene3 | 2048 × 1536 | 75 | OpenMVS | Rendered |
| EPFL | Herz-Jesu-P8 | 3072 × 2048 | 8 | OpenMVS/COLMAP/CMPMVS | Handheld |
| | Fountain-P11 | 3072 × 2048 | 11 | OpenMVS/COLMAP/CMPMVS | Handheld |
| CG Simulation Dataset | Joyful | 1920 × 1080 | 70 | OpenMVS | Rendered |
| Personal Collection Dataset | House | 4592 × 3056 | 36 | OpenMVS | UAV |
| | Woodcarving | 2016 × 4032 | 146 | OpenMVS | Handheld |
Table 2. Quantitative evaluation on the BlendedMVS dataset. Acc. means accuracy and Compl. represents completeness.

| Scene | Metric | Initial Mesh | Baseline | TDR |
|---|---|---|---|---|
| UAV_Scene1 | Acc. [%] | 28.11 | 77.00 | 81.67 |
| | Compl. [%] | 20.30 | 61.37 | 63.16 |
| | F1 [%] | 23.57 | 68.30 | 71.23 |
| | Mean-Acc. [×10⁻²] | 10.89 | 6.71 | 5.09 |
| | Mean-Compl. [×10⁻²] | 24.13 | 15.03 | 14.13 |
| UAV_Scene2 | Acc. [%] | 31.90 | 72.79 | 77.71 |
| | Compl. [%] | 29.80 | 90.20 | 90.77 |
| | F1 [%] | 30.81 | 80.57 | 83.73 |
| | Mean-Acc. [×10⁻²] | 12.80 | 8.00 | 6.70 |
| | Mean-Compl. [×10⁻²] | 42.40 | 28.30 | 27.30 |
| UAV_Scene3 | Acc. [%] | 29.53 | 83.98 | 85.79 |
| | Compl. [%] | 28.90 | 66.85 | 67.95 |
| | F1 [%] | 29.21 | 74.44 | 75.84 |
| | Mean-Acc. [×10⁻²] | 10.27 | 5.37 | 4.94 |
| | Mean-Compl. [×10⁻²] | 26.80 | 17.55 | 17.44 |
Table 3. Quantitative evaluation of results on the ETH3D dataset.

| Scene | Metric | Initial Mesh | Baseline | TDR |
|---|---|---|---|---|
| delivery_area | Acc. [%] | 53.18 | 53.33 | 56.98 |
| | Compl. [%] | 37.65 | 41.85 | 42.95 |
| | F1 [%] | 44.09 | 46.89 | 48.98 |
| | Mean-Acc. [×10⁻³] | 7.89 | 7.80 | 7.24 |
| | Mean-Compl. [×10⁻³] | 51.24 | 50.90 | 50.84 |
| facade | Acc. [%] | 24.25 | 34.35 | 42.78 |
| | Compl. [%] | 27.07 | 38.22 | 45.15 |
| | F1 [%] | 25.58 | 36.18 | 43.93 |
| | Mean-Acc. [×10⁻³] | 34.89 | 30.67 | 26.76 |
| | Mean-Compl. [×10⁻³] | 15.35 | 13.91 | 13.22 |
| relief | Acc. [%] | 95.45 | 95.87 | 95.97 |
| | Compl. [%] | 94.09 | 95.63 | 96.79 |
| | F1 [%] | 94.77 | 95.75 | 96.38 |
| | Mean-Acc. [×10⁻³] | 1.79 | 16.25 | 13.51 |
| | Mean-Compl. [×10⁻³] | 2.15 | 1.76 | 13.17 |
| relief_2 | Acc. [%] | 90.48 | 90.34 | 92.58 |
| | Compl. [%] | 86.72 | 89.74 | 90.72 |
| | F1 [%] | 88.56 | 90.04 | 91.64 |
| | Mean-Acc. [×10⁻³] | 1.94 | 2.16 | 1.93 |
| | Mean-Compl. [×10⁻³] | 2.66 | 2.19 | 2.13 |
Table 4. The accuracy and completeness of all meshes.

| Mesh | Herz-Jesu-P8: #faces [M] | Acc. [3σ] | Compl. [%] | Fountain-P11: #faces [M] | Acc. [3σ] | Compl. [%] |
|---|---|---|---|---|---|---|
| CMPMVS | 2.76 | 6.25 | 50.57 | 2.47 | 5.08 | 53.42 |
| CMPMVS_Vu | 1.25 | 4.30 | 66.55 | 1.55 | 2.90 | 70.34 |
| CMPMVS_TDR | 1.25 | 3.77 | 71.83 | 1.55 | 2.64 | 71.70 |
| OpenMVS | 1.54 | 4.04 | 72.96 | 1.89 | 2.42 | 79.37 |
| OpenMVS_Vu | 1.26 | 3.59 | 75.62 | 1.52 | 2.15 | 79.52 |
| OpenMVS_TDR | 1.26 | 3.49 | 75.72 | 1.53 | 1.95 | 80.29 |
| COLMAP | 1.14 | 3.80 | 71.45 | 1.51 | 2.42 | 74.33 |
| COLMAP_Vu | 1.23 | 3.60 | 73.77 | 1.39 | 2.24 | 78.25 |
| COLMAP_TDR | 1.22 | 3.32 | 76.41 | 1.40 | 1.99 | 79.12 |
Table 5. The processing times of the 3D reconstruction system. Times are in seconds; ratios are percentages of the total pipeline time.

| Step | Herz-Jesu-P8 Time (Ratio) | Fountain-P11 Time (Ratio) | Family Time (Ratio) | Francis Time (Ratio) | Horse Time (Ratio) | Panther Time (Ratio) |
|---|---|---|---|---|---|---|
| SFM | - | - | 472 (12%) | 925 (19%) | 276 (12%) | 1388 (22%) |
| MVS | 120 (35%) | 196 (38%) | 1009 (25%) | 1313 (27%) | 575 (24%) | 1545 (24%) |
| Mesh reconstruction | 60 (18%) | 86 (17%) | 222 (6%) | 131 (3%) | 91 (4%) | 331 (5%) |
| Mesh refinement | 159 (47%) | 233 (45%) | 2335 (58%) | 2455 (51%) | 1427 (60%) | 3149 (49%) |
Table 6. The time consumption of critical steps in the TDR and the baseline (Vu [13]) method. Times are in seconds, reported as Vu / TDR.

| | Herz-Jesu-P8 | Fountain-P11 | Family | Francis | Horse | Panther |
|---|---|---|---|---|---|---|
| #Vertices (K) | 754 / 754 | 944 / 944 | 895 / 895 | 672 / 672 | 527 / 527 | 1897 / 1897 |
| #Image pixels (M) | 50 / 50 | 69 / 69 | 317 / 317 | 626 / 626 | 313 / 313 | 651 / 651 |
| Ray tracing (s) | 30 / 32 | 43 / 45 | 561 / 389 | 624 / 524 | 416 / 291 | 850 / 810 |
| Compute $\partial M$ (s) | 7 / 85 | 10 / 129 | 231 / 1391 | 285 / 1271 | 184 / 759 | 198 / 1626 |
| Compute $g_{photo}$ (s) | 17 / 16 | 26 / 23 | 371 / 257 | 277 / 245 | 211 / 151 | 339 / 316 |
| Compute $g_{regularization}$ (s) | 0 / 3 | 0 / 4 | 0 / 7 | 0 / 5 | 0 / 5 | 0 / 10 |
| Others (s) | 22 / 24 | 30 / 32 | 435 / 290 | 504 / 410 | 331 / 221 | 433 / 387 |
| Total (s) | 76 / 159 | 109 / 233 | 1597 / 2335 | 1690 / 2455 | 1142 / 1427 | 1819 / 3149 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
