#### 2.4.1. Camera Properties and Geometry Estimation

A Scale-Invariant Feature Transform (SIFT) [31] followed by bundle adjustment [32], using the implementation by Wu et al. [33], is used to generate estimates of the cameras’ parameters and geometries. In the absence of calibrated metric cameras, the estimated camera positions have an arbitrary scale factor. A conventional solution is to include additional coded targets and/or scale bars in the image scene. In our workflow, automatic calibration is achieved by adding a sequence of virtual reference ‘photographs’ to each of the image sets. The reference ‘photographs’ are actually artificially generated replicas of the calibration plate, shown in Figure 3C, rendered using a sequence of viewpoints comparable to typical camera viewpoints. A sequence of twelve images rendered at 30° rotational intervals, all from a 45° elevation, was empirically found to work well. SIFT feature points are extracted and matched for this set of images prior to processing the ‘real’ photographs.
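As a rough illustration of the viewpoint layout, the twelve virtual camera centres can be generated at equal 30° azimuth steps on a circle at a fixed 45° elevation above the plate. This is a minimal sketch; the function name, radius, and coordinate convention are illustrative assumptions, not the authors’ actual rendering code:

```python
import math

def virtual_viewpoints(n_views=12, elevation_deg=45.0, radius=1.0):
    """Camera centres for the artificial calibration-plate renders:
    n_views positions at equal azimuth intervals, all at the same
    elevation above the plate (assumed centred at the origin).
    Illustrative sketch only."""
    elev = math.radians(elevation_deg)
    centres = []
    for k in range(n_views):
        az = math.radians(k * 360.0 / n_views)   # 30-degree steps for 12 views
        x = radius * math.cos(elev) * math.cos(az)
        y = radius * math.cos(elev) * math.sin(az)
        z = radius * math.sin(elev)              # constant height: fixed elevation
        centres.append((x, y, z))
    return centres

cams = virtual_viewpoints()
```

Each centre would then be paired with a look-at rotation towards the plate centre to form the full virtual camera pose.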

The camera geometry estimation process uses a well-established workflow [33], beginning by extracting the SIFT features of the real set of photographs and matching them with the existing calibration model. Bundle adjustment refines the estimated parameters of the unknown cameras whilst keeping the virtual camera positions, poses and intrinsic parameters fixed at their known, exact values (see Figure 6). By estimating the real camera positions relative to a static, calibrated set of feature points, the correct scale factor is obtained automatically. The final stage of this process is the removal of the virtual photographs from the image set prior to dense point-cloud reconstruction using only the real photographs and the corresponding calibrated camera parameter estimates. Using this approach, the discrete coded targets conventionally used for calibration are substituted by a single extended coded target filling most of the image scene not occupied by the object being acquired. This gives robust auto-calibration results and works with objects of many shapes and sizes. The complete geometry estimation workflow was implemented using code from Wu et al. [33] combined with customised code written in a combination of C++ and Matlab; a Windows batch script was used to automate the process.

#### 2.4.4. Point-Cloud Registration

After photogrammetric point-cloud reconstruction has been applied to each set of M photographs, a set of N three-dimensional calibrated point-clouds has been generated. The process of merging the N partial models is one of point-cloud registration: matching the overlapping regions of the point-clouds from pairs of partial models. Point-cloud registration is performed in two stages:

Super 4PCS is a reliable and efficient open-source algorithm for achieving coarse alignment between point-clouds in an automatic, unsupervised process. The algorithm described by Mellado et al. [37] is used without modification.

Fine alignment is a modified version of the well-established ICP algorithm [38]. In the conventional algorithm, in order to orient one point-cloud, $P$, so that it matches another point-cloud, $Q$, an error function of the following form is minimised:

$$E = \sum_{i} {\left\| {\mathbf{A}}_{k}\,{\mathbf{p}}_{i} - {\mathbf{q}}_{j} \right\|}^{2} \tag{1}$$

where ${\mathbf{p}}_{i}$ is the $i$-th member of the point-cloud $P$, ${\mathbf{A}}_{k}$ is the current estimate of the optimal rigid transform matrix, and ${\mathbf{q}}_{j}$ is the $j$-th member of the fixed, reference point-cloud $Q$; $j$ is chosen to select the closest point in the cloud to ${\mathbf{p}}_{i}$. At each iteration of the process, the point-cloud correspondences are re-estimated and a new transform, ${\mathbf{A}}_{k}$, is calculated to minimise the error function.
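The iterate-match-fit structure of conventional ICP can be sketched compactly in two dimensions, where the optimal rotation has a simple closed form. This is an illustrative 2D analogue only (brute-force nearest-neighbour search, no outlier rejection), not the implementation used in the paper:

```python
import math

def icp_2d(P, Q, iters=20):
    """Minimal 2D analogue of ICP: repeatedly match each point in P to its
    nearest neighbour in Q, then apply the closed-form rigid transform
    (rotation + translation) minimising the summed squared distances."""
    P = [tuple(p) for p in P]
    for _ in range(iters):
        # Re-estimate correspondences: nearest neighbour in Q for each p.
        C = [min(Q, key=lambda q, p=p: (p[0]-q[0])**2 + (p[1]-q[1])**2)
             for p in P]
        # Centroids of the moving cloud and its matched targets.
        pcx = sum(p[0] for p in P) / len(P); pcy = sum(p[1] for p in P) / len(P)
        qcx = sum(q[0] for q in C) / len(C); qcy = sum(q[1] for q in C) / len(C)
        # Closed-form optimal 2D rotation about the centroids.
        s = sum((p[0]-pcx)*(q[1]-qcy) - (p[1]-pcy)*(q[0]-qcx) for p, q in zip(P, C))
        c = sum((p[0]-pcx)*(q[0]-qcx) + (p[1]-pcy)*(q[1]-qcy) for p, q in zip(P, C))
        th = math.atan2(s, c)
        ct, st = math.cos(th), math.sin(th)
        # Apply: rotate about P's centroid, translate onto Q's centroid.
        P = [(ct*(p[0]-pcx) - st*(p[1]-pcy) + qcx,
              st*(p[0]-pcx) + ct*(p[1]-pcy) + qcy) for p in P]
    return P
```

In 3D the per-iteration fit is usually solved with an SVD-based (Kabsch) step instead of the scalar `atan2` above; the alternation between matching and fitting is identical.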

The ICP algorithm in this form works under the assumption that all points in both point-clouds are equally precisely estimated, which, in our application, is not necessarily the case.

Figure 7A,B illustrate the problem, showing two partial models of an object acquired from different viewpoints. Assuming view ${n}_{1}$ was photographed using a typical camera elevation of around 40°, the points ${p}_{1}$ and ${p}_{2}$ would have been photographed from a very shallow grazing angle and would be poorly illuminated by the overhead light source. As a result, their photogrammetric reconstruction would not have been as precise as that of the other points illustrated in the model. From viewpoint ${n}_{2}$ (the same object turned over), the corresponding points ${p}_{3}$ and ${p}_{4}$ would be better illuminated and in better view of the camera and so would be more precisely reconstructed.

If the expected precision of each point can be estimated, their relative importance in the point-cloud registration process can be weighted accordingly. In addition, after registration, the overlapping regions can be automatically ‘cleaned-up’ by eliminating unreliable points where a better close-by alternative can be found in another partial model.

As part of the dense point-cloud reconstruction process, each point is associated with a surface normal vector. The vertical ($z$) components of the correspondingly rotated normal vectors form a good first estimate of the potential reliability of each point.

Figure 7C illustrates this concept. Points such as ${p}_{5}$ with poor expected precision can be identified by the negative vertical ($z$) component of the normal vector (i.e., the normal points downwards). Points along the top of the object, such as ${p}_{6}$, would all be expected to have good precision and can be identified by their large, positive $z$ components. The only exceptions to this rule are found near the base of the object, e.g., point ${p}_{7}$. In this area, shadowing can cause such poor reconstruction that the normal vector itself can be imprecise. These points can be identified by their $z$ coordinate relative to the bottom of the cropped object. These criteria are combined to form a single confidence metric for each point:

$$c = \max\left( 0, n_{z} \right) \cdot \min\left( 1, z/\lambda \right) \tag{2}$$

where ${n}_{z}$ is the $z$ component of the point’s normal vector and $z$ is the point’s $z$ coordinate in millimetres relative to the cropping height used in the previous section. The constant, $\lambda$, sets the height-range of the subset of points near the base that are expected to be less precise; a value of approximately 1 mm has been found to work well in practice. An example illustrating the use of this confidence metric is shown in Figure 8. Points on the top, upward-facing surface of the object have confidence values close to the maximum, whilst those near the bottom show a confidence close to zero. The sides of the object show varying confidence values according to the inclination of the surface. A comparison of Figure 8A,B shows that the regions around the sides of the object that were poorly lit or whose view was obscured do, correctly, receive correspondingly lower confidence values.
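A per-point confidence of this kind is straightforward to compute once the normals are known. The sketch below is one plausible realisation of the two criteria described above (downward normals penalised; points within about $\lambda$ of the cropping height de-weighted); the exact functional form used by the authors may differ:

```python
def point_confidence(n_z, z, lam=1.0):
    """Illustrative per-point confidence metric (assumed form).
    n_z : z component of the point's (rotated) surface normal
    z   : point height in mm above the cropping height
    lam : height-range near the base treated as unreliable (~1 mm)"""
    # Downward-facing normals (n_z < 0) get zero confidence; otherwise
    # confidence scales with the normal's vertical component.
    normal_term = max(0.0, n_z)
    # Points within lam of the base are linearly de-weighted, since
    # shadowing there makes even the normals imprecise.
    base_term = min(1.0, max(0.0, z) / lam)
    return normal_term * base_term
```

With this form, a point on a flat top surface ($n_z = 1$, well above the base) scores 1.0, while any point whose normal faces downwards scores 0.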

In order to make use of the confidence values, a modified Weighted ICP (W-ICP) algorithm is used. This algorithm is the same as conventional ICP except that the error function from Equation (1) is modified to be:

$$E = \sum_{i} c\left( {\mathbf{p}}_{i} \right) c\left( {\mathbf{q}}_{j} \right)\, {\left\| {\mathbf{A}}_{k}\,{\mathbf{p}}_{i} - {\mathbf{q}}_{j} \right\|}^{2} \tag{3}$$

i.e., the contribution of each point-to-point distance is weighted by the product of the confidence values of the two points in the pair. This ensures that the most precisely estimated points contribute most to the error function and will, therefore, be registered most precisely.
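The weighting itself is a one-line change to the error evaluation. A minimal sketch, assuming each matched pair carries the two points and their confidence values:

```python
def wicp_error(pairs):
    """Weighted ICP error over a list of matched pairs.
    Each entry is (p, q, c_p, c_q): the moving point, its matched
    reference point, and the confidence value of each. The squared
    distance of each pair is scaled by c_p * c_q, so low-confidence
    points contribute little to the total."""
    return sum(cp * cq * sum((a - b) ** 2 for a, b in zip(p, q))
               for p, q, cp, cq in pairs)
```

The per-iteration transform fit is weighted in the same way (weighted centroids and a weighted rotation fit), so high-confidence pairs dominate the alignment.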

After the last iteration, each pair of points is combined to form a single interpolated point using the same confidence values. Given a pair of points ${\mathbf{p}}_{i}$ and ${\mathbf{q}}_{j}$, the resulting interpolated point, ${\mathbf{r}}_{i}$, in the merged point-cloud would be:

$$\mathbf{r}_{i} = \frac{c\left( {\mathbf{p}}_{i} \right)\,{\mathbf{p}}_{i} + c\left( {\mathbf{q}}_{j} \right)\,{\mathbf{q}}_{j}}{c\left( {\mathbf{p}}_{i} \right) + c\left( {\mathbf{q}}_{j} \right)} \tag{4}$$

This merging operation helps to ensure that no ‘seams’ remain at the edges of the original point-clouds resulting from imprecise, unpruned points.
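The confidence-weighted interpolation of a matched pair can be sketched directly (the function name is illustrative):

```python
def merge_pair(p, q, cp, cq):
    """Confidence-weighted interpolation of a matched point pair:
    the merged point lies closer to whichever of the two points
    has the higher confidence value."""
    w = cp + cq
    return tuple((cp * a + cq * b) / w for a, b in zip(p, q))
```

Two equally confident points merge to their midpoint; a point three times more confident than its partner pulls the result three-quarters of the way towards itself.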

#### 2.4.6. Texturing

The texturing process refers back to the original sets of photographs and the camera position information calculated during the sparse point-cloud reconstruction to determine the detailed appearance (i.e., texture) of each face of the mesh. Meshlab [40] provides relatively easy-to-use parameterisation and texturing processing suitable for this task, provided that the camera locations can be imported in the correct format. This is a more complex task than in the conventional single-scan photogrammetry workflow [41] because the merging process will have reoriented the component parts of the mesh, requiring the camera positions to be moved accordingly.

The starting point of the process is the set of camera locations estimated by the bundle adjustment processing of VisualSFM [42]. The position of the $m$-th camera during the $n$-th partial acquisition can be expressed in the form of a single 4-by-4 view-matrix, ${\mathbf{V}}_{mn}$.

As a result of the point-cloud registration stage, each of the $N$ partial meshes has been transformed from the location assumed by the bundle adjustment process. Consequently, before texturing, the mesh must also be transformed by a model-matrix, ${\mathbf{M}}_{n}$, which is the inverse of the optimal transform calculated during point-cloud registration for the $n$-th partial acquisition. Thus, the rotation and translation of the $m$-th camera during the $n$-th partial acquisition is expressed by the model-view-matrix, ${\mathbf{V}}_{mn}{\mathbf{M}}_{n}$. An example of the resulting ensemble of camera positions is illustrated in Figure 9.
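The model-view composition ${\mathbf{V}}_{mn}{\mathbf{M}}_{n}$ is an ordinary 4-by-4 homogeneous matrix product. A minimal sketch, using hypothetical translation-only matrices in place of real camera poses:

```python
def matmul4(A, B):
    """Product of two 4x4 matrices (row-major nested lists), as used to
    compose a view-matrix with a registration-derived model-matrix."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(tx, ty, tz):
    """4x4 homogeneous translation matrix (illustrative stand-in for a
    full rotation + translation camera pose)."""
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]

# Hypothetical example: compose a view-matrix with a model-matrix.
V_mn = translation(1, 2, 3)   # camera pose from bundle adjustment
M_n = translation(4, 5, 6)    # inverse of the registration transform
MV = matmul4(V_mn, M_n)       # model-view-matrix used for texturing
```

Applying `MV` to mesh vertices places them exactly where the original bundle-adjustment cameras expect them, so the unmodified photographs can be reprojected for texturing.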

Having correctly repositioned and reoriented the cameras, Meshlab is able to parameterise and texture the mesh, producing the final 3D model. Figure 10 shows an example of a completed reconstruction.