1. Introduction
Dynamic deformation measurement (DDM) has significant applications in structural health monitoring [1], industrial maintenance [2], and vehicle refinement [3]. Several contact and non-contact sensors are widely employed for accurate DDM, such as linear variable differential transducers [4], accelerometers [5], fiber-optic sensors [6], and laser sensors [7]; however, they suffer from problems such as sparse measurements, mass loading, and spatial inconsistency. The rapid advancement in camera resolution has propelled the development of vision-based methods that use camera pixels as dense arrays of optical sensors. Vision-based methods are favored for their easy installation and their non-contact, fast, and spatially consistent full-field measurement capabilities. Pixel intensity array-tracking methods, such as digital image correlation (DIC) [8] and template matching [9], are widely adopted for sub-pixel displacement estimation.
Although DIC often requires a speckle pattern applied to the object surface to facilitate intensity matching, it is widely accepted as a non-contact displacement measurement technique that is free of mass loading and independent of the tested material and the length scale of interest [10]. DIC was first used in the 1980s to monitor aluminum specimens [11], tracking motion based on photo-consistency between pre- and post-deformation images. Since then, numerous studies have advanced the technique through refined shape functions [12], high-order interpolation [13], enhanced intensity filtering [14], precise and efficient parameter solving [15], and improved initial guesses [16]. To simplify practical configurations and enable full-field 3D measurements, 3D-DIC, which integrates stereo photogrammetry with 2D-DIC, was developed in the 1990s [17]. This integration enables 3D DDM from two overlapping images, supported by the calibration [18], stereo correspondence [19], and triangulation [20] techniques of stereo photogrammetry. The speckle pattern also enhances calibration owing to its rich features and improves measurement precision in the presence of environmental disturbances [21]. Recently, the accuracy of stereo correspondence in 3D-DIC has been significantly enhanced through strategies such as image feature description and matching [22], path-guided measurement [23], geometrically constrained semi-global matching [24], and model-based projection [25]. Deep learning has further improved performance by increasing the measurement range [26], improving precision, and enabling super-resolution speckle image generation [27]. Building on these advancements, 3D-DIC achieves micrometer-level accuracy by tracking the same patterned subset region across stereo image sequences. However, accurate tracking requires sufficient overlap between the stereo camera views, which inherently restricts the measurable region of 3D-DIC systems [28].
A widely adopted solution for enlarging the measurement region is to use multi-camera systems supported by multi-camera geometry (MCG; also called multi-view geometry). MCG recovers 3D information from multiple 2D camera images for tasks such as 3D shape measurement [29], 3D DDM [30], 3D position tracking [31], and 3D pose estimation [32]. By capturing images from different locations, MCG effectively covers regions that are difficult to capture with a stereo configuration [29], with applications in cultural heritage archiving, surgical planning, structural health monitoring, crime scene reconstruction, and entertainment [33,34,35,36]. Multi-camera systems are implemented in two main ways: camera array systems and pseudo-camera systems. Camera array systems place multiple real cameras at different locations, each recording an individual view of the object; this configuration ensures high resolution and spatial consistency [37]. Pseudo-camera systems generate virtual cameras through temporal motion, mirrors, or prisms, allowing a more cost-effective realization [38]. Continued efforts in 3D scene representation [39], calibration [40], stereo correspondence [29], and patchmatch-based depth estimation [29] have strongly advanced their development. At the same time, multi-camera configurations make camera grouping (selection) an important issue that is closely related to measurement accuracy [29]. For accurate 3D shape measurement, several works have modelled camera grouping as an optimization problem based on properties estimated from 2D images, such as visibility, triangulation angle, incident angle, texture, depth, and normal vector [29].
Multi-camera DIC (MC-DIC), which integrates MCG with DIC, is well known for its strong 3D DDM capability [41]. MC-DIC methods effectively extend the measurement region by fusing results measured over different object areas [30]. MC-DIC with camera array systems enables wide-range measurements of panoramic and full-region dynamic deformations of column-shaped objects [42] and beam-shaped structures [43], respectively. MC-DIC with pseudo-camera systems supports dual-surface and panoramic measurements while maintaining system compactness and low cost [28]. However, these methods are weak in camera grouping because of unavoidable errors in the estimated properties [29]. Unlike MCG-based 3D shape measurement, which generally targets large-scale scenes and tolerates such errors, MC-DIC is sensitive to them because it typically targets small-scale dynamic deformations [28,29]. At present, optimal camera grouping in MC-DIC remains a challenge, and few methods have been proposed to address it; most MC-DIC methods use cameras pre-paired through experience-guided manual configuration, which increases the complexity and effort of system setup [28,30].
Recently, 3D models have been integrated into MC-DIC to represent object surfaces in 3D space for isogeometric analysis [44], using formats such as polygon meshes and non-uniform rational B-splines (NURBS) [45]. The rich prior spatial knowledge from 3D models enables precise identification of camera visibility and facilitates accurate DDM by grouping visible cameras [46]. This integration has been successfully applied to efficient 3D displacement measurement of aeronautical composite structures [47], with measurement accuracy validated by high consistency with laser scans [48]. However, these methods adopt a simplistic strategy that groups all visible cameras for a given measurement point, which often includes cameras that yield poor measurements [49]. Such over-grouping can introduce ill-measured deformations caused by factors such as object–background discontinuity, self-occlusion, and reflective highlights. Overcoming this issue can significantly improve robustness against cluttered backgrounds, complex object geometries, and variations in environmental lighting.
To address this gap, we propose a novel MC-DIC method with pointwise-optimized model-based stereo pairing (MPMC-DIC), which comprises model-based MC-DIC (MMC-DIC; an extended version of our previous model-based 3D-DIC [25]) and a pointwise-optimized model-based stereo pairing strategy (PMSP). By automatically evaluating multiple cameras and selecting the optimal camera pair for each measurement point on the 3D model, based on evaluation factors derived from the 3D model and the calibrated cameras, MPMC-DIC overcomes the over-grouping problem and achieves high-precision, wide-range 3D DDM of semi-rigid objects. Our main contributions are summarized as follows:
- (1) A novel camera pair evaluation metric is proposed for pointwise-optimized model-based stereo pairing in 3D DDM tasks. Because each camera is evaluated individually prior to pair evaluation, the metric can also be used to assess individual cameras for 2D-DIC.
- (2) An MC-DIC method with pointwise-optimized model-based stereo pairing is proposed. To the best of our knowledge, this is the first work dedicated to camera pairing in MC-DIC, and it enhances robustness against cluttered backgrounds and complex object geometries.
- (3) Experiments were conducted to validate the proposed MPMC-DIC method for 3D DDM, demonstrating micrometer-level accuracy and strong robustness against cluttered backgrounds and complex object geometries.
The paper is organized as follows:
Section 2 describes our MPMC-DIC method for 3D displacement estimation based on a pre-measured 3D model.
Section 3 validates the micrometer-level accuracy of MPMC-DIC in measuring a centimeter-sized cylinder, as well as its robustness against cluttered backgrounds, in comparison with the existing method that groups all visible cameras.
Section 4 demonstrates the robustness of MPMC-DIC against complex geometries and its ability to precisely measure the vibrations of objects at audio frequencies by visualizing detailed vibrational characteristics.
2. Method
To achieve effective and efficient wide-range 3D DDM, this study proposes a novel MC-DIC method with pointwise-optimized model-based stereo pairing. Specifically, the proposed method introduces five evaluation factors to overcome the over-grouping problem that typically arises when only visibility is considered. The five evaluation factors derived from the 3D model and multiple cameras, along with their respective functions, are as follows:
Visibility, which determines the availability of cameras for measurement.
Subset validity rate, which reflects the fraction of the subset covered by the measurement object. A lower coverage ratio leads to a greater influence from the background.
Subset gradient, which reflects the depth inclination of the measurement object relative to the camera within the subset region, especially depth discontinuities caused by self-occlusion.
Subset ZNCC similarity (hereinafter referred to as subset similarity), which reflects the matching confidence of correlated subsets in pre- and post-deformation images.
Disparity, which reflects the angle between a pair of cameras relative to a measurement point. A small disparity often leads to high noise sensitivity, which consequently enlarges the error; a zero disparity disables 3D estimation.
Given multi-camera image sequences of a semi-rigid object and an associated reference 3D model, the proposed MPMC-DIC, illustrated in Figure 1, comprises MMC-DIC and PMSP. Following the pipeline in [25] and extending it with multiple cameras and visibility determination, MMC-DIC involves four steps: (a1) camera calibration to ensure precise measurement; (a2) projection and visibility determination to identify the spatial relationship between measurement points and cameras; (a3) 2D-DIC to obtain 2D displacements; and (a4) 3D displacement estimation based on the camera pairs selected by PMSP. Note that MMC-DIC can also be applied without PMSP by pairing cameras manually. Leveraging the five evaluation factors, PMSP performs automatic and reliable (b1) individual camera evaluation and (b2) camera pair evaluation and selection, thereby ensuring robust and precise wide-range 3D DDM.
MPMC-DIC uses multi-view reference images and K multi-view measurement images captured by multiple cameras. The method assumes that the target object behaves as a semi-rigid body, with a predefined 3D model Ω serving as the reference framework. The 3D model contains N measurement points with associated normal vectors. Below, we outline the detailed algorithms of MMC-DIC in Section 2.1 and PMSP in Section 2.2.
2.1. Model-Based MC-DIC (MMC-DIC)
The MMC-DIC algorithm, comprising steps (a1) to (a4), is detailed below, as shown in Figure 1.
- (a1) Camera Calibration
Camera calibration involves capturing multiple images of known patterns, including the 3D shape of the measurement object. For each camera, the intrinsic parameter matrix and the distortion parameter vector are determined.
- (a2) Projection and Visibility Determination
The pose of each camera relative to the 3D model is pre-determined as its extrinsic parameter matrix and vector, obtained by a registration function that registers the 3D model to the camera's reference image.
The N measurement points on the 3D model are perspectively projected onto the 2D image plane of each camera. For each camera, the projected points are computed from the intrinsic, distortion, and extrinsic parameters by a perspective projection function.
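The projection step can be sketched in Python with NumPy. This is a minimal sketch, not the authors' implementation: it assumes the common OpenCV-style pinhole model with two radial (k1, k2) and two tangential (p1, p2) distortion coefficients, and extrinsics that map model coordinates to camera coordinates as X_cam = R X + t. The name `project_points` is hypothetical.

```python
import numpy as np


def project_points(X, K, dist, R, t):
    """Project (N, 3) model points to (N, 2) pixel coordinates.

    Assumed model (OpenCV-style): X_cam = R @ X + t and distortion
    dist = (k1, k2, p1, p2). Also returns the camera-frame depth z.
    """
    X = np.asarray(X, dtype=float)
    Xc = X @ np.asarray(R, dtype=float).T + np.asarray(t, dtype=float)
    z = Xc[:, 2]
    x, y = Xc[:, 0] / z, Xc[:, 1] / z            # normalized image coordinates
    k1, k2, p1, p2 = dist
    r2 = x**2 + y**2
    radial = 1.0 + k1 * r2 + k2 * r2**2           # radial distortion factor
    xd = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x**2)
    yd = y * radial + p1 * (r2 + 2 * y**2) + 2 * p2 * x * y
    u = K[0, 0] * xd + K[0, 1] * yd + K[0, 2]     # K[0, 1] is the (usually zero) skew
    v = K[1, 1] * yd + K[1, 2]
    return np.stack([u, v], axis=1), z
```

OpenCV's `cv2.projectPoints` performs an equivalent projection; the explicit version above also exposes the camera-frame depth, which is convenient for the visibility tests.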
The visibility of each camera to each measurement point, which strictly determines whether the camera is available for camera pairing, is evaluated against the following conditions:
The point’s 2D projection lies outside the camera image area;
The point’s normal vector faces away from the camera;
The point is occluded by other surfaces of the 3D model.
If any of these conditions holds, the camera is marked as invisible to the point; otherwise, it is marked as visible. The c-th camera is considered for measuring the i-th measurement point, including its evaluation in PMSP, only when it is visible to that point.
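The three visibility conditions can be checked per camera as in the hypothetical sketch below. It assumes that the projected pixel coordinates and camera-frame depths are already available and that the occlusion test uses a z-buffer, i.e., a depth image rendered from the 3D model that stores camera-frame z values (0 where the model is absent). The tolerance `tol` (in depth units) is an assumed parameter, not a value from the paper.

```python
import numpy as np


def visibility(points, normals, uv, z, cam_center, depth_img, tol=0.5):
    """Boolean visibility of N measurement points for one camera (hypothetical helper)."""
    h, w = depth_img.shape
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    # (1) the projection must fall inside the image and in front of the camera
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (z > 0)
    # (2) the normal must face the camera; back-facing points are invisible
    facing = np.einsum("ij,ij->i", normals, cam_center - points) > 0
    vis = inside & facing
    # (3) occlusion: the rendered (nearest) depth at the pixel must match the point depth
    idx = np.flatnonzero(vis)
    vis[idx] = np.abs(z[idx] - depth_img[v[idx], u[idx]]) <= tol
    return vis
```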
- (a3) 2D-DIC
In each camera, a subset centered at the i-th projected measurement point is set for 2D-DIC. The 2D displacements of the i-th projected measurement point at time t are computed for each camera with a 2D-DIC function that correlates the subset region of the reference image with the measurement image.
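The experiments rely on the 2D-DIC method in OpenCorr, which performs sub-pixel registration. The sketch below is a much simpler, integer-pixel stand-in that only illustrates the idea: it exhaustively searches a window for the displacement that maximizes the zero-mean normalized cross-correlation (ZNCC) of the subset, and it returns that ZNCC value, which corresponds to the subset similarity used later in PMSP. The function names, subset half-width, and search radius are hypothetical.

```python
import numpy as np


def zncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized subsets."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def match_subset(ref, cur, center, half=10, search=5):
    """Integer-pixel search for the subset centred at center = (x, y) in ref.

    Returns ((dx, dy), best_zncc): the displacement in cur that maximizes ZNCC
    and the corresponding similarity.
    """
    x, y = center
    tpl = ref[y - half:y + half + 1, x - half:x + half + 1].astype(float)
    best, best_d = -np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy - half < 0 or xx - half < 0:
                continue                      # window leaves the image (top/left)
            win = cur[yy - half:yy + half + 1, xx - half:xx + half + 1].astype(float)
            if win.shape != tpl.shape:
                continue                      # window leaves the image (bottom/right)
            score = zncc(tpl, win)
            if score > best:
                best, best_d = score, (dx, dy)
    return best_d, best
```

A practical 2D-DIC solver refines this integer estimate to sub-pixel precision with a shape function and intensity interpolation.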
- (a4) Three-Dimensional Displacement Estimation
At time t, the N projected measurement points are displaced in the 2D images by their measured 2D displacements. The camera pairs for the N measurement points are determined by PMSP, as outlined in Section 2.2. The 3D positions of the N measurement points at time t are then estimated with a triangulation function from their displaced 2D positions in the selected camera pairs. Finally, the relative 3D displacement is computed as the difference between the 3D coordinates of the measurement points at time t and those in the reference state.
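Triangulation from a selected camera pair can be sketched with the standard linear (DLT) method. This is an assumption-based illustration: the projection matrices are taken as P = K[R | t], the 2D positions are assumed to be already undistorted, and the reference 3D position is assumed to be obtained by triangulating the reference projections with the same pair (the method may instead use the 3D model coordinates directly). The function names are hypothetical.

```python
import numpy as np


def triangulate(P_l, P_r, x_l, x_r):
    """Linear (DLT) triangulation of one point from two 3x4 projection matrices."""
    A = np.stack([
        x_l[0] * P_l[2] - P_l[0],
        x_l[1] * P_l[2] - P_l[1],
        x_r[0] * P_r[2] - P_r[0],
        x_r[1] * P_r[2] - P_r[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                      # null vector = homogeneous 3D point
    return X[:3] / X[3]


def displacement_3d(P_l, P_r, x_l0, x_r0, d_l, d_r):
    """Triangulate reference and displaced positions (reference + 2D-DIC displacement)."""
    X0 = triangulate(P_l, P_r, x_l0, x_r0)
    Xt = triangulate(P_l, P_r, np.add(x_l0, d_l), np.add(x_r0, d_r))
    return Xt - X0
```

With noise-free synthetic projections, the recovered displacement matches the true one up to rounding; with real data, a nonlinear refinement that minimizes reprojection error is often applied after the linear solution.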
2.2. Pointwise-Optimized Model-Based Stereo Pairing (PMSP)
The PMSP algorithm, comprising steps (b1) and (b2), is detailed below, as shown in Figure 1.
- (b1) Individual Camera Evaluation
A depth image is rendered for each camera from the 3D model using the intrinsic, distortion, and extrinsic parameters. Each pixel of a depth image records the depth of the 3D model point that projects onto it, or 0 if the pixel is not covered by the 3D model. A mask image defines the coverage region of the 3D model rendering, i.e., the pixels with non-zero depth. The subset validity rate is computed as the 3D model rendering coverage ratio in the subset, namely the number of mask pixels in the subset region divided by the total number of pixels in the subset region.
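The subset validity rate reduces to counting covered pixels. A minimal sketch, assuming a rendered depth image that stores 0 for pixels not covered by the model and a square subset lying fully inside the image; `subset_validity_rate` is a hypothetical name.

```python
import numpy as np


def subset_validity_rate(depth, center, half):
    """Fraction of subset pixels covered by the rendered 3D model.

    depth: rendered depth image (0 where the model does not cover the pixel);
    center: (x, y) projected measurement point; half: subset half-width.
    """
    mask = depth > 0                                   # model coverage mask
    x, y = center
    sub = mask[y - half:y + half + 1, x - half:x + half + 1]
    return sub.sum() / sub.size                        # covered / all subset pixels
```

For a subset centered on a silhouette edge, roughly half of the pixels are covered, which is one way to read the observation that validity rates rarely fall far below about 0.5.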
The inclination of the 3D model region projected into each subset is quantified as the subset gradient. It is computed from the maximum and minimum depths found within a 3 × 3 neighborhood in the depth image, ignoring pixels not covered by the 3D model rendering.
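The aggregation of the 3 × 3 depth ranges over a subset is not reproduced in this text, so the sketch below assumes that the subset gradient is the largest local depth range (maximum minus minimum in a 3 × 3 neighborhood) within the subset, which responds strongly to self-occlusion edges. Uncovered pixels (depth 0) are excluded by filling them with ±inf before taking maxima and minima. The names and the aggregation rule are assumptions.

```python
import numpy as np


def local_depth_range(depth):
    """Per-pixel (max - min) depth over a 3x3 neighbourhood, ignoring uncovered (0) pixels."""
    covered = depth > 0
    d = depth.astype(float)
    hi = np.pad(np.where(covered, d, -np.inf), 1, constant_values=-np.inf)
    lo = np.pad(np.where(covered, d, np.inf), 1, constant_values=np.inf)
    h, w = d.shape
    shifts = [(i, j) for i in range(3) for j in range(3)]
    win_max = np.max([hi[i:i + h, j:j + w] for i, j in shifts], axis=0)
    win_min = np.min([lo[i:i + h, j:j + w] for i, j in shifts], axis=0)
    return np.where(covered, win_max - win_min, 0.0)


def subset_gradient(depth, center, half):
    """Assumed aggregation: largest local depth range inside the square subset."""
    x, y = center
    rng = local_depth_range(depth)
    return float(rng[y - half:y + half + 1, x - half:x + half + 1].max())
```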
The subset similarities of the i-th projected measurement point at time t are computed for each camera via 2D-DIC. The 2D-DIC function here denotes the same processing as in Equation (3); that is, subset similarities are computed together with the 2D displacements.
Camera evaluation scores at time t are computed for each measurement point by applying an evaluation function to each subset factor. The evaluation function for the subset validity rate has one parameter (Figure 2a). Although the subset validity rate is theoretically defined within the range [0, 1], it typically takes values well above 0, because each subset is centered on a projected point of the 3D model and is therefore largely covered by the model rendering. Accordingly, its evaluation function is defined as a piecewise function that increases linearly and monotonically from 0 to 1 over the interval from the parameter value to 1 and is zero otherwise, which enhances distinguishability while maintaining a linear relationship. In practice, the parameter is selected close to the lower bound of the computed subset validity rate distribution, which typically lies around 0.5.
The evaluation function for the subset gradient has two parameters (Figure 2b). To assign high scores to relatively small subset gradients G and low scores to relatively large ones, while keeping a monotonically decreasing trend and avoiding an abrupt change (e.g., a step function), the subset gradient evaluation function is defined as a flipped and shifted logistic function. The logistic function is an S-shaped function commonly used in machine learning that increases smoothly and monotonically within the range (0, 1) [50]. The first parameter governs the maximum evaluation score at G = 0; to provide high and distinguishable scores for small subset gradients, it is chosen within an empirically acceptable range. The second parameter, a ratio, defines the point at which the function decreases to 0.5 and thus controls the evaluation of large subset gradients. In practice, it is selected so that this half-score point lies around half of the upper bound of the computed subset gradient distribution.
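One way to realize a flipped and shifted logistic function with the two roles described for the subset gradient evaluation is sketched below. The parameterization is hypothetical: `a` is the score at G = 0 (with 0.5 < a < 1), `g_half` is the gradient at which the score falls to 0.5, and the steepness k = ln(a / (1 - a)) / g_half follows from these two constraints.

```python
import math


def gradient_score(G, a, g_half):
    """Flipped, shifted logistic: equals a at G = 0 and 0.5 at G = g_half,
    decreasing smoothly toward 0 for large subset gradients."""
    k = math.log(a / (1.0 - a)) / g_half        # steepness fixed by the two constraints
    z = min(k * (G - g_half), 700.0)            # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(z))
```

Tying the steepness to the two anchor values keeps each parameter interpretable on its own: `a` sets the best achievable score and `g_half` sets where the penalty becomes dominant.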
The evaluation function for subset similarity has one parameter (Figure 2c). Like the subset validity rate evaluation function, it is defined as a piecewise function that increases linearly and monotonically from 0 to 1 over the interval from the parameter value to 1 and is zero otherwise; this enhances distinguishability, because most subset similarity values tend to be concentrated near 1 in practical applications. In practice, the parameter is selected close to the lower bound of the computed subset similarity distribution.
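The validity-rate and similarity evaluation functions share the same piecewise-linear shape, so one helper covers both. How the factor scores are combined into a camera evaluation score is not reproduced in this text; the sketch assumes a simple product with the binary visibility, so that any failing factor vetoes the camera. The function names and this combination rule are assumptions, and the gradient score is passed in already evaluated.

```python
import numpy as np


def piecewise_linear_score(x, tau):
    """0 below tau, rising linearly to 1 at x = 1 (assumed form, tau < 1)."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= tau, (x - tau) / (1.0 - tau), 0.0)


def camera_score(visible, validity, grad_score, similarity, tau_v, tau_s):
    """Hypothetical camera score: visibility (0/1) times the three factor scores."""
    return (np.asarray(visible, dtype=float)
            * piecewise_linear_score(validity, tau_v)
            * np.asarray(grad_score, dtype=float)
            * piecewise_linear_score(similarity, tau_s))
```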
- (b2) Camera Pair Evaluation and Selection
The disparity between the l-th and r-th cameras for the i-th measurement point is computed from the extrinsic parameters as the angle between the direction vectors of the i-th measurement point relative to the two cameras.
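The disparity can be computed as the angle between the two viewing directions at the point. A sketch, assuming extrinsics of the form X_cam = R X + t (so that the camera center is C = -R^T t) and reporting the angle in degrees; the helper names are hypothetical.

```python
import numpy as np


def camera_center(R, t):
    """Camera centre in model coordinates for X_cam = R @ X + t."""
    return -np.asarray(R, dtype=float).T @ np.asarray(t, dtype=float)


def disparity_deg(point, center_l, center_r):
    """Angle (degrees) between the viewing directions of two cameras at a point."""
    d_l = np.asarray(point, dtype=float) - center_l   # direction from left camera to point
    d_r = np.asarray(point, dtype=float) - center_r   # direction from right camera to point
    cos = d_l @ d_r / (np.linalg.norm(d_l) * np.linalg.norm(d_r))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
```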
For a pair consisting of the l-th and r-th cameras, the camera pair evaluation score is computed using an evaluation function for disparity, which has one parameter (Figure 2d).
For a pair of cameras, a large disparity effectively mitigates the influence of noise, whereas a small disparity often enlarges the error and a zero disparity makes 3D estimation impossible. To reduce the evaluation score of camera pairs with small disparity and guarantee a non-zero disparity, the disparity evaluation function is defined as a scaled and shifted logistic function. Through scaling and shifting, this logistic function equals 0 at D = 0 and approaches 1 as D increases. Compared with a linear function, it grows more steeply near D = 0, after which its growth rate progressively decreases, consistent with the fact that the sensitivity of the measurement to noise rises more rapidly as the disparity approaches 0. The parameter controls the maximum rate of increase in the initial phase: a smaller value encourages a stronger preference for larger disparities, whereas a larger value encourages a weaker preference. The parameter is selected according to the desired level of encouragement under the given physical conditions; to ensure effectively high and low evaluation scores for camera pairs with large and small disparities, respectively, it is chosen within an empirically acceptable range.
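A scaled and shifted logistic function with the stated properties is f_D(D) = 2 / (1 + exp(-kD)) - 1, which is 0 at D = 0, approaches 1 as D grows, and has its maximum slope k/2 at D = 0. This exact form and the name `k` are assumptions for illustration; a smaller k makes the score rise more slowly and therefore favors larger disparities, matching the behavior described for the disparity evaluation.

```python
import math


def disparity_score(D, k):
    """Assumed scaled/shifted logistic: 0 at D = 0, rising toward 1, max slope k/2 at D = 0.

    D is the disparity (e.g., in degrees) and k is a hypothetical rate parameter.
    """
    return 2.0 / (1.0 + math.exp(-k * D)) - 1.0
```

This function is algebraically identical to tanh(kD / 2), which makes the saturation toward 1 easy to reason about.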
For each measurement point, the camera pair with the highest evaluation score is selected to estimate the 3D displacement.
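Pair selection for one measurement point is an exhaustive search over all camera pairs. The sketch assumes, hypothetically, that the pair score is the product of the two individual camera scores and the disparity score; the combination rule is not reproduced in this text. A point with no positive-scoring pair returns `None`, meaning it cannot be triangulated.

```python
from itertools import combinations

import numpy as np


def select_pair(cam_scores, disp_scores):
    """Pick the camera pair with the highest (assumed) score S_l * S_r * f_D(D_lr).

    cam_scores: (C,) individual camera scores (0 for invisible cameras);
    disp_scores: (C, C) symmetric matrix of disparity evaluation scores.
    """
    best_pair, best = None, 0.0
    for l, r in combinations(range(len(cam_scores)), 2):
        score = cam_scores[l] * cam_scores[r] * disp_scores[l, r]
        if score > best:
            best_pair, best = (l, r), score
    return best_pair, best
```

With C cameras, only C(C - 1)/2 pairs exist (28 for eight cameras), so the exhaustive search per point is cheap.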
4. Vibration Measurement on PC Speaker
To assess the robustness of MPMC-DIC against complex object geometries in wide-range 3D DDM, we applied it to measure the panoramic vibrations of a 9 × 15 × 16 cm PC speaker featuring a recessed membrane, as shown in Figure 10a, and visually compared the results with those of the visibility-only method. The speaker (Figure 10b) was painted with a random pattern and affixed with 7 mm circular markers as references. Its 3D model (Figure 10c,d), containing 32,055 measurement points, was reconstructed with the ATOS Compact Scan. The same cameras, PCs, and calibration and registration methods described in Section 3 were used for image capture, recording, and processing. Eight cameras surrounding the speaker at a distance of 99 cm captured 1920 × 1080-pixel reference and measurement images with a 2.3 ms exposure; measurement images were captured at 400 fps for 0.5 s while the PC speaker played 50 Hz audio.
Figure 11 shows the reference images, in which severe self-occlusion of the membrane can be observed in the second and fifth camera images. The measurement image series were used to measure vibration displacements with MPMC-DIC and the visibility-only method under configurations similar to those of the accuracy verification: in both methods, 2D displacements were measured with the 2D-DIC method in OpenCorr using 129 × 129-pixel subsets, and for MPMC-DIC, camera pairs were evaluated and selected by PMSP with the evaluation parameters listed in Table 7. Because the speaker remained in place throughout image capture, PMSP was applied only to the first frame in this experiment, and the same camera pairs were used for all subsequent frames. The eight-point moving average was subtracted from the measured vibration displacements to suppress artifacts caused by camera self-motion. The peak-to-peak value of the displacement in each direction is calculated as the difference between its maximum and minimum values, and the largest of the three directional values is defined as the peak-to-peak vibration value.
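The detrending and peak-to-peak steps can be sketched as follows. The sketch assumes a moving average computed only where the full eight-sample window fits ("valid" convolution) and aligned with a sample near each window center; the alignment and end handling are not specified in the text, and the function names are hypothetical. At 400 fps, one 50 Hz period spans exactly 400 / 50 = 8 frames, so an eight-point moving average has spectral nulls at 50, 100, and 150 Hz: subtracting it removes slow drift while leaving those components intact.

```python
import numpy as np


def remove_moving_average(x, window=8):
    """Subtract an N-point moving average from a series (ends trimmed)."""
    x = np.asarray(x, dtype=float)
    ma = np.convolve(x, np.ones(window) / window, mode="valid")
    start = (window - 1) // 2       # for even windows the centre lies between two samples
    return x[start:start + len(ma)] - ma


def peak_to_peak(disp_xyz):
    """disp_xyz: (T, 3) displacements; returns the largest per-axis (max - min)."""
    disp_xyz = np.asarray(disp_xyz, dtype=float)
    return float(np.max(disp_xyz.max(axis=0) - disp_xyz.min(axis=0)))
```

For a 50 Hz sinusoid plus linear drift, the residual after subtraction is the sinusoid plus a small constant offset, so the peak-to-peak value is unaffected by the drift.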
Figure 12 visualizes the PMSP results. Most measurement points on the front, back, and lateral surfaces were paired with adjacent cameras facing them, because these cameras were visible and achieved high evaluation scores for the subset validity rate and subset gradient. The exceptions were a few measurement points on the lateral surfaces near the membrane region, which selected camera pair {1, 3} or {4, 6}: the second or fifth camera received a low evaluation score owing to severe self-occlusion in the subset region, while the other cameras lacked visibility. For measurement points on the top surface, all cameras were visible with similar subset conditions, allowing a reliable camera pair with larger disparity to be selected; for example, some points selected camera pair {1, 6} or {6, 8}. For measurement points in the membrane region, the third and fourth cameras were paired, whereas the first, sixth, seventh, and eighth cameras were excluded for lack of visibility. Although the second and fifth cameras were also visible to some of these measurement points, they were excluded from pairing because of low camera pair evaluation scores resulting from severe self-occlusion, as illustrated in Figure 11.
Based on PMSP, MPMC-DIC achieved accurate 3D vibration measurements, as shown in Figure 13a, revealing a circular vibration distribution centered on the membrane: the vibration amplitude peaks at the center and gradually diminishes toward the edges. In contrast, the visibility-only method (Figure 13b) exhibits substantial deviations due to over-grouping of all visible cameras, as ill-measured deformations from the second and fifth cameras lead to significant errors. For the speaker housing, Figure 14 presents the periodic vibration measured by MPMC-DIC. Circular vibration distributions excited by the 50 Hz audio are observed on the left and right lateral surfaces. Owing to measurement biases among cameras, minor spatial discontinuities in the measured deformations appear along the boundaries where the selected camera pair changes, such as the transition region between the top and left lateral surfaces at 7.5 ms. Future work may mitigate these discontinuities by grouping more cameras and adopting weighted triangulation.
Figure 15 presents the 3D vibrations of three points located on the speaker membrane. Consistent with the distribution shown in Figure 13a, these points exhibit decreasing peak-to-peak vibration values of 1.252, 1.032, and 0.598 mm, respectively. Their z-direction frequency amplitude spectra reveal a prominent peak at 50 Hz, matching the frequency of the played audio, as well as harmonic peaks at 100 and 150 Hz.
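For the measurement settings above (400 fps for 0.5 s, i.e., 200 frames), the frequency resolution is 400 / 200 = 2 Hz and the Nyquist limit is 200 Hz, so 50, 100, and 150 Hz fall exactly on bins 25, 50, and 75 of an unwindowed FFT. The sketch below computes a single-sided amplitude spectrum with NumPy's `rfft`; it is an illustrative assumption, since the windowing and scaling used for the published spectra are not stated. If the series is shortened, for example by moving-average detrending, the bins no longer align exactly and some spectral leakage appears.

```python
import numpy as np


def amplitude_spectrum(x, fs):
    """Single-sided amplitude spectrum of a real series sampled at fs (Hz)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    amp = np.abs(np.fft.rfft(x - x.mean())) * 2.0 / n   # sinusoid of amplitude A -> A
    amp[0] /= 2.0                                         # DC bin is not doubled
    if n % 2 == 0:
        amp[-1] /= 2.0                                    # Nyquist bin is not doubled
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, amp
```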
Figure 16 shows the 3D vibrations of eight points on the speaker housing, with peak-to-peak vibration values of 0.009, 0.011, 0.012, 0.018, 0.013, 0.013, 0.012, and 0.009 mm, respectively. Although the vibration amplitudes of these points are much smaller than those on the membrane, their frequency responses exhibit a similar pattern, with maximum amplitudes at 50 Hz and harmonic peaks at 100 and 150 Hz. For all 11 points, the frequency amplitude spectra show that the amplitude at 150 Hz is greater, slightly or significantly, than that at 100 Hz. This indicates the presence of a mechanical resonance in the PC speaker structure at 150 Hz in addition to the harmonic components.
These results demonstrate that our proposed MPMC-DIC enables accurate wide-range 3D DDM by selecting camera pairs with high evaluation scores, effectively handling the self-occlusions caused by complex geometries and minimizing the deviations observed in the visibility-only method, thereby confirming its robustness.