Article

A Stereo Synchronization Method for Consumer-Grade Video Cameras to Measure Multi-Target 3D Displacement Using Image Processing in Shake Table Experiments

Department of Civil Engineering, National Taipei University of Technology, No. 1, Sec. 3, Zhongxiao E. Rd., Daan Dist., Taipei 10608, Taiwan
*
Author to whom correspondence should be addressed.
Sensors 2025, 25(17), 5535; https://doi.org/10.3390/s25175535
Submission received: 3 August 2025 / Revised: 29 August 2025 / Accepted: 2 September 2025 / Published: 5 September 2025
(This article belongs to the Section Sensing and Imaging)

Abstract

The use of consumer-grade cameras for stereo vision provides a cost-effective, non-contact method for measuring three-dimensional displacement in civil engineering experiments. However, obtaining accurate 3D coordinates requires precise temporal alignment of several unsynchronized cameras, which is often lacking in consumer-grade devices. Current software synchronization methods usually achieve only frame-level precision. As a result, they fall short for high-frequency shake table experiments, where even minor timing differences can cause significant triangulation errors. To address this issue, we propose a novel image-based synchronization method and graphical user interface (GUI)-based software for analyzing stereo videos acquired during shake table testing. The proposed method estimates the time lag between unsynchronized videos by minimizing reprojection errors. Then, the estimate is refined to sub-frame accuracy using polynomial interpolation. This method was validated using a high-precision motion capture system (Mocap) as a benchmark through large- and small-scale experiments. The proposed method reduces the RMSE of triangulation by up to 78.79% and achieves maximum displacement errors of less than 1 mm for small-scale experiments. The proposed approach reduces the RMSE of displacement measurements by 94.21% and 62.86% for small- and large-scale experiments, respectively. The results demonstrate the effectiveness of the proposed method for precise 3D displacement measurement with low-cost equipment. This method offers a practical alternative to expensive vision-based measurement systems commonly used in structural dynamics research.

1. Introduction

The precise measurement of structural response during dynamic loading events represents a fundamental challenge in earthquake engineering, where precise quantification of displacement, deformation, and rotational behavior is essential for understanding structural performance and developing robust design methodologies [1]. Although traditional contact-based measurement systems offer high accuracy [2], they impose significant limitations in terms of practical applicability and cost in dynamic testing environments [3,4]. The mounting complexity of conventional sensors necessitates substantial preparation time and expertise [5]; their limited dynamic range may fail to capture the full spectrum of structural response during extreme loading events [6]. Furthermore, these systems are vulnerable to damage during high-intensity dynamic tests, such as shake table experiments, where sensors are possibly subjected to severe structural deformations and accelerations. The emergence of vision-based measurement techniques, enabled by advances in computational capacity and camera technology, has introduced transformative possibilities for non-contact structural monitoring and experimental characterization [7,8]. These methodologies leverage computer vision and image processing algorithms to capture three-dimensional displacement, deformation, vibration, and strain fields of structures and components under dynamic loading conditions. Recent developments in intelligent structural health monitoring have further demonstrated the potential of integrating computer vision with edge computing technologies to enable real-time monitoring capabilities [9,10]. Unlike traditional measurement approaches, vision-based systems eliminate physical contact with test specimens, thereby avoiding sensor damage while enabling comprehensive spatial measurement capabilities that extend beyond the unidirectional constraints of conventional devices. The advantages of vision-based measurement become particularly pronounced when investigating freely moving structural and non-structural components, such as suspended ceiling systems that may experience large displacements, overturning, bending, collapse, and complex three-dimensional rotations during seismic excitation [11]. Such scenarios present insurmountable challenges for traditional measurement systems, highlighting the critical need for advanced non-contact measurement capabilities in contemporary structural engineering research.
Vision-based measurement has promising theoretical advantages [12], but its real-world application in dynamic structural testing is still maturing, and its potential in shake table experiments remains relatively underexplored. Ngeljaratan and Moustafa [13] observed a 5–7% overestimation of peak response values using image-based measurement compared to linear variable differential transformer (LVDT) measurements in a dynamic test of a four-story steel frame building. This finding indicates an inherent challenge in achieving measurement accuracy levels comparable to established contact-based techniques. Wani et al. [14] employed template matching algorithms in the analysis of the response of a five-story damped steel-frame building under shake table ground acceleration, with maximum displacement measurement deviations of 6.142 mm from the reference. While these results substantiate feasibility, they indicate that improvements in algorithmic approaches and measurement processes are needed. In contrast, Sieffert et al. [15] reported more positive results in their investigation of a single-story timber-frame building filled with earth and stone under seismic load, employing Digital Image Correlation (DIC) software called “Tracker” to achieve displacement measurement differences of less than 5% compared to results obtained from LVDT measurements. Such divergent results indicate the strong influence that test conditions, target properties, and measurement parameters exert on system performance.
Stereo vision-based 3D measurement technologies typically comprise four essential processes: camera calibration, target tracking, temporal synchronization, and stereo triangulation. To achieve precise geometric reconstruction, camera calibration corrects for lens distortion and estimates the intrinsic and extrinsic camera parameters. Target tracking involves the temporal detection and localization of measurement points within image sequences, while temporal synchronization guarantees synchronized data acquisition from two or more cameras. Stereo triangulation computes 3D coordinates through the intersection of corresponding image points. While these processes constitute the foundation of stereo vision, recent advances in computer vision have shown that monocular approaches can also recover 3D measurements [2,6,16], gaining attention for their simplified hardware requirements and cost-effectiveness. However, monocular methods remain limited in out-of-plane displacement accuracy due to restricted depth perception. In contrast, stereo vision offers better geometric accuracy via triangulation but requires a more complex setup.
Among the four stereo processes, temporal synchronization remains particularly challenging for consumer-grade equipment. While recent computer vision advances have achieved subpixel spatial accuracy for single-camera vibration measurement [17,18], these approaches address spatial precision within individual video streams rather than temporal synchronization between multiple cameras. These methods operate on single-camera data and do not address the sub-frame temporal alignment required for accurate stereo triangulation in multi-target displacement measurement. The current study addresses this distinct problem: achieving sub-frame temporal synchronization between consumer-grade cameras for multi-target 3D displacement measurement in structural testing applications.
The measurement accuracy of a dynamic vision-based measurement system depends on a complex interaction of numerous factors that must be fine-tuned with caution for specific experimental conditions. The critical parameters are the selection of the tracking algorithm [19], geometry and material properties of markers [20], resolution and optical characteristics of cameras [21], sampling rate of video, illumination, lens quality, calibration procedures [22], target speed [23], surface characteristics of the specimen, environmental factors [24,25], camera stability [26], working distance, and angle of view [27,28]. Recent studies have also emphasized the need for external multimodal sensor calibration and fusion techniques to enhance the precision and reliability of measurements in advanced experimental configurations [29]. The coupled nature of these variables presents a multidimensional optimization issue that necessitates rigorous experimental design and validation.
Multi-camera synchronization is the most critical technical challenge limiting the widespread adoption of vision-based measurement in dynamic structural testing. Precise camera synchronization is crucial for accurate 3D position determination, as even minor timing errors can significantly impact stereo triangulation [30,31]. Synchronization is crucial for reliable multi-camera multi-object tracking [32], especially when measuring rapid structural responses or complex motions. The measurement of torsional response is a particularly demanding application where precise synchronization is indispensable. Vision-based systems offer unique advantages for torsional measurement due to their ability to simultaneously survey multiple points and measure complete three-dimensional motion fields. However, synchronization errors lead to phase mismatches between the rotational data and other system parameters, which magnify or reduce the observed torsional effects, resulting in incorrect structural behavior interpretation and erroneous conclusions regarding structural stability and response characteristics under introduced loading.
Existing synchronization techniques offer varying levels of effectiveness, each with inherent limitations that restrict their application in stringent experimental environments. Manual synchronization procedures, such as the time stamp correction techniques applied by Lavezzi et al. [33], are likely to produce noisy displacement values due to the inherent inaccuracy of temporal coordination. Hardware synchronization systems [27,34] are capable of higher accuracy but must be fitted with expensive data acquisition systems that limit practical field application and contribute to overall system cost and complexity. Other approaches include using optical synchronization markers, such as synchronized light sources visible to all cameras [35]. However, these techniques do not typically provide the sub-frame-level accuracy required for high-frequency dynamic measurements. Software synchronization methods offer more universally available solutions with different performance levels. Cross-correlation methods [22] work well for estimating camera timing differences but are limited to integer-frame resolution, which might be too coarse for high-speed dynamic applications. More sophisticated methods, such as wavelet transform methods [36] and multistage synchronization protocols [37], perform better. The latter achieves a synchronization accuracy of 250 microseconds through network time protocol (NTP)-based time alignment followed by stream phase synchronization; however, it requires specialized hardware capabilities that may not be supported by all consumer cameras. Recent advances in computer vision and multi-camera systems have underscored the importance of high-fidelity synchronization in diverse real-world applications. To derive temporal correlations between cameras, sub-frame-level synchronizations have been established using time-stamped video methods with specific markers and uniformly translating targets [38]. These developments signify the increasing recognition that synchronization accuracy has direct implications for the accuracy of vision-based measurement systems in various fields.
The imperative requirement of high-precision synchronization in the measurement of structural dynamics and the limitations of available methods constitute a key research gap that restricts the broader application of vision-based measurement approaches to earthquake engineering applications. Current synchronization techniques are inaccessible to the broader research community because they either lack the accuracy needed for dynamic structural measurements, necessitate costly specialized hardware, or rely on device capabilities that are not consistently present in consumer-level cameras. This study addresses these fundamental limitations by developing a novel synchronization methodology with sub-frame accuracy specifically designed for structural engineering applications. This study presents a comprehensive solution that integrates precise temporal alignment with a graphical user interface-based software system for measuring three-dimensional displacements of multiple targets in shake table experiments using readily available consumer-level cameras. The strategy described here aims to bridge the gap between the measurement accuracy requirements and practical constraints to facilitate the broader application of vision-based measurement methodologies in structural dynamics research. A thorough and reliable experimental validation procedure is employed to assess the accuracy and dependability of the proposed synchronization method and related image-based measurement capabilities. This study represents a significant step toward making sophisticated vision-based measurements accessible to broader structural engineering applications, from laboratory-based shake table testing to field-based structural health monitoring and post-earthquake damage reconnaissance.

2. Materials and Methods

This section describes the methodological workflow and experimental program used in this study. The study encompassed two distinct phases. The first phase involved physical experiments, in which videos of moving targets were recorded. The second phase comprised a non-physical experiment centered on image analysis using the developed software.

2.1. Experimental Program

This study employed four distinct experimental setups to evaluate the accuracy of the proposed synchronization method and the developed application software in determining the 3D coordinates of multiple moving targets under various conditions. The types of experimental setups used are listed as follows.
(a) 
A small-scale frame structure (Figure 1) with dimensions 26 cm wide, 35 cm long, and 36 cm high, fixed at its base, was used to test the proposed stereo synchronization method’s accuracy. A grid of 20 tracking markers was fixed and arranged in a 4-by-5 configuration on the ceiling of the specimen. Figure 1 shows an image of the structure ceiling with tracking markers on its surface. An excitation force was applied by pushing the structure’s top edge in the horizontal direction. It vibrated freely until it stopped, and a video was recorded using two consumer-grade action cameras (SJCAM) situated beneath the structure’s ceiling with a recording video quality of 4K and a sampling rate of 60 frames per second (fps).
(b) 
A shake table with a fixed harmonic displacement amplitude of 3.00 mm was used along with an industrial motion capture system (Mocap) [39] and a dial gauge to validate the displacement measurement accuracy of the image-based measurement system. Previous studies have investigated the accuracy of Mocap using different approaches, and they have proved that its measurement error is less than a millimeter [40,41]. The Mocap system used for validation has six Optiflex 13 cameras (NaturalPoint, Inc., Corvallis, OR, USA) positioned 2 m from the targets. Two SJCAM action cameras were used to record the experimental videos with a video quality of 4K and a frame rate of 60 fps. Figure 2a shows a schematic of the experimental setup used.
(c) 
A truss bridge was constructed using a plastic frame and connected to a shake table via an aluminum bar, as shown in Figure 2b. To verify the accuracy of the displacement measurements obtained from the proposed method, we utilized a Mocap system. This experiment differs from the others because the tracking points were located on a secondary object (the bridge), and the specimen was not rigid. This secondary object had a unique vibration pattern, distinct from the primary vibration source (shake table), which exhibited harmonic motion at various frequencies. This helped us to validate the capability of the proposed method in capturing higher-order vibration on the bridge.
(d) 
The proposed synchronization strategy was further validated through shake table tests conducted on a five-story full-scale test structure (see Figure 3). The El Centro earthquake ground acceleration was used as the excitation force in the experiment for both the X and Y directions. The dimensions of the structure were 5 m by 5 m, with a total height of 13 m. This experimental work took place at the National Center for Research on Earthquake Engineering in Tainan, Taiwan. Cameras were positioned approximately 12 m from the specimen, and displacement was measured on all floors. To record the experimental videos, two SJCAM action cameras with 4K video quality and a frame rate of 60 fps were used. The points at which displacement was measured on each floor are indicated in Figure 3b.

2.2. Accuracy Metrics

To rigorously assess the accuracy of our vision-based displacement measurement method, we compared its outputs, Yi, against the Mocap measurements, Xi, over N time steps using three complementary error metrics. Note that N is the number of time steps after synchronization.
1. Maximum absolute error (MAE) quantifies the worst-case deviation, given by Equation (1).

$$MAE = \max_{i = 1, \dots, N} \left| Y_i - X_i \right| \tag{1}$$

2. Mean error (ME) measures the average magnitude of errors, given by Equation (2).

$$ME = \frac{1}{N} \sum_{i=1}^{N} \left| Y_i - X_i \right| \tag{2}$$

3. Root mean square error (RMSE) emphasizes larger errors by squaring residuals before averaging. RMSE is particularly sensitive to outliers and provides error magnitude in the same units as the measurement. The RMSE equation is given by Equation (3).

$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( Y_i - X_i \right)^2} \tag{3}$$
To compare the measurement results obtained from Mocap and the proposed method, we employed a Python-based signal synchronization algorithm to align the displacement signals. This algorithm leverages cross-correlation techniques to achieve precise synchronization.
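For illustration, the error metrics of Equations (1)–(3) and a cross-correlation alignment step can be sketched in a few lines of NumPy. The function names are ours, and the snippet assumes integer-sample lags, so it is an illustrative sketch rather than the exact implementation used in this study.

```python
import numpy as np

def error_metrics(x_ref, y_meas):
    """MAE, ME, and RMSE of Equations (1)-(3) for two aligned signals of length N."""
    x_ref = np.asarray(x_ref, dtype=float)
    y_meas = np.asarray(y_meas, dtype=float)
    abs_err = np.abs(y_meas - x_ref)
    mae = abs_err.max()                                 # maximum absolute error, Eq. (1)
    me = abs_err.mean()                                 # mean error, Eq. (2)
    rmse = np.sqrt(np.mean((y_meas - x_ref) ** 2))      # root mean square error, Eq. (3)
    return mae, me, rmse

def integer_lag_by_cross_correlation(reference, signal):
    """Integer-sample lag that best aligns 'signal' to 'reference'; a positive value
    means 'signal' is delayed relative to 'reference'."""
    r = np.asarray(reference, dtype=float) - np.mean(reference)
    s = np.asarray(signal, dtype=float) - np.mean(signal)
    corr = np.correlate(s, r, mode="full")
    return int(np.argmax(corr)) - (len(r) - 1)
```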

2.3. Image-Based Measurement Mathematical Models and Procedures

This study employs a stereo vision system to capture the 3D coordinates of target points in images [42], generating time series data for each target across the measurement specimen. The time history of these targets enables the estimation of displacement and vibration modes. Deformation modes, including rigid body displacement, torsion, shear, bending, and crack patterns, can be derived from the 3D coordinates of multiple points over time [22]. In a stereo vision system, transferring a point from three-dimensional space to two-dimensional images and camera coordinates necessitates geometric transformations [26]. To perform these transformations, three coordinate systems must be established as world coordinates Pw = (Xw, Yw, Zw), camera coordinates Pc = (xc, yc, zc), and image coordinates Pi = (xi, yi), as shown in Figure 4.
The coordinate transformation from the world coordinate system to the camera coordinate system can be expressed in terms of a 4 × 4 matrix, as indicated in Equation (4). This matrix is used to account for rotation, translation, affine, and perspective transformations [43]. The 3 × 3 matrix located in the upper left corner of the 4 × 4 matrix is the rotation matrix, and the vector T = (Tx, Ty, Tz)T represents the translation of the world coordinate (Pw) to the origin of the camera coordinate system (xc, yc, zc). Mathematically, the relation between the camera coordinate and the world coordinate can be defined using Equation (4) [43,44].
$$\begin{bmatrix} x_c \\ y_c \\ z_c \\ 1 \end{bmatrix} = \begin{bmatrix} R_{xx} & R_{yx} & R_{zx} & T_x \\ R_{xy} & R_{yy} & R_{zy} & T_y \\ R_{xz} & R_{yz} & R_{zz} & T_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} \tag{4}$$
The geometric coordinate transformation from the camera coordinate Pc = (xc, yc, zc) to the image coordinate Pi = (xi, yi) can be represented using Equation (5). In this type of transformation, the computation is simplest when the dimension of the coordinates in the projective space is equivalent to that of the projected space; therefore, the coordinates on the camera screen are represented by Pc = (xc, yc, zc). The ratios between the coordinate values of a point in three-dimensional space and those of its projection onto the two-dimensional screen remain consistent [45]. Dividing the camera coordinates (xc, yc, zc) by the depth zc, so that the third component becomes unity, yields the normalized coordinates (xn, yn, 1)T [22]. The image coordinates, after the effect of lens distortion is accounted for, are given by Equation (5).
$$\begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_d \\ y_d \\ 1 \end{bmatrix} \tag{5}$$
where fx, fy, cx, and cy are the intrinsic camera parameters that define the characteristics of the camera; the 3 × 3 matrix containing these values is known as the camera matrix; and (xd, yd, 1)T is the vector of distorted normalized coordinates, which accounts for the effect of lens distortion. The relationship between the distorted coordinates and the normalized coordinates can be defined using Equation (6) [22,45].
$$\begin{bmatrix} x_d \\ y_d \end{bmatrix} = \begin{bmatrix} k + 2 p_1 y_n + 3 p_2 x_n & p_2 y_n \\ p_1 x_n & k + 2 p_2 x_n + 3 p_1 y_n \end{bmatrix} \begin{bmatrix} x_n \\ y_n \end{bmatrix} \tag{6}$$
where
$$k = \frac{1 + k_1 (x_n^2 + y_n^2) + k_2 (x_n^2 + y_n^2)^2 + k_3 (x_n^2 + y_n^2)^3}{1 + k_4 (x_n^2 + y_n^2) + k_5 (x_n^2 + y_n^2)^2 + k_6 (x_n^2 + y_n^2)^3} \tag{7}$$
where k1, k2, k3, k4, k5, and k6 are radial distortion coefficients, and p1 and p2 are tangential distortion coefficients.
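As a worked illustration of Equations (4)–(7), the following sketch projects a single world point into distorted image coordinates. The function name and the coefficient ordering (which follows OpenCV's convention) are our assumptions; this is a minimal sketch, not the exact implementation in the developed software.

```python
import numpy as np

def project_world_point(Pw, R, T, K, dist):
    """Project a 3D world point to image coordinates (a sketch of Equations (4)-(7)).
    R, T: extrinsic rotation (3x3) and translation (3,); K: camera matrix (3x3);
    dist: (k1, k2, p1, p2, k3, k4, k5, k6) in OpenCV ordering."""
    k1, k2, p1, p2, k3, k4, k5, k6 = dist
    Pc = R @ np.asarray(Pw, float) + np.asarray(T, float)    # world -> camera, Eq. (4)
    xn, yn = Pc[0] / Pc[2], Pc[1] / Pc[2]                    # normalized coordinates
    r2 = xn * xn + yn * yn
    k = (1 + k1 * r2 + k2 * r2**2 + k3 * r2**3) / \
        (1 + k4 * r2 + k5 * r2**2 + k6 * r2**3)              # rational radial factor, Eq. (7)
    xd = k * xn + 2 * p1 * xn * yn + p2 * (r2 + 2 * xn**2)   # distorted coordinates, Eq. (6)
    yd = k * yn + p1 * (r2 + 2 * yn**2) + 2 * p2 * xn * yn
    xi = K[0, 0] * xd + K[0, 2]                              # pixel coordinates, Eq. (5)
    yi = K[1, 1] * yd + K[1, 2]
    return np.array([xi, yi])
```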
The image-based measurement was conducted by employing the following four steps: (1) camera calibration, (2) tracking, (3) synchronization, and (4) stereo triangulation.

2.3.1. Camera Calibration

Images captured by a camera often exhibit distortion due to the lens’s perspective features. Camera calibration corrects this distortion and estimates the camera parameters needed to obtain a rectified image. There are two types of camera parameters to consider: intrinsic and extrinsic parameters. Intrinsic parameters define how the camera captures images, whereas extrinsic parameters define the camera’s location within the 3D environment. Chessboards are most commonly used to calibrate a camera [46,47,48] because calibration algorithms can easily identify their corners. However, chessboards are not reliable for large-area experiments because large, rigid calibration chessboards are not practical. Instead, different types of markers with known world coordinates, selected from a single image frame of a video, were used as input for our calibration software, which was developed using the OpenCV computer vision library. Templates belonging to the respective targets were selected using the calibration algorithm to obtain the image coordinates. Each camera was calibrated separately. The intrinsic and extrinsic camera parameters were calculated using our program based on the relations defined in Equations (4)–(7). These equations are based on the formulations presented in [22,45].
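A minimal OpenCV-based sketch of this marker-based, single-view calibration idea is shown below. It assumes the marker correspondences (world and image coordinates) have already been extracted; the function name, the initial camera matrix guess, and the use of cv2.calibrateCamera with CALIB_USE_INTRINSIC_GUESS and CALIB_RATIONAL_MODEL are illustrative choices, not necessarily those of the developed software.

```python
import numpy as np
import cv2

def calibrate_from_markers(world_pts, image_pts, image_size):
    """Estimate intrinsic and extrinsic parameters from one video frame containing
    markers with known world coordinates (illustrative sketch, not the exact code).
    world_pts: (N, 3) array; image_pts: (N, 2) array; image_size: (width, height)."""
    w, h = image_size
    K0 = np.array([[w, 0, w / 2],
                   [0, w, h / 2],
                   [0, 0, 1.0]])                      # rough initial camera matrix
    flags = (cv2.CALIB_USE_INTRINSIC_GUESS |          # needed when using a single view
             cv2.CALIB_RATIONAL_MODEL)                # enables k4-k6 of Equation (7)
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        [world_pts.astype(np.float32)], [image_pts.astype(np.float32)],
        (w, h), K0, None, flags=flags)
    R, _ = cv2.Rodrigues(rvecs[0])                    # rotation matrix of Equation (4)
    return K, dist, R, tvecs[0], rms                  # tvecs[0] is T of Equation (4)
```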

2.3.2. Tracking

Target tracking is the process of detecting and following a specific target’s movement within a sequence of images in computer vision-based measurement [49]. Having tracking points within a small area, commonly referred to as a template, is crucial in identifying the target using tracking algorithms. This template should exhibit visual characteristics that differ from those of the parent image’s surrounding elements. Templates can be created by applying paint, attaching markers, or using natural features of the measurement area, such as holes, bolts, or joints. Several types of templates were employed for the tracking task, including black-and-white high-contrast markers, holes, backgrounds with distinguishable patterns, and stickers. Figure 5 depicts some of the markers used in this study. We employed an enhanced correlation coefficient (ECC) tracking method to track the targets. ECC is a computer vision approach that tracks objects or features across a sequence of images. The ECC algorithm aligns images by optimizing zero-mean normalized cross-correlation [4], which enhances its robustness against global variations in brightness and contrast, such as differences in exposure. However, the algorithm’s accuracy diminishes in the presence of significant non-uniform illumination, including shadows or localized lighting variations. In this study, small-scale experiments were conducted under normal lighting, while full-scale experiments were conducted in a noisy lighting environment.
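A simplified sketch of one ECC tracking step using OpenCV's findTransformECC (version 4.1 or later, here with a pure-translation motion model) is given below; the function name and parameter choices are illustrative assumptions rather than the exact implementation. The returned warp maps template coordinates into frame coordinates, so its translation terms give the target's pixel displacement relative to the reference template.

```python
import numpy as np
import cv2

def track_template_ecc(template_gray, frame_gray, warp_init=None):
    """Locate a grayscale template in a frame with the ECC algorithm, assuming a
    pure-translation motion model (a simplified sketch of the tracking step)."""
    if warp_init is None:
        warp_init = np.eye(2, 3, dtype=np.float32)    # identity translation warp
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 200, 1e-6)
    cc, warp = cv2.findTransformECC(template_gray, frame_gray, warp_init,
                                    cv2.MOTION_TRANSLATION, criteria, None, 5)
    # warp[0, 2], warp[1, 2] are the x and y translation; cc is the correlation score.
    return warp[0, 2], warp[1, 2], cc
```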

2.3.3. Camera Synchronization

Camera synchronization is crucial for obtaining accurate three-dimensional coordinates of target objects in multi-camera vision systems. Even small timing differences between camera captures can cause major errors in triangulated 3D coordinates, which impacts measurement accuracy. The experimental cameras used in this study do not support trigger synchronization; therefore, software-based synchronization of video streams is necessary for reliable vision-based measurements. In this study, a synchronization approach was implemented to determine the optimal time lag between camera feeds by minimizing triangulation reprojection errors. The proposed algorithm identifies the temporal offset that produces the minimum mean squared error (MSE) of the triangulation reprojection errors across all detected points. This idea is formalized in Equations (8)–(11). The synchronization process operates in two phases. Initially, the algorithm searches for the coarse time lag that minimizes the overall reprojection error. Subsequently, it refines this estimate to achieve sub-frame precision using polynomial interpolation techniques. While similar methodologies have been proposed in previous research [50], this approach differs significantly in its time lag compensation implementation. Rather than interpolating between unsynchronized 3D coordinate points, the proposed method performs interpolation directly on image points. This modification enhances triangulation accuracy by maintaining the geometric relationships inherent in the imaging process. The assumption is that the triangulation reprojection error (e) is a function of the time lag (tlag), expressed by a quadratic equation, as given in Equation (8). If (e1, e2, e3) are the triangulation MSEs of all points at time lags of (t1, t2, t3), respectively, the optimum time lag (topt) is found by differentiating Equation (8) and setting the derivative to zero, as shown in Equation (9). Evaluating Equation (8) at the three sampled lags yields the matrix form given in Equation (10).
$$e = a t^2 + b t + c \tag{8}$$

$$\frac{de}{dt} = 2 a t + b = 0 \quad \Rightarrow \quad t_{opt} = -\frac{b}{2a} \tag{9}$$

The coefficients (a) and (b) can be obtained by solving Equation (10).

$$\begin{bmatrix} e_1 \\ e_2 \\ e_3 \end{bmatrix} = \begin{bmatrix} t_1^2 & t_1 & 1 \\ t_2^2 & t_2 & 1 \\ t_3^2 & t_3 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix} \tag{10}$$
where t1 = t2 − 1 and t3 = t2 + 1, with t2 representing the optimal frame-level time lag that exhibits the minimum triangulation projection error across the entire temporal range analyzed. This configuration creates a symmetric sampling window centered on the coarse optimum (t2), which provides the necessary data points for quadratic interpolation to find the best-fitting curve, as illustrated in Figure 6.
The quadratic assumption in Equation (9) follows from the mathematical structure of reprojection error within small temporal neighborhoods. For smooth trajectories, marker positions vary linearly with a temporal offset within the ±1 frame window. Since reprojection error is the square of the Euclidean distance between observed and predicted coordinates, these linear position deviations produce quadratic error scaling. This relationship is invariant across marker sizes, motion frequencies, and trajectory patterns because it derives from squaring a linear function, requiring only local motion linearity over single-frame intervals. Based on this theoretical foundation, we consistently used second-order polynomial interpolation for sub-frame refinement. The algorithm automatically selects the interpolation window, utilizing a ±1 frame window centered on the integer-frame minimum obtained from coarse synchronization. This 3-point window provides the minimum sampling for quadratic fitting while maintaining local validity around each point’s optimal lag. This ensures a consistent application across all tracking points without the need for manual parameter tuning.
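A compact sketch of this two-phase search, a coarse integer-frame minimum followed by the quadratic sub-frame refinement of Equations (8)–(10), could look as follows; the function and variable names are ours, not the exact implementation in the software.

```python
import numpy as np

def subframe_time_lag(candidate_lags, reprojection_mse):
    """Coarse-to-fine time lag estimation (Equations (8)-(10)).
    candidate_lags: integer-frame lags tested; reprojection_mse: MSE for each lag."""
    e = np.asarray(reprojection_mse, dtype=float)
    i = int(np.argmin(e))                       # coarse (frame-level) optimum, t2
    i = min(max(i, 1), len(e) - 2)              # keep the +/-1 frame window in range
    t = np.asarray(candidate_lags[i - 1:i + 2], dtype=float)   # (t1, t2, t3)
    # Fit e = a t^2 + b t + c through the three samples (Equation (10)) ...
    A = np.column_stack([t**2, t, np.ones(3)])
    a, b, _ = np.linalg.solve(A, e[i - 1:i + 2])
    # ... and take the vertex of the parabola, t_opt = -b / (2a) (Equation (9)).
    return -b / (2.0 * a)
```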
The total video length recorded by camera-1 is denoted as (t1total), and that recorded by camera-2 as (t2total). Because the cameras are operated manually, there will always be a time lag (tlag) between the cameras. According to Figure 7, camera-2 starts recording later than camera-1; hence, the video time of camera-2 (tv2) is the sum of the time lag (tlag) and the video time of camera-1 (tv1), and can be expressed using Equation (11).
$$t_{v2} = t_{v1} + t_{lag} \tag{11}$$
where $t_{v1} = a + t_{interest}$, $t_{v2} = a + t_{lag} + t_{interest}$, and $t_{lag} = t2_0 - t1_0$.
It is not always appropriate to use the very first frame as the starting point for the synchronization analysis. The range of frames needed for analysis ($t_{interest}$) can be defined according to the user’s criteria. $F1_0$ and $F2_0$ are the frames located $a$ frames after the very first frames ($t1_0$ and $t2_0$) of camera-1 and camera-2, respectively. $F1_0$/$F2_0$ and $F1_n$/$F2_n$ are the first and last frames required for the synchronization analysis in camera-1 and camera-2, respectively. In other words, $t_{interest}$ is the time range, in frames, used for the analysis, as shown in Figure 7.
In this study, all videos are 60 fps recordings. At higher frame rates such as 240 fps, synchronization accuracy is expected to improve due to finer temporal resolution. The ±1 frame window corresponds to smaller time intervals (16.7 milliseconds at 60 fps versus 4.2 milliseconds at 240 fps), reducing discretization errors. However, shorter exposure times in higher frame rates may introduce increased noise, potentially affecting tracking accuracy. The quadratic polynomial order is expected to remain optimal since the fundamental geometric relationship is unchanged.
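Once a (possibly point-specific) sub-frame lag is known, the second camera's tracked image points can be resampled at fractional frame indices before triangulation. The sketch below assumes simple linear interpolation between neighbouring frames; the interpolation scheme and function name are our illustrative assumptions.

```python
import numpy as np

def image_point_at(track_xy, frame_time):
    """Resample one target's tracked image coordinates at a fractional frame index.
    track_xy: (N, 2) array of per-frame (x, y) pixel coordinates for the target;
    frame_time: fractional frame index, e.g. tv1 + t_lag as in Equation (11)."""
    track_xy = np.asarray(track_xy, dtype=float)
    i = int(np.floor(frame_time))
    i = min(max(i, 0), len(track_xy) - 2)       # clamp to a valid frame interval
    w = frame_time - i                          # sub-frame weight in [0, 1)
    return (1.0 - w) * track_xy[i] + w * track_xy[i + 1]
```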

2.3.4. Stereo Triangulation

Stereo triangulation is a fundamental practice in computer vision, where the spatial coordinates of an object in a scene are determined in three dimensions. This method works with the corresponding projections of an object in images taken from different perspectives. The approach employed here is binocular disparity, which computes depth information from the positional differences between corresponding image points, allowing for a 3D estimate [51]. However, in practice, the perfect convergence of sight lines is often disrupted by measurement noise, calibration errors, and computational precision constraints. Such non-convergence causes a triangulation error, defined as the minimum distance between the skew lines from each camera view. The resulting error compromises the accuracy of the reconstructed 3D coordinates and negatively impacts the precision of subsequent measurements. A two-stage triangulation approach is introduced to improve the reconstructed 3D points. Optimal triangulation is employed in the first stage to minimize the reprojection MSE using stored data from the synchronization stage. The second stage uses midpoint triangulation [50], which deduces 3D coordinates by locating the midpoint of the shortest line segment that connects the two non-intersecting sight lines. The midpoint technique efficiently reduces the triangulation error because it minimizes the sum of squared distances to both sight lines, making it particularly useful in time-critical applications.
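A minimal sketch of the midpoint step is given below; it assumes that the camera centres and unit ray directions in world coordinates are available from the calibration results, and the returned segment length corresponds to the triangulation error discussed above.

```python
import numpy as np

def midpoint_triangulation(c1, d1, c2, d2):
    """Midpoint triangulation of two (generally skew) sight lines.
    c1, c2: camera centres; d1, d2: unit ray directions in world coordinates.
    Returns the midpoint of the shortest connecting segment and its length
    (the triangulation error)."""
    c1, d1, c2, d2 = (np.asarray(v, dtype=float) for v in (c1, d1, c2, d2))
    w0 = c1 - c2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w0, d2 @ w0
    denom = a * c - b * b                      # ~0 only for (near-)parallel rays
    s = (b * e - c * d) / denom                # parameter on ray 1
    t = (a * e - b * d) / denom                # parameter on ray 2
    p1, p2 = c1 + s * d1, c2 + t * d2          # closest points on the two rays
    return 0.5 * (p1 + p2), float(np.linalg.norm(p1 - p2))
```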

2.4. Software Development

Developing a robust, well-organized software system is challenging for most civil engineers, who often lack formal software-engineering training and struggle with the complexities of modular design, version control, and testing procedures. Software that automates the calibration, tracking, synchronization, and triangulation workflows therefore makes measurement analysis far more efficient and accessible for structural engineers, especially when using consumer-grade cameras and IoT devices. We developed computer vision software in Python 3.12 for 3D displacement analysis. This software is specifically designed to analyze two videos of dynamic experiments that are not synchronized. The software captures the three-dimensional coordinates of multiple moving targets over a designated period. Tkinter is used to create a graphical user interface (GUI) for the program, as depicted in Figure 8. OpenCV is used as the main computer vision library, and NumPy and SciPy are used to execute mathematical operations. Users can access the folders containing their video files, calibration files, and templates. Users can define the range of tracking, synchronization, and triangulation points based on their needs. This setting allows users to select any frame required for tracking from the video file by simply inputting the frame numbers. The software allows users to save the analysis results in a CSV file and to visualize the time series displacements. The methodological flow chart of the 3D displacement measurement algorithm is shown in Figure 9. This software is freely available and can be downloaded from the GitHub repository at https://github.com/vin389/tkStereosync_v2 (accessed on 28 August 2025). The startup program is tkStereosync_v2.py.
Contact-based motion sensors, such as LVDTs and accelerometers, provide motion data in only a limited number of directions, whereas vision sensors that employ stereo cameras provide spatial coordinates in three dimensions. Therefore, this software could be utilized as an additional measurement method in the field of earthquake engineering. Using this software, it is possible to analyze motion in six degrees of freedom, encompassing translation, rocking, rotation, and torsion. Furthermore, the deflection and strain modes of a particular set of structural components due to seismic loading can be computed by studying the distribution of the three-dimensional positions of targets on the structure. This software can also be used in construction engineering testing facilities as a technique for non-destructive evaluation, allowing for the measurement of vibration, displacement, deflection, and even bending. In addition, it can be used to determine the deformation of a structure due to load or other environmental conditions, check the alignment of the structure, and track the growth of cracks to evaluate the structure’s condition and deterioration.

3. Results and Discussions

This section presents the experimental and theoretical results of the proposed vision-based displacement measurement and synchronization approach. The 3D positions of the points were measured for different scenarios and experimental setups.

3.1. Synchronization Method Performance Assessment Using Reprojection Errors

In the first experiment, the small-scale frame structure (described in Section 2.1 as experimental program (a)) was used to assess the accuracy of the synchronization method by interpreting the triangulation reprojection error results. One quantitative metric for assessing the quality of three-dimensional reconstruction is the triangulation reprojection error. A lower reprojection error indicates better alignment between the computed disparity map and the actual scene geometry, which shows that the stereo image pair and the three-dimensional points are precisely aligned. This metric serves as an indicator of the accuracy with which the stereo vision system recovers the spatial position of the observed target. Ideally, two cameras are synchronized if the RMSE of triangulation is zero; in practice, however, zero error is not attainable. Synchronization is critical in ensuring accurate measurements, particularly in dynamic systems where precise time alignment of data from multiple sources is required.
Table 1 presents a summary of the reprojection RMSE values computed using the conventional method (frame-level synchronization) and the proposed approach. The proposed method reduces the RMSE by 4.91% to 78.97%. For example, in Test 5, the method refined the baseline time lag from 99 to 99.28 frames, a 0.28-frame (4.67 ms) correction. This refinement significantly reduced the baseline RMSE from 0.92 to 0.193 pixels, a 78.97% reduction. Across all tests, the method consistently improved triangulation accuracy, demonstrating that enhanced temporal synchronization significantly reduces the triangulation reprojection error and improves the quality of stereo vision-based 3D displacement estimation. The effect of temporal refinement is more pronounced in dynamic high-frequency motion tests.
Conventional software-based synchronization methods apply a uniform time lag across all points within the triangulation process. However, in dynamic experimental scenarios, this approach results in significant reprojection errors due to the varying temporal characteristics of different scene elements. The proposed methodology addresses this limitation by implementing point-specific time lag correction, as shown in Figure 10, where each triangulated point is assigned an individual temporal offset based on its dynamic behavior. The results show that the reprojection error has a direct relationship with the distribution of time lags across target points. When the reprojection error is large, the variation in time lag between points is also large, and vice versa. Test 4 in Figure 10 shows a 2.39-frame (39.833 ms) spread in point-specific time lags, resulting in a high reprojection RMSE of 0.99 pixels. Conversely, Test 5’s minimal spread of 0.114 frames (1.902 ms) yields a low RMSE of 0.114 pixels, demonstrating that minimizing reprojection errors enhances temporal alignment accuracy.
Ideally, after synchronization, the time lag would be identical for all points. This ideal situation is only possible with global shutter cameras, which eradicate rolling shutter issues by capturing the entire frame simultaneously. Unlike rolling shutter sensors, which read out the frame line by line, global shutter sensors expose all pixels simultaneously, eliminating temporal skew; however, global shutter cameras are still comparatively costly and less available in the consumer market. Figure 10 illustrates the time lag for each point after synchronization; the remaining variation between points is due to the rolling shutter effect. The cameras used in this study are of the rolling shutter type, which captures an image line by line, similar to a scanner, causing the image to distort if either the camera or the scene is in motion. Most rolling shutter cameras scan images from top to bottom with a fixed line delay [52,53]. Applying a single constant time lag across all points would therefore exacerbate the rolling shutter error; by assigning a point-specific time lag, the proposed algorithm partially compensates for measurement errors induced by the rolling shutter effect.

3.2. Method Validation Based on Small-Scale Structure Displacement

The 3D time series displacements of the small-scale structure mentioned in Section 2.1 were computed using three synchronization methods: light-emitting diode (LED)-based, conventional (frame-level), and proposed (sub-frame-level) synchronization. The computed three-dimensional displacements were then compared with each other. The LED-based synchronization method estimates the temporal offset between stereo video sequences through visual inspection of the videos. The “Jump to Time” extension in the VLC media player was used for this purpose, enabling near-precise temporal alignment between video streams using an LED flashlight as a benchmark. The displacement trajectories shown in Figure 11a–c illustrate the results of the three synchronization methods for tests 1, 5, and 6, which exhibited the most significant reductions in reprojection RMSE. Consistent features are evident in both the synchronization phase alignment and the measurement precision across all three spatial dimensions. In tests 1 and 5, amplitude and phase shift errors indicate that even sub-frame time delays can cause significant displacement errors (see Figure 11d). Time delays of 0.202 frames (3.370 ms) and 0.278 frames (4.638 ms) resulted in peak displacement differences of approximately 0.4 to 0.6 mm in the X and Y axes (in-plane). This effect was more substantial in the Z-axis (out-of-plane), where the deviations were almost 1 mm. Figure 11b illustrates this effect. These results demonstrate that even the most minor synchronization errors can lead to significant phase shifts (misplacing the displacement trajectory in time) and amplitude changes, as shown in Figure 11d, where a 0.28-frame time lag caused an 8-frame phase shift and up to a 1 mm displacement error, ultimately deteriorating displacement tracking. This is crucial in scenarios that require exact measurements of structures, such as those involving torsional effects, where the accuracy of measurement depends on the accuracy of the measured rotation angles and positions.
Figure 11c shows the displacement results of test 6. These results highlight the significant effect of time synchronization on measurement accuracy. The LED-based method, with a 68-frame time lag, introduced abrupt noise into the displacement signals; the deviations reached up to 6 mm in both the X (in-plane) and Z (out-of-plane) directions. The conventional method, with a time lag of 69 frames, exhibited clear oscillations, although the noise amplitude was slightly lower than that of the LED-based method. While it appears more stable, it still does not provide a precise and reliable displacement signal due to residual time misalignment, as observed in previous studies using non-synchronized video sources [22]. In contrast, the proposed method, with a time lag of 69.27 frames, offers a smooth, continuous displacement profile. It closely follows the expected motion, with a clear curvature and much less noise. These results demonstrate that even minor synchronization errors can lead to significant phase shifts and amplitude distortions, ultimately misleading the interpretation of the dynamic behavior of structures.

3.3. Method Validation Based on Bridge Displacement Measurement

A small-scale bridge experiment described as experimental program (c) in Section 2.1 was employed to further validate the proposed method. Mocap is used as a baseline measurement to compare the methods. Table 2 summarizes the comparison of errors between the two methods. In the X-direction (in-plane), the proposed method achieved error reductions ranging from 6.74% to 94.25% across different frequencies, with the most notable performance at 0.5 Hz, showing MAE, ME, and RMSE reductions of up to 90.60%, 94.25%, and 94.21%, respectively. The Y-direction (out-of-plane, primary excitation direction) showed consistent error reductions, with MAE reduction ranging from 12.00% to 83.47%, ME reduction up to 89.45%, and RMSE reduction reaching 87.90%. The Z-direction (in-plane) exhibited more modest reductions, with MAE reductions ranging from 0.89% to 39.64%, ME reductions up to 43.03%, and RMSE reductions reaching 38.90%. However, some conditions in the Z-direction exhibited a slight deterioration in the ME and RMSE metrics. In the Z-direction, the expected displacement is negligible, which can be compromised by noise.

3.4. Method Validation Based on Full-Scale Structure Displacement Measurement

Full-scale experimental validation represents a critical step in assessing the practical applicability and real-world performance of vision-based displacement measurement techniques, as it introduces realistic challenges such as environmental conditions, structural complexity, and measurement distances that are absent in controlled laboratory settings. The results from the five-story structure experiment demonstrate substantial improvements in measurement accuracy across all three directional components (see Table 3) when evaluated using the MAE and RMSE metrics. In the X-direction (in-plane), the proposed method achieved consistent MAE reductions ranging from 6.94% to 59.60% across all five measurement points, with corresponding RMSE reductions ranging from 1.08% to 48.02%. Point 5 (roof level, see Figure 12) exhibited the most significant enhancement, with the MAE reducing from 98.39 to 39.75 mm, a reduction of nearly 60 mm in absolute error. The Y-direction (out-of-plane) measurements, representing the most challenging measurement scenario, showed substantial MAE reductions ranging from 5.78% to 55.80%, with RMSE reductions reaching up to 58.75%. Point 5 again demonstrated exceptional performance, with the MAE decreasing from 656.41 to 290.13 mm, an absolute error reduction of over 360 mm. In the Z-direction (vertical), the proposed method yielded MAE reductions from 4.40% to 64.89% and RMSE reductions of up to 62.86%, with Point 5 showing the most dramatic improvement, where the MAE decreased from 410.05 mm to 143.98 mm, an absolute error reduction exceeding 260 mm. The measurement data reveal a clear trend in which the maximum deformation of the structure increases with height, and the MAE increases from the lowest point to the highest point, indicating that the absolute error scales with the displacement magnitude. Although the absolute measurement errors remained substantial due to the challenging experimental conditions, including camera distance, lighting issues, the absence of distinguishable markers, and environmental noise, the proposed method consistently outperformed the conventional approach across all directional measurements. The most pronounced improvements were observed at the top of the structure, where substantial absolute error reductions were achieved in all three directions. Figure 12 illustrates how the proposed method (green line) improves displacement results by reducing temporal shifts and noise compared with the conventional method (pink line). This highlights the benefits of point-specific and sub-frame-level synchronization.
When monitoring large structures from a distance, consumer-grade 4K action cameras (3840 × 2160 pixels) with ultra-wide lenses (up to 150°) pose serious measurement challenges. Due to limited spatial sampling, each pixel represents a displacement of several millimeters, resulting in tiny tracking templates that are susceptible to noise, drift, and changes in appearance. These problems are made worse by the optical parameters, since ultra-wide lenses trade spatial resolution for coverage, and lens distortion introduces systematic errors that are more noticeable at the edges of images. Wide-angle distortion, limited photon capacity, and sensor noise reduce measurement fidelity, even though sub-pixel techniques can refine estimates to millimeter levels. The attainment of high-accuracy displacement measurements is ultimately hampered by these intrinsic limitations, as well as by errors in calibration, tracking, synchronization, and triangulation.

3.5. Accuracy Assessment Based on Shaking Table Displacement Measurement

Additional verification was performed using the Mocap system as the primary reference, supplemented by a dial gauge for further verification. The evaluation was conducted under four shake table frequencies: 0.1, 0.5, 1.0, and 2.0 Hz. The frequency and amplitude of the shake table motion were kept constant throughout the video recording period to ensure consistent test conditions. The camera used in the experiment features an image sensor that measures 7.81 mm diagonally and provides a maximum effective resolution of 12 megapixels (4000 by 3000), with a square pixel size of 1.55 µm by 1.55 µm. The focal length of the camera is 3.3 mm. Although the experiments were conducted using specific consumer-grade cameras (SJCAM), the proposed methodology is not hardware-specific and can be applied to a broad range of camera models, provided they meet basic requirements such as adequate resolution, frame rate, and calibration capability. The results demonstrate a high level of consistency and agreement between the proposed method and the reference instruments, indicating its accuracy and applicability for structural displacement measurements under controlled dynamic conditions.
The proposed method exhibited minimal deviation from the Mocap results in the X-direction (in-plane), with maximum differences of 0.05 and 0.038 mm at 0.1 and 1.0 Hz, and 0.038 mm at 0.5 and 2.0 Hz, respectively. These discrepancies fall well within submillimeter precision thresholds, which are essential in structural dynamics experiments. In the Y-direction (out-of-plane) of the primary axis of shake table excitation, the proposed method is closely aligned with both the Mocap and dial gauge measurements. The maximum deviation recorded was 0.15 mm, observed when compared to the dial gauge at 0.1 Hz, while the differences relative to Mocap remained between 0.05 and 0.10 mm. These small variations reflect the accuracy of the proposed method and highlight the inherent resolution limitations of mechanical instruments, such as dial gauges. All methods reported low values in the Z-direction (vertical), where minimal displacement was expected; however, at higher frequencies, deviations were slightly more pronounced. The largest observed difference between the proposed method and Mocap was 0.097 mm at 1.0 and 2.0 Hz, indicating that small out-of-plane vibrations become more challenging to capture at increased motion speeds. Nonetheless, even in these cases, the proposed method maintained submillimeter accuracy. Overall, the proposed method consistently tracked displacement with high fidelity across all spatial directions and excitation frequencies. Its robust performance across a range of dynamic conditions, achieving a maximum discrepancy of just 0.20 mm (in comparison with the dial gauge in the Y direction), demonstrates its capability to deliver measurement precision comparable to that of laboratory-grade instruments. This level of accuracy is achieved using consumer-grade imaging equipment, highlighting the method’s potential for cost-effective, high-precision structural displacement monitoring in both laboratory and field environments.
Figure 13 presents the displacement time series data for a selected point under a shaking table excitation of 1.0 Hz. The X-direction displacement (Figure 13a) exhibits high-frequency oscillatory behavior with relatively small amplitudes (±0.15 mm), characterized by complex vibrational patterns containing multiple frequency components. The proposed method achieves peak errors of 0.05 mm, demonstrating exceptional precision in capturing intricate lateral dynamics, including higher-order vibrational modes. The Y-direction (out-of-plane) response (Figure 13b) represents the primary motion axis, with large-amplitude sinusoidal displacement patterns (5 mm peak-to-peak) reflecting the principal excitation direction. Here, the proposed method maintains excellent tracking accuracy, with peak errors limited to approximately 0.15 mm, while preserving consistent phase alignment throughout multiple cycles despite the large displacement magnitudes. This indicates that the method effectively addresses the depth measurement issues that are typically challenging in traditional stereo vision methods. The Z-direction (vertical) measurements (Figure 13c) show intermediate-amplitude oscillations (±0.1 mm) with complex patterns; the peak errors reach approximately 0.10 mm. This three-dimensional analysis reveals that the proposed method successfully captures the complete dynamic response across all degrees of freedom, with measurement errors remaining within acceptable limits for structural monitoring applications. The method effectively tracks the complex multi-directional interactions that reflect the three-dimensional nature of structural dynamics, where the primary excitation couples with secondary responses in orthogonal directions.

3.6. Effect of Vibration Frequency on Measurement Accuracy

Figure 14 illustrates the impact of excitation frequency on measurement accuracy across all three spatial directions, utilizing Mocap as the ground-truth reference. The primary motion occurs in the Y-direction (out-of-plane), where the displacement is largest, and measurement errors exhibit strong frequency dependence. As the excitation frequency increases from 0.1 Hz to 2.0 Hz, the RMSE rises significantly, reaching a maximum of 0.626 mm at 2.0 Hz. This increase is attributed to the limitations of the stereo vision system’s temporal resolution, which is affected by the camera’s rolling shutter and becomes less effective at tracking fast displacement changes. In contrast, the X- and Z-directions (both in-plane) exhibit only minor vibrations due to the unidirectional motion of the shake table. Nonetheless, errors in these directions remain below 0.082 mm and 0.072 mm, respectively, indicating that the proposed synchronization method maintains submillimeter accuracy even for secondary motion components. Overall, the results demonstrate that while measurement accuracy decreases with increasing excitation frequency, especially in the primary motion axis (Y), the proposed system still achieves high precision across all tested frequencies.

3.7. Structural Higher-Mode Vibration Observation

Figure 15 shows that the proposed method accurately captures high-frequency structural vibrations with submillimeter displacements, revealing a complex, multiscale dynamic response with two distinct frequency components. The displacement time series exhibits large-amplitude oscillations (2.85 mm peak-to-peak) at the fundamental excitation frequency of 0.1 Hz, representing the forced response of the bridge structure to the shake table input. Superimposed on this response are high-frequency internal vibration components with an amplitude of approximately 0.1 mm occurring at an estimated frequency of 8.0 Hz. The magnified view demonstrates exceptional measurement accuracy, with the proposed method achieving an average difference of 0.0471 mm and a maximum difference of 0.108 mm when capturing these rapid displacement variations compared to the Mocap system. Although slight discrepancies are observed in the zoomed region, they are likely attributed to the behavior of the ECC tracking algorithm under rapid structural motion rather than measurement noise. Through iterative refinement within a parametric warp space, ECC is optimized for smooth, incremental image alignment. Although it performs well under a broad range of conditions, its responsiveness to sharp, high-frequency motion may be limited by the convergence dynamics of the optimization process [4]. Nonetheless, the successful resolution of 8.0 Hz vibrations demonstrates that the proposed synchronization method effectively overcomes the limitations of stereo vision in terms of temporal resolution. This enables accurate measurement across a wide frequency range (0.1–8.0 Hz) via point-specific time lag correction and shows excellent agreement with the Mocap system. The consistency observed throughout both low-frequency primary motion and high-frequency secondary vibrations confirms the reliability and practical applicability of the synchronization approach for real-world structural monitoring scenarios where multiple vibration modes may be simultaneously excited.
The proposed method offers a reliable approach for observing and analyzing how a system responds structurally under various excitation conditions. Figure 16 shows the time history of the structural vibration of the bridge system. This data was analyzed using the proposed method and compared with the motion capture system. The results indicate that the structural frequency of the bridge increases as the excitation frequency increases. This demonstrates that the proposed method can effectively track dynamic changes in structural behavior under various excitation conditions. Figure 16a shows that the structural vibration has a lower frequency when the shake table runs at 0.5 Hz. Figure 16b shows a higher structural vibration frequency when the shake table frequency is increased to 1 Hz. The strong correlation between the proposed method and the reference motion capture system at both frequency levels confirms the reliability of the stereo vision approach in capturing frequency-dependent structural responses. This ability is crucial for structural health monitoring, in which the dynamic characteristics of the system can change under different loading conditions. The proposed method successfully maintains measurement accuracy while monitoring these variations in real time.

4. Limitations

The precision of the proposed synchronization method is fundamentally constrained by the quality of point tracking. Motion blur degrades the high-frequency image details on which ECC correlation relies, producing erroneous displacements and degrading the MSE curves used for time lag estimation. Partial occlusion generates spurious mismatches that distort the alignment and triangulation error profiles. Both phenomena cause the polynomial interpolation to return faulty time lag estimates. Although the ECC tracker neither failed nor required human intervention in our experiments, marker size, video resolution, standoff distance, field of view, baseline distance, and environmental noise still affected tracking accuracy. These factors impose their own limits on displacement accuracy, which translate directly into synchronization accuracy.
The GUI software is limited to two cameras and requires the user to define the time lag search ranges; poorly chosen ranges can lead to incorrect synchronization and, consequently, compromised displacement results. The method assumes consistent inter-camera frame rates and relies on standard video codecs, limiting compatibility with proprietary recording formats. Tracking is processed sequentially, which results in high computational cost, with an approximate processing time of 1–2 min per 1000 frames depending on video resolution; the software does not support real-time processing. The calibration framework requires predefined parameter structures without cross-platform compatibility or automated quality assessment, potentially compromising triangulation accuracy when suboptimal calibration data are employed. Further details are provided in the software documentation.
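To make the dependence on calibration data concrete, the following minimal sketch (an illustration, not the GUI software's code) shows how calibration parameters enter triangulation and how a reprojection RMSE of the kind reported in Table 1 can be evaluated. It assumes OpenCV, hypothetical 3 × 4 projection matrices P1 and P2 obtained from a prior stereo calibration, and 2D point tracks that have already been time-aligned.

```python
# A minimal sketch, not the GUI software's code, of how calibration data enters
# triangulation with OpenCV. P1 and P2 are hypothetical 3x4 projection matrices
# from a prior stereo calibration; the 2D tracks are assumed already time-aligned.
import cv2
import numpy as np

def triangulate(P1, P2, pts_cam1, pts_cam2):
    """Triangulate matched, synchronized 2D points (N, 2) into 3D points (N, 3)."""
    pts1 = np.asarray(pts_cam1, dtype=float).T        # shape (2, N)
    pts2 = np.asarray(pts_cam2, dtype=float).T
    X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)   # homogeneous, shape (4, N)
    return (X_h[:3] / X_h[3]).T                       # Euclidean, shape (N, 3)

def reprojection_rmse(P, X, pts):
    """Root-mean-square reprojection distance (pixels) of 3D points X against observed pts."""
    X_h = np.hstack([X, np.ones((len(X), 1))])        # (N, 4) homogeneous points
    proj = P @ X_h.T                                  # (3, N) projected image points
    proj = (proj[:2] / proj[2]).T                     # (N, 2) pixel coordinates
    return float(np.sqrt(np.mean(np.sum((proj - np.asarray(pts)) ** 2, axis=1))))
```

Errors in the projection matrices propagate directly into both the triangulated coordinates and the reprojection error profile, which is why suboptimal calibration data degrades the final displacement accuracy.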

5. Conclusions

This study presents a novel point-specific synchronization method for stereo vision systems that significantly enhances the accuracy of three-dimensional (3D) displacement measurements in dynamic structural monitoring applications. The proposed approach addresses the fundamental limitations of conventional frame-level synchronization by applying an individual temporal offset correction to each reconstructed point through a two-phase optimization process: coarse frame-level synchronization followed by sub-frame refinement based on polynomial interpolation. This methodology accounts for the varying temporal characteristics of different scene elements under dynamic loading conditions while preserving the inherent geometric relationships of the imaging process. Furthermore, the proposed method mitigates the rolling shutter effect and ensures precise temporal synchronization for accurate motion analysis.
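The two-phase idea can be summarized in a short sketch. The code below is a simplified illustration rather than the authors' implementation: a hypothetical alignment_error() stands in for the triangulation reprojection-error objective described in the paper, the integer-frame search range is arbitrary, and the sub-frame refinement uses a quadratic (polynomial) fit around the coarse minimum.

```python
# A simplified sketch of the two-phase time lag estimation summarized above, not the
# authors' implementation. The hypothetical alignment_error() stands in for the
# triangulation reprojection-error objective; the search range is arbitrary.
import numpy as np

def alignment_error(track1, track2, lag, fps):
    """RMS mismatch between camera-1 samples and camera-2 samples shifted by `lag` frames
    (equal-length, equal-frame-rate 1D tracks assumed)."""
    t = np.arange(len(track1)) / fps
    shifted = np.interp(t, t + lag / fps, track2)   # resample track 2 at the shifted times
    return float(np.sqrt(np.mean((track1 - shifted) ** 2)))

def estimate_time_lag(track1, track2, fps, search=range(-120, 121)):
    """Phase 1: coarse integer-frame search. Phase 2: quadratic fit for sub-frame refinement."""
    lags = np.array(list(search), dtype=float)
    errs = np.array([alignment_error(track1, track2, k, fps) for k in lags])
    i = int(np.argmin(errs))
    if 0 < i < len(lags) - 1:
        # Fit a parabola through the minimum and its neighbours; its vertex is the sub-frame lag.
        a, b, _ = np.polyfit(lags[i - 1:i + 2], errs[i - 1:i + 2], 2)
        return -b / (2.0 * a)
    return lags[i]
```

In the point-specific formulation summarized above, such an estimate would be obtained separately for each tracked target, and the corresponding trajectory resampled at its refined lag before triangulation.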
The comprehensive experimental validation across four distinct test configurations, namely a small-scale frame structure, a shake table system, a bridge model, and a full-scale experimental structure, demonstrates substantial improvements in measurement accuracy. The evaluation employed various excitation frequencies (0.1–2.0 Hz) and diverse tracking templates to assess the method’s robustness under varying conditions. The proposed method achieved triangulation RMSE reductions ranging from 4.91% to 78.97% compared with conventional synchronization approaches. Validation against Mocap systems and dial gauge measurements confirms the method’s practical accuracy, with displacement RMSE reductions of up to 0.69 mm and 240 mm for the small- and full-scale experiments, respectively. Additionally, the proposed method successfully captures complex multiscale dynamic responses, including higher-mode structural vibrations up to 8.0 Hz with submillimeter precision.
In addition, we developed user-friendly GUI software for determining 3D displacement using stereo vision technology. Building a fully functional software system that integrates libraries for calibration, target selection, tracking, synchronization, triangulation, and visualization is challenging, particularly for civil engineers and researchers without strong programming skills. The developed tool fills this gap with an easy-to-use interface, allowing researchers to apply advanced computer vision techniques to 3D displacement measurement without in-depth programming knowledge and making advanced displacement analysis more accessible to the engineering community.
The practical implications of this research extend beyond laboratory validation to real-world structural health monitoring applications, where precise displacement tracking is critical for safety and performance evaluation. Although measurement accuracy was lower in the full-scale experiment, the proposed method still significantly reduced displacement errors. Its successful validation across a wide frequency range (0.1–8.0 Hz) using consumer-grade imaging equipment demonstrates the method’s cost-effectiveness while maintaining measurement precision comparable to laboratory-grade instruments. The method’s ability to suppress measurement noise while preserving temporal fidelity, combined with the accurate capture of both primary motion and high-frequency secondary vibrations, establishes its suitability for critical applications such as bridge monitoring, seismic response analysis, and vibration-based damage detection systems.
Although the current study demonstrates robust performance across diverse experimental configurations with comprehensive validation against industrial-grade reference systems (Mocap), future research should investigate the method’s performance under varying environmental conditions, including different lighting scenarios and weather conditions. The observed frequency-dependent error characteristics, particularly the influence of rolling shutter effects at higher frequencies, warrant further investigation to establish optimal camera specifications and frame rates for specific monitoring applications. The development of automated target detection and tracking algorithms that can handle various template types (high-contrast markers, natural features, and stickers) could further enhance the practical applicability of the proposed synchronization approach for large-scale infrastructure monitoring. The integration of machine learning techniques for dynamic time lag prediction based on scene characteristics, camera configuration, and motion patterns represents a promising avenue for advancing real-time adaptive synchronization in stereo vision-based displacement measurement systems.
This study’s findings contribute to the growing body of knowledge in computer vision-based structural monitoring by providing a robust solution for temporal synchronization challenges. The proposed methodology offers a significant step forward in achieving the measurement accuracy required for critical infrastructure monitoring applications, where precise displacement quantification is essential for informed decision-making regarding structural integrity and safety.

Author Contributions

Conceptualization, Y.-S.Y.; methodology, Y.-S.Y. and M.K.S.; software, Y.-S.Y.; validation, M.K.S.; formal analysis, M.K.S.; investigation, M.K.S.; resources, Y.-S.Y.; data curation, M.K.S.; writing—original draft preparation, M.K.S.; writing—review and editing, M.K.S.; visualization, M.K.S.; supervision, Y.-S.Y.; project administration, Y.-S.Y.; funding acquisition, Y.-S.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Science and Technology Council, Taiwan, grant numbers NSTC 113-2625-M-027-005 and NSTC 114-2625-M-027-004.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors upon request.

Acknowledgments

The authors acknowledge Fu-Pei Hsiao, National Center for Research on Earthquake Engineering, Taiwan, for providing measurement data and authorizing the authors to record videos of the five-story full-scale experiment. During the preparation of this manuscript/study, the author(s) used Claude Sonnet 4 for the purposes of editing text. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Lv, J.; He, P.; Hou, X.; Xiao, J.; Wen, L.; Lv, M. A target-free vision-based method for out-of-plane vibration measurement using projection speckle and camera self-calibration technology. Eng. Struct. 2024, 303, 117416. [Google Scholar] [CrossRef]
  2. Shao, Y.; Li, L.; Li, J.; Li, Q.; An, S.; Hao, H. 3D displacement measurement using a single-camera and mesh deformation neural network. Eng. Struct. 2024, 318, 118767. [Google Scholar] [CrossRef]
  3. Qu, C.-X.; Jiang, J.-Z.; Yi, T.-H.; Li, H.-N. Computer vision-based 3D coordinate acquisition of surface feature points of building structures. Eng. Struct. 2024, 300, 117212. [Google Scholar] [CrossRef]
  4. Evangelidis, G.D.; Psarakis, E.Z. Parametric image alignment using enhanced correlation coefficient maximization. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 1858–1865. [Google Scholar] [CrossRef]
  5. Lin, J.-F.; Xu, Y.-L.; Zhan, S. Experimental investigation on multi-objective multi-type sensor optimal placement for structural damage detection. Struct. Health Monit. 2019, 18, 882–901. [Google Scholar] [CrossRef]
  6. Ma, Z.; Choi, J.; Sohn, H. Three-dimensional structural displacement estimation by fusing monocular camera and accelerometer using adaptive multi-rate Kalman filter. Eng. Struct. 2023, 292, 116535. [Google Scholar] [CrossRef]
  7. Feng, D.; Feng, M.Q. Identification of structural stiffness and excitation forces in time domain using noncontact vision-based displacement measurement. J. Sound Vib. 2017, 406, 15–28. [Google Scholar] [CrossRef]
  8. Baqersad, J.; Poozesh, P.; Niezrecki, C.; Avitabile, P. Photogrammetry and optical methods in structural dynamics–a review. Mech. Syst. Signal Process. 2017, 86, 17–34. [Google Scholar] [CrossRef]
  9. Peng, Z.; Li, J.; Hao, H.; Zhong, Y. Smart structural health monitoring using computer vision and edge computing. Eng. Struct. 2024, 319, 118809. [Google Scholar] [CrossRef]
  10. Azizi, S.; Karami, K.; Mariani, S. A Vision-based procedure with subpixel resolution for motion estimation. Sensors 2025, 25, 3101. [Google Scholar] [CrossRef]
  11. Feng, D.; Feng, M.Q. Computer vision for SHM of civil infrastructure: From dynamic response measurement to damage detection–A review. Eng. Struct. 2018, 156, 105–117. [Google Scholar] [CrossRef]
  12. Tan, D.; Li, W.; Tao, Y.; Ji, B. Bridge deformation monitoring combining 3D laser scanning with multi-scale algorithms. Sensors 2025, 25, 3869. [Google Scholar] [CrossRef] [PubMed]
  13. Ngeljaratan, L.; Moustafa, M.A. Digital image correlation for dynamic shake table test measurements. In Proceedings of the 7th International Conference on Advances in Experimental Structural Engineering (7AESE), Pavia, Italy, 6–8 September 2017; pp. 6–8. [Google Scholar]
  14. Wani, Z.R.; Tantray, M.; Farsangi, E.N. In-plane measurements using a novel streamed digital image correlation for shake table test of steel structures controlled with MR dampers. Eng. Struct. 2022, 256, 113998. [Google Scholar] [CrossRef]
  15. Sieffert, Y.; Vieux-Champagne, F.; Grange, S.; Garnier, P.; Duccini, J.C.; Daudeville, L. Full-field measurement with a digital image correlation analysis of a shake table test on a timber-framed structure filled with stones and earth. Eng. Struct. 2016, 123, 451–472. [Google Scholar] [CrossRef]
  16. Shao, Y.; Li, L.; Li, J.; Li, Q.; An, S.; Hao, H. Monocular vision based 3D vibration displacement measurement for civil engineering structures. Eng. Struct. 2023, 293, 116661. [Google Scholar] [CrossRef]
  17. Yi, J.; Kong, X.; Li, J.; Hu, J.; Deng, L. Full-field modal identification of cables based on subpixel edge detection and dual matching tracking method. Mech. Syst. Signal Process. 2025, 226, 112321. [Google Scholar] [CrossRef]
  18. Luo, K.; Kong, X.; Wang, X.; Jiang, T.; Frøseth, G.T.; Rønnquist, A. Cable vibration measurement based on broad-band phase-based motion magnification and line tracking algorithm. Mech. Syst. Signal Process. 2023, 200, 110575. [Google Scholar] [CrossRef]
  19. Bing, P.; Hui-Min, X.; Bo-Qin, X.; Fu-Long, D. Performance of sub-pixel registration algorithms in digital image correlation. Meas. Sci. Technol. 2006, 17, 1615. [Google Scholar] [CrossRef]
  20. Sutton, M.A.; Yan, J.H.; Tiwari, V.; Schreier, H.W.; Orteu, J.-J. The effect of out-of-plane motion on 2D and 3D digital image correlation measurements. Opt. Lasers Eng. 2008, 46, 746–757. [Google Scholar] [CrossRef]
  21. Kromanis, R.; Forbes, C. A low-cost robotic camera system for accurate collection of structural response. Inventions 2019, 4, 47. [Google Scholar] [CrossRef]
  22. Yang, Y.-S. Measurement of dynamic responses from large structural tests by analyzing non-synchronized videos. Sensors 2019, 19, 3520. [Google Scholar] [CrossRef] [PubMed]
  23. Zappa, E.; Hasheminejad, N. Digital image correlation technique in dynamic applications on deformable targets. Exp. Tech. 2017, 41, 377–387. [Google Scholar] [CrossRef]
  24. Das, S.; Saha, P. A review of some advanced sensors used for health diagnosis of civil engineering structures. Measurement 2018, 129, 68–90. [Google Scholar] [CrossRef]
  25. Lyons, J.S.; Liu, J.; Sutton, M.A. High-temperature deformation measurements using digital-image correlation. Exp. Mech. 1996, 36, 64–70. [Google Scholar] [CrossRef]
  26. Sutton, M.A.; Orteu, J.-J.; Schreier, H.W. Image Correlation for Shape, Motion and Deformation Measurements; Springer: New York, NY, USA, 2009; pp. 1–24. ISBN 978-0-387-78746-6. [Google Scholar]
  27. Malowany, K.; Malesa, M.; Kowaluk, T.; Kujawinska, M. Multi-camera digital image correlation method with distributed fields of view. Opt. Lasers Eng. 2017, 98, 198–204. [Google Scholar] [CrossRef]
  28. Zhang, Z.; Pan, B.; Grédiac, M.; Song, W. Accuracy-enhanced constitutive parameter identification using virtual fields method and special stereo-digital image correlation. Opt. Lasers Eng. 2018, 103, 55–64. [Google Scholar] [CrossRef]
  29. Qiu, Z.; Martínez-Sánchez, J.; Arias-Sánchez, P.; Rashdi, R. External multi-modal imaging sensor calibration for sensor fusion: A review. Inf. Fusion 2023, 97, 101806. [Google Scholar] [CrossRef]
  30. Bottalico, F.; Niezrecki, C.; Sabato, A. Next generation 3D-DIC technique with sensor-based extrinsic parameter calibration and natural pattern tracking. In Proceedings of the 14th International Workshop on Structural Health Monitoring 2023 (IWSHM 2023), Stanford, CA, USA, 12–14 September 2023. [Google Scholar]
  31. Li, X.; Zhang, X.; Wang, Z.; Zheng, D.; Li, P. Experimental study with DIC technique on the bond failure of structural concrete under repeated reversed loading with different amplitude. Adv. Struct. Eng. 2023, 26, 3089–3111. [Google Scholar] [CrossRef]
  32. Amosa, T.I.; Sebastian, P.; Izhar, L.I.; Ibrahim, O.; Ayinla, L.S.; Bahashwan, A.A.; Bala, A.; Samaila, Y.A. Multi-camera multi-object tracking: A review of current trends and future advances. Neurocomputing 2023, 552, 126558. [Google Scholar] [CrossRef]
  33. Lavezzi, G.; Ciarcià, M.; Won, K.; Tazarv, M. A DIC-UAV based displacement measurement technique for bridge field testing. Eng. Struct. 2024, 308, 117951. [Google Scholar] [CrossRef]
  34. Wei, K.; Yuan, F.; Shao, X.; Chen, Z.; Wu, G.; He, X. High-speed multi-camera 3D DIC measurement of the deformation of cassette structure with large shaking table. Mech. Syst. Signal Process. 2022, 177, 109273. [Google Scholar] [CrossRef]
  35. Pribanic, T.; Hedi, A.; Gracanin, V. Harmonic oscillations modelling for the purpose of camera synchronization. In Proceedings of the International Conference on Computer Graphics Theory and Applications and International Conference on Information Visualization Theory and Applications (VISIGRAPP 2012)—GRAPP, Rome, Italy, 24–26 February 2012; pp. 202–205. [Google Scholar]
  36. Nikfar, F.; Konstantinidis, D. Evaluation of vision-based measurements for shake-table testing of nonstructural components. J. Comput. Civ. Eng. 2017, 31, 04016050. [Google Scholar] [CrossRef]
  37. Ansari, S.; Wadhwa, N.; Garg, R.; Chen, J. Wireless software synchronization of multiple distributed cameras. In Proceedings of the 2019 IEEE International Conference on Computational Photography (ICCP), Tokyo, Japan, 15–17 September 2019; pp. 1–9. [Google Scholar]
  38. Zhou, X.; Dai, Y.; Qin, H.; Qiu, S.; Liu, X.; Dai, Y.; Li, J.; Yang, T. Subframe-level synchronization in multi-camera system using time-calibrated video. Sensors 2024, 24, 6975. [Google Scholar] [CrossRef]
  39. OptiTrack. Documentation|EXTERNAL OptiTrack Documentation. Available online: https://docs.optitrack.com (accessed on 25 December 2024).
  40. Furtado, J.S.; Liu, H.H.T.; Lai, G.; Lacheray, H.; Desouza-Coelho, J. Comparative analysis of OptiTrack motion capture systems. In Advances in Motion Sensing and Control for Robotic Applications, Selected papers from the Symposium on Mechatronics, Robotics, and Control (SMRC’18)-CSME International Congress 2018, Toronto, ON, Canada, 27–30 May 2018; Springer International Publishing: Cham, Switzerland, 2019; pp. 15–31. [Google Scholar]
  41. Hansen, C.; Gibas, D.; Honeine, J.-L.; Rezzoug, N.; Gorce, P.; Isableu, B. An inexpensive solution for motion analysis. Proc. Inst. Mech. Eng. Part P J. Sports Eng. Technol. 2014, 228, 165–170. [Google Scholar] [CrossRef]
  42. Helfrick, M.N.; Niezrecki, C.; Avitabile, P.; Schmidt, T. 3D digital image correlation methods for full-field vibration measurement. Mech. Syst. Signal Process. 2011, 25, 917–927. [Google Scholar] [CrossRef]
  43. Chen, H.-T. Geometry-based camera calibration using five-point correspondences from a single image. IEEE Trans. Circuits Syst. Video Technol. 2016, 27, 2555–2566. [Google Scholar] [CrossRef]
  44. Zhang, Z. A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 1330–1334. [Google Scholar] [CrossRef]
  45. Kaehler, A.; Bradski, G. Learning OpenCV 3: Computer Vision in C++ with the OpenCV Library, 1st ed.; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2016; ISBN 978-1-491-93799-0. [Google Scholar]
  46. Chen, F.; Chen, X.; Xie, X.; Feng, X.; Yang, L. Full-field 3D measurement using multi-camera digital image correlation system. Opt. Lasers Eng. 2013, 51, 1044–1052. [Google Scholar] [CrossRef]
  47. Yang, Y.-S.; Chang, C.-H.; Wu, C. Damage indexing method for shear critical tubular reinforced concrete structures based on crack image analysis. Sensors 2019, 19, 4304. [Google Scholar] [CrossRef] [PubMed]
  48. Yang, L.; Li, M.; Song, X.; Xiong, Z.; Hou, C.; Qu, B. Vehicle speed measurement based on binocular stereovision system. IEEE Access 2019, 7, 106628–106641. [Google Scholar] [CrossRef]
  49. Peng, Y.; Yang, J.; Feng, Y.; Yu, S.; Xing, F.; Sun, T. An image-free single-pixel detection system for adaptive multi-target tracking. Sensors 2025, 25, 3879. [Google Scholar] [CrossRef] [PubMed]
  50. Shimizu, M. Simple triangulation for asynchronous stereo cameras. In Proceedings of the 2019 IEEE International Conference on Signal and Image Processing Applications (ICSIPA), Kuala Lumpur, Malaysia, 17–19 September 2019; pp. 269–274. [Google Scholar]
  51. Zhou, L.; Wu, G.; Zuo, Y.; Chen, X.; Hu, H. A comprehensive review of vision-based 3D reconstruction methods. Sensors 2024, 24, 2314. [Google Scholar] [CrossRef] [PubMed]
  52. Oth, L.; Furgale, P.; Kneip, L.; Siegwart, R. Rolling shutter camera calibration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 1360–1367. [Google Scholar]
  53. Fan, B.; Dai, Y.; He, M. Rolling shutter camera: Modeling, optimization and learning. Mach. Intell. Res. 2023, 20, 783–798. [Google Scholar] [CrossRef]
Figure 1. Experimental setup of the small-scale frame structure; photo (left) and schematic illustration (right).
Figure 2. Experimental setup: (a) shake table only; (b) truss bridge connected to a shake table.
Figure 3. Full-scale experimental specimen: (a) schematic illustration; (b) location of measured points.
Figure 4. Illustration of stereo camera setup.
Figure 5. Types of markers used in this study.
Figure 6. Schematic illustration of time lag determination: (a) frame level; (b) optimized (sub-frame level).
Figure 7. Visual illustration of the time lag between two videos.
Figure 8. Graphical user interface of the developed vision-based 3D displacement measurement software.
Figure 9. Flowchart of methodologies of the developed application software.
Figure 10. Point-specific time lag computation.
Figure 11. Focused time series 3D displacement results comparing LED-based, conventional, and proposed synchronization methods: (a) test 1, (b) test 5, and (c) test 6. (d) Zoomed illustration of the phase shift and erroneous displacement due to frame-level synchronization.
Figure 12. Schematic representation of a story-wise in-plane time series displacement.
Figure 13. Displacement comparison between the proposed method and Mocap for a selected 1.0 Hz shake table frequency: (a) X-direction, (b) Y-direction, and (c) Z-direction.
Figure 14. Impact of excitation frequency on measurement accuracy relative to Mocap: (a) X-direction, (b) Y-direction, and (c) Z-direction.
Figure 15. Measurement accuracy for higher vibration frequency.
Figure 16. Bridge vibration response comparison between proposed method and Mocap: (a) 0.5 Hz; (b) 1.0 Hz.
Table 1. Comparison of triangulation RMSE between proposed and conventional methods.
Case | CM Time Lag (Frame) | CM RMSE Camera-1 (Pixel) | CM RMSE Camera-2 (Pixel) | PM Time Lag (Frame) | PM RMSE Camera-1 (Pixel) | PM RMSE Camera-2 (Pixel) | RMSE Reduced Relative to CM (%)
1 | −381 | 1.427 | 0.980 | −381.20 | 0.571 | 0.391 | 59.99
2 | 252 | 0.324 | 0.378 | 252.35 | 0.295 | 0.344 | 8.98
3 | −77 | 0.269 | 0.299 | −76.88 | 0.256 | 0.284 | 4.91
4 | 165 | 1.154 | 1.125 | 164.90 | 0.988 | 0.992 | 14.37
5 | 99 | 0.920 | 0.943 | 99.28 | 0.193 | 0.198 | 78.97
6 | 69 | 0.152 | 0.167 | 69.27 | 0.109 | 0.120 | 38.70
Note: The “conventional method” (CM) refers to frame-level synchronization throughout the paper; “PM” denotes the proposed method.
Table 2. Statistical summary of errors and comparison between the conventional and proposed methods of the small-scale bridge experiment (units: mm; error reductions in %).
Direction | Target | Conventional Method (MAE / ME / RMSE) | Proposed Method (MAE / ME / RMSE) | Error Reduced (MAE / ME / RMSE)
X | 0.1 Hz | 0.11 / 0.03 / 0.13 | 0.08 / 0.02 / 0.10 | 21.00 / 28.30 / 26.65
X | 0.5 Hz | 1.17 / 0.63 / 0.73 | 0.11 / 0.04 / 0.04 | 90.60 / 94.25 / 94.21
X | 1.0 Hz | 0.16 / 0.04 / 0.05 | 0.15 / 0.04 / 0.05 | 6.74 / −0.65 / 2.81
X | 2.0 Hz | 1.29 / 0.10 / 0.18 | 0.22 / 0.07 / 0.09 | 83.20 / 30.41 / 53.42
Y | 0.1 Hz | 0.47 / 0.20 / 0.23 | 0.14 / 0.04 / 0.05 | 71.12 / 79.55 / 77.92
Y | 0.5 Hz | 1.54 / 0.74 / 0.84 | 0.36 / 0.08 / 0.10 | 76.90 / 89.45 / 87.90
Y | 1.0 Hz | 0.78 / 0.15 / 0.19 | 0.69 / 0.15 / 0.19 | 12.00 / −0.32 / −0.28
Y | 2.0 Hz | 2.66 / 0.19 / 0.44 | 0.44 / 0.10 / 0.14 | 83.47 / 48.18 / 69.08
Z | 0.1 Hz | 0.10 / 0.04 / 0.05 | 0.10 / 0.04 / 0.05 | 2.86 / −5.53 / −3.33
Z | 0.5 Hz | 0.26 / 0.10 / 0.12 | 0.17 / 0.06 / 0.07 | 35.04 / 43.03 / 38.90
Z | 1.0 Hz | 0.16 / 0.07 / 0.07 | 0.16 / 0.07 / 0.08 | 0.89 / −8.85 / −7.96
Z | 2.0 Hz | 0.22 / 0.06 / 0.07 | 0.13 / 0.06 / 0.07 | 39.64 / 1.24 / 1.72
Table 3. Statistical summary of errors and comparison between the conventional and proposed methods of the full-scale experiment (units: mm; error reductions in %).
Direction | Target | Conventional Method (MAE / ME / RMSE) | Proposed Method (MAE / ME / RMSE) | Error Reduced (MAE / ME / RMSE)
X | Point 1 | 42.99 / 8.66 / 8.14 | 40.01 / 7.66 / 8.05 | 6.94 / 11.54 / 1.08
X | Point 2 | 46.85 / 8.84 / 10.21 | 28.87 / 6.74 / 7.89 | 38.38 / 23.69 / 22.74
X | Point 3 | 52.09 / 8.67 / 10.49 | 27.67 / 8.10 / 9.32 | 46.88 / 6.64 / 11.15
X | Point 4 | 79.34 / 13.92 / 16.81 | 54.94 / 13.44 / 16.59 | 30.75 / 3.45 / 1.30
X | Point 5 | 98.39 / 14.85 / 18.98 | 39.75 / 8.02 / 9.87 | 59.60 / 46.01 / 48.02
Y | Point 1 | 205.35 / 24.42 / 102.80 | 166.36 / 20.85 / 85.08 | 18.98 / 14.61 / 17.24
Y | Point 2 | 215.23 / 46.37 / 54.80 | 147.76 / 35.04 / 40.58 | 31.35 / 24.42 / 25.95
Y | Point 3 | 260.39 / 38.88 / 52.34 | 165.53 / 45.64 / 51.09 | 36.43 / −17.39 / 2.39
Y | Point 4 | 244.88 / 29.92 / 57.27 | 230.74 / 36.11 / 49.79 | 5.78 / −20.69 / 13.07
Y | Point 5 | 656.41 / 93.17 / 121.02 | 290.13 / 40.38 / 49.92 | 55.80 / 56.66 / 58.75
Z | Point 1 | 30.36 / 4.37 / 5.42 | 27.17 / 4.63 / 5.50 | 10.52 / −5.93 / −1.62
Z | Point 2 | 14.08 / 2.43 / 3.15 | 13.46 / 1.80 / 2.54 | 4.40 / 25.73 / 19.43
Z | Point 3 | 60.76 / 12.17 / 14.67 | 35.33 / 11.32 / 12.70 | 41.86 / 7.00 / 13.46
Z | Point 4 | 102.04 / 13.36 / 24.95 | 70.03 / 16.17 / 20.97 | 31.36 / −21.03 / 15.95
Z | Point 5 | 410.05 / 41.36 / 78.02 | 143.98 / 19.27 / 28.98 | 64.89 / 53.40 / 62.86
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
