Measurement of Three-Dimensional Structural Displacement Using a Hybrid Inertial Vision-Based System

Accurate three-dimensional displacement measurements of bridges and other structures have received significant attention in recent years. The main challenges of such measurements include the cost and the need for a scalable array of instrumentation. This paper presents a novel Hybrid Inertial Vision-Based Displacement Measurement (HIVBDM) system that can measure three-dimensional structural displacements by using a monocular charge-coupled device (CCD) camera, a stationary calibration target, and an attached tilt sensor. The HIVBDM system does not require the camera to be stationary during the measurements; the camera movements, i.e., rotations and translations, during the measurement process are compensated for by using a stationary calibration target in the field of view (FOV) of the camera. An attached tilt sensor is further used to refine the camera movement compensation and to better infer the global three-dimensional structural displacements. This HIVBDM system is evaluated on both short-term and long-term synthetic static structural displacements in an indoor simulated experimental environment. In the experiments, at a 9.75 m operating distance between the monitoring camera and the structure that is being monitored, the proposed HIVBDM system achieves an average Root Mean Square Error (RMSE) of 1.440 mm on the in-plane structural translations and an average RMSE of 2.904 mm on the out-of-plane structural translations.


Introduction
Monitoring the displacements of a structure can provide significant insights into its structural behavior, operating condition, and health [1]. In recent years, accurate measurement of the structural responses under different field conditions has remained challenging, as it requires large arrays of instrumentation and incurs high costs in the measurement process. To address this challenge, several structural health monitoring (SHM) methods focus on monitoring structural acceleration [2,3], but these acceleration-based measurements are typically not accurate when the structural dynamic responses are in the low-frequency ranges. Global positioning systems (GPS) have been investigated by several researchers for measuring static structural displacements. However, these GPS technologies only provide accurate positioning for structures with large displacements, e.g., long-span bridges [4]. Some researchers have used a laser scanning technique [5]. To address these challenges, this paper proposes a hybrid inertial vision-based displacement measurement system in which the synchronized camera rotations obtained from an attached tilt sensor are added as constraints to the camera calibration optimization, and a stationary calibration target is used to compensate for the camera movements. These added constraints regularize the original optimization process of estimating the extrinsic camera parameters, i.e., rotations and translations, and hence improve the accuracy of measuring the global structural displacements.
The remainder of this paper is organized as follows. Section 2 provides an overview of the proposed HIVBDM system and introduces the notations used throughout this paper. Section 3 describes the main procedures and designs of the proposed HIVBDM system. The experimental results are provided in Section 4, and the conclusions are stated in Section 5.

The Proposed HIVBDM System Overview
In this section, we provide an overview of the proposed HIVBDM system. As shown in Figure 1, the proposed HIVBDM system is motivated by the following field scenario: the pivot pier is the structure that is being monitored; reference pier #1 rests on a solid bedrock foundation and is therefore treated as stationary, with negligible displacement and rotation; pier #2 and the pivot pier are located in the waterway and are prone to settlement during the bridge's service life. Installing the cameras on reference pier #1 would provide stability for the monitoring camera, but the long distance between reference pier #1 and the pivot pier, and the lack of a clear line of sight, preclude such a solution in application. Shorter distances and a clear line of sight exist between the moving pier #2 and both reference pier #1 and the pivot pier. Due to the movements of pier #2, the proposed HIVBDM system should include a scheme that can compensate for the movements of pier #2, and hence measure the displacements of the pivot pier. More generally, the proposed HIVBDM system can be developed into a hybrid system in which N consecutive bridge piers between the stationary reference pier and the pivot pier are used during the measurement. This design is able to address the limitations of camera lenses at large operating distances and the indirect line of sight between the stationary reference pier and the pivot pier in the field. However, the measurement errors between two adjacent bridge piers in the HIVBDM system accumulate with the increasing number of bridge piers involved in the measurement.
Figure 1. Overview of the proposed HIVBDM system for monitoring a swing bridge pivot pier. A stationary calibration target is mounted to the stationary reference pier, #1. The movements of the cameras and the moving calibration target are subject to the moving pier, #2, and the pivot pier, respectively. We assume that there is no relative movement between the two installed cameras.
To the best of our knowledge, we are the first to propose a monocular HIVBDM system that accurately measures both the in-plane and out-of-plane structural translations using a moving camera, while considering both the camera's own rotations and translations. The proposed HIVBDM system incorporates a novel constrained optimization algorithm into the camera calibration process, where the synchronized camera rotations obtained from the attached tilt sensor are added as the optimization constraints. In the meantime, a computational framework for measuring structural displacements aided by a stationary calibration target and an attached tilt sensor is provided.
Before the measurement, since the natural features of the pivot pier are not distinct and might be affected by various conditions, e.g., illuminance and shadows, a mounted calibration target (referred to as the moving calibration target in Figure 1) is used to enhance the features of the pivot pier. The displacements of the pivot pier are then obtained from this mounted calibration target by assuming that the movements of this calibration target are all subject to the pivot pier. Meanwhile, to capture the camera movements (movements of pier #2) in the SHM process, a second calibration target is mounted to reference pier #1 (referred to as the stationary calibration target in Figure 1), and a second camera that faces reference pier #1 is used (since reference pier #1 is stationary, the obtained movements are all from the moving camera). By combining the corresponding structural displacements and camera movements from these two monitoring cameras (we assume that there is no relative movement between these two cameras), the displacements of the pivot pier can eventually be measured. During the measurement, the structural rotation responses are usually minimal compared with the structural translation responses [39,43]; hence, the structural rotation responses of the pivot pier are usually not taken into consideration in the field. However, those minimal structural rotation responses of the moving pier #2 (used to install the two monitoring cameras) are considered in the measurement, since any minimal "uncorrected" rotation of the moving pier #2 might deteriorate the measurement accuracy, especially at the large operating distance between the moving pier #2 and the pivot pier. Therefore, in the proposed HIVBDM system, the structural translations of both the moving pier #2 and the pivot pier are considered in the measurement, but only the structural rotations of the moving pier #2 are included.
Since we assume that there is no relative movement between the two installed monitoring cameras, the effective model of the proposed HIVBDM system consists of one moving camera, one stationary calibration target, and one moving calibration target. To substitute for the use of pier #1 as a stationary reference in field applications, the stationary and moving calibration targets are required to be located within the same FOV of the moving camera in each of the monitoring images.
Based on the above motivations, the proposed HIVBDM system is then designed such that the three-dimensional displacements, e.g., translations, of the pivot pier are measured, while considering the camera's own movements, e.g., three-dimensional translations and rotations from the moving pier #2. The input of the proposed HIVBDM system is the image sequence that captures features of the structure being monitored at the pixel level, and the output of the system is the measured three-dimensional structural displacements in the world unit. The backbone of the proposed HIVBDM system is the target-based camera calibration algorithm [54].
This HIVBDM system is then evaluated on the basis of experiments simulating static bridge displacements that are performed in an indoor experimental environment. Generally, the proposed HIVBDM system can be extended to measure dynamic responses of structures, provided that a camera with a sufficiently high acquisition frame rate is used.
Considering the practical field operating distances between the camera pier and the pivot pier, and the limited laboratory space, the operating distance between the camera and the structure (calibration target) is set at 9.75 m throughout the experiments. The experimental results indicate that by using a stationary calibration target to compensate for the camera movements, RMSEs of approximately 8 mm and 12 mm are achieved on the measured in-plane and out-of-plane translations, respectively. By further using an attached tilt sensor, the RMSE is reduced to less than 2 mm on the in-plane translations and to around 3 mm on the out-of-plane translations. The frequently used notations in this paper are provided in Table 1, and the details of the proposed HIVBDM system design are discussed in Section 3. For example, $\Delta P^{W_{M_{t_k}}}_{t_j - t_i}$ denotes the 3 × 1 measured structural displacement from time $t_i$ to time $t_j$ in the world coordinate system of the moving structure at time $t_k$. The world coordinate system $W_{M_{t_i}}$ is associated with the structure that is being monitored, and the world coordinate system $W_{S_{t_i}}$ is used only in the camera movement compensation. The structural displacements can only be calculated within the same coordinate system.

Procedures and Designs of the Proposed HIVBDM System
In this section, we provide the details of the proposed HIVBDM system. There are three main procedures in this proposed HIVBDM system: (1) The relative displacement measurements between the camera and the structure being monitored using a stationary camera are proposed in Section 3.1.
(2) The relative displacement measurements between the camera and the structure being monitored using a moving camera are presented in Section 3.2. While the camera is moving, the measurements utilize a stationary calibration target to capture the camera movements, i.e., both translations and rotations, and to infer the global structural displacements. (3) In addition, the utilization of an attached tilt sensor is provided. Since the camera rotations captured by the stationary calibration target have reduced accuracy with increasing operating distances, an attached tilt sensor is supplemented in order to refine the camera rotations and improve the measurements. Instead of using the stationary calibration target to capture both the camera translations and rotations, the attached tilt sensor is used to measure the camera rotations, while a stationary calibration target is still required to capture the camera translations.

Relative Displacement Measurements between the Camera and Structure Using a Stationary Camera
In this section, since the camera is stationary, the relative displacements between the camera and the monitored structure represent the structural displacements. The measurement system using a stationary camera is shown in Figure 2, where the system includes a stationary camera and a calibration target mounted to the structure that is being monitored. We assume that there is no relative movement between the mounted calibration target and the monitored structure. For simplicity, only the calibration target is shown in Figure 2. Two reference coordinate systems and one plane at time $t_i$ are included in the HIVBDM system: (1) the world coordinate system of the moving structure (calibration target) at time $t_i$, i.e., $W_{M_{t_i}}$; (2) the camera coordinate system at time $t_i$, i.e., $C_{t_i}$; and (3) the image plane at $t_i$, i.e., $I_{t_i}$.
The input of the HIVBDM system using a stationary camera is the image sequence I with the calibration target on the monitored structure in each frame. The output of the HIVBDM system is the measured three-dimensional structural translations of the monitored structure in the world unit. Please note that the dimensions of the calibration target in the world unit are known, and the feature points of the calibration target, i.e., the checkerboard corners, are distinct. As shown in Figure 2, the pixel-wise locations of the feature points on the moving calibration target, i.e., the green points on the image plane, are detected in the input image $I_{t_i}$. The $l$-th detected feature point on the moving calibration target of the input image $I_{t_i}$ is denoted as $p_l^{I_{M_{t_i}}}$, where $l \in \{1, 2, \dots, L_{t_i}\}$ and $L_{t_i}$ is the number of detected feature points on the moving calibration target of the input image $I_{t_i}$. The spatial locations of these detected feature points on the moving calibration target, i.e., the red points in the world coordinate system, are generated for the input image $I_{t_i}$ based on the prior calibration target dimensions. The origin of the moving calibration target in the world coordinate system is assumed to be $[0, 0, 0]^T$, and the spacings between the checkerboard corners are known. As a result, the generated spatial location of the $l$-th detected feature point on the moving calibration target of the input image $I_{t_i}$ is denoted as $p_l^{W_{M_{t_i}}}$.
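The generation of the spatial feature-point locations from the known target dimensions can be sketched as follows. This is a minimal illustration in NumPy; the 7 × 9 corner grid and 40 mm spacing are assumed example values, not the actual target dimensions used in the paper.

```python
import numpy as np

def generate_target_points(rows, cols, spacing_mm):
    """Generate the 3D spatial locations (in the target's own world
    coordinate system) of a checkerboard's inner corners.

    The origin is the first corner, the target plane is z = 0, and the
    corner spacing is known a priori, as assumed by the HIVBDM system.
    Returns a (rows*cols, 3) array in world units (mm here)."""
    ys, xs = np.mgrid[0:rows, 0:cols]
    pts = np.zeros((rows * cols, 3))
    pts[:, 0] = xs.ravel() * spacing_mm   # x along the columns
    pts[:, 1] = ys.ravel() * spacing_mm   # y along the rows
    return pts                            # z stays 0: planar target

# Hypothetical 7 x 9 corner grid with 40 mm spacing
pts = generate_target_points(7, 9, 40.0)
```

These generated points play the role of $p_l^{W_{M_{t_i}}}$; their pixel-wise counterparts $p_l^{I_{M_{t_i}}}$ come from corner detection in each monitoring image.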
Figure 2. Illustration of structural displacement measurements using a stationary camera. The moving calibration target is assumed to have the same movements as the structure that is being monitored. The calibration images (which need to cover the whole camera FOV) are taken before the monitoring images. For better visualization, only the monitoring images $I_{t_1}$, $I_{t_i}$ are shown.
Based on the pinhole camera model with radial lens distortion, the relationship between the 3D spatial location $p_l^{W_{M_{t_i}}}$ and the 2D pixel location $p_l^{I_{M_{t_i}}}$ is given by:

$s\, p_l^{I_{M_{t_i}}} = A_M\, \mathcal{F}\!\left(R^{W_{M_{t_i}}} p_l^{W_{M_{t_i}}} + T^{W_{M_{t_i}}};\, k_M\right)$ (1)

where $s$ is an arbitrary scale factor, $A_M$ is the intrinsic camera parameter matrix, $R^{W_{M_{t_i}}}$ and $T^{W_{M_{t_i}}}$ are the extrinsic camera parameters, $\mathcal{F}(\cdot)$ is the radial lens distortion function, and $k_M$ is the parameter of this radial lens distortion. The dimensions of those parameters in Equation (1) are given in Table 1. Given the $L_{t_i}$ detected feature points on the moving calibration target of the input image $I_{t_i}$ (green points on the image plane of Figure 2) and their generated spatial locations (red points in the world coordinate system of Figure 2), the unknown camera parameters, i.e., $A_M$, $k_M$, $R^{W_{M_{t_i}}}$, $T^{W_{M_{t_i}}}$ in Equation (1), are obtained from the camera calibration algorithm by minimizing the reprojection error $\varepsilon_R$ (in the least squares sense) through a non-linear optimization process. The reprojection error $\varepsilon_R$ over all the feature points of the input image sequence is defined as:

$\varepsilon_R = \sum_{i=1}^{M+N} \sum_{l=1}^{L_{t_i}} \left\| p_l^{I_{M_{t_i}}} - \wp\!\left(p_l^{W_{M_{t_i}}};\, A_M, k_M, R^{W_{M_{t_i}}}, T^{W_{M_{t_i}}}\right) \right\|^2$ (2)

where $\wp(\cdot)$ is a projection function that maps the 3D spatial location $p_l^{W_{M_{t_i}}}$ onto the image plane following Equation (1). The input image sequence consists of M calibration images followed by N monitoring images, where the M calibration images provide sufficient geometric information required for estimating the unknown intrinsic camera parameters, and the N monitoring images capture the structural displacements in the SHM process. Please note that to accurately estimate the unknown intrinsic camera parameters, the M calibration images usually need to cover the entire camera FOV with different orientations.
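The projection function $\wp(\cdot)$ and the reprojection error $\varepsilon_R$ of Equations (1) and (2) can be sketched as follows, assuming a single radial distortion coefficient. The intrinsic parameters and point layout below are synthetic example values, not the calibrated values from the experiments.

```python
import numpy as np

def project(p_w, A, k, R, T):
    """Projection function wp(.): map a 3D point in the world coordinate
    system to a 2D pixel location via the pinhole model of Equation (1),
    with a single radial distortion coefficient k."""
    p_c = R @ p_w + T                           # world -> camera coordinates
    x, y = p_c[0] / p_c[2], p_c[1] / p_c[2]     # normalized image coordinates
    d = 1.0 + k * (x * x + y * y)               # radial distortion F(.)
    u = A[0, 0] * d * x + A[0, 2]
    v = A[1, 1] * d * y + A[1, 2]
    return np.array([u, v])

def reprojection_error(pix, pts_w, A, k, R, T):
    """Reprojection error eps_R of Equation (2) for one image: sum of
    squared distances between detected and reprojected feature points."""
    return sum(np.sum((p - project(q, A, k, R, T)) ** 2)
               for p, q in zip(pix, pts_w))

# Synthetic check: pixels generated by the model reproject with zero error.
A = np.array([[1200.0, 0.0, 640.0],
              [0.0, 1200.0, 360.0],
              [0.0, 0.0, 1.0]])
k, R, T = -0.05, np.eye(3), np.array([0.0, 0.0, 9750.0])  # 9.75 m in mm
pts_w = [np.array([x, y, 0.0]) for x in (0.0, 40.0) for y in (0.0, 40.0)]
pix = [project(q, A, k, R, T) for q in pts_w]
```

In the actual system, `pix` would come from corner detection and the parameters would be the unknowns of the optimization rather than fixed values.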
Since the structure that is being monitored is subject to only translations, and the camera is stationary throughout the monitoring process, the camera rotation with respect to the moving calibration target remains constant over the N monitoring images, and the constrained optimization problem is then defined as follows:

$\min_{A_M,\, k_M,\, R^{W_{M_{t_i}}},\, T^{W_{M_{t_i}}}} \varepsilon_R \quad \text{s.t.} \quad R^{W_{M_{t_{M+i}}}} \equiv R^{W_{M_{t_{M+1}}}}, \quad i \in \{1, \dots, N\}$ (3)

The constrained optimization problem is iteratively solved by the Levenberg-Marquardt algorithm [55], where the initial estimates of the parameters are given in [56]. The optimization process leverages the overall $L_{t_i}$ detected feature points $p_l^{I_{M_{t_i}}}$ of each input image. Therefore, based on those solved camera parameters, i.e., $R^{W_{M_{t_i}}}$ and $T^{W_{M_{t_i}}}$ from the N monitoring images, the HIVBDM system measures the structural displacements $\Delta P$ from $t_1$ to $t_i$ in the world unit. The measurement process that leverages those obtained extrinsic camera parameters (from the N monitoring images) is provided in Equations (4)–(8).
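One way to realize the constraint in Equation (3) is to give the optimizer a single shared rotation vector for all N monitoring images while keeping one translation vector per image, so that the solver can never violate $R^{W_{M_{t_{M+i}}}} \equiv R^{W_{M_{t_{M+1}}}}$. The sketch below shows only this parameter packing; the function names and layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def pack(shared_rvec, translations):
    """Stack one shared 3x1 rotation vector and N per-frame 3x1
    translations into a single parameter vector for the optimizer.
    Sharing the rotation enforces the constraint of Equation (3)."""
    return np.concatenate([shared_rvec] + [t for t in translations])

def unpack(params, n_frames):
    """Recover the shared rotation vector and the N translations."""
    shared_rvec = params[:3]
    translations = [params[3 + 3 * i: 6 + 3 * i] for i in range(n_frames)]
    return shared_rvec, translations

# Example: 4 monitoring frames, one shared rotation, four translations.
rvec = np.array([0.01, -0.02, 0.005])
ts = [np.array([0.0, 0.0, 9750.0 + 5.0 * i]) for i in range(4)]
params = pack(rvec, ts)
rvec2, ts2 = unpack(params, 4)
```

A Levenberg-Marquardt solver would then minimize the reprojection residuals as a function of `params`, with the constraint satisfied by construction.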
Since the entire monitored structure is assumed to have the same displacement, a point P on the moving calibration target is selected as the monitored point to represent the overall structural displacement. The location of the point P in the camera coordinate system at time $t_i$ is given by:

$P^{C_{t_i}} = R^{W_{M_{t_i}}} P^{W_{M_{t_i}}} + T^{W_{M_{t_i}}}$ (4)

where $R^{W_{M_{t_i}}}$ and $T^{W_{M_{t_i}}}$ are the obtained extrinsic camera parameters from the camera calibration.
Following Equation (4), the point location P can also be expressed with respect to the initial world coordinate system $W_{M_{t_1}}$. Since the camera is stationary, $C_{t_i} \equiv C_{t_1}$ is achieved at any time $t_i$, and the location of the point P at time $t_i$ satisfies:

$P^{C_{t_i}} = R^{W_{M_{t_1}}} P^{W_{M_{t_1}}}_{t_i} + T^{W_{M_{t_1}}}$ (5)

Following this stationary camera prior and then substituting $P^{C_{t_i}}$ in Equation (5) using the right side of Equation (4), the location of the point P at time $t_i$ in $W_{M_{t_1}}$ is:

$P^{W_{M_{t_1}}}_{t_i} = \left(R^{W_{M_{t_1}}}\right)^{-1}\left(R^{W_{M_{t_i}}} P^{W_{M_{t_i}}} + T^{W_{M_{t_i}}} - T^{W_{M_{t_1}}}\right)$ (6)

Since the structure that is being monitored is subject to only translations, and the camera is stationary, $R^{W_{M_{t_i}}} \equiv R^{W_{M_{t_1}}}$, and Equation (6) is simplified as:

$P^{W_{M_{t_1}}}_{t_i} = P^{W_{M_{t_i}}} + \left(R^{W_{M_{t_1}}}\right)^{-1}\left(T^{W_{M_{t_i}}} - T^{W_{M_{t_1}}}\right)$ (7)

Hence, the structural displacements between $P^{W_{M_{t_1}}}_{t_i}$ and $P^{W_{M_{t_1}}}_{t_1}$ using a stationary camera are calculated as:

$\Delta P^{W_{M_{t_1}}}_{t_i - t_1} = P^{W_{M_{t_1}}}_{t_i} - P^{W_{M_{t_1}}}_{t_1} = \left(R^{W_{M_{t_1}}}\right)^{-1}\left(T^{W_{M_{t_i}}} - T^{W_{M_{t_1}}}\right)$ (8)

The $\Delta P^{W_{M_{t_1}}}_{t_i - t_1}$ in Equation (8) is the measurement output of the HIVBDM system using a stationary camera. In addition, when the monitored structure in $W_{M_{t_1}}$ is parallel to the imaging plane, i.e., $R^{W_{M_{t_1}}} = I$, the measured structural displacements in Equation (8) are simplified to $T^{W_{M_{t_i}}} - T^{W_{M_{t_1}}}$, where only the translation difference is considered.
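In code, Equation (8) reduces to rotating the difference of the calibrated translation vectors back into the initial world coordinate system. A minimal sketch with synthetic extrinsics (all numerical values are illustrative, not measured data):

```python
import numpy as np

def displacement_stationary_camera(R_t1, T_t1, T_ti):
    """Structural displacement of Equation (8):
    Delta P = inv(R_t1) @ (T_ti - T_t1), expressed in W_M at time t1."""
    return np.linalg.inv(R_t1) @ (T_ti - T_t1)

# Synthetic check: if the structure translates by d (expressed in W_M at
# t1), the calibrated translation becomes T_t1 + R_t1 @ d, and Equation
# (8) recovers d exactly.
theta = np.deg2rad(5.0)
R_t1 = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                 [np.sin(theta),  np.cos(theta), 0.0],
                 [0.0, 0.0, 1.0]])        # small in-plane camera tilt
T_t1 = np.array([100.0, -50.0, 9750.0])   # mm, 9.75 m operating distance
d = np.array([3.0, -1.5, 2.0])            # ground-truth displacement (mm)
T_ti = T_t1 + R_t1 @ d
dp = displacement_stationary_camera(R_t1, T_t1, T_ti)
```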

Relative Displacement Measurements between the Camera and Structure Using a Moving Camera
Although the camera can be kept stationary in many structural monitoring processes, finding a stationary platform on which to place the camera throughout a long-term monitoring process may not be practical. Therefore, if both the camera and the monitored structure are moving, the relative displacement measurements between the camera and the monitored structure described in Section 3.1 may not yield valid measurement results.
In this section, we present a relative displacement measurement method that is able to distinguish the camera movements from the structural displacements by leveraging a novel camera movement compensation method, and hence infers the global structural displacements under study. In the camera movement compensation, a calibration target mounted to an additional stationary structure within the same camera FOV is first used to capture the camera movements. However, the camera movements captured by the stationary calibration target may not be accurate enough in applications with increasing operating distances, due to the sensitivity of the camera rotation information. Therefore, an attached tilt sensor is utilized to supplement the stationary calibration target in the camera movement compensation process and improve the relative displacement measurement accuracy. The details of the camera movement compensation using a stationary calibration target are presented in Section 3.2.1, and the details of the camera movement compensation using a stationary calibration target with a supplemental attached tilt sensor are presented in Section 3.2.2. As shown in Figure 3, the measurement system using a moving camera includes a moving monitoring camera (with an attached tilt sensor), a calibration target mounted to a stationary structure (the stationary target), and a calibration target mounted to the structure that is being monitored (the moving target). These stationary and moving calibration targets are both located within the same FOV of the camera during the measurements. Similarly, we assume that there is no relative movement between the calibration targets and the mounted structural surfaces, and only the calibration targets are shown in Figure 3.
Figure 3. Illustration of structural displacement measurement using a moving camera. The stationary calibration target is assumed to have the same movements as the stationary structure, and the moving calibration target is assumed to have the same movements as the structure that is being monitored. Both the stationary and moving calibration targets are required to be placed within the same FOV of the camera. The calibration images (which need to cover the whole camera FOV) are taken before the monitoring images. For better visualization, only the monitoring images $I_{t_1}$, $I_{t_i}$ are shown.

Similar to the HIVBDM system geometry described in Figure 2, three reference coordinate systems and one plane at time $t_i$ are included in this HIVBDM system: (1) the world coordinate system of the moving structure at time $t_i$, i.e., $W_{M_{t_i}}$; (2) the world coordinate system of the stationary structure at time $t_i$, i.e., $W_{S_{t_i}}$; (3) the camera coordinate system at time $t_i$, i.e., $C_{t_i}$; and (4) the image plane at $t_i$, i.e., $I_{t_i}$. The inputs of the HIVBDM system using a moving camera are the image sequence I with the calibration targets on both the stationary and the monitored structures in each frame, and the camera rotation information from the attached tilt sensor for each frame (only used in Section 3.2.2). The outputs of the HIVBDM system are the measured three-dimensional structural translations in the world unit.

Camera Movement Compensation Using a Stationary Calibration Target
Unlike the measurement setup shown in Figure 2, an extra calibration target mounted on a stationary structure is used in this series of measurements. As shown in Figure 3, the pixel-wise locations of the feature points on both the stationary calibration target, i.e., the green points on the image plane, and the moving calibration target, i.e., the purple points on the image plane, are detected in the input image $I_{t_i}$. Specifically, in the input image $I_{t_i}$, the $l$-th detected feature point on the stationary calibration target is denoted as $p_l^{I_{S_{t_i}}}$, and that on the moving calibration target is denoted as $p_l^{I_{M_{t_i}}}$, where $l \in \{1, 2, \dots, L_{t_i}\}$ and $L_{t_i}$ is the number of detected feature points on each of the stationary and moving calibration targets. Meanwhile, the spatial locations of these detected feature points on the stationary calibration target, i.e., the blue points in the world coordinate system, and those on the moving calibration target, i.e., the red points in the world coordinate system, are generated for the input image $I_{t_i}$ based on the prior calibration target dimensions. The generated spatial location of the $l$-th detected feature point on the stationary calibration target is denoted as $p_l^{W_{S_{t_i}}}$, and that on the moving calibration target as $p_l^{W_{M_{t_i}}}$. Given the detected feature points on the moving calibration target and their generated spatial locations, the unknown camera parameters, i.e., $A_M$, $k_M$, $R^{W_{M_{t_i}}}$, $T^{W_{M_{t_i}}}$ in Equation (1), are obtained by minimizing the reprojection error $\varepsilon_R$ defined in Equation (2). In this study, the estimation of these unknown camera parameters using the moving calibration target is considered to be an optimization problem.
Since the camera movements are unknown, the optimization problem is then defined as follows:

$\min_{A_M,\, k_M,\, R^{W_{M_{t_i}}},\, T^{W_{M_{t_i}}}} \varepsilon_R$ (9)

where the extrinsic camera parameters are subject to rotations (from the camera) and translations (from both the camera and the moving structure) at any time $t_i$. Unlike using the solved camera parameters from the stationary camera in Equation (3), the HIVBDM system using a moving camera is not able to measure the structural displacements $\Delta P$ from $t_1$ to $t_i$ in the world unit by using only the solved camera parameters in Equation (9). Therefore, to isolate the structural displacements from the camera movements, a stationary structure within the same camera FOV as the structure that is being monitored is used to capture the camera movements, in which the relative movements between the camera and the stationary structure (stationary calibration target) are considered as pure camera movements.
Similar to Equation (1), the relationship between the 3D spatial location $p_l^{W_{S_{t_i}}}$ and the 2D pixel location $p_l^{I_{S_{t_i}}}$ is given by:

$s\, p_l^{I_{S_{t_i}}} = A_S\, \mathcal{F}\!\left(R^{W_{S_{t_i}}} p_l^{W_{S_{t_i}}} + T^{W_{S_{t_i}}};\, k_S\right)$ (10)

Given the $L_{t_i}$ detected feature points on the stationary calibration target of the input image $I_{t_i}$ (green points on the image plane of Figure 3) and their generated spatial locations (blue points in the world coordinate system of Figure 3), the unknown camera parameters, i.e., $A_S$, $k_S$, $R^{W_{S_{t_i}}}$, $T^{W_{S_{t_i}}}$ in Equation (10), are obtained by minimizing the reprojection error $\varepsilon_R$ (in the least squares sense) through an optimization process, where $\varepsilon_R$ is defined as:

$\varepsilon_R = \sum_{i=1}^{M+N} \sum_{l=1}^{L_{t_i}} \left\| p_l^{I_{S_{t_i}}} - \wp\!\left(p_l^{W_{S_{t_i}}};\, A_S, k_S, R^{W_{S_{t_i}}}, T^{W_{S_{t_i}}}\right) \right\|^2$ (11)

Similarly, $\wp(\cdot)$ is a projection function which maps the 3D spatial location $p_l^{W_{S_{t_i}}}$ onto the image plane. The M calibration images provide the geometric information required for estimating the unknown intrinsic camera parameters, and the N monitoring images capture the structural displacements in the SHM process. In this study, estimating those unknown camera parameters (camera movements) using the stationary calibration target is considered to be an optimization problem. Similar to Equation (9), since the camera movements are unknown, the optimization problem is then defined as follows:

$\min_{A_S,\, k_S,\, R^{W_{S_{t_i}}},\, T^{W_{S_{t_i}}}} \varepsilon_R$ (12)

where the extrinsic camera parameters are subject to camera rotations and translations at any time $t_i$. The solved camera parameters from the stationary calibration target in Equation (12) represent the camera movements. Therefore, based on the solved camera parameters from the moving and stationary calibration targets in Equations (9) and (12), respectively, the HIVBDM system using a moving camera then measures the structural displacements $\Delta P$ from $t_1$ to $t_i$ in the world unit. The measurement process that leverages those obtained extrinsic camera parameters (both from the N monitoring images) is provided in Equations (13)–(18).
Following Equation (4), considering that the monitored point P is on the moving calibration target, the relationship between the point locations in $W_{S_{t_i}}$ and in $W_{M_{t_i}}$ at time $t_i$ can be shown as:

$R^{W_{S_{t_i}}} P^{W_{S_{t_i}}}_{t_i} + T^{W_{S_{t_i}}} = R^{W_{M_{t_i}}} P^{W_{M_{t_i}}} + T^{W_{M_{t_i}}}$ (13)

where the location of the point P at time $t_i$ in $W_{S_{t_i}}$ is calculated as:

$P^{W_{S_{t_i}}}_{t_i} = \left(R^{W_{S_{t_i}}}\right)^{-1}\left(R^{W_{M_{t_i}}} P^{W_{M_{t_i}}} + T^{W_{M_{t_i}}} - T^{W_{S_{t_i}}}\right)$ (14)

Since the world coordinate system of the stationary calibration target at $t_i$ remains the same as that at the initial time $t_1$, $P^{W_{S_{t_1}}}_{t_i} = P^{W_{S_{t_i}}}_{t_i}$ is achieved at any time $t_i$. Following Equation (13), the location of the point P at time $t_i$ in $W_{M_{t_1}}$ is calculated as:

$P^{W_{M_{t_1}}}_{t_i} = \left(R^{W_{M_{t_1}}}\right)^{-1}\left(R^{W_{S_{t_1}}} P^{W_{S_{t_1}}}_{t_i} + T^{W_{S_{t_1}}} - T^{W_{M_{t_1}}}\right)$ (15)

Following the stationary calibration target prior, $P^{W_{S_{t_1}}}_{t_i} = P^{W_{S_{t_i}}}_{t_i}$, and substituting $P^{W_{S_{t_1}}}_{t_i}$ using Equation (14), the location $P^{W_{M_{t_1}}}_{t_i}$ becomes:

$P^{W_{M_{t_1}}}_{t_i} = \left(R^{W_{M_{t_1}}}\right)^{-1}\left[R^{W_{S_{t_1}}}\left(R^{W_{S_{t_i}}}\right)^{-1}\left(R^{W_{M_{t_i}}} P^{W_{M_{t_i}}} + T^{W_{M_{t_i}}} - T^{W_{S_{t_i}}}\right) + T^{W_{S_{t_1}}} - T^{W_{M_{t_1}}}\right]$ (16)

Since the orientations of the calibration targets with respect to the monitoring camera are similar, i.e., $R^{W_{S_{t_i}}} \approx R^{W_{M_{t_i}}}$, and only the camera rotations are considered throughout the entire structural monitoring process, Equation (16) is simplified as:

$P^{W_{M_{t_1}}}_{t_i} = P^{W_{M_{t_i}}} + \left(R^{W_{M_{t_i}}}\right)^{-1}\left(T^{W_{M_{t_i}}} - T^{W_{S_{t_i}}}\right) - \left(R^{W_{M_{t_1}}}\right)^{-1}\left(T^{W_{M_{t_1}}} - T^{W_{S_{t_1}}}\right)$ (17)

Hence, the structural displacements between $P^{W_{M_{t_1}}}_{t_i}$ and $P^{W_{M_{t_1}}}_{t_1}$ using a moving camera are calculated as:

$\Delta P^{W_{M_{t_1}}}_{t_i - t_1} = \left(R^{W_{M_{t_i}}}\right)^{-1}\left(T^{W_{M_{t_i}}} - T^{W_{S_{t_i}}}\right) - \left(R^{W_{M_{t_1}}}\right)^{-1}\left(T^{W_{M_{t_1}}} - T^{W_{S_{t_1}}}\right)$ (18)

The $\Delta P^{W_{M_{t_1}}}_{t_i - t_1}$ in Equation (18) is the measurement output of the HIVBDM system using a moving camera and a stationary calibration target as camera movement compensation.
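Equation (18) can likewise be evaluated directly from the two sets of solved extrinsics. The sketch below uses synthetic values and illustrates the stationary-camera special case, in which the output collapses to the form of Equation (8):

```python
import numpy as np

def displacement_moving_camera(R_m1, T_m1, R_mi, T_mi, T_s1, T_si):
    """Structural displacement of Equation (18):
    inv(R_mi) @ (T_mi - T_si) - inv(R_m1) @ (T_m1 - T_s1),
    where *_m are extrinsics w.r.t. the moving target and *_s are the
    translations w.r.t. the stationary target."""
    return (np.linalg.inv(R_mi) @ (T_mi - T_si)
            - np.linalg.inv(R_m1) @ (T_m1 - T_s1))

# Stationary-camera special case: the stationary-target extrinsics do not
# change and the rotation w.r.t. the moving target stays constant, so the
# output equals inv(R_m1) @ (T_mi - T_m1), as in Equation (8).
R_m1 = np.eye(3)
T_m1 = np.array([0.0, 0.0, 9750.0])     # mm
T_mi = np.array([2.0, -1.0, 9753.0])    # mm, after a small displacement
T_s1 = np.array([-500.0, 0.0, 9750.0])  # stationary target, unchanged
dp = displacement_moving_camera(R_m1, T_m1, R_m1, T_mi, T_s1, T_s1)
```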

Camera Movement Compensation Using a Stationary Calibration Target with an Attached Tilt Sensor
Although the camera movement compensation using a stationary calibration target is able to measure the structural displacements while the camera is moving, the captured camera rotation information using only the stationary calibration target may lead to a reduction in accuracy with increasing operating distances. Camera movement compensation using an attached tilt sensor is therefore leveraged to supplement the stationary calibration target, to better capture the camera movements, and to infer the global structural displacements. As shown in Figure 3, instead of using the stationary calibration target to capture the camera rotations, the camera rotations are directly obtained from an attached tilt sensor (the blue CX-1 tilt sensor [57] underneath the camera).
In this section, the measurement process is similar to that described in Section 3.2.1. However, unlike the optimization processes in Equations (9) and (12) on the moving and stationary calibration targets, the camera rotations obtained from the attached tilt sensor are added into the optimization process as constraints.
Similarly, given the $L_{t_i}$ detected feature points on the moving calibration target of the input image $I_{t_i}$ (purple points on the image plane of Figure 3), and their corresponding generated spatial locations (red points in the world coordinate system of Figure 3), the unknown camera parameters, i.e., $A_M$, $k_M$, $R^{W_{M_{t_i}}}$, $T^{W_{M_{t_i}}}$ in Equation (1), are obtained by minimizing the reprojection error $\varepsilon_R$ defined in Equation (2). In this study, the estimation of these unknown camera parameters using the moving calibration target is considered to be a constrained optimization problem, where the camera rotations are known from the attached tilt sensor, and the structure is subject to only translations. Therefore, the constrained optimization problem is defined as follows:

$\min_{A_M,\, k_M,\, R^{W_{M_{t_i}}},\, T^{W_{M_{t_i}}}} \varepsilon_R \quad \text{s.t.} \quad R^{W_{M_{t_{M+i}}}} = \Delta R^{W_C}_{t_{M+i} - t_{M+1}} \oplus R^{W_{M_{t_{M+1}}}}, \quad i \in \{1, \dots, N\}$ (19)

where the difference of the rotation matrices of the camera with respect to the moving structure between the times $t_{M+i}$ and $t_{M+1}$, i.e., $\Delta R^{W_C}_{t_{M+i} - t_{M+1}}$, is converted from the difference of the rotation vectors of the camera (obtained from the attached tilt sensor) between the times $t_{M+i}$ and $t_{M+1}$, i.e., $\Delta r^{W_C}_{t_{M+i} - t_{M+1}}$, by using the Rodrigues formula [58]. The operator $\oplus$ is denoted as an addition operator between two rotation matrices, where the numerical addition is first applied to their corresponding rotation vectors and the Rodrigues conversion is then applied to the result of the numerical addition. However, using the solved camera parameters in Equation (19), the HIVBDM system using a moving camera is still not able to measure the structural displacements $\Delta P$ from $t_1$ to $t_i$ in the world unit, since the camera and the structure (moving calibration target) are both subject to translations. Similarly, a stationary structure within the same camera FOV as the structure that is being monitored is used to capture the camera translations, since the relative translations between the camera and the stationary structure (stationary calibration target) are considered as pure camera translations.
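The Rodrigues conversion and the $\oplus$ operator defined above can be sketched in plain NumPy as follows (OpenCV's cv2.Rodrigues implements the same vector-matrix conversion). This is an illustrative implementation, not the authors' code:

```python
import numpy as np

def rodrigues(rvec):
    """Rodrigues formula: convert a 3x1 rotation vector (axis * angle)
    into a 3x3 rotation matrix."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])   # cross-product matrix of the axis
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def rot_to_rvec(R):
    """Inverse Rodrigues conversion: rotation matrix -> rotation vector."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if theta < 1e-12:
        return np.zeros(3)
    axis = np.array([R[2, 1] - R[1, 2],
                     R[0, 2] - R[2, 0],
                     R[1, 0] - R[0, 1]]) / (2.0 * np.sin(theta))
    return theta * axis

def oplus(dR, R):
    """The (+) operator of Equation (19): add the rotation vectors of dR
    and R numerically, then apply the Rodrigues conversion to the sum."""
    return rodrigues(rot_to_rvec(dR) + rot_to_rvec(R))

# For two rotations about the same (z) axis, the operator adds the angles.
Rz = lambda a: rodrigues(np.array([0.0, 0.0, a]))
```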
Given the L_{t_i} detected feature points on the stationary calibration target of the input image I_{t_i} (green points on the image plane of Figure 3), and their generated spatial locations (blue points in the world coordinate system of Figure 3), the unknown camera parameters, i.e., A_S, k_S, R^W_S(t_i), and T^W_S(t_i) in Equation (10), are obtained by minimizing the reprojection error ε_R defined in Equation (11). In this study, the estimation of these unknown camera parameters using the stationary calibration target is also treated as a constrained optimization problem, where the camera rotations are known from the attached tilt sensor and the structure is subject to only translations. Therefore, the constrained optimization problem is defined in Equation (20), where the stationary calibration target has the same rotational increments as the moving calibration target in Equation (19). In Equations (19) and (20), the rotational information obtained from the attached tilt sensor is added as optimization constraints to the N monitoring images. The constrained optimization problem is iteratively solved by the Levenberg-Marquardt algorithm [55]. Therefore, based on the camera parameters solved from both the moving and stationary calibration targets, the HIVBDM system using a moving camera is able to measure the structural displacements ∆P from t_1 to t_i in the world unit.
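As an illustration of this constrained estimation, the sketch below holds the camera rotation fixed at the tilt-sensor value and solves only for the camera translation with Levenberg-Marquardt least squares. It assumes a simplified pinhole model (known focal length, principal point at the origin, no distortion) rather than the full parameter set of Equations (19) and (20), and all function names are illustrative.

```python
import numpy as np
from scipy.optimize import least_squares

def project(points_w, R, T, f):
    """Pinhole projection of Nx3 world points with rotation R, translation T,
    and focal length f (principal point at origin, no lens distortion)."""
    pc = points_w @ R.T + T             # world -> camera coordinates
    return f * pc[:, :2] / pc[:, 2:3]   # perspective division

def solve_translation(points_w, points_px, R_tilt, f, T0):
    """Estimate only the camera translation; the rotation R_tilt is held
    fixed at the value supplied by the tilt sensor (the paper's constraint)."""
    def residual(T):
        return (project(points_w, R_tilt, T, f) - points_px).ravel()
    return least_squares(residual, T0, method='lm').x
```

Constraining the rotation in this way removes the rotation/translation ambiguity that a planar target induces, which is why the tilt sensor improves the out-of-plane estimates in particular.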
Similar to Section 3.2.1, the measurement process that leverages the obtained camera parameters on both the moving and stationary calibration targets from the N monitoring images is provided in Equations (13)–(18). Eventually, by using a stationary calibration target with an attached tilt sensor as camera movement compensation, the measurement output of the HIVBDM system using a moving camera is given in Equation (21). When the camera is stationary, i.e., R^W_S(t_i) ≡ R^W_S(t_1) and T^W_S(t_i) ≡ T^W_S(t_1), Equation (21) yields the same result as Equation (8).

Experimental Results
In this section, we present the experimental results of the proposed HIVBDM system. The experiments are performed in a laboratory environment, which is shown in Figure 4. This section provides the details and analysis of the components, as follows: (1) the implementation of the camera calibration algorithm is described in Section 4.1; (2) the evaluation of the relative displacement measurements between the camera and target using a stationary camera is presented in Section 4.2; and (3) the evaluation of the relative displacement measurements between the camera and target using a moving camera is presented in Section 4.3.

Implementation of the Camera Calibration Algorithm
The camera calibration algorithm in this study utilizes a planar target with coplanar features, i.e., a black-and-white checkerboard of 30 empty squares (5 × 6), with each square measuring 1.25" × 1.25". Previous studies have suggested using a rigid and flat mounting surface to create a high-quality planar calibration target [54,56]. The planar checkerboard calibration targets used, the 2592 × 2048-resolution GigE Genie Nano C2590 camera [59], and the attached CX-1 tilt sensor are shown in Figure 4a.
The input images used in the camera calibration are calibration and monitoring images [54,56]. The calibration images are required in order to obtain a better estimate of the unknown camera parameters described in Equation (1), and the monitoring images are captured as the input to the HIVBDM system for measuring the displacements of the target during the SHM. The general process of acquiring the calibration images includes capturing these images under different target orientations and operating distances. Multiple calibration images that cover the entire camera FOV are encouraged, such that all of the detected feature points within the camera FOV are included in the camera calibration process [54,56]. Samples of these calibration images are shown in Figure 4d. Empirical experience suggests that the entire camera FOV can be covered by either moving the calibration target or moving the camera itself [54]. Andreas Geiger's algorithm [60] is then applied to detect the corners of the calibration targets, i.e., checkerboards, in those calibration images with sub-pixel accuracy. Please note that the indoor illumination changes shown in Figure 4d do not affect the camera calibration algorithm, owing to the robust checkerboard corner detection [60]. Since the distance between two selected feature points of the checkerboard pattern is known, a ratio R of pixels to physical units [37] is defined as R = d/D, where d is the pixel length of a square side (33.932 pixels) and D is the physical length of the square side (31.750 mm). Therefore, the ratio R equals 1.069 pixel/mm. In this study, the Root-Mean-Square Error (RMSE) is used as the evaluation metric for the relative displacement measurements between the camera and the target [27,39]. The RMSE ε is defined as ε = √((1/N) ∑_{i=1}^{N} (∆̂_i − ∆_i)²), where ∆̂_i is the ith measured target displacement, ∆_i is the ith ground-truth target displacement, and N is the total number of measurements.
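The pixel-to-millimetre ratio and the RMSE metric defined above can be computed directly; a minimal sketch:

```python
import numpy as np

# Pixel-to-physical-unit ratio from the known checkerboard square size.
d_px = 33.932      # side length of one square in pixels
D_mm = 31.750      # physical side length of one square (1.25 in) in mm
R = d_px / D_mm    # ratio in pixel/mm, approximately 1.069

def rmse(measured, truth):
    """Root-Mean-Square Error between measured and ground-truth displacements."""
    measured, truth = np.asarray(measured), np.asarray(truth)
    return np.sqrt(np.mean((measured - truth) ** 2))
```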

Evaluations of the Relative Displacement Measurements between the Camera and Target Using a Stationary Camera
In this section, evaluations of the relative displacement measurements between the camera and target using a stationary camera are reported. A 50 mm lens GigE camera is fixed on a stationary platform in the measurements, and the operating distance between the camera and the moving calibration target is set to 9.75 m. The displacements in the X and Y directions, i.e., longitudinal and vertical, are considered "in-plane" translations, and displacements in the Z direction, i.e., towards and away from the camera, are considered "out-of-plane" translations. Similarly, ε_x and ε_y are termed the "in-plane" RMSE, and ε_z is termed the "out-of-plane" RMSE. The target is moved to seven different positions in each of the X, Y and Z directions. The synthetic target displacements are controlled on an optical table and are measured by a digital caliper with 0.0127 mm (0.0005") resolution as references. The camera separately captures the static initial position of the target and these seven static target positions. Measuring static target displacements provides the ability to take multiple images of each target position under the assumption that the target and the camera do not move, or that their movements are so minimal that they can be ignored during the image acquisition at each target position. Therefore, to improve the corner detection accuracy, ten different images are taken at each measurement (target position) by the utilized GigE camera with a frame rate of 10 FPS. The detected feature locations of the image shots are averaged before being fed into the camera calibration algorithm. The initial position of the target is set as zero in each of the X, Y and Z directions, and the evaluation results of those synthetic static target displacements using a stationary camera are reported in Table 2.
Table 2. Comparative analysis of applying averaging processing to the synthetic static target displacements using a stationary camera (mm).

Static Target Displacement Measurements in X, Y and Z Directions (With Averaging Processing vs. Without Averaging Processing). Negative values indicate that the target displacement measurements are in the opposite direction to the actual target displacements.
Although neither the target nor the camera moves, or the movements are so minimal that they can be ignored during this image acquisition process, the importance of applying the averaging processing to the feature locations at each target position merits discussion. Therefore, a comparative analysis of applying the averaging processing to the feature locations is provided in Table 2, in which the detected feature locations of only the first image shot at each target position are fed into the camera calibration algorithm as the comparison case.
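The averaging of detected feature locations over the repeated shots at one target position amounts to a mean over the shot axis; a minimal sketch, with an illustrative function name:

```python
import numpy as np

def average_corners(corner_stack):
    """Average detected corner locations over repeated shots of a single
    static target position. corner_stack has shape (n_shots, n_corners, 2);
    averaging suppresses zero-mean detection noise before calibration."""
    return np.asarray(corner_stack).mean(axis=0)
```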
As shown in Table 2, comparing the calculated in-plane and out-of-plane RMSE between the cases with and without averaging processing of the detected feature locations at each target position: for the X direction displacement measurements, the in-plane RMSE ε_x and ε_y average 0.433 mm vs. 0.421 mm, and the out-of-plane RMSE ε_z is 1.457 mm vs. 1.604 mm. In the Y direction displacement measurements, the in-plane RMSE ε_x and ε_y average 0.142 mm vs. 0.161 mm, and the out-of-plane RMSE ε_z is 2.046 mm vs. 2.171 mm. In the Z direction displacement measurements, the in-plane RMSE ε_x and ε_y average 0.477 mm vs. 0.467 mm, and the out-of-plane RMSE ε_z is 0.849 mm vs. 0.625 mm. A comparison of these results indicates that the deviations between the two processing variations are trivial; hence, the averaging processing is applied throughout the experiments for consistency.

Evaluations of the Relative Displacement Measurements between the Camera and Target Using a Moving Camera
In this section, a series of experiments is conducted to analyze the performance of relative displacement measurements between the camera and the target using a moving camera, as described in Section 3.2. Similar to the measurements in Section 4.2, a 50 mm camera lens with a 9.75 m operating distance between the camera and the moving calibration target was used for this series of experiments. Also, to capture the camera movements, the distance between the camera and the stationary calibration target was set to 9.85 m. During the displacement measurements, both the stationary and moving calibration targets were required to be placed within the same FOV of the camera.
In Section 4.3.1, the relative displacement measurements between the camera and the target using a moving camera are evaluated with respect to the same seven synthetic static target displacements in each of the X, Y and Z directions, respectively. In Section 4.3.2, an experimental validation of the exact camera movements using a conventional linear variable differential transformer (LVDT) sensor is provided. In Section 4.3.3, the static displacement measurements are evaluated using a long-term indoor monitoring process whereby the moving structure (moving calibration target) is also kept stationary throughout the monitoring process.

Evaluation on the Synthetic Target Displacements
On the synthetic target displacements, the target is moved to seven different positions in each of the X, Y and Z directions. The synthetic target displacements are controlled on an optical table and are measured by a digital caliper with 0.0127 mm (0.0005") resolution as references. As shown in Figure 4a, a GigE camera with an attached CX-1 tilt sensor is fixed above the tip of a cantilever plate, and a weight W is hung underneath the plate to move the camera. The initial position of the target before hanging the weight is set to zero in each of the X, Y and Z directions. The camera captures the static initial position of the target before hanging the weight and the seven static target positions after hanging the weight W. The target displacement measurements are calculated between the initial target position and each of the seven target positions. Meanwhile, the hanging weight rotates the camera support axis and hence rotates and translates the camera. The camera movements mainly come from beam deflection, and can be controlled by using different weights and adjusting the length of the cantilever plate. In this study, the hung weight was 0.5 kg, and the length of the cantilever plate to the applied weight was 203 mm. We assume that there is no relative movement between the camera and the attached CX-1 tilt sensor. Therefore, the camera vertical displacement δ_C is computed via Equation (24) [61], where θ is the rotation captured by the CX-1 tilt sensor and L is the length of the cantilever plate to the applied weight. Moreover, to validate the calculated camera movements in Equation (24), a validation of the exact camera movements using an LVDT sensor is provided in Section 4.3.2.
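Since Equation (24) itself is not reproduced in this excerpt, the sketch below assumes the standard small-angle relation for a tip-loaded cantilever, δ = 2Lθ/3, which links the tip slope θ to the tip deflection; the paper's exact form of Equation (24) may differ, so treat this purely as an illustration.

```python
import math

def camera_deflection(theta_deg, L_mm):
    """Tip deflection of a tip-loaded cantilever from its tip slope:
    delta = 2 * L * theta / 3 (theta in radians) -- an ASSUMED form of the
    paper's Equation (24), standard for a point load at the cantilever tip."""
    return 2.0 * L_mm * math.radians(theta_deg) / 3.0
```

For example, with the paper's L = 203 mm, a tilt reading of 1° corresponds to roughly 2.4 mm of vertical camera movement under this assumed relation.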
Measuring the static target displacements follows the assumption that the target and the camera do not move, or that the movements are so minimal that they can be ignored during image acquisition at each target position. As shown in Figure 4c, a stationary calibration target is located near the moving calibration target, such that both the stationary and the moving calibration targets are detected in the same FOV of the camera in each of the captured images. Similar to Section 4.2, ten different image shots were taken at each target position by the utilized GigE camera with a frame rate of 10 FPS.
The detected feature locations of the images were averaged before being fed into the camera calibration algorithm. Moreover, during the image capture process, the attached CX-1 tilt sensor records the simultaneous camera rotations. The responses of the camera and the tilt sensor are synchronized based on the timestamps provided by the GigE camera and the CX-1 tilt sensor. Since the detected feature locations of the image shots at each target position are averaged, the corresponding synchronized camera rotations are averaged accordingly. At each target position, the synchronized-and-averaged camera movements, i.e., rotations and translations, are provided in Table 3 for repeatability. The initial camera position before hanging the weight is set as zero, and the exact camera movements are calculated between the initial camera position and each of the seven camera positions. Please note that, due to the limited experimental facilities, only the Y direction camera movements are provided as a reference throughout the paper. The evaluation results of those synthetic static target displacements using a moving camera are reported in Table 4.
In Table 4, the camera movement compensation using only a stationary calibration target achieves an average RMSE of 7.529 mm on the in-plane translations and 11.832 mm on the out-of-plane translations. By using a supplemental attached tilt sensor, the RMSE is reduced to an average of 1.440 mm and 2.904 mm on the in-plane and out-of-plane translations, respectively. Specifically, with this supplemental attached tilt sensor, on in-plane translations the in-plane RMSE ε_x and ε_y decrease from averages of 1.884 mm and 1.707 mm to 0.852 mm and 0.702 mm, respectively. Similarly, on out-of-plane translations, ε_x is reduced from 2.107 mm to 1.109 mm, and ε_y is reduced from 8.846 mm to 3.081 mm by using the supplemental tilt sensor. However, when using only the stationary calibration target to compensate the camera movements, the Z direction measurements of the static target displacements are not accurate: the out-of-plane RMSE ε_z averages 18.996 mm on in-plane translations and 24.542 mm on out-of-plane translations. Since the camera rotations captured by the stationary calibration target are less accurate, an attached tilt sensor is used to supplement the stationary calibration target in capturing the camera rotations. Camera movement compensation using the supplemental tilt sensor achieves the lowest ε_z on in-plane translations, at an average of 2.768 mm, and also the lowest ε_z (4.522 mm) on out-of-plane translations. As a result, comparing the measurement results using a moving camera in Table 4 with those using a stationary camera in Table 2, the measurements using a stationary camera show less RMSE than those using a moving camera, in both in-plane and out-of-plane translations.
In the measurements using a stationary camera, the in-plane RMSE ε_x and ε_y average 0.350 mm and the out-of-plane RMSE ε_z averages 1.451 mm, over both in-plane and out-of-plane translations. Meanwhile, in the measurements using a moving camera, where a stationary calibration target with an attached tilt sensor is used as the camera movement compensation, the in-plane RMSE ε_x and ε_y increase to an average of 1.216 mm and the out-of-plane RMSE ε_z averages 3.353 mm, over both in-plane and out-of-plane translations.

Validation of Exact Camera Movements by Using a LVDT Sensor
In this section, a validation of the exact camera movement measurements given in Equation (24) is provided by using an LVDT sensor (SP2-50 Celesco string potentiometer). The validations are performed with two different weights under three different cantilever lengths. The validation results are reported in Table 5, where δ_LVDT denotes the measurements from the LVDT sensor and δ_C the measurements given by Equation (24). The error percentage is calculated between δ_C and δ_LVDT, where δ_LVDT is used as the ground truth; it is defined as |δ_C − δ_LVDT|/δ_LVDT.
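The error percentage defined above can be computed directly; a minimal sketch with an illustrative function name:

```python
def error_percentage(delta_c, delta_lvdt):
    """Relative error of the tilt-derived camera movement against the LVDT
    reference: |delta_C - delta_LVDT| / delta_LVDT, expressed in percent."""
    return abs(delta_c - delta_lvdt) / abs(delta_lvdt) * 100.0
```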
As shown in Table 5, the average of the error percentages across the six test sets between exact camera movements (δ C ) and LVDT sensor (δ LVDT ) is 5.12% (less than 0.5 mm error in absolute value). Therefore, the validation results show that the exact camera movements given by Equation (24) are close to the camera movements measured by the LVDT sensor.

Evaluation on the Long-Term Indoor Monitoring Process
In the long-term indoor monitoring process, as shown in Figure 4b, a 50 mm lens GigE camera with an attached tilt sensor is fixed above a free-moving cantilever plate, without hanging the weight W to move the camera. The length of the cantilever plate to the applied weight again equals 203 mm. However, without a weight hung underneath the tip of the cantilever plate, the camera is left free during the entire monitoring process. In this long-term indoor monitoring process, some environmental effects, such as temperature changes, cause length changes of the cantilever and hence move the camera support. Also, small activities within the building might slightly affect the camera position on the cantilever. Every ten minutes over the entire monitoring process, i.e., approximately six days, the camera captures the locations of the stationary and moving calibration targets, and the attached CX-1 tilt sensor records the simultaneous camera rotations. Similarly, for each camera capture, the synchronized and averaged camera movements are provided in Figure 5b for repeatability. Meanwhile, in Figure 5c, the temperature history captured by the CX-1 sensor is also provided as a reference. The temperature changes share similar trends with the camera movements, which indicates that the temperature changes cause length and stiffness changes of the cantilever, and hence move the camera support and affect the measurements of target displacements. The moving calibration target is kept fixed in this long-term monitoring; hence the ground truth is zero target displacement in each of the X, Y and Z directions.
The numerical results of the static target displacements in the long-term monitoring process are reported in Figure 5a. In the X direction static displacement measurements, the camera movement compensation using a stationary calibration target achieves 1.878 mm RMSE; with the supplemental attached CX-1 tilt sensor, the RMSE is further decreased to 0.514 mm. Moreover, in the Y direction static displacement measurements, the camera movement compensation using a stationary calibration target achieves 2.525 mm RMSE; with the supplemental CX-1 tilt sensor, the RMSE is further decreased to 1.102 mm. In addition, in the Z direction static displacement measurements, the camera movement compensation using only a stationary calibration target fails due to the inaccurate camera rotation information: the Z direction RMSE increases to 35.844 mm with the stationary calibration target alone, while an RMSE of 3.578 mm is achieved by using the supplemental tilt sensor.
Figure 5. Evaluations of static target displacements in the long-term indoor monitoring process using a moving camera: (a) static target displacement measurements in the X, Y and Z directions. For the legends, a stationary calibration target is used as the camera movement compensation in the red plots, a stationary calibration target with an attached CX-1 tilt sensor is used as the camera movement compensation in the blue plots, and the green plots show the ground-truth target displacements; (b) the synchronized and averaged camera movements at each camera capture in the monitoring process; (c) the temperatures at each camera capture in the monitoring process.

Conclusions
This paper presents a novel monocular target-based HIVBDM system that can measure both in-plane and out-of-plane static structural displacements. The proposed HIVBDM system does not require the camera to be stationary during the displacement measurements. The HIVBDM system uses two calibration targets: one is kept stationary to compensate for camera movements, and the other is mounted on the surface of the monitored structure to represent the structural displacements. In addition to the stationary calibration target, a tilt sensor attached to the camera is used to provide an accurate measurement of the camera rotations, which further improves the robustness of the HIVBDM system to camera rotations. Future research can focus on designing a target-less monocular HIVBDM system that not only supports arbitrary camera movements, but also accurately measures both the structural translations and rotations. Measuring highly dynamic structural responses will also be considered.