Cross-Correlation-Based Structural System Identification Using Unmanned Aerial Vehicles

Computer vision techniques have been employed to characterize the dynamic properties of structures and to capture structural motion for system identification. These methods rely on image processing with a stationary camera. This requirement makes finding an effective camera location difficult, because civil infrastructure (e.g., bridges and buildings) is often hard to access, being constructed over rivers, roads, or other obstacles. This paper seeks to use video from Unmanned Aerial Vehicles (UAVs) to address this problem. In contrast to the traditional use of stationary cameras, a UAV-mounted camera is itself moving; thus, the displacements of the structure obtained by processing UAV video are relative to the UAV camera. Some efforts have been reported to compensate for the camera motion, but they require assumptions that may be difficult to satisfy. This paper proposes a new method for structural system identification that uses the UAV video directly. Several challenges are addressed, including: (1) estimation of an appropriate scale factor; and (2) compensation for the rolling shutter effect. Experimental validation demonstrates the efficacy and significant potential of the proposed approach.


Introduction
Structural system identification is the process of obtaining a model of a structural system from a set of measurements of the structural response. Oftentimes, responses such as displacements, accelerations, and strains are measured by traditional systems that require a tedious installation process or expensive equipment [1]. For example, linear variable differential transformers require a fixed reference (e.g., scaffolds), and GPS is either inaccurate or expensive (e.g., NDGPS) [2]. Accelerometers are reference-free, but inaccurate in reconstructing the DC component of the response [3].
Recently, computer vision-based techniques have been adopted to measure dynamic displacements for structural system identification purposes. Vision-based methods can measure multiple points on the structure with a relatively simple installation and an inexpensive setup [3][4][5][6][7][8]. While the early stages of vision-based techniques focused on the measurement of the response itself, recent studies showed how these techniques could be used for system identification. Schumacher and Shariati [9] introduced the concept of a virtual visual sensor (VVS) that could be used for the modal analysis of a structure. Bartilson et al. [10] utilized a minimum quadratic difference (MQD) algorithm to analyze traffic signal structures.

Overview
Figure 1 shows an overview of the proposed methodology. The underlying pipeline is composed of three main phases: (1) the conventional, vision-based method for displacement measurement proposed by Yoon et al. [14]; (2) novel methods for addressing the varying scale factor and the rolling shutter effect; and (3) the system identification method using relative displacements of a structure free from the UAV's motion.


Conventional, Vision-Based Displacement Measurement
The first phase is to estimate the displacement of the structure relative to the UAV. This phase is based on the structural displacement measurement approach proposed by Yoon et al. [14]. The first step is camera calibration. Due to imperfections in camera lenses, significant radial distortion exists in the raw video. The camera calibration module removes this distortion to achieve more accurate displacement measurement by applying the method proposed by Zhang [20]. Once the distortion is removed, the dynamic response of the structure is determined by analyzing the calibrated video frame by frame. Once the region of interest is selected by the user, feature points such as corners are extracted using the Harris corner detector [21]. Using the extracted feature points, relative displacements are calculated by applying the KLT tracker [22], while outliers are removed by MLESAC [23]. The result of this procedure is the displacement of each story relative to the reference.
This displacement vector $[u_i, v_i]^T$, in pixel coordinates, will be converted into world coordinates and appropriately scaled as described in the following steps.
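As an illustration of the tracking step, the sketch below locates a template from one frame inside the next by normalized cross-correlation over integer pixel shifts. It is a simplified, hypothetical stand-in for the Harris-corner/KLT/MLESAC pipeline described above (the function name and parameters are illustrative, not from the original implementation):

```python
import numpy as np

def track_template(prev_frame, curr_frame, top_left, size, search=5):
    """Find the integer-pixel shift of a template between two frames by
    maximizing the normalized cross-correlation; a simplified stand-in
    for the Harris + KLT tracking used in the paper."""
    r, c = top_left
    h, w = size
    tmpl = prev_frame[r:r + h, c:c + w].astype(float)
    tmpl = tmpl - tmpl.mean()
    best, best_shift = -np.inf, (0, 0)
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            win = curr_frame[r + dr:r + dr + h, c + dc:c + dc + w].astype(float)
            win = win - win.mean()
            denom = np.sqrt((tmpl ** 2).sum() * (win ** 2).sum())
            if denom == 0:
                continue
            score = (tmpl * win).sum() / denom
            if score > best:
                best, best_shift = score, (dr, dc)
    return best_shift  # displacement of the template in pixels (row, col)
```

A subpixel tracker such as KLT refines this idea with gradient-based optimization; the integer search above only conveys the matching principle.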

Displacement Measurement Using a UAV
The 2D projected point $[u, v]^T$ on the image plane of a camera installed on a UAV, for a given 3D point $[X, Y, Z]^T$ in world coordinates (see Figure 2), can be expressed using the pinhole camera model [24] as

$$\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} \alpha_x & \gamma & u_0 \\ 0 & \alpha_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \mathbf{R} & \mathbf{t} \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \quad (1)$$

where $\lambda$ is an arbitrary scalar, $\mathbf{R}$ is a rotation matrix with 3 degrees of freedom (i.e., $\theta_x$, $\theta_y$, and $\theta_z$), $\mathbf{t} = [t_x, t_y, t_z]^T$ is the UAV's translation vector, $\alpha_x$ and $\alpha_y$ are the normalized focal lengths of the camera in the two directions, $\gamma$ is the skew coefficient, and $u_0$ and $v_0$ are the coordinates of the principal point.

After applying the camera calibration, $\gamma$, $u_0$, and $v_0$ can be set to zero, and $\alpha = \alpha_x = \alpha_y$; then, with $\mathbf{R} = \mathbf{R}_z(\theta_z)\mathbf{R}_y(\theta_y)\mathbf{R}_x(\theta_x)$, the projected 2D image point can be written as:

$$\begin{bmatrix} u \\ v \end{bmatrix} = \frac{1}{S} \begin{bmatrix} \cos\theta_y \cos\theta_z\, X + (-\cos\theta_x \sin\theta_z + \sin\theta_x \sin\theta_y \cos\theta_z)\, Y + (\sin\theta_x \sin\theta_z + \cos\theta_x \sin\theta_y \cos\theta_z)\, Z + t_x \\ \cos\theta_y \sin\theta_z\, X + (\cos\theta_x \cos\theta_z + \sin\theta_x \sin\theta_y \sin\theta_z)\, Y + (-\sin\theta_x \cos\theta_z + \cos\theta_x \sin\theta_y \sin\theta_z)\, Z + t_y \end{bmatrix} \quad (2)$$

where $S = \dfrac{-X \sin\theta_y + Y \sin\theta_x \cos\theta_y + Z \cos\theta_x \cos\theta_y + t_z}{\alpha}$ is a scale factor that will be derived in Section 2.4.

Assuming in-plane structural motion and no translation in the z-direction, the change in the 2D relative displacement of the moving object with respect to the UAV's motion at the i-th image frame is obtained by differencing Equation (2) between consecutive frames:

$$\begin{bmatrix} \Delta u_i \\ \Delta v_i \end{bmatrix} = \begin{bmatrix} u_{i+1} - u_i \\ v_{i+1} - v_i \end{bmatrix} \quad (3)$$

When the rotation angles are small, Equation (3) can be rewritten as:

$$\begin{bmatrix} \Delta u_i \\ \Delta v_i \end{bmatrix} \approx \frac{1}{S} \begin{bmatrix} \Delta X + \Delta t_x \\ \Delta Y + \Delta t_y \end{bmatrix} \quad (4)$$

where $\Delta X$ and $\Delta Y$ are the displacements of the structure, and $\Delta t_x$ and $\Delta t_y$ are the translations of the UAV, between the i-th and (i+1)-th image frames. As indicated by Equation (4), the change in relative displacement captured by the UAV is influenced by two factors: (1) scaling of the displacement related to $S$; and (2) a DC bias introduced by the translational motion of the UAV (i.e., $t_x$ and $t_y$). The scaling issue is resolved by introducing an adaptive scale factor, as discussed in Section 2.4; the rolling shutter effect is discussed in Section 2.5; and the influence of translational motion is reduced using the Natural Excitation Technique (NExT) introduced in Section 2.6.
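The small-angle approximation of Equation (4) can be checked numerically. The sketch below projects a point with the full rotation of Equation (2) and compares the exact pixel displacement with $(\Delta X + \Delta t_x)/S$. All parameter values are hypothetical, and the rotation convention $\mathbf{R} = \mathbf{R}_z \mathbf{R}_y \mathbf{R}_x$ is an assumption chosen to be consistent with the stated scale factor:

```python
import numpy as np

def project(P, angles, t, alpha=1000.0):
    """Pinhole projection of Eq. (2): world point P -> pixel [u, v],
    using R = Rz(az) Ry(ay) Rx(ax); returns the pixel point and the
    scale factor S = p_z / alpha."""
    ax, ay, az = angles
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(ax), -np.sin(ax)],
                   [0, np.sin(ax),  np.cos(ax)]])
    Ry = np.array([[ np.cos(ay), 0, np.sin(ay)],
                   [0, 1, 0],
                   [-np.sin(ay), 0, np.cos(ay)]])
    Rz = np.array([[np.cos(az), -np.sin(az), 0],
                   [np.sin(az),  np.cos(az), 0],
                   [0, 0, 1]])
    p = Rz @ Ry @ Rx @ P + t
    S = p[2] / alpha           # scale factor of Eq. (2)
    return p[:2] / S, S

# hypothetical small rotations [rad], translation [m], and point [m]
P0 = np.array([0.5, 0.3, 2.0])
t0 = np.array([0.01, -0.02, 0.0])
ang = (0.002, -0.001, 0.003)
dX, dtx = 0.004, 0.001         # structure and UAV x-motion between frames
(u0, v0), S = project(P0, ang, t0)
(u1, v1), _ = project(P0 + np.array([dX, 0, 0]), ang, t0 + np.array([dtx, 0, 0]))
approx = (dX + dtx) / S        # small-angle prediction of Eq. (4)
```

For these small angles the exact pixel shift `u1 - u0` and the approximation agree to well under a hundredth of a pixel, which is the content of Equation (4).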


Adaptive Scale Factor
Most of the methods for measuring the displacement of the structure from a fixed camera on the ground assume the in-plane motion of the structure. This assumption allows for a constant scale factor for not only the entire frame but between frames as well. Given such a condition, the scale factor does not need to be determined at every frame to estimate the natural frequencies and mode shapes.
However, this assumption is no longer valid for responses obtained through a moving camera, as in the case of the UAV, since there is bound to be movement in the Z-direction (i.e., t z ).
To address this issue, the scale factor for each frame can be calculated using (1) the known physical length of an object; and (2) the pixel distance between two corresponding points defined by the user. The distance $l$ between the two corresponding image points can be expressed in terms of the known physical length $L$ of the object by expanding Equation (2):

$$l = \| \mathbf{p}_1 - \mathbf{p}_2 \|_2 = \frac{L}{S} \quad (5)$$

where $\mathbf{p}_1$ and $\mathbf{p}_2$ are the image points of the two ends of the object with known physical length, and $\|\cdot\|_2$ is the L2-norm of a vector. Note that $\alpha$ and $Z$ are constant and the angles are small, so the scale factor $S$ is a function of the camera motion in the Z-direction only. The scale factor $S_0$ in the initial image frame is determined by dividing the known length $L$ by the pixel distance $l_0$ of the object in that frame. Based on the initial scale factor $S_0$, the scale factor in each subsequent frame can be calculated from two designated feature points in the image as:

$$S_i = S_0 \, \frac{\| \mathbf{p}_{0,1} - \mathbf{p}_{0,2} \|_2}{\| \mathbf{p}_{i,1} - \mathbf{p}_{i,2} \|_2} \quad (6)$$

where $\mathbf{p}_{0,1}$ and $\mathbf{p}_{0,2}$ are the two points having the maximum distance among the feature points in the initial frame, and $\mathbf{p}_{i,1}$ and $\mathbf{p}_{i,2}$ are the corresponding feature points in frame $i$.
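The per-frame update of Equation (6) is a one-line ratio. A minimal sketch (function and argument names are illustrative):

```python
import numpy as np

def adaptive_scale(S0, p0_pair, pi_pair):
    """Per-frame scale factor of Eq. (6): rescale the initial factor S0
    by the ratio of the tracked feature-pair distance in the initial
    frame to that in frame i."""
    d0 = np.linalg.norm(np.asarray(p0_pair[0]) - np.asarray(p0_pair[1]))
    di = np.linalg.norm(np.asarray(pi_pair[0]) - np.asarray(pi_pair[1]))
    return S0 * d0 / di
```

For example, if the UAV drifts toward the structure so that the pixel distance between the two features grows from 100 to 102 px, the scale factor shrinks by the same ratio.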

Rolling Shutter Compensation
Besides adjusting the scale factor in each frame, the rolling shutter effect, which causes geometric distortions such as skew, should be compensated. The rolling shutter (RS) is a type of image acquisition in which each row of the image is scanned sequentially by the complementary metal-oxide semiconductor (CMOS) sensor [19]. Due to this sequential readout, if the relative velocity between the UAV and the object is large, the error becomes significant; compensation must be made to enable accurate system identification. Figure 3 illustrates the readout time and exposure time of each scanning line (row) of the image in the CMOS sensor. The readout time is the time required to digitize a single row of pixels and is determined by the speed of the A/D conversion and the number of rows in an image frame. The exposure time is the length of time the CMOS sensor is exposed to light. Assuming the frame rate $f_s$ of the camera is constant, the readout time and the maximum processing time delay can be expressed as

$$t_r = \frac{1}{f_s h}, \qquad t_{d,max} = h \, t_r = \frac{1}{f_s} \quad (7)$$

where $t_r$ is the readout time for each line, $t_{d,max}$ is the maximum time delay, and $h$ is the height (total number of lines) of the image. The distortion $d_{i,RS}$, in pixels, due to the rolling shutter in the i-th line relative to the first line can then be written as:

$$d_{i,RS} = V_{rel} \, (i-1) \, t_r \quad (8)$$

where $V_{rel}$ is the relative velocity between the camera and the structure in the direction in which the rolling shutter distortion occurs. $V_{rel}$ was obtained by optical flow using sequential image frames.
To compensate for the RS effect, $d_{i,RS}$ was subtracted from the feature points in the i-th line.
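The correction can be sketched as a per-point shift computed from each feature's image row. The example below assumes, for illustration, that the distortion acts along the horizontal (column) direction and that rows are indexed from zero; in practice the shift is applied along the direction of the relative motion:

```python
import numpy as np

def rolling_shutter_correction(points, v_rel, fs, h):
    """Shift each tracked feature point by the rolling-shutter distortion
    d = v_rel * i * t_r (Eq. (8), zero-based line index i), where
    t_r = 1 / (fs * h) is the per-line readout time of Eq. (7)."""
    t_r = 1.0 / (fs * h)                   # readout time per line [s]
    pts = np.asarray(points, dtype=float)  # (N, 2) array of (row, col)
    d = v_rel * pts[:, 0] * t_r            # distortion in pixels per point
    corrected = pts.copy()
    corrected[:, 1] -= d                   # subtract along the scan direction
    return corrected
```

For instance, with fs = 25 fps, h = 1080 lines, and a relative velocity of 2700 px/s, a feature on line 540 is shifted by 54 px, half the worst-case distortion of one frame period.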


Natural Excitation Technique (NExT) for System Identification
As described in Section 2.2, the displacement obtained by a UAV, denoted in Equation (4), comprises both the movement of the target structure and that of the UAV. Rather than identifying the UAV's movement to recover the absolute structural displacement, the proposed method uses relative displacements at multiple locations of interest on the structure to conduct system identification.
To conduct system identification using the multiple relative displacements of the target structure obtained by the UAV, the cross-correlations between the relative displacements are used. Provided that the natural frequencies of the target structure do not overlap with the frequencies induced by the UAV's non-stationary drift, the correlations between the relative displacements can be used for structural system identification.
Consider the 2D displacement $\mathbf{D}_i$ extracted at a point designated $i$ in the image:

$$\mathbf{D}_i(t) = \begin{bmatrix} x_i(t) + t_x(t) + m_i(t) \\ y_i(t) + t_y(t) + n_i(t) \end{bmatrix} \quad (9)$$

where $\mathbf{D}_i$ is the decoupled matrix of the 2D displacement; $x_i(t)$ and $y_i(t)$ are the true absolute displacements of the target structure in the x and y axes at point $i$ in the image; $t_x(t)$ and $t_y(t)$ are the translational movements of the UAV; and $m_i(t)$ and $n_i(t)$ are the measurement noise in the x and y axes, respectively. The cross-correlation functions between $\mathbf{D}_1$ and $\mathbf{D}_2$ are evaluated as

$$\mathbf{R}_{D_1 D_2}(\tau) = E\left[ \mathbf{D}_1(t+\tau)\, \mathbf{D}_2(t)^T \right] \quad (10)$$

Provided that the dynamic translational movement of the UAV has no correlation with the relative displacements of the structure or with the measurement noise, the cross-correlation $\mathbf{R}_{D_1 D_2}$ between the two 2D relative displacements can be expressed as

$$\mathbf{R}_{D_1 D_2}(\tau) = \begin{bmatrix} R_{x_1 x_2}(\tau) + R_{t_x t_x}(\tau) & R_{x_1 y_2}(\tau) + R_{t_x t_y}(\tau) \\ R_{y_1 x_2}(\tau) + R_{t_y t_x}(\tau) & R_{y_1 y_2}(\tau) + R_{t_y t_y}(\tau) \end{bmatrix} \quad (11)$$

Because the cross-correlations of the translational motion (i.e., $R_{t_x t_x}$ or $R_{t_y t_y}$) do not involve the dynamic characteristics of the target structure, the cross-correlations between relative displacements can be used for system identification with the natural excitation technique (NExT), an output-only modal analysis method (James et al. [25]). NExT uses the correlation function $\mathbf{R}_{D_i D_{ref}}(\tau)$ to express the structural equation of motion in terms of $\mathbf{M}$, $\mathbf{C}$, and $\mathbf{K}$ as:

$$\mathbf{M}\, \ddot{\mathbf{R}}_{D_i D_{ref}}(\tau) + \mathbf{C}\, \dot{\mathbf{R}}_{D_i D_{ref}}(\tau) + \mathbf{K}\, \mathbf{R}_{D_i D_{ref}}(\tau) = \mathbf{0} \quad (12)$$

where $\mathbf{R}_{D_i D_{ref}}(\tau)$ is the correlation function between the relative displacement obtained at point $i$ and at the reference point in the image, and $\mathbf{M}$, $\mathbf{C}$, and $\mathbf{K}$ are the mass, damping, and stiffness matrices, respectively. To extract the structural modal properties, the eigensystem realization algorithm (ERA) was applied after NExT (see Figure 4), in which the cross-correlations between multiple relative displacements were obtained by applying the inverse Fourier transform of the cross power spectral density (CPSD) functions.
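The key step, obtaining the cross-correlation as the inverse Fourier transform of the CPSD, can be illustrated on synthetic data. In the sketch below, two hypothetical measured displacements share a structural component at the first-mode frequency plus a common non-stationary random-walk "drift" standing in for the UAV motion; the cross-correlation still oscillates at the structural frequency, which survives above the sub-0.5 Hz drift band noted in the paper. All signal parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
fs, T, f0 = 25.0, 120.0, 1.657            # frame rate, duration, 1st mode [Hz]
t = np.arange(0, T, 1 / fs)
drift = np.cumsum(rng.normal(0, 0.05, t.size))   # common low-frequency drift
d1 = np.sin(2 * np.pi * f0 * t) + drift + rng.normal(0, 0.1, t.size)
d2 = 0.8 * np.sin(2 * np.pi * f0 * t + 0.3) + drift + rng.normal(0, 0.1, t.size)

# cross-correlation via the inverse FFT of the cross power spectral density
cpsd = np.conj(np.fft.fft(d1 - d1.mean())) * np.fft.fft(d2 - d2.mean())
xcorr = np.real(np.fft.ifft(cpsd))

# dominant frequency of the correlation function, ignoring the drift band
spec = np.abs(np.fft.rfft(xcorr))
freqs = np.fft.rfftfreq(xcorr.size, 1 / fs)
mask = freqs > 0.5                        # UAV motion is mostly below 0.5 Hz
f_peak = freqs[mask][np.argmax(spec[mask])]
```

Here `f_peak` recovers the structural frequency near 1.657 Hz even though both raw signals are dominated by drift; in the full method this correlation function is fed to ERA.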


Experimental Setup
An experiment was designed to verify the proposed method. More specifically, the purpose was to verify whether the proposed method could allow a commercial-grade UAV and its attached camera to accurately conduct system identification of a target structure by effectively reducing the errors caused by the varying scale factor and the rolling shutter effect. A lab-scale test was carried out on a six-story building model. The six-story building model is a shear building that has equal masses at every floor and is fixed on a uni-directional shaking table (see Figure 5). The six natural frequencies of the shear building model are 1.657, 5.04, 8.14, 10.83, 12.93, and 14.34 Hz. To excite the various structural vibration modes, band-limited white noise (BLWN) was used as the input motion during this experiment.
The DJI Phantom 3, a commercial-grade UAV, was used for the test. The camera installed on the UAV has 1080p resolution and a frame rate of 25 fps. In addition, the gimbal that attaches the camera to the UAV embeds an accelerometer and a gyroscope to stabilize the camera against 3-axis rotation. The UAV hovered at a distance of 2 m from the structure to avoid collision.
As a reference, a stationary camera (an LG G3 smartphone) was installed on the ground 1 m from the structure to extract the absolute structural displacements. The video from the stationary camera was originally recorded at 60 fps at 1080p and resampled at 25 fps for comparison. In addition, accelerometers were deployed on each story of the shear building model to compare the system identification results. The sampling rate for the accelerations was 1024 Hz.

Analyzed Adaptive Scale Factor
As discussed in Section 2.4, the scale factor for each image taken by the camera changes over time due to the z-direction motion of the UAV. The scale factor was calculated automatically for each frame using the proposed method. Figure 6 shows that the scale factor varied from 0.925 to 0.981 during 30 s of video, leading to a maximum error of 5.7% in the displacement conversion from pixels to mm.



Comparison of the Dynamic Responses
Using the proposed procedure, the relative displacements of the six-story building model were obtained from the UAV, analyzed, and compared with those from the stationary camera on the ground. Figure 7 shows that there was a large discrepancy between the relative displacements obtained from the UAV and the absolute displacements obtained from the stationary camera. It is reasonable to conclude that the motion of the UAV causes the drifts occurring in all six stories, since the trends are consistent across the floors. Figure 8 shows the CPSDs of the responses from the 5th and 6th floors of the shear building model measured by the UAV, along with two reference measurements from the stationary camera and the accelerometers. The cross power spectral densities from the UAV agreed very well with the references in the frequency domain, showing three clear natural frequencies, even though the relative displacements obtained from the UAV differed significantly from the absolute displacements from the stationary camera. Note that the CPSD obtained from the UAV had a large discrepancy in the DC component relative to the references, due to the motion of the UAV. Based on the CPSDs, NExT-ERA was then performed to conduct system identification; Table 1 and Figure 9 compare the results, from which the following observations can be made.



• Assuming that the system identification results from the accelerometers were the most reliable, the results from both the stationary camera and the UAV showed a maximum error of around 1% in natural frequency estimation. The comparison indicates that the proposed method using the UAV provides natural modes as accurate as those obtained from the stationary camera and accelerometers.
• Mode shapes extracted by the proposed method using the UAV were compared with those from the stationary camera and the accelerometers in terms of the modal assurance criterion (MAC); all three mode shapes showed over 99% consistency with the reference, as shown in Figure 9.
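The MAC comparison used above reduces to a single normalized inner product between two mode-shape vectors; a minimal sketch:

```python
import numpy as np

def mac(phi1, phi2):
    """Modal assurance criterion between two mode-shape vectors:
    MAC = |phi1^T phi2|^2 / ((phi1^T phi1)(phi2^T phi2)).
    Equals 1.0 for identical shapes up to scaling, 0.0 for orthogonal ones."""
    num = np.abs(phi1 @ phi2) ** 2
    return num / ((phi1 @ phi1) * (phi2 @ phi2))
```

Because the MAC is scale-invariant, it compares mode shapes from sensors with different (and unknown) calibration, which is exactly the situation when matching UAV-derived shapes against accelerometer-derived ones.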


Discussion
The proposed method was able to cancel out the three translational motions (i.e., t_x, t_y, and t_z) without directly calculating the camera motion. As shown in Section 2, the adaptive scale factor introduced in Equation (6) resolves the issue of out-of-plane motion (Z-direction) of the camera. The cross-correlations of the projected displacements introduced in Equation (11) cancel out the other two translational motions (X and Y directions) of the camera.
In this study, the validation test was conducted in an indoor laboratory where wind effects were negligible. In this environment, the assumption of small rotational motion of the camera held. However, when subjected to strong wind, the assumption of small rotational motion no longer holds, and the correlations will not be invariant to the camera motion. For example, the displacements at each story are subjected to the same amount of translation when the rotational motion is negligible; conversely, the displacement of each story is subjected to a different amount of translation when the rotational motion is large, resulting in inconsistent correlations. In such cases, a method that calculates absolute displacement, such as that of Yoon et al. [14], should be applied, even though it requires additional stationary points and a longer calculation time.

Conclusions
This paper presents a vision-based method for system identification using a consumer-grade camera on a UAV. Unlike traditional, vision-based system identification, three major problems arise due to the non-stationary motion of the UAV: (1) the change in scale factor; (2) the rolling shutter effect; and (3) conducting system identification using the relative displacement obtained from the aerial camera. To address these issues, novel image-processing techniques for compensating the errors and a cross-correlation-based system identification process were proposed.
An image-processing technique is applied to extract relative structural displacements when the camera is subjected to the non-stationary motion of a UAV. Based on the information extracted from the vision-based displacement measurements, the signal is corrected in two steps. First, adaptive scaling is applied to update the time-varying scale factor in each frame. Second, rolling shutter compensation corrects the image distortion caused by the progressive scanning of CMOS sensors. However, the displacements extracted from the camera on the UAV still contain not only the motion of the target structure but also the motion induced by the movement of the UAV. The proposed method eliminates the effect of the UAV's motion on system identification by employing the NExT-ERA method, which uses the cross-correlations between the extracted displacements.
An experimental test was carried out on a six-story shear building model to validate the proposed method. A stationary camera and accelerometers were employed to obtain reference measurements. The shear building model was excited by a shaking table with band-limited white noise. The experimental results show that the change in scale factor due to the out-of-plane motion of the UAV (up to 5.7%) was successfully compensated by the proposed method. Also, the system identification results indicate that the proposed UAV-based method estimated the natural frequencies with a maximum error of 1% and the mode shapes with a minimum MAC value of 99.67% when compared with the reference obtained from the accelerometers. These results show that the proposed method has the potential to identify natural frequencies and mode shapes with reasonable accuracy.
The proposed method can estimate the modal properties of a target structure without identifying the location of the UAV in real time, which is computationally expensive. However, careful attention must be paid to the hovering motion of the UAV, which occupies the 0–1 Hz frequency range. As the UAV's motion is predominantly below 0.5 Hz, natural frequencies located below 0.5 Hz are likely to be missed.