Computer Vision Technique for Blind Identification of Modal Frequency of Structures from Video Measurements

Abstract: Operational modal analysis (OMA) is required for the maintenance of large-scale civil structures. This paper develops a novel non-contact methodology for blind identification of the modal frequencies of a vibrating structure from its video measurement. The proposed methodology has two stages. The first stage extracts the motion data of the vibrating structure from its video using a complex steerable pyramid. In the second stage, principal component analysis (PCA) combined with analytical mode decomposition (AMD) separates the modal frequencies from the motion data. The methodology is validated numerically on a 10-DOF model and then applied to the London Millennium Bridge.


Introduction
In structural health monitoring (SHM), modal analysis of structures is an important task. Operational modal analysis (OMA) relies only on the response data collected from sensors attached to the structure, independent of the force excitation [1]. The accuracy of the identified modal parameters therefore depends on the accuracy of the data collected from those sensors. The sensors commonly used for OMA are contact accelerometers, which add mass to the structure. These mass-loading effects can be corrected, but the corrections are not accurate [2]. With physically attached sensors, the limited spatial resolution of the measurement also critically limits the effectiveness of standard mode shape-based damage detection and localization methods [3]. Non-contact methods of OMA overcome these drawbacks of sensor-based measurement. Microwave interferometers analyze the interference of the waves reflected off the vibrating target surface to obtain the displacement response [4]. A laser Doppler vibrometer (LDV) measures the velocity of a point illuminated by a focused laser beam, using the Doppler shift between the incident light and the scattered light returning to the measuring instrument [5–7]. LDV provides accurate results and can be used to find the modal parameters of inaccessible structures. However, microwave interferometers and LDVs are expensive. Other computer vision methods, such as digital image correlation and three-dimensional point tracking, can estimate modal parameters from a video measurement, but they require a speckle pattern or markers placed on the structure [8,9].
An alternative non-contact measurement system employs a computer vision technique with a digital high-speed video camera, which is low-cost, convenient and suitable for high-resolution measurement. Civil structures such as bridges and other large structures generally have low natural frequencies, so cameras capable of recording video at 30 frames per second (FPS) can be used. The video should be free of disturbances and artifacts. The spatial domain of a video consists of pixel information, such as intensity levels, whereas the temporal domain is the sequence of frames sampled at the frame rate. The pixels of each frame contain the motion data of the objects in the video. This motion can be magnified using phase-based video motion magnification, which magnifies the local motions of objects and translates rather than amplifies the noise present in the video [10,11]. It reveals subtle motions that are hard to perceive with the naked eye. Each frame in the video is decomposed into multiple scales and orientations using complex steerable pyramids [12]. This decomposition gives access to the phase information of each frame, which can be manipulated to magnify the motion in the video. Phase-based video motion magnification serves as the basis for many methods, such as modal identification of simple structures [13] and OMA of a light pole-viaduct system [12,14].
Saravanan et al. [12] used phase-based video motion magnification to identify the modal frequencies of a system. The method in [12] requires the approximate frequency range of the target structure to be known before the modal frequencies can be identified. The current work overcomes this difficulty by using statistical and analytical methods. This paper presents a computer vision-based vibration measurement of structures that uses principal component analysis (PCA) and analytical mode decomposition (AMD) to blindly identify the modal frequencies. The methodology is first validated on a 10 degree of freedom (DOF) numerical model. It is then applied to practical field measurement videos of the London Millennium Bridge, from which the natural frequencies are extracted. The obtained results are in good agreement with the reference sensor values.

Methodologies
Two main methodologies are implemented in this study to obtain the modal frequencies from a camera-based video measurement. Figure 1 shows the flowchart of the proposed method for OMA using non-contact video measurements, with the procedures involved in each step.
Eng. Proc. 2021, 10, 12

Phase Extraction Using Complex Steerable Pyramids
The time history response of a structure can be measured from a video, because the frames contain the temporally displaced intensity of a pixel, represented by I(x + d(x,t)), where x is the pixel coordinate and d(x,t) represents the spatially local and temporally varying motion. The multi-scale, multi-band decomposition technique used to extract the phase d(x,t) encoded in I(x + d(x,t)) is known as the complex steerable pyramid. According to Simoncelli and Freeman [11], the steerable pyramid algorithm initially divides a given image into a high-frequency part and a low-frequency part. The oriented band-pass filters b_p are then applied sequentially to the low-frequency image, followed by downsampling. This forms a pyramid that includes the high-frequency and low-frequency residuals and levels with certain scales and orientations.
The phase d(x,t) of each pixel is extracted by constructing the complex steerable pyramids. This phase contains a temporal mean 2πωx; after removing the temporal mean, we get d'(x,t) = 2πω d(x,t), which can be expressed by modal superposition as a linear combination of modal responses:

$$d'(x,t) = \varphi(x)\,q(t) = \sum_{i=1}^{n} \varphi_i(x)\,q_i(t) \qquad (1)$$

where φ(x) is the mode shape matrix with φ_i(x) as the ith mode shape, q(t) is the modal response vector with q_i(t) as the ith modal coordinate, and n is the number of modes. Equation (1) is overcomplete, with a high spatial dimension (due to the large number of pixels) and a low modal dimension, thus the modal identification problem cannot be solved directly [15]. The dimension of the phase matrix is therefore reduced by PCA, and AMD is then used to separate the signals.
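The relation between local phase and motion can be illustrated in one dimension. The following is a minimal sketch, not the complex steerable pyramid itself: a single complex Gabor filter stands in for one pyramid sub-band, and the synthetic intensity profile, the spatial frequency `omega0`, the filter width `sigma` and the helper names are all assumptions made for illustration.

```python
import numpy as np

def gabor_kernel(omega0, sigma):
    """Complex Gabor filter: Gaussian window times a complex exponential."""
    half = int(4 * sigma)                       # truncate at +/- 4 sigma
    k = np.arange(-half, half + 1)
    return np.exp(-k**2 / (2 * sigma**2)) * np.exp(1j * 2 * np.pi * omega0 * k)

def local_phase_motion(frames, omega0, sigma, pixel):
    """Estimate the motion d(t) at one pixel from the local phase of each frame."""
    g = gabor_kernel(omega0, sigma)
    # Complex band-pass response of every frame, sampled at the chosen pixel.
    resp = np.array([np.convolve(f, g, mode="same")[pixel] for f in frames])
    phase = np.unwrap(np.angle(resp))           # approx. 2*pi*omega0*(x + d(t))
    phase -= phase.mean()                       # remove the temporal mean
    return phase / (2 * np.pi * omega0)         # d'(t) / (2*pi*omega0) = d(t)

# Synthetic 1-D "video": an intensity profile shifted by a sub-pixel motion.
omega0 = 0.1                                    # cycles per pixel (assumed)
fps, n_frames = 30, 300
t = np.arange(n_frames) / fps
d_true = 0.2 * np.sin(2 * np.pi * 1.0 * t)      # 1 Hz motion, 0.2 pixel amplitude
x = np.arange(256)
frames = [0.5 + 0.5 * np.cos(2 * np.pi * omega0 * (x + d)) for d in d_true]

d_est = local_phase_motion(frames, omega0, sigma=8.0, pixel=128)
max_error = np.max(np.abs(d_est - d_true))      # small compared with 0.2 pixel
```

Because the motion here has zero temporal mean, subtracting the mean phase leaves only the part proportional to d(t). A real video would need the full multi-scale, multi-orientation decomposition and a texture with energy in the filter's pass band.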

Principal Component Analysis
The obtained motion matrix is large, but its data can be represented by a small number of principal components. Thus, dimension reduction is accomplished by PCA. The singular value decomposition of the motion matrix d' is

$$d' = U \Sigma V^{T} = \sum_{i=1}^{T} \sigma_i u_i v_i^{T} \qquad (2)$$

where Σ is a diagonal matrix containing T diagonal elements (T is the number of elements), with σ_i as the ith singular value (σ_1 ≥ ... ≥ σ_i ≥ ... ≥ σ_T), and U and V are the matrices of the left and right singular vectors u_i and v_i, obtained by eigenvalue decomposition (EVD) of the covariance matrices of d' (refer to Equations (3) and (4)):

$$d' d'^{T} = U \Sigma^{2} U^{T} \qquad (3)$$

$$d'^{T} d' = V \Sigma^{2} V^{T} \qquad (4)$$

The rank of d' is r if the number of non-zero singular values is r. σ_i is directly related to the ith principal direction vector of d'. For a lightly damped structure whose mass matrix is proportional to the identity matrix (uniform mass distribution), the principal directions converge to the mode shape directions [15]. The structure's active modes, under broadband excitation, are projected onto the r principal components. Empirically, the number of active principal components is small compared with the spatial dimension of the matrix. Thus PCA significantly reduces the dimension of the motion matrix by projecting it linearly onto a small number of principal components:

$$\zeta = U^{T} d' \qquad (5)$$

where ζ is a matrix containing the principal components of d'. PCA is also invertible; d' is recovered from ζ using

$$d' = U \zeta \qquad (6)$$

These principal components contain the information of the dominant frequency modes. The average of these principal components is taken as the input for analytical mode decomposition.
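The dimension reduction described above can be sketched with NumPy's SVD. This is a minimal illustration on a synthetic motion matrix built by modal superposition; the matrix size, the sine-shaped mode shapes, the three frequencies, the damping factor and the rank threshold are assumptions, not values from this study.

```python
import numpy as np

# Synthetic motion matrix d' = Phi q (pixels x frames); all values illustrative.
n_pixels, n_frames, fps = 200, 600, 30
t = np.arange(n_frames) / fps
x = np.linspace(0.0, 1.0, n_pixels)
freqs = np.array([0.5, 1.5, 3.0])               # Hz, hypothetical modal frequencies

# Mode shapes as columns (pixels x modes) and decaying modal coordinates (modes x frames).
Phi = np.stack([np.sin((i + 1) * np.pi * x) for i in range(3)], axis=1)
q = np.stack([np.exp(-0.05 * t) * np.cos(2 * np.pi * f * t) for f in freqs])
d = Phi @ q

# Singular value decomposition of the motion matrix: d' = U Sigma V^T.
U, s, Vt = np.linalg.svd(d, full_matrices=False)

# Number of active components: singular values that are not numerically zero.
r = int(np.sum(s > 1e-8 * s[0]))

# Project onto the r principal directions, then reconstruct the motion matrix.
U_r = U[:, :r]
zeta = U_r.T @ d                                # principal components (r x frames)
d_rec = U_r @ zeta
rel_err = np.linalg.norm(d - d_rec) / np.linalg.norm(d)

# Average of the principal components, the signal the text passes on to AMD.
avg_pc = zeta.mean(axis=0)
```

Here 200 pixel time histories collapse to `r` rows of `zeta` with negligible reconstruction error. With noisy measured data the singular values decay gradually, so the threshold on `s` becomes a judgment call.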

Validation of Proposed Method on Numerical Model
The proposed method, which uses PCA and AMD to identify the modal frequencies, is applied to a 10-DOF model to validate the technique. The 10-DOF model is excited with an initial velocity at the tenth DOF, and the output is collected at all 10 channels in terms of displacement y(t). The 10-DOF system is represented as masses connected by springs, as shown in Figure 2. Among the ten masses, the first (m1) and last (m10) masses are 2 kg, and all other masses are 1 kg, as represented in Figure 2. The stiffness of every spring is 20 kN/m. The damping matrix is taken proportional to the mass matrix. The first four theoretical mode shapes are used to construct a new response y(t), which is the input for the PCA. The PCA gives the number of active components through the eigenvalues of the covariance matrix of the displacement data, and it identifies only four active components. The results are shown in Figure 3 and Table 1.
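The theoretical frequencies and mode shapes of such a spring-mass chain come from the generalized eigenproblem K φ = ω² M φ. The sketch below rests on two assumptions that the text does not confirm: the chain is fixed at both ends (eleven springs for ten masses), and the spring stiffness is taken as 20 kN/m. The variable names are illustrative.

```python
import numpy as np

n = 10
masses = np.ones(n)                 # kg
masses[0] = masses[-1] = 2.0        # first and last masses are 2 kg
k_spring = 20e3                     # N/m, assuming the stiffness means 20 kN/m

# Stiffness matrix of a chain fixed at both ends (assumed layout, n + 1 springs).
K = 2 * k_spring * np.eye(n) - k_spring * np.eye(n, k=1) - k_spring * np.eye(n, k=-1)

# With a diagonal mass matrix, K phi = w^2 M phi becomes the symmetric problem
# A v = w^2 v with A = M^(-1/2) K M^(-1/2) and phi = M^(-1/2) v.
m_inv_sqrt = 1.0 / np.sqrt(masses)
A = K * np.outer(m_inv_sqrt, m_inv_sqrt)
w2, V = np.linalg.eigh(A)                   # eigenvalues w^2 in ascending order
mode_shapes = V * m_inv_sqrt[:, None]       # columns are mass-normalized mode shapes
freqs_hz = np.sqrt(w2) / (2 * np.pi)        # natural frequencies in Hz

# The first four mode shapes, as used in the text to build the reduced response.
first_four = mode_shapes[:, :4]
```

A different boundary condition, such as a free end, changes only the last diagonal entry of `K`, so the sketch adapts easily once the layout in Figure 2 is known.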

Implementation of Proposed Method on Full-Scale Video Measurement of London Millennium Bridge
The proposed method is implemented on a full-scale field measurement to obtain the vibration response of the London Millennium footbridge, also known as the wobbly bridge [17]. It is a steel suspension bridge, as shown in Figure 4, which shows a frame of the video [18] cropped to the region of interest used for the blind identification of modal frequency. The cropped video has a resolution of 480p, with 480 pixels in width and 60 pixels in height. The number of frames used is 600, with a frame rate of 30 FPS. The bridge sways because the pedestrians' walking frequencies match the bridge's natural frequency range. Only three frequencies are detected, possibly because the pedestrians' walking patterns have only three dominant frequencies. The three modes are identified from the EVD plot obtained with the PCA-AMD algorithm. The modal coordinates and their frequency values are presented in Figure 5. The modal coordinates are not accurate and are non-decaying because of the pedestrians' movement. Table 2 compares the estimated results with the sensor data; they are in good agreement, with higher than 99% accuracy. The results indicate that the proposed method can be extended to robust non-contact OMA of other structures.
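A record of 600 frames at 30 FPS lasts 20 s, which gives a frequency resolution of 0.05 Hz and a Nyquist limit of 15 Hz. The sketch below separates a synthetic three-component signal of that length with AMD. It assumes the commonly used Hilbert-transform form of AMD, which this paper does not spell out, and the three frequencies, their amplitudes and the bisecting frequencies are hypothetical values, not the bridge's measured ones.

```python
import numpy as np

def hilbert_transform(x):
    """Hilbert transform via the FFT (imaginary part of the analytic signal)."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.imag(np.fft.ifft(np.fft.fft(x) * h))

def amd_lowpass(x, t, f_b):
    """Part of x whose frequencies lie below the bisecting frequency f_b (Hz)."""
    c = np.cos(2 * np.pi * f_b * t)
    s = np.sin(2 * np.pi * f_b * t)
    return s * hilbert_transform(x * c) - c * hilbert_transform(x * s)

def peak_frequency(sig, fps):
    """Frequency (Hz) of the largest peak in the amplitude spectrum."""
    spectrum = np.abs(np.fft.rfft(sig))
    return np.fft.rfftfreq(len(sig), d=1.0 / fps)[np.argmax(spectrum)]

fps, n_frames = 30, 600
t = np.arange(n_frames) / fps
duration = n_frames / fps               # record length in seconds
freq_resolution = fps / n_frames        # spacing of FFT bins in Hz
nyquist = fps / 2                       # highest resolvable frequency in Hz

# Hypothetical signal with three non-decaying components, (frequency, amplitude).
components = [(0.5, 1.0), (1.5, 0.6), (3.0, 0.3)]
x = sum(a * np.cos(2 * np.pi * f * t) for f, a in components)

# Bisecting frequencies chosen between neighbouring components (assumed known here).
below_1 = amd_lowpass(x, t, 1.0)
below_2 = amd_lowpass(x, t, 2.25)
mode_1, mode_2, mode_3 = below_1, below_2 - below_1, x - below_2
identified = [peak_frequency(m, fps) for m in (mode_1, mode_2, mode_3)]
```

In a blind setting the bisecting frequencies are not known in advance; they would have to be picked from the spectrum of the averaged principal components. The separation is exact here only because every component completes a whole number of cycles in the record.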

Conclusions
This study develops a hybrid output-only OMA algorithm that uses PCA and AMD to blindly extract the modal frequencies and modal coordinates from line-of-sight video measurements of structures. Validation on the 10-DOF dynamic numerical model gave higher than 99% accuracy in detecting the modal frequencies. The proposed methodology was also implemented on practical full-field videos recorded on the London Millennium Bridge, yielding modal frequencies with an accuracy of 99%. The modal coordinates of the bridge are non-decaying because of the external loading.