Vision-Based Vibration Monitoring of Structures and Infrastructures: An Overview of Recent Applications

Abstract: Contactless structural monitoring has in recent years seen a growing number of applications in civil engineering. Indeed, the elimination of physical installations of sensors is very attractive, especially for structures that might not be easily or safely accessible, yet require the experimental evaluation of their conditions, for example following extreme events such as strong earthquakes, explosions, and floods. Among contactless technologies, vision-based monitoring is possibly the solution that has attracted most of the interest of civil engineers, given that the advantages of contactless monitoring can potentially be obtained through simple and low-cost consumer-grade instrumentation. The objective of this review article is to provide an introductory discussion of the latest applications of vision-based vibration monitoring of structures and infrastructures through an overview of the results achieved in full-scale field tests, as documented in the published technical literature. In this way, engineers new to vision-based monitoring and stakeholders interested in the possibilities of contactless monitoring in civil engineering could have an outline of up-to-date achievements to support a first evaluation of the feasibility and convenience for future monitoring tasks.


Introduction
The experimental method, whose founder is considered to be Galileo Galilei (1564-1642), is the basis of all applied sciences and engineering. Its fundamental principle is that science is based on experience, the starting point to formulate scientific laws and the criterion to verify their validity. Inevitably, the experimental method requires the definition of procedures for performing tests as well as technologies and instruments to acquire and eventually post-process the experimental outcomes. If attention is focused on civil engineering, experimental testing involves different levels: materials, structural components, connections and sub-assemblies, small-scale structures, full-scale structures, and infrastructures.
Loading tests on structural materials are common practice to evaluate their mechanical characteristics and performance, e.g., strength and deformation capacity under monotonic or cyclic conditions. Many such material tests follow specific protocols detailed by building codes when used for the qualification of construction materials. Laboratory testing of structural components, connections, sub-assemblies, small structures such as downscaled prototypes, or full-scale building splices is, for example, essential to validate or calibrate prediction models and support the development of innovative solutions. Such structural tests are commonly performed up to different levels of damage or even failure; many testing options are possible depending on the structural aspects being investigated, e.g., from quasi-static monotonic and cyclic loading tests to high-energy dynamic tests such as those performed on shake tables to simulate destructive seismic events.
The situation changes when the size of the structure increases and experimental testing is required to study the behavior of large structures and infrastructures. In this case, high-energy loading is generally not feasible (because of technical and economic limits) [...] conditions on acquired measurements. The abundance of information is inevitable, given the large number of research contributions published in the last decade, as detailed in the third paragraph of this review article. All such information could be overwhelming to structural engineers and stakeholders with no background in vision-based monitoring, although possibly familiar with consolidated monitoring methodologies. Accordingly, the objective of this review article is to provide an introductory discussion of the latest state of the art of vision-based dynamic monitoring of structures and infrastructures through an overview of the results achieved in field monitoring of vibrations for full-scale case studies. In this way, engineers and stakeholders interested in the possibilities of contactless monitoring of structures and infrastructure could have an overview of up-to-date achievements of vision-based techniques to support a first evaluation of the feasibility and convenience for future monitoring tasks.

Monitoring Process
A vision-based system could consist of a set of video cameras connected to a computer running software capable of processing the acquired images in real time, or of a set of video cameras whose recordings are only acquired during monitoring and processed later. Depending on the distance between the cameras and the structure to be monitored, appropriate lenses must be selected to obtain images with adequate resolution, indispensable to track the motion of the selected targets with sufficient accuracy, e.g., [65,66,71,73]. Lamps could be added for conducting measurements in positions with scarce illumination or even at night.
The monitoring process roughly consists of the following phases: (1) installation, i.e., the video cameras equipped with the selected lenses are placed on tripods in the most convenient locations, connected to the computer, and synchronized; for each video camera the targets to be tracked are set (depending on post-processing procedures, they could be, for example, applied markers or existing textures on the structure surface); (2) calibration, i.e., the relationship between the pixel coordinates and the physical coordinates is obtained, usually based on a known physical dimension on the object surface and its corresponding image dimension in pixels; and (3) video acquisition and processing, i.e., the videos are recorded and the motion of each target is tracked in the image sequences; as a result, the displacement time history is given as output. A schematic representation of this simple flowchart is depicted in Figure 1, with the sources of errors and uncertainties discussed in the following paragraph.

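As an illustration of phases (2) and (3), the following minimal sketch tracks a target by template matching and converts its pixel motion into a displacement time history. It is not the procedure of any of the cited works: the function names, the brute-force zero-mean normalized cross-correlation search, the synthetic frames, and the scale factor of 2.0 mm/pixel are all assumptions made for illustration.

```python
import numpy as np


def match_template(frame, template, top_left, search=5):
    """Return the (row, col) of the best zero-mean NCC match near top_left."""
    th, tw = template.shape
    t = template - template.mean()
    best_score, best_pos = -np.inf, top_left
    r0, c0 = top_left
    for r in range(max(r0 - search, 0), min(r0 + search, frame.shape[0] - th) + 1):
        for c in range(max(c0 - search, 0), min(c0 + search, frame.shape[1] - tw) + 1):
            patch = frame[r:r + th, c:c + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p * p).sum() * (t * t).sum())
            score = (p * t).sum() / denom if denom > 0 else -np.inf
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos


def track_displacement(frames, template, start, scale_mm_per_px):
    """Track the target frame by frame; return vertical displacement in mm."""
    rows = []
    pos = start
    for frame in frames:
        pos = match_template(frame, template, pos)
        rows.append(pos[0])
    rows = np.asarray(rows, dtype=float)
    # Image rows grow downward, so upward motion is a negative row change.
    return -(rows - rows[0]) * scale_mm_per_px


# Synthetic example: a textured 8 x 8 target moving vertically on a dark background.
rng = np.random.default_rng(0)
template = rng.random((8, 8))
shifts = [0, 1, 2, 1, 0, -1, -2]  # hypothetical target motion in pixels (rows)
frames = []
for s in shifts:
    frame = np.zeros((64, 64))
    frame[20 + s:28 + s, 30:38] = template
    frames.append(frame)

disp_mm = track_displacement(frames, template, start=(20, 30), scale_mm_per_px=2.0)
print(disp_mm)
```

The search is restricted to a small window around the previous position, which keeps the sketch short but presumes that the target moves only a few pixels between consecutive frames; integer-pixel matching is used here, whereas practical systems usually add sub-pixel refinement.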

[Figure 1: flowchart of the vision-based monitoring process (hardware installation; calibration; video acquisition and processing) and the related sources of errors and uncertainties (hardware; calibration and software; environment).]

Errors and Uncertainties
Unlike other measurement approaches, where the accuracy of the employed sensors/systems is provided by their manufacturers and generally remains stable within assigned operational conditions during a given calibration time span, the accuracy of vision-based systems cannot be related solely to the technical specifications of the video cameras. The accuracy determination in vision-based monitoring is a rather complex problem, as it depends on a multifaceted combination and interaction of different parameters. The sources of errors and uncertainties in vision-based monitoring can be subdivided into three groups: (1) intrinsic to the monitoring hardware, e.g., optical distortions and aberrations in the lenses, limitations in the resolution, and performance of the sensor of the video camera; (2) relevant to the software and calibration/synchronization process, e.g., limitations in the motion tracking algorithm, synchronization lags among cameras, and round-offs in camera calibration; and (3) environmental, e.g., influence of the location where the camera is installed, vibrations induced in the camera-tripod system, variable ambient light, and non-uniform air refraction due to variable temperatures between installed cameras and the structure being monitored. These sources inevitably influence each other; for example, the resolution of the hardware influences the precision that can be achieved in the calibration, which is in turn influenced by the environmental conditions. The scheme depicted in Figure 1 summarizes the possible interactions between the three phases of the vision-based monitoring process and the sources of errors and uncertainties.
Investment can be made in the hardware (high-quality cameras and lenses), in up-to-date software, in efforts to access the most favorable locations for camera installation, and in accurate control of the calibration and synchronization. Nevertheless, the variability of the environmental parameters might still jeopardize the quality of the results; this is a concern especially for long-term field monitoring as required in structural health monitoring, which faces large variations in ambient light, temperature, humidity, wind, and other possible interferences inducing vibrations in the cameras. As a consequence, these sources of errors and uncertainties have a larger impact on vision-based monitoring than on conventional monitoring procedures, in which sensors are in direct contact with the object being monitored.
Different studies on the assessment of the errors and uncertainties in vision-based monitoring can be found in the earlier works, e.g., [65,71], as well as in the recent literature, e.g., [94-107], mostly through theoretical analyses and laboratory testing aimed at evaluating the influence of specific aspects related to the hardware or to external causes. For example, D'Emilia et al. [97] investigated how two different types of camera could influence the accuracy of vibration monitoring based on video acquisition. Both cameras had the same sensor resolution (1280 × 1024 pixels), but different maximum frame rates at full resolution: 25 frames per second (FPS), as found in low-cost consumer-grade cameras, and 2000 FPS, as found in more expensive high-speed industrial cameras. Tests were made in a laboratory under controlled conditions; laser vibrometry as well as contact accelerometers were used to evaluate the accuracy of the vision-based systems. With the slow camera (25 FPS), used together with controlled aliasing techniques, the uncertainty was in the order of 3.4% of the vibration amplitude for vibrations in the range of 10-70 Hz. With the high-speed camera (2000 FPS), the relative uncertainty was 8% of the vibration amplitude in the frequency range of 100-300 Hz and 13% in the frequency range of 400-600 Hz. Given that the frequency range of interest in civil structures and infrastructure is well below 70 Hz, the effect of the acquisition frame rate on the evaluation of the measured amplitude of vibration could be expected to be limited.
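To see why a 25 FPS camera needs aliasing to be handled deliberately for vibrations in the 10-70 Hz range, the short sketch below computes the apparent frequency of a sinusoid after sampling. It only illustrates the standard frequency-folding relation; it does not reproduce the controlled aliasing procedure of [97], and the function name is an assumption.

```python
def aliased_frequency(f_signal_hz, fps):
    """Apparent frequency (Hz) of a sinusoid at f_signal_hz sampled at fps frames per second."""
    nyquist = fps / 2.0
    f_folded = f_signal_hz % fps  # fold into [0, fps)
    return f_folded if f_folded <= nyquist else fps - f_folded


for f in (10.0, 30.0, 70.0):
    print(f, "Hz appears at", aliased_frequency(f, 25.0), "Hz when sampled at 25 FPS")
```

At 25 FPS, vibrations at 30 Hz and 70 Hz both appear at 5 Hz, so the true frequency can only be recovered if the band in which it lies is known beforehand.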
Another example of uncertainty evaluation is a laboratory study by Liu et al. [102], focused on the influence of the distance between the camera and the object being monitored, the focal length of the lens, and the calibration process. The results showed that the uncertainty in the measurements (displacements in the considered tests) increased with distance and decreased with increasing focal length; using a longer known length in the calibration process could greatly reduce the measurement uncertainty; and the measurement uncertainty was more sensitive to the uncertainty of the known length used in calibration than to that of the projection of the known length in the image. However, as the distance increased, the sensitivity to the known length became weaker and the sensitivity to the projection of the known length in the image became stronger. When a longer focal length was used, the influence of the working distance on the sensitivity was weaker. These indications provide useful support in preliminary choices when designing the configuration and installation of a vision-based monitoring system.
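The benefit of a longer known length in calibration can be illustrated with a generic first-order uncertainty propagation for the scale factor s = D/d, where D is the known physical length and d its projection in pixels. This is a textbook sketch assuming uncorrelated inputs, not the model used in [102]; the function name and all numerical values are hypothetical.

```python
import math


def scale_factor_uncertainty(known_length_mm, length_px, u_length_mm, u_length_px):
    """Return the scale factor (mm/pixel) and its relative standard uncertainty."""
    scale = known_length_mm / length_px
    # First-order propagation for a quotient with uncorrelated inputs.
    rel_u = math.hypot(u_length_mm / known_length_mm, u_length_px / length_px)
    return scale, rel_u


# Same pixel scale (10 mm/pixel) and same absolute uncertainties (2 mm, 0.5 pixel),
# but a short versus a long known length.
scale_short, rel_short = scale_factor_uncertainty(500.0, 50.0, 2.0, 0.5)
scale_long, rel_long = scale_factor_uncertainty(2000.0, 200.0, 2.0, 0.5)
print(scale_short, rel_short)
print(scale_long, rel_long)
```

Under these assumptions, a known length four times longer reduces the relative uncertainty of the scale factor by a factor of four, because both relative terms shrink while the absolute uncertainties stay the same.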
In addition to the studies on the assessment of the errors and uncertainties mentioned earlier [65,71,94-107], important information on the accuracy of vision-based systems could be obtained from the outcomes of field applications, as discussed in the following paragraph. Inevitably, monitoring of full-scale structures and infrastructures poses more difficulties in the evaluation of the sources of errors and uncertainties, given the number of possible concurring causes in the field as compared with the laboratory.

General Overview
Many published works presenting applications of vision-based monitoring in civil engineering can be found in the technical literature. Contributions (only refereed journal articles are considered here) can be organized in five areas of monitoring applications: (1) measurements of displacements and strains under static and quasi-static loadings; (2) measurements of displacement time histories in prototypes or small-scale structures in controlled environmental conditions, typically in a laboratory; (3) field measurements of displacement time histories in full-scale structures; (4) development of sensors using vision-based techniques [201-207]; and (5) field measurements of moving components, as in the case of wind turbines, e.g., [208-211]. Such a subdivision is made regardless of the adopted vision-based techniques and image processing algorithms. It should be remarked that overlaps exist between these monitoring applications, as some publications illustrate preliminary laboratory validations prior to field testing. Hence, the proposed subdivision should be considered on the basis of the main contribution provided.
Attention in this review article is given to the analysis of recent results obtained in vibration (displacement time histories) monitoring of civil engineering structures and infrastructures in the field, as documented in refereed journal articles published in the last four years [183-200]. The results presented are subdivided into six structural groups: steel bridges, steel footbridges, steel structures for sport stadiums, reinforced concrete structures, masonry structures, and timber footbridge. For each field study, a short description of the monitored structure is provided, with a summary of the main information and conclusions provided in the publication. A list of the considered applications is reported in Table 1; it is observed that half of them are in the USA and that bridges/footbridges are the most recurring structures.
For each reference, some essential information on the adopted hardware is provided in Table 2, alongside video processing (optical flow, template matching, feature matching, motion magnification, and proprietary commercial software), loading condition during monitoring, as well as comparisons with monitoring using other technologies. In this way, Tables 1 and 2 are intended to serve as a guide to the following paragraphs, each dedicated to one of the six structural groups, presented in the same order used in the tables.
Table 2 (excerpt). Reference; camera, resolution (pixels), frame rate (FPS); video processing; loading condition; comparison.
...; template matching, Imetrum [87]; passage of trains; low-cost and high-end vision-based systems, accelerometers
[193]; GoPro, 1920 × 1080, 30; template matching; crowd of pedestrians; wireless accelerometers
[196]; GoPro, 1920 × 1080, 25; template matching; crowd of pedestrians; accelerometers
[197]; DJI, 3840 × 2160, 30; optical flow; walking, running, jumping; accelerometers
[199]; low cost, 1920 × 1080, 60; feature matching; walking, running, jumping; accelerometers
[184,185,198]; Canon, N/A, 30; ...
It is anticipated that comparisons in all cases provided good correlations between vision-based monitoring and the other considered technologies, with one exception being the steel footbridge (vertical truss frames) tested by Dong et al. [199], where differences between accelerometers and vision-based measurements were not negligible. It should be remarked that, in four cases, no direct comparisons were made: Shariati and Schumacher [183], as well as Feng and Feng [187], compared the magnitudes of the measurements to those obtained in previous tests, concluding that such comparisons were favorable; in Dhanasekar et al. [195], the outcomes of the experimental monitoring were satisfactory compared with numerical simulations in terms of magnitude of the monitored structural parameters; and in Lydon et al. [196], vision-based monitoring was part of an integrated monitoring system that included fiber optics, with the objective of having the two systems complement each other.

Steel Bridges
Feng and Feng [187] presented the outcomes of vision-based field monitoring of the Manhattan Bridge (New York, NY, USA) using a single camera for remote real-time displacement measurements at one single point and simultaneously at multiple points. The Manhattan Bridge, opened to traffic in 1909, is a suspension bridge spanning the East River in New York City, connecting Manhattan and Brooklyn; the main span is 448 m long; the deck is 36.5 m wide, including seven traffic lanes in total and four subway tracks. The camera was placed on stable stone steps around 300 m away from the bridge mid-span and the video recording was made using a frame rate of 10 FPS. The known dimensions (7.2 m) of the vertical trusses were used for camera calibration. Displacement responses at one single point in the mid-span region were measured during the passage of subway trains, having estimated the scale factor as 20.5 mm/pixel. The authors commented that the dynamic displacement response was similar to that measured by GPS and interferometric radar systems in previous studies. Then, by zooming out the lens to obtain a larger field of view (FOV), i.e., the area that is visible in the image, three points in the mid-span region were selected and a scale factor of about 36 mm/pixel was estimated. The authors commented that these measurements displayed more fluctuations, especially for small displacement amplitudes, because the larger FOV decreased the measurement resolution compared with the single-point case. In addition, the authors studied the influence of the camera vibration during the field measurements. This test was conducted by tracking the apparent motion of a building in the background; the camera motion was estimated under the assumption that the building was not moving. The authors concluded that, compared with the bridge displacement, the camera motion was insignificant.
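The relation between the known dimension, the scale factor, and the measurement resolution in this case study can be worked out with simple arithmetic. In the sketch below, the 7.2 m truss dimension and the two scale factors (20.5 and 36 mm/pixel) come from the text above, whereas the pixel lengths are back-calculated and the 0.1 pixel tracking resolution is a purely hypothetical value chosen for illustration.

```python
def scale_factor_mm_per_px(known_length_m, length_px):
    """Scale factor from a known physical length and its length in the image."""
    return known_length_m * 1000.0 / length_px


def pixels_spanned(known_length_m, scale_mm_per_px):
    """Number of pixels covered by a known length for a given scale factor."""
    return known_length_m * 1000.0 / scale_mm_per_px


TRUSS_M = 7.2                # known truss dimension used for calibration
SUBPIXEL_RESOLUTION = 0.1    # hypothetical tracking resolution in pixels

narrow_px = pixels_spanned(TRUSS_M, 20.5)  # single-point test, narrow FOV
wide_px = pixels_spanned(TRUSS_M, 36.0)    # three-point test, wide FOV

print(round(narrow_px, 1), "pixels ->", 20.5 * SUBPIXEL_RESOLUTION, "mm resolvable")
print(round(wide_px, 1), "pixels ->", 36.0 * SUBPIXEL_RESOLUTION, "mm resolvable")
```

Zooming out spreads the same 7.2 m over fewer pixels, so each pixel (and each fraction of a pixel) corresponds to a larger physical displacement, which is consistent with the larger fluctuations reported for the wide-FOV measurements.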
Chen et al. [189] illustrated their application of field vision-based monitoring of the WWI Memorial Bridge, a vertical-lift truss bridge, spanning the Piscataqua River (USA) from Portsmouth, New Hampshire, to Badger's Island in Kittery, Maine, with a total length of 366 m. Measurements of the vibrations due to the lift span impact excitation and normal in-service traffic were made with a single video camera and compared with the results from accelerometers and strain gauges. The camera was placed over 80 m away from the bridge in a nearby park, on a heavy tripod with an accelerometer installed to measure the camera motions. Manual calibration was made based on known dimensions of the structural elements. The videos were processed using a technique inspired by motion magnification and detailed in [140,188]. Two accelerometers and two strain gauges were placed on the bridge to compare the remote video camera measurements with those from the contact sensors. The results for both lift span impact and in-service traffic were identical in terms of peaks in the frequency spectra; in addition, the authors observed that the time-series measurements also compared favorably.
Xu et al. [194] presented their experience in field vision-based monitoring of the Mineral Line Bridge, a skew steel girder bridge with a span length of 14.7 m, carrying the West Somerset Railway near Watchet (UK). Three sensing systems were adopted and compared: one consumer-grade camera, a high-end commercial vision-based system, and two accelerometers. The authors observed that a vision-based monitoring system using a single consumer-grade camera could provide an accurate characterization of the bridge in favorable test conditions, which included choosing salient target patterns for tracking and avoiding any camera shake. To control camera shake, the authors proposed a criterion for evaluating camera stability based on the tracked motions of a stationary target, and applied the correction only when necessary: tracking the nominal motion of an adjacent stationary object was very effective in removing the low-frequency drift error, but could reduce the measurement resolution. In addition, the authors investigated a data fusion method to combine the vision-based measurement with data from accelerometers. Such a method was shown to be capable of denoising the measurement and providing better estimates. Accordingly, the authors concluded that mixed systems consisting of cameras and accelerometers overcame the field testing limitations of vision-based monitoring and had the potential for accurate and robust sensing on bridge structures.
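The idea of correcting camera motion only when a stationary target indicates that it is needed can be sketched as follows. The stability check used here (root mean square of the reference motion against a threshold) and the threshold value are hypothetical stand-ins, since the criterion of [194] is not detailed in this review; the signals are synthetic.

```python
import numpy as np


def correct_camera_motion(target_px, reference_px, rms_threshold_px=0.2):
    """Subtract the apparent motion of a stationary reference if the camera is unstable.

    Returns the (possibly corrected) target motion in pixels and a flag telling
    whether the correction was applied.
    """
    target_px = np.asarray(target_px, dtype=float)
    reference_px = np.asarray(reference_px, dtype=float)
    target_motion = target_px - target_px[0]
    ref_motion = reference_px - reference_px[0]
    rms = np.sqrt(np.mean(ref_motion ** 2))
    if rms <= rms_threshold_px:
        # Camera judged stable: skip the correction to avoid adding tracking noise.
        return target_motion, False
    return target_motion - ref_motion, True


# Synthetic example: 1 Hz structural motion plus a slow drift of the camera.
t = np.arange(0.0, 10.0, 0.1)
structure = 2.0 * np.sin(2.0 * np.pi * 1.0 * t)
drift = 0.3 * t
target = 100.0 + structure + drift
reference = 40.0 + drift

corrected, applied = correct_camera_motion(target, reference)
print(applied, np.max(np.abs(corrected - structure)))
```

Skipping the subtraction when the camera is judged stable reflects the trade-off noted above: the reference track carries its own tracking noise, which would otherwise be added to the measurement.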

Steel Footbridges
Xu et al. [193] illustrated the activities for field vision-based monitoring of the Baker Bridge, a cable-stayed footbridge spanning 109 m over the A379 dual carriageway in Exeter (UK). The bridge provides cyclist and pedestrian access to the Sandy Park Stadium and experiences heavy pedestrian traffic on match days. The bridge comprises a single A-shaped tower that supports the continuous steel deck over a simple support at the pylon cross-beam and via seven pairs of stay cables. Because of the range of frequencies of its first vibration modes, the bridge is prone to noticeable vibration response owing to pedestrian traffic. A consumer-grade camera was mounted on a tripod at the central reservation of the A379 carriageway, below and approximately 55.30 m from the bridge tower. Video recording was done at 30 FPS. Camera calibration was based on the known structural dimensions from the as-built drawings, using a narrow FOV setting. Four triaxial wireless accelerometers were installed on the bridge deck to validate the results obtained from processing the images acquired by the video camera. The monitoring of the bridge included periods when large crowds of spectators crossed the deck. The modal frequencies of the bridge deck identified from vision-based monitoring accurately matched those obtained from the contact accelerometers. In addition, measurements of cable vibration using the vision-based system were performed and compared to the results from two triaxial wireless accelerometers installed on the cables. The authors concluded that the vision-based system works better to capture the lower modal frequencies of cables, whereas the accelerometers provide reliable estimations of higher frequency modes. However, the multipoint deformation data obtained using the vision system proved to be effective for tracking cable dynamic properties at the same time as bridge deformation, allowing for the effect of varying load on cable tensions to be observed. In this way, a powerful diagnostic capability for larger cable-supported structures was achieved.
Lydon et al. [196] presented the field vision-based monitoring of the Peace Bridge over the River Foyle in Derry (Northern Ireland), a self-anchored suspension bridge with a single 96.3 m suspended central span, two suspended 63.4 m (east and west) side spans, and sections not supported by cables carried by guided supports between the side spans and abutments. The bridge was monitored using 14 accelerometers and a low-cost camera installed on a tripod on the east bank of the river, at 71.2 m from the mid-span of the east span. The video camera pointed at the position of an accelerometer to validate the results obtained from video acquisition. Monitoring was performed for ambient input as well as during the flow of a large number of pedestrians during a local event. Camera vibration caused by environmental conditions was removed through image stabilization using a stationary building in the background of the image as a reference point. The authors observed a very good correlation of the measurements achieved from the camera with those obtained from the accelerometer, with the findings considered very promising for this low-cost monitoring system. The authors also commented that the single camera was set up in a matter of minutes, compared with the several hours required to place and run cables to the accelerometers along the 312 m footway.
Hoskere et al. [197] illustrated the field vision-based monitoring of the Little Golden Gate Bridge over Lake of the Woods in Mahomet, Illinois, about 18 km northwest of Champaign, IL (USA). It is a pedestrian suspension bridge made of steel girders and cables with wooden slats on the deck, spanning 67 m with 10 m tall posts on either side; the deck is suspended by cables with hangers at every meter. Bridge vibrations were monitored using a camera installed on an unmanned aerial vehicle (UAV), with thirteen markers affixed at regular intervals on one side of the bridge. For comparison, four accelerometers were installed on the first half-span. The bridge was excited by three pedestrians jumping on the second half-span. The test was conducted in challenging field conditions with wind speed between 25 and 35 km/h. The modal properties as determined by the vision-based approach were compared with the results from accelerometers. The corresponding modal assurance criterion (MAC) values were all above 0.925, and the difference in the natural frequencies was less than 1.6% for all three compared modes. Thus, the authors concluded that these results demonstrate the efficacy of the proposed vision-based approach to conduct modal surveys of full-scale infrastructure. Sophisticated algorithms such as those employed can cope with complex situations, including the processing of video from cameras installed on UAVs for structural monitoring purposes.
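For readers unfamiliar with the MAC, the sketch below computes it for real-valued mode shapes using its standard definition, together with the percentage difference between two natural frequencies. The mode shapes and frequencies are made-up numbers, not data from [197], and the function names are assumptions.

```python
import numpy as np


def mac(phi_a, phi_b):
    """Modal assurance criterion between two real-valued mode shape vectors."""
    phi_a = np.asarray(phi_a, dtype=float)
    phi_b = np.asarray(phi_b, dtype=float)
    return float(np.dot(phi_a, phi_b) ** 2 / (np.dot(phi_a, phi_a) * np.dot(phi_b, phi_b)))


def frequency_difference_percent(f_vision_hz, f_reference_hz):
    """Absolute difference of two natural frequencies, in percent of the reference."""
    return abs(f_vision_hz - f_reference_hz) / f_reference_hz * 100.0


# Hypothetical mode shape sampled at four sensor positions.
shape_accel = np.sin(np.pi * np.array([0.1, 0.2, 0.3, 0.4]))
shape_vision = -2.5 * shape_accel + np.array([0.02, -0.03, 0.01, 0.02])  # scaled, slightly noisy

print(mac(shape_accel, shape_vision))
print(frequency_difference_percent(2.03, 2.00))
```

The MAC is insensitive to the scaling and sign of the mode shapes: values close to 1 indicate consistent shapes, and values close to 0 indicate unrelated ones.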
Dong et al. [199] presented the field vision-based monitoring of a footbridge on a campus in the southeast of the USA, made of vertical truss frames connected via a splice connection in the middle and spanning an entire length of 39 m over a pond; the deck width is 4.17 m, and it serves light pedestrian traffic and small vehicles such as golf carts. A single video camera with a resolution of 1920 × 1080 pixels and a frame rate of 60 FPS was located near one of the abutments and employed to monitor the vertical vibration of the mid-span. Bolts in the truss system were adopted as targets in vision-based monitoring. An accelerometer was installed at mid-span for comparison purposes. The footbridge was excited under different types of human loading (walking, running, and jumping at different paces). The authors highlighted that the differences in the acceleration spectra between vision-based acquisition and contact accelerometer were not always negligible. However, serviceability assessment of the footbridge for the different loading cases provided the same outcomes using vision-based data or accelerometer recordings.

Steel Structures for Sport Stadiums
Khuc and Catbas [184,185,198] illustrated a campaign of field vision-based monitoring of the steel superstructures of a football stadium in the USA with a seating capacity of approximately 45,000 that exhibited considerable vibration levels, especially at the sections of the highly active local team supporters. The vision-based method and framework as implemented by the authors was verified under different experimental conditions, including changing light conditions, different camera locations (distances and angles), and camera frame rates (30 and 60 FPS). Specifically, a beam under the grandstand was selected for monitoring predetermined measurement points. A displacement potentiometer and an accelerometer were installed for comparative purposes. The contact sensors and camera recorded the structural vibrations synchronously during periods of intense crowd motion throughout football games. The authors concluded that the results from vision-based measurements were consistent with those from contact measurements and that the first three operational modal frequencies under a human jumping load were almost the same. In addition, the authors commented that, although quite accurate results for defined measurement ranges and conditions could be achieved through a completely non-contact vision-based implementation with low-cost hardware, some issues such as data storage requirements for clips and images, processing time for image data, and limitations in horizontal displacement measurement needed to be addressed in future developments.
Feng et al. [186] presented the field vision-based monitoring of the Hard Rock Stadium, home to the Miami Dolphins NFL team in Florida (USA). Specific attention was given to the monitoring of the cable forces during the construction phases of a new long-span, cable-supported canopy covering the entire seating bowl. An industrial video camera with a maximum resolution of 1280 × 1024 pixels and a maximum rate of 150 FPS was used with a manual-focus optical lens having a focal length in the range of 16 mm to 160 mm. Considering that tensioned cables in similar civil engineering infrastructure have a fundamental frequency typically under 10 Hz, the authors decided to adopt for video recording a sampling rate of 50 FPS (meaning any frequency beyond 25 Hz would be aliased according to the Nyquist criterion), which would make it possible to capture enough cable vibration components. The vibrations of the four tie-down cables at each quad of the stadium were simultaneously measured using one single camera, while the vibration of the inclined cables was measured by one single camera placed remotely on the seating bowl. It was found that the measured cable forces using the vision-based method agreed with the results from load cell readings installed for validation purposes, with a maximum discrepancy of 5.6%. The authors noted that the noncontact measurement capacity of the vision sensor eliminated the need to access the cable to install sensors, an operation typically highly difficult and risky. Compared with the expensive and time-consuming method of using conventional accelerometers and associated data acquisition systems, it was concluded that the noncontact vision-based acquisition approach represented a convenient low-cost method for either periodic or long-term monitoring of cable-supported structures.
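A common way to obtain a cable force from measured vibrations is to identify the fundamental frequency and apply taut-string theory, T = 4 m L² f₁², where m is the mass per unit length and L the cable length. The sketch below follows this route on a synthetic signal sampled at 50 FPS. The relation actually used in [186] is not stated in this review, so the taut-string formula (which neglects bending stiffness and sag), the cable properties, and the signal are all assumptions.

```python
import numpy as np


def dominant_frequency(displacement, fps):
    """Frequency (Hz) of the largest peak in the one-sided amplitude spectrum."""
    x = np.asarray(displacement, dtype=float)
    x = x - x.mean()  # remove the static offset before the FFT
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fps)
    return float(freqs[np.argmax(spectrum)])


def taut_string_tension(f1_hz, length_m, mass_per_length_kg_m):
    """Cable tension (N) from the fundamental frequency, taut-string assumption."""
    return 4.0 * mass_per_length_kg_m * length_m ** 2 * f1_hz ** 2


FPS = 50.0                     # sampling rate quoted in the text
nyquist_hz = FPS / 2.0         # 25 Hz: components above this would be aliased

# Synthetic cable vibration: 2.5 Hz fundamental plus a weaker second harmonic.
t = np.arange(1000) / FPS
signal = np.sin(2.0 * np.pi * 2.5 * t) + 0.3 * np.sin(2.0 * np.pi * 5.0 * t)

f1 = dominant_frequency(signal, FPS)
tension_n = taut_string_tension(f1, length_m=40.0, mass_per_length_kg_m=10.0)
print(nyquist_hz, f1, tension_n / 1000.0, "kN")
```

With a fundamental frequency under 10 Hz, sampling at 50 FPS leaves room below the 25 Hz Nyquist limit for the first few harmonics, which matches the reasoning given for the choice of the sampling rate.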

Reinforced Concrete Structures
Shariati and Schumacher [183] documented the field vision-based monitoring of the Streicker Bridge, a footbridge on the Princeton University campus (New Jersey, USA) with a straight main deck section supported by a steel truss system underneath and four curved ramps leading up to the straight sections. Structurally, the main span is a deck-stiffened arch and the legs are curved continuous girders supported by steel columns. The legs are horizontally curved and the shape of the main span follows this curvature. The arch and columns are weathering steel, while the main deck and legs are made of reinforced post-tensioned concrete. A consumer-grade camera with a zoom lens was used to acquire a 60 FPS video of one of the ramps while a number of volunteers jumped up and down on it. A target mounted on the edge of the bridge slab was used to track displacement time histories. This target had been set up by a research team from Columbia University that investigated the same footbridge with their own video-based monitoring system [202] a few years earlier. In addition, the Streicker Bridge was equipped with two fiber-optic sensing technologies, i.e., discrete long-gauge sensing, based on fiber Bragg gratings, and truly distributed sensing, based on Brillouin optical time domain analysis; both sensor types were embedded in the concrete during construction. The natural frequencies obtained by the authors in their tests were found to be the same as those measured by the fiber-optic measurement system and by the other vision-based method in [202]. In addition to the frequency contents, the two vision-based measurements gave comparable displacement amplitudes, showing the replicability of the obtained results.
Harvey and Elisha [190] aimed at demonstrating that existing cameras installed within buildings, such as surveillance cameras, might be capable of extracting the structural response by tracking the interstory drifts. To this end, a full-scale five-story reinforced concrete building tested on the unidirectional large high-performance outdoor shake table at the University of California San Diego (UCSD) was used as a case study. The test protocol consisted of six different earthquake ground motions applied to the building. The building was heavily instrumented with an array of analog sensors and cameras. The adopted method involved the extraction of vision-based dynamic displacement measurements from the recorded video footage and the estimation of the dynamic properties of the building to which the cameras were attached. The results showed that the footage captured by these cameras was adequate to identify the natural frequencies of the building vibration during free and forced (seismic) responses.
Lydon et al. [196] presented the field vision-based monitoring of the Governors Bridge, a three-span reinforced concrete beam-slab bridge that crosses the River Lagan to the south of Belfast City (Northern Ireland). The bridge has an overall length of 62.6 m and carries two lanes of westbound traffic from the Annandale embankment to the Stranmillis embankment. The field test was carried out under normal, non-rush hour, vehicular traffic loading on the bridge. Two low-cost wireless action cameras were used for vision-based monitoring; one was located 3.75 m from one of the deck beams to monitor its displacements (determined scale factor 0.0798 mm/pixel) and the second was used to identify the load above the deck. For comparative purposes, a fiber optic displacement gauge with a resolution of 0.03 mm was installed and data acquisition was carried out using a dynamic interrogator at a scanning rate of 25 Hz, synchronized with the video acquisition of the camera set at 25 FPS. The identified displacements based on the two systems showed excellent agreement. Therefore, it was concluded that the same accuracy in displacement measurement could be obtained from the vision sensor as compared with the fiber optic displacement gauge, even if low-cost cameras were adopted.
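To give a sense of what the reported figures imply, the sketch below converts tracked pixel displacements into millimetres using the scale factor of 0.0798 mm/pixel, and estimates the sub-pixel tracking resolution a camera would need in order to resolve the same 0.03 mm as the fiber optic gauge. The function names and the sample pixel values are hypothetical; the calculation is a simple proportionality and is not taken from [196].

```python
SCALE_FACTOR_MM_PER_PX = 0.0798  # scale factor reported for the deck-beam camera
GAUGE_RESOLUTION_MM = 0.03       # resolution reported for the fiber optic gauge


def pixels_to_mm(displacements_px, scale_mm_per_px=SCALE_FACTOR_MM_PER_PX):
    """Convert tracked image displacements (pixels) to physical units (mm)."""
    return [d * scale_mm_per_px for d in displacements_px]


def required_subpixel_resolution(target_mm, scale_mm_per_px=SCALE_FACTOR_MM_PER_PX):
    """Tracking resolution (pixels) needed to resolve target_mm."""
    return target_mm / scale_mm_per_px


# Hypothetical tracked displacements of one point over four frames (pixels).
track_px = [0.0, 2.5, 10.0, -5.0]
track_mm = pixels_to_mm(track_px)

# Roughly 0.38 pixel under the figures quoted above.
needed_px = required_subpixel_resolution(GAUGE_RESOLUTION_MM)
```

In other words, matching the gauge resolution at this scale factor would call for tracking at better than about four tenths of a pixel, which is within reach of common sub-pixel tracking techniques.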

Masonry Structures
Fioriti et al. [191] presented the monitoring of two cultural heritage constructions in Italy, i.e., the temple of Minerva Medica, a ruined nymphaeum of ancient Imperial Rome, and Ponte delle Torri in Spoleto, an aqueduct and pedestrian bridge with multiple arches having a total length of 230 m and piers of height up to 80 m, completed in the Middle Ages and possibly built over Roman ruins. The Minerva Medica ruins are very close to a tramway producing strong vibrations whose effects were clearly evident in the video taken using a low-cost consumer-grade camera at a distance of 9 m. Modal analysis by motion magnification of the field video recordings was performed and compared to the results obtained through conventional contact velocimeters; the differences were limited to just a few percentage points. Satisfactory results were also achieved for the Ponte delle Torri, despite the small level of structural excitation due to the wind action and the low resolution of the adopted video cameras. The authors commented that such results constituted a remarkable starting point for future experimentation and improvements. Indeed, monitoring the ambient vibration of a massive multiple-arch masonry structure under normal conditions through vision-based monitoring appears to be a major successful case study, considering the opposition often encountered when installing contact sensors on cultural heritage structures.
Acikgoz et al. [192] illustrated the field vision-based monitoring of Marsh Lane viaduct, a masonry bridge with multiple arches on the Leeds-Selby route (UK) carrying two electrified train tracks. A commercial vision-based system was used to monitor the displacements of two consecutive arches of the viaduct and complemented a fiber optic system installed in the bridge. The objective was to estimate rigid body rotations of the monitored masonry arch segments. The vision-based system consisted of two video cameras and a system controller. The cameras recorded videos of the monitored structure at 50 FPS. Data processing consisted of tracking the sub-pixel position of natural brick texture in the image and scaling pixel movements to metric movements with the use of a new registration technique proposed by the authors [192]. In order to understand the viaduct behavior, two different camera location configurations were investigated with two-dimensional DIC. In the first configuration, the cameras monitored planar movements of two arches in the vertical plane directly under the northern tracks, aligned with the bridge longitudinal axis. In the second configuration, the cameras monitored the movements in the vertical planes lying under the northern and southern tracks. In both configurations, the cameras were positioned centrally in line with the crown of the arches. This setup allowed capturing all the targets with a declared 0.08 mm resolution in each plane, using the natural brick texture in the image for motion tracking. As mentioned at the beginning of this paragraph, the installed vision-based monitoring was part of an integrated monitoring system that included fiber optics, so that the two systems could complement each other and provide a comprehensive description of the structural response and of the damage mechanisms activated.
The authors commented that the quasi-distributed nature of data allowed extensive measurements of time histories of displacements, rotations, and strains under the transit of trains. Such extensive measurements provided unique data that enabled new insight into understanding of the rigid body motions and damage mechanisms of the viaduct.
Dhanasekar et al. [195] presented field vision-based investigations on two masonry arch rail bridges in Australia. Digital images of speckled patches in three key regions (crown, support, and quarter point) of one-half of an arch were acquired from three independent cameras, each focusing on one of the patches from a distance of approximately 4 m during the passage of trains at night. Images were acquired using industrial monochrome cameras at 50 FPS. The time histories of the deflections and strains were measured. The wheel positions, train lengths, and speeds were ascertained using three lasers. The wheel position was identified to be the critical element for the deflection and strain in the arch. A three-dimensional finite element model was implemented to compare the field strain magnitudes obtained in the vision-based monitoring to those from numerical simulations, obtaining a favorable agreement. The authors concluded that vision-based monitoring was a suitable method to measure deflections and strains on masonry arch rail bridges, provided adequate care is taken to ensure the quality of the images.

Timber Footbridge
Fradelos et al. [200] illustrated the field vision-based monitoring of the Kanellopoulos timber arch footbridge (Patras, Greece), 30 m long and 2.9 m wide, made of glulam wood and metallic elements. The omission of X-bracing below the deck and poor construction of the metal X-bracing at its roof made the footbridge prone to lateral oscillations. The bridge was monitored using satellite systems, robotic theodolites, and accelerometers. Videos were recorded during testing using common low-cost cameras, without any initial intention of using them for vision-based monitoring. Such video recordings were later examined and used to try to estimate the dynamic horizontal deflections of specific points of the footbridge. It was shown that the analysis of low-cost video images using a simple approximate technique permitted the reconstruction of the movements of the bridge and the computation of some of its structural characteristics. This result was possible under ideal conditions: the movement was two-dimensional, displacements of the selected target points were characterized by a signal exceeding the pixel resolution, the camera was in a fixed position and the video image covered stable points defining a reference system, and structural elements near the selected target points allowed the photo to be scaled along the two examined axes. As a result, the first lateral natural frequency of the footbridge obtained from video processing differed by less than 2% from that estimated using accelerometers and geodetic sensors.
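Frequency comparisons of the kind reported above presuppose that a dominant frequency is extracted from each displacement time history. A minimal sketch of one common way to do this, peak picking on the FFT amplitude spectrum, is given below. It is not claimed to be the procedure used in [200]; the synthetic 1.5 Hz signal, the 30 FPS frame rate, the noise level, and the function names are all hypothetical.

```python
import numpy as np


def dominant_frequency(displacement, frame_rate_fps):
    """Frequency (Hz) of the largest non-DC peak in the amplitude spectrum."""
    x = np.asarray(displacement, dtype=float)
    x = x - x.mean()                       # remove the static offset
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / frame_rate_fps)
    peak = 1 + int(np.argmax(spectrum[1:]))  # skip the DC bin
    return float(freqs[peak])


def relative_difference_percent(estimate, reference):
    """Absolute difference between two estimates, in percent of the reference."""
    return abs(estimate - reference) / abs(reference) * 100.0


# Synthetic lateral displacement record: 20 s at 30 FPS, 1.5 Hz plus noise.
fps = 30.0
t = np.arange(600) / fps
rng = np.random.default_rng(0)
signal_mm = 2.0 * np.sin(2.0 * np.pi * 1.5 * t) + 0.2 * rng.standard_normal(t.size)

f_video_hz = dominant_frequency(signal_mm, fps)
```

The frequency resolution of this estimate is the inverse of the record duration (0.05 Hz for the 20 s record assumed here), so short video clips limit how finely two estimates can be compared.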

Discussion
The overview of the vision-based field applications presented in this review article led to the following remarks involving four main aspects: camera installation, hardware, software, and hybrid contact-contactless solutions.
Regarding the camera installation, the possibility to place the video camera in a good vantage point, both stable and allowing views of the structural displacements with few perspective distortions, appears to be the most important aspect in the considered applications. If this is the case, good results can be achieved even with low-cost video cameras and simple video processing algorithms. This condition inevitably sets the inherent limits of video-based monitoring: only points that are clearly visible from the video camera can be monitored; major difficulties are expected in locating good vantage points in urban environments, for example, when monitoring tall buildings in crowded downtown areas.
Regarding the hardware, it is essential to choose the appropriate camera lens so that the obtained field of view is suitable for testing. In fact, the sensitivity is controlled by the scale factor (the ratio between the physical displacement and the corresponding number of pixels in the recorded image); a lower scale factor results in a higher measurement resolution and in lower noise. Accordingly, narrower fields of view (zooming in the lens) provide better resolution of the monitored structure, hence decreasing the scale factor. On the other hand, a wider field of view (zooming out the lens) provides less resolution of the monitored structure and reduces the quality of the displacement measurements, even if more monitoring points can be identified and tracked with the same camera. Another way to reduce the scale factor is to use a higher camera resolution, i.e., more pixels for the same displacement. However, the increase in the size of the digital image would be demanding in terms of video footage storage and post-processing; the latter point might compromise the possibility of real-time processing. Nevertheless, the achievable resolution in video-based monitoring is meaningful only in relation to the magnitude of the structural response. In the examined case studies, there were situations with limited resolution in absolute terms that, however, led to satisfactory results, as the monitored structure had important displacements when excited, e.g., lively footbridges under heavy pedestrian traffic, bridges under train passages, and masonry structures close to subways.
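The dependence of the scale factor on the lens can be illustrated with a short sketch. It assumes an ideal pinhole camera whose optical axis is perpendicular to the plane of motion, so that the scale factor is approximately the pixel pitch multiplied by the camera-to-target distance and divided by the focal length; alternatively, the scale factor can be obtained from an object of known size visible in the image. The function names and all numerical values (pixel pitch, distance, focal lengths, reference dimension) are hypothetical.

```python
def scale_factor_from_optics(pixel_pitch_mm, distance_mm, focal_length_mm):
    """Scale factor (mm/pixel) for a pinhole camera facing the target squarely."""
    if min(pixel_pitch_mm, distance_mm, focal_length_mm) <= 0:
        raise ValueError("all inputs must be positive")
    return pixel_pitch_mm * distance_mm / focal_length_mm


def scale_factor_from_known_dimension(known_length_mm, length_in_image_px):
    """Scale factor (mm/pixel) from an object of known size seen in the image."""
    if known_length_mm <= 0 or length_in_image_px <= 0:
        raise ValueError("all inputs must be positive")
    return known_length_mm / length_in_image_px


# Hypothetical setup: 4.8 micron pixels, target 10 m away.
wide_mm_per_px = scale_factor_from_optics(0.0048, 10_000.0, 50.0)   # 50 mm lens
zoom_mm_per_px = scale_factor_from_optics(0.0048, 10_000.0, 100.0)  # 100 mm lens

# Hypothetical reference: a 300 mm wide plate spans 400 pixels in the image.
plate_mm_per_px = scale_factor_from_known_dimension(300.0, 400.0)
```

Under these assumptions, doubling the focal length halves the scale factor, at the price of a field of view that is half as wide in each direction.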
Another aspect involving the hardware discussed in the field applications considered in the presented overview is the image sampling rate. The maximum frame rate of most conventional video cameras is in the range of 30 to 60 frames per second; such speeds are indicated as sufficient for most civil engineering structures. Industrial video cameras are available with much higher speeds; however, such speeds do not find application for field monitoring in civil engineering, mostly owing to low frequency contents of structures and infrastructures, as well as the fact that, the higher the frame rate, the more difficult it is to achieve real-time processing.
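The adequacy of a given frame rate can be checked with the folding rule of sampling theory: a vibration component above half the frame rate does not disappear but shows up in the video at a lower, aliased frequency. The sketch below computes this apparent frequency; the function name and the example frequencies are hypothetical.

```python
def apparent_frequency(true_hz, frame_rate_fps):
    """Frequency (Hz) at which a sinusoid appears after sampling at frame_rate_fps."""
    if true_hz < 0 or frame_rate_fps <= 0:
        raise ValueError("true_hz must be >= 0 and frame_rate_fps > 0")
    folded = true_hz % frame_rate_fps
    # Fold the result back into the range [0, frame_rate / 2].
    return min(folded, frame_rate_fps - folded)


# A 2 Hz footbridge mode is captured correctly by a 30 FPS consumer camera.
ok_hz = apparent_frequency(2.0, 30.0)

# A hypothetical 22 Hz local mode recorded at 30 FPS would appear at 8 Hz.
aliased_hz = apparent_frequency(22.0, 30.0)
```

This is why frame rates of 30 to 60 FPS are adequate for structures whose relevant frequency content stays well below 15 to 30 Hz, whereas higher local modes would call for faster cameras.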
Regarding the software, there are many possibilities in image processing given the number of algorithms available in the technical literature. Template matching and feature matching appear to be the most common approaches. However, recent motion magnification algorithms were tested for field applications and provided very interesting results, even with stiff and massive structures.
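For readers unfamiliar with template matching, the following is a minimal sketch of the idea, assuming grayscale frames stored as NumPy arrays: a small patch taken from a reference frame is searched for in a later frame by maximizing the normalized cross-correlation, and the shift of the best match gives the displacement in whole pixels. It is a brute-force illustration with hypothetical function names and synthetic images; it is not the algorithm of any specific study reviewed here, and practical implementations are much faster and add sub-pixel refinement.

```python
import numpy as np


def match_template_ncc(frame, template):
    """Return ((row, col), score) of the best normalized cross-correlation match."""
    fh, fw = frame.shape
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    best_score, best_pos = -np.inf, (0, 0)
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            patch = frame[r:r + th, c:c + tw]
            p = patch - patch.mean()
            denom = t_norm * np.sqrt((p ** 2).sum())
            if denom == 0.0:
                continue  # flat patch: correlation undefined
            score = float((p * t).sum() / denom)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score


# Synthetic textured reference frame and a copy shifted by (+3, -2) pixels.
rng = np.random.default_rng(0)
frame0 = rng.random((40, 40))
frame1 = np.roll(frame0, shift=(3, -2), axis=(0, 1))

# Template: an 8 x 8 patch whose top-left corner is at (10, 12) in frame0.
origin = (10, 12)
template = frame0[10:18, 12:20]

position, score = match_template_ncc(frame1, template)
displacement_px = (position[0] - origin[0], position[1] - origin[1])
```

Multiplying the pixel displacement obtained in this way by the scale factor of the setup gives the physical displacement of the tracked point.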
The final remark concerns the fact that some field applications used vision-based monitoring together with conventional contact sensors. In most cases, such combined use served to compare or validate the vision-based monitoring. However, some studies highlighted significant benefits in combining the results obtained from two such different technologies by means of appropriate data fusion methods. In this way, it is possible to successfully combine the benefits of each technology in a hybrid contact-contactless monitoring system.

Conclusions
A general review of the vision-based approach as a prominent methodology for contactless monitoring of civil engineering structures and infrastructures was provided. Specific attention was given to the overview of recent applications in field monitoring of the structural dynamic response of full-scale case studies. From the examined articles, the following main conclusions can be drawn: (1) vision-based monitoring might be able to provide results equivalent to those obtained with consolidated monitoring technologies such as the use of contact accelerometers and displacement transducers; (2) vision-based monitoring appears to be the most convenient solution for monitoring cable structures and, more generally, those structures and infrastructures with elements where the installation of contact sensors is demanding; (3) successful applications of vision-based monitoring depend on the combination of the adopted hardware-software system (video camera and lens, tripod, monitoring of camera movements, video processing algorithms for motion tracking, and motion magnification) and the influence of the environment (accessibility of favorable locations for installing the video camera, weather conditions, and their variability during video acquisition); (4) hybrid monitoring schemes combining contact sensors and contactless vision-based approaches appear to be very interesting solutions that benefit from the advantages of each of the two approaches, without the limitations inherent to the use of a single technology; and (5) the use of vision-based technologies for long-term or permanent monitoring is to date an unexplored field of application.

Acknowledgments:
The author acknowledges the constructive comments of the anonymous reviewers, which helped improve this review article.