Computer Vision and Image Processing in Structural Health Monitoring: Overview of Recent Applications

Structural deterioration is a primary long-term concern resulting from material wear and tear, events, solicitations, and disasters that can progressively compromise the integrity of a cement-based structure until it suddenly collapses, becoming a potential and latent danger to the public. For many years, manual visual inspection has been the only viable structural health monitoring (SHM) solution. Technological advances have led to the development of sensors and devices suitable for the early detection of changes in structures and materials using automated or semi-automated approaches. Recently, solutions based on computer vision, imaging, and video signal analysis have gained momentum in SHM due to increased processing and storage performance, the ability to easily monitor inaccessible areas (e.g., through drones and robots), and recent progress in artificial intelligence fueling automated recognition and classification processes. This paper summarizes the most recent studies (2018–2022) that have proposed solutions for the SHM of infrastructures based on optical devices, computer vision, and image processing approaches. The preliminary analysis revealed an initial subdivision into two macro-categories: studies that implemented vision systems and studies that accessed image datasets. Each study was then analyzed in more detail to present a qualitative description related to the target structures, type of monitoring, instrumentation and data source, methodological approach, and main results, thus providing a more comprehensive overview of the recent applications in SHM and facilitating comparisons between the studies.


Introduction
Health monitoring is a critically important issue, not only for people but also for civil infrastructure throughout its lifetime. Bridges, railways, tunnels, roads, and the various buildings distributed across the land play a crucial role in economic growth [1,2] and the public life of cities [3]. Structural deterioration is one of the major long-term concerns regarding civil infrastructure, as it can become a potential and latent risk factor for human safety over time. Indeed, the wear and tear of materials due to weathering, continuous solicitations, or unforeseen events (such as natural disasters) can progressively compromise the integrity of a structure to the point of sudden collapse or loss of functionality, with severe consequences [4]. Therefore, periodic integrity inspections of civil infrastructure are crucial to ensuring its safety and full efficiency. In the past, on-site manual visual inspections were the conventional method for detecting damage and structural alterations in civil infrastructure. However, on-site visual inspections conducted by experienced examiners are often impractical, time-consuming, and laborious, especially for large-scale structures such as bridges and buildings [5,6]. Moreover, in many cases, the assessment results largely depend on the inspectors' personal competence and subjective evaluation [7].
To overcome the difficulties and limitations of manual visual inspections, structural health monitoring (SHM) has received significant attention over the last two decades as a powerful emerging diagnostic tool for evaluating structural integrity, driven by advances in technology. Evidence of this growing interest was objectively demonstrated in reference [8]. An in-depth analysis of articles dealing with SHM topics, published in three major journals during 2005-2019, showed a progressive and significant upsurge in the number of published articles, from 22 in 2005 to 495 in 2019 (increase factor: 22.5), along with a significant increase in the number of authors involved in this research (increase factor: 28).
The primary aim of SHM is to collect objective measurements and information related to specific structural properties in order to trigger timely maintenance interventions and prevent serious consequences. In practice, an SHM system includes sensors and data acquisition modules, along with algorithms for signal processing, structural diagnosis, and damage detection [9]. In recent years, different types of sensors and techniques have been proposed for SHM. Traditional approaches involved contact-based sensors applied directly to the structure under investigation, including accelerometers, strain gauges, fiber-optic sensors, piezoelectric sensors, ultrasonic sensors, displacement sensors, and others, as reported in reference [10]. However, these wired and wireless sensors are often impractical to install and maintain on large-scale structures, and they typically provide only sparse measurements at their points of application.
In the past few years, SHM approaches have shifted from conventional contact-based solutions to more efficient and practical non-contact sensors, partly due to recent advances in innovative technologies. In particular, optical sensors, drones, robots, and smartphones, combined with artificial intelligence (AI) techniques such as deep learning models, machine learning methods, data mining, and data-fusion/sensor-fusion approaches, are attracting growing interest; they overcome the practical limitations of contact-based sensors and are fostering a paradigm shift in SHM [11][12][13][14][15].
The in-depth analysis in [8] confirms this paradigm shift in supporting sensors. Comparing the top 20 topics covered by studies published in three five-year periods (2005-2009, 2010-2014, 2015-2019) shows, for example, that computer vision (CV), empowered by optical sensors, appeared only in the third period, while machine learning, which entered the top 20 in the second period, made a significant leap forward in the third period, ranking eighth. In contrast, the "damage type" topic has been addressed regularly since 2005; however, its ranking has recently risen significantly (from 14th to seventh place), probably due to the spread of automatic classification methods. The 2015-2019 ranking still does not mention deep learning (DL), likely because this methodology has been explored intensively in SHM only more recently and was ranked lower than other, more established topics [16].
As previously pointed out, CV-based solutions are proving to be promising and powerful tools for the SHM of large-scale structures thanks to recent developments in optical sensors (high-performance cameras), supporting devices (e.g., drones and robots), and enhanced image processing techniques that take advantage of machine and deep learning [17][18][19][20][21][22][23]. These solutions have been shown to be readily implementable as an alternative to manual visual inspections for estimating the properties and integrity of structures in several SHM areas, including the assessment of specific local (e.g., cracking, spalling, corrosion, and delamination) and global (e.g., vibration, deformation, and displacement) structural conditions [24][25][26][27].
Moreover, CV-based solutions offer several advantages over conventional approaches, including non-contact and long-distance measurements, portability (when installed on vehicles), higher spatial information density, cost-effective installation, and automated assessment capability (when combined with artificial intelligence algorithms), which are crucial for long-term monitoring and the timely triggering of maintenance actions [16,26,28]. However, some constraints need to be considered, especially in real-world applications: the impact of environmental and weather conditions; reference markers on the structure; measurement accuracy; the real-time processing, storage, and transmission of large amounts of data (images and videos); calibration procedures; and visual optimization techniques, especially over large distances [29].
Along this line of research, this paper reviews the most recent studies (2018-2022) in the literature that have proposed solutions for the structural health monitoring of infrastructure using optical devices (specifically, standard color cameras), computer vision, and image processing approaches. Two electronic reference databases (Scopus and the Web of Science) were explored through ad hoc queries to select published articles based on well-defined keywords. Subsequently, the automatically selected articles were manually screened to include only those that met the established eligibility criteria. This review aims to provide a comprehensive overview of vision-based solutions for SHM and to analyze them from the perspective of target structures, monitoring types, instrumentation and data sources, methodological approaches, testing scenarios, and main results. Detailed information was collected from the full text to facilitate comparison between studies and to provide general statistical information concerning the resulting categories.
In summary, the main contribution of this article is the evaluation of recent work using optical sensors and image/video signals for SHM, focusing on solutions that can also be applied to real-world cement-based infrastructure. With the information gathered, it was possible to classify the selected studies into two main categories (studies implementing vision-based solutions and studies using pre-collected image datasets) and into subgroups based on specific technological and methodological characteristics, in order to highlight the most relevant application domains, trends, peculiarities, strengths, and weaknesses and to render the studies and approaches as comparable as possible.
The paper is organized as follows: Section 2 describes the selection procedures, with a focus on databases, search and refinement strategies, and information collected; Section 3 presents the global results of the selection process and the lists of included articles, with detailed information; Section 4 summarizes the results and highlights the most critical issues related to CV-based approaches in SHM that require further investigation from the perspective of long-term and automated monitoring; and Section 5 provides a brief conclusion, with some final remarks and future prospects.

Sources of Information
As technological progress rapidly drives new solutions through the availability of higher-performance devices and computational resources, facilitating the development of innovative methodological approaches, this study aims to explore the latest trends, solutions, and practical applications for SHM that use vision systems, computer vision techniques, and image processing approaches as alternatives to traditional inspection methods. Therefore, the search strategy was limited to the most recent publications, from 2018 to 2022. The two most comprehensive electronic databases, namely Scopus and the Web of Science, were considered because both allow ad hoc queries for automatic preliminary selection.
The overall screening procedure followed the guidance of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA 2020) [30], and a standard flow diagram [31] was customized to illustrate the progressive steps in the process. All the authors were involved in deciding whether to include or exclude the automatically selected studies according to the predetermined eligibility criteria. In the case of disagreement, the majority criterion was applied.

Search Strategy and Eligibility Criteria
The search strategy for automatic preliminary selection was based on ad hoc queries supported by the two databases. Queries were performed only on the "Title" and "Abstract" fields, as these fields commonly reflect the most relevant topics (i.e., keywords) of a scientific study and facilitate its proper indexing. Specifically, the ad hoc queries included both mandatory (in the "Title" or "Abstract" fields) and optional (only in the "Abstract") keywords, concatenated with the logical operators "AND" and "OR".
As can be seen, the optional keywords included terms that generally relate to vision-based approaches (i.e., computer vision, cameras, optical sensors, and RGB). In contrast, the mandatory keyword (i.e., structural health monitoring) was used to limit the application context. At this stage, an additional constraint was applied to the publication date (2018 to 2022 only), but not to the type and language of the publications; these last filters were applied during the refinement process.
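To make the query logic concrete, the following is a minimal Python sketch of the same filter applied to already-exported records. It is not the Scopus or Web of Science query syntax; the `Record` fields, the keyword spellings, and the plain case-insensitive substring matching are illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical record layout; real database exports use their own field names.
@dataclass
class Record:
    title: str
    abstract: str
    year: int

# Mandatory term (title OR abstract) and optional vision-related terms (abstract only).
MANDATORY = ["structural health monitoring"]
OPTIONAL = ["computer vision", "camera", "optical sensor", "rgb"]

def matches_query(rec: Record, first_year: int = 2018, last_year: int = 2022) -> bool:
    """Return True if the record satisfies the keyword and date constraints."""
    title, abstract = rec.title.lower(), rec.abstract.lower()
    # Every mandatory term must appear in the title or the abstract (AND).
    has_mandatory = all(k in title or k in abstract for k in MANDATORY)
    # At least one optional term must appear in the abstract (OR).
    has_optional = any(k in abstract for k in OPTIONAL)
    # Publication-date constraint applied at the query stage.
    in_window = first_year <= rec.year <= last_year
    return has_mandatory and has_optional and in_window
```

Because the optional terms are checked only in the abstract, a record whose title mentions cameras but whose abstract does not would be rejected by this sketch.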

Selection Refinement and Final Inclusion of Papers
A manual a posteriori screening (refinement process) was performed on the studies automatically extracted from the two databases through the following actions, in this order:

•
Removal of duplicates (i.e., studies available in both databases);

•
Removal of studies with a different focus (i.e., studies with optional keywords in the abstract whose content did not match the purpose of this review);

•
Removal of preliminary studies for which a more recent, extended version had already been published in a journal and selected from the databases.
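The first refinement step (duplicate removal across the two databases) can be sketched as follows. This is an illustrative heuristic, not the procedure used in this review: it assumes each exported record is a dictionary with a `title` and an optional `doi`, and it treats two records as duplicates when they share a DOI or a normalized title.

```python
import re

def normalize_title(title: str) -> str:
    """Lower-case, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", title.lower())
    return " ".join(cleaned.split())

def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record for each DOI or normalized title (hypothetical heuristic)."""
    seen: set[str] = set()
    unique: list[dict] = []
    for rec in records:
        keys = {"title:" + normalize_title(rec["title"])}
        doi = (rec.get("doi") or "").strip().lower()
        if doi:
            keys.add("doi:" + doi)
        if keys & seen:
            continue  # shares a DOI or normalized title with a record already kept
        seen |= keys
        unique.append(rec)
    return unique
```

Matching on normalized titles alone can merge distinct papers that happen to share a title, so it may be prudent to confirm flagged pairs manually.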
At the end of the overall process, the included papers were analyzed in detail to extract relevant information according to the main objectives of our review.

Collected Information
For each article, relevant information providing insights inherent to the purpose of this review was manually extracted through an in-depth analysis of the full text. This information was collected and organized into a customized database to create tables, generate statistics and categorizations, and provide specific details about each study. Key information included:

•
First author and year of publication;

Study Selection
The automatic search of the electronic databases selected 540 studies (219 from the Web of Science and 321 from Scopus). A preliminary analysis of the selected studies identified 209 duplicates, which were subsequently removed. Next, we screened the abstracts of the remaining 331 studies to verify that they were relevant to the purpose of our review; after this evaluation, an additional 117 studies were manually removed. These studies were out of the scope of the review because they used other technologies (e.g., optical fibers, thermal cameras, micro cameras) or methodologies (e.g., acoustic emission analysis, vibrational signals, piezoelectric signals), or because they were applied to specific components of the targets (e.g., cables, wind turbines, polymers). The remaining 214 articles were considered suitable for the retrieval of detailed information from the full text; for 67 of them, the full text was not available (only the abstract was directly accessible), so they were discarded.
Therefore, only 147 articles were further screened for eligibility according to the established criteria. Of these, 3 were discarded because they were published in a language other than English; 21 because of the type of publication (e.g., review, book, data report); 40 because they had a different focus (e.g., traffic load estimation, load forces, performance comparisons between cameras) or did not provide results; and 10 because a more comprehensive, extended version of the work had already been published. In the end, after the automatic selection process and manual screening, 73 articles were eligible and were included in this review. The PRISMA flow diagram in Figure 1 summarizes the automatic selection and manual screening process.
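The counts reported above can be checked with simple arithmetic. The sketch below recomputes each stage of the flow diagram from the figures given in the text; the variable names are our own descriptive labels, not PRISMA terminology.

```python
# Stage counts as reported in the text; the names are descriptive labels only.
identified = {"Web of Science": 219, "Scopus": 321}
duplicates_removed = 209
excluded_after_abstract = 117
full_text_unavailable = 67
excluded_at_eligibility = {
    "non-English": 3,
    "publication type": 21,
    "different focus or no results": 40,
    "extended version published": 10,
}

total_identified = sum(identified.values())
screened_by_abstract = total_identified - duplicates_removed
sought_for_full_text = screened_by_abstract - excluded_after_abstract
assessed_for_eligibility = sought_for_full_text - full_text_unavailable
included = assessed_for_eligibility - sum(excluded_at_eligibility.values())

print(total_identified, screened_by_abstract, sought_for_full_text,
      assessed_for_eligibility, included)  # 540 331 214 147 73
```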

2023, 4, FOR PEER REVIEW 6
The general analysis of the articles selected and included in this review confirmed the use of computer vision approaches and image processing techniques for various SHM applications while revealing a preliminary breakdown into two macro-categories: studies that implemented vision systems (71%) and studies using image datasets available from other studies (29%). The following sections present and detail the selected studies according to this primary categorization, while Figure 2 shows their distribution per year.

Studies Implementing Vision Systems (VSS)
This section describes studies that have implemented vision systems (VSS) using standard color cameras. Table 1 includes an overview of their main characteristics, focusing on target structures and types of SHM, study objectives, methods, camera features and placement, test scenarios, and main results.

Overall Statistics
This section presents and discusses general statistical information regarding the studies in Table 1. The first analysis concerns the type of infrastructure addressed in the studies and the type of SHM implemented (Figures 3a and 3b, respectively).
The analysis revealed that VSS have been implemented on different types of infrastructure, both on general targets (such as bridges and buildings) and on more specific ones (such as containment vessels or retention ponds). The item "structures (generic)" refers to solutions suitable for multiple types of infrastructure. Regarding the type of SHM, the analysis revealed four categories, although they are pairwise related. For example, the "vibration estimation" category includes studies whose primary objective is to estimate vibration frequencies (mainly of bridges) from a displacement measurement determined by the vision system. Conversely, the "displacement estimation" category includes studies whose primary objective is the measurement of displacements (mainly of buildings) and that only secondarily attempt to estimate vibration frequencies. The "damage detection" category includes studies that focus on general structural damage (including deformations, deflections, and corrosion), whereas the "cracks on surface" category includes studies addressing this type of damage exclusively.
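For the "vibration estimation" category, one common way to obtain a vibration frequency from a displacement record is to take the largest peak of its amplitude spectrum. The sketch below illustrates this under stated assumptions (displacement uniformly sampled at the camera frame rate `fps`, a single dominant mode, a Hann taper before the FFT); it is a generic example with a synthetic signal, not the method of any specific study in Table 1.

```python
import numpy as np

def dominant_frequency(displacement: np.ndarray, fps: float) -> float:
    """Frequency (Hz) of the largest non-DC peak in the amplitude spectrum.

    Assumes `displacement` is uniformly sampled at the camera frame rate `fps`.
    """
    x = displacement - displacement.mean()        # remove the static offset
    x = x * np.hanning(len(x))                     # Hann taper to reduce spectral leakage
    amplitude = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    amplitude[0] = 0.0                             # ignore the DC bin
    return float(freqs[np.argmax(amplitude)])

# Synthetic example: 20 s of displacement sampled at 60 frames per second.
fps = 60.0
t = np.arange(1200) / fps
rng = np.random.default_rng(0)
signal = 2.0 * np.sin(2 * np.pi * 1.5 * t) + 0.5 * np.sin(2 * np.pi * 4.0 * t)
signal += 0.1 * rng.standard_normal(t.size)
print(dominant_frequency(signal, fps))  # approximately 1.5
```

The frequency resolution is the frame rate divided by the number of samples (0.05 Hz here), so longer recordings or peak interpolation would be needed for finer estimates.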
Figure 3. … with respect to the type of infrastructure (a) and the type of structural health monitoring (b).
Another relevant aspect of vision systems concerns the type of cameras and the configurations used. Regarding configuration, 85% of studies implemented a single-camera solution, while the others adopted a multi-camera approach. Single-camera solutions are easier to manage and transport and are therefore feasible for the SHM of large-scale structures. In addition, they do not require the complex calibration procedures needed with multiple cameras. Regarding the type of cameras, high-resolution and high-speed cameras are often used to ensure spatial and temporal image quality, especially for estimating displacements and vibrations and for taking accurate long-distance measurements. It is also interesting to note the increasing number of studies using smartphone cameras (10 studies). The analysis also revealed that cameras are commonly mounted on tripods to ensure stability. However, emerging alternatives for mounting cameras, such as unmanned aerial vehicles (UAVs; 6 studies), unmanned surface vehicles (USVs; 1 study), and robots (1 study), are becoming common for reaching locations that would otherwise be difficult to access.
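Single-camera displacement measurement is often described as tracking a target region across frames and converting its pixel shift into physical units with a scale factor. The following is a minimal sketch of that idea using exhaustive zero-mean normalized cross-correlation (NCC) template matching in NumPy. It assumes grayscale frames, an optical axis roughly perpendicular to the target plane (so a single mm-per-pixel factor applies), and integer-pixel precision; the function names and the synthetic frames are hypothetical, and practical systems typically add sub-pixel refinement and faster matching.

```python
import numpy as np

def ncc_match(frame: np.ndarray, template: np.ndarray) -> tuple[int, int]:
    """Top-left (row, col) of the best zero-mean normalized cross-correlation match."""
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    best_score, best_pos = -np.inf, (0, 0)
    for r in range(frame.shape[0] - th + 1):
        for c in range(frame.shape[1] - tw + 1):
            p = frame[r:r + th, c:c + tw]
            p = p - p.mean()
            denom = np.sqrt((p ** 2).sum()) * t_norm
            if denom == 0:
                continue  # flat patch: correlation undefined
            score = (p * t).sum() / denom
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos

def scale_factor(known_length_mm: float, length_px: float) -> float:
    """mm per pixel from a target feature of known physical size."""
    return known_length_mm / length_px

def displacement_mm(ref_pos, cur_pos, mm_per_px: float) -> tuple[float, float]:
    """(dx, dy) in mm between two matches; dy > 0 means downward in image coordinates."""
    dx = (cur_pos[1] - ref_pos[1]) * mm_per_px
    dy = (cur_pos[0] - ref_pos[0]) * mm_per_px
    return dx, dy

# Synthetic example: a random-texture frame shifted by 3 rows and -2 columns.
rng = np.random.default_rng(0)
frame0 = rng.random((40, 40))
template = frame0[10:18, 12:20]                     # region of interest on the target
frame1 = np.roll(frame0, shift=(3, -2), axis=(0, 1))
pos0, pos1 = ncc_match(frame0, template), ncc_match(frame1, template)
k = scale_factor(100.0, 50.0)                       # e.g., a 100 mm target spanning 50 px
print(pos0, pos1, displacement_mm(pos0, pos1, k))   # (10, 12) (13, 10) (-4.0, 6.0)
```

A fixed scale factor is one reason single-camera setups avoid the stereo calibration required by multi-camera configurations, at the cost of measuring only in-plane motion.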
Finally, two other fundamental aspects of VSS concern measurement accuracy and application in real-world or, at least, outdoor scenarios. In the first case, it is essential to compare the measured quantities with those provided by a gold-standard system through a validation procedure. In the second case, it is crucial to verify the behavior of the implemented systems in real situations, where weather and lighting conditions can become critical issues for the system itself, affecting its overall performance.
The following sub-sections provide a detailed analysis of the studies shown in Table 1 regarding these two key points.

Validation with Gold-Standard
The in-depth analysis revealed that almost all VSS include a validation of the proposed solution/algorithm against traditional SHM techniques, based on one or more contact and non-contact sensors.
The analysis revealed that several sensors have been used for validation purposes, including laser displacement sensors [32,34,41,42,49,70,74,76,82], accelerometers [32-34,50,51,53,55,59,61,64-68,76], an inertial measurement unit (IMU) [60], linear variable differential transformers [34,36,41,50,54,61,66,67,73], laser Doppler vibrometers [38,46], stationary cameras [42,75,83], actuators [71], laser trackers [81], infrared sensors [48], and marker-based systems [44,73]. Marker-based systems are routinely used in other fields (such as medicine), where they are the traditional reference systems for motion capture and analysis. For specific measurements, such as displacement estimation, marker-based systems can also be successfully employed in SHM to compare the performance and accuracy of the proposed algorithms, using retro-reflective markers applied to target locations on scaled models of infrastructure. All the studies adopted the mentioned sensors as the gold standard against which to compare the performance, accuracy, and measurements of the proposed vision-based solutions. Figure 4 shows the percentage breakdown of the most common sensors used for validation purposes, as revealed by the analysis of the studies in Table 1. It is important to note that some studies adopted more than one type of sensor to validate their proposed solution. In addition, the percentage of accelerometers is high because many of the studies aimed to estimate infrastructure vibration, generally from displacement measurements.
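As an illustration of the validation step, the sketch below compares a vision-based time series with a reference-sensor series using a few common agreement measures: root mean square error, the same error normalized by the reference range, the Pearson correlation coefficient, and the peak absolute error. The choice of metrics and the sample values are assumptions for illustration, and both signals are assumed to be already synchronized and resampled to the same time stamps.

```python
import numpy as np

def agreement_metrics(vision: np.ndarray, reference: np.ndarray) -> dict:
    """Agreement between a vision-based series and a synchronized reference series."""
    err = vision - reference
    rmse = float(np.sqrt(np.mean(err ** 2)))
    ref_range = float(reference.max() - reference.min())
    return {
        "rmse": rmse,
        "nrmse": rmse / ref_range,                          # normalized by the reference range
        "pearson_r": float(np.corrcoef(vision, reference)[0, 1]),
        "max_abs_error": float(np.max(np.abs(err))),
    }

# Hypothetical displacement readings (mm) from a reference sensor and a camera.
reference = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
vision = reference + np.array([0.1, -0.1, 0.1, -0.1, 0.0])
print(agreement_metrics(vision, reference))
```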

Experimental and Real-World Scenarios
Implementing vision systems, especially in the case of SHM, implies addressing typical problems, including environmental conditions (for example, lighting and weather events), long-distance measurements, calibration procedures, and stability issues.The in-depth analysis revealed that many studies implemented experimental tests to verify the effectiveness of the proposed solutions only in controlled experimental scenarios.
In many cases, the authors chose experimental indoor scenarios because these allowed the effects of environmental conditions to be limited and solutions to be tested only on scaled models.
In addition, some of the studies that implemented experimental tests in indoor or outdoor scenarios also ventured out into the field, verifying the behavior of the proposed solutions in real-world environments and on full-scale targets, that is, under more challenging conditions. This is the case for [32,34,38,40,41,45,55,60,66,69,71,75,81,82].
In contrast, the authors of [39,46,47,52,56,68,79,83-85] tested their proposed solutions directly, and only, in real-world scenarios. The authors of [48] also mentioned an in-field experiment (i.e., estimation of building vibration using a video recorded from the roof of a building); however, they did not present any results regarding this experiment.
Finally, a few studies only verified the proposed solutions through numerical simulations [33,37,57]. In [69,71], numerical simulations were used to validate part of the proposed framework before experimental and in-field tests.

Studies Using Image Databases (IDS)
This section describes studies that have implemented frameworks on image databases (IDS), generally available from other studies, mainly to detect and identify areas with structural damage. Admittedly, many other studies in the state-of-the-art literature focus on the same topics and could fall into this category. However, it is essential to note that only studies selected by the initial queries on the electronic databases, based on the mandatory and optional keywords, were included in this section. Table 2 provides an overview of the main characteristics of these studies, focusing on target structures and types of SHM, study objectives, methods, database features, processing unit hardware, and main results.

Overall Statistics
This section reports and discusses general statistical information about the studies in Table 2. The first analysis concerns the type of infrastructure addressed in the study and the type of SHM implemented (Figure 5a and Figure 5b, respectively).
The analysis revealed that frameworks using image datasets were implemented on different types of infrastructure, addressing both general targets (such as bridges, buildings, and concrete surfaces) and specific ones (such as underwater structures or tunnels). Again, the item "structures (generic)" refers to solutions suitable for multiple types of infrastructure, while the item "walls" can be equated with concrete surfaces.
Regarding the type of SHM, the analysis revealed more categories than the four that emerged for VSS. The two main ones ("cracks detection" and "damage detection") are more general and typically concern the presence or absence of structural damage. The other categories, with much lower percentages, concern the detection of specific types of deterioration and reflect a much finer classification of damage. This granularity depends heavily on the availability of images supporting this type of investigation.
Additional relevant information about IDS concerns the hardware of the processing unit. The need for higher- or lower-performance hardware is closely related to the type of framework implemented. Deep learning-based solutions demand significant computational resources, particularly GPU, CPU, and RAM, to handle and process the images. Most of these studies used very high-performance graphics cards (in particular, NVIDIA GeForce) and at least 8 GB of RAM (up to a maximum of 64 GB). In contrast, solutions using machine learning and conventional computer vision are much less demanding in terms of computational resources, so much so that in most cases the hardware configuration was not specified (3 out of 5 studies). The analysis also found that accuracy is the primary metric used to evaluate and compare performance among classification solutions (14 studies). More sporadically, some studies compared performance using mean intersection over union (MIoU; 1 study), F1-score (3 studies), mean square error (1 study), segmentation indices (1 study), and area under the curve (AUC; 1 study), whereas 3 studies did not report a precise metric. Finally, although execution time is one of the crucial aspects of the classification process, only 5 studies reported results in this regard.
As just pointed out, accuracy is one of the most widely used metrics in automated classification (i.e., detection and recognition of specific damage conditions) for evaluating the performance of networks and predictive models. However, accuracy depends heavily on the number of classes to be recognized. The following sub-section provides a detailed analysis of the studies shown in Table 2 with regard to this aspect.
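To make the dependence on the number of classes concrete, the following sketch derives accuracy, macro-averaged F1-score, and mean intersection over union (MIoU) from a confusion matrix, and reports the chance-level accuracy 1/K, which assumes K balanced classes and uniform random guessing. The function names and the three damage classes are illustrative assumptions; libraries such as scikit-learn provide equivalent routines.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes: int) -> np.ndarray:
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def summarize(cm: np.ndarray) -> dict:
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp          # predicted as class k but belonging elsewhere
    fn = cm.sum(axis=1) - tp          # belonging to class k but predicted elsewhere
    with np.errstate(divide="ignore", invalid="ignore"):
        f1 = np.where(2 * tp + fp + fn > 0, 2 * tp / (2 * tp + fp + fn), 0.0)
        iou = np.where(tp + fp + fn > 0, tp / (tp + fp + fn), 0.0)
    return {
        "accuracy": float(tp.sum() / cm.sum()),
        "macro_f1": float(f1.mean()),
        "mean_iou": float(iou.mean()),
        "chance_accuracy": 1.0 / cm.shape[0],   # balanced classes, uniform guessing
    }

# Hypothetical 3-class example (0 = intact, 1 = crack, 2 = spalling).
cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_classes=3)
print(summarize(cm))
```

With three classes, a chance-level classifier reaches about 33% accuracy, whereas a binary one reaches 50%, which is one reason why accuracies from binary and multi-class studies are not directly comparable.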

Type of Classification
Machine learning and deep learning approaches allow classification problems of different complexity to be addressed. In the field of SHM, this can lead to an architecture capable of detecting the presence or absence of a specific condition (i.e., binary classification), such as distinguishing structurally damaged areas from intact sections. Alternatively, these approaches can be used for more specific classification purposes based on recognizing different categories of structural deficits (multi-class classification), identifying, for example, the type of damage, type of materials, shape of cracks, and other specific features. Table 3 provides a summary of the main types of classification addressed by studies using image datasets.
[…] and, consequently, the accuracy of defect detection. For example, the authors of [103] proposed a de-noising filter that retains the edges and details of concrete surface images, which improves image quality and enables the detection of the finest cracks. The proposed algorithm was compared with similar approaches used in five different studies in the literature. The results showed a reduction in mean square error (MSE) of between 36.3% and 73.6%, an increase in peak signal-to-noise ratio (PSNR) of between 7.6% and 54.2%, and an increase in mean structural similarity (MSSIM) of between 2.4% and 45.1%. A combined strategy based on high-boost filtering and enhanced thresholding was proposed in [108]; it outperformed other thresholding techniques in terms of the average Dice and Jaccard [109] indices (+0.1103 and +0.2078, respectively), demonstrating the best similarity between processed and original images and, consequently, better image quality, which could be useful, and even necessary, for increasing the accuracy of finer damage detection.
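The image-quality and overlap indices mentioned above have standard definitions, sketched below for 2D NumPy arrays: MSE and peak signal-to-noise ratio (PSNR, in dB, assuming 8-bit images with a peak value of 255), and the Dice and Jaccard indices for binary masks. This is a generic illustration with made-up arrays; the exact variants used in [103] and [108] may differ, and MSSIM is not reproduced here.

```python
import numpy as np

def mse(reference: np.ndarray, test: np.ndarray) -> float:
    """Mean square error, computed in floating point to avoid uint8 overflow."""
    diff = reference.astype(float) - test.astype(float)
    return float(np.mean(diff ** 2))

def psnr(reference: np.ndarray, test: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    m = mse(reference, test)
    return float("inf") if m == 0 else float(10.0 * np.log10(max_value ** 2 / m))

def dice(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice index 2|A∩B| / (|A| + |B|) for binary masks."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    total = a.sum() + b.sum()
    return 1.0 if total == 0 else float(2 * np.logical_and(a, b).sum() / total)

def jaccard(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Jaccard index |A∩B| / |A∪B| for binary masks."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else float(np.logical_and(a, b).sum() / union)

ref = np.full((4, 4), 100, dtype=np.uint8)
noisy = ref.copy()
noisy[0, 0] = 110                                   # one pixel off by 10 gray levels
print(mse(ref, noisy), round(psnr(ref, noisy), 2))  # 6.25 40.17

pred = np.array([[1, 1, 0, 0]])
truth = np.array([[0, 1, 1, 0]])
print(dice(pred, truth), jaccard(pred, truth))      # 0.5 0.3333333333333333
```

For the same pair of masks the two indices are linked by J = D / (2 - D), so they rank segmentations identically while differing in scale.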

Discussion
This paper provides an overview of recent applications (2018–2022) in structural health monitoring using optical devices (in particular, standard color cameras), computer vision, and image and video signals. To this end, we queried two electronic reference databases (Scopus and Web of Science), using their built-in search functions to select studies with specific mandatory and optional keywords in their title and abstract. The automatically selected items were then manually screened against pre-established eligibility criteria, and 73 articles were ultimately included in this review and analyzed in detail.
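The keyword-based selection step can be illustrated with a small filter. The sketch below assumes, for illustration only, that a record is retained when its title and abstract contain every mandatory keyword and at least one optional keyword and its publication year falls in the review window; the keyword lists, the `Record` fields, and the `matches` helper are hypothetical and do not reproduce the actual query strings or the built-in search syntax of Scopus or Web of Science.

```python
from dataclasses import dataclass


@dataclass
class Record:
    title: str
    abstract: str
    year: int


def matches(record: Record, mandatory: list[str], optional: list[str],
            years: range) -> bool:
    """True if the record meets the (hypothetical) keyword and date rules."""
    text = f"{record.title} {record.abstract}".lower()
    has_all_mandatory = all(k.lower() in text for k in mandatory)
    has_any_optional = any(k.lower() in text for k in optional)
    return has_all_mandatory and has_any_optional and record.year in years


# Hypothetical keyword lists, not those used in the review.
MANDATORY = ["structural health monitoring"]
OPTIONAL = ["computer vision", "camera", "image processing"]
YEARS = range(2018, 2023)  # 2018-2022 inclusive

records = [
    Record("Camera-based displacement tracking", "A structural health monitoring study.", 2020),
    Record("Accelerometer networks", "Structural health monitoring of bridges.", 2019),
    Record("Computer vision for cracks", "Structural health monitoring survey.", 2015),
]
selected = [r.title for r in records if matches(r, MANDATORY, OPTIONAL, YEARS)]
print(selected)  # ['Camera-based displacement tracking']
```

Substring matching is a simplification; records retained by such an automatic filter would still require the manual eligibility screening described above.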
Two interesting aspects emerge from the first, more general analysis of the selected studies. The first is the preliminary breakdown of the studies into two macro-categories: those implementing vision systems (71%) and those using image databases (29%), in most cases made available by other studies. While the first group was expected, since the optional keywords included terms commonly used in the field of vision-based systems, the second category, although significant, was somewhat unexpected, considering that no related terms were used in the queries. This can be explained by considering that these studies deal with one of the components of a vision-based system, namely image and video processing, regardless of the acquisition phase and data source. However, this group is undoubtedly underrepresented here compared with what is available in the literature, and it has been analyzed in recent dedicated reviews [16,114,115]. Nevertheless, this review also considered these studies, as they resulted from the selection process.
The second aspect is the sharply increasing trend (+61.5%) in the number of SHM applications over a relatively short observation period (only the last five years). This result confirms the findings of studies covering a longer period (fifteen years), which pointed out the rapid growth of vision-based solutions and machine learning approaches [8].
The in-depth analysis showed that VSS studies mainly focus on bridges (35.7%) and buildings (25.0%). Moreover, some solutions address specific targets (e.g., towers, retention basins), whereas 19.6% of the studies report a more general target (category "structure (generic)"). For IDS, the main target infrastructures are the same (bridges, buildings, and generic structures). However, the highest percentage (25%) concerns concrete surfaces, probably because several databases containing thousands of images acquired for this purpose are available. In addition, as with vision-based systems, some studies address specific infrastructures (such as tunnels and underwater structures).
Regarding the type of SHM, VSS studies are distributed into four clusters, with a predominance of studies focused on estimating displacements and vibrations (77.4%) rather than on detecting structural damage and surface cracks (22.6%). In contrast, IDS studies focus predominantly on structural damage and crack detection (50%), while a further 47.5% focus on detecting and recognizing specific damage conditions; nearly all IDS studies therefore fall within these two damage-related clusters. Unlike VSS, only 2.5% of the IDS studies focused on vibration estimation.
Going into more detail, the full-text analysis of VSS showed that 85% of the studies implemented a single-camera system, thus avoiding the problems related to the calibration of multiple cameras; 19% of the studies used smartphone cameras, moving toward more widespread and low-cost sensors; and about 15% of the studies took advantage of emerging technologies (UAVs, USVs, and robots) to reach remote locations. In addition, 63% of the studies verified the accuracy of the measurements through a validation procedure against other systems/sensors commonly used as gold standards in SHM applications. The main limitations of VSS that emerged from the analysis concern the following: (i) the need to improve robustness and accuracy through optimization algorithms [54,63,71,73]; (ii) the lack of real-world scenarios (only 46% of the studies proposed in-field tests on real-world structures); (iii) the use of specific targets [67,81,82]; (iv) constraints on image and video sizes [60,64,76,77,83]; and (v) the effects of environmental and weather conditions that can affect performance (44% of the studies proposed only indoor and scaled-target experimental scenarios, with controlled lighting and environmental conditions). This last point is fundamental when using technologies and support aids such as UAVs and tripods. For example, wind can alter camera stability (and, consequently, performance) if not suitably compensated for. At the same time, poor lighting (due to rain, fog, or evening hours) can reduce visibility and measurement accuracy [69,75].
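Many single-camera VSS estimate displacement by tracking a target region across frames and converting the pixel shift into physical units with a scale factor. The NumPy sketch below illustrates this general idea with an exhaustive zero-normalized cross-correlation (ZNCC) search at integer-pixel resolution; the helper names, the synthetic frames, and the scale factor of 0.5 mm/pixel are assumptions for illustration, not the method of any specific reviewed study. Practical systems typically add sub-pixel refinement and compensation for camera motion, such as the wind-induced instability mentioned above.

```python
import numpy as np


def zncc(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-normalized cross-correlation between two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def track(frame: np.ndarray, template: np.ndarray) -> tuple[int, int]:
    """Return the (row, col) of the best template match (exhaustive search)."""
    th, tw = template.shape
    best_score, best_pos = -np.inf, (0, 0)
    for r in range(frame.shape[0] - th + 1):
        for c in range(frame.shape[1] - tw + 1):
            score = zncc(frame[r:r + th, c:c + tw], template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos


def make_frame(top: int, left: int, size: int = 40) -> np.ndarray:
    """Hypothetical synthetic frame with a textured 8x8 target at (top, left)."""
    frame = np.zeros((size, size))
    rng = np.random.default_rng(0)  # same seed -> same target texture
    frame[top:top + 8, left:left + 8] = 100 + 50 * rng.random((8, 8))
    return frame


SCALE_MM_PER_PX = 0.5  # assumed calibration (e.g., known target size / pixels)

reference = make_frame(top=10, left=12)
template = reference[10:18, 12:20]
current = make_frame(top=13, left=12)  # target moved 3 px downward

r0, c0 = track(reference, template)
r1, c1 = track(current, template)
dy_mm = (r1 - r0) * SCALE_MM_PER_PX
dx_mm = (c1 - c0) * SCALE_MM_PER_PX
print(dy_mm, dx_mm)  # 1.5 0.0
```

Normalized correlation is used here because it is insensitive to uniform changes in brightness and contrast of the target patch, which partly mitigates the lighting variations discussed above; it does not compensate for motion of the camera itself.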
The analysis of IDS showed that only three studies implemented a framework based on supervised classifiers (machine learning approach) and two studies a framework based on computer vision. In contrast, 76% of the studies investigated deep learning approaches. Almost 43% of the studies addressed only binary classification, thus determining whether an area is damaged or undamaged. Only 28.5% of the studies also addressed multi-class classification, mainly related to identifying the type of damage, the type of material, and the structural component. Finally, three studies investigated the potential of transfer learning, i.e., reusing a model pre-trained on a different task to avoid the time needed to train a network/model from scratch. This approach was used in [107,110,113], which used pre-trained neural networks on images from different sources, targets, materials, and infrastructures. The main limitations of IDS concern the following: (i) in general, the results are presented only on images from the original datasets (through training, validation, and testing phases), without testing the trained models on different images, except in [110,111,113]; (ii) the use of training datasets of limited size, as in [95,103,106]; (iii) low agreement between labelers of damaged areas, as in [104]; (iv) the dependence of model performance on image resolution and quality [101,107,111]; (v) the need for high-performance hardware (especially for deep learning networks) to manage training phases and optimize execution time on larger image databases; and (vi) high accuracy in binary classification, but lower performance in multi-class classification.
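Limitation (vi) is easier to interpret with a confusion matrix, because overall accuracy can remain high while some damage classes are rarely recognized. The sketch below uses hypothetical class names and counts chosen only for illustration (they are not results from the reviewed studies) and computes overall accuracy, per-class recall, and macro-averaged recall from a confusion matrix whose rows are true classes and whose columns are predicted classes.

```python
import numpy as np

# Hypothetical classes and counts: rows = true class, columns = predicted class.
CLASSES = ["intact", "crack", "spalling", "efflorescence"]
confusion = np.array([
    [90, 4, 3, 3],   # intact (100 samples)
    [5, 40, 3, 2],   # crack (50 samples)
    [4, 6, 8, 2],    # spalling (20 samples)
    [3, 2, 1, 4],    # efflorescence (10 samples)
])

# Correct predictions lie on the diagonal.
overall_accuracy = np.trace(confusion) / confusion.sum()
# Recall per class: correct predictions divided by the number of true samples.
per_class_recall = np.diag(confusion) / confusion.sum(axis=1)
# Macro average weights every class equally, regardless of its size.
macro_recall = per_class_recall.mean()

print(f"overall accuracy: {overall_accuracy:.3f}")
for name, recall in zip(CLASSES, per_class_recall):
    print(f"recall[{name}]: {recall:.2f}")
print(f"macro-averaged recall: {macro_recall:.3f}")
```

In this example, the overall accuracy is about 0.79, while the two minority classes are recognized only 40% of the time, giving a macro-averaged recall of 0.625; reporting per-class figures alongside accuracy makes such imbalances visible.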
Despite the remaining problems and challenges, this review has highlighted the potential of computer vision and image processing in the context of SHM. There is undoubtedly ample room for improvement to overcome the major weaknesses highlighted by the analysis of the selected VSS and IDS studies. Nevertheless, thanks to constant technological progress, the availability of increasingly high-performance resources, and the growing interest in creating large image databases (also exploiting techniques such as transfer learning and data augmentation), these methodologies will meet the specific requirements of structural health monitoring more and more comprehensively, as is already happening in the field of human health [116][117][118][119][120][121].
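Data augmentation, mentioned above as a way to enlarge image databases, often consists of label-preserving geometric transformations. The minimal NumPy sketch below generates the eight rotated and mirrored variants of a square grayscale patch; it assumes that rotations and flips do not change the damage label, which may not hold for every class (for example, when crack orientation is itself a feature of interest). The function name is hypothetical.

```python
import numpy as np


def dihedral_augment(patch: np.ndarray) -> list[np.ndarray]:
    """Return the 8 rotations/reflections of a square 2-D patch."""
    if patch.ndim != 2 or patch.shape[0] != patch.shape[1]:
        raise ValueError("expected a square 2-D patch")
    variants = []
    for k in range(4):
        rotated = np.rot90(patch, k)         # rotate by k * 90 degrees
        variants.append(rotated)
        variants.append(np.fliplr(rotated))  # mirrored copy of each rotation
    return variants


# Example: an asymmetric 3x3 patch yields 8 distinct variants.
patch = np.arange(9).reshape(3, 3)
augmented = dihedral_augment(patch)
distinct = {v.tobytes() for v in augmented}
print(len(augmented), len(distinct))  # 8 8
```

Each original image thus contributes up to eight training samples; in practice, augmentation is applied only to the training split so that validation and test images remain independent.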

Conclusions
Structural health monitoring has received significant attention in recent decades because of the importance of keeping infrastructure in good condition and in full working order. Non-contact solutions are gradually finding application in this area because they are easier to manage and more practical than contact-based approaches. Among these, solutions based on cameras, computer vision, imaging, and video processing are proving particularly effective because they allow visual inspections to be carried out more continuously, even in hard-to-reach areas, thanks in part to new technologies (drones and robots) that can operate remotely. A further impetus comes from the potential of automatic classification algorithms (machine and deep learning approaches) that, thanks to increasingly high-performance hardware, make it possible to detect and recognize the type and level of structural damage. This trend is expected to grow in the coming years, also benefiting from the fusion of data from multiple sensors for increasingly timely and comprehensive detection of structural damage and the activation of appropriate maintenance actions, thus ensuring the safety and efficiency of civil infrastructure.
However, several challenges still need to be addressed to achieve the maximum benefits of these approaches, especially in real-world scenarios [26]. For example, when using optical approaches, environmental conditions (light changes, weather events, obstructing elements) can interfere with structural monitoring. Another factor is the effect of vibrations caused by the ground or wind, which can blur the images captured by a fixed camera or drone, thus degrading performance, especially over long distances. Another major challenge relates to the amount of data to be managed, stored, and/or transmitted; in long-term monitoring, for example, images and videos, particularly when high resolution is necessary, are large and require special attention, especially in automated and remote damage assessment solutions. In addition, machine learning and deep learning approaches, which can support automatic damage recognition and localization, require huge datasets of images with well-labeled damage to be effective and accurate. Nevertheless, the need for larger datasets could be addressed by merging smaller datasets, using transfer learning, or applying data augmentation. All these factors may affect the applicability of optical approaches in SHM or introduce measurement uncertainties, so they need to be considered during data acquisition and processing. However, these are well-known problems in computer vision, and ad hoc solutions can be designed to overcome these limitations, as has been done in other fields, so that SHM can also benefit from these approaches.
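One simple way to flag frames blurred by camera vibration is to measure their sharpness, for example with the variance of the discrete Laplacian, which decreases when edges are smeared. The NumPy sketch below shows this generic heuristic; it is not a technique taken from the reviewed studies, the 3x3 mean filter is only a crude stand-in for real blur, the helper names are hypothetical, and any threshold for discarding a frame would need to be tuned for each camera and scene.

```python
import numpy as np


def laplacian_variance(img: np.ndarray) -> float:
    """Variance of the 4-neighbour discrete Laplacian (higher = sharper)."""
    f = img.astype(np.float64)
    lap = (f[:-2, 1:-1] + f[2:, 1:-1] + f[1:-1, :-2] + f[1:-1, 2:]
           - 4.0 * f[1:-1, 1:-1])
    return float(lap.var())


def box_blur(img: np.ndarray) -> np.ndarray:
    """3x3 mean filter over the interior pixels (output shrinks by 2 px)."""
    f = img.astype(np.float64)
    out = np.zeros((f.shape[0] - 2, f.shape[1] - 2))
    for dr in range(3):
        for dc in range(3):
            out += f[dr:dr + out.shape[0], dc:dc + out.shape[1]]
    return out / 9.0


# Sharp synthetic checkerboard vs. its blurred copy.
sharp = np.indices((32, 32)).sum(axis=0) % 2 * 255.0
blurred = box_blur(sharp)
print(laplacian_variance(sharp) > laplacian_variance(blurred))  # True
```

Because the score depends on scene content as well as blur, it is most meaningful when comparing frames of the same view over time rather than as an absolute measure across different scenes.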

Figure 1. PRISMA flow diagram related to the overall screening procedure.

Figure 2. Breakdown of selected studies by year and category. The symbol # indicates the number of studies.

Figure 3. Percentage distribution of VSS (Table 1) with respect to the type of infrastructure (a) and the type of structural health monitoring (b).

Figure 4. Percentage distribution of the sensors used for validation purposes in the VSS studies (Table 1).

Figure 5. Percentage distribution of studies (Table 2) with respect to the type of infrastructure (a) and the type of structural health monitoring (b).

Table 2. Studies included in the "image database" category: main features.