Underexposed Vision-Based Sensors’ Image Enhancement for Feature Identification in Close-Range Photogrammetry and Structural Health Monitoring

This paper describes an alternative structural health monitoring (SHM) framework for low-light or dark environments that applies image enhancement algorithms to underexposed images from vision-based sensors. The proposed framework was validated by two experiments monitored by two vision systems under ambient light, without additional lighting. The first experiment monitored six artificial templates attached to a sliding bar that was displaced by a standard one-inch steel block. The effect of image enhancement on the feature identification and bundle adjustment integrated into the close-range photogrammetry was evaluated. The second validation was a seismic shake table test of a full-scale three-story building tested at E-Defense in Japan. Overall, this study demonstrated the efficiency and robustness of the proposed image enhancement framework in (i) modifying the original image characteristics so that the feature identification algorithm can accurately detect, locate, and register the existing features on the object; (ii) integrating the identified features into the automatic bundle adjustment in the close-range photogrammetry process; and (iii) assessing the measurement of identified features in static and dynamic SHM, and in structural system identification, with high accuracy.


Introduction
In recent years, vision-based sensors have been developed significantly for structural health monitoring (SHM) of engineering structures, and they depend strongly on the acquisition of high-quality images or videos [1–17]. However, monitored images or videos rarely meet the computer vision (CV) requirements for further processing when the SHM is conducted at night, in hazy atmospheres, or simply in dark settings, owing to camera design trade-offs. Data collected in these environments lack visible detail and result in underexposed and low-contrast images or videos that are not only dim to human vision but also difficult to interpret. They may not capture important image characteristics such as sharpness, contrast, or dynamic range, leading to difficulties in analysis using image segmentation, structure from motion, pattern recognition, detection and matching, or other CV algorithms. Without adequate lighting, more hardware or tools must be incorporated into the SHM system. Alternatively, further image processing should be conducted before employing these algorithms to enable feature identification, track structural movement, or identify structural vibration characteristics.
Only limited works are solely dedicated and reported for vision-based SHM in a dark or night environment using real images. Li et al. [18] conducted a dynamic test using a smartphone and Kim et al. [19] installed a vision-based monitoring system equipped with a digital camera with a zoom lens on a three-span cable-stayed bridge. However, these two studies were conducted in low light and completely dark settings without additional lighting, so the SHM was unable to identify the monitored object [19] and a significant quantity of time-signals were missing in the data [18]. To solve these problems, a small number of studies that added additional components to the vision-based system have been reported. An SHM using a smartphone camera with a laser device was reported by Li et al. [20]. Choi et al. [21] proposed a night vision camera equipped with an IR pass filter to remove the red-eye effect in the infrared region. Digital cameras with LED lights as targets were used for night monitoring as evaluated by Feng et al. [22]. In terms of accuracy, these studies showed good precision and promising results; however, they only validated their works using low-amplitude testing.
Post-processing underexposed images using image enhancement algorithms is also a solution as it improves the image quality. Histogram equalization [23,24] was used to enhance image gray resolution for crack detection [25] and crack monitoring from thermal imaging [26]. Wavelet transforms [27,28] were used to correct vision-based images for damage and crack detection [29,30] and fatigue crack detection [31]. Contrast enhancement was conducted on vision-based images to separate the crack and the background area [32], and the advanced deep learning method was capable of autonomously detecting concrete cracking, steel corrosion, and delamination [33]. A recent study by Zollini et al. [34] deployed UAV monitoring and applied a contrast enhancement technique on imaging photogrammetry to enable monitoring on the deteriorated concrete area. Image enhancement is also commonly integrated into other remote sensing fields such as satellite imagery [35] and aerial system imagery [36]. However, at the present time, almost no relevant works can be referred to in this study that are related to vision-based system image enhancement with specific implementation for vibration SHM purposes.
Although prior studies successfully conducted SHM under low-light settings and night environments with good accuracy and by integrating image enhancement methods, several research gaps can still be identified. First, an alternative SHM framework can be proposed to improve the real vision-based SHM data under the complexity of a dark environment without assistance from additional equipment or hardware. Second, a specific study of vibration-based SHM in a dark environment should be conducted because available studies were only proposed for the damage-detection SHM. Third, more experiments are necessary to identify SHM accuracy, ranging from a very small displacement to higher amplitudes, as the prior works only validated their SHM framework under a very low amplitude of dynamic excitation. To fill these gaps, this study proposes the integration of image enhancement algorithms for low-light settings and dark environments. The objective of this study is to modify the underexposed and low-contrast image characteristics to improve their quality before implementation into automatic processing of bundle adjustment in close-range photogrammetry. The goal is to assess the accuracy of the enhanced images in measuring displacement and in identifying structural dynamic properties through system identification.

Methods
Remotely operated vision systems equipped with cameras, sensors, lighting, and natural or artificial features form an image through the following process. A light source with a given intensity, polarization, and color spectrum emits light that travels through a medium, then hits and is scattered by the surface of an object. The rays reflected from the object surface are captured by a camera sensor and converted into electrons to form a two-dimensional pixel intensity map, i.e., an image [37]. Therefore, lighting directly affects the pixel intensity map, and adequate illumination of the object significantly simplifies the matching procedure, whether classical or advanced algorithms are applied [18,38–42].

Feature Detection Problems in Low-Light Setting and Dark Environment
Examples of SHM images captured by two types of cameras in the laboratory environments are shown in Figure 2. The monitoring of these structures completely relied on the ambient light. Without extra lighting, it is difficult to stop fast action or to maximize the depth of field, and these factors impact the brightness of the captured images, resulting in the underexposed images shown in Figure 2a,c. When using commercial DSLR cameras, a higher ISO should be set to compensate for the dim light. Without proper lighting, the DSLR system will capture a low-contrast image, as shown in Figure 2e, especially when a fast shutter speed is required in high-speed testing.
For a vision-based sensor with a tracking system based on a specifically designed artificial feature or template [67] as shown in Figure 2, separating the black background from the white template rings is the fundamental step before applying a feature detection algorithm. The background is defined as the template region with the lowest gray level intensity (black). The object is identified as the white circle feature that is separated from the background in an area of the whole template Point Spread Function (PSF) size with a higher density. Then, the template is detected based on the principles of scale-space theory [68,69] such that the center of the circle is identified based on the second-order partial derivatives of the Laplacian of Gaussian (LoG). When the template is illuminated sufficiently and the vision system exposure is set appropriately, the circle center and template can be registered and identified automatically as shown in Figure 2g. Figure 2a,c clearly show that no features on the templates can be identified because there is no distinction between the background and the object. Even though the structure is visible when a higher ISO is set on the DSLR cameras as shown in Figure 2e, the low dynamic range of the low-contrast image means that only a few templates are detected and a few background regions are falsely identified as the object. Therefore, relying completely on the ambient light without any additional lighting loses image details and makes it challenging for CV algorithms to automatically extract important features.
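To illustrate why a well-separated white circle on a black background is easy to localize with a LoG operator, the following minimal sketch builds a LoG kernel and searches for its strongest negative response on a synthetic image. It is not the detection code used in this study: the function names, the kernel truncation at three standard deviations, and the choice of a scale equal to the circle radius divided by the square root of two are assumptions made for illustration, and the search is restricted to integer pixel positions.

```python
import math


def log_kernel(sigma, radius):
    """Unnormalized Laplacian-of-Gaussian weights on a (2*radius+1)^2 grid."""
    kernel = {}
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            r2 = dx * dx + dy * dy
            kernel[(dy, dx)] = (r2 - 2 * sigma ** 2) / sigma ** 4 * math.exp(-r2 / (2 * sigma ** 2))
    return kernel


def find_blob_center(img, sigma):
    """Return (row, col) of the strongest negative LoG response.

    A bright blob on a dark background gives a negative LoG response
    that is most pronounced at the blob center when sigma matches its size.
    """
    radius = int(math.ceil(3 * sigma))  # assumed kernel truncation
    kernel = log_kernel(sigma, radius)
    best = None
    for y in range(radius, len(img) - radius):
        for x in range(radius, len(img[0]) - radius):
            response = sum(w * img[y + dy][x + dx] for (dy, dx), w in kernel.items())
            if best is None or response < best[0]:
                best = (response, y, x)
    return best[1], best[2]


# Synthetic template: white disk of radius 5 px centered at row 22, column 17.
synthetic = [[255 if (y - 22) ** 2 + (x - 17) ** 2 <= 25 else 0 for x in range(41)]
             for y in range(41)]
center = find_blob_center(synthetic, sigma=5 / 2 ** 0.5)
```

In an underexposed image, the disk and the background have nearly the same gray level, so the response becomes flat and no clear extremum exists, which is consistent with the failed detections in Figure 2a,c.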
Appl. Sci. 2021, 11, x FOR PEER REVIEW 4 of 20

Figure 2.
Underexposed and low-contrast images with their associated enhanced versions and identified gray level intensity (green): (a-d) images from high-speed cameras, (e-f) images from digital cameras, (g) normal image from a digital camera with a zoomed view of detected templates and gray level intensity.

Image Enhancement Algorithms
Based on the modified area, image enhancement can be categorized into local and global enhancement methods; more about the application of these methods to grayscale images can be found in Pathak et al. [70]. This study focuses on improving the image characteristics using the global instead of the local method, for the following reasons. Vision-based SHM can measure multiple locations at the same time by tracking the movement of the artificial templates. In monitoring large-scale structures, these templates are distributed over the entire structural component as shown in Figure 2f. This means that all of these templates should be correctly identified in the image after the enhancement process. Local operations are less efficient for this purpose because processing multiple targets is more time-consuming. These operations also introduce noise and other types of spatial artifacts that affect the background separation and feature detection, both of which require clarity in the processed images.
The mathematical basis of global image enhancement is to find the mapping function, $\mathcal{F}$, that improves the input image, $I(x, y)$, to the optimum output image, $O(x, y)$, as shown in Equation (1):
$$O(x, y) = \mathcal{F}(I(x, y)) \qquad (1)$$
In this study, five global image enhancement algorithms as shown in Figure 1 are implemented to improve vision-based image quality. The algorithms are contrast stretching (CS) [71], contrast limited adaptive histogram equalization (CLAHE) [72], histogram equalization (HE), haze removal with an inverted operation (HRIO) [73], and haze removal with a single dark channel prior (HRDC) [74]. To visualize how each method improves image characteristics, examples of an underexposed input image and the enhanced output images are given in Figure 3 with their associated gray level histograms. The processed image size is of width $X = 2560$ pixels and height $Y = 2048$ pixels. The histogram bins for monochrome images with bit depth $n = 8$ are defined as $2^n = 256$, ranging from the darkest gray value of zero to the brightest value of $N = 2^n - 1 = 255$.
The gray distribution of the input image in Figure 3a shows a very low gray level intensity with several localized peaks that originate from the light background near the top corner of the image. The contrast stretching (CS) algorithm in Figure 3a linearly scales these underexposed image pixel values between a specified upper limit, $lim_{up}$, and lower limit, $lim_{low}$. The mathematical relationship of the CS operation is given in Equation (2). The example in Figure 3a is the output of the CS algorithm with $lim_{low} = 0.01$ pixel and $lim_{up} = 0.99$ pixel for 255 gray level intensity. The operation finds these limits and saturates the values above and below them.
Of all the proposed methods, histogram equalization (HE) is the most commonly selected algorithm to improve monochrome images.
The HE operation on a dark image can be expressed in Equation (3) as follows:
$$O(x, y) = \{\mathcal{F}(I(x, y)) \mid \forall I(x, y) \in I\} \qquad (3)$$
The transform function $\mathcal{F}$ in Equation (3) is based on the cumulative distribution function (CDF) that maps the input image $I(x, y)$ to the entire dynamic range $(I_0, I_N)$. The enhanced image using the HE method in Figure 3c shows that this method redistributes the probability of occurrence of the input gray level to make it uniform in the output image using the entire range of intensity level $N$. A modification of the HE method that also shows potential for image enhancement is the contrast limited adaptive histogram equalization (CLAHE). The method limits the contrast amplification by clipping the histogram at a specified value before computing the CDF. Therefore, the output image from this method, given in Figure 3b, is not brightened excessively, and the peaks that are present in the input image remain clearly visible in the output image.
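The two histogram-based mappings described above can be sketched in a few lines. This is an illustrative sketch only, written in plain Python on a nested list of 8-bit gray values rather than with the toolbox used in this study. It assumes that the limits 0.01 and 0.99 are fractions of the sorted pixel population (percentiles), which is one possible reading of the text, and the function names are hypothetical.

```python
def contrast_stretch(img, low_frac=0.01, up_frac=0.99, n_max=255):
    """Linearly rescale gray levels between two percentile limits.

    Values below the lower limit saturate at 0 and values above the
    upper limit saturate at n_max (the clipping discussed in the text).
    """
    flat = sorted(p for row in img for p in row)
    lo = flat[int(low_frac * (len(flat) - 1))]
    hi = flat[int(up_frac * (len(flat) - 1))]
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in img]
    scale = n_max / (hi - lo)
    return [[min(n_max, max(0, round((p - lo) * scale))) for p in row] for row in img]


def histogram_equalize(img, n_max=255):
    """Map every gray level through the normalized cumulative histogram."""
    hist = [0] * (n_max + 1)
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running / total)
    return [[round(n_max * cdf[p]) for p in row] for row in img]


# Tiny underexposed example: all gray values sit in the lowest bins.
dark = [[2, 4, 6, 8], [10, 12, 14, 16]]
stretched = contrast_stretch(dark)
equalized = histogram_equalize(dark)
```

Both outputs span the full 0 to 255 range even though the input occupies only the lowest bins. CLAHE differs from the plain equalization above by clipping the histogram before the cumulative sum and by operating on image tiles, which is not reproduced here.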
Images captured in a hazy environment have high-intensity pixels in the background for each channel, either in monochrome or RGB images, whereas the object is mainly disturbed by shadows, streaks, etc., causing it to have low intensity. The haze model that underlies haze removal in CV is given in Equation (4), in which $I(i)$ is the image intensity, $J(i)$ is the scene radiance, $A$ is the global atmospheric light, and $t(i)$ is the portion of light that is not scattered and reaches the sensor [74]:
$$I(i) = J(i)\,t(i) + A\,(1 - t(i)) \qquad (4)$$
The direct attenuation term, $J(i)t(i)$, describes the decay of the scene radiance in the air as a multiplicative distortion, whereas the airlight term, $A(1 - t(i))$, is an additive distortion that shifts the image colors.
He et al. [74] proposed a modification of Equation (4) that is based on statistics of haze-free images. The concept is defined as haze removal using dark channel prior (HRDC) and is expressed by Equation (5). The transmission $t(i)$ is restricted by the lower bound $t_0$ so that a small amount of haze is preserved in the dense haze region.
Dong et al. [73] observed that low-light video or image enhancement has similarities with the dehazing, or haze removal, operation. They modified Equation (4) following a haze removal procedure that starts by inverting the low-light image as $R(i)$. The global atmospheric light $A$ is selected from the highest intensity pixel of the input image $I(i)$. A multiplier $P(x)$ is introduced to adjust $t(i)$ because the brightness of the object is still low when $t(i)$ is applied directly to the low-light image. The multiplier $P(x)$ is set according to the assigned $t(i)$ value to avoid over- or under-enhancement of the input image. This procedure is expressed in Equation (6) and is defined as haze removal with an inverted operation (HRIO) in this study.
A clear difference between the dehazing algorithms expressed in Equations (5) and (6) can be observed in Figure 3d,e. The output image from the HRDC algorithm is almost similar to that of CLAHE, resulting in a more natural image without oversaturated colors. Furthermore, the improvement of white level intensity is more visible in the HRIO method such that the separation between the background and the object is more obvious.
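The invert, dehaze, and invert-back idea behind the two haze removal variants can be illustrated with a short sketch. It is a simplified, hypothetical implementation and not a transcription of Equations (5) and (6). The recovery rules $J = (R - A)/\max(t, t_0) + A$ and $J = (R - A)/(P\,t) + A$, with $P = 2t$ for $t < 0.5$ and $P = 1$ otherwise, are assumed from the general form of the formulations in [74] and [73]. The per-pixel transmission estimate (a 1 × 1 patch instead of a local window), the choice of $A$ as the brightest inverted pixel, and the values $\omega = 0.8$ and $t_0 = 0.1$ are likewise assumptions.

```python
def enhance_low_light(img, omega=0.8, t0=0.1, use_multiplier=True):
    """Invert -> dehaze -> invert an 8-bit grayscale image (list of rows).

    Assumptions: per-pixel transmission (1x1 patch) and atmospheric
    light A taken as the brightest pixel of the inverted image.
    """
    inverted = [[255 - p for p in row] for row in img]   # R(i)
    A = max(max(row) for row in inverted) or 1            # guard against A = 0
    out = []
    for row in inverted:
        new_row = []
        for r in row:
            t = 1.0 - omega * r / A                       # transmission estimate
            if use_multiplier:
                # assumed multiplier rule: P = 2t for t < 0.5, else 1
                p = 2.0 * t if t < 0.5 else 1.0
                denom = max(p * t, 1e-3)
            else:
                # assumed lower bound t0 on the transmission
                denom = max(t, t0)
            j = (r - A) / denom + A                       # recovered inverted radiance
            new_row.append(min(255, max(0, round(255 - j))))  # invert back and clamp
        out.append(new_row)
    return out


dark = [[0, 10, 40], [80, 160, 255]]
with_multiplier = enhance_low_light(dark)
with_lower_bound = enhance_low_light(dark, use_multiplier=False)
```

In this toy example both variants leave pure black and pure white unchanged and brighten the intermediate gray levels, with the multiplier variant lifting the darkest pixels more strongly, which is in line with the more visible separation of object and background reported for HRIO.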

Image Quality Assessment
When a field deployment of the vision-based system is conducted in a low-light or dark environment, no reference image is available, i.e., an image captured under normal lighting conditions that can be assumed to have good visual quality. Therefore, the assessment of output image quality from enhancement operations in this study is based on no-reference quality metrics, namely, the blind/referenceless image spatial quality evaluator (BR) [75], the naturalness image quality evaluator (NQ) [76], and the perception-based image quality evaluator (PQ) [77]. Essentially, the BR, PQ, and NQ metrics use similar natural scene statistics (NSS) features, but the BR and PQ metrics use features trained on natural and distorted images, in addition to human interpretation. Therefore, BR and PQ scores are restricted to the assigned types of distortion, whereas NQ is more independent in predicting the image quality.
The no-reference quality metrics described above are used to estimate the quality of the output image from the enhancement procedures. Meanwhile, the classical quality metrics, i.e., image entropy (E), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM), are still used in this study to measure how each of these indices changes following the enhancement process. The difference in image characteristics before and after implementing the enhancement algorithms can be estimated using these metrics.
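Two of the classical metrics have compact definitions that can be sketched directly. The code below is a minimal illustration on nested lists of 8-bit gray values, with hypothetical function names. It assumes the standard Shannon entropy of the gray level histogram and the standard PSNR definition with a peak value of 255, and it assumes that the comparison image for PSNR is the unenhanced input. SSIM and the no-reference metrics are omitted because they require local statistics or trained models.

```python
import math


def entropy(img, levels=256):
    """Shannon entropy (bits) of the gray level histogram."""
    hist = [0] * levels
    n = 0
    for row in img:
        for p in row:
            hist[p] += 1
            n += 1
    return -sum((c / n) * math.log2(c / n) for c in hist if c)


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio (dB) between two equally sized images."""
    squared_error = 0
    n = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            squared_error += (r - t) ** 2
            n += 1
    mse = squared_error / n
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


# An image using two gray levels equally often carries one bit of entropy.
two_level = [[0, 0], [255, 255]]
bits = entropy(two_level)
```

A larger entropy after enhancement indicates that more gray levels are occupied, whereas a low PSNR relative to the input indicates that the enhancement changed the pixel values substantially.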

Automated Identification of Object Features and Significance in Close-Range Photogrammetry and SHM Procedures
In automated close-range photogrammetry, the object detection is tested as a homogeneous white area based on the predefined search window. Template matching based on the normalized cross-correlation coefficient (NCC) computes all possible radii from the center of the white area in two directions within the search window. Overall, the adapted photogrammetry procedure in this study is computed automatically over all photogrammetry images by the self-calibrating bundle adjustment. When the photogrammetry is completed without error, the SHM is conducted, and the recorded videos or images are processed to generate the data. The sub-pixel registration of the pattern or template matching method [78] based on NCC is also used to track the object locations within the image sequences. Finally, using the relationship between two cameras (as a full-projection matrix) and the change in object location in each image (from the template matching method) as outlined in Figure 1, images are translated into time-domain response signals, i.e., displacement, velocity, or acceleration. The SHM accuracy is computed as the absolute or relative error between the vision-based measurement and the reference values, depending on the experiment.
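The NCC score that drives the template matching can be sketched as follows. This is a minimal illustration with hypothetical function names, limited to an exhaustive search at integer pixel positions. The sub-pixel registration [78] and the search window logic used in this study are not reproduced.

```python
import math


def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equally sized patches."""
    a = [p for row in patch for p in row]
    b = [p for row in template for p in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    numerator = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    denominator = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                            * sum((y - mean_b) ** 2 for y in b))
    return numerator / denominator if denominator else 0.0  # flat patch: no match


def match_template(img, template):
    """Return (score, row, col) of the best-matching top-left position."""
    th, tw = len(template), len(template[0])
    best = (-2.0, 0, 0)
    for y in range(len(img) - th + 1):
        for x in range(len(img[0]) - tw + 1):
            patch = [row[x:x + tw] for row in img[y:y + th]]
            score = ncc(patch, template)
            if score > best[0]:
                best = (score, y, x)
    return best


# A dim target on a dim background still matches a bright template,
# because NCC is invariant to linear changes in brightness and contrast.
frame = [[10] * 7 for _ in range(6)]
frame[3][4] = 200
template = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]
score, row, col = match_template(frame, template)
```

The invariance to linear intensity changes holds only while the target and background remain distinguishable. In a fully underexposed image the patch becomes flat, the denominator vanishes, and no reliable match exists, which motivates enhancing the image before tracking.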

Experimental Setup
The proposed image enhancement framework was experimentally evaluated using a one-inch steel block test in the Earthquake Engineering Laboratory at the University of Nevada, Reno. For the largest field of view, the vision-based system monitored the test from a distance of approximately 5 m and was set on top of a shake table as shown in Figure 4. The deployed vision systems consisted of two sets of digital cameras with specifications listed in Table 1. The first set consisted of two high-speed (HS) cameras that required a host computer and were triggered from the control room, while the second set consisted of two DSLR cameras that were operated manually (standalone DSLR, SD). A total of 28 templates, each with a white circle of 21 mm radius, were glued to the specimen as shown in Figure 4. They were not illuminated by extra lights, so the monitoring relied completely on the ambient lighting. The HS camera exposures were also set such that the captured image was completely dark and underexposed as shown previously in Figure 2a. They were set as f/14 and 1/3940 for the f-stop number and shutter speed settings, respectively. Regarding the SD cameras, the general setting for the ambient light environment was selected as given in Table 1 with ISO 400, an f-stop number of f/14, and a shutter speed of 1/50, resulting in a normal image, as shown previously in the example in Figure 2g.
The main component of the validation test model shown in Figure 4 is a sliding bar attached to a concrete column-capital-slab specimen. The sliding bar consists of a Novotechnik displacement sensor, an aluminum plate, and six circular templates. Other templates shown in Figure 4 were used for other static experiments; however, the minimum target constraints in the bundle adjustment process required them to be included in the photogrammetry images. A one-inch magnetic block was inserted into the sliding bar in the static test, displacing the six templates by exactly one inch as read by the Novotechnik sensor.
Three still images were recorded in the tests, i.e., two images without the block inserted (before and after) and one when the templates were moved by exactly one inch when the block was placed. Therefore, the accuracy measured from this test was based on an absolute single value of 25.4 mm; this value was compared with the six-points measurement shown in Figure 4.
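The accuracy check for this test reduces to comparing each of the six measured displacements with the single reference value of 25.4 mm. The short sketch below shows the computation of the absolute and relative errors; the function name is hypothetical and the measured values are placeholders chosen for illustration, not results from this study.

```python
REFERENCE_MM = 25.4  # one-inch block


def displacement_errors(measured_mm, reference_mm=REFERENCE_MM):
    """Return (absolute error in mm, relative error in %) for each point."""
    return [(abs(m - reference_mm), 100.0 * abs(m - reference_mm) / reference_mm)
            for m in measured_mm]


# Placeholder measurements for the six templates (illustrative values only).
hypothetical_measurements = [25.40, 25.35, 25.48, 25.41, 25.30, 25.52]
errors = displacement_errors(hypothetical_measurements)
```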

Table 1. Vision-based system configuration sets for one-inch block validation experiment.

Camera Type High-Speed (HS) Standalone DSLR (SD)
Output Object Visualization
A total of 50 photogrammetry images, 25 captured by each camera of the HS system, were taken from different positions and orientations towards the specimen. The underexposed input images were enhanced before the automatic object detection, close-range photogrammetry, and SHM procedures. The global histograms for the input and associated output images for each enhancement method are shown above in Figure 3. Because the measurement accuracy was evaluated from the displacement of the six templates shown in Figure 5, the detailed modification of each point after enhancement at its 2D location is also given in Figure 5. This figure displays the change in gray level, and the results clearly show that the intensity is evenly stretched for all points. The clipping effect is observed from the CS and HE methods, i.e., the pure white block is clipped at a maximum intensity of 255. The gray values in this region fall outside the sensor dynamic range after enhancement, so they are set to the maximum (255) and appear as clipped peaks in the histogram bins. Another observation is that the HRDC method effectively separates the white object from the black background, such that the low-level intensity of the dark background is visually clear in Figure 5e. Meanwhile, CLAHE softens the clipping effect that is evident in the HE method. It limits template brightness by setting a threshold of 0.01 pixel, thus avoiding oversaturation. The HRIO method also confines the gray level distribution within the sensor dynamic range without the clipping effect.
The radii of each point in Figure 5 that are identified as an object from NCC template matching are listed in Table 2 for each enhancement algorithm. The search window for the object is set as a 5.0-pixel minimum to allow automatic detection of the center.
From Table 2, the pixel length in each direction is not uniform and there is approximately a scale factor of 1.2 due to the difference. Because the images were taken from different angles, the appearance of the object was not always in a circular shape. Therefore, instead of detecting a circle feature, an ellipse threshold of 2.0 pixels was selected to check the similarity of the radius in each direction. When an ellipse feature was detected, the center of the search window defines the ellipse center based on the minimum threshold average length of 2.0 pixels in each direction. The results given in Table 2 show the range of 19-21 pixels and 15-17 pixels for the first and second radius, respectively. The variations within each enhancement method are at the largest using the HRIO method, and at the minimum when CS and HE methods are used. The radii also have variations within each point due to the applied enhancement method, with slightly more percentages for points 2 and 3.
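As a rough illustration of the matching score and the radius comparison discussed above, the following sketch computes a normalized cross-correlation (NCC) between two flattened gray-level patches and the ratio of two ellipse radii. The function names and sample values are hypothetical; the study's actual matching software and thresholds are not reproduced here.

```python
import math


def ncc(patch, template):
    """Normalized cross-correlation of two equally sized gray-level patches."""
    n = len(patch)
    mean_p = sum(patch) / n
    mean_t = sum(template) / n
    num = sum((p - mean_p) * (t - mean_t) for p, t in zip(patch, template))
    den = math.sqrt(
        sum((p - mean_p) ** 2 for p in patch)
        * sum((t - mean_t) ** 2 for t in template)
    )
    return num / den if den > 0 else 0.0


def axis_ratio(r1, r2):
    """Ratio of the larger to the smaller ellipse radius (1.0 for a circle)."""
    return max(r1, r2) / min(r1, r2)


# Hypothetical template and a copy with a different gain and offset
template = [10, 200, 10, 200]
brighter = [2 * t + 30 for t in template]
score = ncc(brighter, template)

# Mid-range radii (pixels) chosen from the ranges reported for Table 2
ratio = axis_ratio(20, 16)
```

The score stays at 1 (up to rounding) for a patch that differs from the template only by a positive gain and an offset, and the mid-range radii give a ratio of 1.25, close to the scale factor of about 1.2 noted in the text.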

Image Quality Assessment
The effectiveness of each algorithm in modifying image quality was measured using the classical entropy (E), PSNR, and SSIM, and the output image quality was estimated using the no-reference image quality index metrics, i.e., BR, NQ, and PQ. The assessment focused only on the quality and index changes due to the image enhancement procedures applied to the underexposed images captured by the vision-based HS system. Quality assessment was conducted on all enhanced photogrammetry images, with the statistics shown in Table 3. The coefficient of variation (CV) was computed from the 50 output images for each enhancement method and index, and the index change ($\Delta_{\mathrm{input}}$) was measured as the mean difference between each algorithm index and the input index. Although uniform boundary conditions of the enhancement algorithms were applied to the 50 input images, some variations based on the CV percentage were observed in the output images, especially when the enhancement was conducted using the HRDC method. The no-reference index metrics also measure an input image CV of 1.6-9.4%, with the largest variation computed by the NQ index, as listed in Table 3. The input images were visually dark and underexposed; however, they were taken from different positions and orientations towards the specimen. Because the monitoring depends entirely on the ambient light and the lighting cannot be controlled to evenly illuminate the templates, changing the camera position while taking pictures affected the images captured by the camera sensor. Overall, the output image statistics in Table 3 show that the image enhancement algorithms modify the input image characteristics, and that some metrics detect larger changes than others. These variations cannot be identified merely from the perceived output image or the gray level histogram.
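For readers unfamiliar with the classical metrics, the sketch below shows one common way to compute the Shannon entropy, the PSNR, and a coefficient of variation. It assumes 8-bit gray levels and the sample standard deviation for the CV; the function names and sample values are hypothetical, and the exact definitions used for Table 3 may differ.

```python
import math
from collections import Counter


def entropy(pixels):
    """Shannon entropy (bits) of the gray-level distribution."""
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())


def psnr(reference, output, peak=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized images."""
    mse = sum((r - o) ** 2 for r, o in zip(reference, output)) / len(reference)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, in percent."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return 100.0 * math.sqrt(var) / mean


# Hypothetical patches: a dark input with two gray levels, and an output
# whose gray levels are spread over the full range
dark = [10, 10, 10, 10, 12, 12, 12, 12]
spread = [0, 36, 73, 109, 146, 182, 219, 255]
e_in, e_out = entropy(dark), entropy(spread)

# Hypothetical index values from three output images
cv_percent = coefficient_of_variation([2.0, 4.0, 6.0])
```

In this toy case the entropy rises from 1 bit to 3 bits after the gray levels are spread out, which mirrors the direction of change reported for the enhanced images.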

Effect of Image Enhancement on the Object Identification in the Close-Range Photogrammetry
The automatic object identification procedure using the ellipse assumption and NCC matching was described previously in Section 3.2. An example of accurate identification is shown in Figure 6a, in which the object center is detected and positioned at the center with correct registration following the white rings. As a result of the predefined search window and threshold, these rings were sometimes detected as objects, so their center coordinates and residuals were computed in the preliminary object orientation, as shown in Figure 6c. Because the white rings were detected separately from the circular center and treated as objects, they could not be grouped into a single object or correctly registered as a single template. The object center can also be detected without being registered when the matching does not converge, as in the example in Figure 6e. The ellipse shape assumption may also detect a few non-object features, such as the bolts or stair reflections in Figure 6d. These features may cause the photogrammetry image to fail in the bundle adjustment computation if they dominate the image plane, as shown in Figure 6f. This image was excluded from the automatic bundle adjustment, and the photogrammetry cannot be completed when most objects are incorrectly identified in each image plane.
The results of object identification in photogrammetry based on each enhancement algorithm are listed in Table 4. If all 28 templates were visible in all 50 photogrammetry images, correct object identification would yield 1400 objects. However, these images were taken from different positions, with some pictures taken closer to the specimen. The templates for the top specimen are lost in this position; therefore, the correct identifications in Table 4 comprise fewer than 1400 objects. The totals given in Table 4 are the sum of the correct, incorrect, and non-object identifications, from which the unidentified objects are then subtracted. It is observed that the haze removal-based algorithms identified 80% of the objects correctly but have slightly higher percentages of non-object identification. Almost 20% of the images enhanced by the HRDC procedure failed because these images contain incorrect object detections. Furthermore, the histogram-based enhancements have a lower accuracy of approximately 60%, with 0.4% or fewer unidentified objects. Their incorrect object identifications also have a higher percentage than in the haze removal-based algorithms.
However, they are not concentrated into one image but rather distributed within all 50 images, so there are no failed images from either CS, CLAHE, or HE methods.
The enhanced images from each procedure were carefully analyzed, confirming that all failed images were excluded from the bundle adjustment procedure. Therefore, despite the failed images and variations in object identification observed in each enhancement method, the bundle adjustment could still reach convergence. The photogrammetry using output images from each enhancement method was completed, with results given in Table 5. The principal point locations $u_0$ and $v_0$ were determined by the projected positions of the light rays through the lens center that are perpendicular to the image plane. The length of the perpendicular line is the principal distance, which is equal to the focal length at infinity focus. It is related to the HS hardware system setting for the validation tests; therefore, the variations are negligible and estimated as 0.23% within all methods. Furthermore, the principal point locations are known to be correlated with other internal camera parameters such as distortion coefficients. This strong correlation resulted in higher variations in the principal point locations within each method, which were computed as −18% and −24.64%, respectively.
Table 5. High-speed system internal parameters measured from the photogrammetry process using enhanced images.
Overall, the global image enhancement method implemented in the study may still have some limitations in automatically identifying and registering objects. In addition, careful selection of the images to be enhanced is required early in the bundle adjustment procedure.
However, as demonstrated from the results in this section, the method is still valid when the enhanced images are carefully selected to be included in the bundle adjustment procedure so that the process is complete and proper camera system parameters can be obtained (Table 5).
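The role of the principal point and the principal distance discussed in this section can be illustrated with a minimal pinhole-projection sketch. It assumes an ideal, distortion-free camera with the principal distance `f` expressed in pixels; the function name, the numeric values, and the sign convention are illustrative assumptions, not the calibration model or the Table 5 results of the study.

```python
def project(point_xyz, f, u0, v0):
    """Project a 3D point in the camera frame onto the image plane (pixels).

    f is the principal distance in pixels, (u0, v0) the principal point.
    Lens distortion is ignored in this sketch.
    """
    x, y, z = point_xyz
    if z <= 0:
        raise ValueError("point must lie in front of the camera")
    return (u0 + f * x / z, v0 + f * y / z)


# Hypothetical internal parameters (pixels)
f, u0, v0 = 1000.0, 640.0, 512.0

on_axis = project((0.0, 0.0, 5.0), f, u0, v0)   # lands on the principal point
off_axis = project((0.1, 0.0, 5.0), f, u0, v0)  # shifted along the u axis
```

A point on the optical axis projects onto the principal point itself, so any error in $u_0$ or $v_0$ shifts every projected feature by the same amount. In a bundle adjustment, such a shift can be partly absorbed by other parameters, which is one way to picture the correlation mentioned above.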

Effect of Image Enhancement on the Vision System Measurement Accuracy
Similar enhancement methods were applied to three static images taken from the validation experiments. The measurement accuracy was assessed by computing the absolute error, $\Delta_{\mathrm{abs}}$, of the displacement of the six points, $\delta$, with respect to the reference value of 25.4 mm. The results are shown in Table 6, and the average absolute error, $\Delta_{\mathrm{abs,mean}}$, is computed from all points. An error of less than 1% is observed for the measurements using enhanced images, except for the HRDC output images, which show a slightly higher error of 1.37%. Overall, the results in Table 6 validate the implementation of image enhancement using either histogram-based or haze-removal-based algorithms, with displacement measurement absolute errors that can be less than 1%.
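The error computation can be sketched as follows, assuming that the absolute error is expressed as a percentage of the 25.4 mm reference displacement. The function names are hypothetical, and the sample displacements are invented for illustration; they are not the values in Table 6.

```python
REFERENCE_MM = 25.4  # displacement imposed by the one-inch steel block


def absolute_error_percent(measured_mm, reference_mm=REFERENCE_MM):
    """Absolute error of one measured displacement, in percent of the reference."""
    return abs(measured_mm - reference_mm) / reference_mm * 100.0


def mean_absolute_error_percent(measurements_mm, reference_mm=REFERENCE_MM):
    """Average absolute error over all measured points, in percent."""
    errors = [absolute_error_percent(m, reference_mm) for m in measurements_mm]
    return sum(errors) / len(errors)


# Hypothetical displacements of the six templates (mm), NOT data from Table 6
deltas = [25.31, 25.52, 25.40, 25.28, 25.47, 25.36]
mean_error = mean_absolute_error_percent(deltas)
```

With this definition, a 1% error corresponds to a deviation of 0.254 mm from the reference displacement.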

Monitoring Setup and Building Description
The accuracy of image enhancement in measuring seismic vibrations and identifying structural dynamic characteristics was evaluated using a large-scale seismic shake table test of a three-story reinforced concrete (RC) building, as shown in Figure 7. The test was part of the Tokyo Metropolitan Resilience Project Subproject C and was performed in December 2019 at the National Research Institute for Earth Science and Disaster Resilience (NIED) in Kobe, Japan. The tests were dedicated to improving the resiliency of buildings and developing SHM techniques that could rapidly assess the safety of the buildings after major seismic shaking due to their post-disaster functions. More information related to the project or the building system can be found in Yeow et al. [79].
Two vision sensor systems and their configurations used in seismic monitoring are shown in Table 7; both systems used CMOS sensors. The first system comprised two high-speed (HS) cameras, shown as Cam 1 and Cam 2 in Figure 7, which were similar to those used in the validation test. The second system used standalone DSLR (SD) cameras, shown as Cam A and Cam B in Figure 7, which recorded monochrome videos of the tests. The SD system test videos were later converted into continuous images with a resolution of 1920 × 1080 pixels. The sampling rate for the seismic test was set to 32 frames per second for the HS system, and the default setting of 30 frames per second was used for the SD system. Both vision systems relied completely on the ambient light sources in the test environment and the setting adjustments in each camera. Therefore, the captured images for photogrammetry and the SHM required image processing to improve their dynamic range.
Table 7. Vision-based system configuration using two sets of cameras for the seismic shake table test.

Output of Image Enhancement
Given the promising results obtained from the simple 1-inch block test, it was desired to extend the study to more realistic cases, including full-scale building vibration monitoring, which is the focus of the next section. The validation test described previously highlights several enhancement algorithms that result in less error than other methods. An example of the enhanced image histogram using the CLAHE method and the quality index metrics for both sensor systems is shown in Figure 8. The input images captured by each system were initially underexposed for the HS system and low in contrast for the SD system, as shown by their histograms in Figure 8c,d. As previously highlighted in Figure 2, the identification algorithms are unable to locate any features in the original HS image, whereas a small number of features are identified in the low-contrast SD image, but their total is inadequate for bundle adjustment convergence. After processing using the CLAHE method, the histogram of the HS vision system clearly shows the stretching of the pixel distribution across the gray level intensity as the effect of image enhancement. A reduction in the pixel counts is also observed, especially in darker areas. Regarding the SD system, the CLAHE algorithm relaxes the pixel counts so that the separation between dark and bright areas is more evident in the output image. Similar to the static test image, the entropy of the seismic test output image is also higher due to the applied image enhancement. The increase is more noticeable in the HS output, whereas less change is computed for the low-contrast image recorded by the SD system. From the metrics shown in Figure 8, the enhancement procedure is observed to affect underexposed images more than low-contrast images, especially when the quality is estimated by the PQ metric.

Seismic Behavior and System Identification of the Three-Story Building
Several ground motion excitations ranging from low to high wave amplitude were applied to the RC building. White noise excitations, i.e., low-amplitude vibrations, were applied to the building between the seismic tests with a loading duration of 180 s. Samples of the displacement history from the high-amplitude excitation (150% scale of a synthetic ground motion seismic excitation [79]) and the white noise excitation are given in Figures 9 and 10, respectively, for the measured template marked in the figures. The HS system is selected as the reference sensor, and the relative difference between the two system measurements is presented in detail.
Figure 9. Comparison between high-speed and commercial DSLR system measurements of high-amplitude seismic excitation as measured from the marked template.
Figure 10. White noise response as measured by high-speed and commercial DSLR system measurements in three principal axes as measured from the marked template.
The seismic response of the building under high-amplitude excitation is shown in Figure 9 based on the measurements of the two vision systems, together with their relative difference. More details are presented in Table 8, which summarizes the peak displacement values from both monitoring systems. The peak displacements measured by the HS system were 1118.3 and −787.4 mm, whereas the SD system measured peaks of 1113.3 and −779.2 mm. The maximum error in the SD system measurement relative to the HS system was computed as −28.57 mm (3.63%), which shows that the consumer-grade and high-end high-speed sensor systems are comparable.
The building response in three principal axes under low-amplitude white noise excitation is given in Figure 10 for the two vision systems. These data were analyzed using the SSI-COV algorithm to enable frequency and modal identification of the building system. The HS sampling rate was 32 Hz, so the SD system acceleration signal was resampled to increase the computational efficiency of the identification algorithm. The signals from both systems were filtered using a 4th order Butterworth bandpass filter with cutoff frequencies of 3 and 13 Hz. A model order of six was selected for all signals to enable the identification of the first three fundamental modes, with the fitting computed up to the 30th order to show the stability of the poles at a higher level. The frequency response function of each signal is plotted in Figure 11 together with the stability of the poles. With a model order of six, three stable poles in frequency and damping can be extracted only from the transverse and longitudinal response data. Higher model orders and different filtering can also be selected to extract more modes in the vertical direction. For uniformity in signal processing, the filtering and model order selection were kept the same in this case study.
Figure 11. Stabilization plots from output-only SSI-COV method measured for building dynamic modal properties in three directions.
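The resampling step mentioned above can be sketched as follows, assuming the 30 Hz SD signal is brought onto the 32 Hz time grid of the HS system. The resampling method is not stated in the text, so simple linear interpolation is used here purely for illustration; the function name and the test signal are hypothetical, and the Butterworth filtering and SSI-COV steps are not implemented.

```python
from fractions import Fraction


def resample_linear(samples, fs_in, fs_out):
    """Resample a uniformly sampled signal by linear interpolation."""
    duration = (len(samples) - 1) / fs_in
    # Small epsilon guards against floating-point round-off in the count
    n_out = int(duration * fs_out + 1e-9) + 1
    out = []
    for k in range(n_out):
        pos = k * fs_in / fs_out              # fractional index in the input
        i = min(int(pos), len(samples) - 2)   # left neighbor, kept in range
        frac = pos - i
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
    return out


FS_SD, FS_HS = 30, 32
ratio = Fraction(FS_HS, FS_SD)  # rational resampling factor, 16/15

# The 13 Hz upper cutoff lies below the Nyquist frequency of both systems
nyquist_sd, nyquist_hs = FS_SD / 2, FS_HS / 2

# Hypothetical one-second ramp sampled at 30 Hz (sample value equals time)
ramp = [n / FS_SD for n in range(FS_SD + 1)]
resampled = resample_linear(ramp, FS_SD, FS_HS)
```

Bringing both signals to a common rate allows the same filter design and model order to be applied to each, consistent with the uniform processing described in the text.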
The comparison between the HS and SD systems in extracting the first mode of vibration in terms of frequency, $f_1$, and damping, $\zeta_1$, is given in Table 9. The difference in measured frequency, $\Delta_f$, is lowest in the transverse direction and slightly higher in the other directions. A relative difference within 3% is observed in the measured structural damping, $\Delta_\zeta$. Overall, similar to what was demonstrated for the high-amplitude displacement measurement, the structural modal properties computed from both vision systems are also very comparable.
Table 9. Building fundamental frequency, $f_1$, and damping, $\zeta_1$, with their differences, $\Delta_f$ and $\Delta_\zeta$, as measured by the two vision systems.

Conclusions
This work presents a framework for improving underexposed images using image enhancement algorithms for feature identification, with implementation in close-range photogrammetry and structural health monitoring. An experimental validation with systematic evaluation was conducted using a one-inch steel block test, which measured the absolute difference between the measurements of the two vision-based systems and the one-inch block displacement. The framework was also tested in measuring the seismic response and modal properties of a three-story building tested under high-amplitude seismic excitation and a white noise test. Based on these laboratory experiments, the following key findings and main conclusions can be drawn:
• Image enhancement efficiently improves the quality of image data collected from vision-based sensors and needs to be adopted more often in infrastructure and large-scale SHM applications. The proposed algorithms can modify the underexposed and low-contrast input images captured by high-speed or commercial DSLR cameras, thus allowing automatic feature identification. Their efficiency can be estimated through the classical image quality metrics, and their output quality can be assessed by more advanced blind image quality metrics.
• The enhanced images yield very high accuracy in measuring static displacement, as observed by the two vision systems in the one-inch block test. Comparable results from both systems were also obtained in measuring high-amplitude displacement from the large-scale seismic tests, and in estimating structural modal properties through the system identification procedure.
• Overall, it is concluded that image enhancement does have a significant effect on feature identification and implications for the close-range photogrammetry and SHM accuracy. The applied enhancement algorithms were shown to be computationally effective and are recommended for vision-based SHM image enhancement applications.
• On a specific note, automatic feature detection in enhanced images may be a limitation of this method. Thus, future users are cautioned against selecting the search window and the threshold options for enabling automatic detection of the features on the output images when the global enhancement algorithms are implemented. Instead, a careful check is recommended of the number of obsolete objects identified within each enhanced image plane to allow the bundle adjustment to converge in the photogrammetry process. Measurement accuracy seems to deteriorate slightly when more failed images are identified from the bundle adjustment procedures. With due care, successful monitoring using underexposed and low-contrast images is still possible, not only for different vision system hardware, but also for a wide range of experimental works, through a proper selection of the image enhancement algorithm.