Design and Laboratory Validation of a Low-Cost Vision-Based Strain Monitoring System Using ESP32-CAM with Centralized Processing

Anim, Asare Kwaku; Li, Weijie; Zhao, Xuefeng; Ma, Jun; Liu, Ronghuan; Sun, Dong

doi:10.3390/buildings16091681


by Asare Kwaku Anim ¹,², Weijie Li ¹,²,*, Xuefeng Zhao ¹,²,*, Jun Ma ³, Ronghuan Liu ¹,² and Dong Sun ³

¹ School of Civil Engineering, Dalian University of Technology, Dalian 116024, China

² State Key Laboratory of Coastal and Offshore Engineering, Dalian University of Technology, Dalian 116024, China

³ State Grid Jilin Electric Power Company Limited, Changchun 130022, China

* Authors to whom correspondence should be addressed.

Buildings 2026, 16(9), 1681; https://doi.org/10.3390/buildings16091681

Submission received: 23 March 2026 / Revised: 13 April 2026 / Accepted: 17 April 2026 / Published: 24 April 2026

(This article belongs to the Section Building Structures)


Abstract

Vision-based structural health monitoring offers a promising alternative to conventional wired sensing systems; however, its adoption is often limited by high hardware costs and computational constraints at sensing nodes. This study presents the design and laboratory validation of a low-cost vision-based system for displacement and strain monitoring using a centralized processing architecture. The proposed system separates image acquisition from computation: an ESP32-CAM module serves as a lightweight edge node for grayscale image capture and wireless transmission, while computational tasks, including displacement tracking, subpixel localization, scale calibration, and strain estimation, are performed on a centralized unit. This enables low-cost deployment at USD 60 per node with a nominal power consumption of 1 W. System performance was evaluated through controlled experiments, including a 24 h zero-drift test and quasi-static displacement tests with step sizes down to 15 μm. Validation against a Linear Variable Differential Transformer (LVDT) shows close agreement, with a mean absolute error of at most 2.63 µε and drift within ±2 μm. The system achieves an effective strain range of ±35,000 με. These results demonstrate the strong potential of low-cost centralized vision-based systems for practical deployment in structural health monitoring applications.

Keywords:

low-cost vision-based sensing; displacement and strain measurement; ESP32-CAM; centralized processing; structural health monitoring; wireless sensing

1. Introduction

Structural health monitoring (SHM) has evolved from a research curiosity to an engineering necessity as building infrastructure ages, safety expectations rise, and data-driven maintenance becomes economically imperative [1]. Modern building management increasingly relies on quantitative condition assessment to optimize repair schedules, extend service life, and prevent catastrophic failures [2,3]. Among measurable response parameters, strain is particularly diagnostic: it directly indicates localized material deformation, enables stress state assessment, and provides an early indication of damage initiation, from microcracking in concrete to yielding in steel [4,5,6]. Despite decades of sensor development, comprehensive strain monitoring in buildings remains rare. Conventional technologies such as electrical resistance strain gauges, fiber Bragg grating (FBG) sensors, vibrating wire gauges, and Linear Variable Differential Transformers (LVDTs) offer established accuracy but impose prohibitive barriers to scaled deployment [7,8,9]. FBG interrogators cost USD 10,000–50,000, and individual wired sensors require expensive installation labor [10,11]. Dense wiring for power and signal transmission may be incompatible with existing buildings and operationally disruptive [9,12]. Wired sensors are also vulnerable to electromagnetic interference and physical damage, and replacement requires access to embedded installations [13,14,15,16].

Vision-based SHM has attracted substantial research interest as a spatially distributed alternative [17,18,19,20]. Established techniques such as digital image correlation (DIC), template matching, and feature tracking achieve subpixel displacement resolution under ideal conditions [21,22,23,24]. However, the two dominant system architectures, high-end centralized vision systems and fully embedded edge-processing systems, have both limited practical building deployment. Fully embedded edge-processing systems perform all computation at the sensing node using single-board computers such as Raspberry Pi and NVIDIA Jetson with high-speed industrial cameras. This approach introduces its own challenges: per-node costs of USD 100–500, power consumption of 5–20 W requiring dedicated electrical infrastructure, thermal management complications, and reduced flexibility for algorithm updates or system reconfiguration [25,26,27,28]. The current literature lacks practical architectures that achieve measurement accuracy comparable to conventional sensors using minimal-cost hardware while avoiding the computational constraints and inflexibility of fully embedded processing. Specifically, there is a need for systems that reduce the cost per sensing point to levels comparable with a basic wired sensor, minimize power consumption so that nodes can run from USB power integrated into the building infrastructure, enable computational flexibility for algorithm evolution without field hardware modification, and integrate naturally with existing Wi-Fi networks. This study addresses the identified gap through architectural decoupling, that is, separating image acquisition from computational processing. The specific innovation is the use of ESP32-CAM modules, USD 5 wireless camera systems designed for IoT applications, as edge nodes that perform only image capture and transmission, with all analysis executed on centralized servers. The proposed approach therefore differs from both high-end vision systems, which centralize processing but require expensive acquisition hardware, and edge-processing systems, which distribute computation to sophisticated and expensive local hardware.

This study develops and validates in the laboratory a vision-based strain monitoring system with centralized processing, designed specifically for building-scale deployment. The specific contributions are (i) the development and laboratory validation of a vision-based strain monitoring system that achieves micrometer-level displacement resolution and microstrain-level strain accuracy using a USD 5 ESP32-CAM module as the edge device with centralized processing; and (ii) the demonstration of a scalable architecture with identified technical pathways for addressing the environmental robustness, cybersecurity, and network resilience requirements of field deployment in smart building applications.

The remainder of this paper is organized as follows: Section 2 details the system architecture, hardware composition, and algorithms with explicit uncertainty analysis; Section 3 presents laboratory validation experiments and performance metrics; Section 4 discusses the results; Section 5 discusses the limitations of the proposed method and the path to field deployment; and Section 6 concludes with future work directions.

2. Materials and Methods

2.1. System Architecture

The proposed monitoring system as illustrated in Figure 1 adopts a distributed three-tier architecture, separating functionality across edge, communication, and centralized processing layers. The system consists of (i) an edge camera node, (ii) a wireless communication layer, and (iii) a centralized processing unit.

The edge node, based on the ESP32-CAM module, is responsible solely for image acquisition and wireless transmission. All computationally intensive operations—including displacement tracking, scale calibration, and strain estimation—are performed on a centralized PC. This design reduces hardware complexity at the sensing node and enables scalable deployment across multiple monitoring locations. The various system attributes for the edge node as well as the centralized systems are summarized in Table 1.

However, while the centralized architecture simplifies system design, it introduces potential scalability and reliability limitations. As the number of deployed nodes increases, data transmission and processing demands may lead to network congestion and computational bottlenecks. Additionally, the centralized processing unit represents a single point of failure, where interruptions may affect monitoring continuity. In this study, the system is evaluated under limited-scale deployment, and future work will explore distributed or edge-assisted processing strategies to improve scalability and fault tolerance.

Image data are transmitted over a 2.4 GHz Wi-Fi network and processed sequentially at the central unit. The three tiers are as follows:

1. Edge Camera Node: The MISS-Building sensor is built on the ESP32-CAM module mounted on the monitored structural component, with only two functions: grayscale image capture and wireless image transmission.

2. Wireless Communication Layer: The system uses standard IP-based networking to transmit image frames to a centralized server over a 2.4 GHz Wi-Fi network. A store-and-forward buffering mechanism using onboard microSD storage ensures measurement continuity during intermittent network outages. Captured images are timestamped and stored locally when connectivity is unavailable, and transmission resumes automatically upon reconnection.

3. Centralized PC/BMS Analysis Layer: All computational operations are performed on a centralized PC/BMS platform. These include image decoding, preprocessing, displacement tracking using normalized cross-correlation (NCC), subpixel localization via quadratic interpolation, checkerboard-based scale calibration, and strain computation. The centralized layer also supports data logging, visualization, and alert generation.
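The store-and-forward behavior of the communication layer can be sketched as follows. This is a minimal illustration in Python of the buffering logic only, not the node firmware: the class name `StoreAndForwardBuffer`, the `send` callback, and the in-memory queue standing in for the microSD card are all assumptions introduced here.

```python
from collections import deque
from typing import Callable, Deque, Tuple

Frame = Tuple[float, bytes]  # (capture timestamp, encoded image bytes)


class StoreAndForwardBuffer:
    """Queue frames while the link is down and flush them in capture order.

    Hypothetical sketch: an in-memory deque stands in for microSD storage.
    """

    def __init__(self, send: Callable[[Frame], bool], max_frames: int = 10_000) -> None:
        self._send = send  # assumed to return True once the server accepts a frame
        # When the buffer is full, the oldest frame is discarded first.
        self._pending: Deque[Frame] = deque(maxlen=max_frames)

    def submit(self, timestamp: float, image: bytes) -> int:
        """Store the new frame, then try to transmit everything pending.

        Returns the number of frames still waiting after the attempt.
        """
        self._pending.append((timestamp, image))
        while self._pending:
            if not self._send(self._pending[0]):
                break  # link unavailable: keep the frame and retry at the next capture
            self._pending.popleft()
        return len(self._pending)
```

Because frames leave the queue only after a successful send, the server receives them in timestamp order once connectivity returns.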

Network and Security Considerations: While the current implementation uses standard WPA2-secured Wi-Fi, we explicitly identify cybersecurity, data encryption, and network congestion resilience as critical requirements for field deployment. The architecture supports these enhancements at the centralized layer without edge hardware modification.

2.2. Hardware Design

2.2.1. Core Hardware Components

The edge camera node is a compact unit with four core components.

ESP32-CAM AI Thinker Module: The system uses an ESP32-CAM module equipped with an OV5640 CMOS image sensor. Although the sensor supports a native resolution of 3264 × 2448 pixels, images are captured in grayscale at 1280 × 720 resolution to balance data efficiency and measurement performance. The sampling rate is set to 1 Hz based on system design constraints and the default acquisition configuration. This rate is sufficient for capturing quasi-static structural behavior while minimizing power consumption and communication load. However, it limits the system’s ability to capture transient or high-frequency structural responses and is therefore most suitable for long-term monitoring of gradual deformation. The total cost of the hardware components and their specifications are summarized in Table 2.

MicroSD Card: A 4 GB Class 10 microSD card provides local buffering capability. This enables continuous image storage during network outages, ensuring measurement continuity over extended durations.

Image Sensor Configuration: Images are captured in grayscale to reduce computational complexity and eliminate color-channel noise, improving template matching consistency. The camera operates using the default exposure settings provided by the ESP32-CAM module. While this simplifies configuration, variations in illumination may affect image quality and tracking performance under non-uniform lighting conditions.

Mounting Shelf: A custom stainless-steel mounting platform supports the ESP32-CAM and optical targets, including a checkerboard reference and a bullseye tracking target. The configuration ensures stable relative positioning during laboratory validation.

2.2.2. Optical Targets

Bullseye Target: The bullseye target as shown in Figure 2 is attached to the moving structural component. Its radial symmetry supports robust detection and provides partial invariance to rotational misalignment.

Checkerboard Reference Pattern (1 mm square size): The reference checkerboard pattern is fixed to a stationary structural surface. It is used for pixel-to-physical scale calibration and periodic recalibration to compensate for drift.

2.2.3. Usage and Energy Efficiency

The main design criteria are functionality, low cost, low power consumption, and ease of deployment, all of which are necessary for the system to be suitable for smart building SHM.

Power Consumption: The system draws a nominal 1 W during operation. This low power demand enables 66 h of continuous operation from a single 10,000 mAh portable battery, or permanent operation from the smart building’s existing 5 V DC USB power supply. This eliminates the need for dedicated high-voltage or high-current power infrastructure.

Deployment Workflow: Sensor deployment follows a simple two-step workflow. First, the strain transfer mechanism and the mounting shelf are attached to the structural attachment point using an M4 bolt. Second, the ESP32-CAM is connected to a 5 V power supply within the smart building network, after which image transmission from the edge node to the centralized PC begins immediately.

2.3. Algorithms and Uncertainty

2.3.1. Algorithm Overview

The monitoring framework follows a sequential image-based processing pipeline consisting of image acquisition, transmission, preprocessing, displacement tracking, and strain computation. Image frames captured at the edge node are transmitted to a centralized processing unit for analysis. The algorithm is designed for one-dimensional horizontal displacement tracking and incorporates a region of interest (ROI), confidence-based tracking validation, and periodic scale recalibration to improve robustness. The overall workflow is illustrated in Figure 3.

2.3.2. Image Acquisition and Compression Effects

Image frames are captured at fixed sampling intervals and encoded in JPEG format prior to transmission. JPEG encoding reduces data size and improves transmission efficiency; however, the compression ratio depends on the scene content and default camera configuration.

JPEG compression is inherently lossy and may introduce artifacts that affect pixel-level intensity distributions. These artifacts are particularly relevant for subpixel localization, which relies on accurate intensity gradients around correlation peaks. In this study, the compression configuration was not explicitly optimized, and its impact on displacement accuracy was not systematically quantified.

Although the experimental results show agreement with reference measurements under controlled conditions, the influence of compression artifacts under varying imaging conditions such as low contrast, noise, or illumination variability remains an important area for further investigation.

2.3.3. Initialization and Scale Calibration

At the start of monitoring, the initialization procedure is performed using the first valid image frame as shown in Figure 4. The bullseye target is detected using the circular Hough transform, and a template is extracted for subsequent tracking. The checkerboard pattern is detected using Harris corner detection, and geometric consistency is verified using RANSAC-based homography estimation.

The pixel-to-physical scale is determined using the known checkerboard square size (1 mm). The median inter-corner pixel distance is used to define the conversion factor:

$$\mathrm{um\_per\_px} = \frac{1000}{\mathrm{px\_per\_mm}} \tag{1}$$

where px_per_mm denotes the number of pixels per millimeter, computed as the median inter-corner pixel distance of the checkerboard pattern.
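The scale computation in Equation (1) can be illustrated with a short Python sketch. It assumes the checkerboard corners have already been detected and ordered into rows. The function name, the input layout, and the restriction to horizontally adjacent corners are choices made here for illustration, not details taken from the implementation.

```python
from statistics import median
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates of one detected corner


def um_per_px_from_corners(corner_rows: Sequence[Sequence[Point]],
                           square_mm: float = 1.0) -> float:
    """Estimate micrometers per pixel from an ordered grid of checkerboard corners.

    corner_rows[i][j] is the corner in row i, column j. Only horizontally
    adjacent corners are used, matching the one-dimensional measurement axis.
    """
    spacings: List[float] = []
    for row in corner_rows:
        for (x0, y0), (x1, y1) in zip(row, row[1:]):
            spacings.append(((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5)
    if not spacings:
        raise ValueError("need at least two corners in one row")
    # The median suppresses the effect of a few badly localized corners.
    px_per_mm = median(spacings) / square_mm
    return 1000.0 / px_per_mm
```

For example, corners spaced 20 px apart on a 1 mm checkerboard give 50 µm per pixel, and a single mislocated corner does not change the result because the median ignores it.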

2.3.4. Displacement Tracking

For each frame, the horizontal position of the bullseye target is updated using a template-matching approach constrained to one-dimensional motion. Unlike feature-based methods such as optical flow, normalized cross-correlation (NCC) does not require distinctive keypoints and is well suited for the symmetric, repetitive structure of the bullseye target.

The NCC is computed between a stored reference template and a local search region centered around the previous target position. This localized search reduces the computational cost and improves tracking stability. NCC is selected for its deterministic behavior, robustness to uniform illumination variations, and effectiveness in tracking repetitive patterns.

Tracking confidence is evaluated based on correlation peak magnitude and inter-frame displacement consistency. When the confidence falls below a predefined threshold, a reinitialization procedure is triggered to re-detect the target and update the template.

While NCC performs well for translational motion under controlled conditions, it remains sensitive to geometric transformations such as rotation, deformation, and partial occlusion. The radial symmetry of the bullseye target provides partial robustness to rotation; however, tracking accuracy may degrade under more complex structural motion.
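The localized search and confidence check described in this subsection can be sketched in pure Python as follows. The sketch uses the zero-mean form of normalized cross-correlation, which is one common variant and is an assumption here. The function names, the search radius, and the 0.8 confidence threshold are likewise illustrative values rather than parameters reported in this study.

```python
from math import sqrt
from typing import Dict, Sequence, Tuple

Image = Sequence[Sequence[float]]  # grayscale image as rows of pixel intensities


def zncc(patch: Image, template: Image) -> float:
    """Zero-mean normalized cross-correlation of two equally sized patches."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0.0:
        return 0.0  # a flat patch carries no pattern to correlate
    return sum(p * q for p, q in zip(da, db)) / denom


def track_horizontal(frame: Image, template: Image, prev_x: int, top: int,
                     search_radius: int = 10,
                     min_score: float = 0.8) -> Tuple[int, Dict[int, float], bool]:
    """Search along the horizontal axis around the previous target position.

    Returns the best left-edge column, the score at every tested column, and a
    flag that is False when the peak falls below the (assumed) threshold, which
    is where a reinitialization step would be triggered.
    """
    height, width = len(template), len(template[0])
    x_min = max(0, prev_x - search_radius)
    x_max = min(len(frame[0]) - width, prev_x + search_radius)
    if x_max < x_min:
        raise ValueError("search window lies outside the frame")
    scores: Dict[int, float] = {}
    for x in range(x_min, x_max + 1):
        patch = [row[x:x + width] for row in frame[top:top + height]]
        scores[x] = zncc(patch, template)
    best_x = max(scores, key=lambda x: scores[x])
    return best_x, scores, scores[best_x] >= min_score
```

Subtracting the mean and dividing by the patch energy makes the score unchanged under a uniform brightness offset or gain, which is the illumination robustness referred to above. The per-column scores are returned so that a subpixel refinement step can reuse the values next to the peak.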

2.3.5. Subpixel Localization

To improve accuracy beyond pixel resolution, subpixel localization is applied to refine the detected target position. Quadratic interpolation is performed around the peak of the correlation response to estimate the subpixel offset as shown in Figure 5. Let R(x) denote the correlation response along the horizontal axis, where x is the pixel coordinate along the horizontal search axis, with a discrete maximum at pixel location x₀. The subpixel offset δx is computed as

$$\delta x = \frac{R(x_{0} - 1) - R(x_{0} + 1)}{2\left[R(x_{0} - 1) - 2R(x_{0}) + R(x_{0} + 1)\right]} \tag{2}$$

The refined bullseye position is expressed as

$$x_{b}(t) = x_{0} + \delta x \tag{3}$$

This approach enables subpixel displacement estimation while maintaining computational efficiency. However, the accuracy of the refinement depends on the quality of the correlation response and may be affected by noise, compression artifacts, and reduced contrast.
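Equations (2) and (3) translate directly into a few lines of Python. The sketch below is illustrative: the function names are introduced here, and the guard for a flat correlation response is an added assumption rather than a documented part of the pipeline.

```python
def subpixel_offset(r_left: float, r_peak: float, r_right: float) -> float:
    """Vertex offset of the parabola through R(x0 - 1), R(x0), R(x0 + 1), Equation (2)."""
    denom = 2.0 * (r_left - 2.0 * r_peak + r_right)
    if denom == 0.0:
        return 0.0  # flat response: no curvature, so no refinement is possible
    return (r_left - r_right) / denom


def refined_position(x0: int, r_left: float, r_peak: float, r_right: float) -> float:
    """Refined target position x_b = x0 + delta_x, Equation (3)."""
    return x0 + subpixel_offset(r_left, r_peak, r_right)
```

As a check, sampling the parabola R(x) = 1 − (x − 0.25)² at x = −1, 0, 1 gives −0.5625, 0.9375, and 0.4375, and the function recovers the true peak offset of 0.25 px.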

2.3.6. Scale Recalibration

Although the checkerboard reference is assumed to remain stationary, slight movement due to thermal expansion, vibration, or mounting imperfections may introduce scale drift over long monitoring durations. To mitigate this, periodic checkerboard detection is performed to update the scale factor and reference position. However, this approach assumes stability of the reference region between recalibration intervals, and long-term performance under real-world conditions requires further validation.

2.3.7. Displacement and Strain Computation

The horizontal displacement of the bullseye relative to its initial position was computed as follows:

$$\Delta x_{\mu}(t) = \left(x_{b}(t) - x_{b,0}\right) \cdot \mathrm{um\_per\_px} \tag{4}$$

The axial strain, expressed in microstrain, was calculated as

$$\varepsilon_{\mu}(t) = \frac{L(t) - L_{0}}{L_{0}} \times 10^{6} \tag{5}$$

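Here, L_{0} is the initial gauge length and L(t) is the gauge length at time t, so that L(t) − L_{0} corresponds to the measured displacement Δx_{μ}(t) expressed in the same length unit as L_{0}. This formulation enables strain estimation for structural components where conventional wired sensors may be impractical.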
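A minimal Python sketch of Equations (4) and (5) follows. It assumes that the change in gauge length equals the tracked displacement. The function names and the numerical values in the example (scale factor, pixel shift, and gauge length) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def displacement_um(x_b_px: float, x_b0_px: float, um_per_px: float) -> float:
    """Horizontal displacement in micrometers, Equation (4)."""
    return (x_b_px - x_b0_px) * um_per_px


def microstrain(delta_um: float, gauge_length_mm: float) -> float:
    """Axial strain in microstrain, Equation (5), with L(t) - L0 taken as delta_um."""
    if gauge_length_mm <= 0.0:
        raise ValueError("gauge length must be positive")
    gauge_length_um = gauge_length_mm * 1000.0  # bring L0 to the same unit as delta_um
    return delta_um / gauge_length_um * 1e6
```

With a hypothetical scale of 50 µm per pixel, a refined shift of 0.04 px corresponds to a 2 µm displacement, and over a hypothetical 250 mm gauge length this gives 8 µε.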

3. Laboratory Performance Evaluation

3.1. Experimental Details

Laboratory experiments were conducted to evaluate the performance of the proposed vision-based strain monitoring system under quasi-static and static load conditions. The experiments conducted assessed static accuracy, repeatability, stability, and sensing range.

The experimental setup, shown in Figure 6, consisted of a fixed platform and a movable platform driven by a precision stepper motor (positioning accuracy: 0.03 mm, phase voltage: 3.6 V, holding torque: 0.28 N·m, maximum speed: 100 mm/s, and drive voltage: 24 V) to generate controlled relative displacements. Both platforms were aligned along the same horizontal plane. The MISS-Building sensing unit was rigidly mounted on the optical platform. The DM8-02-5V model LVDT, manufactured by Guangzhou Ceheng Technology Co., Ltd., Guangzhou, China, was used as the benchmark displacement sensor. It features a pen-type design with a spring return mechanism and a stainless-steel housing of 8 mm diameter. Its key technical parameters include a measurement range of 0–20 mm, a precision of 1 µm, an output voltage of 0–5 V, a linearity of 0.036%, and a sensitivity of 2.5029 V/mm. The LVDT was mounted parallel to the direction of motion on the movable platform and connected to a dedicated data-acquisition unit.

The sensing node was powered by a 5 V supply, and wireless communication was provided via a portable modem. All image processing and strain computations were performed on a PC-based processing unit.

3.2. Zero-Drift Test

The long-term stability of the proposed system was evaluated through a 24 h zero-drift test under controlled laboratory conditions. During this period, no external load was applied and the relative position of the measurement targets remained fixed.

The displacement measurements remained bounded within ±2 µm over the entire monitoring duration, corresponding to strain fluctuations within approximately ±6 µε. The results as shown in Figure 7 indicate stable baseline behavior with no significant drift, demonstrating the temporal stability of both the sensing hardware and the processing pipeline.

The minimal drift observed over the 24 h period indicates stable system performance under controlled laboratory conditions, supporting its effectiveness for monitoring quasi-static structural behavior.

3.3. Sensing Range

The measurable displacement and strain ranges of the system were evaluated through incremental displacement testing. A step size of 100 µm was applied incrementally until the observable limit of the imaging system was reached.

The system as shown in Figure 8 achieved an effective strain measurement range of approximately ±35,000 µε. This range is primarily governed by optical configuration parameters, including the field of view, target size, and image resolution, and can be adjusted based on application requirements.

3.4. Static and Quasi-Static Performance

Static step-loading experiments were performed to evaluate the accuracy, repeatability, and stability of the proposed sensing system under controlled displacement inputs. Three displacement step sizes, 50 µm, 25 µm, and 15 µm, were selected to represent typical incremental deformation levels encountered in static or slowly varying structural responses. For each displacement step, corresponding measurements were recorded using the LVDT and the MISS-Building sensor. The strain responses obtained for the different loading cases are shown in Figures 9–13.

Measurement errors were computed as the difference between the strain values obtained from the proposed system and those measured by the LVDT as shown in Table 3. For the 50 µm loading case, the maximum error (MAXE), mean absolute error (MAE), and maximum percentage error were 4.93 µε, 2.63 µε, and 3.41%, respectively. For the 25 µm loading case, the corresponding values were 2.47 µε, 1.31 µε, and 3.54%. For the 15 µm loading case, the maximum error, mean absolute error, and maximum percentage error were 3.64 µε, 1.24 µε, and 8.60%, respectively.
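The three error metrics can be computed as in the following Python sketch. The function name is introduced here, and defining the percentage error relative to the LVDT reading at the same step is an assumption, since the exact normalization is not stated in the text. The numbers used in the example after the code are synthetic and are not measurements from this study.

```python
from typing import Dict, Sequence


def error_metrics(measured: Sequence[float], reference: Sequence[float]) -> Dict[str, float]:
    """Maximum error, mean absolute error, and maximum percentage error.

    Both inputs are strain series in microstrain, sampled at the same load steps.
    Percentage error is taken relative to the reference reading (assumption).
    """
    if len(measured) != len(reference) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    abs_err = [abs(m - r) for m, r in zip(measured, reference)]
    # Skip zero reference readings, where a relative error is undefined.
    pct_err = [100.0 * e / abs(r) for e, r in zip(abs_err, reference) if r != 0.0]
    return {
        "max_error": max(abs_err),
        "mae": sum(abs_err) / len(abs_err),
        "max_pct_error": max(pct_err) if pct_err else float("nan"),
    }
```

For synthetic readings of 152, 298, and 455 µε against reference values of 150, 300, and 450 µε, the function returns a maximum error of 5 µε, a mean absolute error of 3 µε, and a maximum percentage error of about 1.33%.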

Repeated loading sequences produced consistent strain responses, indicating stable and repeatable sensor behavior under static loading conditions. These results demonstrate that the proposed system is capable of resolving small static strains relevant to building and civil engineering applications.

The maximum absolute error of 1.8 με observed in the quasi-static tests (Table 4) reflects the combined influence of mechanical tolerances in the strain transfer mechanism, image noise, and subpixel estimation uncertainty.

4. Discussion

The experimental results demonstrate that the proposed vision-based monitoring system is capable of measuring static and quasi-static strain responses using a low-cost imaging platform. Across all laboratory evaluations, including zero-drift testing, sensing range assessment, and step-loading experiments, the system exhibits stable performance and consistent agreement with reference LVDT measurements under controlled conditions.

The 24 h zero-drift test indicates stability of the sensing framework, with displacement variations bounded within ±2 μm, corresponding to strain fluctuations of approximately ±6 με. This suggests that the system maintains a stable baseline under controlled laboratory conditions, which is essential for long-term monitoring applications where distinguishing the true structural response from measurement noise is critical. However, this evaluation is limited to controlled conditions, and long-term stability under varying environmental influences requires further investigation.

The sensing range results highlight that measurement capability is governed not only by algorithmic performance but also by optical configuration parameters such as field of view, target size, and image resolution. These factors define the achievable measurement range and provide flexibility in adapting the system to different structural monitoring scenarios.

Static step-loading experiments further demonstrate the accuracy and repeatability of the proposed system. The close agreement with LVDT measurements across all displacement levels confirms that the system can resolve small service-level strains relevant to building applications. As expected, measurement uncertainty increases at smaller displacement magnitudes; however, the errors remain within acceptable bounds for practical SHM deployment.

A key contribution of this work lies in the system-level architecture rather than in the development of a novel image-processing algorithm. By decoupling image acquisition from strain computation, the proposed framework enables the use of low-cost sensing nodes while leveraging centralized processing for high-precision analysis. This approach addresses a critical scalability challenge in vision-based SHM, where cost, power consumption, and computational limitations at the sensing node often restrict deployment.

The proposed architecture aligns naturally with smart building infrastructure, where wireless networks and centralized processing systems are already available. This enables distributed multi-point monitoring without complex wiring or high-end sensing hardware, making the system well suited for deployment in building environments. Compared with closely related vision-based strain sensing systems such as the MISS-Dym sensor system proposed by Bai et al. [28], which employs higher-cost embedded processing hardware, the proposed system achieves comparable sensing functionality at significantly reduced cost.

However, the current system is primarily designed for static and quasi-static applications. The use of a 1 Hz sampling rate and reliance on centralized processing limit its suitability for dynamic structural monitoring scenarios. Additionally, factors such as illumination variability, compression artifacts, and geometric transformations (e.g., rotation or occlusion) may affect tracking robustness under real-world conditions. These limitations highlight the need for further development to ensure reliable performance in long-term field deployment.

5. Limitations

While laboratory validation establishes the foundational performance of the proposed system, several critical limitations remain that constrain immediate real-world deployment. These limitations arise from differences between controlled experimental conditions and complex field environments, as well as from current system design assumptions and implementation choices.

A primary limitation relates to environmental robustness, particularly illumination variability. To reduce the influence of lighting fluctuations during validation, a constant LED panel illumination source was included in the sensor setup to provide stable and uniform lighting conditions. This controlled setup minimized variations in pixel intensity and ensured consistent feature contrast, thereby improving the reliability of template matching and subpixel localization. However, this approach does not fully represent real-world conditions, especially in outdoor or semi-exposed environments, where factors such as direct sunlight, shadows, reflections, and weather-induced lighting changes can introduce significant variability. These effects may degrade image quality, reduce contrast, and introduce noise, ultimately affecting displacement estimation accuracy.

More broadly, the system was validated under controlled laboratory conditions with a stable temperature and minimal disturbance. In real-world environments, additional factors such as airborne particulates, humidity, and structural vibrations may further degrade image quality and affect feature detection and subpixel localization. These influences may introduce noise or bias in displacement estimation, particularly under low-contrast or partially occluded conditions.

A second limitation is the restriction of the current implementation to one-dimensional horizontal displacement measurement under quasi-static conditions. In practice, structural systems experience multi-axis deformation, including vertical displacement, rotation, and dynamic loading across a wide frequency range. Furthermore, the current sampling rate of 1 Hz limits the ability of the system to capture transient or high-frequency responses, such as those induced by wind, traffic, or seismic activity. Consequently, the present system is primarily suited for monitoring slow or quasi-static structural behavior.

The mechanical strain transfer mechanism introduces a non-negligible source of uncertainty, estimated at approximately ±6 µm due to backlash, alignment imperfections, and component tolerances. This uncertainty is comparable to the resolution of the vision-based measurement, indicating that the mechanical subsystem currently constrains the overall system accuracy. Additionally, long-term deployment may be affected by thermal expansion, material creep, or the slight movement of the reference checkerboard surface, which may introduce drift in strain estimation over extended periods.

Another important limitation relates to the use of JPEG compression for image transmission. While compression reduces data size and communication load, it introduces lossy artifacts that may affect pixel-level intensity distributions, particularly under low-contrast or noisy imaging conditions. These artifacts can influence template matching accuracy and subpixel localization, potentially introducing small but systematic errors in displacement estimation. The present study does not explicitly quantify the impact of different compression levels on measurement accuracy, which remains an important area for further investigation.

From a system architecture perspective, the current implementation relies on the centralized processing of transmitted image data. While effective for small-scale deployments, this approach introduces potential scalability challenges when extended to large sensor networks. Increasing the number of nodes may lead to network congestion, higher latency, and processing bottlenecks. Additionally, centralized architectures introduce a single point of failure, where loss of the processing node may interrupt monitoring. The current system also does not incorporate redundancy or distributed fallback mechanisms.

Network reliability and cybersecurity also present important considerations. The system currently relies on standard WPA2-secured Wi-Fi communication without additional application-layer encryption or authentication mechanisms, and its performance under network congestion, packet loss, or extended outages has not been characterized. These factors may affect data integrity, latency, and overall system reliability, particularly in shared building networks.

Finally, the displacement estimation algorithm, based on normalized cross-correlation, provides computational efficiency but remains sensitive to illumination variation, partial occlusion, and geometric transformations such as rotation and deformation. While effective for controlled translational motion, its robustness may degrade under complex field conditions.
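To make the matching step concrete, the following is a minimal one-dimensional sketch of normalized cross-correlation followed by the three-point quadratic peak interpolation named in the Figure 5 caption. It is a simplification under stated assumptions, not the implementation used in this study: the real system works on two-dimensional images, and the function names and the synthetic Gaussian intensity profile are introduced here for illustration.

```python
import math


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # flat window: correlation undefined, treat as no match
    return sum(x * y for x, y in zip(da, db)) / denom


def locate_subpixel(signal, template):
    """Return (position, peak_score) of the template within the signal.

    The integer peak of the NCC response is refined by fitting a parabola
    through the peak and its two neighbours.
    """
    n = len(template)
    scores = [ncc(signal[i:i + n], template) for i in range(len(signal) - n + 1)]
    k = max(range(len(scores)), key=scores.__getitem__)
    offset = 0.0
    if 0 < k < len(scores) - 1:
        left, mid, right = scores[k - 1], scores[k], scores[k + 1]
        curvature = left - 2.0 * mid + right
        if curvature < 0.0:  # only refine around a proper local maximum
            offset = 0.5 * (left - right) / curvature
    return k + offset, scores[k]


def gaussian_profile(length, center, sigma=3.0):
    """Synthetic intensity profile of a blob-like target (illustration only)."""
    return [math.exp(-0.5 * ((i - center) / sigma) ** 2) for i in range(length)]


reference = gaussian_profile(64, 30.0)
template = reference[20:41]              # window cut around the target
moved = gaussian_profile(64, 32.3)       # target shifted by 2.3 samples
start, _ = locate_subpixel(reference, template)
end, score = locate_subpixel(moved, template)
print(f"estimated shift: {end - start:.3f} samples (peak NCC {score:.4f})")
```

Because the score is normalized by the local mean and variance, a uniform brightness or contrast change leaves it unchanged, whereas rotation, non-uniform lighting, or occlusion alter the window content itself and lower the peak. This is the sensitivity described above.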

Collectively, these limitations define the current boundary between laboratory validation and scalable field deployment, highlighting the need for further development to achieve robust long-term operation under realistic environmental and structural conditions.

6. Future Work

Future research will focus on extending the proposed system toward robust field deployment and large-scale structural health monitoring applications.

A key direction is the development of multi-axis displacement and strain measurement capabilities using stereo vision or multi-camera configurations, enabling the capture of vertical, rotational, and complex structural deformations. Additionally, increasing the sampling rate will allow the system to capture dynamic structural responses, including transient and high-frequency events.

To address environmental variability, future work will explore advanced illumination handling strategies, including adaptive exposure control, high-dynamic-range (HDR) imaging, and illumination-invariant preprocessing techniques. Furthermore, learning-based approaches such as DeepLab- and EfficientNet-based models will be investigated to improve robustness against lighting changes, occlusion, and geometric transformations [29,30].

The impact of JPEG compression on subpixel localization and displacement accuracy will be systematically evaluated across different compression levels and imaging conditions to better understand its influence on measurement reliability.

From a system design perspective, future developments will focus on improving scalability and resilience through the integration of edge-based processing and distributed computation, reducing reliance on centralized architectures and minimizing network load. Redundant communication strategies and fault-tolerant system designs will also be explored to ensure continuous operation in the presence of network failures.

Long-term field validation will be conducted to assess system performance over extended periods, including the effects of thermal expansion, environmental exposure, and structural aging on measurement stability and drift.

Finally, efforts will be directed toward simplifying the mechanical strain transfer mechanism, including the use of direct surface-mounted optical targets, in order to reduce system complexity, cost, and mechanically induced uncertainty.

7. Conclusions

This study presented a low-cost, vision-based strain monitoring system designed for static and quasi-static building monitoring applications. The proposed approach adopts a distributed sensing architecture in which compact camera nodes are used exclusively for image acquisition and wireless transmission, while displacement tracking and strain computation are performed using centralized PC-based processing.

Laboratory experiments confirmed stable measurement behavior across all evaluation scenarios. The system achieved zero-drift within ±2 µm (±6 µε) over a 24 h period, a sensing range of ±35,000 µε, and a mean absolute error of no more than 2.63 µε across the static loading cases (Table 3), demonstrating consistent agreement with the reference LVDT sensor. Static step-loading tests with displacement increments of 50 µm, 25 µm, and 15 µm confirmed the system’s ability to resolve small strains relevant to service-level structural assessment. The achievable sensing range depends primarily on optical configuration and measurement geometry, allowing flexibility in adapting the system to different monitoring requirements.
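The relation between the displacement and strain figures quoted above can be illustrated with the usual engineering-strain definition, strain = displacement / gauge length. The sketch below infers an effective gauge length from the rounded ±2 µm and ±6 µε pair. That inferred length is an assumption made here for illustration, and the gauge length actually used in the strain computation (Section 2.3.7) may differ.

```python
def strain_microstrain(displacement_um, gauge_length_mm):
    """Engineering strain in microstrain for a displacement over a gauge length."""
    if gauge_length_mm <= 0:
        raise ValueError("gauge_length_mm must be positive")
    # 1 um per mm equals a strain of 1e-3, i.e. 1000 microstrain
    return displacement_um / gauge_length_mm * 1000.0


# ASSUMPTION: gauge length implied by the rounded pair 2 um <-> 6 microstrain
IMPLIED_GAUGE_LENGTH_MM = 2.0 / 6.0 * 1000.0  # roughly 333 mm

for step_um in (15.0, 25.0, 50.0):
    strain = strain_microstrain(step_um, IMPLIED_GAUGE_LENGTH_MM)
    print(f"{step_um:5.1f} um step -> {strain:6.1f} microstrain (under the assumed gauge length)")
```

Under this conversion a fixed displacement uncertainty maps to a smaller strain uncertainty as the gauge length grows, which is one reason the achievable range and resolution depend on the measurement geometry.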

Rather than introducing a novel image-processing technique, the primary contribution of this work lies in the system-level design and experimental validation of a viable monitoring architecture. By separating sensing from computation, the proposed framework reduces hardware complexity at the sensing node and supports deployment at scale using low-cost components, making it well-suited for building-scale monitoring scenarios where installation simplicity, cost efficiency, and multi-point sensing capability are essential.

Future work will focus on extending the proposed framework beyond the controlled laboratory conditions considered in this study. One important direction is the evaluation of system performance under varying environmental conditions, including changes in lighting, temperature, and camera alignment, which are commonly encountered in real building environments. Additional investigations will consider longer monitoring durations and field deployments to assess long-term robustness and data continuity. While the present study focuses on static and quasi-static loading, future research may also explore adaptations of the framework for low-frequency dynamic response, provided that suitable sampling strategies and synchronization methods are implemented. Further improvements may include the integration of automated quality assessment metrics, adaptive image acquisition strategies, and streamlined data management for large-scale deployments.

Author Contributions

Conceptualization, X.Z.; methodology, A.K.A., X.Z. and W.L.; validation, A.K.A.; investigation, A.K.A., R.L., D.S. and J.M.; data curation, A.K.A., R.L., D.S. and J.M.; writing—original draft preparation, X.Z.; writing—review and editing, W.L. and A.K.A.; supervision, W.L. and X.Z.; project administration, X.Z., D.S., J.M. and W.L.; and funding, W.L. and X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Science and Technology Project of State Grid Jilin Electric Power Company Limited (2025-16).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

Authors Jun Ma and Dong Sun were employed by State Grid Jilin Electric Power Company Limited. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Ju, M.; Dou, Z.; Li, J.-W.; Qiu, X.; Shen, B.; Zhang, D.; Yao, F.-Z.; Gong, W.; Wang, K. Piezoelectric Materials and Sensors for Structural Health Monitoring: Fundamental Aspects, Current Status, and Future Perspectives. Sensors 2023, 23, 543.
2. Hassan, A.M.; Adel, K.; Elhakeem, A.; Elmasry, M.I.S. Condition Prediction for Existing Educational Facilities Using Artificial Neural Networks and Regression Analysis. Buildings 2022, 12, 1520.
3. Besiktepe, D.; Ozbek, M.E.; Atadero, R.A. Condition Assessment Framework for Facility Management Based on Fuzzy Sets Theory. Buildings 2021, 11, 156.
4. Ren, P.; Zhou, Z. Two-Step Approach to Processing Raw Strain Monitoring Data for Damage Detection of Structures under Operational Conditions. Sensors 2021, 21, 6887.
5. Glisic, B. Concise Historic Overview of Strain Sensors Used in the Monitoring of Civil Structures: The First One Hundred Years. Sensors 2022, 22, 2397.
6. Singh, M.P.; Elbadawy, M.Z.; Bisht, S.S. Dynamic Strain Response Measurement-based Damage Identification in Structural Frames. Struct. Control Health Monit. 2018, 25, e2181.
7. Kang, X.; Zhu, B.; Cai, Y.; Xiao, Y.; Liu, N.; Guo, Z.; Wang, Q.-A.; Luo, Y. A Concise Review of State-of-the-Art Sensing Technologies for Bridge Structural Health Monitoring. Sensors 2025, 25, 5460.
8. Mardanshahi, A.; Sreekumar, A.; Yang, X.; Barman, S.K.; Chronopoulos, D. Sensing Techniques for Structural Health Monitoring: A State-of-the-Art Review on Performance Criteria and New-Generation Technologies. Sensors 2025, 25, 1424.
9. Liu, G.; Wang, Q.-A.; Jiao, G.; Dang, P.; Nie, G.; Liu, Z.; Sun, J. Review of Wireless RFID Strain Sensing Technology in Structural Health Monitoring. Sensors 2023, 23, 6925.
10. Díaz, C.; Leitão, C.; Marques, C.; Domingues, M.; Alberto, N.; Pontes, M.; Frizera, A.; Ribeiro, M.; André, P.; Antunes, P. Low-Cost Interrogation Technique for Dynamic Measurements with FBG-Based Devices. Sensors 2017, 17, 2414.
11. Yassin, M.H.; Farhat, M.H.; Soleimanpour, R.; Nahas, M. Fiber Bragg Grating (FBG)-Based Sensors: A Review of Technology and Recent Applications in Structural Health Monitoring (SHM) of Civil Engineering Structures. Discov. Civ. Eng. 2024, 1, 151.
12. Nuzzo, F.D.; Brunelli, D.; Polonelli, T.; Benini, L. Structural Health Monitoring System with Narrowband IoT and MEMS Sensors. IEEE Sens. J. 2021, 21, 16371–16380.
13. Duobiene, S.; Ratautas, K.; Trusovas, R.; Ragulis, P.; Šlekas, G.; Simniškis, R.; Račiukaitis, G. Development of Wireless Sensor Network for Environment Monitoring and Its Implementation Using SSAIL Technology. Sensors 2022, 22, 5343.
14. Malekmohammadi, A.; Farzadnia, N.; Hajrasouliha, A.; Lyn Mayer, A. Sensing Systems in Construction and the Built Environment: Review, Prospective, and Challenges. Sensors 2023, 23, 9632.
15. Lin, H.; Xu, Z.; Hong, W.; Yang, Z.; Wang, Y.; Li, B. Long-Gauge Fiber Optic Sensors: Strain Measurement Comparison for Reinforced Concrete Columns. Sensors 2025, 25, 220.
16. Mustapha, S.; Lu, Y.; Ng, C.-T.; Malinowski, P. Sensor Networks for Structures Health Monitoring: Placement, Implementations, and Challenges—A Review. Vibration 2021, 4, 551–585.
17. Kim, J.-W.; Choi, H.-W.; Kim, S.-K.; Na, W.S. Review of Image-Processing-Based Technology for Structural Health Monitoring of Civil Infrastructures. J. Imaging 2024, 10, 93.
18. Feng, D.; Feng, M.Q. Computer Vision for SHM of Civil Infrastructure: From Dynamic Response Measurement to Damage Detection—A Review. Eng. Struct. 2018, 156, 105–117.
19. Peng, Z.; Li, J.; Hao, H.; Zhong, Y. Smart Structural Health Monitoring Using Computer Vision and Edge Computing. Eng. Struct. 2024, 319, 118809.
20. Wu, P.; Li, W.; Zhao, X. Displacement Sensing Based on Microscopic Vision with High Resolution and Large Measuring Range. Comput.-Aided Civ. Infrastruct. Eng. 2024, 39, 2840–2858.
21. Cardoso, S.M.; Ribeiro, M.M.; Silva, D.S.; Junio, R.F.P.; Monteiro, S.N.; Da Rodrigues, J.S. Increased Virtual Resolution for Sub-Pixel Displacement Algorithm Optimization in Digital Image Correlation for AISI 1020 Steel. J. Mater. Res. Technol. 2024, 33, 4206–4214.
22. Liu, G.; Li, M.; Zhang, W.; Gu, J. Subpixel Matching Using Double-Precision Gradient-Based Method for Digital Image Correlation. Sensors 2021, 21, 3140.
23. Liu, T.; Lei, Y.; Mao, Y. Computer Vision-Based Structural Displacement Monitoring and Modal Identification with Subpixel Localization Refinement. Adv. Civ. Eng. 2022, 2022, 5444101.
24. Zhao, C.; Bai, B.; Liang, L.; Cheng, Z.; Chen, X.; Li, W.; Zhao, X. Design and Verification of a Novel Structural Strain Measuring Method Based on Template Matching and Microscopic Vision. Buildings 2023, 13, 2395.
25. Garcia-Perez, A.; Miñón, R.; Torre-Bastida, A.I.; Zulueta-Guerrero, E. Analysing Edge Computing Devices for the Deployment of Embedded AI. Sensors 2023, 23, 9495.
26. Biglari, A.; Tang, W. A Review of Embedded Machine Learning Based on Hardware, Application, and Sensing Scheme. Sensors 2023, 23, 2131.
27. Iqbal, U.; Davies, T.; Perez, P. A Review of Recent Hardware and Software Advances in GPU-Accelerated Edge-Computing Single-Board Computers (SBCs) for Computer Vision. Sensors 2024, 24, 4830.
28. Bai, B.; Lu, B.; Wen, Z.; Yuan, H.; Li, W.; Zhao, X. Development of a Low-cost Microscopic Vision-based Real-time Strain Sensor Using Raspberry Pi. Comput.-Aided Civ. Infrastruct. Eng. 2025, 40, 1886–1905.
29. Song, Z.; Zou, S.; Zhou, W.; Huang, Y.; Shao, L.; Yuan, J.; Gou, X.; Jin, W.; Wang, Z.; Chen, X.; et al. Clinically Applicable Histopathological Diagnosis System for Gastric Cancer Detection Using Deep Learning. Nat. Commun. 2020, 11, 4294.
30. Kabir, H.; Wu, J.; Dahal, S.; Joo, T.; Garg, N. Automated Estimation of Cementitious Sorptivity via Computer Vision. Nat. Commun. 2024, 15, 9935.

Figure 1. Proposed system architecture.

Figure 2. Optical displacement target configuration used in the MISS-Building system.

Figure 3. Algorithm framework.

Figure 4. Initialization procedure showing bullseye target detection and checkerboard-based scale calibration.

Figure 5. Subpixel localization using quadratic interpolation of the correlation–response curve.

Figure 6. Schematic of LVDT experimental validation setup. (I) LVDT data acquisition, (II) stepper motor controller, (III) microstep driver, (IV) 2.4 GHz portable Wi-Fi module, (V) LVDT, (VI) MISS-Building sensor, (VII) power supply cable, (VIII) personal computer, (IX) strain transfer mechanism, and (X) LVDT magnetic base.

Figure 7. Results of the 24 h zero-drift test under laboratory static conditions: variation over time acquired at 5 min intervals with no external mechanical loading applied.

Figure 8. Strain-sensing range.

Figure 9. Static strain results induced by 15-micron stepper motor-controlled displacement.

Figure 10. Static strain results induced by 25-micron stepper motor-controlled displacement.

Figure 11. Static strain results induced by 50-micron stepper motor-controlled displacement.

Figure 12. Quasi-static strain performance and error results induced by 20-micron stepper motor-controlled displacement.

Figure 13. Quasi-static strain results induced by 45-micron stepper motor-controlled displacement.

Table 1. Proposed System Attributes.

Attribute	Edge Node	Centralized System
Cost	USD 60	Shared with existing infrastructure
Power	1 W battery or USB powered	Existing building power supply network
Computational Demand	None	Based on existing system capability
Maintenance and Update Constraints	Firmware only and can be deployed via the centralized PC	Software algorithms and parameters
Security	Minimal attack surface at the edge sensing node	Centralized protection with existing system

Table 2. Hardware Components Specifications.

Component	Specification	Unit Cost (USD)	Function
ESP32-CAM AI-Thinker	ESP32-S, OV5640, Wi-Fi	5	Edge core
OV5640 Sensor	5 MP, adjustable-focus	5	Image Acquisition
MicroSD Card	4 GB, Class 10, industrial	3	Local Buffering
Stainless-Steel Assembly	304 grade, custom fabricated	45	Mechanical platform
LED Panel	5 V, adjustable brightness	1	Illumination
Cabling and Hardware	USB power, M4 fasteners	1	Connection and Mounting
Total Cost		60

Table 3. Static strain error metric comparison table.

Error Metric	15 µm	25 µm	50 µm
RMSE	5.40	4.18	4.94
MAE (µε)	1.24	1.31	2.63
Max Error (%)	8.60	3.54	3.41

Table 4. Quasi-static strain performance error metric comparison table.

Displacement Amplitude	45 µm	20 µm
RMSE	2.5	1.3
MAE	1.8	1.54
Max % Error	5.72	4.1
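The three error metrics reported in Tables 3 and 4 can be computed from paired vision-based and reference readings as sketched below. The percentage-error definition used here (absolute error divided by the reference value at the same sample) is an assumption, since the normalization used in the study is not stated in this section. The function name and the sample values are hypothetical.

```python
import math


def error_metrics(measured, reference):
    """Return RMSE, MAE and maximum percentage error of measured vs. reference values."""
    if len(measured) != len(reference) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [m - r for m, r in zip(measured, reference)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    # Percentage error is undefined where the reference is zero, so skip those samples
    pct = [abs(e) / abs(r) * 100.0 for e, r in zip(errors, reference) if r != 0]
    max_pct = max(pct) if pct else float("nan")
    return {"rmse": rmse, "mae": mae, "max_pct_error": max_pct}


# HYPOTHETICAL strain readings in microstrain, for illustration only
vision = [10.0, 21.0, 29.0]
reference_sensor = [10.0, 20.0, 30.0]
print(error_metrics(vision, reference_sensor))
```

For any single error series the RMSE is at least as large as the MAE, which offers a quick consistency check when reading tabulated values.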

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
