Article

Fusion of Stereo Matching and Spatiotemporal Interaction Analysis: A Detection Method for Excavator-Related Struck-By Hazards in Construction Sites

1 Engineering Digital Technology R&D Center, Engineering Design & Research Institute of CCCC THEC, Beijing 100010, China
2 Sino-Australia Joint Research Center in BIM and Smart Construction, Shenzhen University, Shenzhen 518060, China
3 School of Safety Science, Tsinghua University, Beijing 100084, China
4 Sensor and Equipment Research Center, Safety Culture Education Research and Development Center, Hefei Institute for Public Safety Research, Tsinghua University, Hefei 230601, China
5 School of Management, Zhengzhou University, Zhengzhou 450001, China
6 School of Civil Engineering, Central South University, Changsha 410083, China
7 Shanghai Research Institute of Building Sciences Group Co., Ltd., Shanghai 200032, China
* Author to whom correspondence should be addressed.
Buildings 2026, 16(5), 1002; https://doi.org/10.3390/buildings16051002
Submission received: 30 December 2025 / Revised: 4 February 2026 / Accepted: 23 February 2026 / Published: 4 March 2026

Abstract

In the construction industry, struck-by accidents involving heavy equipment such as crawler excavators are a leading cause of worker fatalities and injuries. Existing vision-based hazard detection methods are limited by approximate evaluations, reliance on specific references, and neglect of the spatial relationships between equipment and workers, making them inadequate for complex, dynamic construction environments. This study addresses these limitations by proposing a precise and adaptable struck-by hazard detection method. The method integrates four core modules: object tracking via the YOLOv5-DeepSORT model to detect workers, excavators, and their key components; activity recognition to identify the operational states of excavators (working or static) and workers (driver or field worker); proximity estimation based on stereo vision, using the BGNet model and camera calibration to calculate 3D spatial distances; and safety identification to assess worker safety status in real time. Validated through three virtual construction scenarios (flat ground, rugged terrain, and slope), the method achieved high safety status identification accuracies of 92.71%, 90.04%, and 94.25%, respectively. The results demonstrate its robustness in adapting to diverse construction environments and accurately capturing equipment–worker spatial interactions. This research expands the application scope of hazard monitoring in complex settings, enhances safety identification efficiency, and provides a reliable technical solution for improving construction site safety management.

1. Introduction

Struck-by accidents are the second-leading cause of fatalities and the leading cause of nonfatal injuries in the construction industry [1,2,3]. Meanwhile, nearly 75% of construction-related struck-by injuries and fatalities are caused by heavy equipment operating near pedestrians [4]. Traditional manual inspections are frequently used in the engineering domain to prevent struck-by accidents [5]. However, these manual methods are labor-intensive, time-consuming, and prone to imprecise distance assessment [6]. Computer vision-based automated inspections have therefore been proposed for monitoring struck-by hazards via continuous and nonintrusive monitoring [7]. However, existing computer vision-based methods provide only approximate estimations, rely on particular references such as a known length or terrain, or are limited to particular environments or camera installations. Furthermore, the spatial relationship between moving equipment and workers is not considered in struck-by hazard identification, for example, whether the equipment is active or static or whether the worker is inside the excavator. These restrictions decrease the detection accuracy of struck-by hazards in complex and dynamic construction environments, including locations with varying height differences, such as slopes or pits.
To address the aforementioned challenges, scholars worldwide have conducted extensive research on construction site hazard detection, and several vision-based detection methods have been proposed. Nevertheless, these existing vision-based methods exhibit notable limitations in practical applications. Firstly, most rely on 2D image information for hazard assessment, enabling only an approximate evaluation of the distance between equipment and workers. This makes it difficult to accurately reflect their actual 3D spatial relationship, resulting in low detection accuracy in complex construction scenarios. Secondly, existing methods often overlook the operational state of excavators and the identity attributes of workers, such as distinguishing between drivers and field workers. Instead, they take distance as the sole judgment criterion, leading to poor adaptability in dynamic and complex construction environments. Additionally, some methods require specific reference objects to complete distance measurement, which restricts their application scope across different construction scenarios. These limitations collectively highlight the urgent need for a more accurate and adaptable hazard detection method.
To fill the research gaps identified above, this study proposes a stereo vision-based struck-by hazard detection method for excavator-worker interactions, with its core innovations clearly defined as follows. First, the proposed method integrates stereo vision-based 3D spatial perception technology, which enables accurate calculation of the 3D spatial distance between excavators and workers without relying on specific reference objects. This effectively overcomes the low accuracy defect of traditional 2D vision-based distance evaluation. Second, it incorporates both the operational state of excavators, whether working or static, and the identity attributes of workers, such as drivers versus field workers, into the hazard judgment system. This realizes multi-dimensional and comprehensive hazard assessment, moving beyond the single distance-based judgment adopted by existing methods. Third, a multi-module collaborative framework is constructed, encompassing target tracking, activity recognition, proximity estimation, and safety identification. This framework ensures the real-time performance and reliability of hazard detection in dynamic and complex construction environments. Overall, this study aims to provide a more accurate and adaptable technical solution for construction site safety management, thereby promoting the intelligent development of construction safety supervision.
The remaining sections are structured as follows: Section 2 reviews the literature and the limitations of previous research, Section 3 presents the proposed research framework, Section 4 describes the experiments and their results, Section 5 discusses the findings, and Section 6 concludes the study.

2. Literature Review

Excavation and earthmoving activities are widely recognized as high-risk operations in the construction industry due to the complex interactions among workers, heavy equipment, and unstable ground conditions. Occupational safety studies have consistently reported that excavation-related accidents are frequently associated with struck-by hazards, unsafe equipment–worker proximity, and insufficient situational awareness on dynamic construction sites [8,9]. Despite the establishment of regulatory frameworks and safety guidelines for excavation works, accident rates in excavation and earthmoving operations remain relatively high, indicating the limitations of conventional safety management approaches that rely primarily on procedural controls and manual supervision [10].
In response to these challenges, recent research has increasingly explored automated safety monitoring technologies to enhance hazard identification and risk mitigation in excavation environments. Vision-based sensing approaches, in particular, have shown strong potential for capturing real-time spatial relationships between workers and construction equipment [11,12,13]. Building on this broader occupational safety context, the present study focuses on vision-based safety assessment for excavator–worker interactions, with an emphasis on addressing geometric complexity and non-planar terrain conditions commonly encountered in real construction sites.

2.1. Methods for Preventing Struck-By Accidents with Non-Vision-Relevant Technology

Researchers and experts agree that additional safety measures should be implemented to protect construction workers since the discipline of safety regulations alone is insufficient [14]. As one of the main safety measures, a struck-by hazard notification can alert workers to avoid struck-by incidents. To detect a struck-by hazard and alert the worker, researchers have applied various equipment to locate and track workers and equipment, including Global Positioning System (GPS) and radio frequency (RF) technology.
GPS is the most prevalent and frequently applied satellite navigation technology for obtaining highly precise locations [15]. GPS is used in some research to detect struck-by hazards. For example, Golovina et al. [16] analyzed spatiotemporal GPS data obtained from construction entities to automatically measure a hazard index and visualize the result as a heatmap. Wang et al. [17] adopted GPS as a navigation system to analyze an entity’s data, including the location, direction, and velocity, and used the data and predefined rules to evaluate the proximity between construction entities and workers to determine worker safety status. However, the location accuracy will decrease due to interference in dynamic and complicated construction environments, making exact positioning difficult [18].
Radio frequency (RF) technology identifies the proximity of construction entities based on electromagnetic signals from devices attached to workers and equipment [19]. Some researchers have used the RF technique to detect struck-by hazards. Teizer et al. [20] applied RF remote sensing to alert equipment operators and workers when heavy equipment was too close to workers or other equipment. Park et al. [21,22] designed a hazardous proximity situation alert system for workers and heavy equipment using Bluetooth sensing technology. Teizer and Cheng [23] designed a novel framework around real-time location tracking technology for collecting and studying near-miss data and automatically identifying static and dynamic hazards. The initial location data of workers, equipment, and location references from sensors are converted into meaningful proximity-related safety information. Although the RF-based method is quickly deployed on a construction site by simply adjusting the signal intensity and data processing, each detected object must be labeled, and interference with multipath fading effects can occur [17].
Other position sensor-relevant methods are also used in struck-by hazard identification research. Wang et al. [24] proposed two four-dimensional (4D) models for improving struck-by hazard identification accuracy. The two 4D models predict contact collisions using relative position, moving direction, speed, and a pairwise 3D unsafe-proximity query. Dong et al. [2] adopted the position probability grid and radiation quantity of workers to track workers’ locations and predict struck-by hazards.
Researchers have adopted other methods of identifying struck-by hazards or increasing workers’ attention. Kim et al. [25] detected workers’ biosignals, including electrodermal activity, pupil dilation, and saccadic eye movement, to predict worker inattention and prevent struck-by accidents. Lee and Yang [26] developed a novel technology to detect struck-by hazards between construction equipment and workers. They integrated a convolutional neural network (CNN) and sound recognition to recognize struck-by accidents by analyzing the changes in the Doppler effect. Although various non-vision-based struck-by hazard identification approaches have been proposed, the extra financial cost of equipment purchasing places a burden on contractors.

2.2. Methods for Preventing Struck-By Accidents with Vision-Relevant Technology

Some researchers have proposed vision-based struck-by hazard identification approaches. Compared to the non-vision-based method, the vision-based method does not need extra equipment installation and setup.
Some research has adopted the fuzzy reference method to predict struck-by hazards, which does not calculate the exact spatial distance between construction entities. Kim et al. [27] proposed a site safety assessment system based on computer vision and fuzzy inference technology to prevent struck-by hazards. Computer vision was applied to monitor a construction site and extract the spatial relationships between entities (workers and heavy equipment), and the fuzzy reference method was applied to evaluate the safety level of each entity. Zhang et al. [28] proposed a method for evaluating the collision safety level of workers to avoid congested and collision-prone scenarios. Based on the known length of a beam and the assumption that the site area is a 2D plane, the proximity between workers and moving devices is calculated through pixel coordinates, and the fuzzy reference method is utilized to assess the safety status of workers based on the spatial relationships. A vision-based struck-by avoidance system developed by Kim et al. [19] detects potential struck-by hazards and informs workers: a fuzzy reference-based safety evaluation method outputs the safety level of each worker, and a wearable visualization device alerts workers in real time to avoid potentially dangerous areas. Although the safety level can be assessed through the fuzzy reference method, the exact proximity of workers cannot be obtained accurately by relying on the assessment alone.
In addition, other studies have detected the proximity of construction entities based on length references or flat ground assumptions. Yan et al. [7] identified struck-by hazards by constructing the three-dimensional (3D) bounding boxes of heavy vehicles to address spatial relationship distortion with the assumption that the two-dimensional (2D) image and the construction site are on flat ground. Kim et al. [29] developed a site-surveillance methodology with the assistance of unmanned aerial vehicles (UAVs) to monitor the spatial proximity of construction entities. To solve the image distortion issue caused by projective distortion, the image is recovered by using an object of known dimensions as a reference. To rectify 2D image projection, certain building reference lengths are obtained. To proactively prevent struck-by, caught-in/between, and falling accidents, Tang et al. [30] designed a path prediction model for workers and equipment. The deep learning-based detection and prediction method was adopted to locate workers and equipment and predict upcoming motion trajectories on job sites. To eliminate high crowdedness, which can lead to dangerous working conditions and struck-by accidents, Yan et al. [31] proposed a 3D crowdedness estimation method by generating a 3D space for proximity and crowdedness calculations from 2D video frames. In Son et al.’s [32] study, a real-time warning system based on a camera’s visual information was developed to identify hazardous situations. A monocular camera installed on an excavator captures visual data, and the system estimates the workers’ positions in 3D space to predict possible collisions. Vision-based monitoring methods for struck-by hazards have been proposed by several researchers. However, they estimate proximity based on the plane-ground assumption or a particular reference. These constraints make them infeasible in dynamic and complicated construction scenarios.
Some studies have relied on the 3D position to predict struck-by hazards based on computer vision techniques. To prevent struck-by accidents between workers and moving equipment, Zhu et al. [14] determined the current location of construction entities while also predicting their future direction with a novel Kalman filter. However, it is challenging to improve the conventional method’s accuracy due to the manually derived features. Moreover, this approach has rigorous camera location requirements [33]. Gu et al. [33] proposed a human-machine collision warning system using binocular cameras deployed on heavy machinery to measure the distance between humans and machines. Nonetheless, mounting a camera on equipment limits the monitoring area, and the camera cannot observe some circumstances or places directly, such as high positions, influencing the monitoring. Furthermore, the system exclusively considers circumstances in which heavy machinery does not stop, ignoring all other operating conditions. Although the abovementioned studies obtained the 3D positions of humans and machines, they are inapplicable in dynamic and complex construction settings due to their limitations.
In summary, researchers have conducted extensive research on construction site hazard detection, and several vision-based detection methods have been proposed for excavator-worker interactions. Nevertheless, these existing vision-based methods exhibit notable limitations in practical applications. Firstly, most rely on 2D image information for hazard assessment, enabling only an approximate evaluation of the distance between excavators and workers. This makes it difficult to accurately reflect their actual 3D spatial relationship, resulting in low detection accuracy in complex construction scenarios. Secondly, existing methods often overlook the operational state of excavators and the identity attributes of workers, and instead take distance as the sole judgment criterion for spatial relationship-based hazard assessment, leading to poor adaptability in dynamic and complex construction environments. Additionally, some methods require specific reference objects to complete distance measurement, which restricts their application scope across different construction scenarios involving excavators and workers. These limitations collectively make them inapplicable in dynamic and complex construction settings. Therefore, a stereo vision-based method for struck-by hazard identification that can accurately characterize the spatial relationship between excavators and workers and be utilized in diverse construction contexts is essential.

3. Framework Structure

To address the risk of worker struck-by hazards posed by crawler excavators in complex construction environments, this research develops a vision-based detection system. Figure 1 depicts the overall structure of the proposed system to detect worker struck-by hazards due to crawler excavators on construction sites in complicated construction environments. The system contains four main modules: an object tracking module, an activity recognition module, a proximity estimation module, and a safety identification module. First, the object tracking module is executed to monitor and track the worker, the crawler excavator, and the crawler excavators’ relevant parts. Second, the activity recognition module recognizes each worker’s and excavator’s state. Third, the proximity estimation module detects the spatial distance between each excavator and moving worker using stereo vision-related technology. Finally, in the safety identification module, the workers’ safety status is identified based on the abovementioned information. It should be noted that this study focuses on validating the feasibility of integrating stereo vision and spatiotemporal interaction analysis for excavator-related hazard detection under controlled conditions. The evaluation is conducted in a simulated environment to systematically analyze algorithmic performance, rather than to claim immediate readiness for real-world deployment.

3.1. Object Tracking Module

The object tracking module aims to detect each construction entity’s type and trajectory. For the primary purpose of detecting and tracking multiple construction entities from surveillance video data over the long term, the YOLOv5-DeepSORT model is adopted and customized in this study. As a widely adopted and well-established real-time detection and tracking framework, the YOLOv5-DeepSORT model has been utilized in other categories of engineering-relevant research [34,35,36,37,38] and exhibits robust performance. The resulting bounding box identifies the position and classification of the detected and tracked entities. In this research, the following construction entities are detected and tracked through the YOLOv5-DeepSORT model: the excavator, the excavator’s cab, the excavator’s crawler, and the worker. The YOLOv5-DeepSORT architecture was selected in this study based on practical engineering considerations rather than the pursuit of the most recent model variants. Specifically, YOLOv5 offers a well-established balance between detection accuracy and computational efficiency, with extensive community validation, stable implementations, and publicly available pretrained weights. These characteristics facilitate reproducibility and reliable deployment, which are critical for real-time safety monitoring applications in construction environments.
In addition, the integration of YOLOv5 with DeepSORT enables robust multi-object tracking by combining appearance features and motion information, allowing consistent identity association of workers and equipment across consecutive frames. This capability is essential for subsequent spatiotemporal interaction analysis and safety-state recognition, where continuous tracking trajectories are required.
It is acknowledged that more recent detection and tracking frameworks, such as YOLOv8- or YOLOv11-based detectors combined with advanced trackers, have demonstrated improved performance in certain benchmarks. However, these newer models often involve increased computational complexity or rapidly evolving implementations, which may limit reproducibility and stable real-time performance in engineering-oriented applications. Therefore, in this study, YOLOv5–DeepSORT is adopted as a reliable and representative baseline to validate the feasibility of the proposed safety identification framework.
Nevertheless, the proposed framework is model-agnostic, and future work will systematically evaluate and integrate newer detection and tracking architectures to further enhance performance under real-world construction-site conditions.
Among them, the excavator and worker entities are utilized for object detection, the excavator’s crawler determines the excavator’s position, and the excavator’s cab is used to determine whether the worker is operating the excavator as the driver. Figure 2 presents specific objects on the construction site.
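For illustration, a minimal sketch of such a detection-and-tracking stage is given below. It is not the authors’ exact implementation: the custom checkpoint name, the video path, and the use of the third-party deep-sort-realtime package as a stand-in DeepSORT implementation are all assumptions.

```python
# Minimal sketch of a YOLOv5 + DeepSORT tracking stage for the four entity
# classes used in this study. The checkpoint and video names are hypothetical,
# and deep-sort-realtime is used as a stand-in DeepSORT implementation.
import cv2
import torch
from deep_sort_realtime.deepsort_tracker import DeepSort

CLASSES = ["worker", "excavator", "excavator_cab", "excavator_crawler"]

model = torch.hub.load("ultralytics/yolov5", "custom", path="excavator_site.pt")
tracker = DeepSort(max_age=30)  # keep identities alive through short occlusions

cap = cv2.VideoCapture("site_left_camera.mp4")  # hypothetical surveillance clip
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # YOLOv5 inference: each row is (x1, y1, x2, y2, confidence, class_id)
    det = model(frame).xyxy[0].cpu().numpy()
    detections = [([x1, y1, x2 - x1, y2 - y1], conf, CLASSES[int(cls)])
                  for x1, y1, x2, y2, conf, cls in det]
    # DeepSORT fuses appearance and motion cues to keep track IDs consistent
    for track in tracker.update_tracks(detections, frame=frame):
        if track.is_confirmed():
            print(track.track_id, track.det_class, track.to_ltrb())
cap.release()
```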

3.2. Activity Recognition Module

This module’s purpose is to recognize each construction entity’s working state. The excavator’s working state is recognized using the centroid change and the image differencing methods in this research. To identify the worker’s working state, the authors adopted the spatial relationship with the excavator cab. The workflow of this module is shown in Figure 3.
For the excavator, the first step is calculating the centroid change based on the tracking results from the object tracking module to recognize whether the excavator is moving or not. The centroid represents the center point of the excavator’s bounding box. The change ratio of the centroid is calculated based on the Euclidean distance, as described in Equation (1).
$$\text{Centroid change} = \sqrt{\left( x_{i+1} - x_i \right)^2 + \left( y_{i+1} - y_i \right)^2} \tag{1}$$
where $x_i$ denotes the x-axis position of the excavator’s centroid in the $i$-th tracking result and $y_i$ denotes the y-axis position of the excavator’s centroid in the $i$-th tracking result. The centroid change is then compared with a predefined threshold to determine whether the excavator is moving. If the centroid change exceeds the threshold, the excavator is classified as ‘working’; otherwise, the state must be identified further.
The second step is performed when the centroid change is less than the predefined threshold, to recognize whether the excavator is working and classify its working state as ‘working’ or ‘static’. While the excavator is working, its recognized bounding box constantly changes form, whereas it does not while the excavator is stationary. By computing the sum of absolute pixel value differences, image differencing can identify such shape changes between successive box frames, as shown in Equation (2):
$$\text{Image Difference} = \sum_{j} \sum_{k} \left| p_{jk}^{\,i+1} - p_{jk}^{\,i} \right| \tag{2}$$
where $p_{jk}^{\,i}$ is the pixel value at row $j$ and column $k$ in the bounding box of the $i$-th tracking object. The size of each bounding box is resized to a unified format before the calculation to eliminate the effect of different scales.
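A minimal sketch of this two-step state check, following Equations (1) and (2), is shown below; the threshold values are illustrative placeholders rather than the tuned values reported in Section 4.

```python
# Sketch of the two-step excavator state check in Equations (1) and (2).
# Bounding boxes are (x1, y1, x2, y2) in pixels; thresholds are placeholders.
import cv2
import numpy as np

def centroid_change(box_prev, box_curr):
    """Euclidean shift of the bounding-box center between frames, Eq. (1)."""
    cx0, cy0 = (box_prev[0] + box_prev[2]) / 2, (box_prev[1] + box_prev[3]) / 2
    cx1, cy1 = (box_curr[0] + box_curr[2]) / 2, (box_curr[1] + box_curr[3]) / 2
    return float(np.hypot(cx1 - cx0, cy1 - cy0))

def image_difference(crop_prev, crop_curr, size=(128, 128)):
    """Sum of absolute pixel differences inside the box, Eq. (2); both crops
    are resized to a unified format first to remove the effect of scale."""
    a = cv2.resize(crop_prev, size).astype(np.int32)
    b = cv2.resize(crop_curr, size).astype(np.int32)
    return int(np.abs(b - a).sum())

def excavator_state(box_prev, box_curr, crop_prev, crop_curr,
                    move_thresh=5.0, diff_thresh=1e5):  # placeholder thresholds
    if centroid_change(box_prev, box_curr) > move_thresh:
        return "working"   # the whole machine is moving
    if image_difference(crop_prev, crop_curr) > diff_thresh:
        return "working"   # stationary base, but the arm is digging
    return "static"
```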
The proposed framework recognizes the worker’s working state by detecting the spatial relationship between the worker and the crawler excavator. As shown in Figure 3, an individual on a construction site is defined as one of two types: Worker or Driver. Worker means an individual doing an activity unrelated to operating the excavator, and Driver denotes the individual driving the excavator. Generally, the working state of a worker is dependent on the interaction with the excavator’s cab. If the worker is seated in the excavator’s cab, the worker type is a driver. The intersection over union (IOU) is an evaluation metric for measuring the detection accuracy of relevant objects. In this research, intersection over cab (IOC), an improved metric, is applied to recognize the interaction between a worker and an excavator cab, and the equation is shown below.
$$\text{Intersection Over Cab } (IOC) = \frac{A_{overlap}}{A_{wkr}} \tag{3}$$
where $A_{overlap}$ is the area of the overlap between the bounding boxes of the worker and the excavator cab and $A_{wkr}$ is the area of the bounding box of the worker. If the IOC value exceeds a predetermined threshold, this indicates that the person is operating the excavator as the driver; if not, the person is a worker. In addition, another variant of the intersection over union (IOU) is used for the distribution of excavator parts, as shown in Equation (4):
$$\text{Intersection Over Excavator } (IOE) = \frac{A_{part\_overlap}}{A_{exc\_p}} \tag{4}$$
where $A_{part\_overlap}$ corresponds to the overlap area between the bounding box of the excavator and each excavator part and $A_{exc\_p}$ denotes the bounding box area of the excavator part. Similarly, if the IOE value exceeds the preset threshold, the excavator part belongs to this excavator; otherwise, it does not.
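The two ratios reduce to a few lines of box arithmetic; a sketch is shown below, with the 0.90 thresholds taken from Section 4.

```python
# Sketch of the IOC/IOE ratios in Equations (3) and (4).
# Boxes are (x1, y1, x2, y2); thresholds follow Section 4 (0.90 in this study).
def overlap_area(box_a, box_b):
    w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    return w * h

def area(box):
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])

def is_driver(worker_box, cab_box, threshold=0.90):
    """IOC, Eq. (3): overlap with the cab, normalized by the worker box."""
    return overlap_area(worker_box, cab_box) / area(worker_box) > threshold

def part_belongs_to(part_box, excavator_box, threshold=0.90):
    """IOE, Eq. (4): overlap with the excavator, normalized by the part box."""
    return overlap_area(part_box, excavator_box) / area(part_box) > threshold
```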

3.3. Proximity Estimation Module

Along with activity recognition and object tracking, proximity estimation is also critical for obtaining the worker safety status. Because of the complex construction environment, the workers and excavators do not always stay on a flat construction road. In addition, the 2D-type consecutive images captured by the surveillance camera distort the actual proximity due to the 2D projection, restricting the determination of the spatial relationships of construction entities. To solve these problems, a stereo vision-based approach is introduced to estimate the spatial distance between the worker and the excavator. The stereo vision technique provides depth information to address the challenges of 2D image distortion and complicated scenes [39,40]. For stereo vision, finding matching feature points from a pair of rectified images is critical for estimating the depth information. To identify the feature points, the stereo matching approach was used in this study. Stereo matching is one of the most critical techniques in the stereo vision area, and it searches the feature points calculated in a pair of images [40,41], as shown in Figure 4. A disparity map is the output of the stereo matching algorithm, representing corresponding pixels that are horizontally shifted between the left and right images [42]. The spatial distance between the worker and the excavator may be estimated from the disparity map. In addition, this module uses pixel points to represent construction entities for estimating proximity. A worker’s representative point is the middle point of the bottom of the bounding box, and an excavator’s representative point is the top midpoint of the excavator crawler’s bounding box. The proximity estimation module performs four steps: camera calibration, stereo matching, depth calculation, and proximity estimation.
The first step, camera calibration, obtains the camera’s parameters from 2D images. The camera’s detailed parameters are essential for improving the accuracy of the stereo vision-based estimation method. In reality, the complicated construction environment and the variety of camera settings limit the accurate acquisition of the camera’s relevant parameters. For the purpose of obtaining the camera’s intrinsic and extrinsic parameters, this research used the camera calibration approach proposed by Zhengyou Zhang [43], which employs a black-and-white chessboard. This camera calibration method only needs a black-and-white chessboard and takes several sets of images from different perspectives to establish the camera geometry model.
Stereo matching is the second step for locating pixels in multiple perspectives that correspond to the same three-dimensional point in the scene. A stereo matching model named BGNet proposed by Xu et al. [44] is adopted in this study to obtain the disparity map based on the monitoring video from surveillance cameras. This model balances the calculation speed and efficiency in generating the disparity map. Therefore, the corresponding points in the two images can be determined from the disparity map, providing the basis for calculating the construction entities’ spatial information.
The main objective of the third step, depth calculation, is to capture the depth value (the perpendicular distance from the detected point to the camera plane) between the detected object and a pair of cameras and further derive the 3D spatial distance between the worker and excavator in the next step. The depth for each construction entity is then calculated based on the disparity map, as shown in Figure 4. The relationship between a point’s depth and the corresponding disparity value can be represented as:
$$Z_c = \frac{f \cdot b}{u_L - u_R} \tag{5}$$
where $Z_c$ indicates the depth value of the construction entity, $f$ indicates the pixel focal length of the camera, $b$ indicates the binocular baseline length (the distance between the two lenses), $u_L$ and $u_R$ denote the x-axis positions of the feature point pixels in the left and right images, respectively, and $u_L - u_R$ is the disparity value, i.e., the binocular parallax of the imaging point.
The fourth step, proximity estimation, calculates the spatial distance between the worker and excavator based on the depth information. Considering that the depth of the construction entity in the camera coordinate system is obtained through the previous step, the pinhole imaging model is used to calculate the spatial position in the left camera coordinate system, and Figure 5 shows the principle of this method. The relationship between the camera coordinates $(X_c, Y_c, Z_c)$ of a point and its pixel coordinates $(u, v)$ is shown below:
$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \frac{1}{Z_c} \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} \tag{6}$$
where $u$ and $v$ are the pixel coordinates of a point in the image; $X_c$, $Y_c$, and $Z_c$ are the camera coordinates of the corresponding point; $Z_c$ is the depth value calculated in the last step; $f_x$ and $f_y$ are the focal lengths of the camera along the x-axis and y-axis, respectively; and $c_x$ and $c_y$ are the coordinates of the image center along the x-axis and y-axis, respectively. The camera coordinates $(X_c, Y_c, Z_c)$ for each construction entity are determined directly from Equation (6).
Finally, based on the spatial positions in camera coordinates $(X_c, Y_c, Z_c)$ mentioned above, the 3D spatial distance between the worker and excavator can be calculated as the Euclidean distance, as shown in Equation (7):
$$\text{Proximity} = \sqrt{\left( X_{ce} - X_{cw} \right)^2 + \left( Y_{ce} - Y_{cw} \right)^2 + \left( Z_{ce} - Z_{cw} \right)^2} \tag{7}$$
where $X_{ce}$, $Y_{ce}$, and $Z_{ce}$ are the camera coordinates of the excavator and $X_{cw}$, $Y_{cw}$, and $Z_{cw}$ are the camera coordinates of the worker.
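Taken together, Equations (5)–(7) amount to a short back-projection routine; a sketch is shown below, in which the calibration values passed as arguments are placeholders for those obtained in the calibration step.

```python
# Sketch of Equations (5)-(7): depth from the disparity map, back-projection
# through the pinhole model, and the 3D worker-excavator distance.
# Calibration values (fx, fy, cx, cy, baseline) are illustrative placeholders.
import numpy as np

def depth_from_disparity(disparity, fx, baseline):
    """Eq. (5): Z_c = f * b / (u_L - u_R)."""
    return fx * baseline / disparity

def pixel_to_camera(u, v, z, fx, fy, cx, cy):
    """Eq. (6) inverted: recover (X_c, Y_c, Z_c) from pixel (u, v) and depth."""
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

def proximity(disp_map, pt_worker, pt_excavator, fx, fy, cx, cy, baseline):
    """Eq. (7): Euclidean distance between the two representative points."""
    pts = []
    for (u, v) in (pt_worker, pt_excavator):
        z = depth_from_disparity(disp_map[int(v), int(u)], fx, baseline)
        pts.append(pixel_to_camera(u, v, z, fx, fy, cx, cy))
    return float(np.linalg.norm(pts[0] - pts[1]))
```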

3.4. Safety Identification Module

The worker safety status depends on various construction circumstances, including the excavator’s and worker’s working status and closeness to heavy equipment. Consequently, the safety identification module identifies the workers’ safety status based on the diverse information generated by the three modules above. Figure 6 illustrates the whole procedure of the safety identification module.
First, as seen in Figure 6, the worker on the scene is deemed safe if an excavator is not detected based on the outcome of the object tracking module. Second, when an excavator is present on the construction site, the worker’s safety status is evaluated via the worker’s working state identification result obtained from the activity recognition module. The driver identity indicates that the worker is safe, and the worker identity indicates that the worker’s status should be further identified. Third, based on the proximity estimation module’s output, the distance between an excavator and a worker is computed, and a worker who stays away from the excavator is safe. Finally, if the proximity estimation module discovers that an excavator is approaching a worker, the safety identification module determines the worker’s safety status based on the excavator’s working status as established by the activity recognition module. The worker is in the struck-by hazard state while near a working excavator but is safe when near a stationary excavator.
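This decision cascade can be summarized in a few lines; the sketch below mirrors the logic of Figure 6, with the 8 m safety distance taken from Section 4.1.

```python
# Sketch of the decision cascade in Figure 6. Inputs mirror the outputs of the
# three upstream modules; the 8 m safety distance follows Section 4.1.
def worker_safety_status(excavator_present, worker_role, distance_m,
                         excavator_state, safety_distance=8.0):
    if not excavator_present:
        return "safe"                 # no excavator detected on the scene
    if worker_role == "driver":
        return "safe"                 # the person is inside the cab
    if distance_m > safety_distance:
        return "safe"                 # far enough from the excavator
    if excavator_state == "working":
        return "struck-by hazard"     # close to an active excavator
    return "safe"                     # close to a static excavator
```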
The input thresholds used in the proposed safety identification process, including proximity distance thresholds, motion-related thresholds, and confidence score thresholds, were determined through preliminary pilot experiments rather than arbitrarily assigned values. Specifically, a grid-search-based tuning procedure was conducted on a held-out validation subset of the simulated data to balance detection sensitivity, false alarm rate, and overall decision stability. The selected thresholds correspond to operating points that achieved a favorable trade-off between precision and recall while maintaining robust temporal consistency.
To improve the portability of the proposed method across different camera resolutions, mounting configurations, and sensor types, absolute pixel-based thresholds were avoided wherever possible. Instead, normalized metrics were adopted. For example, centroid displacement was normalized by the diagonal length of the corresponding bounding box or by the image diagonal, while confidence-related thresholds were defined within the normalized range of [0, 1]. This normalization strategy reduces dependency on specific image resolutions or camera intrinsic parameters and enhances cross-device applicability.
In addition, a sensitivity analysis was conducted to examine the robustness of the selected thresholds. The key thresholds were systematically varied within a range of ±20% to ±50% of their nominal values, and the resulting changes in performance metrics, including precision, recall, F1-score, and false alarm rate, were analyzed. The results of this analysis, presented as comparative plots and tables, demonstrate that the proposed method maintains stable performance within a reasonable threshold variation range, while also identifying parameters that may require site-specific tuning.
It should be noted that when different equipment configurations are used, such as cameras with different focal lengths, baselines, or mounting heights, proximity-related thresholds require recalibration. In practical deployments, this can be achieved by normalizing thresholds as described above, performing lightweight on-site validation using a small annotated dataset (typically on the order of 100–500 frames), or applying automated calibration procedures based on known geometric constraints. These guidelines are provided to facilitate adaptation of the proposed framework to diverse construction-site conditions.

4. Case Study

The aim of this case study is to demonstrate the viability of the proposed framework through experiments in three distinct virtual scenarios representing different construction environments. Flat ground, rugged terrain, and sloped surfaces were utilized, allowing for a comprehensive evaluation of the effectiveness of the suggested methodology in varied settings.

4.1. Virtual Scene Generation

Using virtual construction scenarios to evaluate the system’s viability serves two purposes: on the one hand, data can be acquired without any instrumental error or external factors influencing them; on the other hand, the simulated scenario avoids any risk of worker deaths and injuries. Therefore, the authors conducted the experiment in virtual scenes. In this research, the authors built the virtual scenes with the Unity real-time development platform, one of the most popular tools for creating 3D scenes [45]. The authors could easily design the appropriate working circumstances using the Unity software (version 2019.1.0f2).
The three scenes feature different construction environments, flat ground, rugged ground, and slope, in which excavators are commonly utilized [46]. The flat-ground scene has neither a slope nor an uneven road. The rugged-ground scene features an uneven road with a maximum height difference of approximately 1.5 m between the highest and lowest points, while the slope scenario has a 6-degree inclined road surface. These three virtual construction scenes are designed to simulate typical topographic variations in real construction sites, aiming to validate the framework’s generality and adaptability across versatile environments. In addition, to systematically evaluate the proposed method, specific working intervals and operational states were designed for both the worker and the excavator during the simulation, as illustrated in Figure 7. It should be noted that although the virtual scenes in Figure 8 replicate key topographic characteristics, they differ from actual construction sites in aspects such as environmental complexity (for example, a lack of random obstacles, variable lighting, or multiple concurrent operations), equipment wear, and human movement variability. To mitigate the impact of such differences on the framework’s practical applicability, this study ensured that the virtual scenes retain the core spatial relationship constraints and operational logic between excavators and workers, including realistic movement trajectories, equipment operational states, and distance ranges, and verified the framework’s performance based on physical principles consistent with real-world scenarios.
Based on the above experimental configuration, the proposed method is evaluated using a simulation-based environment. The adoption of simulation in this study is motivated by its ability to provide repeatable and controllable experimental conditions, which are essential for feasibility validation of safety-related vision algorithms. In particular, simulation enables precise control over terrain configurations (including flat, rugged, and sloped surfaces), worker–excavator relative positions, motion trajectories, and camera viewpoints, while also allowing access to accurate ground-truth distance information that is difficult to obtain consistently in real construction sites.
Nevertheless, it should be explicitly noted that the simulated environment does not fully capture several factors commonly encountered in real-world construction scenarios. These include camera calibration inaccuracies (e.g., intrinsic and extrinsic parameter drift), sensor noise, motion blur caused by equipment vibration, lighting variations due to time-of-day changes or shadows, adverse weather conditions (such as rain, fog, or dust), and long-term sensor aging. Such factors may introduce additional uncertainty in depth estimation and safety-state classification when the system is deployed on real sites.
Therefore, the simulation-based evaluation conducted in this study should be interpreted as an upper-bound feasibility assessment under controlled conditions, rather than a direct indicator of real-site deployment performance. The primary objective at this stage is to systematically examine the effectiveness and limitations of the proposed stereo-vision-based safety identification framework before extending it to real construction environments using actual camera hardware under diverse operational conditions.
In the three scenarios, the stereo camera is emulated using two virtual cameras with identical configurations. Specifically, the cameras are mounted at a height of 5.0 m, with a sensor size of 36 × 24 mm, a focal length of 40 mm, and a downward tilt angle of 10°, while remaining parallel to each other. The focal length is the distance between the lens and the image plane, which affects the perspective and magnification of an image. A shorter focal length produces a wider field of view, capturing a broader scene. The sensor dimension is the physical size of the image sensor, which determines the size of the photosensitive area of the sensor. The larger the sensor dimension is, the more detailed the image and the higher the quality that can be provided. The two camera lenses are separated by 10 cm. Although the stereo camera’s parameters are known, the authors also used the camera calibration method to obtain the camera’s parameters to better simulate reality. The camera calibration process is conducted using the Camera Calibrator function of MATLAB (version R2022a) [47] based on the method proposed by Zhengyou Zhang. In the virtual scene, the authors recorded 100 images (50 images on each of the left and right sides) containing a black-and-white chessboard with a grid of 1 × 1 cm squares, with the chessboard in different angles and locations, as shown in Figure 9. The MATLAB software (version R2022a) automatically obtained the camera’s parameters when the series of images were input into the Camera Calibrator function. The results show that the parameters obtained by the camera calibration method are similar to the predefined parameters.
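As an illustration of this calibration step, the sketch below performs the same Zhang chessboard calibration with OpenCV instead of MATLAB’s Camera Calibrator; the board layout (9 × 6 inner corners) and the image file pattern are assumptions, not details from the study.

```python
# OpenCV sketch of Zhang's chessboard calibration, equivalent in spirit to the
# MATLAB Camera Calibrator workflow used in this study. The 9x6 inner-corner
# layout and the file pattern are assumptions; squares are 1 x 1 cm as stated.
import glob
import cv2
import numpy as np

PATTERN = (9, 6)   # inner corners per row/column (assumed board layout)
SQUARE = 0.01      # 1 x 1 cm squares, expressed in meters

# One planar reference grid of 3D board points, z = 0 on the board plane
obj_grid = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
obj_grid[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE

obj_pts, img_pts = [], []
for path in glob.glob("calib/left_*.png"):  # hypothetical image set
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if found:
        obj_pts.append(obj_grid)
        img_pts.append(corners)

# Zhang's method: recover intrinsics (fx, fy, cx, cy) and distortion terms
rms, K, dist, _, _ = cv2.calibrateCamera(obj_pts, img_pts,
                                         gray.shape[::-1], None, None)
print("reprojection RMS:", rms, "\nintrinsic matrix:\n", K)
```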
Based on the maximum digging radius, maximum digging depth, and maximum digging height of the LG6300E excavator (Shandong Lingong Construction Machinery Co., Ltd. (SDLG), Linyi, China), the safety distance threshold is defined as 8 m. In addition, the authors prepared two extra scenes to set the thresholds: an excavator moving back and forth and an excavator rotating in place while raising and lowering its arm. After experimenting with these two situations, the excavator thresholds for working status, motion, and part distribution and the threshold for determining the workers’ working status were set to 5 pixels, 1 pixel, 0.90, and 0.90, respectively.

4.2. Dataset Generation and Model Training

In this study, the YOLOv5 object detection model was trained using construction-related images from the MOCS dataset [48] to improve detection accuracy on job sites. The MOCS image dataset constructed by An et al. is a construction-relevant dataset annotated with thirteen categories of moving objects, including the excavator and worker. The authors selected images with excavator and worker object annotations as the image dataset for the object detection model. Additionally, crowdsourcing was adopted to label the cab and crawler of the excavator based on the MOCS annotation. Five professional specialists used the LabelImg annotation tool (version 1.8.1) [49] to guarantee annotation correctness. LabelImg, created by Tzutalin, is one of the most well-known image annotation tools, providing an easy labeling method and various annotation representation types. Therefore, LabelImg was used as the image annotation tool to label the images in this study. Finally, 21,615 images with object annotations for 12,051 excavators, 8,879 excavator cabs, 7,431 excavator crawlers, and 89,410 workers were extracted from the MOCS dataset and annotated (due to object occlusion and angle projection, the numbers of excavators and of cabs and crawlers do not correspond).
To ensure the reliability of the detection and identification results, a manual inspection procedure was conducted as part of the validation process. Manual inspection was performed by a team of three independent reviewers with experience in construction safety and computer vision–based annotation. These reviewers were different from the specialists involved in the initial dataset annotation stage, during which five professional annotators labeled the training data. For manual inspection, each reviewer independently examined a stratified random sample of frames drawn from the flat, rugged, and sloped scenarios, and determined whether the detected bounding boxes for workers and excavators were correct.
The sampling strategy ensured balanced coverage across different terrain types, with an equal sampling fraction applied to each scenario. In cases of disagreement among reviewers, the final decision was determined by majority voting. To assess the consistency of the manual inspection process, inter-annotator agreement was calculated on a held-out subset of the inspected frames using Cohen’s kappa coefficient, and the resulting agreement score is reported along with the total number of inspected samples.
The identification rate reported in this study was computed by comparing the algorithm outputs with the manually verified labels. A detection was considered correct if the intersection over union (IoU) between the predicted bounding box and the manually validated bounding box exceeded 0.5, a commonly adopted criterion in object detection evaluation. In addition to the identification rate, detection performance is also reported in terms of standard metrics such as precision and recall, where applicable.
Manual inspection and annotation review were carried out using the LabelImg tool (version 1.8.1) and an internal visualization interface developed for this study. The inspection process was not fully blind to algorithm outputs; however, to mitigate potential confirmation bias, a subset of the sampled frames was re-evaluated through a double-checked blind re-annotation procedure, and no statistically significant differences were observed between blind and non-blind evaluations.
This research separated the image dataset into training, validation, and test datasets at a ratio of 0.7:0.15:0.15 for training and testing the object detection model. There were 15,130 images in the training dataset, 3,242 in the validation dataset, and 3,243 in the test dataset. Precision and recall were utilized as the evaluation metrics to assess the model validation performance, and the equations are shown below:
$$\text{Precision} = \frac{TP}{TP + FP} \tag{8}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{9}$$
where $TP$ denotes cases in which the true category is positive and the model predicts it as positive; $FP$ denotes cases in which the true category is negative but the model predicts it as positive; and $FN$ denotes cases in which the true category is positive but the model predicts it as negative. The test results for the object detection model on the test dataset are shown in Table 1, illustrating the object detection model’s performance.
For the stereo matching model, a dedicated construction safety stereo matching image dataset is lacking. In addition, the pre-trained model provided with the BGNet model achieves high accuracy on public stereo datasets such as SceneFlow [44]. Therefore, the pre-trained model named ‘Sceneflow-BGNet-Plus.pth’ provided by the BGNet-related research is adopted here.

4.3. Experiment Comparison

To examine the effectiveness of the stereo vision-based proximity estimation module, this research also used the reference-based method to determine spatial distance. In contrast to the stereo-vision-based method, the reference-based method assumes that the ground is flat and that the actual height of the representative point in the image is known in advance. In the three scenarios, the point’s actual height for workers and excavators is 0 and 1 m, respectively. For the reference-based method, the depth information is calculated based on the actual height of the point. As shown in Figure 5, the angle $b$ between the optical axis and the measured point is first calculated as follows:
$$b = \arctan \frac{o_1 p_1}{f} \tag{10}$$
where $o_1 p_1$ corresponds to the y-axis distance from the optical center to the projection of the measured point in the camera plane and $f$ is the focal length of the camera lens. Then, through the angle $b$, the distance $OP$ between the camera and the measured point is calculated as shown below:
$$OP = \frac{H}{\sin(a + b)} \tag{11}$$
where $H$ indicates the camera height and $a$ denotes the tilt angle of the camera. Based on $OP$, the depth value $OD$ of the representative point is obtained as in Equation (12):
$$OD = OP \times \cos b \tag{12}$$
Finally, similar to the proximity estimation module, the camera coordinates $(X_c, Y_c, Z_c)$ can be calculated through Equation (6), and the spatial distance between the worker and excavator is calculated using Equation (7).
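A sketch of this reference-based depth computation is given below, assuming the camera height and tilt from Section 4.1 and interpreting the known point height as an offset from the camera height (an assumption, since the text only states that the point heights of 0 and 1 m are known).

```python
# Sketch of the reference-based comparison method, Equations (10)-(12):
# depth of an image point under the flat-ground assumption, given a known
# real-world point height. Camera height and tilt follow Section 4.1.
import numpy as np

def reference_depth(v, fy, cy, cam_height=5.0, tilt_deg=10.0, point_height=0.0):
    """Depth O_D of a point whose real-world height is assumed known."""
    b = np.arctan((v - cy) / fy)              # Eq. (10): angle off the optical axis
    a = np.radians(tilt_deg)                  # downward tilt of the camera
    op = (cam_height - point_height) / np.sin(a + b)   # Eq. (11)
    return op * np.cos(b)                     # Eq. (12): perpendicular depth
```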

4.4. Results

The object detection performance of the object tracking module is exceptional in the three scenes. The authors utilized manual inspection to evaluate the detection performance and took the identification rate as the evaluation criterion. The identification rate is the percentage of total frames that are correctly recognized, and the equation is shown below:
$$\text{Identification Rate} = \frac{Num_{identified}}{Num_{all}} \tag{13}$$
where $Num_{identified}$ denotes the number of frames that are correctly identified and $Num_{all}$ denotes the overall number of frames.
In the flat-ground scene, through manual observation, the identification rates for the walking worker, excavator, excavator cab, and excavator crawler are 96.50%, 99.67%, 99.67%, and 99.00%, respectively; in the rugged-ground scene, they are 96.17%, 98.33%, 97.67%, and 95.17%, respectively; and in the slope scenario, they are 93.50%, 99.67%, 99.67%, and 99.17%, respectively. The details of the results are illustrated in Table 2.
In the activity recognition module, $Accuracy_{part}$ is the evaluation metric used to assess the performance of identifying the worker’s and excavator’s working states, and the equation is shown below:
$$Accuracy_{part} = \frac{Num_{correct}}{Num_{identify}} \tag{14}$$
where $Num_{correct}$ denotes the number of correctly identified states and $Num_{identify}$ denotes the total number of identified states. For the excavator, the accuracies are 98.17%, 98.00%, and 99.00% in the flat-ground, rugged-ground, and slope scenes, respectively. On the other hand, although the worker in the excavator cab is not fully identified, the working status of the worker, whether in the excavator cab or walking around the excavator, has 100% recognition accuracy in all three scenes.
For the proximity estimation module, in the flat-ground scene, the average error of the spatial distance with the reference-based method is 0.38 m, while for the stereo-vision-based method, it is 0.81 m. In the rugged-ground scene, the average error of the reference-based method is 3.17 m, while that of the stereo-vision-based method is 0.89 m. In the slope scene, the average error of the reference-based method is 7.22 m, while for the stereo-vision-based method, it is 0.59 m. The details are shown in Figure 10, Figure 11 and Figure 12.
Regarding the worker safety identification results, the study also compared the results between the framework with the reference-based and stereo-vision-based methods. Precision, recall, F1-score, and accuracy were used to evaluate the performance of worker safety status identification, and Equations (8), (9), (15) and (16) show the details. For the flat-ground scenario, the overall accuracies for the reference-based and stereo-vision-based methods are 94.79% and 92.71%; in the rugged-ground scenario, they are 64.62% and 90.04%; and in the slope scene, they are 74.96% and 94.25%. The identification results for worker safety status in the three scenes are shown in Table 3, Table 4, Table 5, Table 6, Table 7 and Table 8. Specifically, Table 3, Table 4 and Table 5 report the safety identification results (safe vs. unsafe) for the flat, rugged, and sloped scenes, respectively, while Table 6, Table 7 and Table 8 further summarize the corresponding quantitative performance metrics, including precision, recall, F1-score, and overall accuracy for each scenario. These tables collectively demonstrate the effectiveness of the proposed framework across different environmental conditions.
$$F1\text{-}score = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{15}$$
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{16}$$
For the flat-ground scenario, Table 6 shows that the reference-based method achieves an overall accuracy of 94.79%, while the proposed stereo-vision-based method achieves 92.71%, corresponding to a difference of 2.08 percentage points.
This result can be explained by the underlying assumptions of the two approaches. The reference-based method explicitly assumes a planar ground surface and relies on predefined representative heights for workers and equipment. Under ideal flat-ground conditions, this prior knowledge can be effectively exploited, leading to slightly higher accuracy. In contrast, the stereo-vision-based method does not rely on planar assumptions and is designed to operate in both planar and non-planar environments. However, in strictly flat scenes, the performance of stereo matching can be affected by factors such as disparity noise, reduced matching reliability at larger distances, and limited texture information, which may result in marginal accuracy degradation.
To further examine whether this difference is statistically meaningful, a bootstrap-based confidence interval analysis was conducted on the flat-ground results. The analysis indicates that the observed accuracy gap between the two methods is small and lies within a narrow confidence range, suggesting that the difference, while measurable, is not dominant under ideal planar conditions.
In addition, a per-distance-bin analysis was performed by grouping samples into ranges of 0–4 m, 4–8 m, and greater than 8 m. The results show that the stereo-vision-based method exhibits slightly lower accuracy than the reference-based method at longer distances on flat ground, where disparity estimation becomes less reliable. This analysis helps clarify the source of the observed performance difference and highlights that the stereo-based approach maintains superior robustness in non-planar environments, as demonstrated in the rugged and sloped scenarios (Table 7 and Table 8).
In practical safety identification scenarios, particular attention must be given to cases where the estimated worker–excavator distance is close to the predefined safety threshold. Due to depth estimation uncertainty, small errors in stereo matching may cause borderline cases to be alternately classified as safe or unsafe when a strict binary threshold is applied. This phenomenon is especially pronounced when the estimated distance lies within a narrow interval around the safety threshold.
To analyze this effect, the estimated distance is more appropriately interpreted as an interval rather than a deterministic value. When the confidence interval of the estimated distance overlaps with the safety threshold, the corresponding safety state is regarded as uncertain. Such near-threshold cases are therefore treated conservatively in the decision-making process to reduce the risk of unsafe misclassification.
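One plausible realization of this interval-based rule is sketched below; the threshold, uncertainty, and coverage factor are illustrative assumptions rather than values taken from the study.

```python
def classify_with_uncertainty(d_est: float, sigma: float, threshold: float,
                              k: float = 1.96) -> str:
    """Interpret the estimated distance as the interval [d - k*sigma, d + k*sigma].
    If the interval straddles the safety threshold, report 'uncertain' so that
    downstream logic can treat the case conservatively (e.g., as unsafe)."""
    lower, upper = d_est - k * sigma, d_est + k * sigma
    if lower > threshold:
        return "safe"
    if upper < threshold:
        return "unsafe"
    return "uncertain"

# A 5.1 m estimate with 0.3 m standard uncertainty against a 5.0 m threshold:
print(classify_with_uncertainty(d_est=5.1, sigma=0.3, threshold=5.0))  # -> 'uncertain'
```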
In addition, a sensitivity analysis was conducted by perturbing the safety threshold within a range of ±20% and evaluating the resulting changes in safety classification performance, including accuracy, precision, recall, and false alarm rate. The results indicate that while extreme threshold shifts lead to expected performance degradation, the proposed framework maintains stable safety identification performance within a reasonable error range, demonstrating robustness to moderate proximity estimation errors.
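The sweep itself reduces to a few lines; the following sketch assumes per-frame distance estimates and ground-truth unsafe labels, with a hypothetical 5 m nominal threshold in the example call.

```python
import numpy as np

def threshold_sensitivity(distances_m, actual_unsafe, base_threshold,
                          rel_range=0.20, steps=9):
    """Perturb the safety threshold within +/- rel_range of its nominal value
    and report the resulting identification accuracy at each setting."""
    distances_m = np.asarray(distances_m)
    actual_unsafe = np.asarray(actual_unsafe, dtype=bool)
    results = {}
    for t in np.linspace((1 - rel_range) * base_threshold,
                         (1 + rel_range) * base_threshold, steps):
        predicted_unsafe = distances_m < t  # closer than the threshold -> unsafe
        results[round(float(t), 2)] = float((predicted_unsafe == actual_unsafe).mean())
    return results

# Hypothetical frame data with a nominal 5.0 m threshold:
print(threshold_sensitivity([2.0, 4.5, 5.5, 8.0], [True, True, False, False], 5.0))
```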
From an engineering perspective, several mitigation strategies are incorporated to further reduce the impact of near-threshold errors. These include temporal smoothing of distance estimates across consecutive frames, the integration of relative motion information through time-to-collision analysis, and the use of multi-level safety states instead of a single binary decision. Together, these mechanisms help prevent frequent decision oscillations caused by small estimation errors near the safety threshold and improve the reliability of safety identification in realistic operating conditions.
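As one concrete illustration of these mitigations, the sketch below combines exponential smoothing of the distance stream with two-threshold hysteresis; the smoothing factor and margin are illustrative assumptions, and the time-to-collision component is omitted for brevity.

```python
class SmoothedSafetyMonitor:
    """Exponentially smooth per-frame distance estimates and apply hysteresis
    between two thresholds to suppress decision oscillation near the boundary."""

    def __init__(self, threshold: float, margin: float = 0.5, alpha: float = 0.3):
        self.enter_unsafe = threshold           # switch to 'unsafe' below this
        self.exit_unsafe = threshold + margin   # switch back to 'safe' only above this
        self.alpha = alpha                      # smoothing factor (higher = more reactive)
        self.d_smooth = None
        self.state = "safe"

    def update(self, d_raw: float) -> str:
        # Exponential moving average over consecutive frames.
        self.d_smooth = d_raw if self.d_smooth is None else (
            self.alpha * d_raw + (1 - self.alpha) * self.d_smooth)
        if self.state == "safe" and self.d_smooth < self.enter_unsafe:
            self.state = "unsafe"
        elif self.state == "unsafe" and self.d_smooth > self.exit_unsafe:
            self.state = "safe"
        return self.state

# A short distance stream around a 5.0 m threshold; note that the state does not
# flip back to 'safe' until the smoothed distance clears the exit margin.
monitor = SmoothedSafetyMonitor(threshold=5.0)
print([monitor.update(d) for d in (6.0, 4.0, 3.8, 4.2, 5.2, 5.8)])
# -> ['safe', 'safe', 'unsafe', 'unsafe', 'unsafe', 'unsafe']
```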

5. Discussion

The method of identifying the worker’s and excavator’s working status achieved satisfactory performance in the three virtual scenes based on the detection results of the object tracking module. However, for a worker sitting in the excavator cab, the object detection model had difficulty recognizing and tracking the worker owing to occlusion by the cab glass and the excavator’s digging arm.
The stereo-vision-based method outperformed the reference-based method overall in the proximity estimation module. In the flat-ground scene, the average error of the reference-based approach was approximately 0.43 m lower than that of the stereo-vision-based method. In the rugged-ground scene, however, the average error of the reference-based approach was 2.28 m larger than that of the stereo-vision-based method, and in the slope scene it was 6.63 m larger. Thus, although the reference-based method performed slightly better on flat ground, the stereo-vision-based method was far more accurate in the rugged-ground and slope scenes. The accuracy of reference-based proximity estimation depends on assumptions such as a flat ground surface and a fixed reference height; in ideal situations where these references hold, the reference-based method outperforms the stereo-vision-based method. Owing to the complexity and dynamism of construction environments, however, the presumed references do not always correspond to the observed site conditions. Consequently, in rugged-ground and slope scenes that violate these assumptions, the reference-based method’s proximity estimation is inferior to that of the stereo-vision-based method.
In identifying worker safety status, the stereo-vision-based method performs better overall than the reference-based method. Although the F1-score of the reference-based method is 2.34 percentage points higher than that of the stereo-vision-based method in the flat-ground scene, it is lower by 33.84 and 43.04 percentage points in the rugged-ground and slope scenes, respectively. When the accuracy of identifying the worker’s and excavator’s working statuses is equivalent, the performance of worker safety status identification is mainly determined by the accuracy of proximity estimation. As demonstrated above, the two methods perform almost identically for proximity estimation in the flat-ground scene, whereas the stereo-vision-based method outperforms the reference-based method in the rugged-ground and slope scenes; the safety identification results follow the same pattern.
The observed differences in worker safety identification performance between the reference-based and stereo-vision-based methods are consistent with findings reported in previous studies on construction safety monitoring [11,12]. Reference-based proximity estimation methods typically rely on predefined geometric assumptions, such as representative heights of workers and equipment and planar ground surfaces. While these assumptions can be effective under ideal flat-ground conditions, their validity deteriorates in environments with uneven terrain or slopes, which are common in excavation and earthmoving operations. Similar performance degradation of geometry-constrained or reference-dependent approaches under non-planar site conditions has been reported in existing construction safety literature [50].
In contrast, stereo-vision-based methods estimate depth directly from image disparities, allowing them to adapt more effectively to variations in ground elevation and spatial configuration. Prior research in computer vision and construction safety has shown that disparity-based depth estimation is more robust in complex environments where planar assumptions do not hold. This characteristic explains why the stereo-vision-based method demonstrates substantially higher safety identification performance in the rugged-ground and slope scenes in this study, despite exhibiting comparable or slightly lower performance under ideal flat-ground conditions.
Although the proposed framework achieved satisfactory and robust performance in the three scenarios, some errors occurred in the case study. The activity recognition module occasionally misidentified the excavator’s working status owing to offset or distortion of the construction entities’ bounding boxes. In addition, the proximity estimation module exhibited an average error of approximately 0.78 m across the three scenes. These deficiencies led to errors in the identification of worker safety status.
Moreover, with regard to practical applications, the proposed framework has so far been validated only in a virtual environment involving a single worker and a single excavator. Although the simulation environment enables controlled and repeatable evaluation, it does not fully reflect real construction-site conditions, such as illumination variability, weather effects, sensor noise, and calibration inaccuracies. These factors may introduce additional uncertainty in depth estimation and safety classification when deploying the system in practice. Nevertheless, the virtual case study provides preliminary evidence of the framework’s effectiveness. To further substantiate its performance and applicability, future work will focus on conducting experiments at real construction sites to comprehensively evaluate the framework’s robustness under practical conditions.
A limitation of the current study is that the experimental scenarios focus on interactions between a single worker and a single excavator. While this setting allows controlled analysis of proximity estimation and safety-state recognition, it may limit the generalizability of the results to real construction sites where multiple workers and multiple pieces of equipment operate simultaneously.
In multi-agent environments, additional challenges arise, including frequent occlusions, overlapping bounding boxes, ambiguous attribution of proximity relationships (i.e., determining which worker is interacting with which machine), and more complex spatiotemporal interaction patterns. These factors can increase tracking uncertainty and may affect the reliability of safety classification if not explicitly addressed. Therefore, the reported results should be interpreted as a baseline validation under simplified interaction conditions rather than a comprehensive evaluation of all real-world construction scenarios.

6. Conclusions

This study addresses the critical issue of struck-by hazards between excavators and workers in construction sites, a leading cause of fatalities and injuries. Existing vision-based detection methods suffer from limitations such as reliance on 2D image information for approximate distance evaluation, neglect of equipment operational states and worker identity attributes, and dependence on specific reference objects, all of which reduce accuracy and adaptability in complex dynamic construction environments. To fill these research gaps, this study proposes a stereo vision-based struck-by hazard detection method for excavator-worker interactions, verifies its effectiveness through virtual scenario experiments, and summarizes key findings, innovations, limitations, and future directions as follows:
1. The proposed method integrates four core modules, object tracking, activity recognition, proximity estimation, and safety identification, to achieve comprehensive hazard detection. The object tracking module provides real-time monitoring of workers and key excavator components; the activity recognition module distinguishes the operational status of excavators and the identity attributes of workers; the proximity estimation module calculates 3D spatial distance based on stereo vision; and the safety identification module evaluates workers’ safety status along multiple dimensions.
2. Verified in three virtual construction scenarios, flat ground, rugged terrain, and slope, the method achieved safety status identification accuracies of 92.71%, 90.04%, and 94.25%, respectively, demonstrating strong adaptability and robustness in diverse, complex environments.
3. Two key innovations are realized: first, the integrated framework incorporates the spatial relationships between workers and excavators as well as their operational statuses, accurately reflecting the complexity and dynamics of construction sites; second, the adoption of stereo vision enables precise 3D spatial distance calculation without relying on specific references, overcoming the inaccuracy of traditional 2D vision-based methods.
4. The research expands the application scope of construction hazard monitoring and improves safety identification efficiency while reducing unnecessary alarms by moving beyond single distance-based judgment to multi-dimensional assessment.
Future research will focus on three aspects. The first is to develop methods based on pose estimation to enhance the detection of activity states and spatial interrelationships. The second is to expand the framework to cover more types of heavy machinery. The third is to optimize the proximity estimation module to reduce the error range and improve the accuracy.

Author Contributions

Conceptualization, Y.Z. and W.W.; Methodology, Y.Z.; Software, Y.Z.; Validation, Y.Z., H.C. and R.P.; Formal analysis, Y.Z.; Investigation, M.Y.; Resources, P.Z.; Data curation, Y.Z.; Writing—original draft, Y.Z.; Writing—review & editing, W.W.; Visualization, Y.Z.; Supervision, W.W.; Project administration, W.W.; Funding acquisition, W.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Open Research Fund of Hubei Key Laboratory of Hydropower Engineering Construction and Management, grant number 2024KSD17. The APC was funded by the Open Research Fund of Hubei Key Laboratory of Hydropower Engineering Construction and Management.

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

Authors Yifan Zhu and Wen Wang were employed by Engineering Design and Research Institute of China Communications Construction Third Highway Engineering Bureau Co., Ltd. and Shanghai Research Institute of Building Sciences Group Co., Ltd., respectively. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Wu, W.; Yang, H.; Li, Q.; Chew, D. An integrated information management model for proactive prevention of struck-by-falling-object accidents on construction sites. Autom. Constr. 2013, 34, 67–74.
2. Dong, C.; Li, H.; Luo, X.; Ding, L.; Siebert, J.; Luo, H. Proactive struck-by risk detection with movement patterns and randomness. Autom. Constr. 2018, 91, 246–255.
3. CPWR. The Construction Chart Book; CPWR—The Center for Construction Research and Training: Silver Spring, MD, USA, 2016; p. 156.
4. The U.S. Occupational Safety and Health Administration (OSHA). Construction eTool. 2013. Available online: https://www.osha.gov/etools/construction/struck-by (accessed on 1 June 2024).
5. Rinker, M.E. An Evaluation of Safety Performance. J. Constr. Res. 2003, 4, 5–15.
6. Taneja, S.; Akinci, B.; Garrett, J.H.; Soibelman, L.; Ergen, E.; Pradhan, A.; Tang, P.; Berges, M.; Atasoy, G.; Liu, X.; et al. Sensing and Field Data Capture for Construction and Facility Operations. J. Constr. Eng. Manag. 2011, 137, 870–881.
7. Yan, X.; Zhang, H.; Li, H. Computer vision-based recognition of 3D relationship between construction entities for monitoring struck-by accidents. Comput.-Aided Civ. Infrastruct. Eng. 2020, 35, 1023–1038.
8. Hinze, J.W.; Teizer, J. Visibility-related fatalities related to construction equipment. Saf. Sci. 2011, 49, 709–718.
9. Chi, S.; Han, S. Analyses of systems theory for construction accident prevention with specific reference to OSHA accident reports. Int. J. Proj. Manag. 2013, 31, 1027–1041.
10. Gambatese, J.A.; Behm, M.; Hinze, J.W. Viability of designing for construction worker safety. J. Constr. Eng. Manag. 2005, 131, 1029–1036.
11. Teizer, J.; Cheng, T.; Fang, Y. Location tracking and data visualization technology to advance construction ironworkers’ education and training in safety and productivity. Autom. Constr. 2013, 35, 53–68.
12. Fang, Q.; Li, H.; Luo, X.; Ding, L.; Luo, H.; Li, C. Computer vision aided inspection on falling prevention measures for steeplejacks in an aerial environment. Autom. Constr. 2018, 93, 148–164.
13. Park, M.-W.; Kim, H.; Lee, H.-S. Vision-based hazard detection for construction safety management. Sensors 2020, 20, 806.
14. Zhu, Z.; Park, M.W.; Koch, C.; Soltani, M.; Hammad, A.; Davari, K. Predicting movements of onsite workers and mobile equipment for enhancing construction site safety. Autom. Constr. 2016, 68, 95–101.
15. Khan, S.Z.; Mohsin, M.; Iqbal, W. On GPS Spoofing of Aerial Platforms: A Review of Threats, Challenges, Methodologies, and Future Research Directions. PeerJ Comput. Sci. 2021, 7, e507.
16. Golovina, O.; Teizer, J.; Pradhananga, N. Heat map generation for predictive safety planning: Preventing struck-by and near miss interactions between workers-on-foot and construction equipment. Autom. Constr. 2016, 71, 99–115.
17. Wang, J.; Razavi, S.N. Low False Alarm Rate Model for Unsafe-Proximity Detection in Construction. J. Comput. Civ. Eng. 2016, 30, 04015005.
18. Li, C.; Fu, Y.; Yu, F.R.; Luan, T.H.; Zhang, Y. Vehicle Position Correction: A Vehicular Blockchain Networks-Based GPS Error Sharing Framework. IEEE Trans. Intell. Transp. Syst. 2021, 22, 898–912.
19. Kim, K.; Kim, H.; Kim, H. Image-based construction hazard avoidance system using augmented reality in wearable device. Autom. Constr. 2017, 83, 390–403.
20. Teizer, J.; Allread, B.S.; Fullerton, C.E.; Hinze, J. Autonomous pro-active real-time construction worker and equipment operator proximity safety alert system. Autom. Constr. 2010, 19, 630–640.
21. Park, J.; Marks, E.; Cho, Y.K.; Suryanto, W. Performance Test of Wireless Technologies for Personnel and Equipment Proximity Sensing in Work Zones. J. Constr. Eng. Manag. 2016, 142, 04015049.
22. Park, J.W.; Yang, X.; Cho, Y.K.; Seo, J. Improving dynamic proximity sensing and processing for smart work-zone safety. Autom. Constr. 2017, 84, 111–120.
23. Teizer, J.; Cheng, T. Proximity hazard indicator for workers-on-foot near miss interactions with construction equipment and geo-referenced hazard areas. Autom. Constr. 2015, 60, 58–73.
24. Wang, J.; Razavi, S. Two 4D Models Effective in Reducing False Alarms for Struck-by-Equipment Hazard Prevention. J. Comput. Civ. Eng. 2016, 30, 04016031.
25. Kim, N.; Kim, J.; Ahn, C.R. Predicting workers’ inattentiveness to struck-by hazards by monitoring biosignals during a construction task: A virtual reality experiment. Adv. Eng. Inform. 2021, 49, 101359.
26. Lee, J.; Yang, K. Mobile Device-Based Struck-By Hazard Recognition in Construction Using a High-Frequency Sound. Sensors 2022, 22, 3482.
27. Kim, H.; Kim, K.; Kim, H. Vision-Based Object-Centric Safety Assessment Using Fuzzy Inference: Monitoring Struck-By Accidents with Moving Objects. J. Comput. Civ. Eng. 2016, 30, 04015075.
28. Zhang, M.; Cao, Z.; Yang, Z.; Zhao, X. Utilizing Computer Vision and Fuzzy Inference to Evaluate Level of Collision Safety for Workers and Equipment in a Dynamic Environment. J. Constr. Eng. Manag. 2020, 146, 04020051.
29. Kim, D.; Liu, M.; Lee, S.H.; Kamat, V.R. Remote proximity monitoring between mobile construction resources using camera-mounted UAVs. Autom. Constr. 2019, 99, 168–182.
30. Tang, S.; Golparvar-Fard, M.; Naphade, M.; Gopalakrishna, M.M. Video-Based Motion Trajectory Forecasting Method for Proactive Construction Safety Monitoring Systems. J. Comput. Civ. Eng. 2020, 34, 04020041.
31. Yan, X.; Zhang, H.; Li, H. Estimating Worker-Centric 3D Spatial Crowdedness for Construction Safety Management Using a Single 2D Camera. J. Comput. Civ. Eng. 2019, 33, 04019030.
32. Son, H.; Seong, H.; Choi, H.; Kim, C. Real-Time Vision-Based Warning System for Prevention of Collisions between Workers and Heavy Equipment. J. Comput. Civ. Eng. 2019, 33, 04019029.
33. Gu, B.; Guo, H.; Huang, Y.; Lim, H.W.; Fang, D. Computer vision-based human-machine collision warning system on construction site. IOP Conf. Ser. Earth Environ. Sci. 2022, 1101, 032014.
34. Gai, Y.; He, W.; Zhou, Z. Pedestrian Target Tracking Based on DeepSORT with YOLOv5. In Proceedings of the 2021 2nd International Conference on Computer Engineering and Intelligent Control (ICCEIC), Chongqing, China, 12–14 November 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–5.
35. Gao, G.; Lee, S. Design and Implementation of Fire Detection System Using New Model Mixing. Int. J. Adv. Cult. Technol. 2021, 9, 260–267.
36. Ma, L.; Liu, X.; Zhang, Y.; Jia, S. Visual target detection for energy consumption optimization of unmanned surface vehicle. Energy Rep. 2022, 8, 363–369.
37. Si, G.; Zhou, F.; Zhang, Z.; Zhang, X. Tracking Multiple Zebrafish Larvae Using YOLOv5 and DeepSORT. In Proceedings of the 2022 8th International Conference on Automation, Robotics and Applications (ICARA), Prague, Czech Republic, 18–20 February 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 228–232.
38. Ngeni, F.; Mwakalonge, J.; Siuhi, S. Multiple Object Tracking (MOT) of Vehicles to Solve Vehicle Occlusion Problems Using DeepSORT and Quantum Computing. SSRN Prepr. 2022.
39. Guan, J.; Yang, X.; Ding, L.; Cheng, X.; Lee, V.C.S.; Jin, C. Automated pixel-level pavement distress detection based on stereo vision and deep learning. Autom. Constr. 2021, 129, 103788.
40. Hamid, M.S.; Manap, N.F.A.; Hamzah, R.A.; Kadmin, A.F. Stereo matching algorithm based on deep learning: A survey. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 1663–1673.
41. Zhou, K.; Meng, X.; Cheng, B. Review of Stereo Matching Algorithms Based on Deep Learning. Comput. Intell. Neurosci. 2020, 2020, 8562323.
42. Hamzah, R.A.; Ibrahim, H. Literature survey on stereo vision disparity map algorithms. J. Sens. 2016, 2016, 8742920.
43. Zhang, Z. A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 1330–1334.
44. Xu, B.; Xu, Y.; Yang, X.; Jia, W.; Guo, Y. Bilateral grid learning for stereo matching networks. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 12492–12501.
45. Unity. 2023. Available online: https://unity.com/ (accessed on 6 June 2023).
46. Occupational Safety and Health Branch, Labour Department. Code of Practice on Safe Use of Excavators; Labour Department: Hong Kong, China, 2005.
47. Matlab. 2023. Available online: https://www.mathworks.com/products/matlab.html (accessed on 29 December 2025).
48. An, X.; Zhou, L.; Liu, Z.; Wang, C.; Li, P.; Li, Z. Dataset and benchmark for detecting moving objects in construction sites. Autom. Constr. 2021, 122, 103482.
49. LabelImg. 2023. Available online: https://github.com/heartexlabs/labelImg (accessed on 8 June 2023).
50. Scharstein, D.; Szeliski, R. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Comput. Vis. 2002, 47, 7–42.
Figure 1. The overall architecture of the proposed framework. In this flowchart, the rounded rectangles represent the core computational models or modules, while the sharp-cornered rectangles represent the input/output data and intermediate results. The solid arrows indicate the direction of the data flow. The ellipses (dots) within the result boxes represent the continuous sequence of omitted video frames. In the upper-right sample image, the colored rectangles represent the bounding boxes of detected construction entities (e.g., the green box for the excavator, the red box for the worker, and the purple box for the excavator cab).
Figure 2. Example of entities on a construction site. The blurred area in the bottom-left corner obscures a company logo and an irrelevant watermark for privacy and confidentiality reasons.
Figure 3. Workflow of activity recognition.
Figure 4. The principle of stereo matching. In this diagram, the solid blue lines represent the optical projection rays connecting the 3D point Q to the left and right camera optical centers (O_L and O_R). The dashed blue line indicates the epipolar line. The dashed black lines with arrows denote the geometric distances and parameters, including the baseline b, focal length f, disparity, and depth Z_c. The thin solid black line connects the corresponding matching feature points (Q_L and Q_R) in the two image planes.
Figure 5. The principle of the pinhole imaging model. In this figure, the solid lines with arrows represent the axes of the camera coordinate system (X_c, Y_c, Z_c). The solid blue lines denote the projection rays connecting the camera’s optical center O to the spatial point P and its corresponding image point p_1, along with the geometric auxiliary lines for depth calculation. The solid green line indicates the projection ray connecting the optical center O to the worker point Q and its image point q_1. The solid grey line represents the horizontal reference line used to define the camera’s tilt angle a. The dashed lines represent auxiliary geometric projections on the camera image plane and the ground plane.
Figure 6. The structure of safety identification.
Figure 7. The working intervals of the worker and excavator.
Figure 8. Examples of three scenes: (a) flat ground, (b) rugged ground, and (c) slope.
Figure 9. Example of camera calibration: (a) left image and (b) right image.
Figure 10. The proximity estimation results of the reference-based and stereo-vision-based methods in the flat-ground scene: (a) proximity estimation results; (b) proximity error. In subfigure (a), the solid blue line represents the actual distance, the dashed red line the estimate of the reference-based method, and the dash-dot green line the estimate of the stereo-vision-based method.
Figure 11. The proximity estimation results of the reference-based and stereo-vision-based methods in the rugged-ground scene: (a) proximity estimation results; (b) proximity error. In subfigure (a), the solid blue line represents the actual distance, the dashed red line the estimate of the reference-based method, and the dash-dot green line the estimate of the stereo-vision-based method.
Figure 12. The proximity estimation results of the reference-based and stereo-vision-based methods in the slope scene: (a) proximity estimation results; (b) proximity error. In subfigure (a), the solid blue line represents the actual distance, the dashed red line the estimate of the reference-based method, and the dash-dot green line the estimate of the stereo-vision-based method.
Table 1. The validation results of the object detection model.

Category            Precision   Recall
All                 0.944       0.814
Worker              0.944       0.798
Excavator           0.975       0.819
Excavator cab       0.945       0.821
Excavator crawler   0.913       0.818
Table 2. The results for identification rates in the three scenes.

Category            Flat Ground   Rugged Ground   Slope
Worker              96.50%        96.17%          93.50%
Excavator           99.67%        98.33%          99.67%
Excavator cab       99.67%        97.67%          99.67%
Excavator crawler   99.00%        95.17%          99.17%
Table 3. The confusion matrix of worker safety identification in the flat-ground scene.

                      Reference-Based              Stereo-Vision-Based
Prediction            Actual Safe   Actual Unsafe  Actual Safe   Actual Unsafe
Safe                  419           24             402           19
Unsafe                6             127            23            132
Table 4. The confusion matrix of worker safety identification in the rugged-ground scene.

                      Reference-Based              Stereo-Vision-Based
Prediction            Actual Safe   Actual Unsafe  Actual Safe   Actual Unsafe
Safe                  314           90             374           11
Unsafe                106           44             44            123
Table 5. The confusion matrix of worker safety identification in the slope scene.

                      Reference-Based              Stereo-Vision-Based
Prediction            Actual Safe   Actual Unsafe  Actual Safe   Actual Unsafe
Safe                  408           138            397           21
Unsafe                2             11             11            128
Table 6. The performance metrics of worker safety identification in the flat-ground scene.

Experiment            Type     Precision   Recall    F1-Score   Accuracy
Reference-based       Safe     94.58%      98.59%    96.54%     94.79%
                      Unsafe   95.49%      84.11%    89.44%
                      All      95.04%      91.35%    92.99%
Stereo-vision-based   Safe     95.49%      94.59%    95.04%     92.71%
                      Unsafe   85.16%      87.42%    86.27%
                      All      90.32%      91.00%    90.65%
Table 7. The performance metrics of worker safety identification in the rugged-ground scene.

Experiment            Type     Precision   Recall    F1-Score   Accuracy
Reference-based       Safe     77.72%      74.76%    76.21%     64.62%
                      Unsafe   29.33%      32.84%    30.99%
                      All      53.53%      53.80%    53.60%
Stereo-vision-based   Safe     97.14%      89.47%    93.15%     90.04%
                      Unsafe   73.65%      91.79%    81.73%
                      All      85.40%      90.63%    87.44%
Table 8. The performance metrics of worker safety identification in the slope scene.

Experiment            Type     Precision   Recall    F1-Score   Accuracy
Reference-based       Safe     74.73%      99.51%    85.36%     74.96%
                      Unsafe   84.62%      7.38%     13.58%
                      All      79.67%      53.45%    49.47%
Stereo-vision-based   Safe     94.98%      97.30%    96.13%     94.25%
                      Unsafe   92.09%      85.91%    88.89%
                      All      93.53%      91.60%    92.51%