Article

Leveraging Deep Learning and Internet of Things for Dynamic Construction Site Risk Management

1 Department of Civil Engineering, National Kaohsiung University of Science and Technology, Kaohsiung 807618, Taiwan
2 ASC Digital Technology Co., Ltd., Kaohsiung 814017, Taiwan
* Author to whom correspondence should be addressed.
Buildings 2025, 15(8), 1325; https://doi.org/10.3390/buildings15081325
Submission received: 5 March 2025 / Revised: 30 March 2025 / Accepted: 11 April 2025 / Published: 17 April 2025
(This article belongs to the Special Issue Data Analytics Applications for Architecture and Construction)

Abstract

The construction industry faces persistent occupational health and safety challenges, with numerous risks arising from construction sites’ complex and dynamic nature. Accidents frequently result from inadequate safety distances and poorly managed worker–machine interactions, highlighting the need for advanced safety management solutions. This study develops and validates an innovative hazard warning system that leverages deep learning-based image recognition (YOLOv7) and Internet of Things (IoT) modules to enhance construction site safety. The system achieves a mean average precision (mAP) of 0.922 and an F1 score of 0.88 at a 0.595 confidence threshold, detecting hazards in under 1 s. Integrating IoT-enabled smart wearable devices provides real-time monitoring, delivering instant hazard alerts and personalized safety warnings, even in areas with limited network connectivity. The system employs the DIKW knowledge management framework to extract, transform, and load (ETL) high-quality labeled data and optimize worker and machinery recognition. Robust feature extraction is performed using convolutional neural networks (CNNs) and a fully connected approach for neural network training. Key innovations, such as perspective projection coordinate transformation (PPCT) and the security assessment block module (SABM), further enhance hazard detection and warning generation accuracy and reliability. Validated through extensive on-site experiments, the system demonstrates significant advancements in real-time hazard detection, improving site safety, reducing accident rates, and increasing productivity. The integration of IoT enhances scalability and adaptability, laying the groundwork for future advancements in construction automation and safety management.

1. Introduction

The construction industry faces significant safety and health challenges, with higher accident and fatality rates than other sectors [1,2]. In Taiwan, for example, the construction sector experiences a disproportionately high number of workplace incidents [3], with a fatality rate three times the national average, underscoring the urgent need for improved safety measures [4]. This reflects a global trend where construction safety remains a critical concern despite widespread safety standards and regulations.
Construction sites are dynamic environments where workers, machinery, and materials continuously interact, creating complex safety challenges [5]. In Taiwan, these challenges are intensified by diverse, concurrent activities requiring meticulous coordination and real-time hazard management [6]. Data highlight that most accidents on Taiwanese construction sites result from falls from height and contact with moving machinery—issues prevalent globally [7,8].
Traditional safety management practices, often reactive and reliant on manual surveillance or periodic inspections, are insufficient to address these real-time risks [9]. In Taiwan, this approach fails to leverage the vast image data captured by cameras and CCTV, which are primarily used for quality control or project updates [10,11]. To address this gap, our study implements AI-driven image recognition technology to enhance safety management on construction sites. YOLOv7 enables real-time multi-object detection, identifying workers and machinery to prevent collisions. The system improves safety by detecting vehicle wheels and measuring the shortest distance from their bounding box centers to workers, enabling precise hazard detection. Conducted as a case study in Taiwan, this research demonstrates how AI can reduce workplace accidents and enhance safety measures. A detailed literature review is provided in Section 2, synthesizing prior studies to expand on these challenges.
This article outlines the motivations for using AI in construction safety and reviews existing safety measures and AI applications. It details our deep learning methodology for real-time hazard detection, illustrated with results from a Taiwan case study. The discussion evaluates AI’s impact on safety improvements, and the conclusion explores future advancements in construction safety technology.

2. Literature Review

2.1. Construction Industry Safety

The construction industry faces multifaceted safety management challenges, such as limited resources and insufficient safety awareness, prompting the need for advanced solutions (Figure 1 [12]). To address these issues, inspection units and occupational safety centers conduct rigorous site inspections and enforce safety regulations while emphasizing safety education and training. Harnessing information and communication technology (ICT), particularly artificial intelligence (AI) integration, presents a transformative approach to occupational safety and health (OSH) management in construction. Initiatives like Japan’s Disaster Prevention ICT Application Database Project encourage sharing ICT application case studies to enhance safety.
ICT innovations, including IoT and AI, offer real-time monitoring, hazard detection, and improvement recommendations, augmenting safety and health efficacy, reducing accident risks, and improving construction efficiency and quality [13,14,15].

2.2. Integration of Automated Construction Management

In recent years, deep learning has significantly advanced the field of image recognition. Researchers have applied deep learning in artificial intelligence to solve complex image recognition challenges. Deep convolutional neural networks have surpassed edge detection algorithms in detecting concrete cracks [16]. Computer vision has also progressed in civil infrastructure inspection and monitoring, covering object detection, semantic segmentation, and deep learning approaches [17]. Fang et al. (2020) emphasized the potential of computer vision to enhance construction worker safety behavior [18]. Chou and Liu (2021) created an automated truck recognition system for river dredging using computer vision and deep learning [19]. In 2022, Sha and Boukerche designed a computerized system for detecting pedestrian walking posture using deep learning convolutional neural networks [20]. The same year, Greeshma and Edayadiyil introduced an automated system that tracks construction project progress using machine learning and image recognition [21]. Meanwhile, Yeşilmen and Tatar applied CNN to classify images of concrete aggregates, aiding in monitoring construction production efficiency [22].
Recent deep learning and IoT technology advancements have significantly enhanced construction safety monitoring. Wang et al. (2023) proposed YOLOv7, a real-time object detection method that combines a trainable bag-of-freebies solution and compound scaling to achieve improved speed and accuracy [23]. Similarly, a 2023 study applied YOLOv8 to identify unsafe behaviors among construction workers, such as not wearing safety helmets and reflective vests, demonstrating the model’s effectiveness in improving safety supervision [24]. Furthermore, a 2022 study proposed a BIM-IoT-IC framework for real-time road compaction quality monitoring and optimized construction management [25]. These studies underscore the potential of combining advanced deep learning models with IoT technologies to enhance real-time safety monitoring in construction environments. Despite these advances, the literature reveals a critical gap: existing methods, such as static CCTV-based monitoring [10,11] or post-event analysis via deep learning [16,19], lack real-time distance estimation between workers and moving machinery in dynamic construction environments (see Table 1). This limitation hinders proactive hazard prevention, a challenge our study addresses through an integrated approach.
This study employs commonly used cameras at construction sites and deep learning image recognition technology to identify objects automatically through convolutional neural networks (CNNs). It uses safety assessment block modules to detect potential hazards swiftly and issue warning signals, enabling real-time monitoring of construction site image data. This approach improves occupational safety at construction sites while reducing equipment costs.

3. Real-Time Object Detection Model

This study employs the YOLOv7 model to simultaneously predict multiple bounding boxes and classes, enabling end-to-end target detection and identification (see Figure 2) [34,35]. YOLOv7 was selected for its superior balance of speed and accuracy, which is crucial for real-time construction safety monitoring. At its release in July 2022, YOLOv7 set a new benchmark, achieving a mean average precision (mAP) of 0.923 on multi-object detection tasks (e.g., workers, machinery) while maintaining over 30 FPS on mid-range hardware, outperforming YOLOv5’s 0.895 [25].
YOLOv7’s architecture leverages several key components to optimize detection performance. CBS (convolution, batch normalization, SiLU) is utilized for efficient feature extraction, while efficient layer aggregation networks (ELANs) maintain consistent feature sizes throughout the model. Additionally, extended ELANs (E-ELANs) enhance gradient learning across layers, improving training stability and detection accuracy (see Figure 3a–d). The model’s scalability allows input size and depth adjustments, balancing computational load and processing speed (see Figure 4) [36].
Compared to YOLOv8 (released in January 2023), YOLOv7 requires 20–25% less GPU memory, making it more suitable for integration into resource-constrained IoT devices, such as edge cameras deployed on construction sites for real-time hazard detection. While alternative architectures like Faster R-CNN and SSD offer strong detection capabilities, their higher computational demands make them impractical for real-time processing in IoT-based safety systems. Faster R-CNN relies on region proposal networks, leading to slower inference times, while SSD, though faster, struggles with detecting small objects—a critical limitation in dynamic construction environments. By maintaining high-speed inference while minimizing computational overhead, YOLOv7 ensures efficient and accurate real-time hazard detection, making it the optimal choice for enhancing safety on dynamic construction sites.

3.1. Data Preprocessing

Data preprocessing is a crucial step in data analysis and machine learning, particularly in real-world environments like construction sites, where image data can be incomplete, noisy, or affected by varying weather and lighting conditions. This process focuses on cleaning and transforming raw data to improve model performance and ensure effective integration into object detection and classification tasks. Our data pipeline follows a structured ETL (Extract–Transform–Load) process, which involves collecting images of various object categories from construction sites, extracting feature labels, converting formats, and loading data into models (see Figure 5). This approach aligns with the DIKW (data, information, knowledge, and wisdom) knowledge management hierarchy, which structures raw data into a progression of meaningful insights. By leveraging DIKW, the transformed and organized data facilitate the development of intelligent models for construction site safety and real-time hazard detection, enhancing decision making and proactive risk management (see Figure 6).
(1)
Image Collection
This study employs image classification for multi-class processing in model training and testing. The dataset is categorized into distinct classes, including workers, excavators, cranes, concrete mixer trucks, and dump trucks, forming a comprehensive collection of construction site images. To create a robust and representative dataset, we sourced images from three primary channels:
  • Publicly Available Datasets: The COCO dataset provides diverse human postures, enhancing worker recognition in construction environments. The ACID dataset (Architectural Construction Image Dataset) contains labeled construction machinery images, including excavators, cranes, dump trucks, and concrete mixer trucks.
  • Real-World Construction Site Images: Our research team captured images from active construction sites to ensure the dataset accurately reflects real-world conditions. These images account for weather, lighting, and background complexity variations, enhancing model adaptability.
  • Creative Commons (CC) Licensed Images: Additional images were obtained through keyword-based searches (e.g., “Construction site images”, “Construction machinery images”, and “Building equipment images”). Only images with CC licenses were selected to ensure compliance with copyright regulations.
Each image category was assigned to specific object classes to improve object detection accuracy. Workers were primarily sourced from the COCO dataset, which provides a wide range of human postures applicable to construction-like scenarios. Excavators, dump trucks, cranes, and concrete mixer trucks were mainly derived from the ACID dataset and supplemented with selected real-world construction site images to improve robustness in diverse construction environments. Furthermore, construction site photos from ongoing projects further supplement worker and machinery categories. For example, Figure 7 illustrates an image captured by our research team at an active construction site, reinforcing the dataset’s authenticity and applicability to real-world scenarios.
(2)
Extract–Transform–Load (ETL)
Convert image annotations to YOLO TXT format for YOLOv7 compatibility and resize the images to 320 × 320 pixels, a multiple of 32 as required by YOLO’s grid-based architecture. This resolution balances detection accuracy and real-time processing speed, which is critical for on-site hazard detection, enabling YOLOv7 to achieve over 30 FPS on mid-range hardware (as noted earlier in this section) for timely safety alerts. Higher resolutions like 640 × 640 could enhance accuracy for small objects (e.g., increasing mAP by 1–2%) and improve distance estimation in homography transformation with better feature points. However, they would increase inference time by 3–4 times, risking alert delays. A structured Extract–Transform–Load (ETL) process is essential for data integration and model application, ensuring high-quality input for object detection. The ETL workflow consists of three primary stages, as shown in Figure 5. The steps include the following:
  • Dataset Compilation: images of construction workers, excavators, dump trucks, cranes, and concrete mixer trucks were collected from multiple sources and captured from various angles and lighting conditions to enhance adaptability to real-world construction environments.
  • Data Preprocessing (ETL):
    • Validate: confirm that dataset images accurately represent construction workers and machinery in real construction settings.
    • Clean: remove blurry, obstructed, or cluttered background images to maintain high-quality training data.
    • Transform: convert image annotations to YOLO TXT format for YOLOv7 compatibility.
    • Aggregate: use LabelImg to extract and label object features (e.g., an excavator’s bucket, a worker’s helmet), with YOLOv7 effectively detecting objects like workers at 15 pixels, a threshold our targets typically exceed at 320 × 320 resolution.
    • Load: Structure the dataset with images, bounding boxes, and spatial coordinates. YOLOv7 was trained with a 0.01 learning rate, SGD (0.937 momentum), batch size of 12, and 300 epochs (see Section 4 for details).
  • Data Warehouse: The processed dataset is stored in a digital warehouse, enabling real-time integration with the IoT-based hazard warning system. This structured dataset facilitates image recognition, construction safety alerts, and statistical analysis for predictive risk assessment.
Image labeling involves marking the location and size of objects in each image, as depicted in Figure 8. The bounding box annotation process defines object detection areas, where $b_w$ and $b_h$ represent the width and height of the bounding box, respectively. This structured approach ensures that YOLOv7 can efficiently identify workers and machinery, improving the accuracy of real-time safety monitoring in dynamic construction environments.
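To make the Transform, Aggregate, and Load steps above concrete, the following minimal Python sketch converts a pixel-space bounding box into the normalized "class x_center y_center width height" line used by YOLO TXT labels, alongside a 320 × 320 resize. The file names and class index map are illustrative assumptions, not the exact tooling used in this study; because the label coordinates are normalized to the image dimensions, they remain valid after resizing.

```python
import cv2  # OpenCV, used here for reading and resizing images

# Hypothetical class index map; the study uses these five classes.
CLASSES = {"worker": 0, "excavator": 1, "crane": 2, "concrete_mixer": 3, "dump_truck": 4}

def to_yolo_label(box_xyxy, img_w, img_h, class_name):
    """Convert a pixel-space box (x1, y1, x2, y2) into a YOLO TXT line:
    'class x_center y_center width height', all normalized to [0, 1]."""
    x1, y1, x2, y2 = box_xyxy
    xc = (x1 + x2) / 2.0 / img_w
    yc = (y1 + y2) / 2.0 / img_h
    bw = (x2 - x1) / img_w
    bh = (y2 - y1) / img_h
    return f"{CLASSES[class_name]} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}"

# Resize a site image to 320 x 320 (a multiple of 32, as YOLO requires)
img = cv2.imread("site_image.jpg")                 # hypothetical file name
h, w = img.shape[:2]
cv2.imwrite("site_image_320.jpg", cv2.resize(img, (320, 320)))

# A worker annotated at (850, 400)-(910, 560) in the original image
print(to_yolo_label((850, 400, 910, 560), w, h, "worker"))
```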

3.2. Heterogeneity Analysis in Construction Safety Systems

Construction sites exhibit significant heterogeneity, with variations in layout, worker density, environmental conditions, and machinery presence. This variability affects hazard detection accuracy and requires systematic adaptation strategies. A combination of dataset expansion, data augmentation, and site-specific adaptation techniques is implemented to ensure the generalizability of this study’s deep learning-based hazard warning system.
The research incorporates data from multiple construction site types to enhance model robustness, ensuring a diverse representation of environments. The dataset includes small-scale residential sites, large-scale commercial projects, and infrastructure works such as bridges, tunnels, and highways. Additionally, geographical diversity is considered, with data collected from urban, suburban, and rural areas and sites operating under different climatic conditions, including fog, rain, snow, and extreme heat. These variations are essential for ensuring that the deep learning model can generalize across different environments without bias toward a specific construction site type [36].
To further enhance the adaptability of the proposed system, data augmentation techniques are applied to simulate real-world variability. This includes random rotations, brightness adjustments, occlusion handling, and shadow simulations to mimic changing light conditions throughout the day. Such techniques are effective even in structural safety applications where datasets are limited [37]. Additionally, contrast enhancement improves object detection performance in low-visibility conditions, such as nighttime work environments. These augmentation methods help to create a more diverse dataset, reducing overfitting and improving the model’s ability to detect hazards in complex construction scenes [38].
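As a rough illustration of this augmentation strategy, the sketch below applies a rotation and a brightness/contrast adjustment with OpenCV and NumPy. The parameter ranges and file names are assumptions for the example; the study's actual augmentation pipeline is not reproduced here.

```python
import cv2
import numpy as np

def augment(img, angle=10.0, brightness=30.0, contrast=1.2):
    """Rotate about the image center, then shift brightness and stretch contrast,
    simulating changed viewpoints and lighting on construction sites."""
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    out = cv2.warpAffine(img, M, (w, h), borderMode=cv2.BORDER_REFLECT)
    out = np.clip(out.astype(np.float32) * contrast + brightness, 0, 255)
    return out.astype(np.uint8)

img = cv2.imread("site_image.jpg")  # hypothetical file name
aug = augment(img,
              angle=np.random.uniform(-15, 15),
              brightness=np.random.uniform(-40, 40),
              contrast=np.random.uniform(0.8, 1.3))
cv2.imwrite("site_image_aug.jpg", aug)
```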
In addition to dataset expansion and augmentation, this approach considers site-specific factors that impact model performance, such as scaffolding density, terrain slope, and underground workspaces, which can affect worker visibility and machinery detection. To address these challenges, domain adaptation and transfer learning techniques are integrated, allowing the model to fine-tune its detection capabilities based on specific construction environments [39]. Moreover, feature recalibration techniques are implemented to consistently detect key safety elements, such as helmets, safety vests, and heavy machinery, across different background settings. These heterogeneity-aware adaptations enhance hazard recognition accuracy and enable real-time safety monitoring across various construction scenarios [40].

3.3. YOLO Object Detection Method

The YOLO detection method divides images into grid cells and predicts two bounding boxes per cell. IoU (intersection over union) assesses the overlap between detected and actual object boxes, with values close to one indicating high accuracy. Each bounding box contains five values (x, y, w, h, and confidence): x and y denote the object’s center in the grid cell, w and h represent its width and height, and confidence measures the likelihood of its presence. A higher confidence score suggests a higher probability of an object within the grid cell. The final results are refined using IoU thresholds and non-maximum suppression (NMS) (see Figure 9).
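The sketch below illustrates the IoU computation and a greedy non-maximum suppression pass described above. It is a generic textbook formulation, not YOLOv7's internal implementation.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    xa1, ya1, xa2, ya2 = box_a
    xb1, yb1, xb2, yb2 = box_b
    inter_w = max(0.0, min(xa2, xb2) - max(xa1, xb1))
    inter_h = max(0.0, min(ya2, yb2) - max(ya1, yb1))
    inter = inter_w * inter_h
    union = (xa2 - xa1) * (ya2 - ya1) + (xb2 - xb1) * (yb2 - yb1) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.45):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it
    above the IoU threshold, then repeat on the remainder."""
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order) > 0:
        i = order[0]
        keep.append(int(i))
        order = np.array([j for j in order[1:] if iou(boxes[i], boxes[j]) < iou_thresh])
    return keep

boxes = [(100, 100, 200, 200), (110, 105, 205, 195), (400, 300, 480, 380)]
scores = [0.9, 0.75, 0.8]
print(nms(boxes, scores))  # keeps boxes 0 and 2; box 1 overlaps box 0 too strongly
```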

4. Experiment and Results

This study utilized a dataset of 8663 labeled construction site images to develop the YOLO-based image recognition system for construction safety. The dataset, with instance counts and performance metrics detailed in Table 2, was randomly split into training (6236 images, 70%), validation (1560 images, 20%), and test sets (867 images, 10%). A pie chart in Figure 10 illustrates the proportional distribution of detected instances across the five target object classes (workers, excavators, cranes, concrete mixers, dump trucks). The training process is optimized for object detection in built environments.
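A minimal sketch of such a random split is shown below. The directory layout and plain-text split lists are assumptions (YOLO-style training pipelines commonly accept per-split text files of image paths); the authors' actual split tooling is not described, and the fixed seed only serves reproducibility of the example.

```python
import random
from pathlib import Path

random.seed(42)  # fixed seed so the example split is reproducible
images = sorted(Path("dataset/images").glob("*.jpg"))  # hypothetical directory layout
random.shuffle(images)

n = len(images)
n_train, n_val = int(0.7 * n), int(0.2 * n)
splits = {
    "train": images[:n_train],
    "val": images[n_train:n_train + n_val],
    "test": images[n_train + n_val:],
}
for name, files in splits.items():
    # One image path per line, referenced from the dataset configuration file
    Path(f"{name}.txt").write_text("\n".join(str(p) for p in files))
```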
During the detection process, images showing the number of construction workers on site, data sources, object categories, coordinate positions, and execution time were integrated with the construction project log dataset (Table 3). Figure 11 shows the test results of the YOLO image detection model, with coordinates automatically converted to the center point of objects for a more precise representation of relative position.
YOLO evaluation metrics are calculated through confusion matrix parameters, resulting in mean average precision (mAP) and average precision (AP). mAP represents the average AP across all object classes, while AP is the area under the precision–recall (PR) curve, formed by plotting precision on the y-axis and recall on the x-axis.
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

$$AP = \frac{\sum_{k=1}^{n} P(k) \times \mathrm{rel}(k)}{\text{number of relevant documents}}$$

$$mAP = \frac{1}{n} \sum_{i=1}^{n} AP_i$$
Precision is the proportion of correctly classified targets among all detected targets, while recall is the proportion of correctly identified targets among all actual targets. TP, FP, FN, and TN are confusion matrix parameters. AP is the area under the PR curve, with precision and recall as axes ranging from 0 to 1. P(k) denotes the precision at rank k, and rel(k) is an indicator that equals 1 when the detection at rank k is a true positive and 0 otherwise. mAP is the average AP across all object classes, indicating object detection accuracy, ranging from 0 to 1. Higher precision suggests greater model accuracy and reliability, while higher recall indicates better accuracy and comprehensiveness in capturing target objects. The changes in these values during training are shown in Figure 12.
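For reference, the sketch below computes precision, recall, and F1 from confusion-matrix counts and approximates AP as the area under a precision–recall curve. It mirrors the definitions above rather than the exact evaluation script used to produce Table 2; the example counts are placeholders.

```python
import numpy as np

def precision_recall_f1(tp, fp, fn):
    """Confusion-matrix metrics for a single class."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) > 0 else 0.0)
    return precision, recall, f1

def average_precision(precisions, recalls):
    """Area under the precision-recall curve via trapezoidal integration;
    recalls must be sorted in ascending order."""
    return float(np.trapz(precisions, recalls))

# Example: 88 true positives, 10 false positives, 14 false negatives
print(precision_recall_f1(88, 10, 14))
```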
The YOLOv7 model’s training results were evaluated by testing multiple IoU thresholds to assess their impact on precision, recall, and F1 score. Lower IoU thresholds increase recall, capturing more potential hazards but also introducing false positives. Conversely, higher IoU thresholds enhance precision but may miss some objects, leading to false negatives. This trade-off is particularly critical in construction safety applications, where accurately detecting the proximity of workers to machinery is essential for minimizing missed hazards and false alarms. By systematically analyzing the performance across different IoU thresholds, the IoU threshold that produced the highest F1 score was selected to ensure a balanced and robust detection system. This process helped mitigate potential biases by accounting for varying degrees of object overlap, which is critical for effective hazard identification in real-world construction environments.
The YOLOv7 model was trained with a learning rate of 0.01, which ensured stable loss convergence in fewer epochs on our construction dataset, aligning with YOLOv7’s default setting. SGD with momentum (0.937, also the default) was chosen over alternatives like Adam because it generalized better to diverse construction site images, reducing overfitting and improving mAP by approximately 2–3% compared to Adam, which tended to overfit due to its adaptive learning rate. A batch size of 12 was used, deviating from the default of 16, to balance memory usage and gradient stability on our hardware; training ran for 300 epochs, in line with YOLOv7’s typical setting. Additional parameters included a weight decay of 0.0005 (default) and no learning rate schedule, differing from the default cosine annealing to maintain training simplicity, as our focus was on accurate distance estimation for real-time hazard detection rather than extensive network optimization.
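A minimal PyTorch sketch of this optimizer configuration is given below. The stand-in module replaces the actual YOLOv7 network, which in practice is built from the official repository, so this illustrates only the stated hyperparameters, not the training script itself.

```python
import torch

# Stand-in module so the sketch runs; in practice this is the YOLOv7 network
# loaded from the official repository with pre-trained weights.
model = torch.nn.Conv2d(3, 16, 3)

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.01,            # learning rate used in this study (YOLOv7 default)
    momentum=0.937,     # SGD momentum (YOLOv7 default)
    weight_decay=0.0005,
)
BATCH_SIZE = 12         # reduced from the default 16 to fit available GPU memory
EPOCHS = 300            # standard YOLOv7 training schedule
# No learning-rate scheduler is attached, matching the constant-rate setup described above.
```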
To further evaluate the model’s performance, the results were measured using two key metrics: mAP@0.5 and mAP@0.5:0.95. The mAP@0.5 calculates the mean average precision (mAP) for predictions with an IoU greater than 0.5, offering an interpretable measure of model accuracy at a relatively lenient threshold. This metric is particularly useful for monitoring safety on construction sites, such as ensuring safe distances between workers and machinery to reduce the risk of injuries. Meanwhile, mAP@0.5:0.95 provides a more rigorous evaluation by averaging the mAP over IoU thresholds ranging from 0.5 to 0.95 in 0.05 increments, ensuring robust model performance across different conditions, as depicted in Figure 13. This dual-metric approach aligns with industry standards and addresses the critical need for precise hazard detection in real-time applications, where minimizing false positives and false negatives is crucial for ensuring worker safety.
This study aims to ensure a sufficient sample size for machinery and worker categories, maintaining a realistic worker-to-machinery ratio of two–three workers per machine to mirror actual construction site conditions. With 13,900 worker instances across 8663 images, this dataset supports reliable safety risk assessments, focusing on precise worker detection to improve construction site safety protocols.
To evaluate sample size adequacy and its effect on model reliability, three dataset subsets (33%, 67%, and 100%) were trained under identical conditions: using the same pre-trained weights, 300 epochs, input size of 320 × 320 pixels, and a learning rate of 0.01 with the SGD optimizer. This uniform setup allows a precise analysis of the impact of sample size on model performance. The results are summarized in Table 2.
The table illustrates how varying sample sizes affect performance metrics such as average precision (AP) and F1 scores across different object categories. To complement this, Figure 10 presents a pie chart illustrating the proportional distribution of detected instances across these classes, based on the 8663 images in our dataset, offering a clear visual representation of their relative frequencies. Key observations include the following:
  • Worker: the AP for workers slightly increases from 0.856 at 33% sample size to 0.883 at 100%, indicating that the model maintains reliable detection even with fewer samples due to the high variability captured in this category.
  • Crane: the AP for cranes improves from 0.835 to 0.915 as the sample size increases, showing that additional data enhance the model’s ability to detect less frequent classes.
  • Excavators and Concrete Mixer Truck: these categories maintain high AP scores even at lower sample sizes, suggesting that distinctive features aid their detection.
  • Dump Truck: strong performance is demonstrated relative to sample size, maintaining stability with mid-sized datasets and exhibiting improved detection capabilities as the training sample size increases.
This larger dataset enhances the model’s adaptability across diverse construction environments, improving generalization and reducing the risk of overfitting. Furthermore, comparisons with datasets from similar studies confirm its adequacy. For instance, Dwivedi et al. (2022) utilized 339,000 worker images to achieve high accuracy in detection [41]. Similarly, Shetye et al. (2023) trained their model on a dataset of 3500 images, highlighting the variability in sample sizes across related research [42]. The dataset, with 13,900 instances of workers, falls within this range and is sufficient to train deep learning models effectively.
Moreover, the model’s practical application covers an area of 52 square meters within the camera’s field of view, demonstrating strong performance in detecting workers and machinery on construction sites. As shown in Figure 14, the model achieves an overall F1 score of 0.88 at a confidence level of 0.595. For machinery, the F1 score ranges from 0.87 to 0.92, while it consistently remains above 0.83 for workers.
Figure 15 illustrates the precision curves for all classes, which approach 1.00 at high confidence levels, reaching 1.00 at a confidence level of 0.979. Figure 16 displays precision–recall curves, further validating the model’s effectiveness, with a mean average precision (mAP) of 0.922 at 0.5 IoU across all classes. The AP values remain consistently high, ranging from 0.883 for workers to 0.964 for concrete mixer trucks, underscoring the model’s ability and effectiveness in construction site object detection. These results are comparable to or exceed those reported in similar studies.
To address potential sampling bias and ensure that the dataset accurately reflects the variability of actual construction site conditions, data were collected from multiple construction sites under varying environmental conditions, such as lighting (daylight, nighttime) and weather (sunny, rainy), as demonstrated by Shanti et al. (2021) [43]. Additionally, data augmentation techniques were applied, including random rotations, scaling, flipping, brightness and contrast adjustments, and color jittering, to artificially increase the diversity of the dataset and enhance model generalization. Furthermore, the model was continuously validated across different environments to ensure consistent performance, reducing the likelihood of bias toward specific conditions. These measures ensure that the dataset is representative of real-world construction site conditions, ultimately enhancing the reliability of the hazard warning system.

5. System Application and Testing

5.1. System Architecture

This study employs the deep learning algorithm “You Only Look Once” (YOLO) for object detection, evaluating identified objects against established safety standards. The system integrates Internet of Things (IoT) modules to enhance functionality, facilitating seamless communication between the hazard detection system and workers’ smart wearable devices. These IoT modules act as intermediaries, connecting the YOLOv7 detection framework with devices such as smart helmets, which provide personalized safety warnings. Moreover, the IoT components ensure reliable alert delivery even in areas with limited network connectivity, establishing a robust real-time safety management ecosystem. The integration of IoT devices within the system workflow is illustrated in Figure 17. During the detection process, the system converts camera pixel dimensions to actual size, and when the distance between workers and equipment falls below the set safety distance, it converts back to pixel dimensions to highlight and mark the workers in the warning area. The system sends alerts to the project manager’s phone for real-time on-site operations management to prevent hazardous incidents, as illustrated in Figure 18.

5.2. Perspective Projection Coordinate Transformation

Computer vision research has often used affine transformations, which combine linear transformations and translations to convert one 2D coordinate system into another using matrix operations, represented as follows:
$$\begin{bmatrix} x_2 \\ y_2 \\ 1 \end{bmatrix} = \begin{bmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix}$$

Here, $\begin{bmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{bmatrix}$ represents the linear transformation, while $\begin{bmatrix} m_{13} \\ m_{23} \end{bmatrix}$ is the translation vector. Calculating the matrix $\begin{bmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \end{bmatrix}$ requires six independent linear equations, obtained from three non-collinear point correspondences. Affine transformations maintain the relative positions of 2D shapes, preserving parallel lines and the sequence of points on straight lines, though angles may shift. Figure 19 illustrates the concept of affine transformation.
This study uses projection transformation of construction site image data to convert a 3D coordinate system into a 2D aerial view. Unlike affine transformations, projection transformation aligns with homography theory. Homography is a reversible transformation mapping one plane coordinate system onto another. Using a 3 × 3 matrix H, a point x on plane A can be mapped to a corresponding point x’ on plane B. The mathematical representation of homography is as follows:
$$\begin{bmatrix} x_2 \\ y_2 \\ 1 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix}$$

Here, $\begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix}$ is regarded as a linear transformation, such as scaling or rotation, while $\begin{bmatrix} h_{13} \\ h_{23} \end{bmatrix}$ represents the translation vector. The elements $h_{31}$ and $h_{32}$ introduce the perspective component of the transformation, and $h_{33}$ is typically set to 1.
The homography transformation matrix has eight degrees of freedom despite containing nine elements, as the matrix can be multiplied by any non-zero constant without affecting the transformation. The homography transformation matrix can be calculated by providing four sets of corresponding pixel coordinates on two planes, establishing the projection transformation between two coordinate systems. Figure 20 shows the concept of homography transformation.
OpenCV is a toolkit widely used in image processing and computer vision research. The library includes two functions for calculating homography matrices: “getPerspectiveTransform()” computes the 3 × 3 matrix H from exactly four point correspondences, while “findHomography()” estimates the optimal matrix from four or more point pairs, using optimization algorithms to find the best solution.
Homography transformations are applied to calculate the relative position between the camera and the target [44]. SURF is used for feature matching to detect ground targets and compute the homography matrix between the target and camera coordinate systems. Feature point matching may be affected by image resolution and other factors.
In the case study, a 2-megapixel camera is installed 4.5 m above the construction site entrance, angled downward at 30 degrees, to provide a comprehensive aerial view of construction activities. This setup uses a grid of 50 × 50 cm squares as a real-world coordinate system, allowing the conversion of pixel coordinates to real-world coordinates through a homography transformation matrix. This enables the system to estimate the safety distance between workers and machinery accurately. Figure 21 illustrates the data transformation matrix used for these calculations, enhancing object detection accuracy.
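The following sketch illustrates this pixel-to-ground-plane conversion with OpenCV's findHomography. The four calibration correspondences are hypothetical values standing in for the measured 50 × 50 cm grid corners; in practice the correspondences come from the on-site calibration described above.

```python
import cv2
import numpy as np

# Four hypothetical pixel locations of known grid corners in the camera image...
pixel_pts = np.array([[412, 310], [645, 318], [430, 520], [668, 531]], dtype=np.float32)
# ...and their ground-plane coordinates in centimetres (50 x 50 cm grid).
world_pts = np.array([[0, 0], [50, 0], [0, 50], [50, 50]], dtype=np.float32)

H, _ = cv2.findHomography(pixel_pts, world_pts)  # 3x3 homography matrix

def pixel_to_world(pt, H):
    """Map a pixel coordinate to ground-plane coordinates (cm) via homography."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return x / w, y / w

print(pixel_to_world((540, 420), H))  # e.g., the bottom-center of a detected bounding box
```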

5.3. Security Assessment Block Module

The camera is integrated with the YOLOv7 model to capture and process real-time video footage, enabling continuous monitoring and analysis of the construction environment for improved safety management.
Ensuring a safe distance between machinery and workers on construction sites is crucial. This study aims to develop a safety distance monitoring module using image recognition data, integrating multi-object detection with the YOLOv7 model to enable simultaneous tracking of workers and machinery. Cameras input images into a neural network for object detection, identifying object types and locations. The system calculates the distance between workers and construction vehicles based on their bounding box coordinates and compares it to a predefined safety threshold. If the distance falls below the threshold, a LINE message with a screenshot and details is sent to the project manager’s phone.
To enhance real-time safety monitoring, the system independently evaluates multiple hazardous events. For example, when construction vehicles enter the site, the system continuously tracks the safety distance between workers and machinery, with a threshold set at 6 m. Video frames are captured and processed by YOLOv7 to detect objects such as workers and vehicle components like wheels (wheel bounding boxes are hidden in the displayed output to keep the view uncluttered). The bounding box coordinates are then mapped to real-world distances using homography and calibrated camera parameters. To improve accuracy, the system calculates the shortest distance between the centers of the tire and worker bounding boxes. If this distance falls below the safety threshold, alerts are triggered via the IoT module. Figure 22 illustrates this process, where a worker’s proximity to a vehicle tire triggers a “WARNING!!!” due to the close distance, indicating that a mechanical engineering vehicle is detected near the worker.
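A simplified sketch of this safety-distance check is shown below. The homography matrix, detection boxes, and alert path are placeholders; in the deployed system they come from the calibration in Section 5.2, the YOLOv7 detections, and the LINE/IoT modules described in Section 5.5, respectively.

```python
import math
import numpy as np

SAFETY_THRESHOLD_CM = 600   # 6 m safety distance used in this study
H = np.eye(3)               # stand-in; in practice, the calibrated homography from Section 5.2

def pixel_to_world(pt, H):
    """Map a pixel coordinate to ground-plane coordinates (cm) via homography."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return x / w, y / w

def box_center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def shortest_worker_wheel_distance(worker_boxes, wheel_boxes, H):
    """Shortest center-to-center distance (cm) between worker and wheel boxes
    after mapping their pixel coordinates to the ground plane."""
    best = math.inf
    for wb in worker_boxes:
        wx, wy = pixel_to_world(box_center(wb), H)
        for vb in wheel_boxes:
            vx, vy = pixel_to_world(box_center(vb), H)
            best = min(best, math.hypot(wx - vx, wy - vy))
    return best

# Hypothetical detections (pixel boxes) for one video frame
d = shortest_worker_wheel_distance([(300, 200, 360, 380)], [(420, 330, 520, 430)], H)
if d < SAFETY_THRESHOLD_CM:
    # Placeholder for the LINE / IoT alert path described in Section 5.5
    print(f"WARNING!!! Worker within {d:.0f} cm of machinery")
```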
Given the complexity of construction sites, it is essential to recognize multi-risk scenarios where multiple hazards coexist. Workers may simultaneously face risks from overlapping machinery operations, multiple hazard sources, or blind spots. While the current study primarily addresses distance-based hazard detection, future research will focus on developing a risk accumulation index to enhance risk prioritization and alarm escalation strategies, as discussed in Section 6: Conclusions.
Table 4 illustrates an example where an engineering vehicle enters the construction site, and the algorithm calculates the shortest distance between a crane and a worker as being 575.69 cm. Since the predefined safety threshold in this study is 600 cm, the system immediately issues a red warning, notifying site managers to take prompt action and mitigate potential safety risks.

5.4. Distance Estimation Accuracy

To validate the accuracy of the system’s distance estimation in real-world settings, we conducted a controlled on-site experiment using a subset of 50 images extracted from the construction site video shown in Figure 22. We captured every 10th frame to ensure a representative sample of scenarios with clear worker–machine pairs (e.g., a worker and a truck’s wheel, as shown in Figure 23). In a controlled construction site setting, we manually measured the actual distances between the worker and the machine at 1 m, 2 m, and 3 m intervals using a laser rangefinder. The system’s estimated distances were calculated using the bounding box center points and the homography transformation described in Section 5.2. We then computed the absolute error (the absolute difference between the actual and estimated distances) for each measurement, calculating the mean absolute error (MAE) and standard deviation (SD) for each distance interval. The results are summarized in Table 5.
The results show that the system achieves an MAE of 0.08 m at 1 m, 0.12 m at 2 m, and 0.25 m at 3 m. The error increases at longer distances due to the system’s reliance on a 2-megapixel camera with an input resolution of 320 × 320 pixels, which limits the precision of feature detection (e.g., the bottom of a tire) and homography transformation, particularly when perspective distortion or occlusion occurs. Influencing factors such as camera angle, lighting conditions, and partial occlusion of objects also contribute to variability, as evidenced by the higher SD at 3 m. These findings demonstrate the system’s real-world reliability while highlighting the need for future improvements, such as integrating sensor fusion techniques (e.g., LiDAR or depth cameras) to enhance accuracy at longer distances and under challenging conditions.
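For completeness, the error statistics can be computed as in the short sketch below; the paired measurements shown are placeholder values, not the data behind Table 5.

```python
import numpy as np

# Hypothetical paired measurements (metres): laser-rangefinder ground truth
# versus system estimates at the 1 m test interval.
actual = np.array([1.00, 1.00, 1.00, 1.00, 1.00])
estimated = np.array([0.93, 1.06, 1.10, 0.95, 1.04])

errors = np.abs(actual - estimated)
mae = errors.mean()          # mean absolute error
sd = errors.std(ddof=1)      # sample standard deviation of the absolute errors
print(f"MAE = {mae:.2f} m, SD = {sd:.2f} m")
```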

5.5. Real-Time IoT Alert System

In the era of smartphone ubiquity, Line is widely used for communication in Taiwan, making warning messages sent via Line more immediate than website or cloud-based alerts. Line Notify is an official Line service that connects users with specific channels and instantly pushes notifications when messages or events occur; it functions as a lightweight bot that uses the Line API and issued access tokens to send notifications to groups or individuals (Figure 24).
Building upon the use of Line Notify for energy control, as demonstrated by Arunyagool et al. (2021), this study leverages Line Notify as a hazard alert tool for construction sites [45]. Configured for a network recognition speed of 7 fps or higher, with the Line API limiting alerts to once every 5 s for optimal balance between speed and reliability, this system achieves an approximate 1 s latency between threat detection and text message delivery. IoT modules are integrated to deliver warnings to workers’ smart wearable devices, such as smart helmets and smartwatches, to enhance the warning mechanism further and broaden alert dissemination. These devices provide immediate haptic or visual notifications upon hazard detection. For instance, if a worker enters an unsafe proximity to heavy machinery, the IoT system can trigger a haptic or auditory warning on the worker’s wearable and notify the site manager via a mobile application, ensuring rapid hazard awareness and timely interventions to improve overall construction site safety.
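A sketch of such a rate-limited alert sender is shown below. The endpoint and parameter names follow the public Line Notify HTTP API, but the token, message text, and screenshot path are placeholders, and the 5 s cooldown mirrors the interval noted above; this is an illustration under those assumptions, not the deployed alert service.

```python
import time
import requests

LINE_NOTIFY_TOKEN = "YOUR_TOKEN"   # access token issued for the safety group / channel
ALERT_COOLDOWN_S = 5               # matches the 5 s alert interval noted above
_last_sent = 0.0

def send_line_alert(message, image_path=None):
    """Send a hazard alert through the Line Notify HTTP API, rate-limited so
    repeated detections of the same event are not re-sent within the cooldown."""
    global _last_sent
    now = time.time()
    if now - _last_sent < ALERT_COOLDOWN_S:
        return False
    headers = {"Authorization": f"Bearer {LINE_NOTIFY_TOKEN}"}
    files = {"imageFile": open(image_path, "rb")} if image_path else None
    resp = requests.post("https://notify-api.line.me/api/notify",
                         headers=headers, data={"message": message},
                         files=files, timeout=5)
    _last_sent = now
    return resp.status_code == 200

send_line_alert("WARNING: worker within 6 m of moving machinery",
                image_path="warning_frame.jpg")  # hypothetical screenshot path
```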
Integrating ThingSpeak with Line Notify connects construction site managers with a designated safety-focused Line group. When there are updates or changes in the safety zone assessment, Line Notify sends instant notifications to the group via Line. This ensures project safety managers in the group receive timely information and hazard alerts, allowing them to take immediate action to secure the construction site, as shown in Figure 25.

6. Conclusions

This study highlights the urgent need for improved safety management in the dynamic and complex environments of construction sites, emphasizing the limitations of traditional monitoring methods, which often lack real-time responsiveness and predictive capabilities. This recognition led to exploring artificial intelligence and deep learning as potential solutions.
The research introduced a real-time hazard warning system using the YOLO deep learning model, specifically designed for construction sites. This system efficiently identifies five key target objects (workers, excavators, cranes, concrete mixers, and dump trucks) while simultaneously collecting site data and monitoring safety zones. The system incorporates essential features such as object detection, perspective projection coordinate transformation, safety assessments, and hazard alerts. Case testing and analysis have confirmed the system’s feasibility, and it can be customized to set safe distances between machinery and workers based on specific construction operations.
The case study conducted in Taiwan demonstrated that the AI-driven system significantly enhances hazard detection and response times. It effectively identified potential risks, such as unsafe distances between workers and machinery, and issued timely warnings to prevent accidents. This confirms the system’s practical utility and effectiveness in real-world construction environments. To further enhance its adaptability, we will incorporate localized data from Taiwanese construction sites, using automated labeling tools to improve data annotation and refine the model’s accuracy. This will ensure the system aligns with specific safety standards and can be tailored to local conditions.
Enhancing construction site safety through real-time hazard detection demonstrates substantial potential, though certain limitations remain. The system’s effectiveness heavily relies on the quality and quantity of input image data, which may lead to occasional false positives or overlooked hazards. Additionally, while effective for real-time notifications, the IoT alert system currently lacks a mechanism to regulate alerts, resulting in multiple notifications for the same event due to the Line API’s 5 s interval restriction. To address this, future enhancements will include a notification cooldown mechanism to batch multiple triggers for the same hazard event within a 30 s interval into a single consolidated alert (e.g., “Worker in the hazardous zone for 30 s”), ensuring timely warnings while minimizing information overload for site managers. Integrating IoT elements presents a promising avenue to address these challenges by enhancing scalability, real-time responsiveness, and overall effectiveness. Future research should focus on leveraging IoT devices for predictive analytics based on real-time sensor data, enabling proactive accident prevention. Furthermore, developing customizable IoT platforms for centralized data management can provide site managers with comprehensive insights into safety conditions and trends. Enhancing data quality, ensuring secure data exchange, and standardizing warning messages will streamline information transmission and expand the system’s applicability, enabling seamless adaptation to varying site conditions and supporting diverse construction scenarios.
Integrating deep learning and IoT technologies provides a robust framework for enhancing construction site safety, reducing accident rates, and improving operational efficiency. Our system achieves an mAP of 0.922 and an F1 score of 0.88, demonstrating strong performance on a dataset of 8663 construction site images. In comparison, Redmon and Farhadi (2018) demonstrated the effectiveness of YOLOv3 on general object detection tasks [46], while Feng et al. (2024) achieved an mAP of 0.84 in construction safety monitoring [47]. Our approach outperforms these benchmarks, particularly in real-time applications, due to its integration of IoT for instant alerts and its focus on dynamic distance estimation between workers and machinery, addressing the unique challenges of construction environments.
Additionally, construction site safety management must account for situations where multiple risks coexist. In future work, we propose the development of a risk accumulation index to enhance hazard assessment. This index will (1) assign severity scores based on violation type and frequency, (2) escalate alarms when minor infractions accumulate into critical safety concerns, and (3) prioritize alerts to help site managers focus on the most urgent risks. Furthermore, to further improve system accuracy and reliability, we plan to implement sensor fusion techniques, integrating LiDAR or depth cameras with vision-based object detection to compensate for depth perception errors. Adaptive image enhancement methods, such as contrast adjustments for low-light conditions, will also be incorporated to enhance detection stability. Integrating these advancements will provide more comprehensive safety intervention measures and proactive hazard prevention strategies.
This research provides a scalable and effective solution for real-time monitoring and risk management in construction safety by integrating AI technologies. This innovative approach proactively addresses safety concerns, reducing the incidence of accidents and improving overall site safety.

Author Contributions

Conceptualization, L.-W.L. and Y.-R.W.; methodology, L.-W.L. and Y.-R.W.; software, L.-W.L. and Y.-S.C.; validation, L.-W.L. and Y.-S.C.; formal analysis, L.-W.L.; investigation, L.-W.L.; resources, L.-W.L. and Y.-R.W.; data curation, L.-W.L. and Y.-R.W.; writing—original draft preparation, L.-W.L.; writing—review and editing, L.-W.L. and Y.-R.W.; visualization, L.-W.L. and Y.-S.C.; supervision, L.-W.L.; project administration, L.-W.L. and Y.-R.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

Author Yung-Sung Chen is employed by the ASC Digital Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Schwatka, N.V.; Hecker, S.; Goldenhar, L.M. Defining and measuring safety climate: A review of the construction industry literature. Ann. Occup. Hyg. 2016, 60, 537–550. [Google Scholar] [CrossRef] [PubMed]
  2. Wang, H.; Zhang, Z.; Li, Y. A tripartite evolutionary game involving quality regulation of prefabricated building projects considering government rewards and penalties. Int. J. Ind. Eng. Theory Appl. Pract. 2023, 30, 8779. [Google Scholar] [CrossRef]
  3. Chang, H.; Trieste, J.V. Accidents Highlight Workplace Safety at Taiwan’s Construction Sites. Taiwan News, 29 November 2023.
  4. Chen, W.T.; Tsai, I.C.; Merrett, H.C.; Lu, S.T.; Lee, Y.-I.; You, J.-K.; Mortis, L. Construction Safety Success Factors: A Taiwanese Case Study. Sustainability 2020, 12, 6326. [Google Scholar] [CrossRef]
  5. Kineber, A.F.; Antwi-Afari, M.F.; Elghaish, F.; Zamil, A.M.A.; Alhusban, M.; Qaralleh, T.J.O. Benefits of implementing occupational health and safety management systems for the sustainable construction industry: A systematic literature review. Sustainability 2023, 15, 12697. [Google Scholar] [CrossRef]
  6. Chen, H.; Mao, Y.; Xu, Y.; Wang, R. The impact of wearable devices on the construction safety of building workers: A systematic review. Sustainability 2023, 15, 11165. [Google Scholar] [CrossRef]
  7. Yu, W.-D.; Hsiao, W.-T.; Cheng, T.-M.; Chiang, H.-S.; Chang, C.-Y. Describing Construction Hazard Images Identified from Site Safety Surveillance Video. In Proceedings of the 3rd International Civil Engineering and Architecture Conference, Singapore, 11–14 March 2024; pp. 937–948. [Google Scholar]
  8. Kang, L. Statistical analysis and case investigation of fatal fall-from-height accidents in the Chinese construction industry. Int. J. Ind. Eng. Theory Appl. Pract. 2022, 29, 7971. [Google Scholar] [CrossRef]
  9. Zhou, Z.; Goh, Y.M.; Li, Q. Overview and analysis of safety management studies in the construction industry. Saf. Sci. 2015, 72, 337–350. [Google Scholar] [CrossRef]
  10. Bohn, J.S.; Teizer, J. Benefits and barriers of construction project monitoring using high-resolution automated cameras. J. Constr. Eng. Manag. 2010, 136, 632–640. [Google Scholar] [CrossRef]
  11. Chen, X.; Zhu, Y.; Chen, H.; Ouyang, Y.; Luo, X.; Wu, X. BIM-based optimization of camera placement for indoor construction monitoring considering the construction schedule. Autom. Constr. 2021, 130, 103825. [Google Scholar] [CrossRef]
  12. Parsamehr, M.; Perera, U.S.; Dodanwala, T.C.; Perera, P.; Ruparathna, R. A review of construction management challenges and BIM-based solutions: Perspectives from the schedule, cost, quality, and safety management. Asian J. Civ. Eng. 2023, 24, 353–389. [Google Scholar] [CrossRef]
  13. Ozumba, A.O.U.; Shakantu, W. Exploring challenges to ICT utilisation in construction site management. Constr. Innov. 2018, 18, 321–349. [Google Scholar] [CrossRef]
  14. Tabatabaee, S.; Mohandes, S.R.; Ahmed, R.R.; Mahdiyar, A.; Arashpour, M.; Zayed, T.; Ismail, S. Investigating the barriers to applying the internet-of-things-based technologies to construction site safety management. Int. J. Environ. Res. Public Health 2022, 19, 868. [Google Scholar] [CrossRef] [PubMed]
  15. Afzal, M.; Shafiq, M.T.; Al Jassmi, H. Improving construction safety with virtual-design construction technologies—A review. J. Inf. Technol. Constr. 2021, 26, 319–340. [Google Scholar] [CrossRef]
  16. Dorafshan, S.; Thomas, R.J.; Maguire, M. Comparison of deep convolutional neural networks and edge detectors for image-based crack detection in concrete. Constr. Build. Mater. 2018, 186, 1031–1045. [Google Scholar] [CrossRef]
  17. Deng, J.; Singh, A.; Zhou, Y.; Lu, Y.; Lee, V.C.-S. Review on computer vision-based crack detection and quantification methodologies for civil structures. Constr. Build. Mater. 2022, 356, 129238. [Google Scholar] [CrossRef]
  18. Fang, W.; Ding, L.; Love, P.E.D.; Luo, H.; Li, H.; Peña-Mora, F.; Zhong, B.; Zhou, C. Computer vision applications in construction safety assurance. Autom. Constr. 2020, 110, 103013. [Google Scholar] [CrossRef]
  19. Chou, J.-S.; Liu, C.-H. Automated Sensing System for Real-Time Recognition of Trucks in River Dredging Areas Using Computer Vision and Convolutional Deep Learning. Sensors 2021, 21, 555. [Google Scholar] [CrossRef]
  20. Sha, M.; Boukerche, A. Performance evaluation of CNN-based pedestrian detectors for autonomous vehicles. Ad Hoc Netw. 2022, 128, 102784. [Google Scholar] [CrossRef]
  21. Greeshma, A.S.; Edayadiyil, J.B. Automated progress monitoring of construction projects using Machine learning and image processing approach. Mater. Today Proc. 2022, 65, 554–563. [Google Scholar]
  22. Yeşilmen, S.; Tatar, B. Efficiency of convolutional neural networks (CNN) based image classification for monitoring construction related activities: A case study on aggregate mining for concrete production. Case Stud. Constr. Mater. 2022, 17, e01372. [Google Scholar] [CrossRef]
  23. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
  24. Kim, K.; Kim, K.; Jeong, S. Application of YOLO v5 and v8 for Recognition of Safety Risk Factors at Construction Sites. Sustainability 2023, 15, 15179. [Google Scholar] [CrossRef]
  25. Han, T.; Ma, T.; Fang, Z.; Zhang, Y.; Han, C. A BIM-IoT and intelligent compaction integrated framework for advanced road compaction quality monitoring and management. Comput. Electr. Eng. 2022, 100, 107981. [Google Scholar] [CrossRef]
  26. Musarat, M.A.; Khan, A.M.; Alaloul, W.S.; Blas, N.; Ayub, S. Automated monitoring innovations for efficient and safe construction practices. Results Eng. 2024, 22, 102057. [Google Scholar] [CrossRef]
  27. Halder, S.; Afsari, K.; Shojaei, A. Natural Interaction Modalities for Human-CPS Interaction in Construction Progress Monitoring. arXiv 2023, arXiv:2312.05988. [Google Scholar]
  28. Halder, S.; Afsari, K.; Chiou, E.; Patrick, R.; Hamed, K.A. Construction inspection & monitoring with quadruped robots in future human-robot teaming: A preliminary study. J. Build. Eng. 2023, 65, 105814. [Google Scholar]
  29. Halder, S.; Afsari, K.; Akanmu, A. A Robotic Cyber-Physical System for Automated Reality Capture and Visualization in Construction Progress Monitoring. arXiv 2024, arXiv:2402.07034. [Google Scholar]
  30. Lo, Y.; Zhang, C.; Ye, Z.; Cui, C. Monitoring road base course construction progress by photogrammetry-based 3D reconstruction. Int. J. Constr. Manag. 2023, 23, 2087–2101. [Google Scholar] [CrossRef]
  31. Wang, J.; Ouyang, R.; Wen, W.; Wan, X.; Wang, W.; Tolba, A.; Zhang, X. A Post-Evaluation System for Smart Grids Based on Microservice Framework and Big Data Analysis. Electronics 2023, 12, 1647. [Google Scholar] [CrossRef]
  32. Wu, S.; Hou, L.; Zhang, G.K.; Chen, H. Real-time mixed reality-based visual warning for construction workforce safety. Autom. Constr. 2022, 139, 104252. [Google Scholar] [CrossRef]
  33. Ekanayake, B.; Wong, J.K.-W.; Fini, A.A.F.; Smith, P. Computer vision-based interior construction progress monitoring: A literature review and future research directions. Autom. Constr. 2021, 127, 103705. [Google Scholar] [CrossRef]
  34. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  35. Shafiee, M.J.; Chywl, B.; Li, F.; Wong, A. Fast YOLO: A fast you only look once system for real-time embedded object detection in video. arXiv 2017, arXiv:1709.05943. [Google Scholar] [CrossRef]
  36. Gao, S.; Ruan, Y.; Wang, Y.; Xu, W.; Zheng, M. Safety Helmet Detection based on YOLOV4-M. In Proceedings of the 2022 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), Dalian, China, 24–26 June 2022; pp. 179–181. [Google Scholar]
  37. Dunphy, K.; Fekri, M.N.; Grolinger, K.; Sadhu, A. Data Augmentation for Deep-Learning-Based Multiclass Structural Damage Detection Using Limited Information. Sensors 2022, 22, 6193. [Google Scholar] [CrossRef] [PubMed]
  38. Jacobsen, E.L.; Teizer, J. Deep Learning in Construction: Review of Applications and Potential Avenues. J. Comput. Civ. Eng. 2022, 36, 03121001. [Google Scholar] [CrossRef]
  39. Pinto, G.; Wang, Z.; Roy, A.; Hong, T.; Capozzoli, A. Transfer learning for smart buildings: A critical review of algorithms, applications, and future perspectives. Adv. Appl. Energy 2022, 5, 100084. [Google Scholar] [CrossRef]
  40. Xie, Y.; Jia, X.; Chen, W.; He, E. Heterogeneity-aware deep learning in space: Performance and fairness. In Handbook of Geospatial Artificial Intelligence; CRC Press: Boca Raton, FL, USA, 2023; pp. 151–176. [Google Scholar]
  41. Dwivedi, U.K.; Wiwatcharakoses, C.; Sekimoto, Y. Realtime Safety Analysis System using Deep Learning for Fire Related Activities in Construction Sites. In Proceedings of the 2022 International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME), Male, Maldives, 16–18 November 2022; pp. 1–5. [Google Scholar]
  42. Shetye, S.; Shetty, S.; Shinde, S.; Madhu, C.; Mathur, A. Computer Vision for Industrial Safety and Productivity. In Proceedings of the 2023 International Conference on Communication System, Computing and IT Applications (CSCITA), Mumbai, India, 31 March–1 April 2023; pp. 117–120. [Google Scholar]
  43. Shanti, M.Z.; Cho, C.S.; Byon, Y.J.; Yeun, C.Y.; Kim, T.Y.; Kim, S.K.; Altunaiji, A. A Novel Implementation of an AI-Based Smart Construction Safety Inspection Protocol in the UAE. IEEE Access 2021, 9, 166603–166616. [Google Scholar] [CrossRef]
  44. Huang, C.; Pan, X.; Cheng, J.; Song, J. Deep Image Registration With Depth-Aware Homography Estimation. IEEE Signal Process. Lett. 2023, 30, 6–10. [Google Scholar] [CrossRef]
  45. Arunyagool, D.; Chamnongthai, K.; Khawparisuth, D. Monitoring and Energy Control Inside Home Using Google Sheets with Line Notification. In Proceedings of the 2021 International Conference on Power, Energy and Innovations (ICPEI), Nakhon Ratchasima, Thailand, 20–22 October 2021; pp. 99–102. [Google Scholar]
  46. Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar] [CrossRef]
  47. Feng, R.; Miao, Y.; Zheng, J. A YOLO-Based Intelligent Detection Algorithm for Risk Assessment of Construction Sites. J. Intell. Constr. 2024, 2, 1–18. [Google Scholar] [CrossRef]
Figure 1. Causes of safety management issues in construction.
Figure 2. YOLO model architecture.
Figure 3. Improving aggregation network module efficiency.
Figure 4. Composite model scaling.
Figure 5. Data preprocessing workflow.
Figure 6. DIKW pyramid.
Figure 7. Compilation of construction site image files.
Figure 8. LabelImg to extract object features.
Figure 9. Illustration of YOLO and grid cells.
Figure 10. Statistics of object categories and quantities.
Figure 11. YOLO image detection model testing results.
Figure 12. Illustration of precision and recall results.
Figure 13. Illustration of mAP results.
Figure 14. YOLOv7 model training results for F1 curve.
Figure 15. YOLOv7 model training results for P_curve.
Figure 16. YOLOv7 model training results for PR_curve.
Figure 17. System architecture flowchart.
Figure 18. Illustration of construction site detection system.
Figure 19. Illustration of affine transformation concepts.
Figure 20. Concept of homography transformation.
Figure 21. Grid lines for pixel and real-world coordinates.
Figure 22. Construction site video timeline.
Figure 23. Diagram of unsafe distance between wheel and worker.
Figure 24. Line notify flowchart.
Figure 25. Illustration of construction site hazard warning mechanism.
Table 1. Comparison of construction management monitoring techniques.

| Author(s) | Year | Methodology | Targeted Analysis |
|---|---|---|---|
| Musarat et al. [26] | 2024 | Photogrammetry, sensors | Safety monitoring |
| C.-Y. Wang et al. [23] | 2023 | YOLOv7 model, computer vision | Object detection |
| Kim et al. [24] | 2023 | YOLOv8 model, behavioral recognition | Detection of unsafe behaviors |
| Halder, Afsari, and Shojaei [27] | 2023 | Manual inspections, progress reports | Progress tracking |
| Halder et al. [28,29] | 2023 | Sensor data, drone imagery, 360 photos | Progress tracking |
| Lo et al. [30] | 2023 | Manual data collection and analysis | Productivity analysis |
| J. Wang et al. [31] | 2023 | Manual data handling | Productivity analysis |
| Wu, S. et al. [32] | 2022 | Digital twin, mixed reality | Safety monitoring |
| Tao Han et al. [25] | 2022 | BIM-IoT-IC framework, real-time data integration | Road compaction quality monitoring |
| B. Ekanayake et al. [33] | 2021 | Sensor platforms and video analytics | Safety monitoring |
| This study | 2024 | AI, deep learning, image recognition, real-time | Safety monitoring |
Table 2. Model performance metrics across varying sample sizes.

| Sample Size (%) | Class | Sample Count | AP | Best F1 Score | Notes |
|---|---|---|---|---|---|
| 33% | Worker | 3636 | 0.856 | 0.82 | Reliable detection due to significant sample size |
| 33% | Excavator | 650 | 0.936 | 0.91 | High AP despite fewer samples |
| 33% | Dump Truck | 634 | 0.877 | 0.85 | Good performance relative to sample size |
| 33% | Crane | 260 | 0.835 | 0.80 | Lower AP, indicating need for more data |
| 33% | Concrete Mixer Truck | 218 | 0.917 | 0.90 | Excellent performance despite fewest samples |
| 33% | All classes | 5398 | mAP = 0.884 | 0.86 | Overall balanced performance |
| 67% | Worker | 7348 | 0.879 | 0.82 | Consistent detection across varying conditions |
| 67% | Excavator | 1296 | 0.948 | 0.91 | Best performance in this range |
| 67% | Dump Truck | 1286 | 0.898 | 0.87 | Stable performance for mid-sized category |
| 67% | Crane | 517 | 0.912 | 0.85 | Improved performance with more data |
| 67% | Concrete Mixer Truck | 438 | 0.958 | 0.90 | Strong performance for lower sample count |
| 67% | All classes | 10,885 | mAP = 0.919 | 0.87 | Overall balanced performance |
| 100% | Worker | 13,900 | 0.883 | 0.83 | Reliable detection in complex environments |
| 100% | Excavator | 2467 | 0.950 | 0.92 | Enhanced performance with distinctive features |
| 100% | Dump Truck | 2423 | 0.900 | 0.89 | Improved detection capabilities |
| 100% | Crane | 932 | 0.915 | 0.87 | Significant improvement with increased data |
| 100% | Concrete Mixer Truck | 811 | 0.964 | 0.91 | Excellent performance despite relatively low sample count |
| 100% | All classes | 20,533 | mAP = 0.922 | 0.88 | Overall balanced performance |
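
The per-class AP, best F1, and mAP values in Table 2 follow standard object-detection evaluation practice. The sketch below, assuming confidence-ranked detections that have already been matched to ground truth at a fixed IoU threshold, illustrates how such metrics are typically computed; the function names and the trapezoidal integration scheme are illustrative rather than the exact YOLOv7 evaluation code.

```python
# Illustrative computation of per-class AP, best F1, and mAP (cf. Table 2).
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """All-point interpolated AP from confidence-ranked detections."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_tp, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / max(num_gt, 1)
    precision = cum_tp / np.maximum(cum_tp + cum_fp, 1e-9)
    # precision envelope (monotonically non-increasing), integrated over recall
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    return float(np.trapz(precision, recall))

def best_f1(scores, is_tp, num_gt):
    """Best F1 over all confidence thresholds, as reported per class."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.cumsum(np.asarray(is_tp, dtype=float)[order])
    fp = np.cumsum(1.0 - np.asarray(is_tp, dtype=float)[order])
    precision = tp / np.maximum(tp + fp, 1e-9)
    recall = tp / max(num_gt, 1)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-9)
    return float(f1.max())

# mAP is the mean of the per-class APs, e.g. for the 100% sample in Table 2:
per_class_ap = [0.883, 0.950, 0.900, 0.915, 0.964]
print(round(sum(per_class_ap) / len(per_class_ap), 3))  # 0.922
```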
Table 3. Storing data by converting it into a byte array.

| Object Name | Confidence | Position (Ymin, Xmin, Ymax, Xmax) | Time |
|---|---|---|---|
| Worker | 0.96 | (67, 166, 179, 231) | 29 April 2024 09:05 |
| Worker | 0.96 | (18, 248, 193, 300) | 29 April 2024 09:05 |
| Worker | 0.96 | (96, 75, 157, 124) | 29 April 2024 09:05 |
| Worker | 0.88 | (42, 139, 153, 173) | 29 April 2024 09:05 |
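
Table 3 records each detection as an object name, a confidence score, a bounding-box position, and a timestamp, converted into a byte array for storage. The following sketch shows one plausible serialization under an assumed field layout (length-prefixed name, 32-bit confidence, four 16-bit coordinates, Unix timestamp); it is not the study's actual wire format.

```python
# Hypothetical packing of one detection record (as in Table 3) into a byte array.
import struct
from datetime import datetime

def pack_detection(name, confidence, ymin, xmin, ymax, xmax, timestamp):
    name_bytes = name.encode("utf-8")
    # assumed layout: name length, name, confidence, 4 box coords, Unix time
    return struct.pack(
        f"<B{len(name_bytes)}sf4Hq",
        len(name_bytes), name_bytes, confidence,
        ymin, xmin, ymax, xmax,
        int(timestamp.timestamp()),
    )

def unpack_detection(buf):
    name_len = buf[0]
    name = buf[1:1 + name_len].decode("utf-8")
    confidence, ymin, xmin, ymax, xmax, ts = struct.unpack_from(
        "<f4Hq", buf, 1 + name_len)
    return name, round(confidence, 2), (ymin, xmin, ymax, xmax), datetime.fromtimestamp(ts)

record = pack_detection("Worker", 0.96, 67, 166, 179, 231,
                        datetime(2024, 4, 29, 9, 5))
print(unpack_detection(record))
```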
Table 4. Safe area assessment in construction workflow.

| Timeline | Status | Object Name | Confidence | Position | Distance (cm) |
|---|---|---|---|---|---|
| 17:43:56 | Normal | dump_truck | 0.95 | (230, 0, 1072, 613) | 0 |
| 17:44:01 | Normal | dump_truck | 0.64 | (20, 2, 1072, 1160) | 0 |
| 17:44:13 | Warning | Crane / Worker | 0.98 / 0.95 | (5, 15, 1075, 1472) / (485, 1748, 736, 1870) | 592.6 |
| 17:44:15 | Normal | Crane | 0.98 | (7, 30, 1076, 1458) | 0 |
| 17:44:21 | Normal | Crane | 0.98 | (4, 19, 1076, 1459) | 0 |
| 17:44:24 | Warning | Crane / Worker | 0.98 / 0.94 | (7, 13, 1076, 1456) / (438, 1593, 868, 1743) | 575.69 |
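
The Warning rows in Table 4 arise when a worker's projected ground position falls within an unsafe distance of a machine, using the pixel-to-real-world mapping illustrated in Figures 20 and 21. A minimal sketch of such a check is given below; the homography matrix H and the 600 cm threshold are placeholders, not the calibrated values used in the study.

```python
# Sketch of the safe-area check summarized in Table 4: bounding-box ground
# points are projected from pixel to real-world coordinates with a homography,
# and a warning is raised when the worker-machine distance falls below a
# threshold. H and the threshold are illustrative placeholders.
import numpy as np
import cv2

H = np.array([[0.55, 0.02, -12.0],        # placeholder pixel -> cm homography
              [0.01, 0.60,  -5.0],
              [0.0001, 0.0002, 1.0]], dtype=np.float64)

def ground_point_cm(box):
    """Bottom-centre of a (ymin, xmin, ymax, xmax) box, projected to cm."""
    ymin, xmin, ymax, xmax = box
    pixel = np.array([[[(xmin + xmax) / 2.0, ymax]]], dtype=np.float64)
    return cv2.perspectiveTransform(pixel, H)[0, 0]

def assess(worker_box, machine_box, threshold_cm=600.0):
    distance = float(np.linalg.norm(
        ground_point_cm(worker_box) - ground_point_cm(machine_box)))
    status = "Warning" if distance < threshold_cm else "Normal"
    return status, round(distance, 2)

# e.g. the 17:44:13 frame in Table 4 pairs a crane with a nearby worker
print(assess(worker_box=(485, 1748, 736, 1870), machine_box=(5, 15, 1075, 1472)))
```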
Table 5. Distance estimation accuracy results.

| Actual Distance (m) | Mean Estimated Distance (m) | MAE (m) | SD (m) |
|---|---|---|---|
| 1 | 1.08 | 0.08 | 0.05 |
| 2 | 2.12 | 0.12 | 0.07 |
| 3 | 3.25 | 0.25 | 0.10 |
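
The MAE and SD columns in Table 5 summarize repeated distance estimates against known ground-truth distances. The short sketch below shows how these statistics would be derived; the sample estimates are hypothetical, not the study's raw measurements.

```python
# How the MAE and SD figures in Table 5 would be computed from repeated
# distance estimates at a known ground-truth distance.
import numpy as np

def distance_error_stats(actual_m, estimates_m):
    estimates = np.asarray(estimates_m, dtype=float)
    mean_estimate = estimates.mean()
    mae = np.abs(estimates - actual_m).mean()   # mean absolute error
    sd = estimates.std(ddof=1)                  # sample standard deviation
    return round(mean_estimate, 2), round(mae, 2), round(sd, 2)

# hypothetical repeated estimates at an actual distance of 1 m
print(distance_error_stats(1.0, [1.02, 1.10, 1.05, 1.12, 1.08]))
```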
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
