Enhancing Tower Crane Safety: A Computer Vision and Deep

Abstract: The utilization of tower cranes at construction sites entails inherent risks, notably the potential for loads to fall. This study proposes a novel method for identifying the tower crane load fall


Introduction
Construction is a high-risk activity that takes place in complex environments. Consequently, the construction industry experiences a high fatality rate. Many of these injuries and fatalities can be attributed to the use of cranes, which are extensively employed on construction sites [1]. In the United States, between 2011 and 2017, 297 fatal crane-related accidents were reported [2]. Among these incidents, 154 cases resulted from contact accidents, with 79 of them specifically involving the fall of crane loads [2]. Hence, crane load falls account for more than 25% of these crane-related fatalities (79 of 297). Restricting access to hazardous crane areas is a fundamental measure that can be taken to prevent or mitigate the impact of falling loads [3].
The Occupational Safety and Health Administration (OSHA) defines the fall zone as the area, including but not limited to the area directly beneath the load, where there is a foreseeable possibility of suspended materials falling [4]. The OSHA 1926.1425 standard explicitly prohibits individuals from being present beneath a suspended load [4]. Similarly, the BS-EN-ISO-13857 standard recommends maintaining a safety distance of 1.5 m from high-risk areas, including the crane load fall zone [5]. Automated safety systems, particularly those based on computer vision, have demonstrated promising potential for enhancing safety [6]. This study explores a computer vision and deep learning approach to accurately determine the location of individuals relative to the tower crane load fall zone, which is validated through a carefully designed and executed experiment.
The remainder of this paper is organized as follows. Section 2 reviews the literature on crane safety improvement. Section 3 explains the methodology employed in this study. The experiment and its findings are presented in Section 4. Section 5 discusses the results, and Section 6 presents the conclusions drawn from the research.

Literature Review
Previous studies have focused on enhancing crane safety through the implementation of various technologies, including sensors, scanners, and computer vision [7]. However, in recent years, there has been a growing trend toward the use of computer vision technology for crane safety monitoring [6]. This literature review explores the research conducted in this area.
One of the objectives of previous studies has been to reduce crane collisions with site entities. Zhang and Ge [8] proposed a deep learning algorithm, FairMOT, to predict the trajectories of individuals and crane loads in a 2D image space. Yang et al. [9] used Mask R-CNN to detect individuals and the crane hook. They calculated the distances between these entities to prevent collisions. Chen et al. [10] examined the use of terrestrial laser scanning to produce a 3D point cloud and prevent crane collisions with modeled objects.
In addition to addressing crane collision prevention, computer vision technology has also been used to assist operators. Li et al. [11] developed an automatic system that uses robots and computer vision to attach loads to the crane hook. Wang et al. [12] trained a deep learning algorithm to interpret hand signals for crane steering. Other research [13,14] has aimed to determine the precise location of the load during blind lifts, where the exact position of the load is unknown due to swaying. These studies used color-based identification techniques to detect the load. However, in real conditions, loads often have colors similar to the background, making color-based identification challenging. Yoshida et al. [15] demonstrated the feasibility of using a stereo vision system to detect the load location.
The identification of crane loads and their related hazardous areas has also been a topic of interest in previous research. Zhou et al. [16] used the Faster R-CNN algorithm to identify all cube-shaped objects as possible crane loads. Their research showed that deep learning algorithms alone were not able to distinguish between cuboid loads and other similar objects on the construction site. Chian et al. [17] proposed a method for estimating the tower crane load fall zone. Their method uses the homography matrix to transfer a grid of points from the project plan to the camera images. However, this method is only accurate if the construction site is a flat plane. In reality, the site is not a plane, which can cause the parallax problem and errors in the identification of the fall zone. To address these shortcomings, the current research proposes a new method for monitoring the crane load fall zone and determining the presence of people under the load.

Methodology
The proposed algorithm consists of four main components, which are presented in the subsequent subsections. (1) Depth extraction: A stereo camera system is used to extract depth information from the scene. (2) Load detection: The crane load is recognized based on its movement patterns and elevation. (3) Worker detection: The YOLOv7 algorithm is used to detect people in the scene. (4) Location comparison: The location of people is compared to the location of the load fall zone in the scene. If a person is in the load fall zone, an alarm can be triggered to warn the person of the danger.

Depth Extraction
Depth information is a necessary input for the load detection algorithm. It is also necessary to determine the location of people and the load fall zone in the 3D world coordinate system. Depth calculation using a stereo vision system is a well-established method for fast and accurate depth estimation [15]. Stereo vision is a technique that uses the difference between two or more stereo images to recover the three-dimensional structure of a scene [15]. The stereo vision system used in this study is a stereo normal case that consists of two cameras and provides disparity. Disparity is the difference in the image coordinates of the corresponding points in two stereo images. The relationship between disparity and depth is expressed in Equation (1) [18]. The disparity map was calculated using the StereoBM class of the OpenCV library [19].
$$z = \frac{f\,B}{x_l - x_r} \qquad (1)$$
In Equation (1), $x_l$ is the coordinate of the point in the left image and $x_r$ is the coordinate of the same point in the right image, both measured along the baseline direction, so that $x_l - x_r$ is the disparity; f is the focal length of the camera; B is the baseline distance between the two cameras; and z is the depth of the point in three-dimensional space.
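As a minimal sketch, assuming Equation (1) takes the standard stereo normal case form z = f·B/d, where d is the disparity, the conversion from disparity to depth can be written as follows. The function name, the use of pixel units for the focal length and disparity, and the focal length of 1000 px are illustrative assumptions; only the 18 cm baseline is taken from the experiment reported in this paper.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Return depth z in metres for a stereo normal case: z = f * B / d."""
    if disparity_px <= 0:
        # Zero or negative disparity means no valid match (or a point at infinity).
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Hypothetical focal length (1000 px); baseline of 0.18 m as in the experiment.
    print(depth_from_disparity(disparity_px=10.0, focal_px=1000.0, baseline_m=0.18))
```

Because depth is inversely proportional to disparity, a fixed disparity error produces a larger depth error for distant points, which is one reason the baseline and camera calibration matter at crane heights.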

Load Detection
Crane load detection is a more challenging task than regular object detection. A load is defined by the fact that the crane is carrying it, rather than by its visual appearance. Relying solely on the visual features of the load leads to limited results and makes it difficult to distinguish the load from other similar entities at the construction site [16]. In this study, a novel method for crane load detection was developed based on the movement pattern and height of the load; these two features define the load, as every load carried by a crane needs to be lifted from the surface below it and moved.
Detecting moving objects in a video is a common problem in computer vision [6]. However, detecting objects that move differently from the background is a different problem. When the camera itself is moving and the object of interest is static with respect to the camera, another method is needed. Optical flow is a two-dimensional vector that shows the movement of a specific point between two consecutive frames [20] and is a suitable criterion for identifying objects with different movements. The prerequisite for an object to be identified as a possible load is that it moves differently from the background. The optical flow in this study was calculated using the Lucas-Kanade algorithm [20] and was then densified to all pixels by interpolation. By identifying pixels whose optical flow differs from that of the background, objects with different motions can be identified.
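The idea of flagging points whose optical flow differs from the background can be sketched as below. This is not the study's implementation: the use of the per-axis median as the background motion estimate, the 2-pixel threshold, and the function name are all assumptions made for illustration, and the flow vectors are supplied directly rather than computed with Lucas-Kanade.

```python
from math import hypot
from statistics import median


def differently_moving(flows, threshold_px=2.0):
    """Return indices of flow vectors that deviate from the background motion.

    flows: list of (dx, dy) displacements between two consecutive frames.
    The per-axis median is used as the background estimate (an assumption).
    """
    if not flows:
        return []
    bg_dx = median(dx for dx, _ in flows)
    bg_dy = median(dy for _, dy in flows)
    return [
        i
        for i, (dx, dy) in enumerate(flows)
        if hypot(dx - bg_dx, dy - bg_dy) > threshold_px
    ]


if __name__ == "__main__":
    # Camera moving with the boom: ground points shift by about 5 px,
    # while the point at index 3 is nearly static relative to the camera.
    sample = [(5.0, 0.1), (4.9, 0.0), (5.1, -0.1), (0.1, 0.0), (5.0, 0.0)]
    print(differently_moving(sample))
```

A median-based background estimate only works when most tracked points belong to the background, which is plausible for a downward-looking camera whose view is dominated by the ground.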
The height difference between the load and the underlying surface is a crucial factor to consider. By comparing the depth of candidate objects and the surrounding surface within a radius of 1.5 times, it can be determined whether the object is suspended. A difference of 2 m is considered the necessary distance for an object to be classified as suspended. When an object is suspended and exhibits different movement characteristics, it is considered a load.
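The two load criteria (suspension and distinct motion) can be combined as in the following sketch. The 2 m threshold comes from the text; the function names, the use of the median to summarize the surrounding surface depth, and the assumption of a downward-looking camera (so that the ground is farther from the camera than the load) are illustrative choices, not details confirmed by the study.

```python
from statistics import median

SUSPENSION_GAP_M = 2.0  # minimum depth difference stated in the text


def is_suspended(object_depth_m, surrounding_depths_m, min_gap_m=SUSPENSION_GAP_M):
    """True if the surrounding surface is at least min_gap_m farther from
    the camera than the object (downward-looking camera assumed)."""
    if not surrounding_depths_m:
        return False
    # Median is an assumed, outlier-tolerant summary of the surface depth.
    surface_depth_m = median(surrounding_depths_m)
    return surface_depth_m - object_depth_m >= min_gap_m


def is_load(object_depth_m, surrounding_depths_m, moves_differently):
    """An object counts as a load only if it is suspended AND moves
    differently from the background."""
    return moves_differently and is_suspended(object_depth_m, surrounding_depths_m)


if __name__ == "__main__":
    ground = [13.0, 12.9, 13.1]  # hypothetical depths around the candidate
    print(is_load(9.0, ground, moves_differently=True))
    print(is_load(12.0, ground, moves_differently=True))
```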

People Detection
Identifying individuals from a great height with an almost vertical viewing angle is challenging because classical algorithms depend on appearance features that are mostly absent [8]. In addition, the people appear very small in the images. To overcome these challenges, this study employed the YOLOv7 algorithm [21], which was pre-trained on the COCO image database [22], to detect people with high accuracy.
Transfer learning of the YOLOv7 algorithm was performed using 120 images. Because this dataset is small, the backbone layers of the pre-trained model, which are the first 50 layers, were frozen to avoid overfitting.

Comparison of Locations
The last step involves comparing the location of the individuals and the load fall zone. This study represents the first instance of such a comparison conducted in the 3D world coordinate system. Using Equation (2) and depth measurements, the 3D coordinates and true size of the objects can be determined, allowing for a comparison in the world coordinate system.
$$\frac{X}{x} = \frac{Y}{y} = \frac{R}{r} = c\,\frac{z}{f} \qquad (2)$$
Equation (2) employs the focal length (f), depth (z), and a constant for unit matching (c) to convert point image coordinates (x, y) to 3D world coordinates (X, Y), and the image size of an object (r) to its actual size (R).
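A minimal sketch of this conversion is given below, reading Equation (2) as X/x = Y/y = R/r = c·z/f. The function names are hypothetical, image coordinates are assumed to be measured from the principal point, and the numeric values in the example (focal length, pixel coordinates) are made up for illustration.

```python
def image_to_world(x_px, y_px, depth_m, focal_px, c=1.0):
    """Convert image coordinates (x, y) to world coordinates (X, Y)
    using X/x = Y/y = c * z / f."""
    scale = c * depth_m / focal_px
    return x_px * scale, y_px * scale


def true_size(size_px, depth_m, focal_px, c=1.0):
    """Convert an image-space size r to a real-world size R with the same scale."""
    return size_px * c * depth_m / focal_px


if __name__ == "__main__":
    # Hypothetical values: a point 100 px right of and 50 px above the principal
    # point, seen at 13 m depth with a 1000 px focal length.
    print(image_to_world(100.0, -50.0, depth_m=13.0, focal_px=1000.0))
    print(true_size(40.0, depth_m=13.0, focal_px=1000.0))
```

Because the scale factor depends on each object's own depth, a person on the ground and a suspended load are converted with different scales, which is what allows the comparison to avoid the flat-site assumption.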
After the worker locations are compared with the load fall zone, each worker location is classified into one of three zones: the red zone, which is directly beneath the load and is off-limits to all personnel; the yellow zone, which extends 1.5 m outward from the red zone and is accessible only to individuals involved in the operation; and the green (safe) zone, which lies outside these two zones.
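The three-zone rule can be sketched as follows. The 1.5 m yellow margin comes from the text; modeling the red zone as a circle around the fall zone center with a radius equal to the load's real-world radius is an assumption for illustration, as are the function and parameter names.

```python
from math import hypot

YELLOW_MARGIN_M = 1.5  # width of the yellow band around the red zone (from the text)


def classify_zone(person_xy, fall_zone_center_xy, load_radius_m, margin_m=YELLOW_MARGIN_M):
    """Classify a person's world-coordinate position as 'red', 'yellow', or 'green'.

    The red zone is modeled as a circle of radius load_radius_m centered on the
    fall zone center (a simplifying assumption).
    """
    distance_m = hypot(
        person_xy[0] - fall_zone_center_xy[0],
        person_xy[1] - fall_zone_center_xy[1],
    )
    if distance_m <= load_radius_m:
        return "red"
    if distance_m <= load_radius_m + margin_m:
        return "yellow"
    return "green"


if __name__ == "__main__":
    center = (0.0, 0.0)
    for position in [(0.2, 0.1), (1.5, 0.0), (3.0, 0.0)]:
        print(position, classify_zone(position, center, load_radius_m=0.5))
```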

Experiment
A field experiment was conducted to validate the proposed method, encompassing all load-handling scenarios at a height of 13 m. To ensure that the test conditions closely resembled the actual working environment of the crane, a crane model was designed and fabricated. The length of the model boom was set at 3 m to provide an adequate distance between the cameras and the load. In addition, a stereo system comprising two calibrated smartphone cameras spaced 18 cm apart was developed to capture videos. The person's zone is the key variable and the ultimate outcome of this study. Precision and recall were used to assess the model's performance in determining the person's zone. The ground truth zone was documented by communicating with the individual under the load during the test execution. The confusion matrix for the person's zone is presented in Table 1, which also reports the precision and recall rates for each zone. Figure 1 provides an example of the algorithm analysis results, showing the detection of the load and the person. The person's zone is classified into three categories (red, yellow, and green) based on a comparison of their location with the fall zone's center point and the load's dimensions.
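Per-zone precision and recall can be derived from a three-class confusion matrix as sketched below. The counts in the example are hypothetical placeholders chosen only to exercise the code; they are not the values of Table 1. The row and column convention (rows are ground truth, columns are predictions) and the function name are also assumptions.

```python
ZONES = ("red", "yellow", "green")


def precision_recall(confusion):
    """Return {zone: (precision, recall)} for a square confusion matrix.

    confusion[i][j] = number of samples whose true zone is ZONES[i]
    and whose predicted zone is ZONES[j].
    """
    results = {}
    for k, zone in enumerate(ZONES):
        true_positive = confusion[k][k]
        predicted_total = sum(row[k] for row in confusion)  # column sum
        actual_total = sum(confusion[k])                    # row sum
        precision = true_positive / predicted_total if predicted_total else 0.0
        recall = true_positive / actual_total if actual_total else 0.0
        results[zone] = (precision, recall)
    return results


if __name__ == "__main__":
    # Hypothetical counts for illustration only (NOT the data of Table 1).
    placeholder = [
        [8, 2, 0],
        [1, 7, 2],
        [0, 1, 9],
    ]
    for zone, (p, r) in precision_recall(placeholder).items():
        print(f"{zone}: precision={p:.3f}, recall={r:.3f}")
```

For a safety application, recall on the red zone is the most critical figure, since a missed red-zone detection corresponds to a worker under the load who receives no warning.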

Discussion
The proposed model demonstrated high accuracy, as indicated by its 94% precision and 96.5% recall. The experiments were analyzed using a computer equipped with an Intel Core i7-9750H CPU, 8 GB RAM, and NVIDIA GeForce GTX 1650 GPU. The algorithm runs at a speed of 8 frames per second. At this speed, the worker's zone is determined almost in real time, making the method suitable for monitoring the load fall zone and issuing a warning of worker presence in the red zone if needed.
The proposed method surpasses previous approaches in several key aspects. First, it is cost-effective because it requires minimal physical equipment. Second, it eliminates the reliance on unrealistic assumptions, such as assuming that the entire site surface is flat or that the camera is permanently fixed. This greatly enhances its practical use on construction sites. Third, the algorithm operates at 8 frames per second, with 94% precision and 96.5% recall in detecting the worker's zone. This compares favorably with previous models, which identified only a limited set of load types [17] and ran at a maximum speed of 1 frame per second [8]. Finally, the method's ability to provide continuous service without human intervention sets it apart from traditional methods that rely on human observers for safety monitoring, reducing the likelihood of errors and improving overall efficiency.
Computer vision-based solutions have one limitation: the inability to function effectively in situations where the subject is occluded. This issue can be avoided to a large extent by choosing an appropriate location for the cameras on the crane boom. However, in blind operations, occlusion may occur when the obstacle is very close to the load and its dimensions are significant compared with the height of the crane.

Conclusions
The present study introduces a novel approach to monitor the presence of individuals within the crane load fall zone, which is a critical issue in the construction industry due to legal emphasis and hazardous conditions. Crane safety has been the subject of extensive research. However, the practical application of past methods is restricted due to the cost and assumptions associated with them. This study proposes a method that uses stereo vision to extract image depth information, computer vision algorithms to detect the load, and the YOLOv7 deep learning algorithm to identify individuals. By comparing the location of individuals and the load fall zone in the world coordinate system, the zone of individuals is classified into three categories (red, yellow, and green) based on predefined rules. The key accomplishments of this research are as follows: (1) a novel approach is proposed for the detection of the crane load fall zone and the determination of workers' positions relative to it; (2) a new algorithm is developed that enables load detection irrespective of its type, shape, color, and size; (3) the YOLOv7 deep learning algorithm is fine-tuned to accurately identify workers from the altitude of the crane; and (4) a laboratory model of a crane is constructed to facilitate the testing of the proposed method.
In conclusion, the method presented in this study achieves 94% precision, 96.5% recall, and an analysis speed of 8 frames per second, making it an accurate and fast solution that can be used in real-world conditions. This system can provide valuable information about the safe behavior of workers to safety managers.

Figure 1. Examples of the results of the experiment. (a) Person within fall zone; (b) Person is situated in the yellow zone and at a distance of one meter from the red zone; (c) Person located in a safe zone.

Table 1. Experiment confusion matrix and precision and recall rates for each zone.