Deep Learning Based Fire Risk Detection on Construction Sites

The recent large-scale fire incidents on construction sites in South Korea have highlighted the need for computer vision technology to detect fire risks before an actual occurrence of fire. This study developed a proactive fire risk detection system by detecting the coexistence of an ignition source (sparks) and a combustible material (urethane foam or Styrofoam) using object detection on images from a surveillance camera. Statistical analysis was carried out on fire incidents on construction sites in South Korea to provide insight into the causes of the large-scale fire incidents. Labeling approaches were discussed to improve the performance of the object detectors for sparks and urethane foam. Detecting ignition sources and combustible materials at a distance was discussed in order to improve the performance for long-distance objects. Two candidate deep learning models, Yolov5 and EfficientDet, were compared in their performance. It was found that Yolov5 showed slightly higher mAP performance: Yolov5 models showed mAPs from 87% to 90% and EfficientDet models showed mAPs from 82% to 87%, depending on the complexity of the model. However, Yolov5 showed distinctive advantages over EfficientDet in terms of ease and speed of training.


Introduction
Fires on construction sites, whether new builds or refurbishments, are infrequent but can have severe and devastating consequences. South Korea has witnessed several large-scale fire incidents on construction sites, as illustrated in Figure 1. For instance, at the Icheon Refrigerated Warehouse construction site, the ignition of oil vapour by an unidentified source during a urethane foaming operation led to a fire. Similarly, at the Goyang Bus Terminal construction site, the ignition of urethane foam by sparks from welding work resulted in 7 deaths and 41 injuries. These incidents exemplify the common characteristic of catastrophic fires on South Korean construction sites: a heat source (typically welding) and highly combustible materials (such as urethane foam or Styrofoam used for insulation) in close proximity during various stages of construction.
This condition is prevalent on South Korean construction sites, particularly during the final stages, when multiple construction activities take place simultaneously within confined building floors to reduce construction time and cost. However, it poses significant fire hazards and requires careful management to prevent such devastating incidents.
Given the dangerous nature of the aforementioned condition, it is crucial to avoid it as much as possible. The National Fire Protection Association (NFPA) in the US introduced the NFPA 51B regulation to prevent fires or explosions resulting from hot work, including welding, heat treating, grinding, and similar applications producing sparks, flames, or heat. This regulation ensures fire prevention during welding and hot work processes and is recognised in both the US and South Korea. NFPA 51B stipulates that there should be no combustible materials within an 11 m (or 35 ft) radius of any hot work, as shown in Figure 2. In South Korea, the Korea Occupational Safety and Health Standards Rules (Article 241) adopt the 11 m rule for welding, cutting, and brazing operations, in accordance with the safety requirements established by NFPA 51B. By adhering to Article 241, most fire incidents on construction sites are likely to be prevented. However, this regulation is often violated on many medium- or small-sized construction sites, leading to repeated catastrophic incidents in South Korea. This situation gave rise to the idea that recent advances in computer vision technology might be used to drastically reduce such catastrophic incidents. Object detection is a computer vision technology used to identify target objects in an image. It has the potential to enhance safety on construction sites through remote surveillance, enabling the detection of non-compliance with fire safety regulations.
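The 11 m rule above can be illustrated as a minimal sketch: given floor-plan positions for a hot-work location and a combustible material, flag a violation when they are within 11 m of each other. The function name and the coordinate values are hypothetical, for illustration only.

```python
import math

# Minimal sketch of the NFPA 51B 11 m (35 ft) rule: flag any combustible
# material located within an 11 m radius of a hot-work location.
# Positions are hypothetical floor-plan coordinates in metres.
HOT_WORK_RADIUS_M = 11.0

def violates_51b(hot_work_xy, combustible_xy, radius=HOT_WORK_RADIUS_M):
    """Return True if the combustible lies inside the hot-work radius."""
    dx = hot_work_xy[0] - combustible_xy[0]
    dy = hot_work_xy[1] - combustible_xy[1]
    return math.hypot(dx, dy) <= radius

# A welding spot at (0, 0) with Styrofoam stacked 8 m away violates the rule;
# the same material 15 m away does not.
print(violates_51b((0.0, 0.0), (8.0, 0.0)))   # True
print(violates_51b((0.0, 0.0), (15.0, 0.0)))  # False
```

In practice, the study below does not measure this distance directly; it approximates the hazard by detecting the coexistence of both object types in a single camera view.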
The field of object detection has witnessed significant development over the past 20 years, typically divided into two distinct periods: the traditional object detection period (prior to 2014) and the deep learning-based detection period (since 2014) [2].
During the traditional object detection period, computer vision engineers relied on handcrafted features such as edges, colours, and simple textures that were distinctive in each given image [3]. The selection of these features was based on the engineers' judgment and involved a lengthy trial-and-error process to determine the most effective features for different object classes [3]. Examples include the Viola-Jones detector [4], Histogram of Oriented Gradients (HOG) [5], and the Deformable Part-based Model (DPM) [6].
In 2012, AlexNet [7] introduced a multi-GPU training approach, enabling faster training of larger models. Since 2014, object detectors have undergone a rapid evolution by allocating substantial computational resources to the graphics processing unit (GPU) rather than the central processing unit (CPU). In the deep learning-based detection period, object detectors can be categorised as two-stage or one-stage detectors.
Recent studies have applied object detection to early-stage forest fire detection with high accuracy, distinguishing fire from fire-like objects (e.g., the sun) and detecting even small fires. Additionally, lightweight forest fire detection models have been developed for deployment on hardware devices such as CCTV. These applications typically employ one-stage detectors such as Yolov3 and SSD [19,20], EfficientDet [21], Yolov5 [22-24], and Deformable DETR [25]. Similarly, object detectors have been employed for fire detection in urban indoor and outdoor environments, including chemical facility fire detection using Yolov2 [26], fire and smoke detection using Yolov3 and Yolov2 [27,28], and indoor fire and smoke detection using Faster R-CNN and Yolov5 [29-31].
In the context of safety on construction sites, object detection has been utilised to detect fire ignition sources such as welding sparks, and fire safety equipment such as fire extinguishers and fire buckets, using models such as Yolov5 [32] and Yolov4 [33]. Although previous research [32,33] has focused on detecting ignition sources like welding sparks on construction sites, it has overlooked combustible materials such as urethane foam and Styrofoam, which have the potential to escalate fires on a large scale. The study in [33] introduced real-time object detection technology for identifying fires on construction sites, but primarily focused on detection after a fire occurs, without a prevention strategy before an occurrence of fire.
This study aims to detect fire risks by identifying the presence of combustible materials (urethane foam/Styrofoam) and ignition sources (welding sparks) on construction sites. For a rigorous detection of fire risk on construction sites, the distance between an ignition source and a combustible material needs to be identified. However, due to the technical challenges involved, as a first stage this study focuses on detecting the coexistence of an ignition source and a combustible material in a single camera view from a construction site using deep learning.
Two deep learning models, Yolov5 and EfficientDet, were chosen as candidate deep learning models, and their performances were compared for detecting sparks as ignition sources and urethane foam and Styrofoam as combustible materials.
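The coexistence criterion described above reduces to a simple rule over per-frame detections: flag a fire risk whenever at least one ignition-source class and at least one combustible-material class appear in the same camera view. The sketch below assumes an illustrative detection format (a list of label/confidence pairs) and class names; the actual output format of the trained models may differ.

```python
# Sketch of the coexistence rule: a frame is flagged as a fire risk when the
# detector reports at least one ignition-source class and at least one
# combustible-material class in the same camera view. Class names and the
# (label, confidence) detection format are illustrative assumptions.
IGNITION_CLASSES = {"spark"}
COMBUSTIBLE_CLASSES = {"urethane_foam", "styrofoam"}

def fire_risk(detections, conf_threshold=0.5):
    """detections: list of (label, confidence) pairs for one frame."""
    labels = {label for label, conf in detections if conf >= conf_threshold}
    has_ignition = bool(labels & IGNITION_CLASSES)
    has_combustible = bool(labels & COMBUSTIBLE_CLASSES)
    return has_ignition and has_combustible

frame = [("spark", 0.91), ("styrofoam", 0.78), ("person", 0.88)]
print(fire_risk(frame))  # True: spark and styrofoam coexist in one view
```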
This paper is structured as follows. Section 2 provides an overview of fire incidents on construction sites in South Korea. Section 3 discusses fire detection methods, highlighting their functionalities and characteristics. Section 4 presents a comparison of the performance of these methods. Section 5 shows the experimental results, followed by a conclusion summarising the key findings.

Fire Incidents on Construction Sites in South Korea
Statistical analysis was carried out to identify the ignition sources and combustible materials commonly found in fire incidents on construction sites in South Korea. A dataset comprising 93 large-scale fire incidents that occurred between 2000 and 2019 was collected from the Korea Occupational Safety and Health Agency (KOSHA). Figure 3 presents an overview of the ignition sources found in these incidents, showing sparks during hot work as the primary cause of fires.
Combustible materials in fire incidents on construction sites

Object Detection
Object detection has gained widespread adoption in various domains, including autonomous driving and video surveillance. Figure 5 shows the performances of the two state-of-the-art object detectors in terms of average precision (AP) on the Microsoft COCO dataset. Yolov5 and EfficientDet have demonstrated exceptional performance on the Microsoft COCO image dataset and have been extensively utilised in real-world applications [34]. Table 1 provides a summary of the performance of the two object detectors on custom datasets. Yolov5 tends to have slightly better performance than EfficientDet.

Yolov5
Yolov5 offers five types of neural networks depending on the complexity of the network (see Table 2). Yolov5n is the smallest and fastest neural network, suitable for various applications. Yolov5n/s/m are designed for mobile deployments, while Yolov5l/x are intended for cloud deployments. Larger models like Yolov5l and Yolov5x generally deliver better results across different scenarios but have more parameters, require more CUDA memory for training, and exhibit slower inference speeds.

EfficientDet
EfficientDet is an advanced object detector developed by the Google Brain Team, consistently outperforming previous approaches in terms of efficiency under various resource constraints. The architecture of EfficientDet comprises three main components: (1) Backbone: EfficientNet, (2) Neck: BiFPN, and (3) Head. One of the key features of EfficientDet is the use of feature fusion through a bidirectional feature pyramid network (BiFPN), which combines representations of input images at different resolutions [37]. This approach enables EfficientDet to achieve high accuracy with fewer parameters and fewer floating-point operations (FLOPs) [21]. EfficientDet offers pre-trained weights categorised from D0 to D7, with D0 having the fewest parameters and D7 the most [37].

Dataset Preparation
The image dataset used in this study comprised images and videos of welding sparks, urethane foam, and Styrofoam sourced from the Google and Naver search engines, as well as images obtained from the Korean AI integration platform (https://aihub.or.kr, accessed on 21 January 2023). Low-resolution or irrelevant images were removed manually from the search results. The numbers of images used in the four trials in this study are shown in Table 3. To achieve the maximum performance, four different model training trials were carried out, as discussed below.
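Throughout the trials, each image set is divided into training, validation, and test subsets in a 6:2:2 ratio (Table 3). A minimal sketch of such a split, with placeholder file names, might look as follows; the seed and helper name are illustrative assumptions.

```python
import random

# Sketch of the 6:2:2 dataset split used throughout the study.
# File names are placeholders; in practice each image would also have a
# matching bounding-box label file.
def split_dataset(image_paths, ratios=(0.6, 0.2, 0.2), seed=42):
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)  # shuffle reproducibly before splitting
    n = len(paths)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train = paths[:n_train]
    val = paths[n_train:n_train + n_val]
    test = paths[n_train + n_val:]
    return train, val, test

images = [f"img_{i:04d}.jpg" for i in range(1900)]  # e.g., the spark image set
train, val, test = split_dataset(images)
print(len(train), len(val), len(test))  # 1140 380 380
```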

Image Labeling Approach
Each image in the dataset had to be labeled with bounding boxes before it could be used in the training, validation, or test datasets. Typically, object detection is used to detect objects with a distinct shape, such as people, cups, or trees. However, detecting sparks and urethane foam poses a challenge, as their shapes are not well-defined. For example, the shape of a spark depends on how it is generated, i.e., welded, flame-cut, or ground, and the shape of urethane foam depends on the specific spot where it is sprayed. This creates uncertainty about how to label images for sparks and urethane foam. In addition, Styrofoam is prone to partial occlusion when stacked on construction sites.
Different image labeling approaches were explored and their Average Precision (AP) values were compared to determine the best approach.AP values were calculated using Yolov5s.
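For context, AP is computed by matching predicted boxes to ground-truth boxes by intersection over union (IoU); a prediction typically counts as a true positive when its IoU with a ground-truth box exceeds a threshold such as 0.5. A minimal IoU sketch, with boxes as (x_min, y_min, x_max, y_max) tuples:

```python
# Minimal intersection-over-union (IoU) computation, the box-matching
# criterion underlying AP evaluation. Boxes are (x_min, y_min, x_max, y_max).
def iou(box_a, box_b):
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap at all.
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Two 100x100 boxes overlapping by half share a third of their union.
print(iou((0, 0, 100, 100), (50, 0, 150, 100)))  # 0.333...
```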

Sparks
For labeling images of sparks, two different labeling approaches were used: individual labeling and whole labeling, as shown in Figure 6. The individual labeling approach assigns multiple bounding boxes to each image, as shown in Figure 6a, where the image was labeled with three bounding boxes. The whole labeling approach assigns a single bounding box covering all the sparks, as shown in Figure 6b. The 1900 spark images were split 6:2:2 into training, validation, and test datasets (Table 3). This yielded an average precision (AP) of 60.3% for individual labeling and 81.8% for whole labeling (Figure 6c). Notably, whole labeling outperformed individual labeling for sparks.
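Operationally, whole labeling can be derived from individual labels by merging the separate boxes into one box that covers them all, then converting to the normalised (x_center, y_center, width, height) format used in Yolov5 label files. The pixel coordinates below are hypothetical, for illustration.

```python
# Sketch of the "whole labeling" approach: merge individually labeled spark
# boxes into a single covering box, then normalise it to Yolov5's
# (x_center, y_center, width, height) label format. Coordinates are
# hypothetical pixel values.
def union_box(boxes):
    """boxes: list of (x_min, y_min, x_max, y_max) in pixels."""
    x_min = min(b[0] for b in boxes)
    y_min = min(b[1] for b in boxes)
    x_max = max(b[2] for b in boxes)
    y_max = max(b[3] for b in boxes)
    return x_min, y_min, x_max, y_max

def to_yolo(box, img_w, img_h):
    """Convert a pixel box to normalised centre/size coordinates."""
    x_min, y_min, x_max, y_max = box
    xc = (x_min + x_max) / 2 / img_w
    yc = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return xc, yc, w, h

# Three individually labeled spark regions collapse into one whole label.
sparks = [(100, 120, 180, 200), (160, 90, 260, 170), (240, 140, 320, 220)]
whole = union_box(sparks)        # (100, 90, 320, 220)
print(to_yolo(whole, 640, 480))  # single normalised label for a 640x480 image
```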

Urethane Foam
The same individual and whole labeling approaches were used for urethane foam. The individual labeling approach involved using more than 10 small bounding boxes per image, as shown in Figure 7a. The whole labeling approach employed 2-3 large bounding boxes per image, as shown in Figure 7b. The 114 images were split 6:2:2 into training, validation, and test datasets (Table 3). Figure 7c shows the average precision (AP) for urethane foam, comparing the individual and whole labeling results.
The AP achieved through individual labeling for urethane foam was 88.3%, while the AP for the whole labeling approach was 93.3%.The improvement in AP for the whole labeling approach can be attributed to the larger bounding box size.Therefore, to achieve a higher AP, it is important to include as much of the urethane foam area as possible within a bounding box.

Styrofoam
Styrofoam is frequently stacked in bulk quantities on construction sites, often leading to partial occlusion of the material. When labeling Styrofoam, it is generally considered best practice to label the occluded object as if it were fully visible, rather than drawing a bounding box solely around the partially visible portion, as shown in Figure 8a. The 1381 Styrofoam images were split 6:2:2 into training, validation, and test datasets (Table 3), and the trained model achieved an AP of 85.9%, as shown in Figure 8b.

Long-Distance Object Detection
The image dataset used so far consists only of images of near objects. In real applications, however, it is desirable to detect objects at longer distances. This section discusses the performance of the object detector on long-distance objects.
The performance of the trained Yolov5s model on long-distance objects was evaluated using a new test dataset containing only long-distance images. To enhance detection performance for long-distance objects, additional long-distance images were added to the training, validation, and test datasets. The model was then retrained on the updated image dataset, and its performance was evaluated on the updated test dataset.

Sparks
The spark dataset comprised 1520 images of short-distance sparks (Figure 9a), split 6:2:2 into training, validation, and test datasets, as shown in Table 3. A model trained solely on these images achieved an AP of 84.2% on the short-distance test dataset.
For performance evaluation on long-distance images, the test dataset was replaced by 304 new long-distance images (Figure 9b) and the AP was re-evaluated, resulting in an AP of 2.9%, significantly lower than the original AP of 84.2%, as shown in Figure 9c.
To enhance long-distance spark detection, 330 long-distance images were added to the training and validation datasets in a 6:2 ratio (Table 3). This improved the AP on the long-distance test dataset from 2.9% to 21%, as shown in Figure 9c.

Urethane Foam
The dataset contains 1518 short-distance images (Figure 10a), split 6:2:2 into training, validation, and test datasets, as shown in Table 3. After training on short-distance urethane foam images, an AP of 89.2% was achieved. When the short-distance test images were replaced with 304 long-distance urethane foam images (Figure 10b), the model achieved a lower AP of 40.7%, as shown in Figure 10c.

Styrofoam
The dataset of 824 images was split 6:2:2 into training, validation, and test datasets, as shown in Table 3. The model, trained on short-distance Styrofoam images (Figure 11a), attained an AP of 95.6% on the 163-image short-distance test dataset (Figure 11c). However, its performance dropped to an AP of 40.8% when tested on 163 long-distance Styrofoam images (Figure 11b).
To ensure better performance in long-distance object detection, it is of paramount importance that enough long-distance images are included in the dataset.

Performance of Yolov5 and EfficientDet
The performance of Yolov5 and EfficientDet was compared on the final dataset, as shown in Table 3. The dataset was constructed using the whole labeling approach and also includes short-, medium-, and long-distance images. Yolov5 and EfficientDet models of different sizes were all trained and their performance was evaluated, as shown in Figures 12 and 13 and Table 4. Yolov5 models were found to have slightly better APs, from 87% to 90%, than EfficientDet models, from 82% to 87%. However, Yolov5 was easier to train than EfficientDet, reaching convergence without the need to tune parameters such as the learning rate, batch size, and choice of optimization algorithm. In addition, EfficientDet tends to scale up the image size, resulting in higher memory consumption and slower training [46]. On the other hand, Yolov5's architecture is lightweight, allowing training with smaller computational resources and better cost-effectiveness. Figure 14 shows an example of fire risk detection on a construction site where Styrofoam is in close proximity to welding sparks. The trained Yolov5s model successfully identified sparks and Styrofoam at the same time in a single camera view. The developed fire risk detection model may be used as a proactive fire risk management tool on construction sites.

Conclusions
To reduce catastrophic fire incidents on construction sites in South Korea, object detection technology was employed for detecting the fire risk that an ignition source and a combustible material coexist in a single-camera view of a surveillance camera on a construction site. Two candidate deep learning models, Yolov5 and EfficientDet, were compared on their performance in detecting welding sparks (as an ignition source) and urethane foam and Styrofoam (as combustible materials).

• Improved Labeling for Enhanced Performance: To maximise the performance of the deep learning models in terms of mean average precision (mAP) for detecting fire risks such as sparks and urethane foam, higher mAPs were achieved by the labeling approach that encompassed the entire object(s) with relatively large bounding box(es). This labeling approach improved the detection mAP by around 15% for the given dataset.
• Improved Long-Distance Object Detection: To enhance long-distance object detection, the study highlighted the importance of including images from diverse scenarios with varying distances in the dataset. By incorporating long-distance images, the model's ability to detect fire risks was notably improved, increasing the detection mAP by around 28% for the given dataset.
• Best Model for Fire Risk Detection: In terms of fire risk detection performance, Yolov5 showed slightly better performance than EfficientDet for the given set of objects: sparks, urethane foam, and Styrofoam. Yolov5 was also easier to train, without the need to fine-tune hyperparameters such as the learning rate, batch size, and choice of optimization algorithm.
Future work will concentrate on enhancing fire risk detection by incorporating the distance between combustible materials and ignition sources. Utilising depth estimation to measure these distances will yield valuable insights into the level of fire risk. By classifying the level of fire risk based on distance, a more quantitative assessment of fire risks can be achieved on construction sites. After successful detection of a fire risk using the proposed approach, an alarm can be sent to safety managers on the construction site or to fire safety authorities, who can initiate appropriate action to manage the identified risk.

Figure 3 .
Figure 3. Ignition sources in fire incidents on construction sites in South Korea.

Figure 4
Figure 4 illustrates the combustible materials typically present on construction sites. Notably, urethane foam and Styrofoam constituted the majority of combustible materials present in the incidents. It can be seen that the coexistence of ignition sources such as welding sparks and combustible materials such as Styrofoam and urethane foam poses a significant risk of fires on construction sites.

Figure 6 .
Figure 6. Two labeling approaches and their performance on sparks. (a) Individual labeling. (b) Whole labeling. (c) AP.

Figure 7 .
Figure 7. Two labeling approaches and their performance on urethane foam. (a) Individual labeling. (b) Whole labeling. (c) AP.

Figure 14 .
Figure 14. Example of fire risk detection on the construction site.

Table 1 .
State-of-the-art object detection performance based on Yolov5, EfficientDet.

Table 3 .
Image datasets used in the study.

Table 4 .
Performance of Yolov5 and EfficientDet models.