Article

Determining the Origin of Multi Socket Fires Using YOLO Image Detection

1 Fire Safety Research Division, National Fire Research Institute of Korea, Asan-si 31555, Republic of Korea
2 Department of Data Informatics, National Korea Maritime and Ocean University, Busan 49112, Republic of Korea
3 Ocean Renewable Energy Engineering, National Korea Maritime and Ocean University, Busan 49112, Republic of Korea
4 Department of Artificial Intelligence Convergence, Pukyong National University, Busan 48513, Republic of Korea
5 R&D Center, Intown Co., Busan 08592, Republic of Korea
6 Department of Data Science, National Korea Maritime and Ocean University, Busan 49112, Republic of Korea
7 School of Computer Science and Engineering, Yeungnam University, Gyeongsan 38541, Republic of Korea
8 Division of Business Administration, Pukyong National University, Busan 48547, Republic of Korea
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Electronics 2026, 15(1), 22; https://doi.org/10.3390/electronics15010022
Submission received: 2 November 2025 / Revised: 10 December 2025 / Accepted: 11 December 2025 / Published: 22 December 2025

Abstract

In the Republic of Korea, fires caused by electrical devices are among the most frequent accidents, causing severe damage to human lives and infrastructure. The metropolitan police, the National Institute of Scientific Investigation, and the National Fire Research Institute conduct fire root-cause inspections to determine whether these fires originated from external or internal sources. However, obtaining results is a complex process, and the situation has been hampered by the lack of sufficient digital forensics tools and relevant programs. Apart from electrical devices, multi-sockets are among the main fire instigators. In this study, we aim to verify the feasibility of utilizing YOLO-based deep learning object detection models for fire-cause inspection systems for multi-sockets. In particular, we have created a novel image dataset of multi-socket fire causes with 3300 images categorized into three classes: socket, burnt-in, and burnt-out. These data were used to train various models, including YOLOv4-csp, YOLOv5n, YOLOR-csp, YOLOv6n, and YOLOv7-Tiny. In addition, we have proposed an improved YOLOv5n-SE by adding a squeeze-and-excitation (SE) network into the backbone of the conventional YOLOv5 network and deploying it in a two-stage detector framework, with a first stage of socket detection and a second stage of burnt-in/burnt-out classification. The performance of these models was evaluated experimentally, revealing that our model outperforms the others, with an accuracy of 91.3% mAP@0.5. The improved YOLOv5n-SE model was also deployed in a web browser application.

1. Introduction

Fire incidents are one of the most common accidents, inflicting substantial harm to human lives and infrastructure. Normally, fire outbreaks occur due to electrical devices, wiring cables, plugs, multi-socket outlets, and gas devices. Additionally, fires caused by electricity are significant, accounting for 27.4% of the total number of fire accidents, on average, from 1994 to 2021 in the Republic of Korea [1]. As a result, investigative authorities, such as the metropolitan police, the National Institute of Scientific Investigation, and the National Fire Research Institute, have carried out inspections to determine the sources of fire outbreaks. Identifying the source of a fire is of crucial importance, since it determines the root cause/culprit as stipulated by the laws of the Republic of Korea. Typically, finding the exact cause of a fire is a very complicated task, and the investigation process can last from 7 to 10 days. Additionally, fire-related investigations are primarily based on the post-fire remnant objects. The proposed method can help reduce the time it takes to survey fire causes. In the case of electrical fire incidents, fires originating from electrical multi-socket outlets have become frequent occurrences nationwide. However, there is a current lack of artificial intelligence programs or deep learning-based applications to assist with inspections of multi-socket fire origins. Previous works have mainly been aimed at classifying fires caused by electric wires using CNN-based deep learning models [2,3]. In this work, we have focused on post-fire multi-socket outlets as our research objective.
In addition, existing research presented an analysis of the electrical socket outlet carbonization related to an overcurrent (or internal fire cause) [4].
In our work, we have investigated post-fire carbonized electrical socket outlets from both external and internal fire sources. According to our observations, the condition of post-fire multi-socket outlets varies in terms of carbon-melting shape depending on the oxygen concentration. If the socket is damaged by an external fire, the oxygen concentration remains normal. However, if the socket is damaged by an internal fire, the oxygen concentration is lower than in the case of an external fire, resulting in more smoke and a different burnt shape. This phenomenon can be observed and captured during experiments on multi-socket fires. In the case of socket damage due to an internal fire, a lot of black smoke is initially produced before the outer shell of the socket melts, leaving a burnt section due to the melted plastic socket, with soot sticking to the side of the burnt area. In contrast, in the case of damage due to an external fire, the smoke is initially white before the socket melts, leaving a burnt section that burns inwards; the shape and char are also different. Based on these differences (melted area and carbonization), it is feasible to classify socket fires as being caused by internal or external sources.
In detail, 220 V multi-socket combustion experiments were conducted by the Korea Testing and Research Institute (KTR). Based on the video recordings of these experiments and images captured of post-fire outlets, we created our dataset to utilize deep learning-based object detection methods to precisely identify the location of a fire outbreak, whether it be internal or external. To assist in boosting the speed of inspection, we have verified the feasibility of using deep learning models to classify the causes of fire incidents in electrical multi-socket outlets based on the analysis of post-fire images and videos. The two main causes of fires considered are “burnt-in” (caused by internal fires) and “burnt-out” (caused by external fires). These two classes can be considered to be specific objects that can be detected and classified by deep learning-based object detection models from images or video. In our work, we have also considered the burned socket as an additional object to locate the region-of-interest areas (ROIs), according to which there are three categories, including “socket”, “burnt-in”, and “burnt-out” classes.
From the literature review [5], there are two main branches of state-of-the-art (SOTA) deep learning-based object detection methods, namely one-stage and two-stage detectors. The representative one-stage detectors include You Only Look Once (YOLO) [6], famous for its fast detection time. The YOLO family has evolved from its beginning, with YOLOv1, to the middle of 2023, with many popular variants such as YOLOv4 [7], YOLOv4-csp [8], and YOLOR-csp [9]. Among them, YOLOv5 [10], a PyTorch reimplementation built on YOLOv4, is the most popular version and served as a baseline for later versions such as YOLOv6 [11] and YOLOv7 [12]. In contrast, two-stage detectors prioritize accuracy over detection time, with the two most popular and well-known methods being the Region-based Convolutional Neural Network (R-CNN) [13] and Faster R-CNN [14]. These R-CNN models take longer and have a slower detection speed.
In this work, we focus on YOLO-based deep learning models to implement our assistance system for the task of inspecting fires caused by external or internal factors of the multi-socket. It is noted that there is no available dataset of post-fire multi-socket images with corresponding labels that annotate the bounding boxes of post-fire multi-sockets as being caused by internal or external fires. The main contributions of this work are as follows:
  • We create a novel dataset with annotations in PASCAL VOC of post-fire multi-socket outlets with 3300 images, including three categories: “socket”, “burnt-in”, and “burnt-out”.
  • We verify the feasibility of YOLO-based deep learning models, i.e., YOLOv4-csp, YOLOv5n, YOLOR-csp, YOLOv6n, and YOLOv7-Tiny for the classification task of identifying fire-causing reasons (internal as “burnt-in” versus external as “burnt-out” sources) in multi-socket outlets.
  • We propose an improved version of the conventional YOLOv5n by adding squeeze-and-excitation networks (SENet) into the existing YOLOv5 backbone, following a two-stage detector architecture instead of a one-stage detector, including a first stage of socket detection and a second stage of fire-causing classification into either the burnt-in or burnt-out categories.
  • We deploy trained YOLO weights on stand-alone web browser applications.

2. Related Research

Researchers have mostly studied fire and smoke detection using CNN-based deep learning models [15,16]. However, researchers have recently paid attention to classifying fire-originating factors based on the remaining post-fire objects, such as electrical wires. For example, an Android application by KESCO, named CUCU [17], as seen in Figure 1a, classifies various types of electrical wire short-circuit marks, as shown in Figure 1b. Doo-Hyun Kim et al. [4] analyzed the carbonization of electrical socket outlets caused by an overcurrent. The overcurrent shown in Figure 2a caused thermal damage, producing the carbonized shapes of the electrical socket outlets shown in Figure 2b.
In 2021, Jang-Hoon Jo et al. [2] presented a classification of the shapes of molten marks for analyzing the causes of electrical fires using a CNN-based deep learning model. The authors created a post-fire electrical wire dataset with two categories, "arc beads" and "molten marks", which a ResNet model classified with accuracies of 93.54% and 96.81%, respectively. In 2023, Hyeong-Gyoon Park et al. [3] introduced a comparison of the performances of several CNN-based deep learning models for electrical wire fire-cause analysis. The classified classes consisted of three categories, namely "primary arc bead", "secondary arc bead", and "molten mark". The authors compared four CNN-based models: GoogleNet, Inception V3, VGG16, and ResNet50. In their experiments, Inception V3 outperformed the other models, with an accuracy of 97.8% for the three-class fire-cause classification of post-fire electrical wires.
This related work provides some insights into our work; however, there is still much room for improvement. In our work, we analyzed post-fire sockets’ carbonized shapes instead of the short-circuit marks on electrical wires to find out if a fire was caused by an internal or external cause. In addition, the CNN-based deep learning models in previous studies are quite out of date, i.e., GoogleNet (2014) [18], ResNet50 (2015) [19], and InceptionV3 (2015) [20]; thus, we will focus on comparing SOTA models, including YOLOv4 (2021), YOLOv5 version 6 (2021), YOLOv6 (2022), and YOLOv7 (2022).

3. Research Methodology

Our overall research methodology is described in Figure 3, which includes our three main phases, those being image data processing, the evaluation of a dataset with a deep learning object detection model, and the deployment of a web browser application.
Firstly, a new image dataset will be generated based on our experiments due to the lack of open-source data. Therefore, burning experiments will be conducted on multi-sockets to capture images of post-fire multi-sockets for raw data collection. These multi-sockets will be burned from both external and internal positions (burnt-out and burnt-in). Training a deep learning object detection model requires image annotation, i.e., the manual labeling of raw data (unlabeled images) by drawing bounding boxes and assigning class labels. In the second phase, the new image dataset, called the External/Internal Fire Cause Dataset (EIFCD), will be evaluated using pre-trained YOLO-based deep learning object detection models, including YOLOv5, YOLOv6, YOLOv7, and the proposed YOLOv5-SE, with transfer learning used to achieve the best model weights for the EIFCD.
Later, in the deployment phase, the trained YOLOv5-SE model will be deployed in a web application (stand-alone architecture). The deployment phase comprises three main parts around the YOLOv5 model: the object detector, the source of input data (i.e., image, video, or live stream), and the web browser graphic user interface (i.e., display window). This work also verifies the feasibility of deploying these models on web browser applications following a stand-alone schema. The model should be lightweight, and the detection time should be fast enough to work in real time on devices with low processing power.

3.1. Data Collection

We conducted experiments to simulate real fire scenarios, as shown in Figure 4. The experimental setup included digital cameras, a thermal camera, a baseline multi-socket, a needle-flame tester (lighter), and a holder. In detail, the digital camera was positioned directly to the side and below the position of the multi-socket to record the burning process of the multi-socket. The thermal camera was also recording together with the original camera during the burning process. The fire needle was lit up to create a fire from the outside and inside of the multi-socket. The fire needle and socket were fixed in position by using a holder. It is noted that our experiments were performed with a burn chamber and following laboratory safety regulations.
Figure 5 illustrates the theoretical position of the needle-flame tester (lighter) in two simulated fire conditions: external cause of fire (burnt-out) and internal cause of fire (burnt-in). In Figure 5a, in the case of an external cause of fire, the fire lighter is positioned close to the outer surface of the multi-socket, such as the bottom and side surfaces. In Figure 5b, in the case of an internal cause of fire, the lighter is inserted into the multi-socket through a small hole that is created on the bottom surface.
We carried out experiments on 300 multi-socket samples, with 150 samples being burnt by external causes and 150 samples being burnt by internal causes. After performing the fire experiments, we collected information about the fire through video recording and collected post-fire information through images with two captured views (i.e., top view and front view). All multi-socket samples were of the same type, and we only utilized the images (videos) captured by the original camera, a NIKON Z7, to further investigate without using the thermal videos.
Figure 6 presents the status of the multi-socket in terms of external and internal fire causes during the performance of the fire experiment. The name tag (i.e., A-120) is situated on the surface of the multi-socket and is configured as follows. “A” stands for external fire cause, while “B” stands for internal fire cause, and the number refers to the serial number of the sample. Each case is shown in three statuses: initial fire, during fire, and post-fire. After the fire, the socket changes its form and burns, carbonizing where it comes in contact with the fire. In the case of multi-socket burning caused by an external fire, as in Figure 6a, the fire smoke is mainly white at first, and then the socket creates an external burnt area. On the other hand, an internal fire, as shown in Figure 6b, initially produces a lot of black smoke, and afterward, the outer shell of the socket melts, creating a burnt area due to the melted plastic socket with a hole left in the side of the burnt area. However, the shape and the burnt charcoal from the two causes are different.
After the fire experiments, each burned socket was captured with two views (the front view and top view), as shown in Figure 7. Regardless of the cause of the fire, the region was burnt and changed, manifesting in carbonization. However, the shape of the burnt socket and the amount of carbonization were different for the two causes of fires. With the external cause of fire, the carbonization tended to be less severe than with the internal cause of fire. According to these different signs (i.e., melted area and charred sockets), it is possible to detect the cause of the fire in the socket as deriving from an internal or external source through the post-fire evidence.
It is noted that after performing the fire-cause simulation experiments, we collected post-fire socket images. Each sample corresponds to two images (front and side views), meaning we obtained 600 raw images for the dataset. However, after manual evaluation, there were some faulty multi-socket samples, so only the 550 raw images that met the requirements were used to label the dataset, as presented in the Experiment Setup Section.
After receiving the raw images of the burned sockets, we annotated the post-fire socket images using the open-source labeling tool LabelImg (available online: https://github.com/tzutalin/labelImg/, accessed on 1 November 2025). The portable download link is publicly available at [21]. In Figure 8, the interface of the LabelImg tool is displayed, including the step-by-step labeling process, as follows.
Firstly, we selected the "Open Dir" function to open the directory of the image folder (i.e., dataset_burnt_in_socket). Then, the images were loaded into the LabelImg tool, as shown in the file list on the lower left side of the GUI window, while the current image was displayed on the main screen. The second important step was to select the desired format for the annotation files to be exported in. The LabelImg tool supports three export formats: PASCAL VOC, YOLO, and CreateML. In this work, we chose to export the annotated files in the PASCAL VOC format, i.e., as ".xml" annotation files.
Thirdly, we used the “Create RectBox” function to draw the bounding box, shown as a blue rectangle in Figure 8. It is noted that the bounding box has to fit, as much as possible, around the object at hand. In the next step, we selected the “Edit Label” function to input the label name using a keyboard (i.e., “burnt-in” class as shown in the middle-left side of Figure 8). It should be noted that we could have quickened the edit label step by selecting the “Use default label” option. For instance, our image dataset is labeled into three categories: the socket, burnt-in, and burnt-out classes. Thus, we can set up three default labels, denoted as “burnt-in”, “burnt-out”, and “socket”. Hence, we can select the label name by using the mouse instead of typing on the keyboard. In the final step, the labeled image was saved in the desired folder.
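For illustration, the snippet below is a minimal Python sketch of how the exported PASCAL VOC annotations could be read back and converted into the normalized box format used by YOLO-style trainers; the class list, file paths, and function names are our own and are not part of the LabelImg tool or of this work's pipeline.

```python
import xml.etree.ElementTree as ET

CLASSES = ["socket", "burnt-in", "burnt-out"]  # the three categories of the EIFCD

def read_voc_boxes(xml_path):
    """Parse one LabelImg PASCAL VOC file into (class_id, xmin, ymin, xmax, ymax) tuples."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        bb = obj.find("bndbox")
        xmin = int(float(bb.find("xmin").text))
        ymin = int(float(bb.find("ymin").text))
        xmax = int(float(bb.find("xmax").text))
        ymax = int(float(bb.find("ymax").text))
        boxes.append((CLASSES.index(name), xmin, ymin, xmax, ymax))
    return boxes

def voc_to_yolo(box, img_w, img_h):
    """Convert one absolute-pixel VOC box into the normalized YOLO (cx, cy, w, h) format."""
    cls, xmin, ymin, xmax, ymax = box
    cx = (xmin + xmax) / 2.0 / img_w
    cy = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return cls, cx, cy, w, h
```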

3.2. YOLO-Based Deep Learning Models

Figure 9 presents an overview of the architecture of the YOLO-based models, including (a) conventional YOLOv5, (b) our improved YOLOv5-SE, and (c) the model presented in this work, combining our proposed two-stage architecture with the improved YOLOv5-SE. In detail, to increase the detection and classification accuracy of the conventional YOLOv5, we propose two improvements: the Squeeze-and-Excitation Networks (SENet) attention module [22] is added into the backbone of the conventional YOLOv5, after the last C3 layer and before the SPPF layer, as can be seen in Figure 9b; and the detector is deployed following a two-stage architecture, with the first stage being socket detection and the second stage being burnt-in/burnt-out detection, as shown in Figure 9c.

3.2.1. Squeeze-and-Excitation Networks (SENet)

In this work, we add a SENet module into the existing YOLOv5 architecture, intending to enhance the classification performance of the model through channel-wise attention. The core of SENet is the Squeeze-and-Excitation (SE) block, which recalibrates feature responses channel-wise to emphasize the most informative features and suppress less useful ones, building on the idea that not all features extracted by the backbone network are equally important for the task at hand. An input $X$ is mapped to the feature map $U = [u_1, u_2, \ldots, u_C]$ by a convolutional transformation $F_{tr}$ with a learned set of filter kernels $V = [v_1, v_2, \ldots, v_C]$, where $v_c$ denotes the parameters of the $c$-th filter. Each output channel is given by
$$u_c = v_c * X = \sum_{s=1}^{C'} v_c^s * x^s$$
where $*$ denotes convolution and $v_c^s$ is a 2D spatial kernel representing a single channel of $v_c$ that acts on the corresponding channel of $X$.
The SE block involves two main operations: "squeeze", $F_{sq}(\cdot)$, which condenses global spatial information into a compact channel descriptor using global average pooling, and "excitation", $F_{ex}(\cdot, W)$, which models channel dependencies and generates scaling factors through a small fully connected neural network, as can be seen in Figure 10:
$$F_{sq}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)$$
$$F_{ex}(z, W) = \sigma\big(g(z, W)\big) = \sigma\big(W_2\,\delta(W_1 z)\big)$$
where $\delta$ is the ReLU function, $W_1 \in \mathbb{R}^{\frac{C}{r} \times C}$, and $W_2 \in \mathbb{R}^{C \times \frac{C}{r}}$.
Each channel of the input feature map is then rescaled by the corresponding element $s_c$ of the attention vector through the scaling operation $F_{scale}(\cdot, \cdot)$:
$$\tilde{x}_c = F_{scale}(u_c, s_c)$$
where $\tilde{X} = [\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_C]$ and $F_{scale}(u_c, s_c)$ denotes channel-wise multiplication between the scalar $s_c$ and the feature map $u_c \in \mathbb{R}^{H \times W}$.
Despite its additional parameters, the SE block is computationally efficient and can be seamlessly integrated into existing YOLO architectures.
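To make the recalibration concrete, the following is a minimal PyTorch sketch of an SE block implementing the squeeze, excitation, and scaling steps above; the reduction ratio of 16 and the 256-channel example input are illustrative assumptions rather than the exact configuration used in YOLOv5n-SE.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation block: global average pooling followed by a two-layer
    bottleneck that produces per-channel scale factors in (0, 1)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)                 # F_sq: (B, C, H, W) -> (B, C, 1, 1)
        self.excite = nn.Sequential(                           # F_ex: sigmoid(W2 * ReLU(W1 * z))
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        z = self.squeeze(x).view(b, c)          # channel descriptor z
        s = self.excite(z).view(b, c, 1, 1)     # attention vector s
        return x * s                            # F_scale: channel-wise recalibration

# Example: recalibrate a feature map the size of a late backbone stage (assumed 256 channels).
feat = torch.randn(1, 256, 20, 20)
print(SEBlock(256)(feat).shape)  # torch.Size([1, 256, 20, 20])
```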

3.2.2. Two-Stage Detector

As mentioned in related works, conventional YOLO-based models follow a one-stage detector. In this work, we propose a two-stage detector architecture with the first stage being socket detection and the second stage being fire-cause (burnt-in/burnt-out) detection, as explained in Algorithm 1.
Algorithm 1: Two-stage detection
In: I_S: input source (images or videos); W_SIO: YOLOv5-SE model weights
Out: P_SIO: predicted socket (S), burnt-in (I), and burnt-out (O) bounding boxes
1. M ← DetectMultiBackend(W_SIO)
2. if I_S = "images" then D ← LoadImage(S, img_size)
3. if I_S = "videos" then D ← LoadStream(S, img_size)
4. for each pair (P, P0) ∈ D do
5.   B̂ ← M(P)  # predicted boxes in resized coordinates
6.   for each index j and box b_j ∈ B̂ do
7.     b_j ← ScaleCoords(shape(P), b_j, P0[j])
8.     S_j ← P0[j][b_j]  # cropped socket region
9.     Î_j ← DetectBurntIn(S_j)
10.    Ô_j ← DetectBurntOut(S_j)
11.    I_j ← b_j ⊕ Î_j  # global burnt-in coordinates
12.    O_j ← b_j ⊕ Ô_j  # global burnt-out coordinates
13.    P_SIO ← (S_j, I_j, O_j)
14. return P_SIO
In the first stage, after the socket is detected, the socket bounding boxes will be extracted in the format of [xyxy] as [xmin, ymin, xmax, ymax] and scaled from the network size (img) to the input image’s (im0) resolution. Then, the detected socket will be cut out of the input image (im0) and resized to 640 × 640 as an input for the second stage of classification. After being classified as burnt-in or burnt-out, the bounding boxes of the classified fire will be updated to the input image’s (im0) resolution by adding the bounding boxes of the detected socket. Finally, the detector will return the bounding boxes and label information of the socket and fire-cause detection (“burnt-in” for internal cause of fire and “burnt-out” for external cause of fire). Utilizing a two-stage detector leads to more accurate detection because the YOLOv5-SE network can focus on smaller, specific areas of the image with higher resolution and context, thereby improving the quality of the final predictions. Additionally, with the more detailed classification step applied to the detected socket, a two-stage detector significantly reduces the number of false positives. The initial input image might include many regions that do not contain burnt-in/burnt-out areas (such as backgrounds or ambient noise). The second stage filters out these false positives through a more stringent and discriminative classification process, leading to more reliable detection results.
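The following Python sketch illustrates the cropping and coordinate re-mapping logic of this two-stage flow. It assumes two callable detector stages, socket_model and burn_model (hypothetical names, not the actual implementation), which each return boxes as (xmin, ymin, xmax, ymax, confidence, class) in the coordinates of the image they receive.

```python
import cv2  # OpenCV, used here only for cropping and resizing

def two_stage_detect(im0, socket_model, burn_model, stage2_size=640):
    """Stage 1: detect sockets on the full image. Stage 2: detect burnt-in/burnt-out
    inside each socket crop, then map the boxes back to full-image coordinates."""
    results = []
    for (x1, y1, x2, y2, conf, _cls) in socket_model(im0):          # stage 1: socket boxes
        crop = im0[int(y1):int(y2), int(x1):int(x2)]                 # cut the socket region out
        if crop.size == 0:
            continue
        ch, cw = crop.shape[:2]
        resized = cv2.resize(crop, (stage2_size, stage2_size))       # resize crop for stage 2
        for (bx1, by1, bx2, by2, bconf, bcls) in burn_model(resized):  # stage 2: burnt-in/out
            sx, sy = cw / stage2_size, ch / stage2_size              # undo the resize
            gx1, gy1 = x1 + bx1 * sx, y1 + by1 * sy                  # shift back to im0 coords
            gx2, gy2 = x1 + bx2 * sx, y1 + by2 * sy
            results.append(((x1, y1, x2, y2, conf),                  # socket box
                            (gx1, gy1, gx2, gy2, bconf, bcls)))      # burnt-in/out box
    return results
```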

3.3. Web-Browser Application Deployment

As mentioned previously, to detect multi-socket fire causes more conveniently, we deployed the trained YOLOv5n-SE model onto a web browser application that users can easily access. Our proposed web application for socket fire root-cause detection runs on a single device (serverless), with lightweight camera-equipped devices, such as mobile devices or mini-PCs, as the targeted deployment. The proposed stand-alone system consists of two main parts, a front-end Graphic User Interface (GUI) web browser and a back-end Socket Fire Causing Burnt-in/Burnt-out Detector (SFCBD), as shown in Figure 11. The front-end GUI web browser provides a display window and the selection of input data types (image, video, and live stream). The back-end SFCBD includes the trained model, which detects multi-socket fire causes and saves its results.
In detail, Figure 12 shows the interface of the stand-alone web application system with the Graphic User Interface of the Socket Fire Causing Burnt-in/Burnt-out Detector application, which consists of two main components: a menu on the left side and a monitor screen on the right side. Within the front-end web GUI, several functions have been implemented to assist users in interacting with the detector application.
Firstly, the mode selection function was created to support users in selecting the input data types that they wish to use. In this application, three input data types are supported, including “Upload Image”, “Upload Video”, and “Live Stream”. If a user selects “Upload Image” and “Upload Video”, there are two buttons: one for selecting a file, denoted as “Choose File”, and another for uploading data, denoted as “Upload Image/Video”. As for “Live Stream”, it entails clicking on the “Live Stream” button, which requires the device to connect to a webcam or camera for streaming.
Secondly, a monitor screen displays the output results by using three forms in an HTML template. To retrieve input data from our specified URL locally, two common HTTP methods (“GET” and “POST”) are utilized, supported by Flask. The monitor screen updates the output results of the Socket Fire Causing Burnt-in/Burnt-out Detector (SFCBD) application, and the detected image or detected image frame containing burnt-in/burnt-out socket will be saved into a default folder in the server for further investigation.
Users can easily choose types of input, such as image or video, in which to upload files, or livestream to capture the image with a camera in real-time. After processing, the detected results are displayed on the monitor screen. For instance, the right side of Figure 12a exhibits the GUI of the stand-alone system for detecting the cause of a fire from an image of the post-fire multi-socket, while the right side of Figure 12b showcases the detection from a video input. Regardless of the input format, the system identifies the cause of the multi-socket fire based on the carbonized shape of the socket shell after the fire. The monitor screen presents information about the fire’s cause, accuracy, and a red bounding box at the location of the burn from the fire on the multi-socket outlet.
In Figure 12a, the 'Upload Image' button is used to import the image, and the system marks the burned area with a red bounding box, categorizing it as 'burnt-out' with a detection confidence of 0.71. In Figure 12b, the input data are uploaded as a video through the 'Upload Video' button. The system detects the cause of fire in each socket in real time at a rate of 124 frames per second (fps), and the image frame is shown as 'burnt-out' with a confidence of 0.70.
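As a rough illustration of the 'Upload Image' path, the sketch below wires a single Flask route to a placeholder detector. The route, HTML template, form field, and folder names are assumptions for the sketch, and the detect() stub stands in for the real SFCBD back-end running the trained YOLOv5n-SE weights.

```python
import os
import cv2
from flask import Flask, render_template, request

app = Flask(__name__)
RESULT_DIR = "static/results"          # detected frames are kept here for later inspection
os.makedirs(RESULT_DIR, exist_ok=True)

def detect(image):
    """Placeholder for the back-end SFCBD call that draws burnt-in/burnt-out boxes."""
    return image  # a real deployment would run the trained detector here

@app.route("/", methods=["GET", "POST"])  # the GUI uses the GET and POST methods via Flask
def index():
    result_path = None
    if request.method == "POST" and "image" in request.files:
        upload = request.files["image"]                      # "Choose File" + "Upload Image"
        upload_path = os.path.join(RESULT_DIR, upload.filename)
        upload.save(upload_path)
        detected = detect(cv2.imread(upload_path))           # run the detector on the upload
        result_path = os.path.join(RESULT_DIR, "det_" + upload.filename)
        cv2.imwrite(result_path, detected)                    # keep the annotated frame on disk
    return render_template("index.html", result=result_path)  # assumed template name

if __name__ == "__main__":
    app.run(debug=True)
```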

4. Experiment and Analysis

4.1. Experiment Setup

The labeled images consist of 550 raw images, categorized into the three classes of "socket", "burnt-in", and "burnt-out". To improve the model's generalization performance by reducing overfitting and enhancing robustness, we applied augmentation methods using the "imgaug" library [23], including rotations and flips, as shown in Figure 13.
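The snippet below is a small sketch of the kind of imgaug pipeline described above (90, 180, and 270 degree rotations plus horizontal and vertical flips), applied to a dummy image and bounding box so that the labels move together with the pixels; the image and box values are illustrative, not taken from the EIFCD.

```python
import numpy as np
import imgaug.augmenters as iaa
from imgaug.augmentables.bbs import BoundingBox, BoundingBoxesOnImage

# Dummy post-fire image and one "burnt-out" box; real data would be loaded from the raw images.
image = np.zeros((480, 640, 3), dtype=np.uint8)
bbs = BoundingBoxesOnImage(
    [BoundingBox(x1=100, y1=120, x2=300, y2=260, label="burnt-out")],
    shape=image.shape,
)

# The geometric transforms named in the text: 90/180/270 degree rotations and flips.
augmenters = [
    iaa.Rot90(1), iaa.Rot90(2), iaa.Rot90(3),   # 90, 180, 270 degrees
    iaa.Fliplr(1.0), iaa.Flipud(1.0),           # horizontal and vertical flips
]

for aug in augmenters:
    image_aug, bbs_aug = aug(image=image, bounding_boxes=bbs)  # boxes move with the image
    print(aug.name, image_aug.shape, bbs_aug.bounding_boxes[0])
```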
For the detection of the socket fire cause based on post-fire images in this work, 80% of the images in the External/Internal Fire Cause Dataset (EIFCD) were randomly selected for training (2640 images), while the remainder was used for validation of the model (660 images), as shown in Table 1. For all training experiments, the input image size was 640 × 640, the batch size was 16, and the number of epochs was 30, with default hyperparameters, when training from scratch.
For this work, we conducted experiments on an NVIDIA GeForce RTX 3050 hardware platform, using the PyTorch framework and Python 3.8. Other configurations were CUDA 11.2, CuDNN 8.1, and PyTorch 1.12.1. To evaluate the performance of our object detection task, the following metrics were used: Precision, Recall, the Precision–Recall curve, the F1-confidence score, and mAP (mean Average Precision) [24]:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
$$AP = \int_{0}^{1} P(R)\,dR$$
$$\mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N} \int_{0}^{1} P_i(R)\,dR$$
where TP is True Positive (correctly predicted objects), FP is False Positive (incorrectly predicted objects), FN is False Negative (undetected objects), AP is the average precision (the area under the precision–recall curve for each class), and mAP is the mean of the AP values across all N classes.
In this work, there are three classes (N = 3): socket, burnt-in, and burnt-out. A higher mAP indicates better overall performance in detecting and correctly classifying objects across all classes in the dataset. The model size (number of parameters) and computational complexity (FLOPs) were chosen to compare the computational costs of the different models.
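As a worked illustration of these metrics, the following sketch computes precision, recall, and F1 from raw detection counts and approximates AP as the area under a precision–recall curve; the numbers are toy values, not results from this study.

```python
import numpy as np

def precision_recall_f1(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 from raw detection counts (equations above)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

def average_precision(recalls, precisions):
    """AP as the area under a precision-recall curve, sorted by recall."""
    order = np.argsort(recalls)
    return float(np.trapz(np.asarray(precisions)[order], np.asarray(recalls)[order]))

# Toy numbers for one class; mAP would average the per-class APs over N = 3 classes.
p, r, f1 = precision_recall_f1(tp=87, fp=9, fn=13)
ap = average_precision([0.0, 0.5, 0.87], [1.0, 0.95, 0.91])
print(round(p, 3), round(r, 3), round(f1, 3), round(ap, 3))
```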

4.2. Experiment Results

In this paper, we conducted a detailed comparative evaluation of several advanced YOLO-based models, using their open-source implementations to train them from scratch, namely YOLOv4-csp [25], the YOLOv5n baseline [10], YOLOR-csp [26], YOLOv6n [27], YOLOv7-Tiny [28], and our proposed two-stage detector with the improved YOLOv5n-SE. Detailed results are shown in Table 2, which illustrates each model's computational effectiveness and accuracy.
YOLOv4-csp comprises 2.29 million parameters and operates at 5.4 billion FLOPs. It achieves a mAP@0.5 of 65.4% and a mAP@0.5:0.95 of 33.6%. With a precision of 64.3% and a recall of 67.4%, it records an F1 score of 65%. In contrast, YOLOv5n is optimized for resource efficiency with 1.76 million parameters and 4.1 billion FLOPs and demonstrates a significant performance improvement. It achieves a mAP@0.5 of 82.9% and a mAP@0.5:0.95 of 46%, and it maintains a balanced performance, with a precision of 82.8% and a recall of 79%, resulting in an F1 score of 81%.
YOLOR-csp includes a parameter count similar to that of YOLOv4-csp, with 2.29 million, but operates with higher computational requirements, at 11.34 billion FLOPs. This model achieves a mAP@0.5 of 81.3% and a mAP@0.5:0.95 of 46.6%, along with a precision of 79.9% and a recall of 81%, leading to an F1 score of 80%. The YOLOv6n model, with its more substantial 4.63 million parameters and 11.34 billion FLOPs, exhibits superior performance metrics, including a mAP@0.5 of 87.8% and a mAP@0.5:0.95 of 55.4%. It also achieves a commendable F1 score of 84%, thanks to a precision of 86.3% and a recall of 82.3%. YOLOv7-Tiny, the most resource-intensive model in our comparison, with 6.02 million parameters and 13.2 billion FLOPs, excels in both precision (88.3%) and recall (87.6%). It achieves a mAP@0.5 of 74.3% and a mAP@0.5:0.95 of 52.6%, culminating in a high F1 score of 88%.
Our proposed model surpasses these benchmark models with only 1.80 million parameters and 4.2 billion FLOPs. Despite its relatively modest computational and parameter requirements, it significantly outperforms the other models, achieving a remarkable mAP@0.5 of 91.3% and an outstanding mAP@0.5:0.95 of 55.5%. Furthermore, it boasts a precision of 91.2% and a recall of 87%, resulting in the highest F1 score of 89% among all the models evaluated.
This indication is also verified by our qualitative visualization results, as can be seen in Figure 14, with Figure 14a (burnt-out) and Figure 14b (burnt-in) depicted from the top and side views of the socket. In the case of burnt-out classification, as shown in Figure 14a from the top view, YOLOv4-csp and YOLOv6n cannot classify the cause of the fire, and YOLOR-csp confuses the two classes; however, other models, such as the YOLOv5n baseline, YOLOv7-Tiny, and the model presented in this work, correctly classify these images. From the side view of the burnt-out case, all models pass the classification with quite good confidence. However, the YOLOv5n baseline suffers from multiple burnt-out localizations, which is corrected in the model presented in this work. There is a reversed trend in the case of burnt-in examples, as the side view seems to be easier to classify than the top view of the socket due to the differences in fire-cause location between the burnt-in and burnt-out cases. Figure 14b depicts the top view of the burnt-in case, where the YOLOv5n baseline again suffers from multiple localizations with two detected areas, while the other models correctly localize the fire-causing area. In contrast, only the YOLOv5n baseline and this work's model correctly classify one of the burnt-in test samples from the side view. From this evaluation, we can verify the effectiveness of this work's model compared to the conventional YOLOv5n, as it resolves the drawbacks of the YOLOv5n baseline and surpasses other models in terms of classification.

4.3. Ablation Study

To evaluate the contributions of various components and their enhancements of the conventional YOLOv5n architecture, we conducted an ablation study focusing on different model variations and their impacts on performance. We specifically evaluated the effects of integrating a two-stage detector and various attention mechanisms, including Coordinate Attention (CA) [29], the Convolutional Block Attention Module (CBAM) [30], Efficient Channel Attention (ECA) [31], and Squeeze-and-Excitation (SE) networks, into the YOLOv5n backbone, as shown in Table 3. Starting with the baseline YOLOv5n, which has 1.76 million parameters and 4.1 billion FLOPs, achieving a mAP@0.5 of 82.9%, we introduced a two-stage detector. This enhancement alone significantly boosted the mAP@0.5 to 90.4%, demonstrating improved precision (90.1%) and recall (86.9%), leading to an F1 score of 88%.
We further explored the impact of adding the attention mechanisms, Coordinate Attention (CA), the Convolutional Block Attention Module (CBAM), Efficient Channel Attention (ECA), and Squeeze-and-Excitation (SE) networks. Each attention module was integrated into the two-stage detection framework to assess its contribution. The CA + two-stage detector achieved a notable mAP@0.5 of 91.3%, with precision and recall metrics rising to 91.8% and 87%, respectively, resulting in an F1 score of 90%. Similarly, the CBAM and ECA integrations showed improvements, with mAP@0.5 reaching 91.0% and 89.9%, respectively.
Our proposed model, featuring the SE and two-stage detector, stood out by maintaining the same parameter count and computational load (1.80 million parameters and 4.2 billion FLOPs) while achieving the highest mAP@0.5 of 91.3% and an F1 score of 89%. This configuration effectively enhances object detection by recalibrating feature responses channel-wise, thus balancing precision (91.2%) and recall (87%) more efficiently. Overall, these results highlight the significance of integrating the second-stage detection and attention mechanisms in enhancing the precision and reliability of our improved model.

4.4. Analysis

4.4.1. Epochs Versus Overfitting

To evaluate the potential for overfitting in the trained models, we varied the number of training epochs from 0 to 200 while keeping the other parameters the same (i.e., the YOLOv5 model and an input image size of 640 × 640).
Figure 15 illustrates the loss function of the YOLOv5 trained model during 200 epochs in both the training and validation phases (i.e., train_loss, val_loss) with three functions: bounding box loss (i.e., box_loss), object loss (i.e., obj_loss), and classification loss (i.e., cls_loss). First of all, in both the training and validation loss functions, the values fall rapidly in the first 20 epochs and then gradually decrease until reaching 30 epochs. After 30 epochs, the obj_loss function of the two phases is different.
In the training phase, the training loss values become approximately zero at 200 epochs for all three functions, whereas in the validation phase, the objectness loss (i.e., val/obj_loss) initially decreases but starts to increase beyond 30 epochs. It should be noted that for the first 30 epochs, the values of all three validation loss functions (i.e., box, obj, and cls) are lower than the corresponding training losses. These results indicate that the YOLOv5 model detects effectively on our dataset without overfitting when trained for 30 epochs. The optimal number of epochs for training this dataset was therefore determined to be 30.
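One simple way to operationalize this observation is to monitor the validation objectness loss per epoch and stop once it no longer improves; the sketch below uses an illustrative patience value and a made-up loss curve, not the actual training logs of this study.

```python
def best_epoch(val_obj_loss, patience=10):
    """Return the epoch of the last best validation loss, abandoning training once no
    improvement has been seen for `patience` consecutive epochs."""
    best, best_ep = float("inf"), 0
    for epoch, loss in enumerate(val_obj_loss, start=1):
        if loss < best:
            best, best_ep = loss, epoch
        elif epoch - best_ep >= patience:
            break
    return best_ep

# Illustrative curve: val/obj_loss falls for ~30 epochs and then slowly creeps back up.
curve = [0.08 - 0.002 * e if e <= 30 else 0.02 + 0.0005 * (e - 30) for e in range(1, 201)]
print(best_epoch(curve))  # 30 with these made-up values
```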

4.4.2. Transfer Learning with a Pre-Trained Model Versus Training from Scratch

One approach to improving model generalization performance involves employing transfer learning. In this work, we utilized transfer learning with the pre-trained YOLOv5-SE model instead of starting from scratch on our EIFCD. To evaluate the transfer learning model, we compared the effectiveness of the pre-trained YOLOv5-SE model with that of a YOLOv5-SE model trained from scratch using the mAP@0.5 accuracy metric.
Both the pre-trained and trained-from-scratch YOLOv5-SE models were trained on our EIFCD for 30 epochs, utilizing the same training configuration. The results are presented in Figure 16. The mAP@0.5 of the YOLOv5-SE model trained from scratch fell behind that of the pre-trained YOLOv5-SE model by approximately 10% during training. At 30 epochs, the pre-trained model achieved a mAP@0.5 of about 0.9 (90%), whereas the model trained from scratch obtained about 0.8 (80%). These results indicate that using a pre-trained model enhances the detection accuracy of the model.
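The mechanism behind this comparison can be sketched generically in PyTorch: tensors whose names and shapes match a pretrained checkpoint are copied over, while newly added layers (such as an inserted SE block) keep their random initialization. The toy modules below stand in for the real YOLOv5n-SE and are not the actual implementation used in this work.

```python
import torch.nn as nn

# Toy stand-ins: "pretrained" has one conv layer, "extended" adds an extra (SE-like) layer.
pretrained = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1))
extended = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.Conv2d(16, 16, 1))

# Keep only checkpoint entries whose name and tensor shape also exist in the extended model.
source = pretrained.state_dict()
target = extended.state_dict()
compatible = {k: v for k, v in source.items() if k in target and v.shape == target[k].shape}

# strict=False lets the remaining (new) layers keep their random initialization.
result = extended.load_state_dict(compatible, strict=False)
print(f"transferred {len(compatible)} tensors, newly initialized: {result.missing_keys}")
```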

5. Conclusions

This study demonstrates the feasibility of applying an ANN model to support the examination of fire origins in multi-sockets by determining whether the source is internal (internal cause of fire) or triggered by external factors (external cause of fire). The proposed model can identify the cause of a fire with an accuracy of 91.3%. Specifically, the model achieved 94.4% accuracy in predicting fires caused by external sources (burnt-out) and 80.7% accuracy in predicting fires originating internally within the multi-socket (burnt-in). With these results, the proposed model can assist on-site personnel, such as firefighters, in performing an initial screening of the potential fire cause, thereby reducing the time required for preliminary estimation.
In order to build a trained model, we have introduced a novel labeled dataset comprising 3300 images along with annotation files, named the “External/Internal Fire Cause Dataset” (EIFCD). This dataset consists of the three labels, these being “Socket”, “Burnt-in”, and “Burnt-out”. Additionally, we verified the feasibility of applying state-of-the-art YOLO-based deep learning object detection models, including YOLOv4-csp, YOLOR-csp, YOLOv5n, YOLOv6n, and YOLOv7-Tiny, to perform the task of assisting fire-cause inspection systems. In addition, we proposed an enhanced version of the YOLOv5n architecture, named YOLOv5n-SE, by integrating a squeeze-and-excitation (SE) network into its backbone. This modification significantly boosts the model’s ability to focus on relevant features by adapting the feature recalibration dynamically. We deployed this improved model in a two-stage detection framework. The first stage focused on detecting sockets, while the second stage classified the condition of the sockets as either burnt-in or burnt-out. In the experimental results, our proposed YOLOv5n-SE demonstrated superior performance, achieving an accuracy of 91.3% as measured by mean Average Precision (mAP) at a 0.5 Intersection over Union (IoU) threshold. This performance metric highlights the robustness and accuracy of our model, surpassing other comparable models.
Future research may extend this study in several promising directions. First, benchmarking the proposed approach against two-stage detection frameworks, such as Faster R-CNN, would provide a more comprehensive comparison with YOLO-based single-stage models, enabling deeper insight into the trade-offs between accuracy, computational cost, and deployment feasibility. Secondly, integrating additional attention mechanisms and exploring more sophisticated feature fusion techniques could provide even greater performance improvements [32,33,34,35]. Furthermore, evaluating the system on typical failure cases and edge conditions represents a valuable strategy for identifying its current limitations and guiding targeted improvements. Additionally, incorporating specialized models such as CLIP for vision–language alignment or lightweight classification networks for binary image classification could further enhance the model’s performance in scenarios requiring fine-grained decision making [36]. Another important direction is dataset expansion. This includes collecting outlet samples from a wider variety of manufacturers and structural designs, as well as diversifying the fire-initiation mechanisms. For example, simulating an inner fire caused by an over-current or short-circuit overload would help produce a dataset that more closely reflects real-world fire incidents, where electrical overload is a primary cause of outlet ignition. Such extensions would improve both the robustness and generalizability of the proposed system.

Author Contributions

Conceptualization, H.-G.L. and T.-N.P.; Data curation, H.-G.L., T.-N.P., V.-H.N., K.-R.K., J.-H.H., J.-H.L. and Y.L.; Formal analysis, T.-N.P. and Y.L.; Funding acquisition, J.-H.H. and J.-H.L.; Investigation, H.-G.L. and V.-H.N.; Methodology, H.-G.L., V.-H.N., J.-H.H. and J.-H.L.; Project administration, J.-H.H. and J.-H.L.; Resources, T.-N.P.; Software, T.-N.P., J.-H.H. and Y.L.; Supervision, K.-R.K., J.-H.H. and Y.L.; Validation, H.-G.L., T.-N.P., V.-H.N. and K.-R.K.; Visualization, H.-G.L., V.-H.N., K.-R.K., J.-H.H., J.-H.L. and Y.L.; Writing—original draft, H.-G.L., T.-N.P., V.-H.N., K.-R.K., J.-H.H., J.-H.L. and Y.L.; Writing—review and editing, J.-H.H., J.-H.L. and Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Please contact the corresponding author for data requests.

Conflicts of Interest

Author Viet-Hoan Nguyen was employed by the company Intown. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Lee, H.-G.; Lee, J.-H.; Pham, T.-N.; Huh, J.-H. A Study on the Deposition Removal Technique Based on the Carbonization Morphology of Injection Molding Residue (Fundamental); Research Report; National Fire Research Institute of Korea, Fire Safety Research Division, Fire Research Institute of Korea: Hwaseong-si, Republic of Korea, 2023; pp. 1–118. (In Korean)
  2. Jo, J.H.; Bang, J.; Yoo, J.; Sun, R.; Hong, S.; Bang, S.B. A CNN Algorithm Suitable for the Classification of Primary and Secondary Arc-bead and Molten Mark Using Laboratory Data for Cause Analysis of Electric Fires. Trans. Korean Inst. Electr. Eng. 2021, 70, 1750–1758. [Google Scholar] [CrossRef]
  3. Park, H.; Bang, J.; Kim, J.H.; So, B.M.; Song, J.H.; Park, K.M. A Study on the Comparative Analysis of the Performance of CNN-Based Algorithms for the Determination of Arc Beads and Molten Mark by Model. J. Next-Gener. Converg. Technol. Assoc. 2023, 7, 543–552. [Google Scholar] [CrossRef]
  4. Kim, D.; Kim, S.; Kim, G. Analysis of Thermal Characteristics of Electrical Outlets Due to Overcurrent. J. Korean Soc. Saf. 2019, 34, 8–14. [Google Scholar]
  5. Zou, Z.; Chen, K.; Shi, Z.; Guo, Y.; Ye, J. Object Detection in 20 Years: A Survey. Proc. IEEE 2023, 111, 257–276. [Google Scholar] [CrossRef]
  6. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 779–788. [Google Scholar]
  7. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual Event, 13–19 June 2020; pp. 1–17. [Google Scholar]
  8. Wang, C.Y.; Bochkovskiy, A.; Liao, H. Scaled-YOLOv4: Scaling Cross Stage Partial Network. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 13024–13033. [Google Scholar] [CrossRef]
  9. Wang, C.Y.; Yeh, I.H.; Liao, H.Y.M. You Only Learn One Representation: Unified Network for Multiple Tasks. arXiv 2021, arXiv:2105.04206. [Google Scholar] [CrossRef]
  10. Jocher, G. YOLOv5. 2020. Available online: https://github.com/ultralytics/yolov5 (accessed on 1 November 2025).
  11. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications. arXiv 2022, arXiv:2209.02976. [Google Scholar] [CrossRef]
  12. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 7464–7475. [Google Scholar]
  13. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
  14. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  15. Chen, X.; Hopkins, B.; Wang, H.; O'Neill, L.; Afghah, F.; Razi, A.; Fulé, P.; Coen, J.; Rowell, E.; Watts, A. Wildland Fire Detection and Monitoring Using a Drone-Collected RGB/IR Image Dataset. IEEE Access 2022, 10, 121301–121317. [Google Scholar] [CrossRef]
  16. Muhammad, K.; Ahmad, J.; Lv, Z.; Bellavista, P.; Yang, P.; Baik, S.W. Efficient Deep CNN-Based Fire Detection and Localization in Video Surveillance Applications. IEEE Trans. Syst. Man Cybern. Syst. 2019, 49, 1419–1434. [Google Scholar] [CrossRef]
  17. Available online: https://www.youtube.com/watch?v=NtHppN7YmZM (accessed on 1 November 2025). (In Korean).
  18. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A.; Liu, W.; et al. Going Deeper with Convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
  19. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  20. Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A.A. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 2354–2362. [Google Scholar] [CrossRef]
  21. Available online: https://github.com/tzutalin/labelImg/ (accessed on 1 November 2025).
  22. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  23. Available online: https://imgaug.readthedocs.io/en/latest/index.html (accessed on 1 November 2025).
  24. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Springer International Publishing: Cham, Switzerland, 2014; pp. 740–755. [Google Scholar]
  25. YOLOv4-csp. Available online: https://github.com/WongKinYiu/ScaledYOLOv4 (accessed on 1 November 2025).
  26. YOLOR-csp. Available online: https://github.com/WongKinYiu/yolor (accessed on 1 November 2025).
  27. YOLOv6n. Available online: https://github.com/meituan/YOLOv6 (accessed on 1 November 2025).
  28. YOLOv7-Tiny. Available online: https://github.com/WongKinYiu/yolov7 (accessed on 1 November 2025).
  29. Hou, Q.; Zhou, D.; Feng, J. Coordinate Attention for Efficient Mobile Network Design. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 13708–13717. [Google Scholar]
  30. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  31. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11531–11539. [Google Scholar]
  32. Shao, D.; Liu, Y.; Liu, G.; Wang, N.; Chen, P.; Yu, J.; Liang, G. YOLOv7scb: A Small-Target Object Detection Method for Fire Smoke Inspection. Fire 2025, 8, 62. [Google Scholar] [CrossRef]
  33. Fu, J.; Xu, Z.; Yue, Q.; Lin, J.; Zhang, N.; Zhao, Y.; Gu, D. A multi-object detection method for building fire warnings through artificial intelligence generated content. Sci. Rep. 2025, 15, 18434. [Google Scholar] [CrossRef] [PubMed]
  34. Wang, H.; Fu, X.; Yu, Z.; Zeng, Z. DSS-YOLO: An improved lightweight real-time fire detection model based on YOLOv8. Sci. Rep. 2025, 15, 8963. [Google Scholar] [CrossRef] [PubMed]
  35. Luo, Z.; Xu, H.; Xing, Y.; Zhu, C.; Jiao, Z.; Cui, C. YOLO-UFS: A Novel Detection Model for UAVs to Detect Early Forest Fires. Forests 2025, 16, 743. [Google Scholar] [CrossRef]
  36. Hu, X.; Cao, Y.; Sun, Y.; Tang, T. Railway Automatic Switch Stationary Contacts Wear Detection Under Few-Shot Occasions. IEEE Trans. Intell. Transp. Syst. 2022, 23, 14893–14907. [Google Scholar] [CrossRef]
Figure 1. Electrical wire short-circuit mark classification.
Figure 2. Analysis of carbonization damage by overcurrent.
Figure 3. Research methodology.
Figure 4. Data collection setup, including camera, lighter, and multi-socket outlet position.
Figure 5. Theoretical lighter position in case of (a) external cause of fire (burnt-out) and (b) internal cause of fire (burnt-in).
Figure 6. The multi-socket status in terms of external and internal causes of fire during the performance of the experiment.
Figure 7. Example of images captured after fire experiments, being either burnt-out or burnt-in.
Figure 8. Interface of LabelImg tool and manual labeling step.
Figure 9. Overview of YOLO-based models: (a) conventional YOLOv5, (b) our improved YOLOv5-SE, and (c) this work, combining the two-stage detector with the improved YOLOv5n-SE.
Figure 10. Squeeze–excitation module architecture.
Figure 11. Unified modeling language diagram of the deployed web-browser graphic user interface application with the front-end and back-end architecture (* indicates input format).
Figure 12. Graphic user interface of the stand-alone application: (a) image input data, and (b) video input data.
Figure 13. Example of augmentation methods: rotation at 90, 180, and 270 degree angles, and horizontal and vertical flips.
Figure 14. Qualitative visualization results of different models in case of (a) burnt-out (external fire cause) and (b) burnt-in (internal fire cause) from top and side views.
Figure 15. Loss function of the YOLOv5 model.
Figure 16. Comparison of accuracy in mAP@0.5 of transfer learning using a pre-trained model or training from scratch.
Table 1. Dataset configurations.

Dataset: EIFCD (our dataset), 3300 images in total.
Split | Images | Burnt-in labels | Burnt-out labels | Socket labels
Training | 2640 | 963 | 1017 | 2640
Validation | 660 | 357 | 303 | 660
Table 2. Model comparison.

Models | Params (M) | FLOPs (G) | AP@0.5 Socket | AP@0.5 Burnt-in | AP@0.5 Burnt-out | mAP@0.5 | mAP@0.5:0.95 | P | R | F1
YOLOv4-csp | 2.29 | 5.4 | 88.1 | 44.4 | 62.9 | 65.4 | 33.6 | 64.3 | 67.4 | 65
YOLOv5n | 1.76 | 4.1 | 99.0 | 64.1 | 85.6 | 82.9 | 46.0 | 82.8 | 79.0 | 81
YOLOR-csp | 2.29 | 11.3 | 98.6 | 61.7 | 83.7 | 81.3 | 46.6 | 79.9 | 81.0 | 80
YOLOv6n | 4.63 | 11.3 | 99.3 | 73.4 | 90.7 | 87.8 | 55.4 | 86.3 | 82.3 | 84
YOLOv7-Tiny | 6.02 | 13.2 | 99.1 | 91.7 | 74.8 | 88.5 | 52.6 | 88.3 | 87.6 | 88
This work | 1.80 | 4.2 | 98.7 | 80.7 | 94.4 | 91.3 | 55.5 | 91.2 | 87.0 | 89
Table 3. Different model variation comparison.

Models | Params (M) | FLOPs (G) | mAP@0.5 | P | R | F1
YOLOv5n | 1.76 | 4.1 | 82.9 | 82.8 | 79.0 | 81
w/ two-stage detector | 1.76 | 4.1 | 90.4 | 90.1 | 86.9 | 88
w/ CA + two-stage detector | 1.81 | 4.2 | 91.0 | 90.4 | 87.0 | 89
w/ CBAM + two-stage detector | 1.81 | 4.2 | 91.0 | 91.8 | 87.1 | 89
w/ ECA + two-stage detector | 1.80 | 4.2 | 89.9 | 90.3 | 87.0 | 89
w/ SE + two-stage detector | 1.80 | 4.2 | 91.3 | 91.2 | 87.0 | 89
w/ SE + two-stage detector + transfer learning | 1.80 | 4.2 | 95.6 | 95.5 | 91.9 | 94
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
