A Study on a Complex Flame and Smoke Detection Method Using Computer Vision Detection and Convolutional Neural Network

Abstract: This study sought an effective detection method not only for flame but also for the smoke generated in the event of a fire. To this end, candidate flame regions were pre-processed using color conversion and corner detection, and candidate smoke regions were detected using the dark channel prior and optical flow. This eliminates unnecessary background regions and allows fire-related regions to be selected. Where a pre-processed region of interest was present, inference was conducted using a deep-learning-based convolutional neural network (CNN) to accurately determine whether it was flame or smoke. With this approach, the detection accuracy improved by 5.5% for flame and 6% for smoke compared with detecting fire through an object detection model without separate pre-processing.


Introduction
In a fire, engineering approaches to reducing the spread of smoke and flames involve compartmentalization, dilution, airflow, and pressurization [1]. These methods are very important for extinguishing a fire in its early stages, but they have the problem that they become responsive only after the fire has already begun to grow or has fully developed. To address this problem, this study attempts an image-based fire detection method. In particular, it aims to support an effective response by detecting both the flame and the smoke that may occur in the early stages of a fire. Detecting smoke, and not just flames, is very important, particularly considering that smoke generally harms human health more often than flames do directly. Smoke generated by a fire can affect the human body through high temperatures, lack of oxygen, and carbon monoxide. In addition to these direct factors, reduced visibility and the resulting psychological anxiety may adversely affect evacuation behavior [2,3].
To this end, many studies on fire detection based on artificial intelligence have recently been conducted. Among existing deep-learning-based computer vision studies on flame detection, Shen et al. [4] detected flames using a "you only look once" (YOLO) model based on TensorFlow, without separate filtering of the input images. In this case, adding image pre-processing would improve accuracy, because the unnecessary background area would be removed in advance, reducing false detections. Muhammad et al. [5] proposed a fire detection method that classifies an image as fire or non-fire in order to detect fire efficiently in resource-constrained environments. However, in this case, the judgment is made over the entire image, including unnecessary regions, rather than over the flame alone. Nguyen et al. [6] achieved 92.7% accuracy in detecting fire using a UAV-based object detection algorithm, while Jeon et al. [7] achieved a 97.9% F1 score by detecting fire using a CNN based on a multi-scale prediction framework. However, both of these studies detected only flame, and they suffer from a high false detection rate because there is no separate filtering method. Therefore, in this study, the disadvantages of the existing image-based fire detection methods were supplemented through color and dynamic characteristics. In particular, when detection is difficult because the shape is not constant, as with smoke, detection is facilitated through an appropriate pre-processing method. To supplement these fire detection methods, this study proposes a method capable of detecting flame and smoke in combination.
Therefore, as shown in Figure 1, an effective image pre-processing method was designed separately for flame and for smoke in the input image, so that objects unrelated to the fire can be filtered out in advance. A flame may be generated when a combustible gas, produced by the pyrolysis of a solid combustible material such as wood, mixes with air and combusts. A flame may also be generated by evaporative combustion, in which a combustible liquid evaporates and burns. For such flames, pre-processing was designed to detect the flame through its appearance-related characteristics. For this purpose, hue, saturation, value (HSV) color conversion and corner detection were used during image pre-processing. First, HSV color conversion detects the color region in which a flame is likely to exist. In addition, among the objects that remain after HSV color conversion, a flame has a sharp texture, which results in a large number of corners [8,9]. Based on this fact, when the Harris corner detector is applied to the HSV color-converted image, corners are generated intensively only in the flame, so that filtering can be performed more precisely. Therefore, following the color conversion and filtering, the region where the corners are gathered is detected as a flame candidate region.
In this study, both the dark channel prior and optical flow based on the Lucas-Kanade method were used to effectively pre-process smoke. The dark channel prior was originally proposed by He et al. [10] as an algorithm designed to remove haze from an image. In this study, however, its haze detection characteristics were used to detect the smoke region in the image. The dark channel prior rests on the observation that, in pixels where haze or smoke does not exist, at least one color channel among R, G, and B has a value close to 0; such a pixel is defined as a dark pixel. The smoke region detected through these features was additionally filtered using optical flow, based on the Lucas-Kanade method. This allows the smoke to be effectively detected by filtering the background through the dynamic characteristic that smoke moves in the upward direction. Optical flow is an important technique for analyzing the motion of an object in computer vision, and includes differential methods, matching methods, and phase-based methods. Although there are various optical flow techniques, in this study the smoke motion characteristics were detected using the Lucas-Kanade method [11]. Finally, a CNN was used to detect fire with higher accuracy and reliability in the pre-processed flame and smoke candidate regions. Among the CNN models, the Inception-V3 model was used for inference, and images related to flames and smoke were collected and configured as a training dataset.

Fire 2022, 5, x FOR PEER REVIEW 3 of 13

Flame Detection
The first image pre-processing step employed for flame detection in this study was HSV color conversion. The HSV color model is used to identify the color of objects in various applications beyond image pre-processing. The hue and saturation components are particularly useful because they resemble how humans perceive color, which makes the model well suited to developing image-processing algorithms. Hue represents the distribution of colors, measured relative to red, and saturation represents the degree to which white light is mixed into a color. Value represents the intensity of light. Because value is an independent component whose range can be controlled separately, algorithms can be made robust to lighting changes [12,13].
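As a hedged illustration of this color-based step, the sketch below converts each RGB pixel to HSV and builds a binary mask in which 1 marks a candidate flame pixel and 0 marks a non-flame pixel. The threshold values (`HUE_RANGE`, `MIN_SATURATION`, `MIN_VALUE`) and the function name `flame_color_mask` are hypothetical and are not the values used in this study. Pure Python is used for portability; a practical implementation would more likely rely on an image-processing library.

```python
import colorsys

# Hypothetical thresholds, NOT taken from the paper.
HUE_RANGE = (0.0, 60.0)   # degrees, red through yellow
MIN_SATURATION = 0.4      # in [0, 1]
MIN_VALUE = 0.6           # in [0, 1]


def flame_color_mask(rgb_image):
    """Return a binary mask: 1 where a pixel lies in the assumed flame color range, else 0.

    rgb_image is a list of rows; each row is a list of (r, g, b) tuples in 0..255.
    """
    mask = []
    for row in rgb_image:
        mask_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            hue_deg = h * 360.0
            in_range = (
                HUE_RANGE[0] <= hue_deg <= HUE_RANGE[1]
                and s >= MIN_SATURATION
                and v >= MIN_VALUE
            )
            mask_row.append(1 if in_range else 0)
        mask.append(mask_row)
    return mask


image = [
    [(255, 140, 0), (30, 30, 30)],   # orange, dark gray
    [(0, 0, 255), (255, 220, 40)],   # blue, bright yellow
]
print(flame_color_mask(image))
```

Only the orange and bright yellow pixels pass all three tests in this toy image, which also illustrates why a second filter is useful: a yellow object that is not a flame would pass the color test as well.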
In Equation (1), a pixel value of 1 indicates that the pixel at that image location falls within the color range in which a flame can exist, and pixels in this range are extracted as a candidate region. A pixel value of 0 means that the pixel is classified as non-flame.
Even after HSV color conversion, objects other than flames, such as light yellow objects, remain in the result. To filter these further, the Harris corner detector was used as the second image pre-processing step. Among the objects remaining after HSV color conversion, the flame has a sharp texture, which results in a large number of corners. Therefore, a region where corners are intensively generated is highly likely to be a flame, and such a region is detected as a candidate region.
First, for a reference point (x, y) in the image, the change produced when the window shifts by (u, v) from the reference point can be expressed as Equation (2). Here, I represents brightness, and (x, y) represents the points inside the Gaussian window W. The term shifted by (u, v) can be expanded using the Taylor series, as shown in Equation (3) below.
The first-order derivatives in the x and y directions, $I_x$ and $I_y$, can be obtained by convolution with the Sobel x kernel $S_x$ and the Sobel y kernel $S_y$, as shown in Figure 3. If Equation (3) is substituted into Equation (2), the result can be expressed as Equation (4).
If the matrix $M$ is defined from the windowed products of the derivatives $I_x$ and $I_y$, properties such as Equations (5) and (6) are satisfied.
Finally, Equation (7) allows us to determine the edge, corner, and flat. k is an empirical constant, and a value of 0.04 was used in this paper.
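For reference, the standard Harris corner formulation that matches the description above can be written as follows. The window weight $w(x, y)$ and the mapping of each line to equation numbers (2) to (7) are assumptions based on the surrounding text.

```latex
\begin{align}
E(u, v) &= \sum_{(x, y) \in W} w(x, y)\,\bigl[I(x + u, y + v) - I(x, y)\bigr]^2 \tag{2} \\
I(x + u, y + v) &\approx I(x, y) + I_x u + I_y v \tag{3} \\
E(u, v) &\approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix},
\qquad
M = \sum_{(x, y) \in W} w(x, y)
\begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} \tag{4} \\
\det(M) &= \lambda_1 \lambda_2 \tag{5} \\
\operatorname{tr}(M) &= \lambda_1 + \lambda_2 \tag{6} \\
R &= \det(M) - k\,\bigl(\operatorname{tr}(M)\bigr)^2 \tag{7}
\end{align}
```

Here $\lambda_1$ and $\lambda_2$ are the eigenvalues of $M$, so $R$ can be evaluated from the determinant and trace without computing the eigenvalues explicitly.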
Each pixel's location will have a different value, and the final calculated R(x, y) will be compared to the following conditions to distinguish between the edge, corner, and flat [14][15][16][17].

• When |R| is small, which happens when $\lambda_1$ and $\lambda_2$ are both small, the point belongs to a flat region;
• When R < 0, which happens when one of the eigenvalues $\lambda_1$ and $\lambda_2$ is much larger than the other, the region belongs to an edge;
• When R has a large value, the region is a corner.
Among these conditions, a large value of R corresponds to a corner, and Figure 4 visualizes the pixels that satisfy this condition.
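The three conditions can be sketched in a few lines of Python. The function names and the `flat_eps` threshold for "|R| is small" are hypothetical; only k = 0.04 comes from the text. Treating every clearly positive R as a corner is a simplification of "R has a large value".

```python
def harris_response(sxx, sxy, syy, k=0.04):
    """Corner response R = det(M) - k * trace(M)^2 for M = [[sxx, sxy], [sxy, syy]]."""
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace


def classify(r, flat_eps=1e-3):
    """Label a response value; flat_eps is an assumed threshold for a small |R|."""
    if abs(r) < flat_eps:
        return "flat"
    return "corner" if r > 0 else "edge"


print(classify(harris_response(1.0, 0.0, 1.0)))  # both eigenvalues large
print(classify(harris_response(1.0, 0.0, 0.0)))  # one dominant eigenvalue
print(classify(harris_response(0.0, 0.0, 0.0)))  # both eigenvalues near zero
```

With a diagonal M the eigenvalues are simply the diagonal entries, which makes the three cases easy to check by hand.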
Figure 4a is the pre-processing result for an image without a flame, and Figure 4b is the pre-processing result for an image in which a flame exists; the pixels that satisfy the corner condition in the HSV color-converted image are marked with green dots. In the non-flame image, many pixels are not filtered out by HSV color conversion alone, but corner detection confirms that almost no corners exist among them. In the flame image, corners are detected intensively in the region where the flame exists. This result shows that, when various objects exist in the image, the flame region alone can be effectively pre-processed. In addition, the region where these corners are clustered is used as a candidate region on which inference is performed by a deep-learning-based CNN.

Smoke Detection
If smoke occurs in a fire, it can cause negative physiological effects, such as poisoning or asphyxiation, which hinder evacuation and extinguishing activities. In addition, smoke reduces visibility, narrows the range of action available for evacuation, and can cause adverse effects such as the malfunction of fire alarm equipment. Therefore, detecting smoke early in the event of a fire is important. To this end, in this study the smoke region was detected using the dark channel prior and optical flow.
The dark channel refers to the property that, in an image without haze, at least one of the R, G, and B color channels has a low intensity value. The dark channel prior is an algorithm that removes haze based on this property, as proposed by He et al. When haze or smoke exists in the atmosphere, some of the light reflected from an object is lost on its way to the observer or camera, causing the object to appear blurred. This can be expressed as Equation (8) for a pixel x [10].
J(x) represents the undistorted pixel, I(x) represents the pixel that actually reaches the camera, and t(x) is the medium transmission, which has a value of 1 when the light reaches the camera completely, without haze or smoke. A is the air light, and it can be assumed that all pixels in the image share the same value of A. The dark channel at pixel x in the image can be expressed as Equation (9).
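For reference, the haze imaging model and the dark channel of He et al. [10], written with the symbols defined above, take the following standard form. The assignment to equation numbers (8) and (9) is an assumption based on the surrounding text.

```latex
\begin{align}
I(x) &= J(x)\,t(x) + A\,\bigl(1 - t(x)\bigr) \tag{8} \\
J^{dark}(x) &= \min_{y \in \Omega(x)} \Bigl( \min_{C \in \{r, g, b\}} J^{C}(y) \Bigr) \tag{9}
\end{align}
```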
Here, Ω(x) is the kernel centered on pixel x, and $C \in \{r, g, b\}$ indexes the color channels. Therefore, the case where the brightness value of at least one channel among the R, G, and B values within Ω(x) is very low is defined as $J^{dark}$ [10]. Using the characteristics of the dark channel, it is possible not only to effectively remove haze or fog from an image, but also to pre-process smoke regions that may occur during a fire. Figure 5 shows the result of thresholding based on the dark channel value of each pixel.
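A minimal sketch of the dark channel computation and the subsequent thresholding is given below. The window `radius`, the `threshold` value, and the choice to mark pixels with a high dark channel value as smoke candidates are assumptions for illustration, as are the function names; the thresholds used in this study are not stated here.

```python
def dark_channel(rgb_image, radius=1):
    """Minimum over the color channels and over a (2*radius+1)^2 window around each pixel.

    rgb_image is a list of rows; each row is a list of (r, g, b) tuples in 0..255.
    """
    height, width = len(rgb_image), len(rgb_image[0])
    # Per-pixel minimum over R, G, and B.
    channel_min = [[min(pixel) for pixel in row] for row in rgb_image]
    dark = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Window clipped at the image border.
            window = [
                channel_min[yy][xx]
                for yy in range(max(0, y - radius), min(height, y + radius + 1))
                for xx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            dark[y][x] = min(window)
    return dark


def smoke_candidate_mask(rgb_image, threshold=100, radius=1):
    """Mark pixels whose dark channel is high (hypothetical threshold) as smoke candidates."""
    return [
        [1 if value >= threshold else 0 for value in row]
        for row in dark_channel(rgb_image, radius)
    ]


clear = (200, 30, 10)     # saturated color: one channel is close to 0
smoke = (180, 180, 175)   # grayish: all channels are high
image = [[clear, clear, smoke, smoke], [clear, clear, smoke, smoke]]
print(dark_channel(image))
print(smoke_candidate_mask(image))
```

In the toy image, the smoke pixel adjacent to the clear region is not marked, because its window still contains a dark pixel; this is the expected effect of taking the minimum over Ω(x).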
The smoke region detected through the dark channel prior is filtered once more through the dynamic characteristics of the smoke. Combustion of solid fuels in a fire usually involves heating of the fuel itself or of the adjacent material that burns. As a result, hot volatile or flammable vapors are emitted, and the resulting fire plume and hot smoke rise above the surrounding cold air because of their lower gas density [18][19][20]. Therefore, to pre-process the image using these smoke flow characteristics, the upward motion of the smoke was detected using an optical flow algorithm. Optical flow estimates the motion of an object from the change in brightness between two adjacent images separated by a time difference [21,22].
Optical flow based on the Lucas-Kanade method relies on assumptions that do not deviate significantly from reality. Among them, brightness constancy is the most important assumption for the optical flow estimation algorithm. According to brightness constancy, the same part of a scene in two video frames separated by a time difference has the same, or almost the same, brightness value. Brightness constancy does not always hold in reality, but it rests on the principle that the brightness of an object changes little over the short time difference between image frames [23,24].
If the time difference between two adjacent images is sufficiently small, the following Equation (10) is established according to the Taylor series.
Assuming that dt is small, dy and dx, which represent the movement of an object, are also small, so there is no significant error even if the quadratic and higher-order terms are ignored.
As mentioned earlier, under the brightness constancy assumption, the new point is formed by moving (dx, dy) during the time dt, so that f(y + dy, x + dx, t + dt) at the new point is the same as f(y, x, t) at the original point. With $dy/dt = v$ and $dx/dt = u$, Equation (10) can be written as Equation (11).
This equation is a differential equation and is called the optical flow constraint equation or the gradient constraint equation. Although it constrains the motion of the object, the motion cannot be determined from it alone, because a single equation contains two unknowns, v and u. To determine these two unknowns, the Lucas-Kanade algorithm, a local computation method, is used. The Lucas-Kanade method solves the equation using the least squares method, as shown in Equation (12) below [25].
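For reference, the derivation described above can be written in the following standard form, where $f_y$, $f_x$, and $f_t$ abbreviate the partial derivatives of $f$ and $n$ is the number of pixels in the local window. The matrix notation $A$ and $b$ and the mapping to equation numbers (10) to (12) are assumptions based on the surrounding text.

```latex
\begin{align}
f(y + dy, x + dx, t + dt) &= f(y, x, t)
  + \frac{\partial f}{\partial y}\,dy
  + \frac{\partial f}{\partial x}\,dx
  + \frac{\partial f}{\partial t}\,dt + \cdots \tag{10} \\
\frac{\partial f}{\partial y}\,v + \frac{\partial f}{\partial x}\,u + \frac{\partial f}{\partial t} &= 0 \tag{11} \\
\begin{bmatrix} v \\ u \end{bmatrix} &= (A^{\top} A)^{-1} A^{\top} b,
\qquad
A = \begin{bmatrix} f_y(1) & f_x(1) \\ \vdots & \vdots \\ f_y(n) & f_x(n) \end{bmatrix},
\quad
b = -\begin{bmatrix} f_t(1) \\ \vdots \\ f_t(n) \end{bmatrix} \tag{12}
\end{align}
```

Equation (11) follows from Equation (10) by setting the two sides of the brightness constancy relation equal, dropping the higher-order terms, and dividing by $dt$.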
Here, i indexes the coordinates of all pixels, and the optical flow is calculated based on the derivative values calculated at each pixel. The smoke area can be distinguished from the change in the direction of optical flow by manually setting the threshold T. Figure 6 depicts a scene in which the smoke flow is detected using optical flow in images containing smoke. The optical flow algorithm calculates the vector change of the object only for the area where smoke is expected to exist, as extracted through the dark channel feature, and not for the entire input image.
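The least-squares step can be sketched for a single window as follows. The function name `lucas_kanade_flow`, the list-based inputs, and the singularity tolerance are hypothetical; the derivative values are assumed to have been computed beforehand for the pixels of one window.

```python
def lucas_kanade_flow(fx, fy, ft):
    """Least-squares solution of fx*u + fy*v + ft = 0 over the pixels of one window.

    fx, fy, ft are equal-length lists of spatial and temporal derivative values.
    Returns (u, v), or None when the 2x2 normal-equation system is singular.
    """
    sxx = sum(a * a for a in fx)
    syy = sum(b * b for b in fy)
    sxy = sum(a * b for a, b in zip(fx, fy))
    sxt = sum(a * c for a, c in zip(fx, ft))
    syt = sum(b * c for b, c in zip(fy, ft))
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        return None  # gradients point in one direction only; flow is not determined
    # Solve [[sxx, sxy], [sxy, syy]] [u, v]^T = [-sxt, -syt]^T with the 2x2 inverse.
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


# Synthetic window whose temporal derivatives are consistent with u = 0.5, v = -1.0.
fx = [1.0, 0.0, 1.0, 2.0]
fy = [0.0, 1.0, 1.0, -1.0]
ft = [-(a * 0.5 + b * -1.0) for a, b in zip(fx, fy)]
print(lucas_kanade_flow(fx, fy, ft))
```

The singular case corresponds to a window in which all gradients are parallel, where a single direction of brightness change cannot separate u from v.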
The information obtained in this way includes the angle $\theta_i$, as shown in Equation (13).
According to Equation (13), $\theta_i(X, Y)$ is the direction of the optical flow vector of the pixel at position (X, Y) in the i-th frame, and dx and dy are the motion flow vectors of the row and column, respectively. Among the vectors obtained here, the areas that moved in a direction between 45 degrees and 135 degrees can be retained by filtering, as shown in Equation (14). Through these pre-processing steps, a region with a high probability of being smoke can be used as a candidate region, and predictions can be made on it through a deep-learning-based CNN.
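The direction filter can be sketched as below. The use of `atan2`, the function names, and the sign convention are assumptions: the vertical component is taken as positive for upward motion, so with image row coordinates, which grow downward, the caller would pass the negated row displacement.

```python
import math


def flow_angle_degrees(dx, dy_up):
    """Direction of a flow vector in degrees in [0, 360), counterclockwise from the +x axis.

    dy_up is assumed to be positive for upward motion.
    """
    return math.degrees(math.atan2(dy_up, dx)) % 360.0


def is_rising(dx, dy_up, low=45.0, high=135.0):
    """True when the flow direction lies in the 45 to 135 degree band named in the text."""
    return low <= flow_angle_degrees(dx, dy_up) <= high


print(is_rising(0.0, 1.0))    # straight up
print(is_rising(1.0, 0.0))    # horizontal
print(is_rising(0.0, -1.0))   # straight down
```

Using `atan2` instead of a plain arctangent of dy/dx avoids division by zero for vertical motion and distinguishes upward from downward vectors.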

Inference Using Inception-V3
In order to detect fire with higher accuracy in the region of interest obtained during image pre-processing, a CNN was constructed in this study as the last step, to finally determine whether a fire has occurred. CNNs are used in related computer vision tasks, such as image classification, object detection and recognition, and image matching. In addition, models have evolved from the simple neural networks of the past into complex and deep networks.
When training through deep learning, high precision is commonly obtained by using deep layers and wide nodes. However, in this case, the number of parameters and the amount of computation increase considerably, and over-fitting or vanishing gradient problems can occur. Therefore, the connections between nodes are made sparse while the matrix operations are kept dense. Reflecting this, the inception structure in Figure 7 makes the overall network deep but not difficult to operate.
Therefore, the Inception-V3 model has the advantage of having deeper layers than other CNN models without a relatively large number of parameters. Table 1 shows the configuration of the CNN layers built using Inception modules. The size of the input image was set to 299 × 299, and a reduction layer was added between the inception modules [26][27][28]. Most CNNs use a pooling layer for this purpose, but the reduction layer is constructed to solve the representational bottleneck problem. Finally, softmax was used as the activation function of the final layer, because the task is a classification problem over flame, smoke, and non-fire. In addition, the dataset used for training this CNN model is shown in Table 2.
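The role of the final softmax layer can be illustrated with a small sketch. The class order in `CLASSES`, the function names, and the example logits are hypothetical; the sketch shows only how three raw scores are turned into class probabilities, not the trained network itself.

```python
import math

CLASSES = ("flame", "smoke", "non-fire")  # assumed ordering of the three outputs


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return the most probable class label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]


label, confidence = predict([2.0, 1.0, 0.1])
print(label, round(confidence, 3))
```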
The dataset in Table 2 was divided at a ratio of about 8 to 2 into a training set and a test set, the latter being used to evaluate how well the intermediate model had learned during training. Training was ended at 5000 steps, where accuracy and loss had converged and no longer changed significantly. The training and test images were obtained from Kaggle and CVonline, which are public sources available for research use.
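An 8 to 2 split of this kind can be sketched as follows. This is only an illustration under stated assumptions: the function name, the fixed random seed, and the file names are hypothetical, and the text does not say whether the split was shuffled or stratified by class.

```python
import random

def split_dataset(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of items and split it into train and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical image file names standing in for the real dataset.
image_paths = [f"img_{i:04d}.jpg" for i in range(100)]
train_set, test_set = split_dataset(image_paths)
```

In practice the split would usually be applied per class so that flame, smoke, and non-fire images keep the same proportions in both sets.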

Table 1 (final rows): average pooling, 8 × 8 × 2048; fully connected, 1 × 2048; softmax, 3.

Experimental Results
Images of flame and smoke that may occur in fires were pre-processed using appearance characteristics and classified through an Inception-v3 model based on a CNN. Figure 8 visualizes the final detection of the flame region from the test images. If the candidate region detected through pre-processing is judged to be a flame, it is drawn with a red bounding box; if it is an object not related to fire, it is drawn with a green bounding box. Figure 9 visualizes the detection of smoke from the input images in the same way: a red bounding box marks a region inferred to be smoke, and a green bounding box marks an object unrelated to fire.
Accuracy, precision, recall, and F1 score were calculated to evaluate the experimental results objectively, where TP is the number of true positives, FP the number of false positives, FN the number of false negatives, and TN the number of true negatives. The relationships among them are given below. Accuracy and precision were obtained via Equations (15) and (16), recall via Equation (17), and the F1 score, which is the harmonic mean of precision and recall (detection rate), via Equation (18).

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{15}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{16}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{17}$$

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{18}$$
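As a minimal sketch of the four metrics referred to as Equations (15) to (18), the following Python function uses their standard definitions. The function name and the example counts are assumptions: the raw counts are not reported in the text, and the values below are hypothetical, chosen for 250 positive and 250 negative frames.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion counts.

    Assumes every denominator is non-zero (at least one predicted
    positive and at least one actual positive).
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical counts, not taken from the paper.
metrics = classification_metrics(tp=245, fp=15, fn=5, tn=235)
```

With these illustrative counts the function gives an accuracy of 96.0%, a precision of about 94.2%, a recall of 98.0%, and an F1 score of about 96.1%, which shows how a small number of false positives lowers precision while recall stays high.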
The performance evaluation was conducted using five videos featuring flames, five videos with smoke, and five videos not related to fire. Because the detection method uses optical flow, performance should be evaluated on continuous images, that is, on videos rather than single still images. Therefore, 50 frames in which inference was performed on the object were extracted from each test video, and the results were calculated from these frames. The flame and non-fire videos were evaluated in the same way. Moreover, for comparison with the model presented in this study, the deep-learning-based object detection models Single Shot MultiBox Detector (SSD) [29] and Faster R-CNN (Region-based Convolutional Neural Network) [30] were used.
The flame detection results are shown in Table 3. For the model presented in this study, the accuracy was 96.0%, the precision was 94.2%, and the recall and F1 score were 98.0% and 96.1%, respectively. The accuracy of SSD was 89.0% and that of Faster R-CNN was 92.0%, so the accuracy of the flame detection algorithm presented in this study was relatively high. The frequency of false positives, in which non-flame objects are mis-detected as flames, decreased by more than 10% compared with the other models, which contributed greatly to the overall increase in precision.
The smoke detection results were similar. For the model presented in this study, the accuracy was 93.0%, the precision was 93.9%, and the recall (detection rate) and F1 score were 92.0% and 92.9%, respectively. The accuracy of SSD was 85.0% and that of Faster R-CNN was 89.0%, as shown in Table 4.
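The 5.5% and 6% accuracy improvements quoted for this method are not derived explicitly in the text. One reading that reproduces both figures, offered here as an assumption rather than as the authors' stated procedure, is the difference in percentage points between the proposed model's accuracy and the mean accuracy of the two comparison models. The function name below is hypothetical.

```python
def gain_over_baselines(proposed, baselines):
    """Percentage-point gain of proposed over the mean baseline accuracy."""
    return proposed - sum(baselines) / len(baselines)

# Accuracies in percent: proposed model versus SSD and Faster R-CNN.
flame_gain = gain_over_baselines(96.0, [89.0, 92.0])
smoke_gain = gain_over_baselines(93.0, [85.0, 89.0])
```

Under this reading, the flame gain is 96.0 minus 90.5 and the smoke gain is 93.0 minus 87.0, so both improvements are percentage-point differences rather than relative changes.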

Conclusions
In this study, a pre-processing method was presented for detecting both the flames and the smoke that may occur in the early stages of a fire. To this end, color-based and optical flow methods were used, and to make a precise judgment on each detected candidate region, inference was performed using a deep-learning-based CNN. Through this approach, it was possible to reduce false detections caused by unnecessary background regions while improving the accuracy of fire detection. Tests of the proposed flame detection method found that accuracy was improved by 5.5% compared to the object detection models without separate pre-processing. The proposed smoke detection method, which uses the dark channel feature and optical flow, improved accuracy by 6% compared to the other object detection models. In future studies, a CNN that can accurately detect objects with irregular shapes, such as flames and smoke, will be developed or improved. Research will also pursue an intelligent fire detector that can be applied to low-specification systems and can easily perform real-time detection by supplementing the pre-processing methods. Finally, we will study a method that can accurately detect fire even from small characteristics in images and develop a fire detection model with higher reliability than can be achieved by human observation.