Full-Scale Fire Smoke Root Detection Based on Connected Particles

Smoke is an early visual phenomenon of forest fires, and detecting it promptly is of great significance for early warning systems. However, the accuracy of most existing smoke detection algorithms varies with the distance between the camera and the smoke. This paper proposes a new smoke root detection algorithm that integrates the static and dynamic features of smoke and locates the final smoke root using clustering and the circumcircle of the clustered candidate points. Compared with existing methods, the proposed method achieves higher accuracy and detection efficiency across the full range of distances. This indicates that it is widely applicable to the rapid detection of forest smoke and to preventing the spread of potential forest fires.


Introduction
For smoke detection in open outdoor spaces, methods based on chemical or optical sensors are limited because such sensors are usually local. In recent years, with advances in machine vision and increasing investment in forest fire prevention in many countries, vision-based smoke detection methods have become more popular. With the assistance of vision-based forest fire monitoring, the number of fires, the affected areas, and the property losses worldwide have been decreasing year by year [1]. However, the forest environment is complex and contains many sources of interference that can trigger false fire detections. To reduce false detections and strengthen early warning of forest fires, robust and accurate detection of smoke, the most important visual phenomenon in the early stages of a forest fire, is urgently required [2].
Vision-based smoke detection methods can be divided into two categories: traditional methods and deep learning methods. Traditional methods classify images using image processing feature descriptors and computations on those descriptors, relying mainly on hand-crafted features such as color [3], texture [4], and other image information. Common image processing techniques include optical flow [5], wavelet energy [6], and background subtraction [7], while commonly used feature descriptors include the local binary pattern (LBP) [8], the histogram of oriented gradients (HOG) [9], discrete wavelet transforms [10], and redesigned or improved variants of these descriptors. For instance, Liu et al. [11] proposed an LBP operator based on centrosymmetric gradient compensation, and Wang et al. [12] used the color and diffusion characteristics of smoke to define a time window and determined whether smoke was present from the slope of a fitted curve. For video smoke detection, the feature descriptors may also include dynamic features between video frames [13].
Deep learning methods automatically extract features with a neural network after the image has been preprocessed. For image smoke detection, Zheng et al. [14] compared several object detection networks and found that EfficientNet achieved the highest average detection accuracy. For video smoke detection, Lin et al. [15] used 3D networks to detect smoke in video, and Ren et al. [16,17] made significant progress in image dehazing for fog. However, fog is distributed uniformly over an area, whereas smoke is distributed randomly, so the two call for different detection methods. Moreover, deep learning smoke detection requires a large amount of training data, which greatly limits its application. In addition, deep learning for smoke detection is still at a developmental stage: it relies mainly on low-level cues, rarely exploits temporal cues, and is rarely compared with hand-crafted methods.
Because the accuracy of existing traditional smoke detection methods is difficult to improve and hand-designed features generalize poorly, these methods often have a high false negative rate or a high false positive rate. To address this challenge, Gao et al. [18] proposed using smoke roots as a feature for smoke detection and developed a method based on fluid mechanics to detect smoke roots in videos. To adapt this method to long-distance scenes, Gao et al. [19] combined it with maximally stable extremal regions (MSER) to make the contour and shape of the smoke area more visible. Lou et al. [20] reduced the number of candidate smoke root points, thereby improving the computational efficiency of the smoke simulation, but the detection speed still needs improvement.
The smoke root is an important feature for distinguishing smoke from smoke-like objects. However, smoke root detection algorithms are still being explored, and both their speed and their accuracy need further improvement. Generally, the challenge of smoke root detection lies in accurately obtaining and defining the smoke root points. To address this, we use a pixel-level fusion algorithm to obtain the complete contour of the smoke, from which the exact smoke root point is calculated. More importantly, we redefine the smoke root and divide the videos according to the distance between the camera and the smoke source, so that the smoke root definition applies universally. Specifically, we develop a new smoke root detection method based on connected particles that is insensitive to the distance between the smoke and the lens; it avoids false and missed detections caused by distance and improves robustness to scene changes. A comparison with Gao's method [19] shows that the new method significantly speeds up smoke root detection.

Methodology
As shown in Figure 1, the smoke root detection algorithm proposed in this paper consists of five stages: (1) dynamic candidate region extraction, (2) static candidate region extraction, (3) region fusion, (4) skeleton point extraction, and (5) smoke root calculation.

Dynamic Candidate Region Extraction
Since the background of a forest fire monitoring video usually does not change, moving objects such as smoke enter the scene and form the foreground when a fire occurs; background modeling can therefore be used to extract the smoke generated in the early stages of a fire. Existing background modeling methods include CodeBook [21], SACON [22], and ViBe [23]. In practical scenes, the extraction of dynamic regions may be disturbed by lighting, moving leaves, and weather, so choosing an appropriate background modeling method is crucial for achieving the required detection accuracy. In this paper, background subtraction based on the Gaussian mixture model (GMM) [24] is adopted to model the video sequence, because it can effectively suppress interference from weather, leaves, and lighting changes.
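To make the dynamic-region step concrete, the following is a minimal sketch of per-pixel GMM background subtraction in the Stauffer-Grimson style, written in NumPy for grayscale frames. It is not the authors' implementation. The class name PixelGMM and all parameter values (three components, learning rate 0.01, 2.5-sigma match test, background weight fraction 0.7) are illustrative assumptions. In practice a library implementation such as OpenCV's cv2.createBackgroundSubtractorMOG2 would typically be used instead.

```python
import numpy as np


class PixelGMM:
    """Per-pixel Gaussian mixture background model for grayscale frames.

    Simplified Stauffer-Grimson-style sketch; k, alpha, the 2.5-sigma match
    test, bg_ratio and the variance limits are assumed values.
    """

    def __init__(self, shape, k=3, alpha=0.01, init_var=225.0, bg_ratio=0.7, min_var=4.0):
        self.alpha, self.init_var = alpha, init_var
        self.bg_ratio, self.min_var = bg_ratio, min_var
        self.mean = np.zeros(shape + (k,))
        self.var = np.full(shape + (k,), init_var)
        self.weight = np.zeros(shape + (k,))
        self.weight[..., 0] = 1.0                  # one active component at start
        self.initialized = False

    def apply(self, frame):
        """Return a boolean foreground mask for `frame`, then update the model."""
        x = np.asarray(frame, dtype=np.float64)[..., None]          # (h, w, 1)
        if not self.initialized:
            self.mean[..., 0] = x[..., 0]
            self.initialized = True

        # Rank components by weight / sigma; components whose preceding
        # cumulative weight is below bg_ratio form the background model.
        score = self.weight / np.sqrt(self.var)
        order = np.argsort(-score, axis=2)
        w_sorted = np.take_along_axis(self.weight, order, axis=2)
        bg_sorted = (np.cumsum(w_sorted, axis=2) - w_sorted) < self.bg_ratio
        is_bg = np.zeros_like(bg_sorted)
        np.put_along_axis(is_bg, order, bg_sorted, axis=2)

        # A pixel matches a component within 2.5 standard deviations; the
        # highest-ranked matching component is the one that gets updated.
        match = (x - self.mean) ** 2 < 6.25 * self.var
        has_match = match.any(axis=2)
        best = np.argmax(np.where(match, score, -np.inf), axis=2)[..., None]
        best_is_bg = np.take_along_axis(is_bg, best, axis=2)[..., 0]
        foreground = ~(has_match & best_is_bg)

        # Update weight, mean and variance (alpha is used as the learning
        # rate for all three, a common simplification).
        a = self.alpha
        hit = np.zeros(match.shape, dtype=bool)
        np.put_along_axis(hit, best, True, axis=2)
        hit &= has_match[..., None]
        self.weight = (1 - a) * self.weight + a * hit
        self.mean = np.where(hit, (1 - a) * self.mean + a * x, self.mean)
        new_var = (1 - a) * self.var + a * (x - self.mean) ** 2
        self.var = np.where(hit, np.maximum(new_var, self.min_var), self.var)

        # Unmatched pixels replace their weakest component with a new one.
        weakest = np.argmin(self.weight, axis=2)[..., None]
        repl = np.zeros(match.shape, dtype=bool)
        np.put_along_axis(repl, weakest, True, axis=2)
        repl &= ~has_match[..., None]
        self.mean = np.where(repl, x, self.mean)
        self.var = np.where(repl, self.init_var, self.var)
        self.weight = np.where(repl, a, self.weight)
        self.weight /= self.weight.sum(axis=2, keepdims=True)
        return foreground
```

Fed a sequence of frames, apply returns True where a pixel no longer fits the learned background. Slowly drifting smoke, whose pixels keep matching a background component, produces few foreground pixels. This is the weakness in long-distance scenes that motivates the static features below.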

Static Candidate Region Extraction
Through extensive observation of haze-free outdoor images, He et al. [25] found that, in most non-sky local patches of haze-free images, at least one color channel has very low intensity at some pixels. Because smoke and fog have similar color features, and trees occupy most of a forest fire surveillance video, this color feature can be used to remove some interfering content from the video and reduce the false alarm rate. The dark channel is computed as

$$J^{dark}(x) = \min_{y \in \Omega(x)} \left( \min_{c \in \{r, g, b\}} J^{c}(y) \right)$$

where J^c is color channel c of the image J, Ω(x) is a local window centered on pixel x, and J^dark is the dark channel image. Like fog, smoke raises the intensity of all three channels, so the binary image of a static candidate region (static_img) can be determined by thresholding the dark channel:

$$\text{static\_img}(x) = \begin{cases} 1, & J^{dark}(x) > T \\ 0, & \text{otherwise} \end{cases}$$

where T is the dark channel threshold.
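A minimal NumPy sketch of the dark channel and the static candidate mask is shown below. It applies a per-pixel minimum over the color channels, then a minimum over a square window Ω(x) with replicated borders. The window size (15) and threshold (100) are placeholder assumptions, not values from the paper. The sketch uses numpy.lib.stride_tricks.sliding_window_view, which requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def dark_channel(img, patch=15):
    """J_dark(x) = min over y in Omega(x) of min over c of J_c(y).

    img: (H, W, 3) array; patch: odd side length of the window Omega(x).
    """
    per_pixel_min = img.min(axis=2)                        # min over color channels
    r = patch // 2
    padded = np.pad(per_pixel_min, r, mode="edge")         # replicate image borders
    windows = sliding_window_view(padded, (patch, patch))  # shape (H, W, patch, patch)
    return windows.min(axis=(2, 3))                        # min over the local window


def static_candidates(img, threshold=100, patch=15):
    """static_img: 1 where the dark channel is bright (smoke-like), else 0."""
    return (dark_channel(img, patch) > threshold).astype(np.uint8)
```

Green vegetation has near-zero red or blue values, so its dark channel is close to 0 and it drops out of the mask. Gray-white regions such as smoke, clouds, and roads keep high values in every channel and survive, which is why the fusion step is still needed.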

Region Fusion
The GMM algorithm effectively detects smoke in short-distance, fast-changing dynamic regions, but it is prone to detection errors in long-distance, slowly changing regions. The dark channel algorithm, on the other hand, extracts all areas whose color is similar to that of smoke, and the extracted regions contain no holes. We therefore propose an image fusion method that combines the dynamic and static regions, as shown in Figure 2. Because a single object usually has a uniform color, one connected region may represent one object; however, objects with similar colors, such as smoke and a house, may also appear together in the scene, so one connected region may also contain several objects. Smoke may thus occupy a static connected region of its own or form part of a larger connected region. As shown in Figure 2, the fusion process first divides each frame into 10 × 10 grids and fuses the dynamic and static areas within each grid to avoid "over-fusion". Then, for each pixel in the dynamic area, it checks whether the corresponding position in the binary image of the static area is a static pixel. If it is not, the pixel is not a candidate smoke point. If it is, the pixel is defined as a candidate smoke point, and its eight neighbors are checked; each neighbor that is a static pixel also becomes a candidate smoke point, and the process is repeated from it until the grid boundary is reached.

Figure 3a,b shows that clouds and man-made structures such as houses and roads are grayish white. Thus, during static feature extraction, objects whose color resembles that of smoke are extracted as distractors. Nevertheless, the smoke, whether near or far, is completely extracted, although it may be merged with buildings.
Comparing Figure 3b,c shows that the GMM algorithm extracts only a small part of the long-distance smoke, which changes little, whereas for short-distance smoke with obvious changes it extracts larger dynamic areas, although these contain holes. In addition, because the cars on the road are also moving, they are extracted as interference. Figure 3c,d shows that the image fused by the proposed method not only fills the holes in the smoke, making the smoke area more complete, but also excludes the influence of vehicles, buildings, and clouds on smoke detection.
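The grid-constrained fusion described in this section can be sketched as a region-growing pass. Seeds are pixels set in both the dynamic and the static mask, and each seed spreads through 8-connected static pixels without leaving its grid cell. This is a sketch under stated assumptions: "10 × 10 grids" is read here as ten cells along each image axis (it could also mean 10 × 10-pixel cells), and the names fuse_regions and n_cells are hypothetical.

```python
import numpy as np
from collections import deque


def fuse_regions(dynamic, static, n_cells=10):
    """Grid-constrained fusion of a dynamic (GMM) mask and a static (dark channel) mask.

    Seeds are pixels set in both masks; growth follows 8-connected static
    pixels and never crosses the boundary of the seed's grid cell.
    """
    static = np.asarray(static).astype(bool)
    h, w = static.shape
    cell_r = np.arange(h) * n_cells // h      # grid-row index of each image row
    cell_c = np.arange(w) * n_cells // w      # grid-column index of each image column
    fused = np.asarray(dynamic).astype(bool) & static   # candidate smoke seeds
    queue = deque(zip(*np.nonzero(fused)))
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < h and 0 <= nc < w and static[nr, nc] and not fused[nr, nc]
                        and cell_r[nr] == cell_r[r] and cell_c[nc] == cell_c[c]):
                    fused[nr, nc] = True      # static neighbour joins the candidate region
                    queue.append((nr, nc))
    return fused
```

A single moving pixel inside a large static region therefore recovers the whole static patch within its own cell. This is how holes left by GMM get filled. A moving object with no smoke-like color, such as a dark car, never becomes a seed.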

Extraction of Skeleton Points
In this study, the Zhang-Suen skeleton extraction algorithm [26] was used to extract the smoke skeleton. Each iteration is divided into two sub-iterations that remove the boundary and corner points of the candidate smoke binary image; after repeated iterations, only the skeleton of the candidate smoke region remains. The second row of Figure 4 shows the fused images of smoke in four environments at different distances, and the third row shows the extracted skeleton images. In each scene, the eight-neighborhood of every skeleton point is examined to identify the endpoints of the smoke skeleton, and the bottom endpoint is selected as the candidate smoke root node, marked with a red box in Figure 4. Note that dynamic objects whose color is similar to that of smoke are also marked.
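The following NumPy sketch implements Zhang-Suen thinning together with an endpoint search and selection of the bottom endpoint. It assumes that an endpoint is a skeleton pixel with exactly one skeleton pixel in its eight-neighborhood and that image rows grow downward. The helper names are hypothetical.

```python
import numpy as np


def _neighbours(img):
    """P2..P9 (clockwise from north) for every pixel, with zeros outside the image."""
    p = np.pad(img, 1)
    h, w = img.shape
    shifts = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]
    return [p[1 + dr:1 + dr + h, 1 + dc:1 + dc + w] for dr, dc in shifts]


def zhang_suen(binary):
    """Zhang-Suen thinning: two sub-iterations per pass until nothing changes."""
    img = (np.asarray(binary) > 0).astype(np.uint8)
    while True:
        changed = False
        for step in (0, 1):
            P2, P3, P4, P5, P6, P7, P8, P9 = _neighbours(img)
            seq = [P2, P3, P4, P5, P6, P7, P8, P9, P2]
            B = sum(seq[:8])                                   # number of foreground neighbours
            A = sum(((seq[i] == 0) & (seq[i + 1] == 1)).astype(np.uint8) for i in range(8))
            if step == 0:   # removes south-east boundary and north-west corner points
                c1, c2 = P2 * P4 * P6 == 0, P4 * P6 * P8 == 0
            else:           # removes north-west boundary and south-east corner points
                c1, c2 = P2 * P4 * P8 == 0, P2 * P6 * P8 == 0
            remove = (img == 1) & (B >= 2) & (B <= 6) & (A == 1) & c1 & c2
            if remove.any():
                img[remove] = 0
                changed = True
        if not changed:
            return img


def skeleton_endpoints(skel):
    """(row, col) of skeleton pixels with exactly one 8-connected skeleton neighbour."""
    n = sum(_neighbours(skel))
    return np.argwhere((skel == 1) & (n == 1))


def bottom_endpoint(skel):
    """Lowest endpoint (largest row index): the candidate smoke root node, or None."""
    ends = skeleton_endpoints(skel)
    if len(ends) == 0:
        return None
    r, c = ends[np.argmax(ends[:, 0])]
    return int(r), int(c)
```

The pixel removal in each sub-iteration is decided from a snapshot of the image, as Zhang-Suen requires. A one-pixel-wide line is therefore a fixed point, with its two ends reported as endpoints.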

Calculation of Smoke Root
As Figure 4 shows, other objects still interfere with smoke detection. However, an important feature that distinguishes smoke from these interfering objects is the "generalized root" of the smoke. Following Gao's definition of the smoke root [18], from a visual point of view the "root" is not a single pixel in the image but a group of pixels that remains stable within a certain range. Because the root is essentially stationary, the density of the candidate smoke root nodes can be used to calculate a representative smoke root node and determine the smoke area, as shown in Figure 5. First, the binary images of candidate smoke root points from five consecutive frames are stored in a queue, and all candidate root nodes are projected onto a black template image. The endpoints on the template are then clustered according to their density to obtain a clustering template. After clustering, the number of endpoints in each cluster is counted, and each cluster is handled according to one of three cases:

1. If the total number is not more than three, the area is excluded.
2. If the total number is greater than three and the number of overlapping points is less than three, the center and radius of the circumcircle of all endpoints in the cluster are computed, as shown in Figure 6. If the radius is greater than the threshold, the area is excluded; otherwise, the area represented by the cluster is a smoke area, and the center of the circumcircle gives the coordinates of the smoke root node.
3. If the total number is greater than three and the number of overlapping points is greater than three, the point with the most overlaps gives the coordinates of the smoke root node.
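The clustering and three-case decision can be sketched in plain Python as follows. Several interpretations here are assumptions rather than details from the paper:

- The density clustering is replaced by a simple distance-chain (single-linkage) grouping with a hypothetical radius eps.
- "Overlapping points" is read as the largest number of candidates sharing one pixel across the five frames.
- The circumcircle of more than three points is taken to be the smallest enclosing circle, found by brute force.
- The case of exactly three overlaps, which the text leaves unspecified, falls through to the circle rule.

The function names and the radius_threshold value are hypothetical.

```python
import itertools
import math
from collections import Counter


def cluster_points(points, eps=3.0):
    """Group (x, y) points linked by chains of distances <= eps (single linkage)."""
    clusters, unassigned = [], list(range(len(points)))
    while unassigned:
        stack, members = [unassigned.pop()], []
        while stack:
            i = stack.pop()
            members.append(points[i])
            near = [j for j in unassigned if math.dist(points[i], points[j]) <= eps]
            for j in near:
                unassigned.remove(j)
            stack.extend(near)
        clusters.append(members)
    return clusters


def _circle_three(a, b, c):
    """Circumcircle (cx, cy, r) of a triangle; None if the points are collinear."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0:
        return None
    s_a, s_b, s_c = ax**2 + ay**2, bx**2 + by**2, cx**2 + cy**2
    ux = (s_a * (by - cy) + s_b * (cy - ay) + s_c * (ay - by)) / d
    uy = (s_a * (cx - bx) + s_b * (ax - cx) + s_c * (bx - ax)) / d
    return ux, uy, math.dist(a, (ux, uy))


def enclosing_circle(points):
    """Smallest circle containing all points (brute force over pairs and triples)."""
    pts = list(set(points))
    if len(pts) == 1:
        return float(pts[0][0]), float(pts[0][1]), 0.0
    cands = []
    for a, b in itertools.combinations(pts, 2):          # circles with ab as diameter
        centre = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
        cands.append((centre[0], centre[1], math.dist(a, centre)))
    for t in itertools.combinations(pts, 3):             # circles through three points
        circ = _circle_three(*t)
        if circ is not None:
            cands.append(circ)
    ok = [c for c in cands if all(math.dist(c[:2], p) <= c[2] + 1e-7 for p in pts)]
    return min(ok, key=lambda c: c[2])


def smoke_root(cluster, radius_threshold=5.0):
    """Apply the three cases to one cluster; return the root (x, y) or None."""
    if len(cluster) <= 3:                                 # case 1: too few endpoints
        return None
    point, overlaps = Counter(cluster).most_common(1)[0]
    if overlaps > 3:                                      # case 3: most-overlapped point
        return point
    cx, cy, r = enclosing_circle(cluster)                 # case 2 (also exactly 3 overlaps)
    return (cx, cy) if r <= radius_threshold else None
```

Usage: the candidate roots of five frames are pooled (the "projection" onto one template), clustered, and each cluster is tested.

```python
frames = [[(100, 200), (40, 50)], [(101, 200)], [(100, 201), (70, 20)],
          [(100, 200)], [(102, 199), (10, 90)]]
points = [p for f in frames for p in f]
roots = [r for r in (smoke_root(c) for c in cluster_points(points)) if r is not None]
```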

Fire Smoke Video Dataset
To test the robustness and effectiveness of the proposed algorithm, we selected 20 smoke video sequences, some gathered from public datasets on the internet and the others recorded by the authors, all with a resolution of 480 × 320 pixels. To validate the accuracy of the proposed smoke root detection algorithm, we manually established an ROI for each video [18]. If the detected candidate smoke root lay within the ROI, the detection was considered successful; otherwise, it was considered a failure. Because the scenes differ, the ROI sizes also differ; the specific ROI sizes are given in Table 1 and Figure 7. All videos were also divided into long-distance and short-distance videos: videos with ROIs larger than 10 × 10 were classified as long-distance, and the others as short-distance. T1 to T12 are short-distance videos, in which the smoke moves quickly and occupies a large area, while T13 to T20 are long-distance videos, in which the smoke moves slowly and occupies a small area. As Table 1 shows, the 20 videos cover a variety of scenes and background colors, including smoke surrounded by clouds, smoke obscured by pillars, and smoke rising from houses in the evening.
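The ROI-based success criterion and the per-group accuracy can be expressed in a few lines. This is an illustrative sketch only. It assumes an ROI is stored as (x, y, width, height) with half-open bounds, and a missed detection is represented by None. The function names and the example ROIs are hypothetical and are not taken from Table 1.

```python
def root_in_roi(root, roi):
    """True if a detected root (x, y) lies inside roi = (x, y, width, height)."""
    if root is None:                          # no smoke root detected in this video
        return False
    x, y = root
    rx, ry, rw, rh = roi
    return rx <= x < rx + rw and ry <= y < ry + rh


def accuracy_by_group(results, short_ids):
    """Percentage of successful videos in the short- and long-distance groups.

    results maps a video id such as "T1" to (detected_root, roi).
    """
    groups = {"short": [], "long": []}
    for vid, (root, roi) in results.items():
        groups["short" if vid in short_ids else "long"].append(root_in_roi(root, roi))
    return {g: 100.0 * sum(v) / len(v) for g, v in groups.items() if v}
```

With short_ids set to T1 through T12, this reproduces the short-distance and long-distance split used in the experiments.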

Experimental Performance Analysis and Discussion
The proposed smoke root detection method was compared with two related methods: a forest fire smoke detection system based on the visual smoke root and a diffusion model, proposed by Gao et al. [16] (Method 1), and a smoke segmentation algorithm based on improved intelligent seeded region growing, proposed by Zhao et al. [27] (Method 2). Because Zhao's method is designed to detect smoke rather than smoke roots, we combined the smoke root node method developed in this paper with Zhao's smoke detection to obtain the final results for Zhao's method. Table 2 lists the attributes of each test video, the total number of frames, and the frame at which each of the three methods first detected the smoke root; "NO" indicates that the smoke root was not detected. Table 2 shows that neither of the other two methods detected the smoke root in T1, T10, T14, or T17. These four videos share two features: the scene contains disturbances whose color is similar to that of the smoke, and the smoke root changes little. In T17 in particular, the cloud and smoke are almost merged, which makes extraction more difficult. Method 1 relies heavily on extracting the foreground area, but the smoke changes slowly at its source, so the method performs poorly in such scenes. Method 2 segments the smoke by region growing; in areas of similar color, over-segmentation easily occurs, so the resulting smoke area is too large and the calculated smoke root node is offset. The proposed method, in contrast, handles such scenarios.
In addition, although smoke in short-distance scenes usually moves faster and is easier to detect, none of the three methods detected the correct smoke root in T4 or T9. In these two scenes, the wind spreads the smoke quickly, so the real smoke root is surrounded by smoke and cannot be detected. From Table 2, we calculated the smoke root detection accuracy of each method in the short-distance and long-distance scenes, together with the overall detection rate. As Table 3 shows, Method 2 achieves only 25% accuracy in long-distance scenes, whereas the proposed algorithm reaches 87.5%. In short-distance scenes, Methods 1 and 2 have similar accuracies of around 50%, and the proposed method raises the accuracy by 25 percentage points, to 75%. Overall, the proposed method is significantly more accurate than both comparison methods. Table 3 also shows that the advantage of the proposed method is greater in long-distance scenes than in short-distance scenes. This is because, although the smoke occupies a large area in short-distance scenes, the pixel group forming the smoke root is a relatively small part of the whole smoke area, and high accuracy requires a complete smoke outline. The accuracy of Method 2 in long-distance scenes is only half that in short-distance scenes, because Zhao's region-growing algorithm segments smoke accurately only when the smoke occupies a large area.
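As a consistency check, the percentages quoted above can be converted back into video counts, given 12 short-distance videos (T1 to T12) and 8 long-distance videos (T13 to T20). The counts are derived from the stated percentages rather than read from Table 3, and the Method 2 short-distance value follows from its long-distance accuracy being half the short-distance one.

```latex
\begin{aligned}
\text{Proposed, long distance:} \quad & 7/8 = 87.5\% \\
\text{Method 2, long distance:} \quad & 2/8 = 25\% \\
\text{Proposed, short distance:} \quad & 9/12 = 75\% \\
\text{Method 2, short distance:} \quad & 2 \times 25\% = 50\% = 6/12 \\
\text{Proposed, overall:} \quad & (9 + 7)/20 = 80\% \\
\text{Long-distance gain over Method 2:} \quad & 87.5\% - 25\% = 62.5 \text{ percentage points}
\end{aligned}
```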
Figures 8 and 9 show, for each video, the frame at which the smoke root node was first detected as a percentage of the total number of frames; a ratio of 100% means that the smoke root was not detected. As Figures 8 and 9 show, the proposed method is more efficient: in short-distance scenes it often detects the correct smoke root within the first 20% of the frames, followed by Method 1 and then Method 2. In long-distance scenes, all three methods are less efficient than in short-distance scenes. As shown in Figure 10, the proposed algorithm ranks in the middle in terms of time cost but is more stable. Figures 8 and 9 also show that the proposed method is highly accurate and often finds the exact position of the smoke root early in the video, so even though its average processing time per frame is longer, its overall efficiency is higher. Method 1 has the lowest average efficiency. The smoke area obtained by the ViBe algorithm used by Gao generally consists of scattered clusters of points that must be merged into one region through morphological operations; the processed smoke outline then expands outward, offsetting the boundary points of the smoke area. Moreover, detection is performed only once per group of five frames, which delays detection. Method 2, for its part, places higher demands on the test video. When the dynamic information of the smoke is obvious and the colors in the scene differ markedly, the detected smoke area is more stable; however, some test samples in this experiment show no obvious color differences, and because the smoke moves slowly, the method needs more time to locate the smoke root correctly.
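The first-detection ratio plotted in Figures 8 and 9 reduces to a one-line formula. The sketch below assumes that first_frame is the number of frames processed up to the first correct detection, and None when the root is never found. The function name is hypothetical.

```python
def first_detection_ratio(first_frame, total_frames):
    """Percentage of the video elapsed before the smoke root is first detected.

    A root that is never detected (first_frame is None) is reported as 100%.
    """
    if first_frame is None:
        return 100.0
    return 100.0 * first_frame / total_frames
```

For example, a root first found at frame 60 of a 300-frame video gives 20%, the level the proposed method often reaches in short-distance scenes.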
The proposed method determines the smoke root from five adjacent frames; that is, a candidate smoke root position can be calculated for every frame, which greatly saves time and space. The proposed fusion method also yields a relatively complete smoke area, which makes the smoke root nodes easier to identify.
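One way to read "a candidate smoke root can be calculated for every frame" is a sliding window over the last five frames, as opposed to Method 1's fixed groups of five. The sketch below shows that interpretation using collections.deque with maxlen. The class name RootWindow is hypothetical, and the sliding-window reading is an assumption.

```python
from collections import deque


class RootWindow:
    """Candidate root points from the most recent `size` frames.

    Appending a new frame evicts the oldest one, so root estimation can run
    on every frame instead of once per non-overlapping group of frames.
    """

    def __init__(self, size=5):
        self.frames = deque(maxlen=size)

    def push(self, candidates):
        """Add one frame's candidates; return all pooled points once the window is full."""
        self.frames.append(list(candidates))
        if len(self.frames) < self.frames.maxlen:
            return None                       # not enough history yet
        return [p for frame in self.frames for p in frame]
```

After the first four frames, every new frame yields a pooled point set that can be passed straight to clustering. A grouped scheme only produces a result every fifth frame.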

Conclusions and Future Work
This paper proposes a new smoke root detection method that is insensitive to the distance between the smoke and the lens. It combines GMM background subtraction with the dark channel prior to obtain the complete smoke area and improve the accuracy of smoke root detection. In addition, we exploit the stability of smoke roots by clustering the candidate smoke root points of five consecutive frames according to their density, computing the circumcircle radius of each cluster, and using this radius to decide whether the cluster represents a final smoke root. The experiments showed that the proposed method improves accuracy in long-distance scenarios by 37.5 to 62.5 percentage points, and its detection time is also better than that of the two existing algorithms.
The lack of datasets has long been a serious problem for forest fire monitoring. Some researchers use synthetic or self-made data to enlarge their datasets; however, such data differ too much from real scenes, resulting in a high false positive rate in practical applications. How to monitor forest fires accurately with a small amount of video material is one of the problems we need to solve. Future work may also consider how to detect smoke in adverse weather such as heavy fog and strong wind, and how to further improve performance.