1. Introduction
The research on flame image classification is of great significance in the field of forest fire prevention and fire safety [
1]. Traditional forest fire prevention methods, such as manual patrols and monitoring, are inefficient and susceptible to factors such as weather and terrain. Flame image classification technology enables early detection of and rapid response to forest fires, significantly improving the efficiency and accuracy of fire prevention. Flame classification and recognition based on video surveillance systems plays an important role in preventing forest fires because it provides real-time warning and facilitates data processing [
2]. Many flame recognition methods based on video surveillance have been proposed. In [
3], a dual-stream network was constructed using an attention-guided mechanism for classifying forest wildfires under video surveillance. In [
4], a Dilation Repconv Cross Stage Partial Network was proposed to enhance the multi-scale flame detection capability.
Traditional flame image classification methods mainly rely on hand-crafted features that describe flames, such as color, shape, texture, and wavelet features. Reference [
5] summarizes the relevant techniques for smoke and fire detection using RGB and HSI color models. Zaidi et al. classified flame and non-flame images using RGB color space and YCbCr color space, respectively, to evaluate the performance of the two color spaces [
6]. Celik et al. used the YCbCr color space to describe flame characteristics for flame detection [
7]. Reference [
8] uses various image processing techniques such as filtering, color space conversion, image segmentation, and morphological operations to recognize forest flame images captured by unmanned aerial vehicles. In [
9], the author detected flames in video images by analyzing shape features such as area, centroid displacement, and perimeter of flame images. Real-time evaluation of forest fire status based on the combination of spatiotemporal feature extraction and dynamic texture analysis was reported in [
10]. Harkat et al. used wavelet decomposition to extract features such as wavelet length and achieved fire detection and classification in RGB and infrared image data [
11]. Although traditional feature extraction methods can achieve good results in specific environments, they need to be continuously improved and optimized to adapt to complex and changing environmental conditions.
In deep learning-based flame image classification methods, Convolutional Neural Networks (CNNs) and various variants and improved methods play important roles. The core idea of CNNs is to automatically learn effective feature representations from data through a combination of convolutional layers, pooling layers, and fully connected layers. In flame image classification, CNNs can automatically extract the features of flames, thereby achieving accurate classification of flame images. Lu et al. designed a lightweight TSCNN Flame network based on CNN architecture, which integrates temporal and spatial features to detect flame regions in videos [
12]. Li et al. combined the attention mechanism into the DenseNet network for detecting smoke and flames to improve recognition accuracy [
13]. ResNet34 has a relatively simple network structure and maintains good performance [
14]. By introducing the residual connection mechanism, ResNet34 can be trained deeper and more stably, thereby extracting richer feature information. Tsalera et al. combined different CNNs (including ResNet 50) with transfer learning for flame detection, achieving high detection accuracy [
15]. Muksimova et al. utilized ResNet50, VGG16, and EfficientNet-B3 networks as pre-trained models to extract features from multiple scales for flame image analysis [
16]. Some studies combine smoke detection with flame detection to improve the effectiveness of flame recognition [
17,
18]. Considering model complexity and computational efficiency, this paper chooses ResNet34 for flame image classification.
Although CNNs perform well, classification errors can still occur in flame classification. This is mainly because flame image classification faces challenges such as illumination changes, occlusion, and diverse flame shapes, making it difficult for a single CNN to accurately capture all key features. Introducing a secondary classification strategy has therefore become an effective way to improve classification accuracy. Secondary classification, which further subdivides or validates difficult-to-distinguish samples based on the initial classification, can exploit the advantages of different classifiers to explore flame features from different perspectives or at deeper levels, thereby effectively reducing misclassification. The task of flame image classification is to determine whether an image is a flame image. In this paper, we first use the ResNet34 network to make this determination. The ResNet network outputs a probability value, which serves as the indicator for evaluating whether the image belongs to the flame class. If the probability value is higher than a certain threshold, the image is classified as a flame image; if it is lower than a certain value, the image is classified as a non-flame image. For images whose probability falls in the middle range, there is insufficient confidence in the determination, and we would like to introduce new features before making a further decision. This idea conforms naturally to the theoretical framework of three-way decisions. Three-way decision-making is a theoretical model proposed by Yao et al. on the basis of rough sets [
19]. Its core idea is to divide the whole into three parts, adopt different strategies for different domains, and establish corresponding decision rules [
20]. The three domains of three-way decision-making are the positive domain, the negative domain, and the boundary domain. A positive domain decision means acceptance, a negative domain decision means rejection, and a boundary domain decision means deferring the decision for the moment [
21]. Therefore, this article takes the results of the ResNet network for flame image classification as the initial classification. Images with high probability values in the decision evaluation are classified as flames in the positive domain, while images with low probability values are classified as non-flames in the negative domain. Images whose probability values fall in between are assigned to the boundary domain and are subjected to secondary classification based on new features. The overall architecture of the method proposed in this paper is shown in
Figure 1.
In the process of secondary classification, we attempted to adopt a relatively lightweight network structure that maintains good performance while having lower computational costs. Therefore, this paper improved DABNet and combined it with the features extracted by ResNet34 to improve the accuracy of delayed decision-making. DABNet was originally used in the field of semantic segmentation of images [
22]. By constructing bottleneck structures through depth-wise asymmetric convolution and dilated convolution, DABNet can significantly reduce computational costs while maintaining a high level of performance. This efficient feature extraction capability is equally important in flame image classification tasks, allowing the network to extract more useful feature information with limited computing resources. Meanwhile, DABNet’s DAB module adopts a dual-branch structure, which facilitates the fusion of multi-scale information. In flame image classification tasks, features of different scales provide a more comprehensive image description. For this purpose, we designed DualArchClassNet for the secondary classification task; it consists of two stages, feature extraction and classification. The feature extraction stage inherits the encoder of DABNet and extracts key features from the image, which are then concatenated and fused with the features extracted by ResNet34. In the classification stage, a Feature Refinement Structure (FRS) is designed to more accurately identify the presence of flames in the image.
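To make the fusion idea concrete, the following minimal PyTorch sketch concatenates a ResNet34 feature vector with a DABNet-style encoder feature vector and classifies the result. The channel sizes and the simple refinement layers are illustrative assumptions, not the exact DualArchClassNet/FRS design presented in Section 4.

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Illustrative sketch of the fusion idea: concatenate a ResNet34 feature
    vector with a DABNet-style encoder feature vector and classify the result.
    The dimensions (512 and 128) and the refinement layers are assumptions,
    not the exact DualArchClassNet/FRS design of Section 4."""
    def __init__(self, resnet_dim: int = 512, dab_dim: int = 128):
        super().__init__()
        self.refine = nn.Sequential(            # stand-in for the Feature Refinement Structure
            nn.Linear(resnet_dim + dab_dim, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, 1),                  # single logit: flame vs. non-flame
        )

    def forward(self, resnet_feat: torch.Tensor, dab_feat: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([resnet_feat, dab_feat], dim=1)   # feature concatenation and fusion
        return self.refine(fused)

# usage: head = FusionHead(); logit = head(torch.randn(4, 512), torch.randn(4, 128))
```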
The remainder of this paper is organized as follows:
Section 2 presents the method of initial classification using ResNet34.
Section 3 introduces three-way decision-making and the strategy for partitioning boundary domains.
Section 4 presents the design of the DualArchClassNet network to fuse new features for secondary classification.
Section 5 compares the experimental results of using only the initial classification with those obtained after introducing three-way decision-making and secondary classification.
Section 6 concludes the proposed work with a summary.
2. Initial Classification Based on ResNet34
Convolutional Neural Networks (CNNs) have excellent feature extraction capabilities. In the initial classification task, considering both network performance and computational speed, ResNet34 was chosen as the preliminary classification model in this work. As shown in
Figure 2, ResNet34 stacks residual modules to form a 34-layer neural network. The front end of the network uses a standard convolution with a kernel size of 7 for feature extraction, followed by max pooling to reduce the size of the feature map. By successively stacking multiple residual modules, the network gradually extracts multi-level features from the image, like building blocks. Each residual module of ResNet34 adopts a basic two-layer design: two consecutive 3 × 3 standard convolutions extract the core features, and the number of channels is kept consistent so that the output can be added directly to the module input. Because the dimensions of the two branches must match, the shortcut connections differ across the network. In
Figure 2, module a adds a 1 × 1 convolution to the shortcut connection and reduces the size of the feature map, so that the dimensions of the residual branch match those of the identity branch. The residual module can be expressed by Equation (1). Assuming that the input to the module is U, two consecutive 3 × 3 convolutions (F3×3) are applied, and the final output Uout is the post-convolution feature map summed with the original input to the module as follows:
Uout = F3×3(F3×3(U)) + U  (1)
where F3×3 is the 3 × 3 standard convolution, and U and Uout are the original input and final output of the module. Finally, global pooling aggregates the global information of the feature map and is connected to the fully connected layer to predict the final flame recognition result. ResNet34 uses residual connections to add the residual map to the original feature map, producing a new feature map that integrates the original information and the residual details. This new feature map serves as the input to the next layer of the network, enabling the network to learn more accurately by utilizing both the original information and the residual details.
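As a concrete illustration, the following minimal PyTorch sketch implements the residual module of Equation (1). Batch normalization and the 1 × 1 projection shortcut of module a are omitted for brevity, so this is a simplification rather than the complete ResNet34 block.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicResidualBlock(nn.Module):
    """Simplified residual module of Equation (1): Uout = F3x3(F3x3(U)) + U.
    Batch normalization and the 1 x 1 projection shortcut are omitted."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        residual = self.conv2(F.relu(self.conv1(u)))  # two consecutive 3 x 3 convolutions
        return F.relu(residual + u)                   # add the residual map to the original input

# usage: block = BasicResidualBlock(64); y = block(torch.randn(1, 64, 56, 56))
```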
3. Three-Way Decision-Making Framework in This Work
In the initial classification, the ResNet34 network determines whether a given image is a flame image, but errors are possible in this process. Based on the idea of three-way decision-making, we provide a decision evaluation index for the initial classification results. Here, we use the SoftMax function; that is, the probability value output by SoftMax is used to evaluate the result. If this probability value is high, the expected cost of accepting the corresponding classification is low.
The mathematical expression of the SoftMax function is as follows:
SoftMax(xi) = exp(xi)/(exp(x1) + … + exp(xN)), i = 1, …, N
where xi represents the linear output of the ith element of the input vector; N represents the total number of categories (the dimension of the input vector); and SoftMax(xi) represents the probability value of the ith category after passing through the SoftMax function. There are two categories of samples in this study: i = 1 represents flame images and i = 2 represents non-flame images. Pr(X|Xi) denotes the output probability, which represents the model’s confidence that the sample belongs to each category. When the ResNet34 network classifies an image into the positive domain (flame), the expected cost of this misclassification, i.e., the positive domain cost, can be calculated using the probability output by SoftMax. Similarly, when the model classifies the sample into the negative domain (non-flame), the probability output by SoftMax can be used to calculate the negative domain cost.
Unlike traditional three-way decision-making methods that use parameter estimation to determine the boundary domain, this paper considers the classification task and uses the output probability of the SoftMax function to divide the positive, negative, and boundary domains. Using the SoftMax function, the prediction probabilities Pr(X|X1) and Pr(X|X2), corresponding to the fire and non-fire categories, are obtained. The maximum value Pyi = max(Pr(X|X1), Pr(X|X2)) of the two probabilities determines the final classification result, where Pr(X|X1) represents the probability that the image contains flames and Pr(X|X2) represents the probability that it does not. It is worth noting that, in this work, if the probability of an image being classified as a flame image is low, the probability of it being classified as a non-flame image is correspondingly high. Given a threshold α, we therefore form the following rules for dividing the positive, negative, and boundary domains:
- (1) If Pyi = Pr(X|X1) and Pyi ≥ α, then the ith image is classified into the positive domain (fire image);
- (2) If Pyi = Pr(X|X2) and Pyi ≥ α, then the ith image is classified into the negative domain (non-fire image);
- (3) If Pyi < α, then the ith image is classified into the boundary domain (delayed processing).
This indicates that the current information is insufficient for clear classification. Therefore, more information is required for subsequent secondary classification.
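The following minimal Python sketch illustrates these partition rules on the two-dimensional ResNet34 output. The class ordering (index 0 = fire, index 1 = non-fire) is an assumption made for illustration; Section 5.2 reports α = 0.90 as the experimentally chosen threshold.

```python
import torch
import torch.nn.functional as F

def three_way_decision(logits: torch.Tensor, alpha: float = 0.90) -> str:
    """Sketch of the partition rules. `logits` is the 2-dimensional ResNet34
    output; index 0 = fire and index 1 = non-fire is an assumed ordering."""
    probs = F.softmax(logits, dim=0)            # Pr(X|X1), Pr(X|X2)
    p_fire, p_nonfire = probs[0].item(), probs[1].item()
    if max(p_fire, p_nonfire) < alpha:          # Pyi below the threshold
        return "boundary"                       # delayed decision -> secondary classification
    return "positive (fire)" if p_fire >= p_nonfire else "negative (non-fire)"

# usage: three_way_decision(torch.tensor([2.3, -0.5]))  ->  "positive (fire)"
```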
5. Experimental Results
To verify the effectiveness of the method, we conducted experiments on two public datasets: the FLAME forest fire detection dataset [
26] and the Kaggle fire dataset [
27]. The FLAME dataset consists of 47,992 images, including 30,155 fire images and 17,837 non-fire images, which contain images in various states.
Figure 5 shows some images in different states in the FLAME dataset. The fire recorded in this dataset occurred on 16 January 2020, in a pine forest in Arizona, USA. During the data processing, the dataset is divided into a training set and a testing set, with 39,375 images used for training and 8617 images used for testing. This partitioning ensures that the training set can effectively cover different types of fire and non-fire scenarios, thereby improving the learning performance of the model. The Kaggle dataset consists of 999 images, with 755 representing fire images and 244 representing non-fire images.
Figure 6 shows sample images from the Kaggle dataset; 799 images are used for training and 200 images for testing. To improve the generalization ability of the model, this paper applies a series of data augmentation techniques during training, including horizontal and vertical flipping and color jitter, to increase the diversity of the training samples.
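A minimal torchvision sketch of this augmentation pipeline is given below; the input size, flip probabilities, and jitter strengths are illustrative assumptions rather than the exact settings used in the experiments.

```python
from torchvision import transforms

# Illustrative augmentation pipeline: horizontal/vertical flips and color jitter.
# The input size, probabilities, and jitter strengths are assumed values.
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),                 # ResNet34-style input size (assumed)
    transforms.RandomHorizontalFlip(p=0.5),        # left-right flipping
    transforms.RandomVerticalFlip(p=0.5),          # up-down flipping
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.ToTensor(),
])
```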
5.1. Experimental Parameter Settings
The experiments were conducted on a Linux operating system using an Nvidia GeForce RTX 2080 Ti GPU and the PyTorch (1.11.0) deep learning framework. For the ResNet34 network used in the initial classification, the stochastic gradient descent (SGD) optimizer is used with the cross-entropy loss function as the optimization objective; the initial learning rate is set to 0.01, the momentum to 0.9, and the number of training epochs to 100. For the secondary classification network DualArchClassNet, the adaptive moment estimation (Adam) optimizer is adopted with a binary cross-entropy-with-logits loss function as the optimization objective; the initial learning rate is set to 0.00001, the momentum to 0.9, and the number of training epochs to 50.
In order to better adapt to the training needs of different stages, this paper adopts an adjustment strategy based on the LambdaLR learning rate scheduler [
28]. We used the cosine annealing learning rate in LambdaLR and employed a warm-up strategy during the initial training phase. Assuming the initial learning rate is
lr_init and the number of warm-up iterations is num_warm_up, the warm-up strategy can be expressed as follows:
lr(t) = lr_init × t/num_warm_up
where t is the current iteration count, num_warm_up is the number of warm-up iterations, and lr_init is the initial learning rate. The cosine annealing learning rate adjustment strategy can be expressed as follows:
lr(x) = y2 + (y1 − y2) × (1 + cos(πx/S))/2
where x is the current training epoch, S is the total number of training epochs, and y1 and y2 are the upper and lower bounds of the learning rate range. In this work, the learning rate down-regulation points of the initial classification network are set to (10, 20, 40, 60), and those of the secondary classification network are set to (5, 15, 30, 40).
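A minimal PyTorch sketch of this schedule, wired into a LambdaLR scheduler for the initial classification network, is shown below. The warm-up length and the y1/y2 factors are illustrative assumptions, since their exact values are not stated above.

```python
import math
import torch
import torchvision
from torch.optim.lr_scheduler import LambdaLR

lr_init, momentum, total_epochs = 0.01, 0.9, 100     # settings from Section 5.1
num_warm_up = 5                                      # warm-up epochs (assumed value)
y1, y2 = 1.0, 0.01                                   # upper/lower factors relative to lr_init (assumed)

model = torchvision.models.resnet34(num_classes=2)   # binary fire / non-fire classifier
optimizer = torch.optim.SGD(model.parameters(), lr=lr_init, momentum=momentum)

def lr_lambda(epoch: int) -> float:
    if epoch < num_warm_up:                          # linear warm-up: lr = lr_init * t / num_warm_up
        return epoch / max(1, num_warm_up)
    x = epoch - num_warm_up                          # cosine annealing over the remaining epochs
    S = max(1, total_epochs - num_warm_up)
    return y2 + 0.5 * (y1 - y2) * (1.0 + math.cos(math.pi * x / S))

scheduler = LambdaLR(optimizer, lr_lambda=lr_lambda)  # call scheduler.step() once per epoch
```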
5.2. Initial Classification Results and Boundary Domain Division
In this work, classification accuracy was selected as the evaluation metric, which mainly reflects the overall classification performance. To demonstrate the effectiveness of the three-way decision-making method in dividing the positive, negative, and boundary domains, we calculated the positive domain accuracy (POSp), the negative domain accuracy (NEGp), and the boundary domain accuracy (BNDp), respectively, after the initial classification as follows:
POSp = POSp1/POSS,  NEGp = NEGp1/NEGS,  BNDp = BNDp1/BNDS
where POSS, BNDS, and NEGS represent the total number of samples in the positive domain, boundary domain, and negative domain, respectively, and POSp1, BNDp1, and NEGp1 represent the number of correctly classified samples in the positive, boundary, and negative domains.
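The following short Python sketch computes these three domain accuracies from the SoftMax outputs; the (p_fire, p_nonfire, label) record format is an assumption made for illustration.

```python
def domain_accuracies(records, alpha: float = 0.90) -> dict:
    """Compute POSp, NEGp, and BNDp from (p_fire, p_nonfire, label) records,
    where label is 1 for a fire image and 0 for a non-fire image."""
    counts = {"POS": [0, 0], "NEG": [0, 0], "BND": [0, 0]}   # [correct, total] per domain
    for p_fire, p_nonfire, label in records:
        pred = 1 if p_fire >= p_nonfire else 0
        if max(p_fire, p_nonfire) < alpha:
            domain = "BND"                                   # boundary domain: delayed decision
        else:
            domain = "POS" if pred == 1 else "NEG"
        counts[domain][1] += 1
        counts[domain][0] += int(pred == label)
    return {d: (c / t if t else 0.0) for d, (c, t) in counts.items()}

# example: domain_accuracies([(0.97, 0.03, 1), (0.55, 0.45, 0), (0.10, 0.90, 0)])
```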
After the initial classification of images using the ResNet34 network, the classification accuracy on the test set was 86.93%. Although the ResNet34 model maintains good performance in complex and noisy environments, it still makes significant errors on images from challenging scenes, such as heavily occluded areas and open spaces with smoke. Therefore, this paper introduces three-way decision-making and performs secondary classification on the images in the boundary domain.
The selection of the threshold α determines which images enter the boundary domain. We conducted experiments with different values of α, and the results are shown in
Table 1. It can be seen that the accuracy of the images assigned to the boundary domain is much lower than that of the entire sample. We chose α = 0.90 as the threshold, achieving a good balance between positive domain accuracy, negative domain accuracy, and boundary domain accuracy. If the value of α is too large, the positive domain accuracy is higher, but more correctly classified samples enter the boundary domain. If the value of α is too small, the positive domain accuracy is lower.
5.3. Analysis of Secondary Classification Results
For images divided into boundary domains, this paper uses DualArchClassNet for secondary classification. To demonstrate the feature extraction performance of the DualArchClassNet network,
Figure 7 shows the extracted feature maps at different stages. It can be seen that the features extracted by the Initial Features layer are relatively scattered. After fusing multi-scale features and introducing a channel attention mechanism, the features extracted by the Enhanced Features layer represent the key regions more strongly.
In order to verify the classification performance based on the three-way decision-making and DualArchClassNet network, we compared the results of our method with those of using the initial classification network alone and directly using the secondary classification network without using the three-way decision-making. From
Table 2, it can be seen that for the FLAME dataset, after applying three-way decision-making and secondary classification, the classification accuracy of our method improved by 2.44 percentage points over the initial classification. To demonstrate the effectiveness of the three-way decision-making method, we also conducted experiments in which a single secondary classification network (DualArchClassNet) directly classified all images in the test set. Its classification accuracy is lower than that of the method proposed in this paper, which shows that the classification accuracy can be significantly improved by partitioning out the boundary domain. Meanwhile, on the FLAME dataset, we also compared the proposed method with other existing studies, and our method outperforms the other four methods in terms of classification accuracy. For the Kaggle dataset (
Table 3), due to its relative simplicity and small number of images, the initial classification network already achieves good results, so the method proposed in this paper yields only a slight improvement. The experiments on the Kaggle dataset nevertheless demonstrate that the proposed method has strong scene adaptability. It is worth noting that the classification accuracy of our method on the FLAME dataset is lower than on the Kaggle dataset because the FLAME dataset is more challenging: its images were captured by drones and exhibit small flame areas, complex backgrounds, and obstructions. We visually demonstrate the classification performance of the proposed method across the categories through the confusion matrix, as shown in
Figure 8. In the FLAME dataset, the classification error rate of flame images is slightly higher than that of non-fire images, whereas in the Kaggle dataset the error rate of non-fire images is higher than that of flame images. Through the analysis of misclassified images, we found that misclassification tends to occur when the flame area is small or the background resembles a flame, which is also related to whether the dataset contains enough training images. Overall, the method proposed in this article achieves better classification results for images with a larger flame area and vivid colors.
5.4. Ablation Experiment
To better demonstrate the effectiveness of the method proposed in this paper,
Table 4 and
Table 5, respectively, show the impact of each module on classification accuracy for the two datasets. During training, we removed the CBAM, ECA, and FRS modules from the secondary classification network in turn. The results show that these modules contribute significantly to the model’s accuracy, confirming their improvement of DualArchClassNet. On the Kaggle dataset, removing the FRS module improved performance, mainly because the test set contains an image with a small flame area surrounded by thick smoke; in this case, the max-pooling operation of the FRS module failed to effectively capture the flame features, resulting in a decrease in detection performance. However, on the FLAME dataset, the configuration with the FRS module performs significantly better than the one without it, indicating stronger robustness in complex scenarios. Therefore, considering the performance on both datasets, this method retains the FRS module for secondary classification to ensure the stability and accuracy of the model in various environments.
6. Discussion
Flame classification based on video images plays an important role in fire prevention and emergency response. With the development of deep learning, good performance has been achieved in flame image recognition. However, in complex and ever-changing fire scenarios, the classification accuracy of a single network still faces significant challenges. The strategy proposed in this article, which combines three-way decision-making with deep learning, helps improve classification accuracy. Below, we discuss the noteworthy aspects of using three-way decision-making for flame classification, as well as the relationship between flame image classification and fire protection systems.
- (1)
Lessons learned from using three-way decision-making strategies
When combining three-way decision-making with deep learning techniques for flame classification, the two approaches complement each other, but some issues are worth noting. The first issue is the mechanism of boundary domain partitioning, which is the key factor in three-way decision-making; the determination of the threshold is particularly important because it has a significant impact on how the boundary domain is partitioned. By adjusting the threshold, the classification accuracy of the positive and negative domains can be balanced against the size of the boundary domain. A reasonable threshold ensures that the boundary domain contains as many classification-uncertain images as possible while avoiding too many confidently classified images being mistakenly assigned to it. The second issue is that the method chosen for secondary classification should differ from the network used for the initial classification, so that the two are complementary when their features are merged. This is beneficial for correctly classifying the images that enter the boundary domain during secondary classification.
- (2)
The application potential of the proposed flame classification method
The results of flame image classification have a certain application potential. The classification result determines whether there is a flame in the monitored image and can be integrated with other parts of a fire protection system, such as the alarm and emergency response modules. When the flame classification method outputs a result (i.e., fire or no-fire), it can be passed to the alarm module of the fire protection system, which can then send alarm information to the monitoring center or designated personnel. After receiving the flame classification results, the fire department should quickly confirm their authenticity.
7. Conclusions
This article focuses on the classification and recognition of flame images using deep learning techniques and a three-way decision-making strategy. To improve on the accuracy achievable with a single deep learning network, this article uses the probability value calculated by the SoftMax function to assess the credibility of the initial classification produced by the ResNet34 network. Following the idea of three-way decision-making, samples with low classification probability values are assigned to the boundary domain, and these images are then subjected to secondary classification. In the secondary classification, the DAB network is combined with ResNet34 to construct the DualArchClassNet network structure, which extracts new features to reclassify the images in the boundary domain. The results indicate that the overall classification accuracy is improved after using three-way decision-making and the DualArchClassNet network for secondary classification.
In the future, we will continue to explore the application of deep learning technology in fire prevention and early warning, especially lightweight networks with real-time capability and network structures that can be deployed on hardware and edge computing devices, to enhance the practical value of the method.