Article

Wildfire Identification Based on an Improved Two-Channel Convolutional Neural Network

1 College of Mechanical and Electronic Engineering, Nanjing Forestry University, Nanjing 210037, China
2 China-Pakistan Belt and Road Joint Laboratory on Smart Disaster Prevention of Major Infrastructures, Southeast University, Nanjing 211189, China
* Author to whom correspondence should be addressed.
Forests 2022, 13(8), 1302; https://doi.org/10.3390/f13081302
Submission received: 6 July 2022 / Revised: 9 August 2022 / Accepted: 12 August 2022 / Published: 16 August 2022
(This article belongs to the Section Natural Hazards and Risk Management)

Abstract

The identification of wildfires is a very complex task due to their different shapes, textures, and colours. Traditional image processing methods need manually designed feature extraction algorithms based on prior knowledge, and because fires at different stages have different characteristics, manually designed feature extraction algorithms often have insufficient generalization capability. A convolutional neural network (CNN) can automatically extract the deeper features of an image, avoiding the complexity and blindness of the feature extraction phase. Therefore, a wildfire identification method based on an improved two-channel CNN is proposed in this paper. Firstly, to address the insufficient dataset, the dataset is expanded using PCA_Jittering, the model is trained with transfer learning, and its accuracy is then improved with segmented training. Secondly, to achieve effective coverage of fire scenes of different sizes, a two-channel CNN based on feature fusion is designed, in which the fully connected layers are replaced by a support vector machine (SVM). Finally, to reduce the delay time of the model, Lasso_SVM is designed to replace the SVM in the original model. The results show that the method has the advantages of high accuracy and low latency: the accuracy of wildfire identification is 98.47% and the average delay time is 0.051 s/frame. The wildfire identification method designed in this paper improves the accuracy of identifying wildfires and reduces the delay time in identifying them.

1. Introduction

As the Earth’s climate changes, wildfires occur frequently around the world, causing serious economic losses and ecological damage [1]. In 2019–2020, more than 24 million hectares of land in south-eastern Australia were burned by wildfires, resulting in 33 deaths and more than 3000 houses being razed. A total of 80% of the Australian population was affected by the smoke from the wildfires, which killed 445 people, hospitalised more than 4000, and cost $2 billion in medical treatment. The wildfires caused billions of dollars of damage, emitted 400–700 million tonnes of carbon, and killed 3 billion wild animals [2]. Fire affects the biodiversity, species composition, and ecosystem structure of forest ecosystems, in addition to human livelihoods, regional economies, and environmental health [3]. Intelligent monitoring of wildfires is therefore of great importance.
Image-based fire identification has become a research focus in recent years. There are two main types of image-based wildfire identification methods: one is a traditional image processing method and the other is the identification method based on CNN [4].
Traditional image processing methods have been used to identify fires by extracting features such as colour, geometry and dynamic texture. Kong et al. [5] proposed a fire identification method based on temporal smoothing and logistic regression, which used background subtraction to obtain colour component ratios and running cues of fires to identify suspected fire regions and used logistic regression to identify fires based on the size, motion and colour characteristics. Mei et al. [6] proposed a new method for fire identification which first extracted suspected fire regions by random forest, support vector machine and improved ViBe algorithms, and then used fire overlap rate and motion intensity rate between each frame and Hu features to identify fires. Dimitropoulos et al. [7] fused the features of fire smoke according to a dynamic fractional union approach and then classified them by SVM to finally identify fire smoke. Gong et al. [8] proposed a fire identification method based on multiple features of fires, which extracted suspected fire regions by identifying the motion and colour of the images and calculated the fire centre of mass for each frame to identify fires by extracting the spatial, shape, and area variability of the images. Prema et al. [9] proposed a multi-feature-based fire smoke identification method, which extracted suspected fire smoke regions by filtering in the YUV colour space and extracted the spatio-temporal features of the regions to identify fire smoke by SVM. Han et al. [10] used a fire identification method based on a Gaussian mixture model and multicolor features to extract moving objects from the video, then combined RGB, HSI and YUV color spaces to extract suspected fire regions and combined the results obtained above to identify fires. Chu et al. [11] proposed a probabilistic inference framework to determine fire location and fire size, and to estimate multidimensional fire parameters using Bayesian inference theory.
However, traditional image processing methods rely on prior knowledge to design the feature extraction algorithms for fires, and because fires produce different features at different stages, it is difficult for a single feature extraction algorithm to cater for different fire scenarios; their generalization capability is therefore weak. CNN-based image identification methods can discover deep fire features automatically [12], which avoids the limitations caused by subjective human selection of image features and thus greatly improves identification accuracy. Zhang et al. [13] proposed a two-layer CNN-based method for wildfire identification, which first used a CNN with a larger input resolution to detect the image and then used a fine-grained CNN to pinpoint the fire if a wildfire was detected. Kim et al. [14] used Faster-RCNN to detect suspected fire regions and non-fire regions based on spatial features, detected the presence of fire in the short term by accumulating summary features within bounding boxes in successive frames using Long Short-Term Memory (LSTM), and then combined successive short-term results into a majority vote to make the final decision. Zhang et al. [15] proposed an effective SqueezeNet-based asymmetric encoder-decoder U-shape architecture, ATT Squeeze U-Net, to identify wildfire. Muhammad et al. [16] proposed a computationally efficient CNN model based on the Squeezenet architecture, which used smaller convolutional kernels and removed the fully connected layers to achieve high computational efficiency. Han et al. [17] proposed an effective method for wildfire identification, which combined CNN and RNN to enable the model to process sequential data, and the method achieved good identification results in various fire scenarios. Wu et al. [18] used a deep learning-based method for the identification of fire and smoke which used ViBe and frame difference methods to extract regions of motion, used features extracted by Alexnet together with the degree of irregularity of fire and smoke as static features, and combined dynamic and static features to eventually identify regions of fire and smoke.
The comparisons between extracting wildfire features with traditional image processing methods and deep learning methods are shown in Table 1.
Despite their accomplishments in wildfire identification, CNNs suffer from significant latency, and reducing the delay time by lightening the network structure of a CNN leads to a reduction in model accuracy. To solve these problems, a wildfire identification method based on an improved two-channel CNN is proposed, aiming to achieve both high accuracy and low latency. In this method, first, the dataset is expanded with an image enhancement strategy, training is performed using transfer learning, and model accuracy is improved using segmented training. Next, a two-channel CNN is designed, and feature fusion of the two channels is implemented. Finally, the structure of the two-channel CNN is modified to reduce the delay time of the model in identifying images. The results of the study may provide useful information for the improvement of wildfire identification algorithms.

2. Construction of the Experimental Dataset

The dataset used in this paper is obtained from the Google search engine and contains three main parts. Part one includes wildfire scenes, which contain images of large fires and smoke, as shown in Figure 1. There are no special restrictions on the location of the wildfires, and the images were taken at various times of the day. Part two includes regular forest scenes, which contain images of forests without fires or smoke, as shown in Figure 2. As wildfire identification is susceptible to sunlight interference, part three includes sun scenes, which contain sunset images, including sunsets with large areas of cloud, as shown in Figure 3. There are 14,000 images in the dataset. A total of 10,000 images are selected as the training set and 4000 images as the testing set, with 5000 fire and 5000 non-fire images in the training set and 2000 fire and 2000 non-fire images in the testing set.
The accuracy of the model is strongly influenced by the dataset. When the dataset is small, the model is prone to overfitting, resulting in the model’s accuracy on the training set being substantially higher than its accuracy on the testing set. In order to reduce the overfitting of the model and improve its generalization ability, the PCA_Jittering image enhancement [19] strategy is adopted in this paper to expand the training set. Firstly, principal component analysis (PCA) is applied to the three RGB channels of the training set to obtain the covariance matrix and calculate its eigenvalues and eigenvectors. Next, Gaussian jitter is applied to the eigenvalues, and the jittered eigenvalues are multiplied by the eigenvectors. Finally, the multiplication results are added to the original image, modifying the intensity of the image’s three RGB channels, and the image enhanced by PCA_Jittering is added to the training set for training. The flowchart of PCA_Jittering is shown in Figure 4.
The specific steps of PCA_Jittering are as follows:
(1).
Obtain the original wildfire image $I_{xy}$, split it into its three RGB channels, and obtain the channel values $I_{xy}^R$, $I_{xy}^G$, $I_{xy}^B$.
(2).
Using $I_{xy}^R$ as an example, $I_{xy}^R$ is normalised to eliminate the effect of odd data. The normalisation formula is shown in Equation (1).
$$I_{xy}^R = I_{xy}^R / 255.0 \tag{1}$$
(3).
To unify the scale of the images, $I_{xy}^R$ is standardised. The standardisation formula is shown in Equation (2).
$$g_{ij}^R = \frac{x_{ij}^R - \mu^R}{\sigma^R} \tag{2}$$
where $x_{ij}^R$ denotes the grayscale value of row $i$ and column $j$ of the R-channel image of wildfire; $g_{ij}^R$ denotes the standardised grayscale value of row $i$ and column $j$ of the R-channel image of wildfire; $\mu^R$ denotes the arithmetic mean of the R-channel, $\mu^R = \frac{1}{n \times m}\sum_{i=1}^{n}\sum_{j=1}^{m} x_{ij}^R$, where $n$ denotes the number of rows of the image and $m$ the number of columns; $\sigma^R$ denotes the standard deviation of the R-channel, $\sigma^R = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij}^R - \mu^R\right)^2}$.
(4).
Expand the standardised $g_{ij}^R$ into a column vector $g^R = [g_1^R\ g_2^R\ \cdots\ g_{m \times n}^R]^T$.
(5).
$I_{xy}^G$ and $I_{xy}^B$ are processed in the same way as $I_{xy}^R$. After processing, the standardised vectors $g^G$ and $g^B$ for the G and B channels are obtained, and the vectors $g^R$, $g^G$, $g^B$ are formed into a matrix $T$, as shown in Equation (3).
$$T = \begin{bmatrix} g_1^R & g_1^G & g_1^B \\ g_2^R & g_2^G & g_2^B \\ \vdots & \vdots & \vdots \\ g_{m \times n}^R & g_{m \times n}^G & g_{m \times n}^B \end{bmatrix} = [g^R\ g^G\ g^B] \tag{3}$$
(6).
Create the covariance matrix $S$ of $T$. The formula for the covariance matrix $S$ is shown in Equation (4).
$$S = \begin{bmatrix} \mathrm{cov}(g^R, g^R) & \mathrm{cov}(g^R, g^G) & \mathrm{cov}(g^R, g^B) \\ \mathrm{cov}(g^G, g^R) & \mathrm{cov}(g^G, g^G) & \mathrm{cov}(g^G, g^B) \\ \mathrm{cov}(g^B, g^R) & \mathrm{cov}(g^B, g^G) & \mathrm{cov}(g^B, g^B) \end{bmatrix} \tag{4}$$
where $\mathrm{cov}(g^w, g^v) = \frac{1}{n-1}\sum_{i=1}^{m \times n}(g_i^w - \bar{g}^w)(g_i^v - \bar{g}^v)$, $w, v \in \{R, G, B\}$, and $\bar{g}^w$ and $\bar{g}^v$ denote the averages of the $w$ and $v$ columns, respectively.
(7).
Solve for the eigenvectors $p_i$ and eigenvalues $\lambda_i$ of the covariance matrix $S$, where $i \in \{1, 2, 3\}$.
(8).
Multiply the eigenvalues $\lambda_i$ by Gaussian jitterings $\alpha_i$ with 0 as the mean and 0.1 as the variance, multiply the jittered eigenvalues by the eigenvectors, then multiply the results by 255 and add them to the original pixel values $I_{xy} = [I_{xy}^R, I_{xy}^G, I_{xy}^B]^T$. The pixel values of the image after PCA_Jittering image enhancement are shown in Equation (5).
$$I_{xy} = [I_{xy}^R, I_{xy}^G, I_{xy}^B]^T + 255 \times [p_1, p_2, p_3][\alpha_1\lambda_1, \alpha_2\lambda_2, \alpha_3\lambda_3]^T \tag{5}$$
The dominant colours in the RGB channels of the image can be found by PCA_Jittering image enhancement, so that the values of the dominant colours are drastically altered while the overall tone of the image remains unchanged. The images using PCA_Jittering image enhancement are added to the training set. Finally, the wildfire image dataset has a total of 24,000 images, with 20,000 images in the training set and 4000 images in the testing set. There are 10,000 images for each of the fire and non-fire samples in the training set and 2000 images for each of the fire and non-fire samples in the testing set. The comparisons of images before and after using PCA_Jittering image enhancement are shown in Figure 5.
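To make the procedure concrete, the following is a minimal NumPy sketch of steps (1)–(8); the function name, the per-channel standardisation details, and the final clipping to the 0–255 range are our own assumptions rather than details taken from the paper.

```python
import numpy as np

def pca_jittering(image, var=0.1):
    """Apply PCA_Jittering to one H x W x 3 uint8 RGB image (Equations (1)-(5))."""
    img = image.astype(np.float64) / 255.0                 # Equation (1): normalise to [0, 1]
    flat = img.reshape(-1, 3)                              # one row per pixel, one column per channel
    flat = (flat - flat.mean(axis=0)) / flat.std(axis=0)   # Equation (2): standardise each channel
    S = np.cov(flat, rowvar=False)                         # Equation (4): 3 x 3 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)                   # step (7): eigenvalues and eigenvectors
    alpha = np.random.normal(0.0, np.sqrt(var), size=3)    # step (8): Gaussian jitter, mean 0, variance 0.1
    delta = eigvecs @ (alpha * eigvals)                    # [p1 p2 p3][a1*l1, a2*l2, a3*l3]^T
    jittered = image.astype(np.float64) + 255.0 * delta    # Equation (5): perturb the RGB intensities
    return np.clip(jittered, 0, 255).astype(np.uint8)      # clipping is our addition, to keep valid pixels
```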

3. Establishment of the Wildfire Identification Model

3.1. Modeling of the Alexnet Network

Deep features of fires need to be extracted by a CNN. To identify wildfire, Alexnet [19] is used as the CNN architecture in this paper. Alexnet consists of five convolution layers, three pooling layers and three fully connected layers Fc6–Fc8. The convolution and pooling layers form C1–C5. The structure of Alexnet is shown in Figure 6.
Partial features of the image are learned through the convolution kernel of the convolution layer to generate the feature map. The convolution operation is used to extract the different features of the input. The formula for extracting image features from the convolution layer is shown in Equation (6).
$$X_j^l = f\left( \sum_{i \in M_j} X_i^{l-1} \times k_{ij}^l + b_j^l \right) \tag{6}$$
where $X_j^l$ represents the $j$th feature map extracted from the $l$th convolution layer, $f(\cdot)$ represents the activation function, $M_j$ represents the set of input feature maps, $k_{ij}^l$ represents the convolution kernel, and $b_j^l$ represents the offset.
Once the image features have been extracted by the convolution kernel, activation functions need to be introduced to allow the model to handle non-linear problems. The ReLU activation function addresses not only the non-linearity of the model but also the vanishing gradient problem when the network is deep, and its expression is shown in Equation (7). Since Leaky ReLU does not always work better than ReLU [20], ReLU is chosen as the activation function in this paper, and the convolution layer is then calculated as shown in Equation (8).
$$\mathrm{ReLU}(x) = \max(0, x) \tag{7}$$
$$X_j^l = \mathrm{ReLU}\left( \sum_{i \in M_j} X_i^{l-1} \times k_{ij}^l + b_j^l \right) \tag{8}$$
The feature maps obtained from the convolution operation are relatively large, and if they are fed directly into the fully connected layers, the performance of the network will be reduced, while the pooling layer can compress the features extracted from the convolution layer. In this paper, the maximum pooling layer is used to compress the features from the convolution layer: only the largest feature value within each region is retained and all other values are discarded. This not only compresses the features, but also alleviates the sensitivity of the convolution layer to position and reduces the effect of noise. The effect of the maximum pooling layer is shown in Figure 7.
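As a concrete illustration, here is a short PyTorch sketch of the first convolution–ReLU–max-pooling stage (dimensions follow C1 and Pool1 in Table 2); the random input batch is only for demonstration.

```python
import torch
import torch.nn as nn

# One convolution pooling stage: the convolution implements Equation (8),
# ReLU implements Equation (7), and max pooling keeps only the largest
# value in each 3 x 3 region.
stage = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=64, kernel_size=11, stride=4, padding=2),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
)

x = torch.randn(8, 3, 227, 227)   # a batch of 8 RGB images at 227 x 227
print(stage(x).shape)             # torch.Size([8, 64, 27, 27]), matching C1 + Pool1 in Table 2
```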
After the convolution pooling layers C1–C5, the deep features of wildfire are extracted by the CNN. In order to identify fires, the deep features are fed into the fully connected layers. The fully connected layers are neural networks with multiple hidden layers that identify wildfire by updating their parameters with the back propagation algorithm.
Due to the long training time of Alexnet and the tendency of the model to overfit, transfer learning is used in this paper to train the model. Transfer learning is a machine learning method that reduces the training time and overfitting of a model by transferring parameters from the source domain to the target domain. With transfer learning, the model no longer needs to be trained from a blank CNN. In this paper, the network parameters of C1–C5 trained on the large dataset ImageNet are transferred to the target model, the model parameters are initialised, the fully connected layers Fc6_new–Fc8_new for the new task are designed to replace the fully connected layers Fc6–Fc8 in the original model, and finally the network parameters are fine-tuned using the wildfire image dataset. The transfer process in this paper is shown in Figure 8.
In this paper, the number of input neurons of the new fully connected layers is designed to be the number of dimensions of the wildfire features extracted by the convolution pooling layers C1–C5. Since the output of the fully connected layers has two states, fire and no fire, the number of output neurons of the new fully connected layers is designed to be two. The features extracted from the convolution pooling layers C1–C5 are used as inputs to the new fully connected layers. The features extracted by C1–C5 are first fed to the Dropout layer, which randomly ignores some neurons in the model so that overfitting can be well avoided. Next, the features are dimensionally reduced through Fc6_new. The features are then reduced to two dimensions through the ReLU activation function, Dropout layer, Fc7_new, ReLU activation function and Fc8_new, thus giving the model the ability to identify wildfire. Finally, the features are fed to the softmax layer to output the classification result, which is calculated as shown in Equation (9). The diagram of the new fully connected layers is shown in Figure 9.
$$S_q = \frac{e^{q}}{\sum_{w=1}^{2} e^{w}} \tag{9}$$
where q denotes the qth component in the output vector, Sq denotes the classification probability of the qth component, and w denotes the sequence number of the component.
The network settings for the new model are shown in Table 2.
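A hedged PyTorch sketch of this transfer step is shown below: the C1–C5 parameters come pretrained on ImageNet, and the classifier is replaced with the new fully connected layers of Table 2. The dropout probability of 0.5 is the torchvision default and an assumption on our part.

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.alexnet(pretrained=True)   # C1-C5 parameters transferred from ImageNet
model.classifier = nn.Sequential(
    nn.Dropout(p=0.5),        # randomly ignores neurons to curb overfitting
    nn.Linear(9216, 4096),    # Fc6_new: 6 x 6 x 256 = 9216 inputs
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),
    nn.Linear(4096, 4096),    # Fc7_new
    nn.ReLU(inplace=True),
    nn.Linear(4096, 2),       # Fc8_new: two outputs, fire / no fire
)
# The softmax of Equation (9) is applied by nn.CrossEntropyLoss during
# training; at inference, torch.softmax(model(x), dim=1) yields S_q.
```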
The Adam optimiser is employed in this paper so as to update the model parameters of the CNN. Adam is an adaptive learning rate method with the following equations.
$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t \tag{10}$$
$$v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2 \tag{11}$$
$$\theta_{t+1} = \theta_t - \frac{\alpha}{\sqrt{\hat{v}_t} + \varepsilon} \hat{m}_t \tag{12}$$
where $m_t$ and $v_t$ are the first and second moment estimates of the gradient (both initialised to zero), $\hat{m}_t$ and $\hat{v}_t$ are their bias-corrected versions, $\theta_t$ is the learnable parameter, $\varepsilon$ is the smoothing term, $\alpha$ is the adaptive learning rate, and $g_t$ is the gradient.
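A minimal setup of the Adam optimiser, continuing the sketch above, might look as follows; the β and ε values shown are PyTorch defaults rather than values reported in this paper.

```python
import torch

# lr is the initial learning rate selected in Section 4.1.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5,
                             betas=(0.9, 0.999), eps=1e-8)
```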
Based on transfer learning, segmented training is used in this paper to further reduce the overfitting of the model. In segmented training, the dataset is first reduced to a low resolution and the model starts training on the low-resolution images. As training progresses, the resolution of the dataset is gradually increased, and the parameters of the lower-resolution model are transferred to the new model for training, until the resolution reaches the specified size. When training with low-resolution images, the model learns the overall structure of the images, and this information is refined as the image resolution is extended. A sketch of this procedure is given below.
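In the sketch, make_loader, epochs_per_stage, and criterion are hypothetical placeholders for the resizing data pipeline, the per-stage epoch budget, and the classification loss; none of these names come from the paper.

```python
# Train at 64 x 64, carry the weights forward to 128 x 128, then to 227 x 227.
for resolution in (64, 128, 227):
    loader = make_loader(resolution)          # hypothetical helper: dataset resized to resolution
    for epoch in range(epochs_per_stage):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    # The trained parameters stay in `model`, so the next, higher-resolution
    # stage starts from them instead of from scratch.
```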

3.2. Development of a Two-Channel CNN

In order to achieve effective coverage of the model for fire scenes of different sizes and thus improve the accuracy of the model, a two-channel CNN based on different resolutions is proposed in this paper. The two-channel CNN has two convolution pooling layer channels, and the inputs of the two channels are wildfire datasets with different resolutions. The resolution of the dataset for the first channel is 227 × 227 and the resolution of the dataset for the second channel is 336 × 336. The convolution pooling layers C1–C5 of both channels are trained simultaneously and extract wildfire features independently; after extraction, the features are flattened and fused in a 1:1 ratio, as sketched below. Considering the unique advantages of SVM in small-sample and non-linear classification problems, the fully connected layers in the original model are replaced by an SVM. The structure of the two-channel CNN is shown in Figure 10.
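A minimal sketch of the two-channel feature extraction and 1:1 fusion follows; channel_a and channel_b stand for the two C1–C5 stacks, and images_227 / images_336 for the same batch at the two input resolutions, all hypothetical names.

```python
import torch

feat_a = torch.flatten(channel_a(images_227), start_dim=1)   # features from the 227 x 227 channel
feat_b = torch.flatten(channel_b(images_336), start_dim=1)   # features from the 336 x 336 channel
fused = torch.cat([feat_a, feat_b], dim=1)                   # 1:1 feature fusion
# `fused` is handed to the SVM classifier in place of the fully connected layers.
```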
SVM [21] is a machine learning algorithm that requires little training data and achieves high model accuracy. The features extracted and fused by the two-channel CNN are trained by the SVM to find the optimal separating hyperplane and achieve fire classification. Given a wildfire sample $(x_i, y_i)$ with $y_i \in \{1, -1\}$, the SVM is optimised in the form shown in Equation (13).
$$\begin{aligned} \min_{w^{i,j},\, b^{i,j},\, \xi_t^{i,j}} \ & \frac{1}{2}\|w^{i,j}\|^2 + C \sum_t \xi_t^{i,j} \\ \text{s.t.}\ & (w^{i,j})^T \phi(x_t) + b^{i,j} \ge 1 - \xi_t^{i,j}, \quad \text{if } y_t = i; \\ & (w^{i,j})^T \phi(x_t) + b^{i,j} \le -(1 - \xi_t^{i,j}), \quad \text{if } y_t = j; \\ & \xi_t^{i,j} \ge 0, \quad i < j \end{aligned} \tag{13}$$
where $w$ is the normal vector perpendicular to the resulting hyperplane, $\phi$ is the non-linear mapping, $\xi$ is the slack variable, and $C$ is the penalty factor.
After introducing the Lagrange function $L(w, b, \alpha)$ and applying duality, the decision function becomes:
$$f^{i,j}(t) = \mathrm{sign}\left( \sum_{l=1}^{n_{sv}^{i,j}} \alpha_l^{i,j} y_l^{i,j} k(t, x_l^{i,j}) + b^{i,j} \right) \tag{14}$$
where $t$ is the sample to be classified, $n_{sv}$ is the number of support vectors, $x_l$ is the fused feature vector, $y_l$ is the corresponding category, $\mathrm{sign}(\cdot)$ is the sign function, $\alpha_l$ is the Lagrange multiplier, $k(t, x_l)$ is the kernel function, and $b$ is the classification threshold determined from the training set.
The SVM kernel function used in this paper is the Gaussian radial basis function, which has the advantages of few parameters and fast convergence. Its formula is shown in Equation (15).
$$k(t, x_l^{i,j}) = \exp\left( -\frac{\|t - x_l^{i,j}\|^2}{2\sigma^2} \right) \tag{15}$$
where $\sigma$ is the kernel function parameter.
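A hedged scikit-learn sketch of this SVM stage is shown below. scikit-learn parameterises the RBF kernel as exp(−γ‖t − x‖²), so we map the paper’s σ via γ = 1/(2σ²), an interpretation of Equation (15) on our part; fused_train, labels_train and fused_test are hypothetical arrays of fused features and labels.

```python
from sklearn.svm import SVC

sigma, C = 0.001, 5                          # the best parameters reported in Table 6
svm = SVC(kernel="rbf", C=C, gamma=1.0 / (2.0 * sigma ** 2))
svm.fit(fused_train, labels_train)           # fused features from the two-channel CNN
predictions = svm.predict(fused_test)
```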

3.3. Construction of an Improved Two-Channel CNN

Due to the large number of features extracted by the two-channel CNN, feeding them directly to the SVM will result in a large delay time. To address this problem, an improved two-channel CNN is proposed in this paper, which compresses the flattened features from the two channels separately. After feature compression, the improved two-channel CNN performs feature fusion and finally classifies the dataset by SVM.
Since the flattened features are high-dimensional sparse data, and linear models perform well on such data while being widely used in classification problems [22], Lasso is chosen in this paper to compress the features.
Lasso is a shrinkage estimator: by adding an L1 penalty to the model, some of the feature coefficients are driven to exactly zero, that is, certain features are completely ignored by the model, which retains the most important features and achieves the effect of subset shrinkage [23]. Computation time can be reduced by reducing the dimensionality of the original data [24].
Taking one of the two channels in the improved two-channel CNN as an example, the features extracted by the convolution pooling layers C1–C5 are flattened and used as inputs to Lasso, and the output is predicted using Equation (16).
$$\hat{y} = w_1 x_1 + w_2 x_2 + \cdots + w_i x_i + b \tag{16}$$
where $x_i$ denotes the flattened features, $w_i$ denotes the parameters of the features in the model, $b$ is the offset, and $\hat{y}$ is the predicted output.
Lasso calculates the loss function $J(w, b)$ from the squared error between the true value $y$ and the predicted value $\hat{y}$, and adds a regularisation term to the loss function. Lasso’s loss function is shown in Equation (17).
$$J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} (\hat{y} - y)^2 + \lambda \|w\|_1 \tag{17}$$
where $m$ is the number of samples in the wildfire dataset, $n$ is the number of flattened features, $\lambda$ is the regularisation parameter, and $\|w\|_1 = \sum_{p=1}^{n} |w_p|$ is the L1 norm.
The objective of Lasso is to achieve variable selection and model parameter estimation simultaneously, and Lasso’s solution is shown in Equation (18).
$$\arg\min_{w \in \mathbb{R}^p} \frac{1}{2m} \sum_{i=1}^{m} (\hat{y} - y)^2 + \lambda \|w\|_1 \tag{18}$$
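A minimal sketch of Lasso_SVM under the above formulation is given below: Lasso is fitted on the flattened features (regressing the class labels purely to rank features, an interpretation on our part), the features whose coefficients shrink to exactly zero are dropped, and the SVM is trained on the compressed feature vector. The array names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.svm import SVC

lasso = Lasso(alpha=1e-4)                      # regularisation parameter lambda from Table 8
lasso.fit(fused_train, labels_train)           # Equations (16)-(18): fit the penalised linear model
keep = np.flatnonzero(lasso.coef_)             # indices of features with non-zero coefficients
svm = SVC(kernel="rbf", C=5, gamma=1.0 / (2.0 * 0.001 ** 2))
svm.fit(fused_train[:, keep], labels_train)    # SVM sees only the compressed features
predictions = svm.predict(fused_test[:, keep])
```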

4. Experimental Results

The experiments were performed under the PyTorch framework on a Windows 10 system with an Intel(R) Core(TM) i5-6300HQ CPU at 2.3 GHz, 16 GB of DDR4 memory, and an NVIDIA GTX 950 GPU as the hardware accelerator for model training.

4.1. Simulation Analysis of PCA_Jittering Image Enhancement

PCA_Jittering is used in this paper to enhance the training set: the values of the dominant colour families in the RGB images are changed drastically while the overall hue of each image is left unchanged, and the enhanced images are added to the training set, increasing its size. Because the image sizes in the wildfire dataset are inconsistent, the images are first stretched or compressed to a uniform size of 227 × 227 and then standardised. Next, the batch training method is used in this paper, with the value of batch_size set to 8, as in the sketch below.
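One way to express this preprocessing in PyTorch is sketched below; the dataset path is hypothetical, and the normalisation statistics are the common ImageNet values, an assumption rather than figures reported in the paper.

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

preprocess = transforms.Compose([
    transforms.Resize((227, 227)),        # stretch/compress every image to 227 x 227
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics (assumed)
                         std=[0.229, 0.224, 0.225]),
])
train_set = datasets.ImageFolder("wildfire_dataset/train", transform=preprocess)
train_loader = DataLoader(train_set, batch_size=8, shuffle=True)   # batch_size = 8
```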
The CNN employs the back propagation algorithm to update its parameters. The learning rate has a large impact on the back propagation algorithm, and the performance of the network is directly determined by its value. Therefore, different learning rates are set in this paper to test the effect of the learning rate on network performance. The initial learning rate is first set to 0.1, and subsequent trials use 0.01, 0.001, 0.0001, 0.00001, and 0.000001. The experimental results are shown in Table 3: the model fails to converge when the initial learning rate is 0.1, 0.01 or 0.001; when the initial learning rate is 0.0001, the testing set accuracy of the model is 95.33%, which is 0.27% lower than the accuracy when the initial learning rate is 0.00001; and when the initial learning rate is 0.000001, the testing set accuracy is 94.38%, which is 1.22% lower than the accuracy when the initial learning rate is 0.00001. The model therefore achieves its highest testing set accuracy of 95.60% when the initial learning rate is 0.00001, so the initial learning rate of the model is set to 0.00001.
To verify the effectiveness of PCA_Jittering, two sets of experiments without and with PCA_Jittering are set up in this paper for comparison. Before using PCA_Jittering, the model loss is 0.029. After using PCA_Jittering, the model loss is 0.0129, and the loss value is reduced by 0.0161. The comparison of loss values in the training set before and after using PCA_Jittering is shown in Figure 11. Table 4 shows the comparison of the models’ accuracy in wildfire identification. Before using PCA_Jittering, the accuracy of the model is 99.13% for the training set and 95.30% for the testing set. After using PCA_Jittering, the accuracy of the model is 99.64% for the training set and 95.60% for the testing set. PCA_Jittering improves the model’s training set accuracy and testing set accuracy by 0.51% and 0.30%, respectively, but the training set accuracy is substantially higher than the testing set accuracy and the model has significant overfitting.

4.2. Simulation Analysis of Transfer Learning and Segmentation Training

Overfitting makes the model more accurate on the training set, but less accurate on the testing set. To reduce the overfitting of the model, transfer learning is used in this paper to train the model. Transfer learning transfers the parameters of Alexnet trained on the large dataset ImageNet to the target model, and the target model is trained using the wildfire image dataset. The value of batch_size is set to eight for this experiment. The Adam optimizer is used to update the model parameters of the CNN and the initial learning rate of Adam is set to 0.00001.
After 60 epochs of training, the accuracy of the model is 99.65% in the training set and 97.35% in the testing set. The accuracy of the testing set is improved by 1.75% compared to that before transfer learning, but the accuracy of the training set is higher than the accuracy of the testing set, and the model still has significant overfitting. The experimental results of transfer learning on the training set and testing set are shown in Figure 12. The results of trials with and without transfer learning are shown in Table 5.
To further reduce the overfitting of the model, the model is trained using a segmented training approach based on transfer learning. The transfer learning model is first created using a dataset with a resolution of 64 × 64. Next, the resolution of the dataset is increased to 128 × 128 and the parameters from the model with a dataset resolution of 64 × 64 are transferred to the new model for training. Finally, the resolution of the dataset is increased to 227 × 227 and the parameters from the model with a dataset resolution of 128 × 128 are transferred to the new model for training.
After using segmented training, there are two sharp jumps in the accuracy curve of the model’s training set, because the model requires an adaptation process for each change in resolution. After the adaptation process, the accuracy of the model is higher than that of the model before using segmented training, reaching 99.965%. The experimental results for the training set are shown in Figure 13.
Compared to the accuracy of the testing set before using segmented training, the accuracy of the model using segmented training improved by 1.095% to 98.45%, which further reduces the overfitting of the model and improves the accuracy of the model. The experimental results for the testing set are shown in Figure 14.

4.3. Simulation Analysis of the Two-Channel CNN

A two-channel CNN model is designed to achieve effective model coverage of fire scenes of different sizes. The two-channel CNN has two convolution pooling layer channels. The input resolutions of the two channels are set to 227 × 227 and 336 × 336, respectively, and the model is trained using transfer learning and segmented training. The two-channel CNN extracts fire features through the respective convolution pooling layers C1–C5, and after the features are extracted, they are flattened and then fused in a 1:1 ratio. Finally, the fully connected layers of the original model are replaced using SVM.
Since the classification performance of the SVM is affected by the penalty factor C and the kernel function parameter σ, multiple sets of parameters are tested in this paper to find the optimal parameters for the SVM. The penalty factor C is set to 1, 2, 3, 4, 5, 6, 7, and 8, and the kernel function parameter σ is set to 0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10, and 100. The highest classification accuracy of 98.525% is obtained when the penalty factor C is 5 and the parameter of the Gaussian radial basis kernel function σ is 0.001. The results for the multiple sets of parameters are shown in Table 6.
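The parameter search behind Table 6 can be reproduced with a simple grid search, sketched below under the same γ = 1/(2σ²) mapping and hypothetical feature arrays as before.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

sigmas = np.array([1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0, 10.0, 100.0])
param_grid = {"C": [1, 2, 3, 4, 5, 6, 7, 8],
              "gamma": list(1.0 / (2.0 * sigmas ** 2))}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=3)
search.fit(fused_train, labels_train)
print(search.best_params_)   # Table 6 reports C = 5, sigma = 0.001 as best
```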
The testing set accuracy of the single-channel CNN trained using transfer learning and segmentation is 98.45%, while the testing set accuracy of the two-channel CNN reaches 98.52%. The two-channel CNN improves the accuracy of the model by 0.07%. The comparison of the testing set accuracy is shown in Table 7.

4.4. Simulation Analysis of the Improved Two-Channel CNN

Although the model accuracy of the two-channel CNN is high, the delay time of the model is large, and the model needs to be accelerated. Therefore, an improved two-channel CNN is designed, and Lasso_SVM is chosen to replace the SVM in the original model so as to achieve model compression.
When Lasso’s regularisation parameter λ is too large, almost all of the model’s features are compressed to zero and the model loses its ability to identify fires. When Lasso’s regularisation parameter λ is too small, the feature parameters of the model are barely compressed and although the model maintains the ability to identify fires, the delay time cannot be reduced. When Lasso’s regularisation parameter λ is appropriate, an appropriate number of features can be compressed to zero and the delay time of the model can be greatly reduced with little or no reduction in model accuracy.
Since Lasso’s regularisation parameter λ has a large impact on model performance, different regularisation parameters are set in this paper to test the impact of regularisation parameter λ on the model performance. Lasso’s regularisation parameter λ is first set to 0.1 and then to 0.01, 0.001, 0.0001 and 0.00001 for each trial. When the regularization parameter λ is 0.1, the model accuracy is 0% and the model loses its ability to identify fires; the highest accuracy of 98.475% is achieved when the regularisation parameter λ is 0.0001 and the model has a short average delay time of 0.051 s/frame. Therefore, the regularization parameter λ for Lasso is set to 0.0001. The effects of different regularization parameter sizes on model performance are shown in Table 8.
The comparison between the model before improvement and the model after improvement is shown in Table 9. The average delay time of the two-channel CNN is 0.11 s/frame, and the average delay time of the improved two-channel CNN is 0.051 s/frame. The delay time of the improved model is reduced by 53.64% compared to the model before the improvement. The model accuracy of the improved two-channel CNN is 98.47%, which is 0.05% lower than that before the improvement. The improved two-channel CNN reduces the model latency by 53.64% with 0.05% accuracy loss.
To better reflect the performance of the improved two-channel CNN, a confusion matrix is used to represent the model’s ability to identify fires. A confusion matrix is a specific matrix used in visualisation algorithms, where each column represents the predicted label and each row represents the true label. The confusion matrix of the improved two-channel CNN is shown in Figure 15, where only 61 out of 4000 wildfire image samples are incorrectly identified.
This paper shows some of the images where identification failed and identifies the key features that led to the failure. Figure 16 shows the identification of non-fire images as fire images. It can be seen that both images share the common feature of having sunlight interference in a conventional forest scene, so that sunlight has a greater impact on wildfire identification.
Figure 17 shows the identification of fire images as non-fire images. It can be seen that both images share the common feature that they are both fire tornadoes. Since fire tornadoes differ significantly from conventional wildfire, the wildfire identification model designed in this paper does not perform well in identifying them.
In this paper, the improved two-channel CNN is compared with two different identification algorithms to verify the performance of this model. The two different identification algorithms are both deep learning algorithms, including VGG16 [25] and Resnet50 [26]. The test results are shown in Table 10. Among these algorithms, the improved two-channel CNN achieves the highest accuracy of 98.47% and the lowest latency of 0.051 s/frame.

5. Discussion

Wildfires cause a great deal of damage each year. Compared to other objects with a fixed form, wildfires are more difficult to identify because of their irregular forms. The variety of shapes, sizes, textures and colours of wildfires makes fire evolution a complex process and makes fire identification more difficult. Therefore, the accurate identification of wildfires is of great importance.
In this paper, we propose a two-channel CNN that achieves high accuracy on the difficult problem of wildfire identification, and we improve the model to reduce the delay time of the CNN. The model proposed in this paper therefore achieves both high accuracy and low latency in wildfire identification, which provides greater possibilities for wildfire identification applications.
The wildfire identification model proposed in this paper is compared with other models in the field. The model proposed in [27] achieved an accuracy of 97.6% in identifying wildfires owing to the use of a GAN to generate high-quality images and a two-stage model. The model lowered the false alarm rate, but its identification was strongly influenced by the first-stage model and the delay time was high, reaching 0.7 s/frame, so it could not achieve real-time identification of wildfires. The model proposed in [28] achieved an accuracy of 90.7% for wildfire identification. That model extracted suspected fire regions first and then implemented wildfire identification, avoiding ineffective feature learning and reducing the training time; since there were fewer feature maps in the suspected fire regions, the latency was not large. However, this model was prone to misclassification as it trained only on the suspected fire regions without considering the environmental features in the vicinity of the fire. Unlike the models mentioned above, the wildfire identification model designed in this paper has both high accuracy and low latency, with an identification accuracy of 98.47% and a latency of 0.051 s/frame. However, the model still has some shortcomings: the training set does not contain a large number of the interfering images that cause identification failure, and the backbone network is not dedicated to wildfire identification.
For further improvement, we plan to further improve the quality and quantity of wildfire images and add interfering images in the training set that tend to cause identification failure, as the training data directly determines identification performance. Another improvement will be to design the structure of the backbone networks and modify them to ensure that they are specifically designed for wildfire identification missions. In addition, we will look at wildfire location and spread prediction to minimise the damage caused by wildfires.

6. Conclusions

A new wildfire identification method is proposed in this paper to improve the accuracy and reduce the latency of wildfire identification. Based on this method, simulation analysis of the accuracy and delay time of the model is carried out. The following main conclusions can be drawn:
(1).
A dataset for the wildfire identification model is built and implementation details for achieving the extension of the training set are described. The results of the study provide an important reference value for the development and implementation of similar wildfire identification models.
(2).
The accuracy of the wildfire identification model is improved by introducing transfer learning and segmented training into the proposed wildfire identification model. The results of the study can provide an important reference for similar identification models seeking to improve accuracy.
(3).
The use of a two-channel CNN to identify wildfires and the introduction of Lasso are proposed to achieve high accuracy and low latency in the wildfire identification model. The larger the regularisation factor, the lower the latency of the model, but the lower the accuracy of the model at the same time.
(4).
The delay time and confusion matrix of the wildfire identification model verify the accuracy and reliability of the model. The accuracy of the model is 98.47% and the delay time is 0.051 s/frame, which is better than the other models compared. These results show that the model performs relatively well in realistic wildfire identification.
(5).
The influence of sunlight and fire tornadoes on the wildfire identification model is significant and may lead to misclassification of the model.

Author Contributions

Conceptualization, Y.-Q.G. and G.C.; methodology, G.C.; software, G.C.; validation, Y.-N.W., X.-M.Z. and Z.-D.X.; formal analysis, G.C.; investigation, Y.-N.W.; resources, Z.-D.X.; data curation, G.C.; writing—original draft preparation, G.C.; writing—review and editing, Y.-Q.G.; visualization, X.-M.Z.; supervision, Y.-Q.G.; project administration, Z.-D.X.; funding acquisition, Y.-Q.G. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Program on Key R&D Project of China under Grant 2020YFB2103500 and the Key Research and Development Program of Anhui Province under Grant 202104g01020002.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Xu, R.J.; Lin, H.F.; Lu, K.J.; Cao, L.; Liu, Y.F. A Forest Fire Detection System Based on Ensemble Learning. Forests 2021, 12, 217.
2. Celermajer, D.; Lyster, R.; Wardle, G.M.; Walmsley, R.; Couzens, E. The Australian bushfire disaster: How to avoid repeating this catastrophe for biodiversity. Wiley Interdiscip. Rev.-Clim. Chang. 2021, 12, e704.
3. Ma, W.Y.; Feng, Z.K.; Cheng, Z.X.; Chen, S.L.; Wang, F.G. Identifying Forest Fire Driving Factors and Related Impacts in China Using Random Forest Algorithm. Forests 2020, 11, 507.
4. Wang, S.Y.; Zhao, J.; Ta, N.; Zhao, X.Y.; Xiao, M.X.; Wei, H.C. A real-time deep learning forest fire monitoring algorithm based on an improved Pruned plus KD model. J. Real-Time Image Process. 2021, 18, 2319–2329.
5. Kong, S.G.; Jin, D.; Li, S.; Kim, H. Fast fire flame detection in surveillance video using logistic regression and temporal smoothing. Fire Saf. J. 2016, 79, 37–43.
6. Mei, J.J.; Zhang, W. Early Fire Detection Algorithm Based on ViBe and Machine Learning. Acta Opt. Sin. 2018, 38, 0710001.
7. Dimitropoulos, K.; Barmpoutis, P.; Grammalidis, N. Higher Order Linear Dynamical Systems for Smoke Detection in Video Surveillance Applications. IEEE Trans. Circuits Syst. Video Technol. 2017, 27, 1143–1154.
8. Gong, F.M.; Li, C.T.; Gong, W.J.; Li, X.; Yuan, X.B.; Ma, Y.H.; Song, T. A Real-Time Fire Detection Method from Video with Multifeature Fusion. Comput. Intell. Neurosci. 2019, 2019, 1939171.
9. Prema, C.E.; Vinsley, S.S.; Suresh, S. Multi Feature Analysis of Smoke in YUV Color Space for Early Forest Fire Detection. Fire Technol. 2016, 52, 1319–1342.
10. Han, X.F.; Jin, J.S.; Wang, M.J.; Jiang, W.; Gao, L.; Xiao, L.P. Video fire detection based on Gaussian Mixture Model and multi-color features. Signal Image Video Process. 2017, 11, 1419–1425.
11. Chu, Y.Y.; Liang, D.; Kodur, V.K.R. A Probabilistic Inferential Algorithm to Determine Fire Source Location Based on Inversion of Multidimensional Fire Parameters. Fire Technol. 2017, 53, 1077–1100.
12. Li, P.; Zhao, W.D. Image fire detection algorithms based on convolutional neural networks. Case Stud. Therm. Eng. 2020, 19, 100625.
13. Zhang, Q.J.; Xu, J.L.; Xu, L.; Guo, H.F. Deep Convolutional Neural Networks for Forest Fire Detection. In Proceedings of the International Forum on Management, Education and Information Technology Application (IFMEITA), Guangzhou, China, 30–31 January 2016; pp. 568–575.
14. Kim, B.; Lee, J. A Video-Based Fire Detection Using Deep Learning Models. Appl. Sci. 2019, 9, 2862.
15. Zhang, J.M.; Zhu, H.Q.; Wang, P.Y.; Ling, X.F. ATT Squeeze U-Net: A Lightweight Network for Forest Fire Detection and Recognition. IEEE Access 2021, 9, 10858–10870.
16. Muhammad, K.; Ahmad, J.; Lv, Z.H.; Bellavista, P.; Yang, P.; Baik, S.W. Efficient Deep CNN-Based Fire Detection and Localization in Video Surveillance Applications. IEEE Trans. Syst. Man Cybern. Syst. 2019, 49, 1419–1434.
17. Han, J.; Kim, G.; Lee, C.; Hwang, U.; Han, Y.; Kim, S. Predictive Models of Fire via Deep learning Exploiting Colorific Variation. In Proceedings of the 1st International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Okinawa, Japan, 11–13 February 2019; pp. 579–581.
18. Wu, X.H.; Lu, X.B.; Leung, H. An Adaptive Threshold Deep Learning Method for Fire and Smoke Detection. In Proceedings of the 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Banff, AB, Canada, 5–8 October 2017; pp. 1954–1959.
19. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90.
20. Dubey, A.K.; Jain, V. Comparative Study of Convolution Neural Network’s Relu and Leaky-Relu Activation Functions. In Applications of Computing, Automation and Wireless Systems in Electrical Engineering; Mishra, S., Sood, Y., Tomar, A., Eds.; Lecture Notes in Electrical Engineering; Springer: Singapore, 2019; Volume 553.
21. Aljarah, I.; Al-Zoubi, A.M.; Faris, H.; Hassonah, M.A.; Mirjalili, S.; Saadeh, H. Simultaneous Feature Selection and Support Vector Machine Optimization Using the Grasshopper Optimization Algorithm. Cogn. Comput. 2018, 10, 478–495.
22. Müller, A.C.; Guido, S. Introduction to Machine Learning with Python; Posts and Telecommunications Press: Beijing, China, 2018; pp. 44–68.
23. Tibshirani, R. Regression shrinkage and selection via the lasso: A retrospective. J. R. Stat. Soc. Ser. B Stat. Methodol. 2011, 73, 273–282.
24. Lu, H.; Xu, Z.D.; Iseley, T.; Matthews, J.C. Novel data-driven framework for predicting residual strength of corroded pipelines. J. Pipeline Syst. Eng. Pract. 2021, 12, 04021045.
25. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
26. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
27. Liu, Z.C.; Zhang, K.; Wang, C.Y.; Huang, S.Y. Research on the identification method for the forest fire based on deep learning. Optik 2020, 223, 165491.
28. Wang, Y.B.; Dang, L.F.; Ren, J.Y. Forest fire image recognition based on convolutional neural network. J. Algorithms Comput. Technol. 2019, 13, 1748302619887689.
Figure 1. Images of wildfire scenes.
Figure 2. Images of conventional forest scenes.
Figure 3. Images of sun scenes.
Figure 4. The flowchart of PCA_Jittering.
Figure 5. Effects of using PCA_Jittering image enhancement. (a) Conventional forest scene before using PCA_Jittering; (b) conventional forest scene after using PCA_Jittering; (c) wildfire scene before using PCA_Jittering; (d) wildfire scene after using PCA_Jittering; (e) sun scene before using PCA_Jittering; (f) sun scene after using PCA_Jittering.
Figure 6. The structure of Alexnet.
Figure 7. Effect of using maximum pooling layer. (a) Characteristics before using maximum pooling; (b) characteristics after using maximum pooling.
Figure 8. Diagram of parameters transfer.
Figure 9. The new fully connected layers.
Figure 10. The structure of the two-channel CNN.
Figure 11. The comparison of loss values in the training set before and after using PCA_Jittering.
Figure 12. The experimental results of transfer learning on the training set and testing set.
Figure 13. Accuracy curve of the training set before and after using segmented training.
Figure 14. Accuracy curves of the testing set before and after using segmented training.
Figure 15. Confusion matrix.
Figure 16. Identification of non-fire images as fire images.
Figure 17. Identification of fire images as non-fire images.
Table 1. The comparisons between two wildfire feature extraction methods.

| Methods | Comparisons |
|---|---|
| Traditional methods | Rely on manual extraction of features, which is highly subjective; rely on prior knowledge to extract features; the extracted features are relatively simple; the method has a low generalization capability. |
| Deep learning methods | Features can be extracted automatically with low subjectivity; feature extraction does not rely on prior knowledge; deep features can be extracted; the method has a strong generalization capability. |
Table 2. Network settings for the new model.

| Type | Input Size | Kernel Size | Kernel Num | Output Size | Stride | Padding |
|---|---|---|---|---|---|---|
| Conv1 | 227 × 227 × 3 | 11 × 11 | 64 | 56 × 56 × 64 | 4 | 2 |
| Pool1 | 56 × 56 × 64 | 3 × 3 | none | 27 × 27 × 64 | 2 | 0 |
| Conv2 | 27 × 27 × 64 | 5 × 5 | 192 | 27 × 27 × 192 | 1 | 2 |
| Pool2 | 27 × 27 × 192 | 3 × 3 | none | 13 × 13 × 192 | 2 | 0 |
| Conv3 | 13 × 13 × 192 | 3 × 3 | 384 | 13 × 13 × 384 | 1 | 1 |
| Conv4 | 13 × 13 × 384 | 3 × 3 | 256 | 13 × 13 × 256 | 1 | 1 |
| Conv5 | 13 × 13 × 256 | 3 × 3 | 256 | 13 × 13 × 256 | 1 | 1 |
| Pool3 | 13 × 13 × 256 | 3 × 3 | none | 6 × 6 × 256 | 2 | 0 |
| Fc6_new | 9216 | none | none | 4096 | none | none |
| Fc7_new | 4096 | none | none | 4096 | none | none |
| Fc8_new | 4096 | none | none | 2 | none | none |
Table 3. Effects of initial learning rates on model accuracy.

| Initial Learning Rate | Training Set Accuracy (%) | Testing Set Accuracy (%) |
|---|---|---|
| 0.1 | 49.84 | 50.00 |
| 0.01 | 49.48 | 50.00 |
| 0.001 | 49.58 | 50.00 |
| 0.0001 | 99.00 | 95.33 |
| 0.00001 | 99.64 | 95.60 |
| 0.000001 | 96.70 | 94.38 |
Table 4. The comparison of model accuracy before and after using PCA_Jittering.

| Models | Training Accuracy (%) | Test Accuracy (%) |
|---|---|---|
| Alexnet | 99.13 | 95.30 |
| PCA_Jittering + Alexnet | 99.64 | 95.60 |
Table 5. Comparison of results of transfer learning trials.

| Models | Testing Set Accuracy (%) |
|---|---|
| Before using Transfer Learning | 95.60 |
| After using Transfer Learning | 97.35 |
Table 6. Test results for multiple sets of parameters (testing set accuracy, %; rows are kernel parameter σ, columns are penalty factor C).

| σ \ C | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| 0.00001 | 93.47 | 95.17 | 96.12 | 96.47 | 96.67 | 96.92 | 96.97 | 97.07 |
| 0.0001 | 97.20 | 97.70 | 97.80 | 97.97 | 98.02 | 98.15 | 98.17 | 98.25 |
| 0.001 | 98.25 | 98.40 | 98.50 | 98.50 | 98.52 | 98.50 | 98.47 | 98.47 |
| 0.01 | 98.17 | 98.15 | 98.15 | 98.15 | 98.15 | 98.15 | 98.15 | 98.15 |
| 0.1 | 53.82 | 54.15 | 54.15 | 54.15 | 54.15 | 54.15 | 54.15 | 54.15 |
| 1 | 51.57 | 51.90 | 51.90 | 51.90 | 51.90 | 51.90 | 51.90 | 51.90 |
| 10 | 50.50 | 50.57 | 50.57 | 50.57 | 50.57 | 50.57 | 50.57 | 50.57 |
| 100 | 50.20 | 50.20 | 50.20 | 50.20 | 50.20 | 50.20 | 50.20 | 50.20 |
Table 7. The comparison of the testing set accuracy.

| Models | Testing Set Accuracy (%) |
|---|---|
| The single-channel CNN | 98.45 |
| The two-channel CNN | 98.52 |
Table 8. Effects of different regularisation parameter sizes on model performance.

| Regularization Parameter λ | Model Accuracy (%) | Average Delay Time (s/Frame) |
|---|---|---|
| 0.1 | 0.00 | 0.000 |
| 0.01 | 96.32 | 0.046 |
| 0.001 | 98.20 | 0.048 |
| 0.0001 | 98.47 | 0.051 |
| 0.00001 | 98.45 | 0.082 |
Table 9. Comparison of model performance before and after improvement.

| Models | Average Delay Time (s/Frame) | Model Accuracy (%) |
|---|---|---|
| Two-channel CNN | 0.110 | 98.52 |
| Improved two-channel CNN | 0.051 | 98.47 |
Table 10. Comparison between different models.

| Models | Average Delay Time (s/Frame) | Model Accuracy (%) |
|---|---|---|
| Improved two-channel CNN | 0.051 | 98.47 |
| VGG16 | 0.093 | 95.87 |
| Resnet50 | 0.064 | 97.77 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
