Wildfire is a destructive natural disaster that poses serious threats to human lives, property, and ecosystems [1]. The smoke emitted by biomass burning perturbs the atmospheric radiation balance, air quality, and ecological energy budget [2]. At the same time, fire smoke is a prominent signal of biomass burning and therefore plays an important role in wildfire detection. Improved smoke detection is thus important for the identification of new fires, as well as for subsequent fire rescue and emergency management [4]. Owing to its rapid development over the past decades, satellite remote sensing offers a great opportunity for this task, with the advantages of timeliness, wide coverage, and low cost [5]. Nevertheless, identifying smoke in satellite data is challenging because fire smoke varies in shape, color, and extent, and its spectral signature overlaps with those of other phenomena [2]. This makes it difficult to distinguish smoke from similar-looking disasters and complex land cover types, such as clouds, dust, and haze.
On the basis of the differences between smoke and typical land cover types, a variety of smoke detection methods have been developed. Visual interpretation is the most commonly used method to identify smoke. It utilizes three spectral bands of a satellite sensor as the red, green, and blue (RGB) channels to generate true-color or false-color composite images [7], for example, using bands 1, 4, and 3 of the Moderate Resolution Imaging Spectroradiometer (MODIS) to yield a true-color RGB image [10], or bands 1, 2, and 4 of the Advanced Very High Resolution Radiometer (AVHRR) to form a false-color RGB image [9]. This method supports manual visual discrimination of smoke, but it cannot automatically process massive volumes of data [2]. Another popular method is the multi-threshold method [2], which derives regionally optimal thresholds for the reflectance or brightness temperature (BT) of fixed spectral bands from historical data, combines them to exclude clouds and certain land covers, and finally identifies smoke. Xie et al. [12] developed a set of thresholds to discriminate smoke pixels, and Wang et al. [13] modified them using MODIS data. Zhao et al. [3] investigated different detection schemes for smoke over land and ocean using spectral and spatial threshold sets. Although fixed thresholds for multiple bands may be valid in local regions, it is difficult to determine optimal thresholds because of spatial and temporal variations [6]. As a result, small smoke regions can be neglected, which reduces the timeliness of fire alarms. In addition, Li et al. [14] developed an automatic algorithm to detect smoke areas using K-means clustering and Fisher linear discriminant analysis, and Li et al. [2] explored neural networks to identify smoke pixels in an image. These methods used training samples from only a few classes besides smoke, such as cloud, water, and vegetation. However, the actual classes in satellite imagery are more complicated, so when methods that consider only these few typical classes are applied across regions, their effectiveness and applicability are reduced.
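As an illustration of the multi-threshold approach, the following sketch masks smoke pixels by combining per-band tests on reflectance and brightness temperature. The band roles and every threshold value below are illustrative assumptions for exposition, not the tuned values of any published scheme.

```python
import numpy as np

def detect_smoke_multithreshold(r_vis, r_nir, bt_a, bt_b):
    """Classify pixels as smoke via fixed per-band thresholds.

    Inputs are 2-D arrays: top-of-atmosphere reflectances of a visible
    and a near-infrared band, and brightness temperatures (K) of two
    thermal bands. All threshold values are illustrative placeholders.
    """
    # Smoke is moderately bright in the visible band ...
    visible_ok = (r_vis > 0.12) & (r_vis < 0.45)
    # ... but darker than thick cloud in the near-infrared band.
    not_cloud = r_nir < 0.35
    # Clouds are cold; smoke plumes stay close to surface temperature.
    warm_enough = bt_a > 270.0
    # A split-window BT difference helps separate smoke from dust.
    split_window_ok = (bt_a - bt_b) > -0.5
    return visible_ok & not_cloud & warm_enough & split_window_ok
```

The per-band masks are combined with a logical AND, mirroring how threshold sets are chained to successively exclude cloud and bright land covers before declaring smoke.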
Unlike previous investigations, which mainly focused on pixel-level smoke detection, the objective of this study is to identify images containing wildfire smoke, that is, image-level smoke detection. The scene classification task aims to interpret a satellite image with a semantic label [15], which contributes to the discrimination of smoke scenes and the identification of wildfire. In recent years, deep learning techniques have made impressive achievements in computer vision and image processing, bringing new momentum to the field of remote sensing [17]. The large amount of available satellite data provides a unique opportunity to apply deep learning to smoke scene identification. Nevertheless, existing aerial or satellite datasets for scene classification mainly focus on land-use types. For example, the UC-Merced Dataset [19], WHU-RS Dataset [20], RSSCN7 Dataset [21], and AID Dataset [22] contain high-resolution aerial images of specific land-use types, such as airplane, baseball diamond, bare land, beach, desert, farmland, forest, storage tanks, pond, and river. However, these datasets do not include classes related to wildfire. Previous smoke detection datasets were mainly collected from surveillance cameras. The smoke image dataset used in the work of [23] was constructed from 1220 smoke images and 1648 non-smoke images. Smoke image datasets consisting of real smoke images from videos and synthetic smoke images rendered onto real backgrounds were introduced in the works of [24]. Moreover, the smoke video datasets in the works of [27] provide smoke and non-smoke videos. Some smoke videos were also included in the dynamic scene dataset DynTex [29], which is composed of over 650 sequences of dynamic textures. The dataset in the work of [30] has 11 classes, each containing more than 50 videos. These smoke datasets of images or videos were acquired from conventional cameras and can be used in surveillance applications. However, their close-range observation scenes differ greatly from satellite observations in texture, color, background, and so on. In addition, the wide coverage, low cost, and timeliness of satellite sensors make them valuable for detecting smoke scenes in wildfire detection. As far as we know, no satellite remote sensing smoke detection dataset exists so far. This motivated us to construct a new large-scale smoke dataset of satellite images to fill this gap. Therefore, we collected a new dataset from MODIS data, namely USTC_SmokeRS, by considering smoke-related classes. The dataset contains thousands of satellite images of fire smoke and of several classes that are visually very close to smoke, such as cloud, dust, and haze, with various natural backgrounds. As image labels are easier to determine than pixel labels, we constructed the new dataset and developed a model for smoke scene classification, which is deemed the first stage of fire detection. A pixel-level smoke dataset will also be constructed in future work.
Scene classification has drawn increasing attention in the last decade. Traditional methods applied the bag-of-visual-words (BOVW) approach to land-use scenes, with features extracted by the scale-invariant feature transform (SIFT) method [19]. The deep belief network (DBN) [21] and sparse autoencoder (SA) [32] were also used to capture representative features of different classes. Recently, the convolutional neural network (CNN) has been applied to the scene classification task, achieving state-of-the-art performance. A variety of networks have been developed, such as AlexNet [16], VGGNet [33], GoogLeNet [34], ResNet [35], and DenseNet [36], to push the state of the art of image classification on benchmark datasets like ImageNet and CIFAR-10/100 [37]. In addition, the visual attention mechanism, inspired by human perception, was developed to selectively utilize informative features during image processing, which enhances the representational capacity of networks [39]. Following the success of attention mechanisms in machine translation [41] and object detection [42], they have been introduced into a wider range of applications, such as image captioning [43], video analysis [45], and scene classification [46]. For the scene classification task, Wang et al. [47] proposed the residual attention module, which incorporates the residual learning idea [35] and a bottom-up top-down feedforward structure [49] to obtain attention-aware features; multiple modules are stacked to form the residual attention network. In contrast, Hu et al. [48] focused on the channel-wise relationship and designed the squeeze-and-excitation (SE) block with a lightweight gating mechanism. SE-Net, built by stacking multiple SE blocks, recalibrates channel-wise features to improve network capacity. Although these attention-based models perform well [47] on some datasets, it is necessary to further exploit the most informative spatial features to improve classification results. Because each image is assigned a single label in the scene classification task, the target object, which may occupy only part of the image, should be allocated the most responsive receptive fields [42]. Hence, an effective network design that makes comprehensive use of spatial and channel-wise attention is of critical importance to model performance.
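The channel-wise recalibration performed by an SE block [48] can be summarized in a minimal NumPy sketch of the forward pass. The weight shapes assume a reduction ratio r (w1: (C//r, C), w2: (C, C//r)); in practice the weights are learned within a deep learning framework, and the variable names here are our own.

```python
import numpy as np

def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-excitation forward pass on a feature map x of shape
    (C, H, W), following the scheme of Hu et al. [48]."""
    c = x.shape[0]
    # Squeeze: global average pooling collapses each channel to a scalar.
    z = x.mean(axis=(1, 2))                        # (C,)
    # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid yields gates.
    s = np.maximum(w1 @ z + b1, 0.0)               # (C//r,)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ s + b2)))   # (C,), each in (0, 1)
    # Recalibrate: rescale each channel of x by its gate.
    return x * gates.reshape(c, 1, 1)
```

Because each gate lies in (0, 1), the block can only attenuate channels, emphasizing informative ones relative to the rest; note that this per-channel scaling is spatially uniform, which is precisely the limitation that motivates adding spatial attention.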
This study presents a new smoke detection model (SmokeNet) that fully exploits spatial and channel-wise attentive information to identify smoke scenes. Different from the spatial attention used in the work of [47], we propose a bottleneck gating mechanism for spatial attention that generates spatially attentive features. On the basis of the new USTC_SmokeRS dataset, we examine the performance of SmokeNet and compare it with state-of-the-art models. In summary, the contributions of this work are three-fold.
We construct a new satellite imagery dataset based on MODIS data for smoke scene detection. It consists of 6225 RGB images from six classes. This dataset will be released as a benchmark for smoke scene detection with satellite remote sensing.
We improve the spatial attention mechanism in a deep learning network for scene classification. The SmokeNet model, equipped with both spatial and channel-wise attention, is proposed to identify smoke scenes.
Experimental results on the new dataset show that the proposed model outperforms the state-of-the-art methods.
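The bottleneck gating idea behind the improved spatial attention of the second contribution can be sketched in simplified form. All shapes and layer choices below are a hypothetical illustration with 1x1 convolutions written as matrix products over the channel axis; they are not the exact SmokeNet configuration.

```python
import numpy as np

def spatial_attention_gate(x, w_down, w_up):
    """Illustrative spatial-attention gating on a feature map x of shape
    (C, H, W): channels are squeezed through a bottleneck, mapped to a
    single-channel map, and turned into per-pixel gates by a sigmoid."""
    c, h, w = x.shape
    flat = x.reshape(c, h * w)              # treat pixels as columns
    mid = np.maximum(w_down @ flat, 0.0)    # (C//r, H*W), ReLU bottleneck
    logits = (w_up @ mid).reshape(h, w)     # (H, W) single-channel map
    gate = 1.0 / (1.0 + np.exp(-logits))    # per-pixel weights in (0, 1)
    return x * gate                         # broadcast over all channels
```

In contrast to channel-wise gating, the resulting weights vary per pixel, so regions belonging to the target object (e.g., a smoke plume occupying part of the scene) can be emphasized across all channels.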
The remainder of this paper is organized as follows. Section 2 introduces the new dataset and the proposed model. Section 3 reports the experimental results. Section 4 provides the results analysis and discussion. Finally, Section 5 makes a few concluding remarks.
The classification results and evaluation visualizations indicate that the proposed SmokeNet model achieves higher accuracy than the state-of-the-art scene classification models on the USTC_SmokeRS dataset. For the purpose of remote sensing-based wildfire detection, we collected thousands of satellite images of fire smoke and five smoke-related classes, which alleviates the lack of wildfire-related datasets. Meanwhile, the proposed SmokeNet has the advantage of integrating spatial and channel-wise attention [48] and residual attention modules [47] to fully exploit spatial class-discriminative features. The new dataset and model can effectively assist the identification of wildfire smoke, dust, and haze using satellite remote sensing.
In order to achieve prompt detection of wildfire, we focus on identifying an important wildfire signal, namely smoke [57]. As mentioned in the works of [18], scene classification can automatically interpret an image with a specific class. From this point of view, unlike previous smoke detection research [2], this paper aims to recognize images containing wildfire smoke, which is of critical importance for rapid wildfire detection. However, existing aerial datasets for scene classification mainly concentrate on specific land-use types, as shown in the works of [19], while the smoke-related datasets [23] were collected from conventional cameras with close-range observation. These datasets cannot meet the demands of our task, which prompted us to undertake the long-term process of disaster data collection and processing. Optical satellite sensors may cover the multi- and hyperspectral domains [17], whereas true-color RGB images generated from the visible red, green, and blue bands are commonly used for visual interpretation [6]. To ensure the applicability of the proposed model to different satellite sensors, RGB images were used instead of multispectral data to develop the smoke detection algorithm. In the future, we can also explore CNN parameters pre-trained on large-scale RGB datasets to further improve the classification results of this task, which has proven effective in the works of [33]. Therefore, we constructed the USTC_SmokeRS dataset with true-color MODIS RGB images.
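A true-color composite of this kind can be formed by stacking MODIS band 1, 4, and 3 reflectances as the R, G, and B channels. The simple linear stretch below is a minimal sketch; operational quick-looks typically apply a nonlinear enhancement instead.

```python
import numpy as np

def compose_true_color(band1, band4, band3):
    """Stack MODIS band 1 (red), band 4 (green), and band 3 (blue)
    reflectance arrays into an 8-bit true-color image."""
    rgb = np.stack([band1, band4, band3], axis=-1)   # (H, W, 3)
    rgb = np.clip(rgb, 0.0, 1.0)                     # bound reflectance to [0, 1]
    return (rgb * 255.0 + 0.5).astype(np.uint8)      # round and quantize to 8-bit
```

Clipping guards against reflectances above 1.0 (e.g., sun glint or bright cloud), which would otherwise wrap around during the unsigned 8-bit conversion.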
On the basis of the practical situation of fire smoke detection and the actual classes found in satellite images, we included more smoke-like aerosol classes and land covers in the new dataset, for example, cloud, dust, haze, bright surfaces, lakes, seaside, and vegetation. This is an important improvement over previous research [2], which considered only a few specific classes. Although this increases the difficulty of classification, it is very important for practical smoke detection. It is also worth noting that, despite there being only six classes in the dataset, the images of each class contain a variety of land covers. For example, images of the land class may include different types of vegetation, surfaces, lakes, or mountains. The high inter-class similarity and large intra-class variation of the dataset make it more challenging to effectively distinguish smoke scenes from the other classes.
Uncertainties, Errors, and Accuracies: The effectiveness of the state-of-the-art and proposed models was validated on the new dataset. Given that disaster images are more difficult to obtain than those of ordinary land-use types, we set up experimental protocols with four different proportions of training images, as described in Section 2.3. The experimental results illustrate that the proposed SmokeNet outperforms the other models when trained with different numbers of training images. This is because SmokeNet can not only dynamically recalibrate channel-wise features and generate attention-aware features, but also optimize the representation of spatial information using the proposed spatial attention mechanism. These advantages are confirmed by the results in Figure 4 and Table 6. This refinement effectively boosts the spatial-awareness capacity of the network so as to take advantage of class-specific features in an image for scene classification. To verify this statement, we also show attention visualizations of the different models in Figure 5 and Figure 6. Moreover, the proposed SmokeNet can process and identify around 20 images per second on a GeForce RTX 2080Ti GPU, which ensures adequate recognition speed for fire smoke in practical applications. In this paper, spatially explicit tests were not performed, but spatial differences can be explored in future work using images divided according to specific geographic regions.
In summary, the proposed SmokeNet demonstrated better capacity for smoke scene detection by merging spatial and channel-wise attention. The new dataset collected in this study is also instrumental for researchers and practitioners working in the fields of wildfire detection and remote sensing. Beyond MODIS data, the proposed method has promising application prospects for identifying fire smoke in RGB images from other satellite sensors, such as the Advanced Himawari Imager (AHI) data of the Himawari-8 satellite, Operational Land Imager (OLI) data of the Landsat-8 satellite, Visible Infrared Imaging Radiometer Suite (VIIRS) data of the Suomi National Polar-orbiting Partnership (S-NPP) satellite, Geostationary Operational Environmental Satellite (GOES) data, Sentinel satellite data, and GaoFen-4 (GF-4) satellite data, among others. As the satellite spatial resolution influences the extent of smoke and land cover features in an image when satellite data are processed into true-color RGB images, transfer learning among satellite RGB images with different spatial resolutions is worth further exploration. In the future, we will assess the effectiveness of our model on RGB images from various satellites with different spatial resolutions. In addition, considering the abundant spectral bands of satellite sensors, we will develop new algorithms that utilize multispectral data and RGB images simultaneously to address the difficulties discussed in Figure 7, thereby improving the classification results. Furthermore, a pixel-level smoke dataset will be constructed to help develop models for discriminating smoke pixels and their spreading areas in an image.