BoucaNet: A CNN-Transformer for Smoke Recognition on Remote Sensing Satellite Images

Fire accidents cause alarming damage. They result in the loss of human lives, damage to property, and significant financial losses. Early fire ignition detection systems, particularly smoke detection systems, play a crucial role in enabling effective firefighting efforts. In this paper, a novel DL (Deep Learning) method, namely BoucaNet, is introduced for recognizing smoke on satellite images while addressing the associated challenging limitations. BoucaNet combines the strengths of the deep CNN EfficientNet v2 and the vision transformer EfficientFormer v2 for identifying smoke, cloud, haze, dust, land, and seaside classes. Extensive results demonstrate that BoucaNet achieved high performance, with an accuracy of 93.67%, an F1-score of 93.64%, and an inference time of 0.16 seconds compared with baseline methods. BoucaNet also showed a robust ability to overcome challenges, including complex backgrounds; detecting small smoke zones; handling varying smoke features such as size, shape


Introduction
Fires cause severe damage to economies, properties, ecosystems, and human lives. They destroy properties, homes, and resources, leading to considerable financial losses, and contribute to ecological imbalances. For example, since 1990, wildfires have destroyed an average of 2.5 million hectares per year in Canada [1]. In addition, over the past decade, the cost of firefighting in Canada ranged between $800 million and $1.5 billion a year [1]. Since January 2023, 260,000 hectares have already burned in the European Union [2]. Researchers have focused on developing fire ignition and early detection systems to reduce these alarming figures and improve firefighting capabilities [3,4]. Both smoke and fire detection systems are used to provide comprehensive early warning and fire protection. Fire detection systems detect the presence of flames, while smoke detection systems identify the first signs of smoke, even before flames are visible.
Recently, smoke recognition methods made significant progress by exploiting visible features captured by vision sensors [5]. Classical machine learning methods, such as dynamic texture and optical flow, were employed to manually extract smoke features from images or videos. These extracted features were then used to identify the presence of smoke using various classifiers, such as SVM (Support Vector Machine), Random Forest, and AdaBoost. These approaches were reasonably effective, but suffered from false alarms and from the difficulty of identifying relevant features that accurately represent the smoke recognition problem [5].
Deep learning models were successfully employed in many fields and industries [6,7]. More specifically, they were used for fire ignition detection due to their ability to learn to automatically extract smoke features from large amounts of data. They provide diverse and informative feature maps, which are often better than manually generated features in terms of performance and robustness [8,9]. More recently, satellite remote sensing images were adopted for this task, representing a great opportunity thanks to the advantages of satellite remote sensing, including timeliness and large coverage areas [10,11].
High false-alarm rates are still present due to background complexity; the variability of smoke regarding its size, intensity, and shape; and the presence of smoke-like objects, such as haze, dust, and clouds. These objects often have very similar textures, colors, shapes, and spectral features to smoke, leading to false results in detecting smoke. Therefore, this paper presents a novel ensemble learning method, namely BoucaNet, for recognizing smoke on remote sensing satellite images, addressing these challenging limitations. BoucaNet employs a vision transformer, EfficientFormer v2 [12], and a deep CNN (Convolutional Neural Network), EfficientNet v2 [13], to extract smoke features from satellite images. It was trained and evaluated using a satellite dataset, USTC_SmokeRS [14], which comprises six classes (smoke, cloud, haze, dust, seaside, and land). This paper presents three main contributions:
1. A novel DL method, BoucaNet, is introduced to detect the presence of smoke in satellite images, thereby improving the performance of DL-based smoke classification methods.
2. BoucaNet demonstrated a robust ability to handle challenging situations such as background complexity and dynamism; detecting small smoke areas; varying characteristics of smoke regarding its air concentration, flow pattern, intensity, shape, and color; and handling its visual similarity to haze, dust, and clouds. This ability reduces false alarms, making BoucaNet a reliable solution for smoke remote sensing applications with high accuracy.
3. An optimized architecture is proposed in this study, achieving fast inference time, which is an important aspect in developing an early smoke-detection system.
The remainder of this paper is structured as follows: Section 2 presents state-of-the-art methods for smoke recognition using DL approaches. Section 3 introduces the proposed method, BoucaNet, and provides details about the satellite dataset, USTC_SmokeRS. Section 4 reports and discusses the experimental results of BoucaNet. Section 5 concludes the paper.

Related Works
Over the years, numerous DL methods were developed to improve the performance of smoke classification in different fields of application, as presented in Table 1. Among them, Tao et al. [15] suggested a simple CNN to recognize smoke in ground images, addressing challenging limitations such as varying smoke colors, shapes, and textures. The proposed CNN modifies AlexNet [16] by changing the order of the max pooling layers and normalization layers that follow the first and second convolutional layers. The modified AlexNet was trained and evaluated using the Yuan dataset (5695 smoke images and 18,522 non-smoke images) [17], resulting in an accuracy of 96.88%. Yin et al. [18] proposed a new deep normalization CNN, namely DNCNN, to improve smoke detection performance. DNCNN incorporates batch normalization into convolutional layers to deal with overfitting and gradient dispersion. Data augmentation techniques (vertical flipping, rotation, and horizontal flipping) were also used to address the imbalance between smoke and non-smoke images (5695 smoke images and 18,522 non-smoke images [17]). Test results showed that DNCNN achieved an accuracy of 98.08%, surpassing popular CNNs such as AlexNet, ZF-Net [19], and VGG-16 [20]. Khan et al. [21] studied three CNN models (AlexNet, VGG-16, and GoogLeNet [22]) to identify smoke in normal and foggy IoT environments. Experimental tests were performed using a very large dataset, comprising 18,532 smoke images, 17,474 non-smoke images, 17,474 non-smoke images with fog, and 18,532 smoke images with fog. VGG-16 obtained the highest performance, with an accuracy of 97.72%, compared with AlexNet, GoogLeNet, and published fire models, demonstrating its ability to detect smoke in a foggy environment.
Peng and Wand [23] proposed a video smoke detection method to recognize smoke in complex environments. First, a GMM (Gaussian Mixture Model) [24] was employed as an image processing method to extract the suspected smoke areas from images collected from surveillance cameras. Then, the SqueezeNet model [25] was adopted to detect the presence of smoke. Using a large dataset (25,000 smoke images and 25,000 non-smoke images), this method showed a high performance with an accuracy of 97.12% and a high prediction time compared with existing wildfire models such as AlexNet, ShuffleNet [26], Xception [27], and MobileNet [28]. Gu et al. [29] developed a DCNN (Deep Dual-Channel Neural Network) as a smoke recognition method. The DCNN is composed of two deep subnet channels, SBNN (Selective-based Batch Normalization Network) and SCNN (Skip Connection-based Neural Network). SBNN comprises six convolutional layers, four normalization layers, three max pooling layers, and three fully connected layers. SCNN includes eleven convolutional layers, seven normalization layers, three max pooling layers, and one global average pooling layer. DCNN was trained on large public learning data [17], comprising 5695 smoke images and 18,522 non-smoke images, with data augmentation techniques (rotation of 90, 180, and 270 degrees). It achieved an accuracy of 99.5%, higher than hand-crafted methods and state-of-the-art DL methods such as DNCNN [18], AlexNet, VGG, GoogLeNet, Xception, ResNet, etc.
Zhang et al. [30] presented a DL method, called DC-CNN (Dual-Channel Convolutional Neural Network), for detecting smoke. DC-CNN is composed of two channels. The first channel employs a pretrained AlexNet to extract smoke features. The second channel is a simple CNN architecture, consisting of four convolutional layers, a pooling layer, and two fully connected layers for generating more advanced characteristics. Extensive studies were conducted using learning data, including 9794 smoke and 9794 non-smoke images, to handle the challenges related to smoke features, such as transparency properties, homogeneity, and visual similarity to clouds, steam, haze, and fog. DC-CNN obtained the highest accuracy of 99.33% compared with baseline DL models such as LeNet, AlexNet, VGG-16, and DNCNN [18]. Jia et al. [31] designed a new method for detecting smoke in videos. Firstly, GMM-based domain knowledge of smoke was adopted to segment the suspected areas of smoke. Then, three pretrained deep learning models (AlexNet, Inception v3, and ResNet50 [32]) were used to recognize smoke. ResNet50 with GMM performed best, with an F1-score of 99.32% compared with the other models using 138 smoke videos as testing data. He et al. [33] proposed a DL method for smoke detection in a foggy environment. This method combines the VGG-16 method as a backbone to extract smoke features and an attention method, which consists of channel attention and spatial attention, to improve the detection of small smoke areas. It was trained and evaluated using 33,666 images (8342 smoke images, 8522 smoke with fog images, 8401 non-smoke images, and 8401 non-smoke with fog images). It achieved an F1-score of 99.97%, outperforming the AlexNet, VGG-16, and SqueezeNet methods.
Zhang et al. [34] developed an end-to-end CNN method to identify smoke. Two CNNs (a spatial stream and a temporal stream) were adopted to extract the spatial and temporal features of smoke. Each comprises five convolutional layers, three max pooling layers, and an attention module that suppresses noise and extracts salient features from the temporal and spatial feature maps, improving detection performance. This method achieved an accuracy of 96.8%, better than state-of-the-art methods using 116 fire videos and 89 nonfire videos. Cheng et al. [35] presented a deep convolutional network, namely PACNN, to improve the robustness of smoke recognition tasks. PACNN is a deep CNN with a PAAModule (Pixel Aware Attention Module), which integrates into the residual structure via element-wise addition and skip connection on two feature maps. Testing results showed that PACNN reached a high accuracy of 98.91% compared with popular CNNs (AlexNet, Inception v4, ResNet34, SEResNet34, DenseNet-121, and DNCNN) and vision transformers (ViT, Swin-T, and DeiT-Ti) using the Yuan dataset.
Tao and Duan [36] introduced a video smoke recognition method, AFSNet, to address slow-moving smoke challenges. AFSNet is composed of three main modules: AFSM (Adaptative Frame Selection Module) for extracting multi-scale spatial and spatiotemporal features; FEM (Feature Extraction Module) for incorporating a context attention module, an enhanced dilated convolution module, and a spatiotemporal feature attention module to minimize the loss of detailed information; and RM (Recognition Module) for detecting smoke presence. AFSNet was trained on two large datasets, SRSet (14,100 smoke images and 15,380 non-smoke images) and RISE (12,567 videos). It achieved F1-scores of 96.57% and 91.00% using the SRSet and RISE datasets, respectively, surpassing classical machine learning methods and existing deep learning models. Cheng et al. [37] proposed a novel vision transformer, called CViTNet (Convolution-enhanced Vision Transformer Network), for identifying smoke. CViTNet consists of three stages (s1, s2, and s3). The first stage, s1, comprises a convolutional stem and a ViT transformer encoder. Each of the s1 and s2 stages includes a ViT transformer encoder [38] and a convolutional token embedding, which was proposed to improve the multiscale feature representation of tokenization. Using the Yuan dataset, CViTNet achieved a high accuracy of 99.20% compared with existing CNNs (AlexNet, ResNet, SEResNet, DenseNet, DNCNN, etc.) and vision transformer methods (ViT-B, DeiT-S, conViT-Ti, Swin-T, etc.) [37].
In the study conducted by Mohammed [39], a pretrained InceptionResNet v2 model [40] was employed for the detection of forest smoke and fires. Mohammed utilized a dataset comprising aerial and ground images (1102 fire images and 1102 smoke images). Data augmentation methods, including scaling and horizontal/vertical flipping, were applied during the training phase. Testing results showed that InceptionResNet v2 achieved an accuracy of 99.09%. Chen et al. [41] studied the effectiveness of five DL methods (LeNet5, VGG-16, ResNet18, MobileNet v2 [42], and Xception) for wildland smoke/fire recognition on aerial images. These models were trained using a large dataset comprising a total of 53,451 images, which were divided into three categories: 25,434 fire/smoke images, 14,317 fire/no-smoke images, and 13,700 no-fire/no-smoke images. VGG-16 obtained an accuracy of 99.91%, surpassing MobileNet v2, ResNet18, LeNet5, Xception, and a traditional machine learning method (Logistic Regression) by 0.56%, 1.52%, 4.58%, 5.35%, and 9.54%, respectively.
Dilshad et al. [43] proposed a fire detection model, E-FireNet, to recognize fires in a surveillance environment. E-FireNet modifies VGG-16 by deleting block 5 and adjusting the convolutional layers of block 4. The experimental setup was performed using data augmentation techniques (horizontal flipping, rotation, and scaling). E-FireNet achieved an accuracy of 98%, better than that of the pretrained MobileNet v1, VGG-19, EfficientNet-B0, VGG-16, and NASNetMobile v1 models using the SV-Fire dataset (1500 images) [43]. Yar et al. [44] developed a modified YOLO v5 method for detecting and locating fires in smart cities. A total of 1957 images, comprising indoor fires (118 images), building fires (723 images), and vehicle fires (1116 images), were used to train and evaluate the proposed model, achieving an F1-score of 84%.
Priya and Vani [45] introduced a CNN based on the Inception v3 architecture [46] for the recognition of forest smoke/fires using satellite images. Their study utilized a dataset consisting of 534 satellite images, with 239 fire images and 295 no-fire images, for both training and testing purposes. Their proposed method achieved an accuracy of 98%. Ba et al. [14] also proposed a DL method, namely SmokeNet, to address the challenge of recognizing smoke on satellite data, including varying smoke features such as colors, shapes, and spectral overlaps. SmokeNet is a CNN model with channel-wise and spatial attention. A novel satellite dataset, namely USTC_SmokeRS, comprising 6225 satellite images divided into six classes (smoke, cloud, haze, dust, seaside, and land), was used in the training and testing phases. SmokeNet showed high performance with an accuracy of 92.75%.
As described in Table 1, deep learning methods performed well in recognizing smoke. However, several challenging limitations persist, including the complexity and dynamics of the background; the visual similarity between smoke, clouds, dust, and haze; the varying characteristics of smoke regarding its air concentration, flow pattern, and color; and detecting small smoke zones.

Materials and Methods
In this section, the proposed DL method, BoucaNet, designed for the recognition of smoke using satellite images, is introduced. Subsequently, an overview of the dataset employed to train and test the BoucaNet model is provided. Finally, the evaluation metrics (F1-score, accuracy, and inference time) used in this paper are presented.

Proposed Method for Smoke Classification
In this paper, a new ensemble learning approach, namely BoucaNet, is introduced for recognizing smoke in satellite images and for addressing challenging limitations, including background complexity and dynamics due to the presence of dynamically changing backgrounds in input satellite images; visual similarities of smoke with clouds, dust, and haze; and varying features of smoke regarding its shape, form, color, flow pattern, and texture. BoucaNet combines the deep CNN EfficientNet v2 (EfficientNetV2M) [13] and the vision transformer EfficientFormer v2 (EfficientFormerV2L) [12]. EfficientNet v2 [13] is a new family of CNNs. It was proposed to address the training limitations of EfficientNet models [47], showing better parameter efficiency and faster learning speed than these models. It adopts an improved progressive learning method, which adaptively adjusts regularization techniques, such as data augmentation and dropout, along with the input image size. EfficientNet v2 achieves a high performance with a top-1 accuracy of 87.3% using the ImageNet21K dataset [48], surpassing popular vision transformers (ViT, DeiT, and T2T-ViT) and existing CNNs (EfficientNet, RegNetY, ResNeSt, NFNet, BoTNet, etc.) [13]. EfficientFormer v2 was developed by Li et al. [12] to improve the size and latency of vision transformers while maintaining high performance. This model is an updated version of the EfficientFormer model, integrating a fine-grained joint search method, which optimizes the speed and size of the model simultaneously. Using the ImageNet-1K dataset [49] as the learning data, it achieves a top-1 accuracy of 83.5% and a low latency of 0.9 ms on iPhone 12 (iOS 16), outperforming existing competitive CNN methods (MobileNet v2, EfficientNet, ResNet, etc.) and vision transformer models (MobileViT, EdgeViT, LeViT, DeiT, T2T-ViT, Swin-Tiny, CSwin, etc.) [12].
To employ the EfficientNet v2 and EfficientFormer v2 models in the specific task of smoke recognition, their classification layers (last layers), originally developed for different classification tasks, are removed. As depicted in Figure 1, the preprocessing steps start with resizing the input satellite images to 224 × 224 pixels. Next, four data augmentation techniques (rotation, shearing, shifting, and zooming) are utilized to diversify the learning data, improve the potential of BoucaNet to generalize to different real-world scenarios, and avoid overfitting. Then, the input satellite images and the generated images are simultaneously fed into the EfficientNet v2 and EfficientFormer v2 models to extract complex contextual features, comprising both smoke plume patterns and background contextual information, and provide a comprehensive representation of various smoke scenarios. After concatenating the two feature maps generated by the EfficientNet v2 and EfficientFormer v2 models, the Gaussian dropout regularization technique with a rate of 0.3 is employed. This method perturbs the concatenated features with random noise drawn from a Gaussian distribution, improving BoucaNet's generalization ability and avoiding overfitting. Finally, a Softmax function generates a probability score ranging from 0 to 1 for each class (smoke, cloud, haze, dust, seaside, or land), determining the appropriate class for the input satellite image.
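The fusion step described above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the two backbones are replaced by toy feature vectors, the linear projection to six logits before the Softmax is an assumption (the text only mentions concatenation, Gaussian dropout, and Softmax), and the noise is assumed to follow the Keras-style formulation of Gaussian dropout, that is, multiplicative noise with mean 1 and standard deviation sqrt(rate / (1 - rate)). The names `fusion_head`, `weights`, and `biases` are hypothetical.

```python
import math
import random


def gaussian_dropout(features, rate, rng, training=True):
    """Multiply each feature by noise drawn from N(1, rate / (1 - rate))."""
    if not training:
        return list(features)  # identity at inference time
    stddev = math.sqrt(rate / (1.0 - rate))
    return [x * rng.gauss(1.0, stddev) for x in features]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fusion_head(cnn_features, vit_features, weights, biases,
                rate=0.3, rng=None, training=False):
    """Concatenate both feature vectors, regularize, project, and normalize."""
    fused = list(cnn_features) + list(vit_features)
    fused = gaussian_dropout(fused, rate, rng or random.Random(0), training)
    # Assumed linear projection from the fused features to one logit per class.
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


# Class order follows the P1..P6 outputs named in the Figure 1 caption.
CLASSES = ["smoke", "cloud", "haze", "dust", "land", "seaside"]

# Toy stand-ins for the two backbone outputs (real feature vectors are far larger).
demo_rng = random.Random(42)
cnn_features = [demo_rng.random() for _ in range(4)]
vit_features = [demo_rng.random() for _ in range(3)]
weights = [[demo_rng.uniform(-1, 1) for _ in range(7)] for _ in CLASSES]
biases = [0.0] * len(CLASSES)

probabilities = fusion_head(cnn_features, vit_features, weights, biases)
predicted_class = CLASSES[probabilities.index(max(probabilities))]
```

With `training=False` the dropout step is the identity, so the noise only affects training; the six outputs always sum to 1 and the predicted class is the one with the largest probability.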

Datasets
Many large fire datasets are made available to help researchers in benchmarking and comparing DL techniques dealing with the same problem. However, this is not the case for smoke recognition problems, especially when using satellite data, thus making the evaluation of these DL methods a little challenging.
To train and test the proposed smoke recognition method, BoucaNet, the available satellite data, USTC_SmokeRS [14], is utilized. This dataset is collected using MODIS (Moderate Resolution Imaging Spectroradiometer) and represents numerous smoke scenes through satellite remote sensing. It is selected from a remote sensing platform in Hefei, China, and the Level-

Evaluation Metrics
In this work, three metrics (accuracy, F1-score, and inference time) are used to evaluate the proposed ensemble learning approach, BoucaNet. The accuracy and F1-score metrics are determined using the numbers of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN).

• Accuracy is the proportion of accurate predictions relative to the total number of predictions, as shown in Equation (1).
• F1-score integrates precision and recall metrics to calculate the performance of the proposed model, as presented in Equation (2).
• The inference time is the average time taken by BoucaNet to identify and recognize the presence of smoke in an input satellite image during the test step.
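For reference, the standard definitions of these two metrics in terms of TP, FP, TN, and FN are given below. They are assumed here to correspond to Equations (1) and (2); the intermediate precision and recall expressions are the usual ones and are included only to make the F1-score self-contained.

```latex
\begin{align}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \tag{1} \\
\text{F1-score} &= \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{2}
\end{align}
% with
\begin{equation*}
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}
\end{equation*}
```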

Results and Discussion
The proposed DL model, BoucaNet, was developed using Python and TensorFlow version 2.11 [50].For training and testing this model, a machine equipped with an NVIDIA GeForce RTX 2080Ti GPU, an Intel(R) Xeon(R) CPU (E5-2620 v4), and 64GB of RAM was utilized.
BoucaNet was trained using the USTC_SmokeRS satellite dataset. This dataset exposed BoucaNet to various classes and scenarios, thereby enabling it to learn and recognize various aspects of smoke in satellite images. It comprises a total of 6225 satellite images, divided into six distinct classes. These images were split into three sets, as shown in Table 2. During the training process, various hyperparameters were selected to optimize the learning of BoucaNet, including a learning rate of 0.001, the Adam optimizer, a total of 150 training epochs, and a batch size of eight. Additionally, the categorical cross-entropy loss function (see Equation (3)) was employed.
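In its standard form, and using the symbols defined in the sentence that follows (the class index c is an assumption introduced here for readability), the categorical cross-entropy for a single image can be written as:

```latex
\begin{equation}
\text{Loss} = -\sum_{c=1}^{A} z_{c} \, \log\left(p_{c}\right) \tag{3}
\end{equation}
```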
where z is the binary indicator, A is the number of classes (six classes, including smoke, cloud, haze, dust, land, and seaside), and p is the predicted probability. The experimental setup utilized input satellite images with a size of 224 × 224 pixels. To improve BoucaNet's performance and avoid overfitting, four data augmentation techniques (shear, rotation, shift, and zoom) were employed, enabling BoucaNet to handle a wide range of real-world scenarios. Additionally, the GPU was used to facilitate model training and calculate the inference time.
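As a concrete illustration of the categorical cross-entropy loss described above, the following minimal sketch evaluates it for one image. The prediction vector is hypothetical and chosen only for illustration; the function name and the small `eps` guard against log(0) are assumptions, not part of the paper.

```python
import math

CLASSES = ["smoke", "cloud", "haze", "dust", "land", "seaside"]


def categorical_cross_entropy(indicator, predicted, eps=1e-12):
    """Return -sum(z_c * log(p_c)) over the classes for one sample."""
    return -sum(z * math.log(max(p, eps)) for z, p in zip(indicator, predicted))


# Hypothetical prediction for an image whose true class is "smoke".
z = [1, 0, 0, 0, 0, 0]                     # binary indicator (one-hot label)
p = [0.90, 0.04, 0.03, 0.01, 0.01, 0.01]   # predicted probabilities, sum to 1
loss = categorical_cross_entropy(z, p)     # equals -log(0.90)
```

Because z is one-hot, only the probability assigned to the true class contributes, so the loss shrinks toward zero as that probability approaches 1.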
The evaluation of BoucaNet includes several key aspects. Firstly, its performance was analyzed in terms of F1-score, accuracy, and inference time against CT-Fire, which combines the EfficientFormer v2 [14] and RegNetY [51] models as the backbone; RegNetY-16GF [51]; the vision transformer EfficientFormer v2 [12]; and SmokeNet [14], the state-of-the-art smoke detection method. Next, the F1-scores obtained by these models for each class, namely smoke, cloud, dust, haze, land, and seaside, are presented. Then, the confusion matrix generated by BoucaNet is illustrated and discussed. Finally, visual results of the input images predicted by these models are presented.
Testing results (loss, F1-score, accuracy, and inference time) of the proposed BoucaNet, CT-Fire, RegNetY-16GF, and EfficientFormer v2 are reported in Table 3. RegNetY-16GF and EfficientFormer v2 were selected due to their excellent performance in classifying objects. CT-Fire is an ensemble learning method that combines EfficientFormer v2 and RegNetY-16GF to extract features; the Gaussian dropout regularization method and the softmax function are then used to recognize the presence of smoke. BoucaNet showed a high performance during testing, achieving a loss of 0.2184, an accuracy of 93.67%, and an F1-score of 93.64%. This performance was obtained thanks to the diversity of the feature maps extracted by the EfficientNet v2 and EfficientFormer v2 models, including details, complexity, and local and global features (colors, shapes, textures, etc.) for the smoke, cloud, haze, seaside, land, and dust classes, thus enabling BoucaNet to distinguish between smoke and complex backgrounds and to identify small areas of smoke. In terms of F1-score, BoucaNet outperformed CT-Fire, RegNetY-16GF, and EfficientFormer v2 by 2.75%, 1.38%, and 1.50%, respectively. The proposed model also performed better than the state-of-the-art method SmokeNet, which achieved an accuracy of 92.75% using the USTC_SmokeRS dataset [14]. It demonstrated its potential to address and overcome challenging limitations related to recognizing smoke in satellite images. These challenges include complex backgrounds, comprising various land covers and geographical features, which can make it difficult to accurately identify smoke in input satellite images. Additionally, BoucaNet handled the varying and dynamic nature of smoke in terms of its shape, color, intensity, and flow pattern features, as well as the visual similarities of smoke, including color, shape, and spectral characteristics, which are often shared with clouds, dust, and haze.
On the other hand, BoucaNet achieved an efficient processing speed with an inference time of 0.16 seconds, slightly surpassing the inference times of EfficientFormer v2, CT-Fire, and RegNetY-16GF. This inference time showed BoucaNet's suitability for real-time processing of satellite images for smoke recognition while maintaining high performance.
Table 4 illustrates the comparative analysis of BoucaNet, RegNetY-16GF, CT-Fire, and EfficientFormer v2 for recognizing smoke, cloud, haze, dust, land, and seaside classes. BoucaNet achieved superior results, with F1-scores of 95.58%, 91.00%, 90.82%, 95.01%, 98.76%, and 90.36% for recognizing the cloud, dust, haze, land, seaside, and smoke classes, respectively, compared with CT-Fire, RegNetY-16GF, and EfficientFormer v2. It demonstrated its ability to accurately differentiate between cloud, smoke, haze, dust, land, and seaside features, thereby proving its capability to overcome challenges related to background complexity and visual similarities, including color, shape, and spectral characteristics, between smoke and other classes (cloud, dust, and haze).
Figure 4 depicts the confusion matrix of BoucaNet for the six classes (smoke, dust, cloud, haze, seaside, and land) using the testing set. The results obtained provide a comprehensive view of BoucaNet's performance in recognizing these classes and overcoming challenges. BoucaNet correctly recognized most instances of the smoke (178 instances), cloud (227 instances), dust (187 instances), haze (178 instances), land (200 instances), and seaside (199 instances) classes. These results demonstrate the robustness of BoucaNet in identifying smoke in varying environmental conditions and complex backgrounds, despite the overlap in visual features between smoke, clouds, dust, and haze. However, it misclassified a small number of smoke instances as clouds (eight instances), dust (six instances), haze (five instances), land (six instances), and seaside (one instance). These misclassifications can be attributed to the complex nature of smoke, which shares visual characteristics (color, shape, spectral texture, etc.) with other classes.
Similar to its quantitative performance, BoucaNet performed well in predicting and identifying the presence of smoke, clouds, dust, haze, land, and seaside in input satellite images with high confidence scores. For instance, it correctly predicted a smoke image as smoke with a confidence score of 0.99 (as shown in Figure 5c), a cloud instance as cloud with a confidence score of 0.98 (as depicted in Figure 6a), a dust instance as dust with a confidence score of 0.88 (see Figure 7c), and a haze instance as haze with a confidence score of 0.99 (as shown in Figure 8c). In contrast, CT-Fire made incorrect predictions, such as classifying clouds as dust with a confidence score of 0.99 (see Figure 6b) and haze as land with a confidence score of 0.93 (as shown in Figure 8b). RegNetY-16GF also misclassified haze as land with a confidence score of 0.63 (see Figure 8b). EfficientFormer v2 also misclassified land as haze with a confidence score of 0.94 (as depicted in Figure 9b).
In conclusion, BoucaNet performed well in recognizing smoke in satellite images compared with baseline models (EfficientFormer v2, RegNetY-16GF, CT-Fire, and SmokeNet). Notably, it demonstrated its potential to address challenging limitations, including complex backgrounds; the dynamic nature of smoke in terms of its shape, intensity, and color; detecting small areas of smoke; and distinguishing visual similarities in terms of color, shape, and spectral characteristics between smoke and other elements, including clouds, dust, and haze. Additionally, BoucaNet achieved a competitive inference time.
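As a worked check of the smoke row of the confusion matrix discussed above, the sketch below derives the smoke recall from the counts quoted in the text. It assumes those counts cover the whole smoke row of the test set; precision and the per-class F1-score cannot be derived this way, because the text does not give the column totals.

```python
# Smoke row of the confusion matrix, using the counts reported in the text.
smoke_row = {
    "smoke": 178,   # smoke images correctly recognized as smoke
    "cloud": 8,     # smoke images misclassified as cloud
    "dust": 6,
    "haze": 5,
    "land": 6,
    "seaside": 1,
}

smoke_total = sum(smoke_row.values())            # all smoke test images
smoke_missed = smoke_total - smoke_row["smoke"]  # false negatives for smoke
smoke_recall = smoke_row["smoke"] / smoke_total

print(f"{smoke_row['smoke']}/{smoke_total} smoke images recognized, "
      f"recall = {smoke_recall:.2%}")
# prints "178/204 smoke images recognized, recall = 87.25%"
```

Under this assumption, 26 of the 204 smoke test images are missed, which is consistent with smoke having the lowest per-class F1-score among the six classes.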

Conclusions
In this paper, a novel ensemble learning method, namely BoucaNet, was presented for recognizing smoke in satellite images while addressing the associated challenges. BoucaNet combines the strengths of EfficientNet v2 and EfficientFormer v2 to extract rich and diverse feature maps for smoke, cloud, haze, dust, land, and seaside classes. It demonstrated a high performance, with an accuracy of 93.67% and an F1-score of 93.64%, using the USTC_SmokeRS dataset, which consists of 6225 satellite images. Furthermore, BoucaNet outperformed existing deep learning models for object classification, specifically EfficientFormer v2 and RegNetY-16GF, as well as state-of-the-art methods, including SmokeNet. It also showed a competitive processing speed, with an inference time of 0.16 s. Additionally, BoucaNet demonstrated its potential as a robust solution to the challenges of recognizing smoke in satellite images, including complex backgrounds; the dynamic nature of smoke, which can present variations in shape, intensity, and color; detecting small areas of smoke; and visual similarities between smoke and other elements, such as clouds, dust, and haze.
As future work, the evaluation of BoucaNet is planned for detecting smoke and fires using large-scale satellite and/or aerial images in both forest and urban environments.

Figure 1. The proposed architecture of BoucaNet. P1, P2, P3, P4, P5, and P6 correspond to the predicted probabilities of the input image belonging to the smoke, cloud, haze, dust, land, or seaside class.
1 and Atmosphere Archive & Distribution System (LAADS) Distributed Active Archive Center (DAAC) situated at the Goddard Space Flight Center in Greenbelt, Maryland, USA. The USTC_SmokeRS dataset comprises a total of 6225 satellite images with dimensions of 256 × 256 pixels and a spatial resolution of 1 km. It comprises six classes:
• Smoke (1016 satellite images) as the target class for wildfire detection.
• Dust (1009 satellite images) and haze (1002 satellite images) as negative classes to smoke, which share similar features (texture and spectral) with smoke.
• Cloud (1164 satellite images) as the most common class in satellite images, with similar color, shape, and spectral characteristics to smoke.
• Land (1027 satellite images) and seaside (1007 satellite images) as background classes for fire smoke scenes.

Figures 2 and 3 depict examples from the USTC_SmokeRS dataset.

Figure 2. USTC_SmokeRS dataset example from top to bottom: smoke images, dust images, and land images.

Figure 3. USTC_SmokeRS dataset example from top to bottom: cloud images, haze images, and seaside images.

Figure 4. Confusion matrix of BoucaNet on USTC_SmokeRS data test set.

Figure 5. Smoke classification results of the proposed models.

Figure 10. Seaside classification results of the proposed models.

Table 1. Deep learning models for smoke recognition.

Table 3. Comparative analysis of BoucaNet and other models on USTC_SmokeRS dataset.

Table 4. Comparative analysis of BoucaNet and other DL methods for smoke, cloud, haze, dust, land, and seaside classes.