Fire
  • Article
  • Open Access

27 September 2024

Deep Learning Approach for Wildland Fire Recognition Using RGB and Thermal Infrared Aerial Image

Perception, Robotics and Intelligent Machines (PRIME), Department of Computer Science, Université de Moncton, Moncton, NB E1A 3E9, Canada
Author to whom correspondence should be addressed.
This article belongs to the Section Fire Science Models, Remote Sensing, and Data

Abstract

Wildfires cause severe consequences, including property loss, threats to human life, damage to natural resources and biodiversity, and economic impacts. Consequently, numerous wildland fire detection systems have been developed over the years to identify fires at an early stage and limit their damage to both the environment and human lives. Recently, deep learning methods have been employed for recognizing wildfires, showing promising results. However, numerous challenges remain, including background complexity and small wildfire and smoke areas. To address these limitations, two deep learning models, namely CT-Fire and DC-Fire, were adopted to recognize wildfires using both visible and infrared aerial images. Infrared images capture temperature gradients, highlighting areas of high heat that indicate active flames. RGB images provide the visual context needed to identify smoke and forest fires. Using both visible and infrared images provides diversified data for training deep learning models. The diverse characteristics of wildfires and smoke enable these models to learn a complete visual representation of wildland fire and smoke scenarios. Testing results showed that CT-Fire and DC-Fire achieved higher performance than baseline wildfire recognition methods on a large dataset of RGB and infrared aerial images. CT-Fire and DC-Fire also demonstrated the reliability of deep learning models in identifying and recognizing patterns and features related to wildland smoke and fires, and in overcoming challenges such as background complexity (which can include vegetation, weather conditions, and diverse terrain), the detection of small wildfire areas, and the variety of wildland fires and smoke in terms of size, intensity, and shape. CT-Fire and DC-Fire also reached fast processing speeds, enabling their use for early detection of smoke and forest fires in both night and day conditions.

1. Introduction

Wildland fires cause ecological imbalance and pose risks to human health. They result in costly economic losses and air pollution, as they can destroy property, forests, and animal species. For example, Europe had its worst year for wildland fires in 2022, with around 900,000 hectares burned, and about 260,000 hectares had already burned since January 2023, significantly increasing the risk of economic losses [1,2]. Moreover, in Canada, around 7300 wildfires occurred each year over the past 25 years, burning a total of 2.5 million hectares annually [3], with 2023 a record year with 18 million hectares of burned area [4]. The cost of wildfire fighting in Canada varied between CAD 800 million and CAD 1.5 billion per year over the past decade [3]. Numerous wildland fire detection systems have been developed to reduce the impact of fires. These systems employ a variety of technologies, including heat sensors, smoke sensors, gas detectors, and flame detectors, to detect forest fires and improve fire prevention [5].
Recently, the integration of vision sensors for wildfire recognition and detection has represented a significant advancement in detecting, monitoring, and preventing wildland fires [6,7,8]. As wildfires are distinguished by their color, texture, pattern, and shape, various feature-based methods (e.g., YUV color space [9], optical flow [10], dynamic texture [11]) and traditional machine learning methods (e.g., SVM [12], Bayesian classifiers [13], Markov models [14]) have demonstrated high performance in this task.
The main challenge associated with the aforementioned methods is the extraction of informative and relevant characteristics that accurately represent the wildland fire detection problem. More recently, deep learning (DL) methods have revolutionized computer vision tasks, solving complex challenges in various fields such as video recognition [15,16], medical image analysis [17,18], motion estimation [19,20], image analysis [21,22], and object tracking [23,24]. These methods have shown remarkable ability compared to classical machine learning methods. As a result of their impressive potential, DL methods have been used for detecting and recognizing wildfires [25,26]. However, they face a number of limitations, including the detection of small wildfire and smoke areas, the complexity of the background, and the variability of wildfires and smoke in terms of size, shape, and intensity.
To address these challenging limitations, we developed and adopted, in this paper, two deep learning methods, namely DC-Fire [27] and CT-Fire [28], for efficiently recognizing wildland fires using both infrared (IR) and RGB (visible spectrum) aerial images. IR images capture temperature gradients, highlighting zones with high heat levels that indicate an active flame, even when flame and smoke are not visible. RGB aerial images provide the visual context to confirm the presence of wildland smoke and fires. DC-Fire combines two deep CNNs (Convolutional Neural Networks), the DenseNet-201 [29] and EfficientNet-B5 [30] models. CT-Fire integrates the CNN RegNetY-16GF [31] and the vision transformer EfficientFormer v2 [32] to extract deep features related to smoke and fires. These two models were trained and evaluated using a large aerial dataset, namely FLAME2 [33], which includes both visible and IR images. FLAME2 covers numerous forest fire and smoke scenarios collected during prescribed wildfires in Arizona, providing a rich dataset that captures various aspects of fire and smoke patterns. It has been widely used for training and testing advanced fire detection models.
Three main contributions are proposed in this paper:
  • Two DL methods, namely DC-Fire and CT-Fire, were adopted for recognizing smoke and fires using both IR and visible aerial images in order to improve the performance of wildland fire/smoke classification tasks.
  • DC-Fire and CT-Fire showed promising performance, overcoming challenging limitations, including background complexity, the detection of small wildland fire areas, image quality, and the variability of wildfires in terms of size, shape, and intensity, with flame lengths ranging from 0.25 to 0.75 m and occasionally reaching 5 to 10 m.
  • DC-Fire and CT-Fire methods showed fast processing speeds, allowing their use for early detection of wildland smoke and fires during both day and night, which is crucial for reliable fire management strategies.
The rest of this paper is organized as follows: Section 2 reports state-of-the-art methods for wildland fire classification using deep learning methods. Section 3 presents DC-Fire and CT-Fire methods, the dataset used, evaluation metrics utilized in this paper, and implementation details. Section 4 illustrates the experimental results of DC-Fire and CT-Fire methods. Section 5 discusses the potential of the two methods for recognizing wildland fires and their limits. Section 6 summarizes this work.

3. Materials and Methods

We first introduce, in this section, two deep learning methods, DC-Fire and CT-Fire, for recognizing wildland smoke and fires using visible and infrared images. DC-Fire is a robust ensemble learning method that combines two CNNs, DenseNet-201 and EfficientNet-B5, known for their feature extraction abilities. CT-Fire uses the strengths of both CNNs (RegNetY-16GF) and vision transformers (EfficientFormer v2) to generate rich feature maps. Next, we present the aerial training dataset, FLAME2. Then, we define five evaluation metrics (accuracy, precision, recall, F1-score, and inference time) used to evaluate these two methods. Finally, we report the implementation details used for training and testing the DC-Fire and CT-Fire methods.

3.1. Proposed Methods

3.1.1. DC-Fire Method

We developed a deep CNN method, called DC-Fire, for detecting and recognizing smoke and forest fires using both visible and IR data. DC-Fire is an ensemble learning method combining the DenseNet-201 [29] and EfficientNet-B5 [30] models. EfficientNet-B5 is a deep CNN model that employs a compound scaling method to uniformly scale the depth, resolution, and width. It includes a 3 × 3 convolutional layer and multiple MBConv blocks with varying kernel sizes of 3 × 3 and 5 × 5 and uses squeeze-and-excitation optimization to extract characteristics efficiently at different levels. In addition, it consists of a 1 × 1 convolutional layer and a pooling layer to refine the extracted characteristics. It achieved high performance, with a Top-1 accuracy of 83.3%, outperforming popular CNN models such as ResNet, Inception v3, Xception, ResNeXt, InceptionResNet v2, SENet, PNASNET, and AmoebaNet on the ImageNet dataset [30]. DenseNet-201 (Dense Convolutional Network) [29] is also a deep CNN, connecting each layer to all previous layers in order to reuse features, reduce the number of parameters, and mitigate the vanishing gradient problem. It comprises 201 layers. It starts with a 7 × 7 convolutional layer, followed by a 3 × 3 max pooling layer. It then has four dense blocks, each containing a series of 1 × 1 and 3 × 3 convolutional layers. Between every two dense blocks, there is a transition layer consisting of a 1 × 1 convolutional layer and a 2 × 2 average pooling layer. This model performed well on numerous competitive object classification benchmarks (CIFAR-100, ImageNet, Street View House Numbers, and CIFAR-10), showing strong performance compared to ResNet models [29].
Figure 1 depicts the DC-Fire architecture. To diversify the training data, we first use data augmentation methods such as shift, rotation, shear, and zoom. Then, we simultaneously feed the visible and IR input data, together with the newly generated images, into the DenseNet-201 and EfficientNet-B5 models to extract rich and relevant wildland smoke- and fire-related features. Next, the two feature maps generated by EfficientNet-B5 and DenseNet-201 are concatenated, combining the extracted features from both models into a larger feature vector and producing rich and diverse feature maps. Subsequently, an average pooling layer is employed to reduce the spatial dimensions of the concatenated feature map. Then, the generalization of DC-Fire is improved by applying Gaussian dropout with a rate of 0.3, which injects noise into the pooled features. Finally, a sigmoid function is used to generate a likelihood score between 0 and 1, identifying the presence of forest fires and smoke in the IR and visible input images when the probability is greater than or equal to 0.5.
Figure 1. The proposed architecture of DC-Fire. P1 and P2 denote the predicted probabilities of the input aerial image belonging to the non-fire and fire classes, respectively.
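To make the fusion steps concrete, the following is a minimal Keras sketch of the DC-Fire head, assuming ImageNet-pretrained backbones from tf.keras.applications and omitting data augmentation and per-backbone input preprocessing; it illustrates the pipeline described above rather than reproducing the authors' exact implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import DenseNet201, EfficientNetB5

inputs = layers.Input(shape=(224, 224, 3))           # an RGB or IR frame
densenet = DenseNet201(include_top=False, weights="imagenet")
effnet = EfficientNetB5(include_top=False, weights="imagenet")

# Fuse the two 7 x 7 backbone feature maps along the channel axis.
merged = layers.Concatenate()([densenet(inputs), effnet(inputs)])
pooled = layers.GlobalAveragePooling2D()(merged)      # reduce spatial dimensions
regularized = layers.GaussianDropout(0.3)(pooled)     # rate reported in the paper
fire_prob = layers.Dense(1, activation="sigmoid")(regularized)  # fire likelihood

dc_fire = Model(inputs, fire_prob)
```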

3.1.2. CT-Fire Method

In this section, we introduce a CNN-Transformer method, namely CT-Fire, to identify smoke and forest fires using visible and thermal infrared images. CT-Fire employs the deep CNN RegNetY-16GF [31] and the vision transformer EfficientFormer v2 [32] to generate rich and diverse feature maps, including deep features related to vegetation, diverse terrains, wildland fires, and smoke. RegNetY-16GF was proposed by Radosavovic et al. [31] to explore the network design paradigm and produce a fast and reliable DL model. This model is composed of three main blocks: the stem, the body, and the head. The stem block contains two 3 × 3 convolutional layers, while the body block comprises four stages (stage 1 to stage 4). Stage 1 includes two convolutional layers, while stages 2, 3, and 4 consist of a series of identical blocks, each comprising 1 × 1 convolutional layers and a group of 3 × 3 convolutional layers. Each of these layers is followed by batch normalization and a ReLU activation function. The model also incorporates Squeeze-and-Excitation (SE) modules in each identical block, emphasizing relevant features and suppressing less important ones. RegNetY-16GF achieved an impressive Top-1 classification error of 22.5%, outperforming existing CNNs such as the EfficientNet, ResNeXt, and ResNet models. EfficientFormer v2 is an improved version of the EfficientFormer method developed to address the higher latency of vision transformers compared to lightweight CNNs. It uses convolutional layers to capture local, low-level details and transformer blocks to capture global, high-level features. In its feed-forward network, a depth-wise convolutional layer captures local information. A modified multi-head self-attention mechanism, integrating a 3 × 3 depth-wise convolutional layer, is adopted to extract both global and local features. This model also utilizes a fine-grained joint search method to optimize the number of parameters and latency simultaneously. It achieved a superior Top-1 classification accuracy of 83.5% compared to popular CNN and vision transformer models such as MobileNet v2, EfficientNet, ResNet, MobileNet v3, DeiT, LeViT, CSwin-T, and NasViT-Supernet [32].
As shown in Figure 2, we first resize the IR and visible input images to a resolution of 224 × 224 pixels. Next, we apply various data augmentation techniques, including shearing, rotating, zooming, and shifting, to increase the number of training images while avoiding overfitting. Subsequently, the EfficientFormer v2 and RegNetY-16GF models generate two feature maps comprising distinct features related to complex backgrounds and forest smoke and fire scenarios. Following this, the features generated by both models are concatenated into a larger feature vector, combining diverse features from each model and providing a clearer representation of different wildfire scenarios. Then, a regularization method, Gaussian dropout with a rate of 0.3, is used to improve CT-Fire's performance. Finally, a dense layer with a sigmoid function determines the CT-Fire output, providing a probability value between 0 and 1. This value is used to identify the presence of forest smoke and fires in the input images when the probability is greater than or equal to 0.5.
Figure 2. The proposed architecture of CT-Fire. L1 and L2 refer to the predicted probabilities of the input aerial image belonging to the fire and non-fire classes, respectively.
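A comparable sketch of the CT-Fire fusion head is given below. EfficientFormer v2 is not bundled with Keras, so its branch is a hypothetical placeholder, and RegNetY160 from tf.keras.applications (available in recent releases) is used as a stand-in for RegNetY-16GF; the layer order follows Figure 2 and is illustrative only.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def efficientformer_v2_features(x):
    # Hypothetical placeholder: EfficientFormer v2 is not shipped with Keras;
    # a real implementation would return its pooled token features here.
    return layers.GlobalAveragePooling2D()(layers.Conv2D(256, 3, padding="same")(x))

inputs = layers.Input(shape=(224, 224, 3))            # resized RGB or IR frame
regnet = tf.keras.applications.RegNetY160(             # stand-in for RegNetY-16GF
    include_top=False, pooling="avg")
fused = layers.Concatenate()([regnet(inputs), efficientformer_v2_features(inputs)])
fused = layers.GaussianDropout(0.3)(fused)             # rate reported in the paper
fire_prob = layers.Dense(1, activation="sigmoid")(fused)  # >= 0.5 is read as "fire"

ct_fire = Model(inputs, fire_prob)
```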

3.2. Dataset

Many visible-spectrum fire datasets are available to researchers, facilitating the benchmarking of wildfire recognition models. However, this is not the case for thermal IR fire data. We therefore use the publicly available FLAME2 dataset [33] for training and testing DC-Fire and CT-Fire. FLAME2 is a very large aerial dataset collected during a prescribed wildfire in Arizona using a DJI Mavic 2 Enterprise Advanced drone. RGB and IR images were captured using a CMOS visual sensor and an uncooled vanadium oxide microbolometer sensor, respectively. Fire intensity in this dataset varies, with flame lengths ranging from 0.25 to 0.75 m, occasionally reaching 5 to 10 m when large fuels are consumed. FLAME2 consists of 53,451 RGB/IR image pairs, including 14,317 fire/non-smoke pairs, 25,434 fire/smoke pairs, and 13,700 non-fire/non-smoke pairs. Figure 3 and Figure 4 depict IR/RGB fire and non-fire samples from the FLAME2 dataset.
Figure 3. FLAME2 dataset example. (Top): RGB non-fire images; (Bottom): Their corresponding IR non-fire images.
Figure 4. FLAME2 dataset example. (Top): RGB fire images. (Bottom): Their corresponding IR fire images.
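For the binary fire/non-fire task, both fire subsets are presumably mapped to the fire class; a quick check of the resulting class balance (the label mapping is our assumption, not stated explicitly above):

```python
# Class balance implied by the counts above, assuming fire/smoke and
# fire/non-smoke pairs are both labeled "fire" for the binary task.
fire_pairs = 14_317 + 25_434       # fire/non-smoke + fire/smoke
non_fire_pairs = 13_700            # non-fire/non-smoke
assert fire_pairs + non_fire_pairs == 53_451
print(fire_pairs, non_fire_pairs)  # 39751 13700
```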

3.3. Implementation Details

The CT-Fire and DC-Fire models were developed using TensorFlow version 2.11 [69]. A machine equipped with an NVIDIA GeForce RTX 2080 Ti GPU, 64 GB of RAM, and an Intel(R) Xeon(R) CPU was used to train and test these models.
DC-Fire and CT-Fire were trained using RGB and IR images. FLAME2 was used as the learning data, enabling these models to learn various smoke and wildfire situations from both IR and visible data. The 53,451 image pairs (RGB and IR) were divided into training (68,416 images), validation (17,104 images), and test (21,382 images) sets, as presented in Table 2.
Table 2. Dataset subsets.
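A brief check of the split proportions implied by these counts (roughly a 64/16/20 split):

```python
# Split ratios implied by Table 2 (106,902 images = 2 x 53,451 pairs).
train, val, test = 68_416, 17_104, 21_382
total = train + val + test
print(round(train / total, 2), round(val / total, 2), round(test / total, 2))  # 0.64 0.16 0.2
```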
During training, both IR and visible input images were used with a resolution of 224 × 224 pixels. In addition, shear, rotation, zoom, and shift data augmentation methods were employed to improve CT-Fire and DC-Fire performance and avoid overfitting. The binary cross-entropy (BCE) loss function [70] was adopted to reduce misclassification error during the learning process (see Equation (1)). We selected the following hyperparameters: a batch size of 8, 150 epochs, the Adam optimizer, and a learning rate of 0.001. Early stopping was also implemented to interrupt training if model performance did not improve after fifteen epochs. The model with the lowest validation loss was then saved.
BCE = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]   (1)
where y_i is the ground-truth label (in our case, the fire and non-fire classes), ŷ_i is the predicted output, and n is the total number of samples in the dataset.
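Putting these settings together, a hedged sketch of the training setup in TensorFlow/Keras follows; the augmentation ranges, checkpoint file name, and array names are illustrative assumptions, and `model` stands for either DC-Fire or CT-Fire.

```python
import tensorflow as tf

# Augmentation with the four transforms listed above; the numeric ranges are
# assumed for illustration, as they are not reported in the paper.
augmenter = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=20, shear_range=0.2, zoom_range=0.2,
    width_shift_range=0.1, height_shift_range=0.1)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="binary_crossentropy",      # Equation (1)
              metrics=["accuracy"])

callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=15),
    tf.keras.callbacks.ModelCheckpoint("best_model.h5", monitor="val_loss",
                                       save_best_only=True),  # keep lowest-validation-loss model
]

# x_train/y_train and x_val/y_val are placeholders for the FLAME2 arrays.
model.fit(augmenter.flow(x_train, y_train, batch_size=8),
          validation_data=(x_val, y_val), epochs=150, callbacks=callbacks)
```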

4. Experiments and Results

To evaluate the performance of DC-Fire and CT-Fire, we first analyze their accuracy, precision, recall, F1-score, and inference time, which is the average time required to detect and recognize the presence of forest smoke and fires in an input image. We then compare these results with state-of-the-art methods (LeNet5, Xception, MobileNet v2, and ResNet-18) [33] and DL object recognition methods (ResNeXt-50 [71] and Swin Transformer v2 [72]). Second, we present the confusion matrices of DC-Fire and CT-Fire on the FLAME2 test dataset. Finally, we illustrate the predicted outputs of these models on RGB and IR images of the fire and non-fire classes.
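As an illustration, the reported metrics could be computed as follows with scikit-learn; `model`, `x_test`, and `y_test` are placeholders for a trained network and the FLAME2 test images and labels.

```python
import time
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_pred = (model.predict(x_test).ravel() >= 0.5).astype(int)   # 0.5 threshold from Section 3.1
for name, fn in [("accuracy", accuracy_score), ("precision", precision_score),
                 ("recall", recall_score), ("F1-score", f1_score)]:
    print(name, fn(y_test, y_pred))

start = time.time()
_ = model.predict(x_test[:1])                        # single-image inference
print("inference time (s):", time.time() - start)    # averaged over many images in practice
```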
Figure 5 shows the training and validation loss of the DC-Fire and CT-Fire models. Both models improve rapidly during the first training epochs, with training and validation losses decreasing quickly and converging. While the DC-Fire validation loss shows some variation, the best model was saved before this variation occurred, thanks to the early stopping strategy. It was then tested on unseen data and achieved high performance with well-classified images, as shown in Table 3, indicating that it learned well and generalized without overfitting, similar to the CT-Fire model.
Figure 5. Loss curves for the proposed DC-Fire and CT-Fire during training and validation steps.
Table 3. Comparative analysis of DC-Fire, CT-Fire, and other DL methods using testing set (both IR and RGB images).
Table 3 shows the performance results of CT-Fire, DC-Fire, and other baseline methods, including LeNet5, Xception, MobileNet v2, ResNet-18, Swin Transformer v2, and ResNeXt-50, using both IR and RGB images (10,691 images each).
We can see that DC-Fire and CT-Fire achieved impressive testing results, with an accuracy, precision, recall, and F1-score of 100% for both models, thanks to the extraction of deep and relevant features related to background and wildfires by the two deep CNNs (EfficientNet-B5 and DenseNet-201) in DC-Fire, and by the vision transformer EfficientFormer v2 and the deep CNN RegNetY-16GF in CT-Fire. This means that the CT-Fire and DC-Fire models correctly recognized all positive fire instances and did not make any incorrect predictions. Combining the EfficientNet-B5 and DenseNet-201 models can significantly improve the extraction of deep features related to wildland smoke and fires: EfficientNet-B5 offers scalability by efficiently balancing network width, depth, and resolution, enabling the model to generate complex deep features, while DenseNet-201 introduces dense connections, allowing the model to learn comprehensive patterns. Moreover, the combination of EfficientFormer v2 and RegNetY-16GF as a backbone generates varied feature maps, enhancing the model's ability to identify complex wildfire and smoke characteristics and providing a powerful approach for identifying wildfires and smoke. EfficientFormer v2 integrates convolutional layers and attention mechanisms, enabling the extraction of local and global features, while RegNetY-16GF generates relevant features, including fire characteristics, heat patterns, and smoke plumes. Both CT-Fire and DC-Fire performed better than the baseline models LeNet5, Xception, MobileNet v2, ResNet-18, Swin Transformer v2, and ResNeXt-50, demonstrating their potential to address challenging limitations such as small fire detection, background complexity, and forest fire and smoke variability. DC-Fire and CT-Fire also achieved fast inference times of 0.013 and 0.024 s, respectively, showing their suitability for early detection of wildland fires and smoke. CT-Fire (0.024 s) is, however, slightly slower than Swin Transformer v2 (0.015 s), while both proposed models remain faster than ResNeXt-50 (0.04 s).
The confusion matrix is an important tool for evaluating the performance of the DC-Fire and CT-Fire models. It provides a comprehensive analysis of their classification performance, enabling us to determine their ability to correctly classify the fire and non-fire classes. Figure 6 depicts the confusion matrix of both the CT-Fire and DC-Fire methods on the FLAME2 test dataset with both IR and RGB images. The number of predictions in the diagonal elements (the correct classifications of CT-Fire and DC-Fire) is very high, and there are no prediction errors in the off-diagonal elements (the incorrect classifications of CT-Fire and DC-Fire). This result demonstrates the strong ability of these models to distinguish between wildfire and non-wildfire scenarios using both IR and visible data and to overcome the challenges related to this task.
Figure 6. Confusion matrix of both DC-Fire and CT-Fire using both IR and RGB images (both models obtained the same results).
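A hedged sketch of how the matrix in Figure 6 could be produced, using the `y_test`/`y_pred` placeholders from the evaluation sketch above:

```python
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

cm = confusion_matrix(y_test, y_pred, labels=[0, 1])   # rows: true non-fire / fire
ConfusionMatrixDisplay(cm, display_labels=["non-fire", "fire"]).plot()
```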
Similar to the numerical results presented in Table 3, the proposed CT-Fire and DC-Fire correctly predict and identify the presence of wildfires using RGB images (as shown in Figure 7 and Figure 8) and infrared images (see Figure 9 and Figure 10) of both fire and non-fire situations. For example, they performed well in classifying IR and RGB aerial fire images with high confidence scores of 0.99 for CT-Fire and 1 for DC-Fire, as depicted in Figure 7c and Figure 9c.
Figure 7. Classification results of DC-Fire and CT-Fire models using RGB fire images.
Figure 8. Classification results of DC-Fire and CT-Fire models using RGB non-fire images.
Figure 9. Classification results of DC-Fire and CT-Fire models using IR fire images.
Figure 10. Classification results of DC-Fire and CT-Fire models using IR non-fire images.
To summarize, both DC-Fire and CT-Fire demonstrated their effectiveness as reliable wildland fire recognition methods, even for detecting small wildfire zones, using both RGB and IR aerial images. In addition, they showed their potential to overcome a number of challenges, notably the variability of wildland fire intensity, with flame lengths varying between 0.25 and 0.75 m (occasionally between 5 and 10 m) [33], the complexity of backgrounds, and the quality of the input image, which can be influenced by various factors such as input image resolution (in our case, 224 × 224 pixels).

5. Discussion

5.1. Results Analysis

IR images are crucial for identifying heat sources and can detect wildfires through heavy smoke, making them essential for early detection and monitoring of wildfires, even in dark conditions, during both night and day. In addition, visible images capture the visual details of wildland fires and smoke, offering a clear representation of these events, their intensity, and their environment. Using both IR and visible images therefore provides a comprehensive view of the diverse features of wildland fires and smoke, enabling DL models to learn a complete representation of forest fire and smoke scenarios during both day and night and allowing more reliable detection. Accordingly, in this paper, we used both IR and visible images to provide large, diversified wildland fire data for training the proposed deep learning models, DC-Fire and CT-Fire, and for addressing the related wildfire challenges.

These methods were trained using a large set of IR and RGB images (53,451 pairs). Their performance was then evaluated using a testing set of 10,691 image pairs, which the models had never seen before. DC-Fire and CT-Fire demonstrated reliable potential in recognizing forest fires and smoke using both IR and RGB aerial images, achieving an accuracy, precision, recall, and F1-score of 100% for both methods. They also performed better than the baseline methods LeNet5, Xception, MobileNet v2, ResNet-18, Swin Transformer v2, and ResNeXt-50 by 1.55%, 1.37%, 0.18%, 0.50%, 36.56%, and 33.98%, respectively, in terms of F1-score. This result was achieved by integrating two CNNs (EfficientNet-B5 and DenseNet-201) in DC-Fire to extract rich and relevant feature maps, and by combining a CNN (RegNetY-16GF) with a vision transformer (EfficientFormer v2) in CT-Fire to capture global and local features. Combining the EfficientNet-B5 and DenseNet-201 models can exploit the scalability benefits of EfficientNet-B5 and the dense connectivity of DenseNet-201, thus improving the extraction of deep features related to wildland smoke and fires. In addition, combining EfficientFormer v2 and RegNetY-16GF as a backbone for extracting wildfire and smoke features generates diverse feature maps that contain fine details as well as local and global characteristics related to fire, smoke, and heat patterns. This also shows the ability of CT-Fire and DC-Fire to overcome wildland fire-related limitations, including the complexity of the background, varying wildfire intensity with flame lengths ranging between 0.25 and 0.75 m, the quality of the input images (influenced, in our case, by the input resolution of 224 × 224 pixels), and the variability of wildfires and smoke in size and shape.

Additionally, DC-Fire obtained a fast processing speed, with an inference time of 0.013 s, better than that of CT-Fire (0.024 s). This shows the suitability of these DL methods for early wildland fire detection in real-world applications, during both night and day, when used with a monitoring system or a UAV equipped with visible and IR cameras. This flexibility to detect fires at different times of the day ensures continuous monitoring and rapid response. Furthermore, early detection enables rapid intervention to reduce the damage caused by wildland fires and improves forest fire detection strategies and management.
In conclusion, the fast processing speed and flexibility of DC-Fire make it suitable for real-time fire applications, improving the ability to rapidly identify fire incidents, and helping to manage wildfires more efficiently.
However, the FLAME2 dataset has certain limitations. It represents a prescribed fire in a specific region (Arizona, USA) comprising ponderosa pine forest and pinyon-juniper woodland. It was collected on a unit with particular vegetation types and topographical variations (low fuel type and tree density) and under specific weather conditions: a temperature between 14 and 15 degrees Celsius, relative humidity between 18 and 20%, and a southwest wind at 4.5 m/s. These factors limit the generalizability of CT-Fire and DC-Fire for detecting fires in other zones with different ecological and climatic conditions and vegetation types, which affects their potential for integration into real-world fire applications. Additionally, challenging wildfire-related scenarios remain, including the visual resemblance between wildfire and sunset, lighting, or sunrise, and the visual similarity between smoke and smoke-like objects such as clouds, dust, fog, and haze. These scenarios are not represented in the public FLAME2 dataset used for training and testing the DC-Fire and CT-Fire models, which further reduces their generalization and performance, meaning they may not perform well in real fire scenes containing these confusing visual elements. To the best of our knowledge, FLAME2 is the only public dataset that includes both RGB and IR images; it represents a prescribed fire in Arizona, and organizing prescribed burns is a complex process that requires many factors to be considered before the necessary permission is obtained.

Consequently, in future work, we plan to use a 3D platform to generate synthetic wildland fire images depicting these challenging scenarios. We will then demonstrate the ability of sim-to-real DL methods to detect and classify wildfires and smoke and to address the related challenges. We first plan to generate these synthetic data using advanced simulation tools such as Unreal Engine. These data will describe diverse wildfire scenarios by varying parameters such as weather conditions, vegetation types, and topography. We will then train the proposed deep learning models using both real and synthetic images, while testing them only on real fire images. Furthermore, the use of explainable artificial intelligence methods in wildfire detection systems tackles the crucial challenge of making the outputs of DL models transparent and comprehensible. These methods ensure that users can easily interpret how and why a DL method reached a decision, improving trust in and the reliability of the technology. As a second direction for future work, we will therefore employ advanced explanation methods to provide a comprehensive description of the features extracted by CT-Fire and DC-Fire for identifying wildland smoke and fires. We also plan to adopt foundation models as powerful pretrained deep learning models for recognizing smoke and wildfires, utilizing transfer learning from large datasets to improve wildland fire detection.

5.2. Ethical Issues

Drones offer many practical benefits, notably facilitating fire management and disaster response. They enhance the efficiency and safety of firefighting operations. They also provide real-time monitoring for detecting and recognizing wildfires and predicting their spread in complex forest areas using visible and thermal cameras. Recently, drone swarming technology, which deploys numerous drones simultaneously, has been adopted to cover large monitoring zones, such as the Amazon rainforest, and to track fire spread. Autonomous water-dropping drones have also been used to detect hotspots and drop fire retardants or water, especially in dangerous and inaccessible zones. However, drone use is subject to legal restrictions and raises ethical issues such as privacy concerns and data protection. Firefighters need to respect the laws governing drone flights in residential and private areas, as unethical drone use can result in legal action.
On the other hand, AI (Artificial Intelligence)-based fire recognition and detection methods offer significant benefits, also improving fire management strategies. AI models detect wildfires more reliably than traditional methods, such as human observers, thanks to their ability to analyze large amounts of wildfire data from numerous cameras (thermal and visible) and sensors integrated into UAVs or drones. However, the use of AI in fire detection poses several ethical problems, such as data privacy, since AI models analyze large amounts of data, which can include private information. Additionally, bias and transparency in the decision-making processes of these models are important issues. By using explainability methods, AI models can make their results comprehensible and interpretable.

6. Conclusions

In this work, we used both IR and visible images to provide a comprehensive and detailed view of the various characteristics of forest fires and smoke. This enables deep learning methods to learn a rich visual representation of forest fire and smoke scenarios during both day and night. We then adopted two ensemble learning methods, namely CT-Fire and DC-Fire, for recognizing wildland smoke and fires using both IR and RGB images. CT-Fire employs the deep CNN RegNetY-16GF and the vision transformer EfficientFormer v2 as its backbone, while DC-Fire combines two deep CNNs (EfficientNet-B5 and DenseNet-201) to extract deep features of forest fires and smoke. Experimental tests were performed using a large aerial dataset, FLAME2, resulting in high performance with an accuracy, precision, recall, and F1-score of 100% for both CT-Fire and DC-Fire. This shows that the CT-Fire and DC-Fire models correctly recognized all positive fire instances and did not make any incorrect fire or non-fire detections. In addition, these models outperformed state-of-the-art methods. They also achieved fast processing speeds, showing the possibility of using them for early fire detection. Moreover, DC-Fire and CT-Fire demonstrated their potential in overcoming challenging limitations, including the complexity of the background, the variability of wildfires and smoke in terms of size, shape, and intensity, and image quality. This demonstrates the potential of the DC-Fire and CT-Fire methods for early wildfire detection in real-world applications, during both day and night, when used with UAVs or drones equipped with IR and visible cameras. This ability allows rapid intervention, reduces wildfire damage, and improves fire detection and management strategies.

Author Contributions

Conceptualization, R.G. and M.A.A.; methodology, R.G. and M.A.A.; software, R.G.; validation, R.G. and M.A.A.; formal analysis, R.G. and M.A.A.; writing—original draft preparation, R.G.; writing—review and editing, M.A.A.; funding acquisition, M.A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was enabled in part by support provided by the Natural Sciences and Engineering Research Council of Canada (NSERC), funding reference number RGPIN-2024-05287.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

This work used a publicly available dataset; see reference [33]. More details about this dataset are available under Section 3.2.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial Intelligence;
DL: Deep Learning;
IR: Infrared;
CNN: Convolutional Neural Network;
ReLU: Rectified Linear Unit;
SKLFS: State Key Laboratory of Fire Science;
CBAM: Convolutional Block Attention Module;
SHAP: SHapley Additive exPlanations;
ACNet: Customized Attention Connected Network;
BCE: Binary Cross-Entropy Loss;
GRU: Gated Recurrent Unit;
Bi-LSTM: Bidirectional Long Short-Term Memory;
HOG: Histogram of Oriented Gradients;
UAV: Unmanned Aerial Vehicle;
BO: Bayesian Optimization;
ANN: Artificial Neural Network;
IRCNN: Infrared Convolutional Neural Network.

References

  1. European Commission. 2022 Was the Second-Worst Year for Wildfires. Available online: https://ec.europa.eu/commission/presscorner/detail/en/ip_23_5951 (accessed on 20 May 2024).
  2. European Commission. Wildfires in the Mediterranean. Available online: https://joint-research-centre.ec.europa.eu/jrc-news-and-updates/wildfires-mediterranean-monitoring-impact-helping-response-2023-07-28_en (accessed on 20 May 2024).
  3. Government of Canada. Forest Fires. Available online: https://natural-resources.canada.ca/our-natural-resources/forests/wildland-fires-insects-disturbances/forest-fires/13143 (accessed on 20 May 2024).
  4. Shingler, B.; Bruce, G. Five Charts to Help Understand Canada’s Record Breaking Wildfire Season. Available online: https://www.cbc.ca/news/climate/wildfire-season-2023-wrap-1.6999005 (accessed on 20 May 2024).
  5. Anshul, G.; Abhishek, S.; Ashok, K.; Kishor, K.; Sayantani, L.; Kamal, K.; Vishal, S.; Anuj, K.; Chandra, M.S. Fire Sensing Technologies: A Review. IEEE Sensors J. 2019, 19, 3191–3202. [Google Scholar] [CrossRef]
  6. Ghali, R.; Akhloufi, M.A. Deep Learning Approaches for Wildland Fires Remote Sensing: Classification, Detection, and Segmentation. Remote Sens. 2023, 15, 1821. [Google Scholar] [CrossRef]
  7. Khan, F.; Xu, Z.; Sun, J.; Khan, F.M.; Ahmed, A.; Zhao, Y. Recent Advances in Sensors for Fire Detection. Sensors 2022, 22, 3310. [Google Scholar] [CrossRef] [PubMed]
  8. Ghali, R.; Akhloufi, M.A. Deep Learning Approaches for Wildland Fires Using Satellite Remote Sensing Data: Detection, Mapping, and Prediction. Fire 2023, 6, 192. [Google Scholar] [CrossRef]
  9. Toulouse, T.; Rossi, L.; Celik, T.; Akhloufi, M.A. Automatic Fire Pixel Detection using Image Processing: A Comparative Analysis of Rule-based and Machine Learning-based Methods. Signal Image Video Process. 2016, 10, 647–654. [Google Scholar] [CrossRef]
  10. Martin, M.; Peter, K.; Ivan, K.; Allen, T. Optical Flow Estimation for Flame Detection in Videos. IEEE Trans. Image Process. 2013, 22, 2786–2797. [Google Scholar] [CrossRef]
  11. Kosmas, D.; Panagiotis, B.; Nikos, G. Spatio-Temporal Flame Modeling and Dynamic Texture Analysis for Automatic Video-Based Fire Detection. IEEE Trans. Circuits Syst. Video Technol. 2015, 25, 339–351. [Google Scholar] [CrossRef]
  12. Ishag, M.M.A.; Honge, R. Forest Fire Detection and Identification Using Image Processing and SVM. J. Inf. Process. Syst. 2019, 15, 159–168. [Google Scholar] [CrossRef]
  13. Ko, B.; Cheong, K.H.; Nam, J.Y. Early Fire Detection Algorithm Based on Irregular Patterns of Flames and Hierarchical Bayesian Networks. Fire Saf. J. 2010, 45, 262–270. [Google Scholar] [CrossRef]
  14. David, V.H.; Peter, V.; Wilfried, P.; Kristof, T. Fire Detection in Color Images Using Markov Random Fields. In Proceedings of the Advanced Concepts for Intelligent Vision Systems, Sydney, Australia, 13–16 December 2010; pp. 88–97. [Google Scholar]
  15. Fahad, M. Deep Learning Technique for Recognition of Deep Fake Videos. In Proceedings of the IEEE IAS Global Conference on Emerging Technologies (GlobConET), London, UK, 19–21 May 2023; pp. 1–4. [Google Scholar]
  16. Ur Rehman, A.; Belhaouari, S.B.; Kabir, M.A.; Khan, A. On the Use of Deep Learning for Video Classification. Appl. Sci. 2023, 13, 2007. [Google Scholar] [CrossRef]
  17. Rana, M.; Bhushan, M. Machine Learning and Deep Learning Approach for Medical Image Analysis: Diagnosis to Detection. Multimed. Tools Appl. 2023, 82, 26731–26769. [Google Scholar] [CrossRef] [PubMed]
  18. Zhao, Y.; Wang, X.; Che, T.; Bao, G.; Li, S. Multi-task Deep Learning for Medical Image Computing and Analysis: A Review. Comput. Biol. Med. 2023, 153, 106496. [Google Scholar] [CrossRef] [PubMed]
  19. Jasdeep, S.; Subrahmanyam, M.; Raju, K.G.S. Multi Domain Learning for Motion Magnification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 13914–13923. [Google Scholar]
  20. Yu, C.; Bi, X.; Fan, Y. Deep Learning for Fluid Velocity Field Estimation: A Review. Ocean Eng. 2023, 271, 113693. [Google Scholar] [CrossRef]
  21. Harsh, R.; Lavish, B.; Kartik, S.; Tejan, K.; Varun, J.; Venkatesh, B.R. NoisyTwins: Class-Consistent and Diverse Image Generation Through StyleGANs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 5987–5996. [Google Scholar]
  22. Dhar, T.; Dey, N.; Borra, S.; Sherratt, R.S. Challenges of Deep Learning in Medical Image Analysis—Improving Explainability and Trust. IEEE Trans. Technol. Soc. 2023, 4, 68–75. [Google Scholar] [CrossRef]
  23. Xu, T.-X.; Guo, Y.-C.; Lai, Y.-K.; Zhang, S.-H. CXTrack: Improving 3D Point Cloud Tracking With Contextual Information. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 1084–1093. [Google Scholar]
  24. Ibrahim, N.; Darlis, A.R.; Kusumoputro, B. Performance Analysis of YOLO-Deep SORT on Thermal Video-Based Online Multi-Objet Tracking. In Proceedings of the IEEE 13th International Conference on Consumer Electronics—Berlin (ICCE-Berlin), Berlin, Germany, 3–5 September 2023; pp. 1–6. [Google Scholar]
  25. Saleh, A.; Zulkifley, M.A.; Harun, H.H.; Gaudreault, F.; Davison, I.; Spraggon, M. Forest Fire Surveillance Systems: A Review of Deep Learning Methods. Heliyon 2024, 10, e23127. [Google Scholar] [CrossRef] [PubMed]
  26. Gonçalves, L.A.O.; Ghali, R.; Akhloufi, M.A. YOLO-Based Models for Smoke and Wildfire Detection in Ground and Aerial Images. Fire 2024, 7, 140. [Google Scholar] [CrossRef]
  27. Ghali, R.; Akhloufi, M.A. DC-Fire: A Deep Convolutional Neural Network for Wildland Fire Recognition on Aerial Infrared Images. In Proceedings of the fourth Quantitative Infrared Thermography Asian Conference (QIRT-Asia 2023), Abu Dhabi, United Arab Emirates, 30 October–3 November 2023; pp. 1–6. [Google Scholar]
  28. Ghali, R.; Akhloufi, M.A. CT-Fire: A CNN-Transformer for Wildfire Classification on Ground and Aerial images. Int. J. Remote Sens. 2023, 44, 7390–7415. [Google Scholar] [CrossRef]
  29. Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
  30. Tan, M.; Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114. [Google Scholar]
  31. Radosavovic, I.; Kosaraju, R.P.; Girshick, R.; He, K.; Dollár, P. Designing Network Design Spaces. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10428–10436. [Google Scholar]
  32. Li, Y.; Hu, J.; Wen, Y.; Evangelidis, G.; Salahi, K.; Wang, Y.; Tulyakov, S.; Ren, J. Rethinking Vision Transformers for MobileNet Size and Speed. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–6 October 2023; pp. 16889–16900. [Google Scholar]
  33. Chen, X.; Hopkins, B.; Wang, H.; O’Neill, L.; Afghah, F.; Razi, A.; Fulé, P.; Coen, J.; Rowell, E.; Watts, A. Wildland Fire Detection and Monitoring Using a Drone-Collected RGB/IR Image Dataset. IEEE Access 2022, 10, 121301–121317. [Google Scholar] [CrossRef]
  34. Wang, K.; Zhang, Y.; Jinjun, W.; Zhang, Q.; Bing, C.; Dongcai, L. Fire Detection in Infrared Video Surveillance Based on Convolutional Neural Network and SVM. In Proceedings of the IEEE 3rd International Conference on Signal and Image Processing (ICSIP), Shenzhen, China, 13–15 July 2018; pp. 162–167. [Google Scholar]
  35. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105. [Google Scholar]
  36. Deng, L.; Chen, Q.; He, Y.; Sui, X.; Liu, Q.; Hu, L. Fire Detection with Infrared Images using Cascaded Neural Network. J. Algorithms Comput. Technol. 2019, 13, 1748302619895433. [Google Scholar] [CrossRef]
  37. Shamsoshoara, A.; Afghah, F.; Razi, A.; Zheng, L.; Fulé, P.Z.; Blasch, E. Aerial Imagery Pile Burn Detection Using Deep Learning: The FLAME Dataset. Comput. Netw. 2021, 193, 108001. [Google Scholar] [CrossRef]
  38. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar]
  39. Shamsoshoara, A.; Afghah, F.; Razi, A.; Zheng, L.; Fulé, P.; Blasch, E. The FLAME Dataset: Aerial Imagery Pile Burn Detection using Drones (UAVs). IEEE Dataport 2020. [Google Scholar] [CrossRef]
  40. Ghali, R.; Akhloufi, M.A.; Mseddi, W.S. Deep Learning and Transformer Approaches for UAV-Based Wildfire Detection and Segmentation. Sensors 2022, 22, 1977. [Google Scholar] [CrossRef]
  41. Guo, Y.Q.; Chen, G.; Yi-Na, W.; Xiu-Mei, Z.; Zhao-Dong, X. Wildfire Identification Based on an Improved Two-Channel Convolutional Neural Network. Forests 2022, 13, 1302. [Google Scholar] [CrossRef]
  42. Wang, L.; Zhang, H.; Zhang, Y.; Hu, K.; An, K. A Deep Learning-Based Experiment on Forest Wildfire Detection in Machine Vision Course. IEEE Access 2023, 11, 32671–32681. [Google Scholar] [CrossRef]
  43. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015; pp. 1–14. [Google Scholar]
  44. Anupama, N.; Prabha, S.; Senthilkumar, M.; Sumathi, R.; Tag, E.E. Forest Fire Identification in UAV Imagery Using X-MobileNet. Electronics 2023, 12, 733. [Google Scholar] [CrossRef]
  45. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
  46. Zhang, Z.; Guo, Y.; Chen, G.; Xu, Z. Wildfire Detection via a Dual-Channel CNN with Multi-Level Feature Fusion. Forests 2023, 14, 1499. [Google Scholar] [CrossRef]
  47. Islam, A.M.; Binta, M.F.; Rayhan, A.M.; Jafar, A.I.; Rahmat, U.J.; Salekul, I.; Swakkhar, S.; Muzahidul, I.A.K.M. An Attention-Guided Deep-Learning-Based Network with Bayesian Optimization for Forest Fire Classification and Localization. Forests 2023, 14, 2080. [Google Scholar] [CrossRef]
  48. Ali, K.; Bilal, H.; Somaiya, K.; Ramsha, A.; Adnan, A. DeepFire: A Novel Dataset and Deep Transfer Learning Benchmark for Forest Fire Detection. Mob. Inf. Syst. 2022, 2022, 5358359. [Google Scholar] [CrossRef]
  49. Aral, R.A.; Zalluhoglu, C.; Sezer, E.A. Lightweight and Attention-based CNN Architecture for Wildfire Detection using UAV Vision Data. Int. J. Remote Sens. 2023, 44, 5768–5787. [Google Scholar] [CrossRef]
  50. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  51. Kumar, J.S.; Khan, M.; Jilani, S.A.K.; Rodrigues, J.J.P.C. Automated Fire Extinguishing System Using a Deep Learning Based Framework. Mathematics 2023, 11, 608. [Google Scholar] [CrossRef]
  52. Khubab, A.; Shahbaz, K.M.; Fawad, A.; Maha, D.; Wadii, B.; Abdulwahab, A.; Mohammad, A.; Alshehri, M.S.; Yasin, G.Y.; Jawad, A. FireXnet: An explainable AI-based Tailored Deep Learning Model for Wildfire Detection on Resource-constrained Devices. Fire Ecol. 2023, 19, 54. [Google Scholar] [CrossRef]
  53. Dincer, B. Wildfire Detection Image Data. Available online: https://www.kaggle.com/datasets/brsdincer/wildfire-detection-image-data (accessed on 20 May 2024).
  54. Pedro, V.d.V.; Lisboa, A.C.; Barbosa, A.V. An Automatic Fire Detection System Based on Deep Convolutional Neural Networks for Low-power, Resource-constrained Devices. Neural Comput. Appl. 2022, 34, 15349–15368. [Google Scholar] [CrossRef]
  55. Toulouse, T.; Rossi, L.; Campana, A.; Celik, T.; Akhloufi, M.A. Computer Vision for Wildfire Research: An Evolving Image Dataset for Processing and Analysis. Fire Saf. J. 2017, 92, 188–194. [Google Scholar] [CrossRef]
  56. Saied, A. FIRE Dataset. Available online: https://www.kaggle.com/datasets/phylake1337/fire-dataset?select=fire_dataset%2C+06.11.2021 (accessed on 20 May 2024).
  57. Ghali, R.; Akhloufi, M.A. BoucaNet: A CNN-Transformer for Smoke Recognition on Remote Sensing Satellite Images. Fire 2023, 6, 455. [Google Scholar] [CrossRef]
  58. Tan, M.; Le, Q. EfficientNetV2: Smaller Models and Faster Training. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 10096–10106. [Google Scholar]
  59. Ba, R.; Chen, C.; Yuan, J.; Song, W.; Lo, S. SmokeNet: Satellite Smoke Scene Detection Using Convolutional Neural Network with Spatial and Channel-Wise Attention. Remote Sens. 2019, 11, 1702. [Google Scholar] [CrossRef]
  60. Fernando, L.; Ghali, R.; Akhloufi, M.A. SWIFT: Simulated Wildfire Images for Fast Training Dataset. Remote Sens. 2024, 16, 1627. [Google Scholar] [CrossRef]
  61. Almeida, J.S.; Jagatheesaperumal, S.K.; Nogueira, F.G.; de Albuquerque, V.H.C. EdgeFireSmoke++: A Novel lightweight Algorithm for Real-time Forest Fire Detection and Visualization using Internet of Things-human Machine Interface. Expert Syst. Appl. 2023, 221, 119747. [Google Scholar] [CrossRef]
  62. Almeida, J.S.; Huang, C.; Nogueira, F.G.; Bhatia, S.; de Albuquerque, V.H.C. EdgeFireSmoke: A Novel Lightweight CNN Model for Real-Time Video Fire–Smoke Detection. IEEE Trans. Ind. Informatics 2022, 18, 7889–7898. [Google Scholar] [CrossRef]
  63. Reis, H.C.; Turk, V. Detection of Forest Fire using Deep Convolutional Neural Networks with Transfer Learning Approach. Appl. Soft Comput. 2023, 143, 110362. [Google Scholar] [CrossRef]
  64. Habbib, A.M.; Khidhir, A.M. Transfer Learning Based Fire Recognition. Int. J. Tech. Phys. Probl. Eng. (IJTPE) 2023, 15, 86–92. [Google Scholar]
  65. Idroes, G.M.; Maulana, A.; Suhendra, R.; Lala, A.; Karma, T.; Kusumo, F.; Hewindati, Y.T.; Noviandy, T.R. TeutongNet: A Fine-Tuned Deep Learning Model for Improved Forest Fire Detection. Leuser J. Environ. Stud. 2023, 1, 1–8. [Google Scholar] [CrossRef]
  66. Al Duhayyim, M.; Eltahir, M.M.; Omer Ali, O.A.; Albraikan, A.A.; Al-Wesabi, F.N.; Hilal, A.M.; Hamza, M.A.; Rizwanullah, M. Fusion-Based Deep Learning Model for Automated Forest Fire Detection. Comput. Mater. Contin. 2023, 77, 1355–1371. [Google Scholar] [CrossRef]
  67. Guo, N.; Liu, J.; Di, K.; Gu, K.; Qiao, J. A hybrid Attention Model Based on First-order Statistical Features for Smoke Recognition. Sci. China Technol. Sci. 2024, 67, 809–822. [Google Scholar] [CrossRef]
  68. Jonnalagadda, A.V.; Hashim, H.A. SegNet: A segmented Deep Learning Based Convolutional Neural Network Approach for Drones Wildfire Detection. Remote Sens. Appl. Soc. Environ. 2024, 34, 101181. [Google Scholar] [CrossRef]
  69. Pramod, S.; Avinash, M. Introduction to TensorFlow 2.0. In Learn TensorFlow 2.0: Implement Machine Learning and Deep Learning Models with Python; Apress: Berkeley, CA, USA, 2020; pp. 1–24. [Google Scholar] [CrossRef]
  70. Al-Dabbagh, A.M.; Ilyas, M. Uni-temporal Sentinel-2 Imagery for Wildfire Detection Using Deep Learning Semantic Segmentation Models. Geomat. Nat. Hazards Risk 2023, 14, 2196370. [Google Scholar] [CrossRef]
  71. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1492–1500. [Google Scholar]
  72. Liu, Z.; Hu, H.; Lin, Y.; Yao, Z.; Xie, Z.; Wei, Y.; Ning, J.; Cao, Y.; Zhang, Z.; Dong, L.; et al. Swin Transformer v2: Scaling Up Capacity and Resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 12009–12019. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
