Wildfire-Detection Method Using DenseNet and CycleGAN Data Augmentation-Based Remote Camera Imagery

To minimize the damage caused by wildfires, a deep learning-based wildfire-detection technology that extracts features and patterns from surveillance camera images was developed. However, many studies on deep learning-based wildfire-image classification have highlighted the imbalance between wildfire-image data and forest-image data, which degrades model performance. In this study, wildfire images were generated using a cycle-consistent generative adversarial network (CycleGAN) to alleviate this imbalance. In addition, a framework based on densely connected convolutional networks (DenseNet) was proposed, and its performance was compared with that of pre-trained models. When trained on a train set that included GAN-generated images, the proposed DenseNet-based model achieved the best performance among the models, with an accuracy of 98.27% and an F1 score of 98.16 on the test dataset. Finally, the trained model was applied to high-quality drone images of wildfires. The experimental results showed that the proposed framework achieved high wildfire-detection accuracy.


Introduction
Wildfires cause significant harm to humans and damage to private and public property; they pose a constant threat to public safety. More than 200,000 wildfires occur globally every year, with a combustion area of 3.5-4.5 million km² [1]. In addition, climate change is gradually accelerating the effects of these wildfires; there is thus considerable interest in wildfire management [2][3][4]. As wildfires are difficult to control once they have spread over a certain area, early detection is the most important factor in minimizing wildfire damage. Traditionally, wildfires were primarily detected by human observers, but a deep learning-based automatic wildfire-detection system with real-time surveillance cameras allows more constant and accurate monitoring than human observers. The available methods for the early detection of wildfires can be categorized as sensor-based technology and camera-based image-processing technology. Sensors that detect changes in smoke, pressure, humidity, and temperature are widely used for fire detection. However, this method has several disadvantages, such as a high initial cost and high false-alarm rates, as the performance of sensors is significantly affected by the surrounding environment [5][6][7].
An algorithm constructed using the latest neural network architecture of DenseNet [40] could be used to address this issue. DenseNet improves the performance of a model by connecting the feature maps of the previous layers to the inputs of the next layer using concatenation to maximize the information flow between layers.
Inspired by recent works, we generated synthetic wildfire images using GANs to change the image of a fire-free mountain to that of a mountain with a wildfire. A k-fold (k = 5) cross-validation scheme was used on the models, and the train set was separated into train sets A and B, consisting of only the original images and of the original and generated images, respectively. Each dataset was divided into training data and test data and was used to train a model that was developed based on DenseNet; this facilitated the comparison of the performance with two pre-trained models, VGG-16 [19] and ResNet-50 [20]. This paper is organized as follows. Section 2 describes the architecture of cycle-consistent adversarial networks (CycleGANs) [41], one of the main GAN algorithms used for data augmentation, and DenseNet [40], which is used for wildfire-image classification (wildfire detection). The experimental results obtained using both models and the comparison of their classification performance with that of the pre-trained models are presented in Section 3. Section 4 presents the conclusion of this study.

Data Collection
The wildfire and non-fire images used for training the GAN model and the CNN classification models were collected as follows. The mountain datasets were obtained from eight scene-categories databases [42] and a Korean tourist spot database [43]. However, there is no open data benchmark available for fire or smoke images of wildfires [28]. The collection was therefore obtained solely by web crawling; this limitation resulted in a data imbalance. Considering that the early fire-detection model is intended for application in drones and surveillance cameras for monitoring purposes, both categories of datasets were crawled from images or videos obtained using a drone. A sample of the dataset is presented in Figure 1. A total of 4959 non-wildfire images and 1395 wildfire images made up our original dataset and were resized to 224 × 224 for the network input.
Remote Sens. 2020, 12, x FOR PEER REVIEW

CycleGAN Image-to-Image Translation
To generate wildfire images, CycleGAN [41] was used; it is a method for image-to-image translation from a reference image domain (X) to a target image domain (Y) that does not rely on paired images. As illustrated in Figure 2, CycleGAN uses two loss functions, the adversarial loss [33] and the cycle-consistency loss [41].

Our objective was to train $G_{x \to y}$ such that the discriminator $D_y$ cannot distinguish the image data distribution produced by $G_{x \to y}$ from the image data distribution of domain Y. This objective is expressed by the adversarial loss $\mathcal{L}_{GAN}$ (Equation (1)). However, a general GAN is not trained over the entire distribution of the actual data; it is trained only to reduce the loss. Therefore, a mode-collapse problem occurs in which the optimization fails because the generator cannot find the entire data distribution, and all input images are mapped to the same output image. To solve this problem, CycleGAN applies an inverse mapping and a cycle-consistency loss ($\mathcal{L}_{cyc}$) to Equations (1) and (2), respectively, so that various outputs are produced. In addition, an identity loss ($\mathcal{L}_{im}$) was added, in which X-domain images are passed through $G_{y \to x}$; it regularizes the generator so that the calculated output is the same as the input, and the converted image can thus be generated while minimizing the damage to the original image.

The final loss, which combines all of these losses, is as follows:

$$\mathcal{L}(G_{x \to y}, G_{y \to x}, D_x, D_y) = \mathcal{L}_{GAN}(G_{x \to y}, D_y, X, Y) + \mathcal{L}_{GAN}(G_{y \to x}, D_x, X, Y) + \lambda \mathcal{L}_{cyc}(G_{x \to y}, G_{y \to x}) + \mathcal{L}_{im}(G_{x \to y}, G_{y \to x}).$$

Using CycleGAN in this way, it was possible to create various wildfire images while maintaining the shape and background color of the forest site.
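For reference, the individual terms of the combined loss can be written out in the standard form of the CycleGAN formulation [41]. The block below is a sketch in this section's notation and is not a transcription of the paper's own numbered equations: the expectation notation $p_{data}$, the L1 norm, and the argument order are assumptions carried over from the standard formulation.

```latex
\begin{align*}
\mathcal{L}_{GAN}(G_{x \to y}, D_y, X, Y)
  &= \mathbb{E}_{y \sim p_{data}(y)}\big[\log D_y(y)\big]
   + \mathbb{E}_{x \sim p_{data}(x)}\big[\log\big(1 - D_y(G_{x \to y}(x))\big)\big] \\
\mathcal{L}_{cyc}(G_{x \to y}, G_{y \to x})
  &= \mathbb{E}_{x \sim p_{data}(x)}\big[\lVert G_{y \to x}(G_{x \to y}(x)) - x \rVert_1\big]
   + \mathbb{E}_{y \sim p_{data}(y)}\big[\lVert G_{x \to y}(G_{y \to x}(y)) - y \rVert_1\big] \\
\mathcal{L}_{im}(G_{x \to y}, G_{y \to x})
  &= \mathbb{E}_{y \sim p_{data}(y)}\big[\lVert G_{x \to y}(y) - y \rVert_1\big]
   + \mathbb{E}_{x \sim p_{data}(x)}\big[\lVert G_{y \to x}(x) - x \rVert_1\big]
\end{align*}
```

In this formulation, the generators minimize the combined loss while the discriminators maximize their adversarial terms, and $\lambda$ weights the cycle-consistency term relative to the others.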

DenseNet
The early wildfire-detection algorithm was constructed using the state-of-the-art network architecture DenseNet, which is known to perform well in wildfire detection while alleviating the vanishing-gradient problem and reducing the training time [40]. It is a densely connected CNN structure in which each layer is connected to all preceding layers. Figure 3 illustrates the original dense block architecture. The network comprises layers, each of which applies a non-linear transformation composed of functions such as batch normalization, rectified linear unit (ReLU), and convolution. $X_0$ is a single image, and the network output of the $(l-1)$th layer after passing through a convolution is $X_{l-1}$. The $l$th layer receives the feature maps of all preceding layers as its input (Equation (6)).
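The dense connectivity described above is usually written as a single relation in the DenseNet literature [40]. The following is a sketch of that relation in this section's notation; the symbol $H_l$ for the composite transformation of layer $l$ and the bracket notation for channel-wise concatenation are assumptions borrowed from the standard formulation and may differ from the paper's own Equation (6).

```latex
X_l = H_l\big([X_0, X_1, \ldots, X_{l-1}]\big)
```

As a consequence, if every layer produces $k$ feature maps and the block input has $k_0$ channels (both symbols are introduced here for illustration only), the $l$th layer receives $k_0 + k\,(l-1)$ input channels.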

Performance Evaluation Metrics
To compare the performance of the models, five commonly used metrics were calculated: accuracy, precision, sensitivity, specificity, and F1-score [44][45][46]. Accuracy is the ratio of correctly predicted observations to the total number of observations and is the most intuitive performance measurement. Precision is the ratio of correctly predicted positive observations to the total number of predicted positive observations. Sensitivity is the ratio of correctly predicted positive observations to the total number of actual positive observations. Specificity is the ratio of correctly predicted negative observations to the total number of actual negative observations. The F1-score is the harmonic mean of precision and sensitivity, which is generally useful for summarizing the performance of a model. The expressions for the evaluation metrics are as follows.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

$$\text{Specificity} = \frac{TN}{TN + FP}$$

$$\text{F1-score} = \frac{2 \cdot \text{Precision} \cdot \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}$$

In the aforementioned equations, the number of wildfire images predicted as wildfires and the number of non-fire images identified as non-fire are denoted by true positive (TP) and true negative (TN), respectively. In addition, the number of non-fire images predicted as wildfires and the number of wildfire images predicted as non-fire are denoted by false positive (FP) and false negative (FN), respectively. These four quantities are defined using the confusion matrix of the binary classification. The overall performance-evaluation metrics were computed using the wildfire and non-wildfire testing sets.
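The five metrics follow directly from the four confusion-matrix counts. The sketch below is a minimal illustration and not the authors' evaluation code; the function names and the example counts are hypothetical, and it assumes that no denominator is zero.

```python
def f1_score(precision, sensitivity):
    """Harmonic mean of precision and sensitivity."""
    return 2 * precision * sensitivity / (precision + sensitivity)


def classification_metrics(tp, tn, fp, fn):
    """Compute the five metrics from confusion-matrix counts.

    tp, tn, fp, fn are the numbers of true positives, true negatives,
    false positives, and false negatives (wildfire = positive class).
    """
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "f1": f1_score(precision, sensitivity),
    }


# Hypothetical counts, chosen only to exercise the function.
metrics = classification_metrics(tp=90, tn=95, fp=5, fn=10)
```

Because the F1-score depends only on precision and sensitivity, `f1_score(99.380, 96.976)` gives approximately 98.163, which agrees with the precision, sensitivity, and F1-score reported for DenseNet with trainset B.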

Experimental Results
The following sections present the results obtained for the dataset balancing and the wildfire-detection models. The experimental environment was CentOS (Community Enterprise Operating System) Linux release 8.2.2004, which was set up as an artificial intelligence server. The hardware configuration of the server consisted of an Intel(R) Xeon(R) Gold 6240 central processing unit (2.60 GHz) and an Nvidia Tesla V100 GPU with 32 GB memory. The experiments were conducted using the PyTorch deep learning framework [47] with the Python language. The results and the example experiment code are available online at a GitHub repository (https://github.com/pms5343/pms5343-WildfireDetection_by_DenseNet).

Dataset Augmentation Using GAN
To alleviate the data imbalance of the collected images, new wildfire images were generated using the CycleGAN as a data augmentation strategy. The objective of using the image-generation model is to convert non-wildfire images from a part of the collected data into wildfire images. A total of 1294 wildfire images (Domain A) and 2311 non-wildfire images (Domain B) from our original dataset were used.
As can be observed from Figure 4, the training was continued by increasing the number of epochs until each loss changed only slightly. The generator losses tended to increase as the number of epochs increased, because the objective of the generators was to create fake images such that the discriminator could not determine whether a generated image was real or fake. Conversely, the discriminators were trained to reduce their losses in order to distinguish between the generated and original images. Figure 4b shows that the cycle-consistency loss, added to increase the diversity of the generated images, and the identity-mapping loss, added to minimize changes in the background of the generated images, also decreased during training. After 650 epochs, there was no significant change in the losses, and the training was thus terminated.

The image conversion process is illustrated in Figure 5. The converted image was reconstructed by generator $G_{AB}$, and the result was not significantly different from the original image of domain B (①→②→④ process in Figure 5). In addition, it was confirmed that there was no difference in the image converted by generator $G_{AB}$ from domain B (①→⑤ process in Figure 5). Conversely, the process was conducted in the same manner, and 1195 new 224 × 224-pixel fire images were created from domain B (Figure 6) and included in the wildfire dataset.

Wildfire Detection
The wildfire detection was realized through the use of a DenseNet-based classification network model consisting of three dense blocks and two transition layers to identify the fire with 224 × 224-pixel-size image inputs. The architecture of the simple network is illustrated in Figure 7.

Each dense block used two convolution kernels. One was a 1 × 1 convolution, which was used to decrease the number of input feature-map channels, and the other was a 3 × 3 convolution. After each dense block, the feature maps passed through a transition layer consisting of batch normalization, ReLU, a 1 × 1 convolution, and 2 × 2 average pooling, which reduced the width and height of the feature maps as well as the number of feature maps. Finally, after the three dense blocks, the feature maps passed sequentially through global average pooling, a linear layer, and a softmax classifier to produce the result, as in the case of a traditional CNN.

Figure 6. Sample of the wildfire images converted from non-fire mountain images.

The following section presents the results of the wildfire-detection performance obtained using the deep learning classification model based on DenseNet, as compared to the pre-trained models. Two results were derived for each model: one for train set A and the other for train set B.
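To make the effect of three dense blocks and two transition layers concrete, the following sketch traces how the channel count and spatial size of the feature maps evolve through such a network. It is bookkeeping only, not the authors' model: the stem width, growth rate, number of layers per block, and compression factor are hypothetical values chosen for illustration, and the stem is assumed to leave the 224 × 224 spatial size unchanged.

```python
def trace_densenet(input_size=224, stem_channels=16, growth_rate=12,
                   layers_per_block=(4, 4, 4), compression=0.5):
    """Return (stage name, channels, spatial size) after each stage.

    All default values are hypothetical; only the 224-pixel input and the
    3-block / 2-transition layout come from the text.
    """
    size, channels = input_size, stem_channels
    trace = []
    last_block = len(layers_per_block) - 1
    for i, n_layers in enumerate(layers_per_block):
        # Each dense layer concatenates growth_rate new feature maps.
        channels += n_layers * growth_rate
        trace.append((f"dense_block_{i + 1}", channels, size))
        if i < last_block:
            # Transition: the 1x1 convolution reduces channels,
            # and the 2x2 average pooling halves width and height.
            channels = int(channels * compression)
            size //= 2
            trace.append((f"transition_{i + 1}", channels, size))
    return trace


stages = trace_densenet()
```

With these illustrative settings, the spatial size is halved twice (224 to 112 to 56) before global average pooling, while the channel count grows inside each block and shrinks at each transition.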

Dataset Partition
The partition of the train and test sets is specified in this section. From the collected original dataset, several images were used to generate new images. The forest images used as the GAN source domain were deleted from the dataset for the classification model; the wildfire images used as the GAN reference domain, however, were retained. Horizontal flip and random crop (by 200 pixels) were used to expand the number of samples in the training sets. The train sets were divided into trainset A, consisting only of original photographs, and trainset B, which additionally included the wildfire images generated by the GAN. Many previous studies have shown that accuracy decreases when the number of data points is imbalanced between classes [48]. To avoid this well-known disadvantage of data imbalance, trainset A kept the data ratio between the two classes similar, even though its total number of data points was smaller than that of trainset B. The test set contained only original photographs and no generated images. Twenty percent of the total collected original image dataset was selected as the test dataset. The partition of the datasets is shown in Table 1.
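The key constraint in this partition is that generated images enter only trainset B and never the test set. The sketch below illustrates that constraint under simplifying assumptions: the function name, file names, and seed are hypothetical, the class balancing of trainset A and the flip/crop augmentation are omitted, and the sizes do not reproduce Table 1.

```python
import random


def partition(original_fire, original_nonfire, generated_fire,
              test_fraction=0.2, seed=0):
    """Split original images into train/test and build train sets A and B.

    Labels: 1 = wildfire, 0 = non-fire. Generated images are added to
    train set B only, so the test set contains original photographs only.
    """
    rng = random.Random(seed)

    def split(items):
        items = list(items)
        rng.shuffle(items)
        n_test = int(len(items) * test_fraction)
        return items[n_test:], items[:n_test]

    fire_train, fire_test = split(original_fire)
    nonfire_train, nonfire_test = split(original_nonfire)

    test_set = [(p, 1) for p in fire_test] + [(p, 0) for p in nonfire_test]
    train_a = [(p, 1) for p in fire_train] + [(p, 0) for p in nonfire_train]
    train_b = train_a + [(p, 1) for p in generated_fire]
    return train_a, train_b, test_set


# Hypothetical file names, used only to exercise the function.
fire = [f"fire_{i}.jpg" for i in range(10)]
nonfire = [f"forest_{i}.jpg" for i in range(20)]
generated = [f"gan_{i}.jpg" for i in range(5)]
train_a, train_b, test_set = partition(fire, nonfire, generated)
```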

Model Training and Comparison of the Models
To demonstrate the performance of the proposed method, the two train sets were used to train the proposed model and two well-known pre-trained models, VGG-16 and ResNet-50, for the performance evaluation. To improve the performance of each model, the learning rate and optimizer were tuned. Ten values of the initial learning rate between 0.1 and 0.00001 were tested in combination with three representative optimizers: stochastic gradient descent (SGD), Adam [49], and RMSprop [50]. The number of epochs was fixed at 250, and the batch size was fixed at 64. The best hyperparameter combination was selected based on the average accuracy from the k-fold (k = 5) cross-validation process and is presented in Table 2.

The training process of each model using the selected hyperparameter combination is illustrated in Figure 8. The training accuracy curve obtained as the number of epochs increased is presented in Figure 8a. The accuracy of the six models increased most significantly between epochs 1 and 10 and then increased steadily until epoch 250. The DenseNet-based proposed model demonstrated the highest training accuracy, approximately 99% at the end of training, followed by ResNet-50 and then VGG-16. In addition, the accuracy obtained with trainset B, which included generated images, was greater than that obtained with trainset A for all three models. The training loss curve obtained as the number of epochs increased is presented in Figure 8b. The DenseNet and ResNet-50 losses decreased rapidly until epoch 20, whereas the loss of VGG-16 continued to decrease steadily. The training loss was also lower for trainset B than for trainset A, for both the initial and final losses.

The classifier models were evaluated using the five metrics, and the results are presented in Table 3, in which bold values indicate the best result among the methods. DenseNet yielded the best results in terms of all five metrics. Although the VGG-16 model exhibited a slightly lower accuracy, sensitivity, and F1-score, the results obtained using trainset B were at a similar level to (or better than) those obtained with trainset A. For example, in the case of DenseNet, the accuracy increased from 96.734% to 98.271%, the precision increased from 96.573% to 99.380%, the sensitivity increased from 96.573% to 96.976%, the specificity increased from 96.881% to 99.450%, and the F1-score increased from 96.573 to 98.163. The experimental results showed that new images created by changing a normal image of a mountain into an image of a mountain on which a fire had occurred could maintain the performance of the CNN and also improve the model performance by providing more varied training data.
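The hyperparameter selection described in this section, in which each learning-rate and optimizer combination is scored by its average accuracy over five folds, can be sketched as follows. This is a generic illustration rather than the authors' code: `train_and_score` is a hypothetical callback that trains a model on the given indices and returns a validation accuracy, the learning-rate grid shown is a placeholder (the ten values actually tested are not listed in the text), and the samples are assumed to have been shuffled beforehand.

```python
from itertools import product
from statistics import mean


def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, val_indices) for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def select_hyperparameters(n_samples, learning_rates, optimizers,
                           train_and_score, k=5):
    """Return (mean accuracy, learning rate, optimizer) of the best combination."""
    best = None
    for lr, opt in product(learning_rates, optimizers):
        scores = [train_and_score(train, val, lr, opt)
                  for train, val in k_fold_indices(n_samples, k)]
        avg = mean(scores)
        if best is None or avg > best[0]:
            best = (avg, lr, opt)
    return best


# Placeholder scorer standing in for real training; purely illustrative.
def dummy_score(train, val, lr, opt):
    return 1.0 - abs(lr - 0.001) if opt == "adam" else 0.5


folds = list(k_fold_indices(10, k=5))
best = select_hyperparameters(10, [0.1, 0.001, 0.00001],
                              ["sgd", "adam", "rmsprop"], dummy_score)
```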

Influence of Data Augmentation Methods
In this section, the performance of the proposed model is compared with and without CycleGAN-based data augmentation to verify the influence of the proposed method. Horizontal flip, random zoom (200 pixels), rotation (original images were rotated by 10° and 350°), and random brightness (two values were selected arbitrarily between $l_{min} = 0.8$ and $l_{max} = 1.2$) were used in this section as traditional data augmentation methods without GAN. The F1-score was obtained for training sets built from various combinations of augmentation methods.
The experimental results show that data augmentation with CycleGAN improved the accuracy of the wildfire-detection models. As can be seen from Table 4, the F1-scores of the models trained on data combinations including the GAN method were higher by 1.154, 0.902, and 0.821, respectively, than those of the models trained with the traditional methods without GAN.

Visualization of the Contributed Features
In order to visualize the output result of the model that exhibits the best performance, a class activation map (CAM) [51] was used to determine the features of the image that were extracted to detect the wildfire. As can be observed from the example of the CAM results in Figure 9, the detection was made primarily based on the presence of smoke or flames in the image, and the elements used for the classification as wildfires were found even in the early stages of the fire, with no flame and little smoke.

Influence of Data Augmentation Methods
In this section, the performance of the proposed model is compared with and without CycleGAN-based data augmentation to verify the influence of the proposed method. The traditional (non-GAN) augmentation methods used here were horizontal flip, random zoom (200 pixels), rotation (the original images were rotated by 10° and 350°), and random brightness (two values were selected arbitrarily from the range 0.8 to 1.2). The F1 score was obtained for training sets built from various combinations of these augmentation methods. The experimental results show that data augmentation with CycleGAN improved the accuracy of the wildfire-detection models. As can be seen from Table 4, the F1 scores of the models trained on data combinations that included the GAN method were higher by 1.154, 0.902, and 0.821, respectively, than those of the models trained with the traditional methods alone.
To visualize the output of the best-performing model, a class activation map (CAM) [51] was used to determine which features of the image were extracted to detect the wildfire. As can be observed from the example CAM results in Figure 9, the detection was based primarily on the presence of smoke or flames in the image, and the elements used to classify an image as a wildfire were found even in the early stages of a fire, with no flame and little smoke. Smoke in the part of the image that comprises the forest was detected well, but smoke in the part that comprises the sky was not judged to be a factor. It is hypothesized that this occurred because the model confused smoke with clouds or fog, so that smoke against a sky background could not be treated as a strong feature for classification.
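For readers unfamiliar with CAM, the following is a minimal sketch of how such a map is typically computed, assuming a network whose final convolutional feature maps are globally average-pooled and fed to a single dense classification layer. The function name, array shapes, and toy values are hypothetical and are not taken from the model in this study; the min-max scaling is only a display choice.

```python
import numpy as np


def class_activation_map(feature_maps, class_weights):
    """Return a CAM for one class, scaled to [0, 1] for display.

    feature_maps:  array (H, W, K), output of the final convolutional layer
    class_weights: array (K,), dense-layer weights that connect each
                   globally average-pooled channel to the class score
    """
    # Weighted sum of the K feature maps: cam[i, j] = sum_k w[k] * f[i, j, k]
    cam = np.tensordot(feature_maps, class_weights, axes=([2], [0]))
    # Min-max scaling is for visualization only (an assumption of this sketch).
    span = cam.max() - cam.min()
    return (cam - cam.min()) / span if span > 0 else np.zeros_like(cam)


# Toy input with hypothetical values: a 7 x 7 map with 3 channels.
rng = np.random.default_rng(0)
features = rng.random((7, 7, 3))
weights = np.array([0.5, -0.2, 1.0])
heatmap = class_activation_map(features, weights)
```

In practice the low-resolution map is usually upsampled to the input-image size and overlaid on the image, which is how regions such as smoke over forest can be seen to contribute more strongly than smoke against the sky.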

Model Application
To apply the trained model to on-site drones or surveillance cameras that monitor forests, a method is required for handling images of higher resolution than the model input size (224 × 224). One option is to resize the remote camera image to a lower resolution; however, because surveillance cameras are generally used to observe large areas, the method proposed in this study instead crops the high-resolution image at regular intervals and derives a result for each cropped image. Figures 10-12 present an example of a model application based on a forest video recorded by a drone [52]. This is a 1280 × 720 drone video of a wildfire that occurred in Daejeon, Korea, in 2015. The white and jade green boxes denote the cropped areas of size 224 × 224; they are drawn in alternating colors for ease of visualization. The crops were taken at a fixed interval so that they overlap each other, and 28 crops per video frame were input to the classification model. The text in each box indicates the value output by the softmax layer, which was the final layer of the model. As the model was trained using two classes, a crop was judged to contain a fire if its softmax value was greater than 0.5, and not to contain a fire otherwise.
Figure 10. Example of model application with softmax result for early wildfire (with error).
Figure 10 presents the result of applying the model to an image captured approximately 1 min after the wildfire started. The frame includes not only the forest but also parts of the nearby villages. The model detected the smoke generated in the forest and identified the location at which the fire had occurred. However, a greenhouse at the bottom right of the frame was falsely detected as a wildfire (0.829). This error is attributed to the fact that images of settings such as cities, roads, and farmland were not properly taken into consideration when the initial model was trained. The same phenomenon was also observed when the model was applied to other sites.
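The cropping scheme can be illustrated with a short sketch. The text states only that 28 overlapping 224 × 224 crops were taken at a regular interval from each 1280 × 720 frame; the 7 × 4 grid, the evenly spaced offsets, and the function names below are assumptions made for illustration and may differ from the layout actually used.

```python
import numpy as np


def crop_origins(frame_size, crop_size, n_crops):
    """Evenly spaced top-left offsets so that the crops span the whole axis."""
    return np.linspace(0, frame_size - crop_size, n_crops).round().astype(int)


def crop_boxes(width=1280, height=720, crop=224, n_cols=7, n_rows=4):
    """Return (left, top, right, bottom) boxes in row-major order.

    n_cols and n_rows are hypothetical; 7 x 4 is one grid that yields 28 crops.
    """
    xs = crop_origins(width, crop, n_cols)
    ys = crop_origins(height, crop, n_rows)
    return [(int(x), int(y), int(x) + crop, int(y) + crop) for y in ys for x in xs]


boxes = crop_boxes()
```

Under these assumptions the horizontal step is 176 pixels and the vertical step is about 165 pixels, both smaller than the crop size, so neighboring crops overlap and the whole frame is covered without resizing.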

As can be seen from the class activation map in Figure 11, the model mistook the features of a building for those of a wildfire. Although it cannot be concluded that all man-made objects cause false detections, it was confirmed that false positives may occur when more than half of a cropped image consists of non-natural objects. Conversely, no false positives were caused by natural objects; for example, the model did not confuse clouds with smoke. Figure 12 presents the result of applying the model approximately 10 min into the wildfire. Because the fire had grown to some extent and was accompanied by flames, the softmax layer gave a prediction with 100% probability, and the fire was detected more easily than at its beginning. Because the image is cropped rather than resized, the original image does not need to be degraded. As each cropped image is classified individually, the location of the fire can be tracked while real-time video footage is continuously obtained from a surveillance camera.
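The per-crop decision rule and the resulting localization can be sketched as follows. This is a minimal illustration, assuming the classifier returns two logits per crop and that index 1 corresponds to the wildfire class; the class ordering, function names, and example logits are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def fire_regions(logits, boxes, threshold=0.5, fire_index=1):
    """Return (box, probability) for each crop whose fire probability exceeds the threshold."""
    probs = softmax(np.asarray(logits, dtype=float))[:, fire_index]
    return [(box, float(p)) for box, p in zip(boxes, probs) if p > threshold]


# Hypothetical logits for three neighboring crops of one frame.
logits = [[2.0, -1.0], [0.0, 0.0], [-1.0, 2.0]]
boxes = [(0, 0, 224, 224), (176, 0, 400, 224), (352, 0, 576, 224)]
hits = fire_regions(logits, boxes)
```

Because each flagged box carries its pixel coordinates in the original frame, the approximate position of the fire follows directly from the crops that exceed the threshold, and repeating this per frame allows the fire to be tracked over time.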

Conclusions
With the development of remote camera sensing technology, many researchers have attempted to improve existing wildfire-detection systems using CNN-based deep learning. In the damage-detection field, it is difficult to obtain a sufficient amount of the data needed to train models; data imbalance and overfitting have therefore degraded model performance. To address these problems, traditional image transformation methods such as image rotation have primarily been used. Another approach to increasing the training data has been to cut out flame images and paste them artificially over a forest background. However, both approaches have weaknesses: the former fails to increase the diversity of the images, and the latter requires manual labor and produces unnatural images. This study addressed these issues.
Our study had several advantages. First, the data augmentation method was itself based on artificial intelligence, so it could generate data with minimal manpower. Using adversarial, cycle-consistency, and identity losses, the optimized model could produce a variety of flame scenarios. The model could also be pre-trained on various wildfire scenarios for a new environment before forest management begins; higher detection accuracy could thus be expected. Second, we improved the detection accuracy by applying dense blocks based on DenseNet in the model. The training history and test results showed that the proposed methods yielded good model performance. Third, a way of applying the model to high-resolution images was proposed, overcoming the limitation that the model accepts only small images as input. This allows the approximate location of a wildfire to be identified in photographs covering a wide area.
There were also several limitations to our study. The model was trained using a limited forest class. Although the model distinguished cloud and wildfire areas well in the experiment with drone images (the upper part of the cropped photos in Figure 11), smoke in the part of the image comprising the sky was not captured as a feature when the test data were examined using CAM. This could be addressed by widening the range of classes or by training additional models on images that are likely to confuse the model. Another limitation is that detection performance at night was not considered. This temporal variable was excluded because the purpose of this study was to assess the effectiveness of AI-based data augmentation and of dense blocks in wildfire-detection models. However, it should be considered in further studies, because nighttime and daytime detection have different characteristics.
Building on the achievements and limitations of this study, we intend in a future study to deploy a forest-fire detection model in the field by installing real-time surveillance cameras in Gangwon-do, Korea, which is exposed to the risk of wildfires every year.
In addition, we intend to develop a technology that uses image processing to estimate the distance from the camera to the fire area, calculates the location of the fire, and displays it on a map user interface, thereby providing disaster-response support information that helps decision makers respond quickly when a wildfire occurs.

Conflicts of Interest:
The authors declare no conflict of interest.