Early Fire Detection Based on Aerial 360-Degree Sensors, Deep Convolution Neural Networks and Exploitation of Fire Dynamic Textures

Abstract: The environmental challenges the world faces have never been greater or more complex. Global areas that are covered by forests and urban woodlands are threatened by large-scale forest fires that have increased dramatically during the last decades in Europe and worldwide, in terms of both frequency and magnitude. To this end, rapid advances in remote sensing systems, including ground-based, unmanned aerial vehicle-based and satellite-based systems, have been adopted for effective forest fire surveillance. In this paper, the recently introduced 360-degree sensor cameras are proposed for early fire detection, making it possible to obtain unlimited field of view captures, which reduce the number of required sensors and the computational cost and make the systems more efficient. More specifically, once optical 360-degree raw data are obtained using an RGB 360-degree camera mounted on an unmanned aerial vehicle, we convert the equirectangular projection format images to stereographic images. Then, two DeepLab V3+ networks are applied to perform flame and smoke segmentation, respectively. Subsequently, a novel post-validation adaptive method is proposed, exploiting the environmental appearance of each test image and reducing the false-positive rates. For evaluating the performance of the proposed system, a dataset, namely the "Fire detection 360-degree dataset", consisting of 150 unlimited field of view images that contain both synthetic and real fire, was created. Experimental results demonstrate the great potential of the proposed system, which has achieved an F-score fire detection rate equal to 94.6%, hence reducing the number of required sensors. This indicates that the proposed method could significantly contribute to early fire detection.


Introduction
The increasing occurrence of large-scale forest fires significantly impacts society and communities in terms of remarkable losses in human lives, infrastructure and property [1]. Depending on burn severity, wildfires also affect the environment and climate change, increasing the released quantities of CO2, soot and aerosols and damaging the forests that would otherwise remove CO2 from the air. This results in extremely dry conditions, increasing the risk of further wildfires. Furthermore, forest fires lead to runoff generation and to major changes in soil infiltration [2]. To this end, computer-based early fire warning systems that incorporate remote sensing technologies have attracted considerable research attention. Among deep learning approaches, RGB images have been used to re-tune two pre-trained CNNs, based on the VGG16 and ResNet50 architectures, in order to develop a fire detection warning system; it is worth mentioning that for the training, an unbalanced dataset including more non-fire images was created. The main advantage of CNNs is the fact that they automatically extract and learn features [24,25]. Extending deep learning approaches, Barmpoutis et al. [26] combined the power of modern deep learning networks with multidimensional texture analysis based on higher-order LDSs (h-LDSs). The use of UAVs in remote sensing platforms for environmental monitoring has become increasingly popular. Chen et al. [27] used optical and infrared sensors and data in order to train a CNN for fire detection, while Zhao et al. [28] used a UAV equipped with a global positioning system (GPS) and deployed a 15-layered self-learning deep convolutional neural network architecture for fire feature extraction and classification. In Reference [20], satellite optical data in combination with a neural network architecture were used for forest fire detection.
Unlike the flame, optical cameras with good visibility can easily capture smoke in the first few minutes of a fire from a long distance (more than 50 km away) due to the fact that it moves upwards [4,29]. However, the reliable identification of smoke is also still an open research issue, since there are many objects in nature that have similar characteristics to smoke. In addition, the large variations of smoke's appearance, as well as environmental changes, including clouds and shadows, make the task of smoke detection even more difficult. More specifically, regarding fire detection based on the identification of smoke occurrence, Filonenko et al. [30] presented a smoke detection method relying on shape features and color information of smoke regions for surveillance cameras. Their work improved the processing speed on both low-resolution and high-definition videos with the utilization of both the CPU and a General-Purpose Graphics Processing Unit (GPGPU). In another approach, based on contour analysis and edge regions with decreasing sub-band wavelet energies, Toreyin et al. [31] developed an algorithm for indoor or outdoor smoke detection. Furthermore, an algorithm which extracts optical flows and motion features for the discrimination between smoke regions and similar moving objects was proposed by Chunyu et al. [32]. Based on the observation that smoke is a non-rigid dynamical object, Barmpoutis et al. [33] and Dimitropoulos et al. [29] efficiently modeled smoke regions using dynamic texture analysis. Sudhakar et al. [34] proposed a method for forest fire detection through UAVs using color identification and smoke motion analysis. Additionally, Yuan et al. [35] combined two color spaces and used an extended Kalman filter in order to perform smoke detection. More recently, Dai et al.
[36] noticed that the smoke screen area changes rapidly in infrared smoke interference image sequences and used infrared sensors and a super-pixel segmentation method in order to detect smoke. Subsequently, Wang et al. [37] resampled the VIIRS day-night band pixel radiances from Suomi National Polar-orbiting Partnership (Suomi NPP) satellite data to the footprint of M-band pixels in order to link the visible energy fraction to the emission factors and to estimate the fire burning phase.
Additionally, many deep learning algorithms have been developed for smoke detection, providing a way of avoiding damages caused by fire. More specifically, Yin et al. [38] proposed a deep normalization and a CNN with 14 layers, while Tao et al. [39] adapted the architecture of AlexNet. Luo et al. [40] proposed a smoke detection algorithm combining motion characteristics of smoke and a CNN. As a major challenge for deep learning algorithms is the lack of significant amounts of labelled data, Xu et al. [41] implemented an early smoke detection method using synthesized smoke images for the training of the region-based Faster R-CNN, which provided satisfactory results. Similar work can also be seen in Reference [42], where a method of synthesizing smoke images based on the expansion of the domain-invariant smoke feature was introduced.
Although several technologies based on different sensors have been proposed for fire surveillance covering different needs, the developed systems and detection algorithms are far from optimal with respect to the complexity of systems that use a collection of sensors, the higher computational time that is required and the accuracy rates. Moreover, ground- or UAV-based systems have a limited field of view or use specialized aerial hardware with complex standard protocols for data collection, and satellite data are not immediately available, limiting their eventual widespread use by local authorities, forest agencies and experts.
Thus, in this paper, taking into account that the cost of UAVs has significantly decreased and the fact that many high-resolution omnidirectional cameras have recently been developed, we propose a new remote sensing system in order to achieve fire detection at an early stage. Compared to conventional cameras, omnidirectional cameras can cover a wider field with only a single camera, showing their great potential in surveillance applications [43]. Thus, given the urgent priority around the protection of the value and potential of forest ecosystems and the global forest future, we aim to extend our preliminary work [44] and the already well-established fire detection systems [13,14,26,29,33] by proposing a novel 360-degree remote sensing system, incorporating the recently introduced 360-degree sensors, DeepLab V3+ models and a validation approach that takes into account the environmental appearance of the examined instant captures for early forest fire detection. More specifically, this paper makes two main contributions:
• We propose a novel early fire detection remote sensing system using aerial 360-degree digital cameras in an operationally and time efficient manner, aiming to overcome the limited field of view of state-of-the-art systems and human-controlled specified data capturing.
• A novel method is proposed for fire detection based on the extraction of stereographic projections, aiming to detect both flame and smoke through two deep convolutional neural networks. Specifically, we initially perform flame and smoke segmentation, identifying candidate fire regions through the use of two DeepLab V3+ models. Then, the detected regions are combined and validated taking into account the environmental appearance of the examined instant capture test image.
Finally, in order to evaluate the efficiency of the proposed methodology, we used seven datasets, including a created 360-degree dataset, namely the "Fire detection 360-degree dataset", that consists of 150 images of forest and urban areas that contain synthetic and real fire events.
The remainder of this paper is organized as follows: Section 2 describes the methodology proposed in this work. Section 3 describes the protocol followed for the experimental analysis and presents and discusses the experimental results. Finally, Section 4 summarizes the main conclusions of this work and points to future directions.

Materials and Methods
The framework of the proposed methodology is shown in Figure 1. In this figure, an aerial 360-degree remote sensing system is used in order to capture unlimited field of view images. Initially, two DeepLab V3+ networks are trained to identify candidate fire (flame and smoke) regions (Figure 1a). Then, once equirectangular raw images are acquired, they are converted to stereographic projection format images (Figure 1b). Subsequently, these images are fed to the trained networks, and the detected candidate fire regions are validated taking into account the environmental appearance of the examined test images (Figure 1c).
Specifically, exploiting the 360-degree sensors, in order to reduce false-positive fire alarms after the DeepLab V3+-based segmentation, the detected regions are divided into blocks and these are compared with parts of the test image that potentially cause false alarms. To this end, to filter out possible false-positives, multidimensional texture features are estimated, and a clustering approach is used in order to accurately identify the fire regions.

Data Description
For the evaluation of the proposed method, we used 7 datasets consisting of more than 4000 images that contain both flame and smoke events. More specifically, for the training of the proposed methodology, we used the Corsican Fire Database (CFDB) [45,46], the FireNet dataset [47], two datasets created by the Center for Wildfire Research [48,49] and two more publicly available smoke datasets [50,51]. Furthermore, we created a test 360-degree dataset (Figure 2), namely the "Fire detection 360-degree dataset" [52], consisting of 150 360-degree equirectangular images of forest and urban areas that contain synthetic and real fire events. More specifically, in order to capture the equirectangular images, we used a 360-degree camera (sensor type: CMOS, sensor size 1/2.3") mounted on a UAV equipped with GPS. Subsequently, in order to create the synthetic fire events, we reproduced synthetic video frames [13], solving linear system equations and estimating the tensor generated at time k when the system is driven by random noise, V. For more details regarding dynamic texture synthesis, we refer the reader to Reference [53]. Then, the synthesized data were adapted to the 360-degree images, and the size of fires was manually adjusted with regard to the distance and the assumed start time of the fire. To the best of our knowledge, and as 360-degree digital camera sensors are a newly introduced type of camera, no other dataset consisting of 360-degree images that contain fire exists.

Stereographic Projection of Equirectangular Raw Projections
Nowadays, as 360-degree cameras become widely available, 360-degree data have become more popular. The 360-degree images bring an unlimited view and can be represented in various projections. The most widely used of them are the equirectangular projection, stereographic projection, rectilinear projection and the cubemap projection [54]. Each type of 360-degree image projection has its own unique features.
Equirectangular projection is the most popular projection, mapping a full 3D scene onto a 2D surface. The equirectangular raw images are captured by 360-degree cameras with an aspect ratio of height (h) to width (w) equal to 1:2. For the task of fire detection, and in order to avoid false alarms due to the existence of distortions in the equirectangular images, we convert these images to stereographic projections (Figure 3). Stereographic projection performs well at minimizing distortions [54], and this type of projection concentrates the target object in the center, making the response for the task of early fire detection easier, as the sky region, including clouds and sunlight reflections, is arranged in the peripheral area, which can easily be used for the rejection of false-positives (Figure 3b). Furthermore, it is worth mentioning that if the parameter z_f and the UAV flight altitude remain constant, the object is always in the same position and has the same size proportion. The stereographic projection is shown in Figure 4 and the coordinates p'(x', y') are calculated as follows:

x' = z_f · cos(latitude) · sin(longitude) / (1 + cos(latitude) · cos(longitude))
y' = z_f · sin(latitude) / (1 + cos(latitude) · cos(longitude))

where longitude and latitude are the polar coordinates of the equirectangular pixel p(x, y):

longitude = 2π · x / w − π,  latitude = π/2 − π · y / h

and z_f is a fitting parameter of the stereographic projection to the equirectangular image dimensions.
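The pixel-wise conversion described above can be sketched as follows. This is an illustrative implementation of the standard stereographic mapping under the assumptions stated in the text (the function name and the default value of the fitting parameter `z_f` are our own choices, not taken from the paper):

```python
import numpy as np

def equirect_to_stereographic(u, v, w, h, z_f=1.0):
    """Map an equirectangular pixel (u, v) of a w x h image to
    stereographic coordinates (x', y').

    Illustrative sketch: longitude/latitude are recovered from the
    pixel position, then projected stereographically; z_f is the
    fitting parameter described in the text.
    """
    lon = 2.0 * np.pi * u / w - np.pi        # longitude in [-pi, pi)
    lat = np.pi / 2.0 - np.pi * v / h        # latitude in [-pi/2, pi/2]
    denom = 1.0 + np.cos(lat) * np.cos(lon)  # stereographic denominator
    x = z_f * np.cos(lat) * np.sin(lon) / denom
    y = z_f * np.sin(lat) / denom
    return x, y
```

The image center (lon = lat = 0) maps to the origin, while points near the poles and the antipode move toward the periphery, which is consistent with the sky being pushed to the peripheral area of the projection.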

Detection and Localization of Candidate Fire Regions
A popular DeepLab V3+ model [55] for semantic segmentation is employed in this work for the detection and localization of candidate fire regions. The DeepLab models have been extensively used in the task of semantic image segmentation and tested on large volumes of image datasets. This model learns multi-scale contextual features through atrous spatial pyramid pooling (ASPP) and uses a decoder module for the refinement of the segmentation results, especially along object boundaries. The applied model utilizes an ImageNet-pretrained InceptionResNet v2 as the main feature extractor, which has proven to be a robust network in the image recognition field. It is worth mentioning that the InceptionResNet v2 outperforms ResNet-101 on the Pascal VOC2012 validation set and has been identified as a strong baseline [56]. In this study, two DeepLab V3+ networks are trained, and a modified loss function is used in order to adjust the model to better deal with the candidate fire region detection task. The modified loss function of the DeepLab model is a weighted per-pixel cross-entropy:

Loss = −(1/N) · Σ_{p=1}^{N} w_p · [r_p · log(t_p) + (1 − r_p) · log(1 − t_p)]

where w_p, r_p and t_p denote the weighting factors, the reference values and the predicted values at pixel p, respectively, and N is the total number of pixels. Regarding the weighting factors, we set w_p = 3 when p is a fire pixel and w_p = 1 otherwise. Thus, we aim to force the model to effectively detect the total number of fire events in each image. Finally, a median filter over a 7-by-7 neighborhood was applied to the output of the DeepLab V3+ models.

Adaptive Post-Validation Scheme
For the validation of the identified candidate fire regions, and aiming to decrease the high false alarm rates caused by natural objects which present similar characteristics to fire, we propose a new adaptive scheme through the division of the candidate regions into blocks of size n × n and their comparison with the training dataset and with specific regions of each test image. More specifically, we observed that around the horizon level there are a number of natural objects, like clouds, increased humidity and sunlight reflections, that have a similar appearance to flame or smoke. Therefore, we divided the training images into blocks (n × n), and in the stereographic projection images, we divided a circular formation around the horizon level into blocks (n × n; we set n = 16 based on our previous research [26]) (Figure 5). Here, we excluded the blocks that were identified as fire events by the fire segmentation part. Subsequently, we consider that the candidate flame and smoke regions and the created blocks of the training dataset and of the region around the horizon level contain spatially varying visual patterns that exhibit certain stationarity properties [26]. These are modelled through the following linear dynamical system, considering them as a multidimensional signal evolving in the spatial domain:

x(t + 1) = A · x(t) + B · v(t)
y(t) = ȳ + C · x(t) + w(t)

where x ∈ R^n is the hidden state process, y ∈ R^d is the observed data, A ∈ R^{n×n} is the transition matrix of the hidden state and C ∈ R^{d×n} is the mapping matrix of the hidden state to the output of the system. The quantities w(t) and B·v(t) are the measurement and process noise, respectively, with w(t) ∼ N(0, R) and B·v(t) ∼ N(0, Q), while ȳ ∈ R^d is the mean value of the observation data [26].
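The model parameters (A, C) of such a system can be identified from the vectorized observations of a block using the standard SVD-based identification of dynamic-texture models. The sketch below is our own illustration of that generic procedure, not the authors' implementation; the function name and the choice of state dimension are assumptions:

```python
import numpy as np

def fit_lds(Y, n_states=5):
    """Estimate the LDS tuple M = (A, C) for one block.

    Y : d x T matrix whose columns are the vectorized observations
        of the block along the evolution dimension.
    Standard (suboptimal) SVD-based identification: C is taken from
    the left singular vectors, the hidden states X from the rest of
    the decomposition, and A is fitted by least squares.
    """
    Y = np.asarray(Y, dtype=float)
    y_mean = Y.mean(axis=1, keepdims=True)           # observation mean (y-bar)
    U, s, Vt = np.linalg.svd(Y - y_mean, full_matrices=False)
    n = min(n_states, len(s))
    C = U[:, :n]                                     # mapping matrix (orthonormal)
    X = np.diag(s[:n]) @ Vt[:n, :]                   # hidden state sequence
    # Least-squares transition matrix: x(t+1) ~= A x(t)
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])
    return A, C
```

Because C is built from left singular vectors it is orthonormal by construction, which is convenient for the Grassmannian representation used in the next step.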
Then, assuming that the tuple M = (A, C) describes each block (Figure 6), we estimate the finite observability matrix

O_m(M) = [Cᵀ, (C·A)ᵀ, (C·A²)ᵀ, …, (C·A^{m−1})ᵀ]ᵀ

For the comparison of the candidate fire regions' blocks with the training dataset and the blocks around the horizon level, the distance between them needs to be estimated. Furthermore, in order to address the non-linearity of the problem, each candidate fire region, through its division into blocks, is considered as a cloud of points on the Grassmann manifold. To this end, we estimated k clusters of blocks of the training dataset and of blocks around the horizon level of each test image by the Karcher mean algorithm [58,59], and we applied a k-NN (k-Nearest Neighbors) approach, with each candidate fire block being assigned to the nearest neighbor cluster. To address the k-NN problem, we apply the inverse exponential map between two points on the manifold, e.g., G1 and G2, to map the first Grassmannian point to a tangent space of the second one, while preserving the distance between the points [58]. Thus, using the inverse exponential map, we move from the Grassmann manifold to the Euclidean space. Hence, the dissimilarity metric between G1 and G2 can be defined as follows:

d(G1, G2) = ||exp⁻¹_{G2}(G1)||_F

where the inverse exponential map, exp⁻¹, defines a vector in the tangent space of a manifold's point, i.e., the mapping of G1 to the tangent space of G2, and F indicates the Frobenius norm. Then, the candidate flame and smoke regions are assigned to the class most common amongst their k nearest neighbors.
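The Frobenius norm of the inverse exponential map equals the geodesic distance on the Grassmann manifold, which can be computed from the principal angles between the two subspaces. A minimal sketch, assuming the two points are given as matrices with orthonormal columns (e.g., orthonormalized observability matrices); the function name is ours:

```python
import numpy as np

def grassmann_distance(G1, G2):
    """Geodesic distance on the Grassmann manifold between the
    subspaces spanned by the orthonormal columns of G1 and G2.

    Equals the Frobenius norm of the inverse exponential map of G1
    at G2, computed via the principal angles: the singular values of
    G1^T G2 are the cosines of the principal angles.
    """
    s = np.linalg.svd(G1.T @ G2, compute_uv=False)
    theta = np.arccos(np.clip(s, -1.0, 1.0))   # principal angles
    return float(np.linalg.norm(theta))        # sqrt(sum(theta_i^2))
```

Identical subspaces give distance 0, while mutually orthogonal subspaces give the maximal distance, so the metric behaves as a dissimilarity for the k-NN assignment described above.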

Experimental Results
Through the experimental evaluation, we want to demonstrate the superiority of the proposed system using 360-degree cameras. Furthermore, we want to show that the proposed methodology improves the detection accuracy of fires against other state-of-the-art approaches. To this end, we initially carried out an ablation analysis and then compared the proposed methodology with a number of widely used methods in order to show its great potential. Both analyses were conducted on the developed "Fire detection 360-degree dataset".
The code of the proposed structure was implemented in Matlab and all calculations were performed on a 12 GB GPU. Furthermore, it is worth mentioning that in order to have a fair comparison, we used the same training and testing set in our experiments.
The performance of early fire detection (e.g., detection of small fires) in 360-degree stereographic projection images was evaluated using two measures, namely the F-score and the mean Intersection over Union (mIoU). The IoU, also known as the Jaccard Index, is an effective metric defined as the area of overlap between the detected fire region and the ground truth divided by the area of their union:

IoU = |Detection ∩ Ground truth| / |Detection ∪ Ground truth| = TP / (TP + FP + FN)

The F-score is the most widely used evaluation metric and is defined as the harmonic mean of precision and recall, taking into account the true-positive, false-positive and false-negative detected regions:

F-score = 2 · Precision · Recall / (Precision + Recall) = 2 · TP / (2 · TP + FP + FN)

The IoU and F-score values range between 0 and 1, with both metrics reaching their best values at 1. It is worth mentioning that for the evaluation of the proposed methodology, we considered the IoU as a pixel-based metric, while we considered the F-score as a region-based metric. For example, for the calculation of the F-score, a flame or smoke region is considered a true-positive if it contains at least one pixel with actual flame or smoke, respectively. In our analysis, the mean IoU score and F-score were calculated for the flame and smoke classes separately and then taking into account the flame and smoke of a fire region as one fire event. More specifically, in order to calculate the F-score in the latter case, we count a detected fire event as positive if either flame or smoke is detected.
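The two metrics can be computed directly from binary masks and region counts; a minimal sketch (function names are ours):

```python
import numpy as np

def iou(pred, gt):
    """Jaccard Index between a predicted and a ground-truth binary mask."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:                 # both masks empty: perfect agreement
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)

def f_score(tp, fp, fn):
    """Harmonic mean of precision and recall from region counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)
```

As a sanity check, a precision of 90.3% and a recall of 99.3%, as reported in the ablation analysis, give a harmonic mean of about 0.946, matching the 94.6% F-score quoted in the text.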

Ablation Analysis
The presented fire detection method comprises two main components: flame/smoke segmentation and post-validation processing. Thus, in this section, we provide a more detailed analysis of the contribution of each component to the fire identification process. The applied DeepLab V3+ model was used to identify the candidate fire regions in a semantic image segmentation task, while the adaptive post-validation scheme was used to reject false-positive flame or smoke regions.
The results for early fire detection performance under different configurations, with or without the validation step, are shown in Table 1. It can be seen that the validation scheme improves the overall accuracy of fire detection. This is because the post-validation processing step rejects false-positives that have similar characteristics to flame and smoke. Furthermore, the proposed methodology works better for flame detection, as the number of false-positives in smoke detection is higher, mainly due to the existence of clouds with characteristics similar to smoke. Finally, when taking into account the detection of either flame or smoke in order to define an identified region as a fire event, the precision rate achieved is 90.3% and the recall rate achieved is 99.3%, while the F-score rate is 94.6%.

Comparison Evaluation
In Table 2, we present the evaluation results of the proposed framework in comparison to six state-of-the-art methods. This analysis revealed improved robustness of the proposed methodology compared to other methods. More specifically, the proposed system achieves F-score rates of 94.8%, 93.9% and 94.6% for flame, smoke and either flame or smoke detection, respectively, achieving F-score rate improvements of up to 11.4% in fire detection through flame (Faster R-CNN/Grassmannian VLAD encoding: 83.4%, proposed: 94.8%), up to 6.5% in fire detection through smoke (Faster R-CNN/Grassmannian VLAD encoding: 87.4%, proposed: 93.9%) and up to 7.2% in fire detection through either flame or smoke (Faster R-CNN/Grassmannian VLAD encoding: 87.4%, proposed: 94.6%). Similarly, it achieves mIoU rates of 78.2%, 70.4% and 77.1%, achieving mIoU rate improvements of up to 3.8% in fire detection through flame (Faster R-CNN/Grassmannian VLAD encoding: 74.4%, proposed: 78.2%), up to 0.5% in fire detection through smoke (Faster R-CNN/Grassmannian VLAD encoding: 69.9%, proposed: 70.4%) and up to 3.3% in fire detection through either flame or smoke (Faster R-CNN/Grassmannian VLAD encoding: 73.8%, proposed: 77.1%). It is worth mentioning that the method that combines Faster R-CNN and Grassmannian VLAD (Vector of Locally Aggregated Descriptors) encoding [26] produces the best fire detection results after the proposed methodology. This can be explained by the fact that the fire regions detected by the Faster R-CNN are validated through their projection to a Grassmannian space and their representation as a cloud of points on the manifold. In addition to the method that combines Faster R-CNN and Grassmannian VLAD encoding, the proposed methodology is compared against the SSD [60], FireNet [47], YOLO v3 [61], Faster R-CNN [62] and U-Net [63] architectures.
Towards the aim of fire detection through either flame or smoke detection, these methods achieve F-score rates of 67.6%, 71.1%, 78.8%, 71.5% and 71.9% and mIoU rates of 59.8%, 61.4%, 69.5%, 65% and 67.4%, respectively. Also, it is notable that fire detection through flame achieves better rates than through smoke detection. This is due to the fact that in the images of the created dataset, there are more objects that have similar characteristics to smoke, so the number of false-negatives is higher in smoke detection. Furthermore, as depicted in Table 2, the approaches that take into account the dynamics of flame and smoke improve the fire detection rates. Figures 7 and 8 illustrate qualitative results of the proposed method in images that contain synthetic fire and real fire events, respectively. Figure 7 provides examples of fire detection in different environments. In the case of a suburban environment (Figure 7a), the proposed system accurately detects the smoke (Figure 7c). Furthermore, in the case of a single fire event in a forest (Figure 7d) that consists of both flame and smoke, the fire is accurately detected, but only through flame detection. On the other hand, in the case of multiple fire events (Figure 7g), the system detects the fire events through either flame or smoke detection but erroneously detects some clouds as a smoke event. The results of Figure 8 show that in good visibility conditions, the proposed system detects the fire event in a suburban environment even in the early stages of the fire. Finally, it is worth mentioning that fire detection through either flame or smoke maximizes the true-positive rates but, at the same time, increases the false-positive rates, as the rates of non-overlapping false-positive flame and smoke detected regions are combined.

Discussion
Remote sensors play an important role in vision-based surveillance systems. Thus, in combination with computer vision methods and the improvement of computation and storage abilities, surveillance systems have been developed that achieve accurate object recognition and localization. The proposed novel remote sensing system for fire detection demonstrates that early fire detection can be accurately achieved using 360-degree cameras mounted on a UAV.
In this study, we used visible-range data captured by 360-degree optical cameras, aiming to propose a flexible and cost-effective remote sensing system for fire detection [64]. The use of 360-degree cameras and stereographic projections has the advantage of significantly reducing the number of redundant sensors and the amount of overlapped information. A wide field of view is always advantageous in surveillance, since it can collect richer information by reducing the number of blind areas to zero. Compared to conventional sensors (optical, infrared and microwave), omnidirectional cameras can cover a 360-degree field of view with only a single camera, reducing the number of required sensors. For example, if the field of view of a conventional camera is 90 degrees, then at least 4 sensors are required in order to capture the 360-degree or panoramic scene. It is worth mentioning that many UAVs take 24 single shots in order to reconstruct 360-degree images accurately [65]. In the past, rotating cameras or increasing the number of available cameras were often used for enlarging the viewing region. However, rotating cameras cannot record one 360-degree scene at a single point in time. In addition, when using multiple sensors, the locations of the cameras need to be carefully designed in order to fully exploit their fields of view and reduce the overlapping viewing areas. Finally, in contrast to satellite-based systems, the proposed system offers better data availability but not global surveillance.
Furthermore, another advantage of the 360-degree cameras and the use of stereographic projections is the fact that the examined scene is always located in the center of the image. This makes it easier to detect the horizon line and the regions that tend to cause plenty of false alarms due to sunlight, clouds and other smoke-colored objects. To this end, we proposed the adaptive post-validation scheme with the aim of decreasing the high false alarm rates caused by natural objects which present similar characteristics to fire. This technique could easily be applied to data retrieved from different remote sensing technologies and sensors in the case that the horizon line is known in advance.
There are, however, some limitations in the proposed system. UAVs are affected by weather conditions, including wind or other extreme events. In addition, the flight time of UAVs is limited, so a network of UAVs is required to achieve continuous capture. Alternatively, a 360-degree camera can be placed at the top of a watchtower, achieving similar results to the proposed system and methodology. Moreover, the proposed system is not capable of wildfire detection at night. However, the proposed remote sensing system can be integrated with 360-degree data from other types of sensors, e.g., infrared or microwave, expanding its fire detection capabilities.
In the future, we aim to install an autonomous operation network of 360-degree sensors, performing periodic flights, for the surveillance of wide areas and to apply the proposed method for the detection of real fire events. Thus, we aim to extend the system, estimating the spread rate of fire and assisting in evacuation plans in case of hazardous events. With the frame sequences extracted from the videos and motion analysis techniques, we aim to calculate the spread rate and even predict the movement directions of fire, which will make great contributions to fire escape in real-world situations.

Conclusions
Nowadays, omnidirectional and 360-degree cameras are widely available and have become more and more popular; hence, they can be used for surveillance, enabling a wider field of view and avoiding the need to use many sensors or install complicated systems. Thus, in this paper, we proposed an intelligent system integrating a UAV and a 360-degree camera in order to achieve forest fire surveillance in a way that was not considered before. The proposed methodology combines deep convolutional neural networks and exploits the dynamics of flame and smoke. From the experimental results, we conclude that the post-validation scheme and the exploitation of fire dynamic textures significantly improve the detection rates, reducing the false-positives. This, along with the stereographic projections, enables us to discard false-positive regions by comparing them with the regions of each examined image that are most likely to produce false alarms.