Obtaining Infrared Thermal Camera Sensor Calibration Data for Implementation in FireBot Autonomous Fire Protection Robot System

Abstract: Fire protection is a field that closely follows technological development and adopts innovations in detection systems. This paper presents a unique solution for the development of an autonomous robot for the prevention, detection, and extinguishing of fires by studying the problem of choosing the optimal early-detection sensor in the infrared part of the spectrum, which exhibits the highest level of excitation in the prevention stage. The robot is equipped with several different sensors arranged in a hierarchical structure. Thermal detection has proven to be a significant investment that can be adapted to the different complexity of the objects to be protected, taking into account image processing and modular implementation of the required sensors. To this end, it is necessary to calibrate the systems for different thermal cameras. The calibration procedure performed on seven cameras and two pyrometers yielded the data required for input-data correction and anomaly detection. The results of the analysis confirmed that devices in a higher price range have a lower deviation from the reference value than low-cost technical solutions. At the same time, some results indicated malfunctions of more expensive devices, whose readings exceeded the specified nominal accuracy. Thanks to the performed calibration procedure and the obtained results, the observed problem is not an obstacle to implementation in an autonomous robotic system and can be used to correct the input data required for computer analysis.


Introduction
Each year, fire causes a significant number of fatalities as well as significant material losses. As a result, society now places high importance on fire prevention and early detection, and it is therefore a major area of study and development for many scientists and different sectors. Rapid technological advancement, particularly in the areas of robotics, embedded systems, and machine learning, is shaping how the field of firefighting is developing. The field is becoming more effective and safe, and consequently minimizes the danger to people and property, as well as to firefighters. There are existing robotic solutions for firefighting in the area of forest firefighting [1]. A number of robotic systems for indoor and outdoor firefighting can be found in [2][3][4]. Over time, there have been several attempts to develop an autonomous firefighting robot that uses advanced methods for navigation and mapping as well as different image-processing techniques for fire prevention and detection [5][6][7]. More recent work proposes using unmanned aerial vehicles (UAVs) for both indoor and outdoor firefighting [8][9][10]. Due to the many challenges involved, such as real-time prevention and detection, complex mapping and navigation inside narrow and often dark areas, changes of environment, obstacle avoidance, and balancing processing power with energy consumption, none of the existing solutions fully meets these requirements. One project partner, globally recognized for developing First Person View (FPV) goggles and the transmission of video signals for drones, is in charge of electronics, sensors, hardware, and remote control of the robot via FPV goggles. The third partner is the Faculty of Electrical Engineering, Computer Science, and Information Technology of Osijek. The faculty is in charge of developing software for autonomous navigation, fire prevention, and fire detection using various sensors as well as visual and thermal cameras.
As mentioned in the introduction, the main task is the research and development of a complex system for FPDE called FireBot. It will be constructed as a highly efficient yet also highly performant system. This will be achieved by balancing the components of all subsystems to achieve the best power/performance ratio. When combined with an existing state-of-the-art algorithm that will be further modified and implemented, the robot will be capable of autonomous navigation in a previously mapped indoor space and of fire prevention and detection in real time. Furthermore, an attached fire-extinguishing device will be activated if a fire is detected and will be capable of extinguishing a small fire. In addition, FireBot will be capable of reporting anomalies to the authorities if an anomaly is detected. If FireBot suspects an anomaly but is not completely confident, a remote user will be able to take control of FireBot and see what the robot sees, with both a visual and an infrared thermal camera, to confirm or deny the anomaly detection. To further extend FireBot's capabilities in fire prevention, various sensors for different types of gases will be implemented, along with a microphone and a sound-processing algorithm for detecting other anomalous events, e.g., gas and water leaks and other types of noises. More detailed descriptions of FireBot's architecture as well as technical information are presented in the following subsections.
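The reporting behavior described above amounts to a two-threshold decision policy over the detector's confidence. A minimal sketch follows; the function name and the threshold values are illustrative assumptions, not the actual FireBot parameters, which the text does not specify:

```python
def anomaly_action(confidence, report_threshold=0.9, suspect_threshold=0.5):
    """Map a detector confidence score in [0, 1] to a FireBot action.

    Above report_threshold the anomaly is reported to the authorities;
    between the two thresholds a remote user is asked to confirm or deny
    the event via the visual and thermal camera feeds; below
    suspect_threshold the event is treated as normal.
    Threshold values here are hypothetical.
    """
    if confidence >= report_threshold:
        return "report"
    if confidence >= suspect_threshold:
        return "ask_operator"
    return "ignore"

print(anomaly_action(0.95))  # report
print(anomaly_action(0.70))  # ask_operator
print(anomaly_action(0.20))  # ignore
```

The middle band is what enables the human-in-the-loop confirmation step: only borderline detections cost operator attention, while clear detections are escalated immediately.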

FireBot Architecture
FireBot presents a new and innovative concept for FPDE. It utilizes state-of-the-art technologies that enable autonomous navigation, including avoiding all obstacles, video surveillance, and FPDE. It has a LiDAR and a depth camera, as well as the IR and ultrasonic sensors used in RTAB, a cutting-edge SLAM (simultaneous localization and mapping) algorithm for autonomous mapping and navigation [22], and a modern convolutional neural network (CNN) paired with infrared thermal (IRT) and RGB cameras for fire and temperature-anomaly detection. The final version of FireBot is currently in development, and the final model is presented in Figure 1. FireBot has various other sensors for monitoring the surroundings and detecting potential anomalies (various gas sensors, a microphone for detecting water and gas leaks, intrusion detection, etc.). There are three types of fire-extinguishing devices attached to FireBot (foam, powder, and CO2) for extinguishing different types of fires. Paired with a rotating mechanical arm, on top of which is an electronic nozzle, FireBot can precisely direct its nozzle at the source of fire for fast and efficient extinguishing.
The FireBot system architecture consists of three main logical components for indoor FPDE: an embedded system module for sensor and actuator management (ESM-SAM), a system module for SLAM and navigation (SMSN), and a system module for fire prevention and detection (SMFPD). Another important module is the system module for fire extinguishing (SMFE), which consists of three different fire-extinguishing devices (powder, foam, and CO2) along with the movable arm and electronic nozzle. Furthermore, a charging station is provided separately from the robot. The complete proposed system-architecture diagram is provided in Figure 2, and a detailed description of FireBot's architecture and all its subsystems can be found in [23].


Hardware Specifications of an Experimental Platform
In this subsection, a brief overview of FireBot's hardware components is presented. At the starting point of developing our complex system, the commercially available robotic platform TurtleBot2 (Willow Garage, Menlo Park, CA, USA) was used for the implementation of the SLAM algorithm and for the first tests. TurtleBot2 is a low-cost, open-source robot platform based on a differential drive, equipped with a gyroscope, three front bumper sensors, and three cliff sensors. Additionally, an RPLidar A2 was added along with the Orbbec Astra RGB-D camera (Orbbec 3D Technology International, Inc., Troy, MI, USA). Since TurtleBot2 does not have a central processing unit, a portable computer (Intel i7-10610U, 32 GB DDR4, Ubuntu 18.04) was used on top of the robot. After an initial custom robot specification was defined, a custom robotic platform was built, which is presented in Figure 3. Hardware specifications of the first custom prototype are presented in Table 1 along with the approximate prices of publicly available components. Labor hours for software development, hardware assembly, and in-house-developed components are not included in the final price. After the initial tests in different environments, several problems appeared. First of all, the skid-steering drivetrain with rubber wheels was inefficient, as it was very dependent on the surface type. Slight changes in ground texture resulted in different behaviors of the robot. That problem was partially solved by switching to mecanum wheels, but at the same time, it introduced another problem. Due to the nature of mecanum wheels, many vibrations were introduced, which significantly reduced the video quality from the visual camera. For that reason, the final version of FireBot will be equipped with a differential drivetrain with two stronger motors with rubber wheels and two caster wheels for stability.
In addition, full suspension will be added to the chassis to further improve stability. A second problem was that the Raspberry Pi 4 did not have enough processing power for both navigation and image processing. That problem will be solved by using an industrial PC (i7-10750H, 32 GB DDR4, 512 GB SSD, Ubuntu 20.04) for navigation and an NVidia Jetson Xavier AGX 64 GB for image processing. In addition, sensor-data management will be upgraded from an Arduino Mega to dedicated custom electronics with a CAN Bus interface.
The lead batteries that were used are very heavy, and their capacity proved to be insufficient for the required autonomy. Therefore, the battery capacity will be increased, the battery technology will change to LiFePO4, and the total weight of the battery pack will decrease. The hardware specifications of the final version of FireBot along with the approximate prices of publicly available components are presented in Table 2. Labor hours for software development, hardware assembly, and in-house-developed components are not included in the final price.

Fire Prevention and Fire Detection
When it comes to fire prevention and fire detection, there are many existing solutions. Some of them use different types of sensors, which include temperature sensors [11], smoke and gas sensors [12], or flame detectors [13], which are becoming obsolete due to their limitations. In [14], the authors presented an advanced method for fire detection using chemical sensors. The advantage of that kind of approach lies in the fact that chemical symptoms of fire appear even before the smoke or flame, so it presents an excellent method for fire prevention.
Using CNNs in conjunction with raw RGB image processing and computer-vision algorithms has been a popular fire-detection approach over the past decade. The two pre-trained networks utilized in [15] were VGG16 and ResNet-50, which the authors improved by adding more fully connected layers. They evaluated these models using an unbalanced dataset containing fewer fire images, simulating real-world conditions. The results indicated higher accuracy in comparison to the base models, but the additional layers also resulted in longer training times. The fire-detection approach suggested by the authors in [16] consists of two steps. The first step is to use a Faster R-CNN network to find and locate potential fire zones. In the second phase, validation of the spotted fire zones is carried out by analyzing spatial attributes using linear dynamic systems. Finally, VLAD encoding is used to differentiate between actual fire and fire-colored objects, which greatly enhances efficiency and decreases detection errors. The proposed solution maintained a high true-positive rate while drastically reducing the false-positive rate, since many fire-colored images were employed for training. Based on cutting-edge CNN object-detection models, the authors in [17] suggested four fire-detection techniques: Faster R-CNN, R-FCN, SSD, and YOLOv3. Faster R-CNN and R-FCN are examples of two-stage object-detection networks, since they comprise both a classification network and a region-proposal network. In the initial step, a CNN produces region proposals from the input images. The second stage determines whether fire is present in the proposed regions using a region-based object-detection CNN. One-stage networks (SSD and YOLOv3) were proposed since two-stage networks have slower detection speeds; they use a single forward CNN pass to predict the object class.
All suggested solutions outperformed other non-CNN-based approaches in tests using two separate datasets. There is frequently a need for an accurate, fast, and portable fire-detection solution that can be used on hardware with constrained computational power and that is also reasonably priced. Since GoogleNet is better suited for implementation on FPGAs and other memory-constrained hardware while maintaining good classification accuracy, the authors presented a low-cost fire-detection CNN architecture based on it in [18]. Two primary convolution layers, four max-pooling layers, one average-pooling layer, and seven inception layers make up the 100 layers of the suggested model. They also employed a transfer-learning strategy in this work. In terms of accuracy, the experiments produced great results when compared to more robust models such as AlexNet.
Another approach to consider when it comes to fire prevention and fire detection is using infrared thermal (IRT) cameras. The benefit of this method is its capacity for early fire detection and prevention because it can identify anomalies before other symptoms manifest (smoke or smell), which can be seen in Figure 4.
For instance, an electrical installation that is overheating because of an overload or a faulty connection may eventually catch fire. It is feasible to recognize an increase in temperature and respond appropriately by using thermal imagery. In [19], scientists used CNN and IRT images of rotating machinery to extract fault features. Then, fault-pattern identification was carried out using the Softmax Regression classifier. The proposed method had outstanding performance in detecting and recognizing various defects on bearings and rotors across the nine different types of faults that were used to assess the system's effectiveness. IRT images were used in [20] to find electrical-facility flaws. As detection methods, Fast Region-Based CNN (Fast R-CNN), Faster R-CNN, and YOLOv3 were employed. The detected objects were examined using a thermal-intensity-area analysis (TIAA). The results were most accurate when using Faster R-CNN. According to the authors of [21], transforming images into the HSV color space produces better results than other color schemes such as grayscale. They employed the Otsu, Prewitt, and Roberts techniques for thresholding. A loose phase connection, an imbalanced phase connection, an overloaded phase connection, and a faulty solar panel were used as test cases for their hot-region-detection approach. Their suggested approach successfully identified overheating on the devices.
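To illustrate the thresholding step mentioned above, the following sketch implements Otsu's criterion (maximizing between-class variance) directly in NumPy on a synthetic temperature array. The image size, temperature values, and function names are illustrative assumptions, not details from the cited works:

```python
import numpy as np

def otsu_threshold(img):
    """Return the Otsu threshold for a single-channel image.

    Exhaustively searches for the threshold that maximizes the
    between-class variance, the classic Otsu criterion.
    """
    vals = img.ravel()
    best_t, best_var = vals.min(), -1.0
    for t in np.unique(vals)[:-1]:  # candidate thresholds
        fg = vals[vals > t]
        bg = vals[vals <= t]
        w_fg, w_bg = fg.size / vals.size, bg.size / vals.size
        var = w_fg * w_bg * (fg.mean() - bg.mean()) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t

def hot_regions(thermal, threshold=None):
    """Binary mask of pixels hotter than the (Otsu) threshold."""
    t = otsu_threshold(thermal) if threshold is None else threshold
    return thermal > t

# Synthetic 8x8 "thermal image": ~22 C ambient with a 2x2 hot spot at ~80 C.
frame = np.full((8, 8), 22.0)
frame[3:5, 3:5] = 80.0
mask = hot_regions(frame)
print(int(mask.sum()))  # 4 pixels flagged as hot
```

A production pipeline would of course operate on calibrated camera frames and add connected-component analysis of the mask, but the separation of ambient background from a hot region follows this pattern.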
Although each kind of approach brings an advantage for a specific use case, none of the abovementioned approaches or other commercially available solutions is ideal for FireBot. Due to its specific use case, FireBot should navigate an indoor closed area that changes dynamically and at the same time be able to detect fire using a visual camera. It should also be able to prevent potential fire outbreaks by using an infrared thermal camera along with various other sensors, and extinguish fire if detected. To be able to do all of that autonomously and efficiently, a new solution that fuses multiple approaches, specially designed for FireBot's use case, is being developed. Some of the considered approaches are described in the following subsections, and the final solution is still in development.

Image Classification
When it comes to fire detection using a visual camera, the main goal was to create a dataset that can be used to train, validate, and test custom or existing convolutional neural network (CNN) architectures that can detect fire in input images. The first task was to create a dataset for training. Due to the lack of proper datasets, we had to create a new one. That was accomplished by obtaining publicly available datasets in addition to images scraped from the Internet. The final dataset used for training consists of 50,972 non-fire images and 7359 fire images. The validation dataset consists of 3000 non-fire images and 3000 fire images, and the test dataset consists of 2000 and 2000 images of non-fire and fire, respectively. The evaluated CNNs include various implementations of ResNets, MobileNets, and EfficientNets. ResNets represent the oldest evaluated type of CNN, whereas MobileNets and EfficientNets are more recent. The best-performing networks out of the four crafted network tiers on the stated dataset include ResNet-101 [25], the MobileNetV3-Large variant [26], and EfficientNet-B3 [27]. All evaluated models were trained for 120 epochs, where an epoch denotes one complete pass of the entire training dataset through the evaluated model. The complete testing methodology, together with the training parameters, is available in [28]. The main evaluation metrics of focus were recall and the F1-score, calculated from the entries of the confusion matrix. Recall measures how many of the positive cases (fire) the classifier correctly predicted as fire out of all positive cases in the dataset. Recall represents the most important metric because we do not want to misclassify a fire event as a non-fire event, which can lead to extensive or total property damage and has a very high chance of claiming human lives.
The F1-score metric was chosen to have a balanced overview of the overall performance of a given model, as it is defined as the harmonic mean of precision and recall. The metrics for all four tiers (L1-L4) of the evaluated networks for the first 60 epochs can be seen in Figure 5. On the designed dataset, a similar conclusion can be derived when looking at all four tiers. EfficientNet outperformed MobileNet, which was close behind. In comparison to other examined networks, Base ResNet exhibited its aging characteristics with poor recall and a low F1-score measure. The similarities between MobileNet and EfficientNet are due to the fact that EfficientNet scales the network width, depth, and input-image resolution in addition to including numerous components from the complete MobileNet network stack. As can be expected, the best results were achieved with network models evaluated in the L4 tier. In Figure 6, the number of parameters, number of operations, and size on disk are presented to visually compare the complexity and required computational power for each tested model. It is shown that all ResNet models had the lowest performance but also the largest number of parameters as well as size on disk, which makes them least suitable for embedded usage.
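Both metrics follow directly from binary confusion-matrix counts. A minimal sketch is shown below; the counts are hypothetical and do not correspond to the reported results:

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, and F1 from binary confusion-matrix counts.

    Recall penalizes missed fires (false negatives); the F1-score is the
    harmonic mean of precision and recall.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts for a 2000 fire / 2000 non-fire test set.
precision, recall, f1 = classification_metrics(tp=1900, fp=150, fn=100, tn=1850)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Note that accuracy alone would be misleading here: with heavily imbalanced data a classifier can score high accuracy while missing most fire events, which is exactly what recall exposes.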
In Figure 7b,e there are some examples in which the network wrongly classified images. Figure 7b is a very challenging image since the flame is very small and barely visible. In the same scene in Figure 7a, the network correctly classified the image as fire, but that is due to the sun, which is very bright and fire-colored, leading the network to mistake it for fire. That is confirmed in Figure 7b, in which the sun is covered but the small flames remain, and the network wrongly changed its prediction to non-fire. Figure 7d shows an example in which the fire is overexposed whereas other parts of the image are darkened, posing a challenge to the evaluated model, which predicted correctly but with a lower level of confidence. Because of the stated overexposure, the fire morphology cannot be observed properly. A similar problem occurs when observing various light sources (light bulbs, LEDs, neon lights, etc.). For that reason, many such images were included in the dataset. Figure 7e is another example in which the network made a wrong prediction due to a light that is quite bright in a very dark area. The images in Figure 7c,f are examples in which the network made a correct prediction with high confidence. To further increase the network's performance, the dataset must be additionally extended with more real-world images, and the used network models should also be enhanced to better suit our use case. In addition, some of the newer network architectures will be evaluated on the created dataset.


Semantic Segmentation
To enhance the precision of fire detection even further and to localize the fire in input images, two approaches were used: semantic segmentation and object detection. The goal of semantic segmentation is to cluster together the parts of an image that belong to the same object class. In our case, the two target classes are fire and smoke. Unlike object detection, semantic segmentation does not output only labels and bounding-box parameters. The result is a high-quality image with each pixel assigned to a certain class, usually the same size as the input image. It is therefore a classification of images at the pixel level. There are two types of image segmentation: semantic segmentation, which classifies each pixel with a label, and instance segmentation, which classifies each pixel and additionally differentiates each object instance. For the purpose of image segmentation, the U-Net architecture [29] was used. The U-Net architecture was originally designed for medical-image segmentation, but over time it was adapted for different purposes. It is one of the earliest deep-learning segmentation models, and several GAN variations, including the Pix2Pix generator, use the U-Net design. The model architecture is fairly simple. It consists of an encoder (for downsampling) and a decoder (for upsampling) with skip connections. Skip connections are used to concatenate the encoder feature maps with the decoder, which helps the backward flow of gradients for improved training. The two main metrics when evaluating image segmentation are the dice coefficient and Intersection-over-Union (IoU). IoU is the area of the overlap between the predicted segmentation and the ground truth divided by the area of the union between the two. It ranges from 0 to 1 (0 to 100%). The dice coefficient is calculated exactly like the F1-score: two times the area of overlap divided by the total number of pixels in both masks.
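The two segmentation metrics above can be stated compactly in code. A minimal NumPy sketch on toy binary masks follows; the mask shapes and contents are illustrative only:

```python
import numpy as np

def iou(pred, gt):
    """Intersection over union of two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union

def dice(pred, gt):
    """Dice coefficient: 2 * |intersection| / (|pred| + |gt|); equals F1."""
    inter = np.logical_and(pred, gt).sum()
    return 2 * inter / (pred.sum() + gt.sum())

# Toy 4x4 masks: the prediction overlaps 2 of the 3 ground-truth pixels.
gt = np.zeros((4, 4), bool)
gt[1, 1:4] = True        # 3 ground-truth pixels
pred = np.zeros((4, 4), bool)
pred[1, 0:3] = True      # 3 predicted pixels, 2 overlapping
print(iou(pred, gt), dice(pred, gt))  # IoU = 0.5, dice = 2/3
```

As the example shows, dice is always at least as large as IoU for the same pair of masks, which is worth keeping in mind when comparing scores across papers.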
The preliminary results, dice score, and IoU of the evaluated U-Net network are presented in Figure 8. From the obtained scores, it can be seen that the dice score and IoU were both slightly above 0.6, which is considered a good score but leaves a lot of room for progress, both in improving the dataset and in upgrading the trained network model. An example of segmented images from training is presented in Figure 9, which also confirms the abovementioned statement: the segmentation network already achieves good results, but additional improvements will make it even better.

Object Detection
As we mentioned in the previous subsection, the second approach is object detection using the YOLOv5 network model. YOLOv5 is an upgraded network model of the You Only Look Once (YOLO) network model presented in 2016 in [30], YOLOv2 presented in [31], YOLOv3 presented in [32], and YOLOv4 presented in [33]. It is an extremely fast, state-of-the-art network model for object detection. Its main task is to detect instances of objects of a certain class within an image. Object-detection methods can be categorized into two main types: one-stage methods, which prioritize inference speed, and two-stage methods, which prioritize accuracy. YOLOv5 is a one-stage method. It works by dividing images into a grid system; each grid cell is then responsible for detecting objects within itself. Outputs are predicted bounding boxes and probabilities for each component. The architecture of the YOLO model consists of three parts. The first part is the backbone, used to extract key features from an input image. The second part is the neck, used to create feature pyramids. Feature pyramids aid models in generalizing when it comes to object scaling and help in identifying the same object at various sizes. The last part is the head, responsible for the final detection step. It uses anchor boxes to construct the final output vectors with class probabilities and bounding boxes. Figure 10 depicts the YOLOv5 loss function as a combination of classification loss and localization loss. If an object is detected, the classification loss at each cell is the squared error of the class conditional probabilities for each class. The localization loss measures the errors in the predicted bounding-box locations and sizes. From the presented loss, the YOLOv5 model achieved excellent results but still leaves room for additional progress. An example of a trained model prediction on our dataset is presented in Figure 11.

After the image segmentation and object detection, in addition to knowing whether fire is present in an image, we also know the exact location of the fire in that image. This is critical for directing the fire-extinguishing nozzle to the proper location for quick and effective fire extinguishment.
To be able to train an image-segmentation model or object-detection model, a dataset with a ground-truth mask of fire and smoke for every image is required. Because no such dataset was available, we started building a new one and began manually annotating every image. For that purpose, a new annotation tool was developed, which is further explained in the next subsection.

FireSense Image Annotation Tool
For image annotation, there are many tools available. The drawing of polygons, rectangles, points, and lines is one of the fundamental features that every annotation tool must have. Additionally, they must be able to export annotated data in the multiple formats used by different deep-learning models for computer vision. Data from various annotation tools are exported in a variety of formats; the two most popular are Pascal VOC and Microsoft COCO. The main difference between COCO and Pascal VOC is the file format: COCO is stored in JSON format [34], whereas Pascal VOC is stored in XML format [35]. There are also some additional distinctions in the way the annotation data are represented in the stored format. LabelMe [36], developed by MIT researchers, is the most frequently used tool for data annotation. It is a Web-application tool for image annotation. Although the online application is closed to new users, a copy of the Python-written program has been created and is now in use. This application exports the data in XML format. The biggest disadvantage of the LabelMe annotation tool is that the dataset must be distributed to multiple computers in order for numerous users to annotate it. Another popular annotation tool is the free and open-source browser-based Computer Vision Annotation Tool (CVAT) [37], used for digital image and video annotation. The tasks of image classification, object detection, and image segmentation can be accomplished with CVAT. Annotators can cooperate on a particular project because projects can be local or shared online. One of the advantages of CVAT is that it can use the TensorFlow API to automatically annotate images. The client side of the CVAT application is limited to a specific browser engine, as it only supports Google Chrome and Chromium-based browsers. Make Sense [38] is another popular tool for annotating data.
Make Sense is a GitHub-hosted, free-to-use image-annotation online application. Users can export image-annotation data in a variety of formats.
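To illustrate the JSON-based COCO format mentioned above, a heavily simplified annotation record can be sketched as a Python dict; real COCO files carry additional fields (e.g., `iscrowd`, licenses, info), and all values below are invented for illustration:

```python
import json

# Minimal COCO-style annotation structure (simplified; values are illustrative).
coco = {
    "images": [{"id": 1, "file_name": "fire_0001.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "fire"}, {"id": 2, "name": "smoke"}],
    "annotations": [{
        "id": 1,
        "image_id": 1,
        "category_id": 1,
        "bbox": [120, 200, 80, 60],  # [x, y, width, height] in pixels
        # One polygon as a flat [x1, y1, x2, y2, ...] coordinate list:
        "segmentation": [[120, 200, 200, 200, 200, 260, 120, 260]],
        "area": 4800,
    }],
}
print(json.dumps(coco)[:60])
```

Pascal VOC stores essentially the same information (filename, object name, bounding box) but as one XML file per image rather than one JSON file per dataset.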
For the purpose of creating a quality dataset, a new image-annotation tool called FireSense was developed. It is based on the previously mentioned open-source Make Sense tool but with several improvements. The dataset is located on a centralized server, and batches of randomly selected images are sent to the annotators. Administrators can easily manage the images in the dataset. Additionally, random batches enable double-blind image annotation with a 90% Intersection-over-Union (IoU) overlap check; annotations are checked manually if the IoU is less than 90%. The tool has a backend (on the server side) and a frontend (on the client side), both of which are hosted on a dedicated server, and clients connect using a Web browser. It also allows users to export image-annotation data in a variety of formats, including CSV, YOLO, VOC XML, VGG JSON, and COCO. When a user logs in, they immediately receive a batch of images. The large area in the middle shows the editor viewport, where polygons can be added or modified. In the left sidebar, all images from the batch are listed. The top-right sidebar shows all the labels that users annotated; there, the user can change a polygon's class, toggle its visibility, or remove the polygon. In the bottom-right sidebar, the user can change other information about the image, including the type of space, the time of day, whether there is an artificial light source, the size of the flame, etc. The FireSense interface is presented in Figure 12. To create an annotated-image dataset, some of the images from the dataset mentioned in Section 2.2.1 were used along with additional fire images acquired from several different firefighting departments, and by using the abovementioned image-annotation tool, we created a dataset of 11,164 manually annotated images. That dataset includes a total of 48,662 flame annotations, 3350 smoke annotations, and 1230 non-fire images.
Out of all annotated images, 2356 are in a warehouse, 476 are inside an office, and the others are outside or unknown. Half of the images were taken during the daytime and a quarter during the night; for the rest, this could not be determined. This dataset is currently being used along with the image-segmentation network model for fire detection and fire localization. In addition, the dataset is constantly being expanded to further enhance the model's confidence in determining the presence of fire and smoke pixels. More information about the FireSense (Faculty of Electrical Engineering, Computer Science and Information Technology Osijek, Osijek, Croatia) annotation tool and the created dataset can be found in [39].

Temperature Anomaly Detection
When it comes to fire prevention by detecting potential temperature anomalies, an infrared thermal camera is used. Infrared thermal imaging is the best way to capture the temperature range characteristic of the Earth's electromagnetic spectrum. Figure 13 clearly shows the maximum of the visible part of the spectrum and the detection range of the infrared thermal imager. When objects are heated to a temperature higher than 525 °C, in addition to thermal radiation, they begin to emit light detectable by a classical camera. By that point, however, the fire situation has already progressed.

Figure 13. Radiation of the sun and earth by wavelengths.
As mentioned earlier, FireBot will be equipped with an infrared thermal-imaging camera that will provide complete spatial radiometric data of the observed scene to detect the potential development of a fire in time before it manifests visually. A thermal-imaging camera does not measure temperature but registers radiation in the infrared part of the spectrum, as shown in Figure 14. Depending on the recording parameters set, the most important of which are the emissivity of the object, the amount of radiation reflected from the surroundings, and, at greater distances, the transmission of the atmosphere, accurate radiometric data can be obtained.
The intensity of infrared radiation from the surface of the observed scene is measured by the camera's sensor and interpreted as the temperature of every pixel of a thermogram. The thermogram is used to better understand the scene and the potential temperature anomaly when inspecting visually, whereas the radiometric data are required for developing an efficient algorithm for automatic anomaly detection. If an infrared thermal camera without full spatial radiometric data is used, radiometric data can be accurately estimated from a thermogram and detected temperature boundaries, as described in [40]. The idea for anomaly detection is as follows: FireBot will have a predefined route for patrolling, during which it will actively search for a fire using a visual camera. On that route, it will have predefined points of interest that present a potential danger or contain some expensive equipment that needs to be monitored regularly, for example, computers, electric machines, electrical cabinets, server racks, etc. Every time FireBot comes to a certain point of interest, it will take an image of that scene; detect the hotspots; calculate the area of each hotspot, average temperature, and maximum temperature of every hotspot; and compare that data to the data gathered from all previous passes.
If the detected temperatures are increased in comparison to the previous states, the number of hotspots is increased, or the area of a hotspot is increased beyond the predetermined threshold, it will be a trigger for an alarm, and an automatic warning message will be sent to the supervisor along with the location of that point of interest. At that moment, a remote supervisor can take control of FireBot and visually inspect that point by using an infrared thermal camera. If a more detailed inspection is required, a remote user can easily adjust the temperature scale or color map of a thermogram to better understand the scene, as described in [40], and call maintenance service if required. An example of anomaly detection is presented in Figure 15, and the flowchart of the temperature anomaly-detection algorithm is presented in Figure 16.
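The pass-to-pass comparison described above can be sketched as follows. The pixel-count hotspot definition and all numeric thresholds here are illustrative assumptions, not the deployed FireBot logic:

```python
import numpy as np

HOT_THRESHOLD = 60.0   # °C above which a pixel counts as a hotspot (assumed value)
AREA_GROWTH   = 1.25   # alarm if hotspot area grows by more than 25% (assumed)
TEMP_RISE     = 5.0    # alarm if max temperature rises by more than 5 °C (assumed)

def hotspot_stats(radiometric: np.ndarray) -> dict:
    """Summarize hotspot pixels of one radiometric frame (°C per pixel)."""
    hot = radiometric > HOT_THRESHOLD
    if not hot.any():
        return {"area": 0, "max": float(radiometric.max()), "avg": 0.0}
    return {
        "area": int(hot.sum()),                 # hotspot area in pixels
        "max": float(radiometric[hot].max()),   # maximum hotspot temperature
        "avg": float(radiometric[hot].mean()),  # average hotspot temperature
    }

def anomaly(previous: dict, current: dict) -> bool:
    """Trigger an alarm when a point of interest got hotter or larger."""
    if current["area"] > max(1, previous["area"]) * AREA_GROWTH:
        return True
    return current["max"] - previous["max"] > TEMP_RISE

# Two patrol passes over the same point of interest (synthetic 4x4 frames).
pass1 = np.full((4, 4), 25.0); pass1[0, 0] = 65.0      # one hot pixel
pass2 = np.full((4, 4), 25.0); pass2[0, :3] = 72.0     # hotspot grew and warmed
print(anomaly(hotspot_stats(pass1), hotspot_stats(pass2)))  # True
```

In the real system, this decision would additionally account for hotspot count and would send the alarm message with the point-of-interest location, as described above.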

Radiometric Data Estimation
As mentioned in the previous subsection, the approach for temperature-anomaly detection requires full spatial radiometric data to be able to successfully analyze the observed scene and detect the presence of an anomaly. If the full radiometric data are not available, as is the case with some lesser-known camera manufacturers, it was shown in [40] that by using image-processing techniques, it is possible to accurately estimate radiometric data from a thermogram and detected temperature boundaries (minimum and maximum detected temperature) of the observed scene. Estimated data can be further used in developing an anomaly-detection algorithm.
To estimate full radiometric data, we used a white-hot IRT image as input. Pixel values of the input image were then transferred to an array in which every element represented one pixel of the input image, with an intensity in the range of 0 to 255. Temperature boundaries were detected by the IRT camera; they will be referred to as detLTB and detHTB. Using a simple linear conversion, the radiometric data for every pixel of the input image were calculated with Equation (1):

radData = detLTB + (cIP / maxIP) * (detHTB - detLTB) (1)

where cIP represents the value (intensity) of the current image pixel, maxIP represents the maximum possible value (intensity) of image pixels (in this case, 255), detLTB represents the detected lowest temperature boundary, and detHTB represents the detected highest temperature boundary. After all the values were calculated and rounded to two decimal places, they were stored in a CSV file, which represents the estimated radiometric data. Before using the estimated data for developing an algorithm for anomaly detection, it is required to calculate the estimation error in order to confirm that the estimated data are correct. For error estimation, a FLIR E60bx handheld IRT camera that provides full radiometric data was used. The data provided by this camera were considered ground truth and compared to the estimated data. First, dataDiffMat was calculated by subtracting the estimated data from the ground-truth matrix obtained from the FLIR camera, using Equation (2):

dataDiffMat = groundTruthMat - estRadData (2)
Afterwards, the minimum and maximum estimation errors were extracted using Equations (3) and (4), respectively:

minEstErr = min(abs(dataDiffMat)) (3)

maxEstErr = max(abs(dataDiffMat)) (4)

The average estimation error was calculated as the sum of all absolute errors divided by the number of elements in the matrix, as shown in Equation (5):

avgEstErr = sum(abs(dataDiffMat)) / n (5)

where n is the number of elements in dataDiffMat.
The minimum calculated estimation error was 0.000 °C, the maximum was 0.077 °C, and the average error was 0.037 °C. These are excellent results, and the estimated data can therefore be used for developing an algorithm for anomaly detection.
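Equations (1)-(5) amount to a per-pixel linear map followed by elementwise error statistics. A minimal NumPy sketch; the 2x2 image and the ground-truth values are synthetic, chosen only to exercise the formulas:

```python
import numpy as np

def estimate_radiometric(img: np.ndarray, detLTB: float, detHTB: float) -> np.ndarray:
    """Equation (1): map 8-bit pixel intensities onto [detLTB, detHTB] °C."""
    maxIP = 255.0
    return np.round(detLTB + (img.astype(float) / maxIP) * (detHTB - detLTB), 2)

def estimation_errors(ground_truth: np.ndarray, estimated: np.ndarray):
    """Equations (2)-(5): difference matrix and min/max/average absolute error."""
    dataDiffMat = ground_truth - estimated               # Eq. (2)
    absErr = np.abs(dataDiffMat)
    return absErr.min(), absErr.max(), absErr.sum() / absErr.size  # Eqs. (3)-(5)

# Synthetic 2x2 white-hot thermogram with detected boundaries of 20-70 °C.
img = np.array([[0, 128], [255, 64]], dtype=np.uint8)
est = estimate_radiometric(img, detLTB=20.0, detHTB=70.0)
print(est)  # per-pixel temperatures: 20.0, 45.1, 70.0, 32.55

# Comparison against a (synthetic) ground-truth matrix from the FLIR camera.
gt = np.array([[20.00, 45.15], [69.93, 32.55]])
print(estimation_errors(gt, est))  # ≈ (0.0, 0.07, 0.03)
```

Rounding to two decimal places matches the precision at which the estimated data are stored in the CSV file.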

Generating Images from Radiometric Data
The data estimated in the previous subsection can be used to generate a thermogram not only in the original temperature range/scale but also in any desired temperature range, to better understand the observed scene. First, the desired temperature range of the thermogram is defined by definedLowTempBoundary (defLTB) and definedHighTempBoundary (defHTB). Then, all temperatures below/above the defined thresholds are set to the low/high boundary, respectively. Finally, image pixel intensities are calculated according to Equation (6):

imgPix = ((estRadData - defLTB) / (defHTB - defLTB)) * (maxPixVal - minPixVal) + minPixVal (6)
where imgPix represents pixel intensities of a generated image (0-255), estRadData represents the estimated radiometric-data matrix, defLTB represents the defined low-temperature boundary, defHTB represents the defined high-temperature boundary, minPixVal represents the minimum intensity of a pixel in an output image (0), and maxPixVal represents the maximum intensity of a pixel in an output image (255). In addition to the temperature range, the color map used to generate the image can also be changed to better understand the scene. Several examples of generated images are presented in Figure 17.
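Equation (6) can be sketched directly in NumPy; the temperature matrix below is synthetic, and the function name is illustrative:

```python
import numpy as np

def generate_image(estRadData: np.ndarray, defLTB: float, defHTB: float) -> np.ndarray:
    """Equation (6): render radiometric data as an 8-bit image on a chosen scale."""
    minPixVal, maxPixVal = 0, 255
    clipped = np.clip(estRadData, defLTB, defHTB)  # saturate values outside the range
    imgPix = (clipped - defLTB) / (defHTB - defLTB) * (maxPixVal - minPixVal) + minPixVal
    return np.round(imgPix).astype(np.uint8)

# Synthetic 2x2 radiometric data (°C), rendered on a 20-70 °C scale.
data = np.array([[18.0, 45.0], [70.0, 95.0]])
print(generate_image(data, defLTB=20.0, defHTB=70.0))  # pixels 0, 128, 255, 255
```

Narrowing the defLTB-defHTB range stretches the contrast over the temperatures of interest, which is what makes the regenerated thermograms in Figure 17 easier to interpret; a color map can then be applied to the 8-bit output.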

Influence of Emissivity on Radiometric Data
The developed system for proactive detection relies in its data analysis on the accuracy of the radiometric information. In the thermal part of the spectrum, choosing the correct emissivity of the material plays the most important role in determining the real temperature value. In addition, the acquisition angle should not be neglected. Figure 18 shows the characteristic emissivity curves: in red for colored surfaces and good radiators, and in blue for shiny metals, which have a low emissivity. The green color indicates the exposure angle recommended for thermographic analysis. The reason for this can be clearly seen in the behavior of the curves, whose coefficient values change significantly beyond 30°. An analysis of the sensitivity of the measurement result to different emissivity levels at a reflected ambient temperature of 13.5 °C, shown in Figure 18 with the red temperature values, was performed on the wall sample presented in Figure 15. We concluded that within the recommended acquisition angle of 120°, the measurement error was less than 0.1 °C. A further deviation of the recording angle led to an error of 5 °C. It should be noted that the specified values differed considerably between thermograms. There were no metal objects in the image, so an analysis of the blue emissivity characteristic of metals could not be performed. A detailed description of directional spectral emissivity can be found in [41].


Basic Calibration When Using Several Different Cameras
Infrared thermal-imaging cameras represent a significant financial item in the overall project cost, and their price increases with resolution. Although the cost of a single pixel decreases [42], as seen in Figure 19, as resolution increases, the number of pixels increases, and so does the total cost of the camera sensor. To optimize the final product, it is planned to install the thermal-imaging camera that is optimal for the complexity of the space where FireBot will patrol. The main technical parameters are field of view (FOV), spatial resolution (IFOV), thermal sensitivity (NETD), and accuracy in °C or % of the reading. Comparing the data from Figure 18 to the typical camera-accuracy value of ±2 °C or ±2%, we can see that the camera accuracy significantly affects the radiometric data.

Figure 19. Single-sensor pixel-cost evolution, source [42].
The choice of camera affects the behavior of the system in which it is located. When developing an algorithm for temperature-anomaly detection using an IRT camera and the radiometric data provided by the camera, we assume that the data (temperatures) are correct or within the range of the manufacturer-defined error. To ensure that these data are correct, all IRT cameras must be calibrated. Calibration is a process in which the infrared radiation that a camera detects is correlated with known temperatures. All cameras on the market are calibrated to factory specifications, but over time and due to the aging of electronics, a calibration shift occurs, and consequently, the cameras produce inaccurate temperature measurements. Unfortunately, the owner of the camera cannot recalibrate the camera on their own but can determine the camera's deviation at a certain measurement point with the help of a body of known temperature. For that purpose, a blackbody source is used. A blackbody is a physical body with high emissivity, which means that it radiates and absorbs almost all electromagnetic radiation. The blackbody has a predefined range of temperatures it can achieve and, as such, can be used as a reference point for determining the camera's accuracy. In the manufacturer's calibration, the correction table is saved directly in the camera's firmware and automatically corrects camera readings, whereas in our case, the correction table was used to correct detected temperatures manually (or within our algorithm). In this subsection, we used seven IRT cameras, presented in Table 3, and two blackbodies, presented in Table 4, to determine whether their accuracy was within the manufacturer's margin of error and whether they can be used as such on FireBot for fast and efficient temperature-anomaly detection. All seven cameras were calibrated at seven temperature points, ranging from 27 °C to 45 °C with a 3 °C step in between.
This range was chosen because it represents the temperature interval of increased latent hazards, such as passive consumption of electrical equipment that overheats and triggers fire hazards. Increasingly higher temperature values automatically become an area of interest and further analysis due to their magnitude and their temperature difference relative to the environment. All measurements were repeated 10 times and the results were expressed as average values. For this reason, the values in the tables are given to two decimal places. It is important to note that raw camera data can be downloaded to three decimal places.
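Applying a correction table obtained from such a blackbody calibration can be sketched with linear interpolation between the seven calibration points. The deviation values below are placeholders for illustration, not the measured data from Table 6:

```python
# Sketch of applying a per-camera correction table from blackbody calibration.
import bisect

CAL_POINTS = [27, 30, 33, 36, 39, 42, 45]                # blackbody reference temps (deg C)
DEVIATIONS = [0.42, 0.47, 0.51, 0.58, 0.66, 0.71, 0.80]  # camera reading minus reference

def correct(reading):
    """Correct a camera reading by linearly interpolating the calibration deviations."""
    if reading <= CAL_POINTS[0]:
        return reading - DEVIATIONS[0]
    if reading >= CAL_POINTS[-1]:
        return reading - DEVIATIONS[-1]
    i = bisect.bisect_right(CAL_POINTS, reading)
    t0, t1 = CAL_POINTS[i - 1], CAL_POINTS[i]
    d0, d1 = DEVIATIONS[i - 1], DEVIATIONS[i]
    d = d0 + (d1 - d0) * (reading - t0) / (t1 - t0)      # interpolated deviation
    return reading - d

print(round(correct(28.5), 3))
```

Outside the calibrated interval the nearest deviation is reused, which matches the observation that higher temperatures are flagged by magnitude rather than precise correction.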
It is difficult to compare the characteristics of the cameras listed in Table 3 from the point of view of image analysis if the parameters FOV and IFOV are not taken into account. Figure 20 compares the sizes of the thermograms when the cameras were 1 m away from the object to be imaged. In the middle, the field of view of a single sensor pixel (the IFOV) can be seen and compared. From this, we can conclude that in order to achieve nominal accuracy, it is important that the IFOV completely cover the area where the analyzed object is located. Furthermore, we can see that for certain applications where it is important to see a large area in a small space, the resolution itself is not as important as the optics of the camera, i.e., the angle that the camera captures.
In addition to the infrared thermal imagers listed in Table 3, two pyrometers were also used in the calibration. The reason for this was to investigate the possibility of using a pyrometer as additional support for a targeted temperature measurement at any point exposed, due to fire, to a temperature outside the range of the thermal imagers. Compared to thermal-imaging cameras, pyrometers are inexpensive, widely available, and easy to integrate into a system. The first was the Raytek RAYMX2D with a temperature range of −30 °C to 900 °C and a measurement error of ±1 °C or ±1% of the reading; the other was the Parkside PTIA 1 with a temperature range of −50 °C to 380 °C and a measurement error of ±1.5 °C. Two different blackbodies were used during the calibration process, and their specifications are listed in Table 4.
The reason for this is thermal inertia and the time required to reach a steady state. Five hours of effective labor and two people were required to calibrate the nine devices. The instruments were divided into two groups depending on the angle of the optics, paying attention to possible sources of reflected radiation and the influence of the operator. Figure 21 shows the blackbody Voltcraft IRS-350 with a set temperature of 50 °C (green diodes) and a current value of 44.9 °C (red diodes). The active area where the camera calibration is performed is shown in red in the thermogram. The calibration process is performed once the blackbody reaches a steady state. For this reason, we decided to perform the calibration at seven values, from 27 °C to 45 °C, with a step of 3 °C, the first round value larger than the specified measurement uncertainty of the cameras. The specified interval covers the values for early detection of hotspots, where accuracy and precision are important. Higher temperature values quickly become noticeable, and the detection algorithm had no problems with them.
Using the raw data, Figure 22 shows the detailed temperature distribution on the active surface of the other blackbody, whose surface, unlike the Voltcraft's, is not flat. The image shows a maximum value of 50.774 °C, a minimum value of 50.006 °C, and an average value of 50.41068 °C, which was indicated on the display as 50.4 °C. In Figure 22, a circular area can be seen with slight deviations in temperature, which were not real or physically possible.
This is a result of the surface geometry: the different emissivities of the individual circular areas produced different amounts of radiation, so although the blackbody was at a steady temperature, the camera presented slight differences in temperature values. We can conclude that the camera does not measure temperature but registers infrared radiation, on the basis of which temperature values are assigned in accordance with the calibration data. In addition to changes in geometry, a change in surface emissivity is also possible and occurs in practice (degradation of paint due to sunlight UV exposure or partial surface oxidation).
In order to take into account all influences on the measurement uncertainty of the results, the camera displays temperature values with one decimal place. During the measurement, it was necessary for the spatial resolution IFOV (Instantaneous Field of View), i.e., the viewing angle of a single pixel of the sensor, to be completely within the reference surface. Figure 23 shows the maximum distance the camera could be from the blackbody for the measurement to be accurate. The maximum distance of the camera was compared with the field of view (FOV) that the camera would have at the specified distance, depending on its resolution.
Figure 23. The maximum camera distance up to which accurate measurement is possible.
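The maximum-distance geometry can be sketched from the IFOV. The numbers below are illustrative, not the values behind Figure 23, and the requirement of several pixels on the target is a common thermography rule of thumb rather than a figure from this paper:

```python
import math

def max_distance_m(target_size_mm, fov_deg, pixels, pixels_on_target=3):
    """Farthest distance at which `pixels_on_target` pixels still fit inside the target."""
    ifov_rad = math.radians(fov_deg) / pixels          # angle subtended by one pixel
    return target_size_mm / 1000.0 / (pixels_on_target * ifov_rad)

# 50 mm blackbody aperture, 45 deg FOV lens, 160-pixel sensor row, 3 pixels on target
print(round(max_distance_m(50, 45, 160), 2))
```

Beyond this distance a pixel averages the target with its surroundings, and the reading drifts toward the background temperature.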
In the measurement, we tried to fill the frame as much as possible with the active element of the blackbody to make it easier to determine the mean, as shown in Figure 23.
To determine the mean value, it was not enough to take 10 consecutive readings from a single camera; the cameras had to be rotated between measurements so that fluctuations in the blackbody temperature had a minimal effect on the mean value. Table 5 shows the mean values of the measurements made according to the indicated procedure.

Analysis of Measurement Result Deviations
There are two basic approaches to analyzing thermographic records. One is to determine the highest temperature that the object under study can withstand. In our case, we were dealing with temperatures as high as 90 °C, which most structural and electrical-insulating materials must withstand for a short period of time. The second approach is to determine the temperature difference between two identical objects, i.e., between a correct and a faulty one, and to decide, based on the magnitude of the temperature difference, which method and time interval should be used to respond to the observed anomaly. It should always be kept in mind that an infrared thermal imager does not measure temperature but registers radiation. The deviation of the calibration data from the reference temperature represents an indication of correctness if all recording parameters have been entered correctly. Table 6 shows the deviations in °C and has been expanded to include information on average camera prices so that the accuracy data can be correlated with financial indicators.
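A per-point check against the typical nominal accuracy of ±2 °C or ±2% of the reading (whichever is larger) can be sketched as follows. The reference/reading pairs are placeholders, not the measured values from Table 5 or Table 6:

```python
def within_nominal(reference_c, measured_c, abs_tol=2.0, rel_tol=0.02):
    """True if the deviation stays inside the larger of the absolute and relative limits."""
    limit = max(abs_tol, rel_tol * abs(reference_c))
    return abs(measured_c - reference_c) <= limit

# reference blackbody temperature -> averaged camera reading (illustrative)
readings = {27: 27.8, 36: 35.1, 45: 47.4}
for ref, meas in readings.items():
    print(ref, within_nominal(ref, meas))
```

A camera failing this check at several calibration points, as one did here, is a candidate for factory recalibration rather than table-based correction alone.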

If the mean value of the deviation for the individual devices is plotted, a comparison can be made with the data from Table 3. From Figure 24, it is clear that the Flir A70 (Teledyne FLIR LLC, Wilsonville, OR, USA) with a lens angle of 29° was not correctly factory calibrated and is not suitable for further use. The Flir One Pro showed a deviation from the nominal accuracy of 34%, but considering its price and its purpose, which is primarily qualitative, we cannot consider it deficient. When the data from Table 6 were put into graphical form, as shown in Figure 25, it was possible to clearly distinguish the cameras that were accurate and precise over the entire temperature range of the calibration.
Going one step further and analyzing the percent deviation from the mean of the deviation for each temperature, as shown in Figure 26, can lead to a false conclusion, as the most accurate camera showed an increase in deviation as the temperature increased.
The above procedure can still be useful when analyzing cameras with similar characteristics, as can be seen from the cameras in the class that were below 1% deviation. From Table 6 and Figure 26, it is evident that laboratory factory calibration of the Flir A70 29° camera was necessary: its deviation, 3.23 times greater than the specified nominal accuracy, indicates that factory recalibration is required. From the above arose the need to introduce an indicator that links the accuracy and price range of each camera. Figure 27 shows the numerical value of the ratio of price to mean temperature deviation, expressed in USD per °C. The two cameras shown in red justified their cost with their accuracy, as determined by the calibration procedure. We left the Flir A70 29° in the comparison, although we concluded that this camera is not to be used. Further analysis of the calculated indicator showed that such a simple indicator for selecting the optimal camera is not applicable in this case; the choice of camera must instead be determined by the technical characteristics of the individual application, taking into account the required level of detail and accuracy, with the indicator applied only within a narrow group of cameras with similar characteristics.
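The price-to-accuracy indicator can be sketched as a simple ratio. Camera names, prices, and mean deviations below are illustrative placeholders, not the figures from Table 6 or Figure 27:

```python
# USD per deg C of mean absolute deviation: lower is not automatically better,
# since a cheap but imprecise camera can also score low on this ratio.
cameras = {
    "camera_a": {"price_usd": 12000.0, "mean_dev_c": 0.4},
    "camera_b": {"price_usd": 400.0, "mean_dev_c": 1.6},
}

def price_per_degree(price_usd, mean_dev_c):
    """Indicator linking camera price to mean calibration deviation."""
    return price_usd / mean_dev_c

for name, spec in cameras.items():
    print(name, round(price_per_degree(spec["price_usd"], spec["mean_dev_c"]), 1))
```

As the text notes, the ratio only carries meaning when compared within a narrow group of cameras with similar technical characteristics.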
The aforementioned analysis should not be performed until the minimum resolution and accuracy of the camera to be installed in the FireBot have been determined for the exact application, since the investment optimum must be taken into account.

Conclusions
Obtaining calibration data for infrared thermal-imaging cameras for implementation in the FireBot autonomous fire protection robotic system represents the starting point for optimizing the algorithm for detecting anomalies that may lead to the development of a fire. The proposed algorithm provides clear insight into the operational performance of the autonomous solution. The proposed FireBot system-architecture diagram provides insight into the complex structure and a detailed approach for realizing the optimal technical solution. Accordingly, special emphasis is placed on the detection of the excitation maximum, which can be detected in the early phase of an anomaly. Infrared thermography was selected as the solution. Although it is a widely used method of non-destructive testing, the price of cameras increases significantly with increasing resolution and detection accuracy. In order to select the optimal camera, a blackbody calibration procedure must be performed. The calibration procedure indicated a defective camera and provided the indicators needed for an optimal selection depending on the complexity and specific characteristics of the space under investigation. The procedure itself and the discovery of a defective camera led to the conclusion that, regardless of the specifications, camera calibration must be performed before installation in FireBot, both to correct the input data and to ensure control over the installed component. In addition, this procedure enables faulty cameras that do not comply with factory specifications to be detected, as shown in the example of one camera analyzed in this paper.