1. Introduction
Thermal imaging is a non-contact method in which the infrared radiation pattern of an object is converted into a visible image called a thermal image or thermogram. All objects at a temperature above absolute zero (−273.15 °C) emit infrared radiation. The infrared band with wavelengths from 3 to 14 μm is called the thermal infrared region, and it is the band analysed in imaging applications that use heat signatures. Thermal imaging maps the surface temperature of an object with high thermal and spatial resolution [1].
Since the introduction of affordable dual-sensor cameras, i.e., Thermal Infrared (TIR) and visible (RGB), mounted on Unmanned Aerial Vehicles (UAVs), many research results have been published in various application domains, such as regional security, the monitoring of structures and infrastructures, the monitoring of archaeological sites, environmental monitoring, and agriculture [2]. However, this trend is a consequence of many years of pioneering work in placing thermal imaging devices on satellites, airplanes, helicopters, and do-it-yourself UAVs, initially for military purposes [3] and later for other application fields [2].
All currently operating air- and space-borne TIR remote sensors detect the energy emitted by the object itself [4]. These systems do not require an external source of infrared radiation for the retrieval of emissivity spectra, because the radiation is measured by comparing the object’s response with that of a modelled black body. As this approach does not rely upon keeping samples and black bodies at a fixed temperature, it is also appropriate for field use [4]. The emissivity of a material is maximal when the imaging device observes it perpendicular to its surface [1]. However, this is not guaranteed most of the time, so the data should be corrected geometrically. As detailed in Ref. [4], rocks, soils, and vegetation have been researched much more than man-made materials. In particular, materials in the field are not in pristine condition, as their variability is influenced by many factors. For this reason, it is advisable to develop a spectral index of weathering, one that relates material change over time to laboratory measurements on new materials.
Explosive remnants of war have been part of everyday life ever since World War I. The neutralisation of devices that are 50 or more years old is reported almost daily across Europe. It is noted in [5] that 64 countries are contaminated by land mines. Although both World Wars are far back in history, regional conflicts are constant. For some countries in armed conflicts, there exists a database of the most frequently used explosive devices with their characteristics, grouped by type [6]. Other countries still need to develop such documents in order to better understand the level of risk for humanitarian mine clearance. There are military systems that can easily scatter more than 1000 mines per minute [5]. It is estimated that clearing a mine takes about 100 times longer than placing it [5]. In 2019 there were 30 accidental explosions of ammunition stockpiles, each of them generating a large dispersion of unexploded ordnance [7]. Most explosive war remnants are well known, at least by their physical dimensions, the weight of the explosives, the type of fuse, and the material of the cover. Unlike land mines, such ordnance lies mostly uncovered on the surface and unaffected by weather or vegetation. During a project in Bosnia and Herzegovina in 2019, a humanitarian organisation, Norwegian People’s Aid (NPA), collected imaging data about unexploded ordnance [8]. As part of its project, the NPA tested new technologies for the purpose of surveying suspected hazardous areas. Thermal imaging has been identified as a promising technology for several reasons, namely, its easy implementation in practice, the relatively low cost and availability of sensors compared to hyperspectral cameras (which are, on average, about 12 times more expensive), and the easy determination of criteria for the separation of targets (e.g., UXOs) from the terrain. Therefore, thermal imaging and LiDAR sensors were mounted on UAVs and utilised. We experimented with these thermal images in this research.
Object detection in thermal images is a broad research field [9,10,11]. The detection of UXOs, however, has received far less attention, which is why publications on this topic are rare. In 2018 the authors in [12] developed a prototype application that utilised thermal imaging for detecting PFM-1 ‘Butterfly Mines’. The mines were deployed aerially, so there was no map of their positions. The thermal information was complemented by RGB images of the same area. Mines were detected with a precision of around 78%, while the metal components of the casing were detected with 100% accuracy. It was emphasised that the environment (e.g., sand, grass, or cobble-shaped surfaces) can affect detection precision. The best detection precision was observed around 30 to 120 min after sunrise or sunset. Buried or covered mines were hard to detect, as their thermal properties were masked by the cover. The work in [13] focused on the detection of land mines with respect to thermal changes in the environment. If the land mine was buried, the thermal change was too small to detect and identify the UXO accurately. Nevertheless, the authors rated the results qualitatively as promising if the depth was shallow. Research in a similar direction was conducted in [14], where buried land mines were detected. Time series of thermal images were captured, and the temperature differences were then inspected between regions with buried land mines and a regular environment with no land mines. The authors claimed that better detection could be achieved 10 min after the heating was stopped, provided that the land mines were buried no deeper than 35 mm. It was a laboratory experiment, where heating and cooling could be started and stopped arbitrarily. Methods based on Deep Learning are the exception rather than the rule in the field of UXO detection. It is worth mentioning the work of the research group from Binghamton University, which expanded their research from 2018 [12] by utilising Deep Learning for the automated detection and mapping of PFM-1 mines [15].
In our opinion, the gap between the large quantities of imaging data usually gathered in reconnaissance missions and the reliability needed when working in an environment contaminated by explosive ordnance could be bridged by Convolutional Neural Networks (CNNs). Important research was carried out in [16] by colleagues from our partner institution, the University of Rijeka. For the problem of automated person detection in thermal images, the authors experimented with various state-of-the-art CNN-based object detectors, namely, Faster R-CNN, SSD, Cascade R-CNN, and YOLOv3, where all models were retrained on their dataset of thermal images. They demonstrated that the latest version of YOLO [17] (version 3 at that time) should be utilised for retraining, as it was significantly faster than the other detectors while offering similar state-of-the-art effectiveness. Many versions of YOLO and its combinations have been introduced to date (e.g., YOLO-G [18] for military target detection, YOLO-FIRI [19]), but one of the most important is certainly the latest stable and well-verified version, the YOLOv5 architecture [20], owing to its high processing speed and higher precision. (The most recent version overall is YOLOv7.)
To the best of our knowledge, no automated solution exists for surface UXO detection from thermal imaging data that tackles the problem of identifying more than one class of objects. In this study, we adapt and retrain the state-of-the-art YOLO architecture in order to detect UXOs from 11 different classes on highly variable thermal images. In addition, we identified that there are no publicly available thermal imaging data about land mines and other UXOs. The second aim of this study is, therefore, to publish thermal imaging data with a very small Ground Sampling Distance (GSD). The selected YOLO architecture was adapted to our UXO detection problem, fine-tuned by using a grid-search approach, and, finally, trained end-to-end on thermal images. The effectiveness of our adapted and retrained CNN architecture was confirmed experimentally by detecting UXOs from 11 different classes on our challenging database of 808 thermal images.
The contributions of this research work are summarised as follows:
The introduction of a sophisticated Unexploded Ordnance detection algorithm by using thermal imaging data, where this algorithm is a minor modification of the state-of-the-art YOLO neural network;
The first study that assesses the effectiveness of the state-of-the-art CNN-based object detectors on an eleven class Unexploded Ordnance detection problem by using thermal images;
The development and publishing of the UXOTi_NPA public database of annotated thermal images of Unexploded Ordnance;
Baseline results of object detection for the UXOTi_NPA dataset.
This article is structured as follows. Section 2.1 introduces the UXOTi_NPA evaluation dataset with annotated thermal images. A short overview of the state-of-the-art YOLO object detector is given in Section 2.2, followed by a detailed description of a novel Unexploded Ordnance detection algorithm based on the YOLOv5 model in Section 2.3. Section 3 presents some of the results obtained on the UXOTi_NPA public dataset, followed by Section 4, which discusses certain aspects of our detection method. Section 5 concludes this paper briefly with some hints about future work.
3. Results
The results of our experiments are presented in this section. First, we solved the problem of classification and detection of UXOs from 11 different classes (i.e., an eleven UXO class detection problem). All five versions of the YOLOv5 model, presented in the previous section, were trained separately on the training set of the UXOTi_NPA database. Training was conducted for 300 epochs, with the same hyperparameters (see Table 3). Sample graphs, captured during the training of the YOLOv5n version, are presented in Appendix A. The trend of three different loss functions with respect to the training epoch can be observed (see Figure A1, Figure A2 and Figure A3). We noticed a similar trend when training our other CNNs. The trained Neural Networks were then evaluated on the testing set of the UXOTi_NPA database. The classification and detection effectiveness were assessed by using classic metrics, namely Precision, Recall, Mean Average Precision (mAP) at an IoU threshold of 0.5, and the average mAP at IoU thresholds ranging from 0.5 to 0.95 with a step of 0.05. The results of this first experiment are gathered in Table 4. Furthermore, the number of free parameters (in millions) is given in this table for each retrained YOLOv5 version.
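The metrics listed above can be illustrated with a minimal Python sketch. It is not the evaluation code used in this study: the function names and example values are hypothetical, and bounding boxes are assumed to be given as (x1, y1, x2, y2) corner coordinates. The sketch shows the Intersection over Union (IoU) on which the mAP thresholds are defined, Precision and Recall computed from counts of true positives, false positives, and false negatives, and the ten IoU thresholds from 0.5 to 0.95.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp, fp, fn):
    """Precision and Recall from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# The ten IoU thresholds 0.50, 0.55, ..., 0.95 used for the averaged mAP.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def mean_over_thresholds(ap_per_threshold):
    """Average of AP values, one value per IoU threshold."""
    return sum(ap_per_threshold) / len(ap_per_threshold)
```

A detection counts as a true positive at a given threshold only if its IoU with an annotated box reaches that threshold, which is why the averaged metric is stricter than the mAP at 0.5 alone.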
UXOs from all 11 classes were merged into one common class in the next experiment. We were, therefore, solving the detection problem: Are there any UXOs in the thermal image (regardless of type), and if so, where? This experiment was denoted as a ‘single UXO class detection problem’. With it, we wanted to simulate a real-world scenario, where UXO removers are interested primarily in whether a UXO is present in an observed area at all, and only later in what type it is. We utilised the same versions of the YOLOv5 model (only the output layer was modified appropriately). The training and other settings remained the same as in the first experiment. Sample graphs, captured during the training of the YOLOv5x version for the single UXO class detection problem, are depicted in Appendix A (see Figure A4, Figure A5 and Figure A6). Similar trends were noticed for the other YOLOv5 versions. The trained Neural Networks were then evaluated on the testing set of the UXOTi_NPA database, where UXOs from all 11 classes were merged into one common testing class. The obtained results are gathered in Table 5, where the highest metrics are marked in bold.
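The merging of the 11 classes into one common class can be sketched as a relabelling of the annotations. The following Python sketch is an illustration only, not the procedure used in this study. It assumes that annotations are stored in the plain-text YOLO label format, with one object per line written as `class x_center y_center width height`; the function name `merge_classes` and the target index 0 are hypothetical choices.

```python
def merge_classes(label_text, merged_index=0):
    """Replace the class index of every annotation line with one common index.

    label_text is the content of one label file, assumed to hold one object
    per line as: class x_center y_center width height.
    """
    merged_lines = []
    for line in label_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        fields[0] = str(merged_index)  # only the class index changes
        merged_lines.append(" ".join(fields))
    return "\n".join(merged_lines) + ("\n" if merged_lines else "")
```

Only the class index is rewritten; the bounding-box coordinates are left unchanged, so the localisation task is the same in both experiments.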
More detailed results are gathered in Table 6 for the smallest version, YOLOv5n, with metrics calculated for each of the eleven UXO classes. A new results table was obtained for every pair of confidence and IoU thresholds. For Table 6, the confidence threshold was set to 0.001 and the IoU threshold to 0.5. With this setting, the confusion matrix has ones only along its diagonal, which is why it is not shown. The abovementioned metrics and the number of all instances (UXOs) for each class in all 88 testing images are presented in Table 6. Similar results were also obtained for the other YOLOv5 versions. It should be noted that the detection effectiveness decreases with higher confidence and IoU thresholds.
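The effect of the two thresholds can be illustrated with a small Python sketch. The detections below are made-up values, not results from this study, and the sketch assumes, for simplicity, that each detection has already been matched to one annotated UXO and is described by its confidence and its IoU with that annotation.

```python
def count_true_positives(detections, conf_threshold, iou_threshold):
    """Count detections that pass both the confidence and the IoU threshold.

    detections is a list of (confidence, iou_with_matched_annotation) pairs.
    """
    return sum(
        1
        for confidence, overlap in detections
        if confidence >= conf_threshold and overlap >= iou_threshold
    )


# Hypothetical detections for illustration only.
example_detections = [(0.95, 0.92), (0.80, 0.71), (0.40, 0.55), (0.05, 0.51)]

loose = count_true_positives(example_detections, conf_threshold=0.001, iou_threshold=0.5)
strict = count_true_positives(example_detections, conf_threshold=0.5, iou_threshold=0.75)
```

With the loose setting all four hypothetical detections are counted, whereas the strict setting keeps only the first one, which mirrors the decrease in effectiveness noted above.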
Some qualitative results are shown for the eleven UXO class detection problem in Figure 4 and for the single UXO class detection problem in Figure 5. In both cases the left column contains the original thermal images from the testing set of the UXOTi_NPA database, with expert annotations (i.e., bounding boxes) overlaid on each image. The colour and index next to the bounding box indicate the UXO class (see also the class indexes in Table 1). All UXOs are grouped into one common class (index equal to 1) in the case of the single UXO class detection problem (see Figure 5). The right column shows the detection and classification results obtained by using our retrained YOLOv5 model versions. Bounding boxes with a detected UXO are depicted in the appropriate colour, supplemented by the detected class index and the detection probability. The results in rows one and three were obtained with the retrained YOLOv5n, while the results in rows two and four were obtained with the retrained YOLOv5x version.
4. Discussion
In this research, we developed and thoroughly verified a computational method based on Deep Learning and Convolutional Neural Networks aimed at detecting Unexploded Ordnance from imaging material. It is an important support activity that contributes to the neutralisation of explosive war remnants. A selected terrain can be inspected from the air (e.g., by using Unmanned Aerial Vehicles) and potentially dangerous areas can be located in advance by using our automated solution. One of the novelties of our study is that thermal images were utilised to detect UXOs. Thermal imaging is an important source of information about the environment and its changes, especially if this information is not perceivable in the visible spectrum. Various materials, including UXOs, leave their own thermal signatures in the image, based on which they can be identified very reliably in the scene. We took advantage of this fact in our solution. A detailed list of the advantages of the thermal (infrared) spectrum over the visible spectrum in terrain demining is given in the report [8]. Figure 6 depicts an example of a land mine with a green metal casing placed on green grass, which is indistinguishable in the visible spectrum (see the top image) but easily separable in the thermal spectrum (see the middle and bottom images).
During the development, we encountered the problem that there is no publicly available dataset of Unexploded Ordnance thermal images. The next contribution of our research is, therefore, that we have published the UXOTi_NPA public dataset with such annotated thermal images. This dataset, consisting of 808 thermal images with UXOs from 11 different classes, is slightly unbalanced, since UXOs from the smallest class are present in 79 images, and those from the largest class in 161 images. Most often, UXOs from three different classes appear in a single image from the UXOTi_NPA dataset. Our public dataset could be considered very small compared to the COCO dataset with 328,000 images, or to ImageNet with more than 1,280,000 images. Naturally, a smaller dataset requires special approaches when training computational methods.
Our computational model was based on the latest stable and well-verified version of the well-recognised object detector YOLO (i.e., YOLOv5). It should be stressed that YOLOv7 is the latest YOLO version. Our choice was further substantiated by related research using thermal imaging data. The YOLOv5 model comes in five versions that differ from each other in the number of free parameters. We experimented with all of them in this study. Our smallest CNN architecture, YOLOv5n, had 1.8M free parameters, while YOLOv5x, the largest architecture, had a capacity more than 48 times larger (i.e., 86.6M free parameters). Analysing the efficiency of architectures with different capacities has an important applicative value. A version of YOLOv5n could, for instance, be installed on an embedded device (e.g., as part of a UAV), where it would perform quasi-real-time detection of UXOs. On the other hand, the remaining YOLOv5 architectures are too complex for this purpose and are, therefore, more suitable for non-real-time processing; the largest architectures are suitable even for detecting very small objects in vast areas (e.g., for thermal images captured with a large GSD).
The results obtained for the ‘eleven UXO class detection problem’ showed that the Precision was higher than 98% (with the lowest value of 98.3% for YOLOv5n and the highest value of 98.8% for YOLOv5s), and the Recall was equal to 100% for all five YOLOv5 versions. Such a Recall value indicates that our computational method detected all UXOs in the testing images. The mAP at threshold 0.5 was equal to 99.5% for all architectures, while the mAP averaged over thresholds between 0.5 and 0.95 (step size 0.05) varied between 87.0% (YOLOv5n) and 90.5% (YOLOv5l). In summary, YOLOv5l proved to have the highest detection accuracy, and YOLOv5s produced the lowest number of False Positives.
It can be seen from the results (see Table 4) that increasing the capacity of our Convolutional Neural Network did not improve the effectiveness significantly. Quite similar effectiveness was, thus, achieved with smaller and larger architectures (even with one 48 times larger), which is an important insight, especially if our solution is to be integrated into devices with limited computing power. The opposite was noticed when applying the YOLOv5 model on the COCO dataset (see Table 2), where significantly better results were obtained with larger models.
The effectiveness of the YOLOv5 model on the UXOTi_NPA dataset was higher by almost half than on the COCO dataset. Undoubtedly, one reason is that the UXOTi_NPA is smaller and has fewer classes than the COCO dataset. At the same time, the searched objects in UXOTi_NPA had a greater visual similarity (e.g., textures) than the more diverse objects in the COCO dataset. We presume that low-dimensional feature vectors are sufficient to describe UXOs in thermal images, which is why the YOLOv5 architectures performed well, even with a small capacity.
Let us analyse the training of YOLOv5 on the UXOTi_NPA dataset. Our dataset is relatively small, so overfitting is an imminent threat. We mitigated this problem by initialising the weights of our architectures with the weights of the YOLOv5 model pre-trained on the COCO dataset. It should be emphasised that no layers were frozen: all weights of the initialised architectures were updated during training, i.e., the models were fine-tuned end-to-end rather than used as fixed feature extractors. Naturally, the initialisation of synaptic weights with a pre-trained model has a beneficial influence on training. The training time is reduced, and, at the same time, the training is less dependent on the correct setting of the learning rate (this rate can be lower and also partly inaccurate). A lower learning rate means smaller adaptations of synaptic weights, resulting in solutions that are located in the vicinity of the solutions of the pre-trained model. This limits the risk of overfitting greatly. Additionally, augmentation was employed during the training of our CNNs (see Table 3). The training was performed for 300 epochs. We took the model after the last epoch as the final result. For demonstration purposes, we have shown in Appendix A the trend of the loss functions with respect to the epoch for our smallest YOLOv5n architecture (see Figure A1, Figure A2 and Figure A3). Similar behaviour was noticed for the other, larger architectures and when solving the single UXO class detection problem (see Figure A4, Figure A5 and Figure A6). We noticed that the graphs became almost flat around the 60th epoch. This suggests that the training time could be shortened by a factor of about five.
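The observation about the flattening loss curves can be turned into a simple check. The following Python sketch is a hypothetical illustration, not part of our training pipeline: the function `first_flat_epoch`, the window length, and the relative tolerance are assumptions chosen for the example. It returns the first epoch at which the loss has improved by less than a given relative tolerance over the preceding window of epochs, and it also reproduces the arithmetic behind the estimated reduction in training time.

```python
def first_flat_epoch(losses, window=5, rel_tol=0.01):
    """Return the first epoch whose loss improved by less than rel_tol
    (relative to the loss `window` epochs earlier), or None if no such epoch exists."""
    for epoch in range(window, len(losses)):
        previous = losses[epoch - window]
        improvement = previous - losses[epoch]
        if improvement < rel_tol * abs(previous):
            return epoch
    return None


def speedup(total_epochs, flat_epoch):
    """Factor by which training could be shortened if stopped at flat_epoch."""
    return total_epochs / flat_epoch


# 300 training epochs with curves flattening around epoch 60 give a factor of 5.
estimated_factor = speedup(300, 60)
```

In practice, such a criterion would be applied to a validation loss rather than a training loss, so that stopping early does not hide overfitting.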
We also studied the effectiveness of our computational method on the single UXO class detection problem. This means that the samples from all 11 classes of the UXOTi_NPA dataset were treated as one single class during training and evaluation. In this case, the Precision increased, as expected, for all architectures (up to 99.4% for YOLOv5l) compared to the same metric calculated for the eleven UXO class detection problem. The same applies to the average mAP at thresholds from 0.5 to 0.95 (increased up to 91.5%). On the other hand, the Recall and the mAP at threshold 0.5 remained unchanged with respect to the results of the eleven class problem. For security and military applications (e.g., mine clearance), a solution that distinguishes between different UXOs is, of course, more advantageous. The ‘single UXO class’ solution comes into play in cases where the user needs to be warned (alerted) about areas with impending danger.
Detection accuracy on the UXOTi_NPA dataset by using our approach is relatively high already (see Table 4, Table 5 and Table 6). However, the average mAP metrics at thresholds ranging from 0.5 to 0.95 (mAP@0.5:0.95) indicate that there is still enough room for improvements in follow-up studies (i.e., at higher IoU thresholds).
5. Conclusions
In this paper, we have shown that the combination of deep Convolutional Neural Networks and thermal imaging can be used advantageously to detect UXOs in a real environment. The state-of-the-art object detector YOLOv5 was adapted successfully to recognise and localise UXOs in thermal images captured by UAVs. Our solution uses Deep Learning to detect and recognise UXOs, similar to [15]. However, there are important differences between the two approaches. Our algorithm uses thermal images and a YOLO model, while [15] performs detection using Faster R-CNN in the visible spectrum. The biggest difference is that our algorithm is adapted for the eleven UXO class detection, while the solution in [15] only detects PFM-1 land mines (i.e., a single UXO class).
The public UXOTi_NPA dataset with thermal images of UXOs from eleven classes was published as part of this study. Our computational method was verified comprehensively on this dataset. The obtained metrics can also be considered as baseline results for the UXOTi_NPA dataset. We believe that this evaluation dataset will encourage future research in this area, because, to the best of our knowledge, there are no publicly available datasets with annotated thermal images of land mines and other UXOs. When developing our detection algorithm, we did not integrate the knowledge about UXOs and the thermal spectrum explicitly, but this knowledge was provided implicitly through the training set. From this point of view, our approach is general and applicable to any problem of object detection from (thermal) images.
Further research that introduces multimodality into the UXO detection process is planned; namely, we want to merge data from RGB visual sensors and thermal cameras. At the same time, we want to analyse the importance of the RGB and thermal spectrum statistically. An experiment with thermal imaging material that would be captured at higher altitudes is also foreseen. The UXOTi_NPA database size is rather limited in its first version, and we plan to expand this database with new thermal images with UXOs, preferably from different geographical areas. We also want to upgrade the verification protocol by introducing K-fold validation, where K will be set to 3 or 4.
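The planned K-fold protocol can be sketched as follows. This is a minimal illustration under stated assumptions, not the protocol itself: it assumes that all 808 images are pooled and split at the image level, and the function name `k_fold_indices` and the fixed random seed are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split sample indices into k folds and return (train, validation) index lists,
    one pair per fold, after a reproducible shuffle."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most one
    splits = []
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, validation))
    return splits


# With 808 images, K = 4 gives four validation folds of 202 images each.
splits_k4 = k_fold_indices(808, 4)
```

For K = 3 the 808 images do not divide evenly, so one validation fold holds 270 images and the other two hold 269 each. Since several UXO classes usually appear in one image, a class-aware (stratified) split would need additional care.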