Abstract
Technological advancements have facilitated the development of sophisticated vision systems, integrating optical sensors with artificial vision and machine learning techniques to create applications in different fields of robotics. One such field is Search and Rescue (SAR) robotics, which has historically played a significant role in assisting brigades following post-disaster events, particularly in exploration phases and, crucially, in victim identification. The importance of employing these systems in victim identification lies in their functionality under challenging conditions, enabling the capture of information across different light spectrum ranges (RGB, Thermal, Multispectral). This article proposes an innovative comparative analysis that scrutinizes the advantages and limitations of three sensor types in victim detection. It explores contemporary developments in the state-of-the-art and proposes new metrics addressing critical aspects, such as functionality in specific scenarios and the analysis of environmental disturbances. For the indoor and outdoor testing phase, a quadrupedal robot has been equipped with these cameras. The primary findings highlight the individual contributions of each sensor, particularly emphasizing the efficacy of the infrared spectrum for the thermal camera and the Near Infrared and Red Edge bands for the multispectral camera. Ultimately, following system evaluations, detection precisions exceeding 92% and 86%, respectively, were achieved.
1. Introduction
The past decade has witnessed significant technological advancements in perception systems, with a particular focus on vision-based systems applied in the domain of field robotics. Notably, a considerable impetus has been observed in applications focused on vision sensors, as highlighted in [1,2,3]. These kinds of sensors leverage different light spectrum ranges to generate environmental data, thereby facilitating the extraction of valuable information essential for robotic mission applications. Sensors incorporated in RGB, thermal, and multispectral cameras are the most widely utilized in outdoor robotics, particularly the first two sensors, which are commonly applied in the Search and Rescue (SAR) domain [4,5]. In contrast, the latter (multispectral) exhibits extensive applicability in precision agriculture [6,7]; recent studies have demonstrated its utility in SAR robotics.
Within the spectrum of applied advancements in SAR, the contemporary state of the art directs attention toward perspectives primarily centered on the general identification of people [8,9,10]. One of the principal goals inherent in Search and Rescue missions is to optimize the preservation of lives within the briefest conceivable timeframe. The initial hours after a disaster are characterized by heightened criticality, with the likelihood of locating survivors being at its peak during this period. Nonetheless, owing to the abrupt occurrence of these events, achieving the full preparedness of rescue teams for prompt deployment within a few hours and at a designated location poses a formidable challenge. Rescue teams have saved victims even ten days after a catastrophe, as exemplified by the earthquake in Turkey in February 2023. Notably, three individuals were rescued alive 248 h after the event, as reported by CNN [11]. Furthermore, the challenging conditions in post-disaster environments impede the visual identification of victims during the initial assessment by a first-responder. These adverse conditions encompass scenarios such as victims being entirely or partially concealed by debris, the presence of immobile or unconscious victims, low or no illumination in the surroundings [12], potential gas leaks, and electrical hazards [13].
In the state of the art, metrics have been proposed for validating SAR missions, such as the one described by Jacoff et al. [14] for RoboCupRescue, which involves parameters such as the number of located victims, the number of robots and operators involved, and the accuracy, following Equation (1). The current metrics for assessing the quality of a detection method primarily rely on the number of individuals found [15]. In the works by Katsamenis et al. [16] and De Oliveira et al. [17], the authors present approaches related to person detection but do not establish a comparison or specific metrics for validating the quality and implication of sensors in detections. While current methods score the missions considering victim detections, the state of the art lacks a methodology to evaluate and compare the factors influencing this detection, especially one that analyzes the three types of visual sensors (RGB, Thermal, and Multispectral).
Combining different sensory sources has shown high effectiveness in extracting relevant information from analyzed elements, as seen in the work by Corradino et al., where they combine radar and optical satellite imagery to map lava flows [18]. It is noteworthy in the state-of-the-art literature that using RGB–thermal cameras is prevalent. Yet, the incorporation of multispectral bands simultaneously for this purpose has not been explored. This work takes as a basis the foundation for victim detection based on the works conducted by the authors in the domains of RGB [19], Thermal [20,21], and Multispectral [22].
2. Related Works
2.1. Context and Historical Evolution
The utilization of visible information captured by cameras is pivotal across various domains of robotics and other scientific disciplines for the analysis of processes, states, and decision-making. Presently, in search and rescue robotics, a substantial portion of the utilized information originates from the visible spectrum of light (RGB images) and a segment from the infrared spectrum (thermal images). However, the multispectral spectrum has been relatively underexplored in this realm of robotics, particularly in identifying victims during search and rescue missions.
Figure 1 shows three distinct spectral ranges of light, specifically RGB [400 nm–700 nm], thermal infrared [8 µm–14 µm], and multispectral (Red Edge [690 nm–730 nm] and Near-Infrared (NIR) [750 nm–2.5 µm]), as described previously.
Figure 1.
Ranges of light in the visible and non-visible spectrum. Source: Authors.
Notably, the human-perceptible visible light spectrum (RGB range) is considerably narrower than the infrared or ultraviolet spectra, which are perceptible by certain animals such as snakes or insects. The multispectral range, whose widespread application has so far been in agriculture, is a particularly distinctive choice for victim detection. The selection of these spectral ranges is informed by the anticipated directions of future research, as delineated in contemporary studies, and aligns with the specifications of commercially available portable data acquisition instruments (cameras) compatible with robot payloads of less than 10 kg.
Scientific research on RGB, Thermal, and Multispectral sensors within search and rescue operations calls for a comprehensive examination of the research works published over the past two decades. A literature search was conducted using the online database Google Scholar to generate a chronological graph depicting the evolution of publications (Figure 2). The graph identifies three types of histograms: yellow, grey, and blue. These histograms were generated as follows: the search criteria for each sensor type, defined by colour, were the respective phrases 'Multispectral sensors in search and rescue', 'Thermal sensors in search and rescue', and 'RGB sensors in search and rescue'. An annual search was performed in the specific interval field; for example, for the year 2000, the interval considered was [1999–2000], generating 23 data points for each sensor type. These data were entered into the histogram generation function of Microsoft Excel for visual analysis and the corresponding graphical representation, resulting in Figure 2. Simultaneously, second-degree polynomial trend curves were fitted to characterize the increasing trend in the evolution of the analyzed works.
Figure 2.
Evolution of publications over the last two decades in the context of RGB/Thermal/Multispectral sensors for search and rescue. Source: authors.
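As an illustration of the trend-fitting step, the following minimal sketch fits a second-degree polynomial curve to yearly publication counts with NumPy; the count values shown are hypothetical stand-ins for the tabulated search results, which were processed in Microsoft Excel.

```python
import numpy as np

# Hypothetical yearly publication counts for one sensor type (the real values
# came from the annual Google Scholar searches described above).
years = np.arange(2000, 2023)
counts = np.array([3, 4, 6, 7, 9, 12, 14, 18, 22, 27, 33, 40, 48, 57,
                   68, 80, 94, 110, 128, 148, 170, 195, 222])

# Second-degree polynomial trend, as applied to each sensor histogram.
coeffs = np.polyfit(years, counts, deg=2)
trend = np.polyval(coeffs, years)
print("Trend coefficients (a, b, c):", coeffs)
```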
During the early 2000s, the scientific output in these three fields was notably modest. However, a remarkable exponential growth trend has been observed in the last two decades. This surge in research activity can be attributed to a confluence of factors, with a significant emphasis on the scientific community’s growing interest in exploring applications related to search and rescue. Additionally, advancements in technology, particularly in the realm of cost reduction for both sensors and data processing equipment, have played a pivotal role in fostering this upward trajectory.
A particular observation pertains to the distribution of research output among the three sensor types. Thermal-sensor-related works (represented in grey in Figure 2) stand out as the most abundant, surpassing studies in the RGB (blue bar) domain by a substantial ratio of over 2:1. Meanwhile, studies in the RGB field outnumber those related to multispectral images (orange bar) by approximately 20%. This distribution can be rationalized a priori, considering the challenging conditions in SAR environments, especially poor or nonexistent illumination. Consequently, the materials’ thermal-emission-based information emerges as a special resource for rescuers, facilitating initial inferences regarding environmental conditions, leak detection, victim identification, and other critical aspects.
2.2. Vision Sensors in Search and Rescue Robotics
In this section, the most relevant developments in the state of the art related to the three types of sensors applied to identifying victims in SAR robotics are compiled, as detailed in Table 1.
Table 1.
Comparison of state-of-the-art methods for victims/people SAR tasks using vision sensors.
3. Materials and Methods
3.1. Robotic System and Processing
A quadrupedal robot equipped with hardware–software instrumentation was employed to develop the mission indoors and outdoors. This instrumentation enables both data collection and onboard processing using the ROS framework. The utilized instruments are outlined in Table 2, providing details on the robotic system’s characteristics and specifications for the cameras employed in the process. Specifically, the spectral ranges of light acquisition for these cameras are described.
Table 2.
Materials for the proposed system implementation.
The detection of victims relies on convolutional neural network models (Thermal [20,21], RGB [19], Multi-spectral [22]) from the authors’ previous developments. These models have been integrated into the proposed system to generate a centralized and robust detection by fusing three inferences and a subsequent comparative analysis.
Figure 3 illustrates the developed structure, which includes a Command Station for monitoring the robot’s navigation process. This station sends various user-defined points of interest for navigation, and the robot explores these points reactively. Reactive navigation involves reaching the designated point, capturing three types of data, processing the images through the three convolutional neural network models to generate predictions, establishing a redundant filtering mechanism across the three systems, determining valid detections, and subsequently placing indicators on the map generated by the robot during the mission’s progression.
Figure 3.
Layout of subsystems connections. Source: authors.
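To make the redundant filtering step concrete, the sketch below shows one possible fusion of the three per-sensor inference outputs at a point of interest; the detection structure, the confidence threshold, and the two-out-of-three voting rule are illustrative assumptions rather than the exact decision rule of the implemented system.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str          # e.g. "victim", "victim_leg"
    confidence: float   # CNN confidence score in [0, 1]

def redundant_filter(rgb, thermal, multispectral, threshold=0.5, min_votes=2):
    """Fuse the three per-sensor detection lists into a single decision.

    A point of interest is marked as a valid victim detection when at least
    `min_votes` of the three sensors report a victim-related label above
    `threshold` (illustrative two-out-of-three voting rule).
    """
    votes = 0
    for detections in (rgb, thermal, multispectral):
        if any(d.label.startswith("victim") and d.confidence >= threshold
               for d in detections):
            votes += 1
    return votes >= min_votes

# Usage example with hypothetical inference results at one waypoint.
rgb = [Detection("victim_torso", 0.62)]
thermal = [Detection("victim", 0.91)]
multi = [Detection("rescuer_leg", 0.70)]
print(redundant_filter(rgb, thermal, multi))  # True: two sensors agree
```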
3.2. Field Tests
Indoor and outdoor environments with conditions akin to a post-disaster setting were explicitly delineated as test scenarios in which diverse victim-identification missions were executed to undertake the proposed comparison. Within these environments, individuals simulating victims were strategically positioned.
Outdoor experiments were conducted in collaboration with the Spanish Military Emergency Unit (UME) during the “XVI Jornadas Internacionales de la Universidad de Málaga sobre Seguridad, Emergencias y Catástrofes”. Together with the organization, and leveraging their expertise, we meticulously recreated environmental conditions, allowing for the extrapolation of results to realistic situations. Additionally, the standards for such experiments were dictated by the National Institute of Standards and Technology (NIST) [41], which was also considered for indoor tests, defining parameters such as the type of debris, ground obstacles, etc. The experiments were conducted repetitively under the same environmental conditions to minimise error variation for each evaluation aspect.
In the first scenario, a series of victims were strategically placed in a tunnel, requiring identification by various participating robotic teams. Subsequently, these identifications were verified through specialized canine units. Figure 4a illustrates a panoramic view of the tunnel exit, featuring emergency response teams, a helicopter, and a vehicle positioned for other simulation exercises. On the other hand, Figure 4b depicts an indoor scenario within the Center for Automation and Robotics facilities, where primary phase system validation tests were conducted before outdoor experiments.
Figure 4.
Experimental environments developed indoors and outdoors. Source: authors.
3.3. Algorithms and Evaluation Metrics
To evaluate the proposed method, a system was devised around a series of missions in which the proposed victim detection system was exercised through the three proposed vision modes. The outcome aims to maximize the accurate identification of victims in an explored area, placing potential "refined" locations on an informed environmental map. In this manner, the markings serve as reference points for first responders to prioritize their attention to those areas.
3.3.1. Implemented Algorithm
The implemented algorithm governing the system operates on a sequential modular synergy for decision-making. In the first step, the operator-defined destination points were taken, and collision-free trajectories were calculated using an RRT (Rapidly Exploring Random Tree) planner. Position estimation was obtained through sensor fusion, combining data from the Inertial Measurement Unit (IMU) and the lidar system. At this stage, the robot navigated through the environment to capture images.
Localization was based on SLAM with an EKF (Extended Kalman Filter) that fuses lidar and IMU data, since GPS cannot accurately estimate positions under such (indoor) conditions. The assignment of inspection points (x, y) was carried out through the interface of the informed map generated (described in Figure 3). Navigation to each point was autonomous, with dynamic obstacle avoidance facilitated by the lidar system. Path planning, given known starting and destination points and the environment map, employed an RRT planner and the A* search algorithm.
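The listing below sketches a grid-based A* search of the kind used for path planning between known start and destination points; the binary occupancy grid, unit cell costs, and 4-connected neighbourhood are simplifying assumptions (the deployed planner combines RRT and A* on the SLAM map).

```python
import heapq

def astar(grid, start, goal):
    """A* on a 2D occupancy grid (0 = free, 1 = obstacle).

    `start` and `goal` are (row, col) tuples; the Manhattan distance is used
    as an admissible heuristic for the 4-connected neighbourhood.
    """
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_set = [(h(start), start)]
    came_from = {start: None}
    g_cost = {start: 0}
    while open_set:
        _, node = heapq.heappop(open_set)
        if node == goal:                      # reconstruct the path
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        r, c = node
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nb
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g_cost[node] + 1
                if ng < g_cost.get(nb, float("inf")):
                    g_cost[nb] = ng
                    came_from[nb] = node
                    heapq.heappush(open_set, (ng + h(nb), nb))
    return None  # no collision-free path exists

grid = [[0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))
```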
The vision systems, in turn, undergo a series of stages, from data acquisition to the identification of victims. Raw images are captured and preprocessed using computer vision techniques to eliminate environmental noise and enhance their quality before entering the neural network. For this purpose, techniques such as erosion, dilation, and Gaussian filtering were applied to the images. In the case of multispectral images, the bands are matrix-combined using an operation defined by the Victim Detection Index (VDIN). Once processed by the CNN, detection results, such as bounding boxes, labels, and precision scores, are incorporated into each resulting image. These processed data are stored in logs to generate the final mission percentages at the end of the operation.
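A minimal OpenCV sketch of this preprocessing chain is shown below; the kernel sizes and the band combination used to stand in for the Victim Detection Index (VDIN) are illustrative placeholders, since Equation (2) from the authors' earlier work is not reproduced here.

```python
import cv2
import numpy as np

def preprocess(image, kernel_size=3):
    """Noise suppression applied before CNN inference: erosion, dilation,
    and Gaussian filtering (kernel sizes are illustrative)."""
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    out = cv2.erode(image, kernel, iterations=1)
    out = cv2.dilate(out, kernel, iterations=1)
    return cv2.GaussianBlur(out, (5, 5), 0)

def vdin(red, green, nir, eps=1e-6):
    """Placeholder band combination standing in for the Victim Detection
    Index of Equation (2); a normalized-difference form is assumed here
    purely for illustration."""
    red, green, nir = (band.astype(np.float32) for band in (red, green, nir))
    index = (nir - red) / (nir + red + green + eps)
    # Rescale to 0-255 so the result can be handled like the other images.
    return cv2.normalize(index, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
```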
Pseudocode Algorithm 1 delineates the functionality of the implemented system, starting from the exploration phase through zones designated by the operator, the acquisition of visual data from the environment, the processing of three types of images, and the evaluation of system performance.
3.3.2. CNN-Based Algorithm
In this section, the CNN-based detection algorithm is introduced. For the evaluation phase of the proposed method, an embedded subroutine was developed within Algorithm 2. This subroutine identified victims in the three types of images using convolutional neural networks. Specifically, the YOLOv8 architecture was employed due to its notable features: high inference speed and a high precision rate in detection. The 'm' version of YOLOv8 was utilized, chosen for its balance between accuracy and inference time compared to the 'n' and 's' versions. While the 'x' and 'l' versions marginally increase accuracy, their processing time hinders real-time inference.
As a preliminary step to detection, a preprocessing phase was conducted on the multispectral images. Although the multispectral camera also outputs RGB channels, for the purposes of this study only the channels relevant to the victim detection index (Red, Green, Near Infrared) from Equation (2), as previously proposed by the authors in [22], were utilized. Both the thermal camera and the multispectral index produce a single-channel output image with intensities normalized to the range 0–255, unlike RGB, which combines three channels; since the network requires three input channels, in these two cases the single channel was replicated three times before being fed to the CNN.
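The sketch below illustrates the channel replication and a typical Ultralytics YOLOv8-m inference call; the weight file name, the stand-in image, and the confidence threshold are assumptions for illustration.

```python
import numpy as np
from ultralytics import YOLO  # pip install ultralytics

def to_three_channels(single_channel):
    """Replicate a single-channel (thermal or VDIN) image into the
    three-channel layout expected by the network."""
    return np.repeat(single_channel[:, :, np.newaxis], 3, axis=2)

# Hypothetical file name; each model is one of the trained YOLOv8-m weights.
model = YOLO("victims_thermal_yolov8m.pt")
frame = to_three_channels(np.zeros((640, 640), dtype=np.uint8))  # stand-in image

results = model(frame, conf=0.5)           # single-image inference
for box in results[0].boxes:
    print(box.cls, box.conf, box.xyxy)     # class id, confidence, bounding box
```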
Algorithm 1: Victim detection and robotic exploration system.
The pre-training stage of the three convolutional neural network models involved data preprocessing. In this process, images were labelled according to the list below, using multiple labels to account for scenarios where victims may be partially obscured. In the case of rescuers, an additional label for legs was included, considering the robot's limited field of view.
- Victim, victim leg, victim torso, victim arm, victim head;
- Rescuer, rescuer leg.
Before training, data augmentation was performed to enhance the model's robustness against disturbances commonly encountered in outdoor environments. This involved applying transformations to the images, specifically rotation (20%), brightness modification (30%), and contrast adjustment (30%). The datasets were divided into training (70%), validation (20%), and test (10%) sets, available in the repositories: RGB (https://drive.upm.es/s/xPKDp5Xyh1HTHWA, accessed on 14 January 2024) (total = 1454, 2064 × 1544 px), Multispectral (https://drive.upm.es/s/xPKDp5Xyh1HTHWA, accessed on 14 January 2024) (total = 1454, 2064 × 1544 px), and Thermal 1 (https://mega.nz/fm/26RjCCiQ, accessed on 14 January 2024)–Thermal 2 (https://drive.upm.es/s/xPKDp5Xyh1HTHWA, accessed on 14 January 2024) (total = 3750, 1920 × 1080 px). The neural network training parameters were set to 190 epochs, a batch size of 4, and 63,000 iterations. The newly trained models were used to infer on new images following Pseudocode Algorithm 2.
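For reference, a minimal training sketch with these hyper-parameters is given below; the dataset descriptor path, the input image size, and the use of the Ultralytics training interface are assumptions (the augmentation percentages quoted above were applied when building the datasets rather than in this call).

```python
from ultralytics import YOLO

# Start from the pretrained medium model and fine-tune on the victim classes.
model = YOLO("yolov8m.pt")

# Hypothetical dataset descriptor listing the 70/20/10 train/val/test splits
# and the victim/rescuer labels defined above.
model.train(
    data="victims_dataset.yaml",
    epochs=190,     # as reported in the text
    batch=4,        # batch size used for training
    imgsz=640,      # assumed input resolution after resizing
)
metrics = model.val()   # evaluate on the validation split
```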
Algorithm 2: CNN-based algorithm.
To evaluate the trained models, confusion matrices were employed to highlight the per-class detection precision. Figure 5 shows the three confusion matrices obtained by evaluating the trained models. The key finding is that the main diagonals exhibit high values, confirming the correct functionality of the models, with the Thermal, Multispectral, and RGB models ranked in descending order of effectiveness.
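A brief sketch of how such matrices can be computed from per-class predictions is given below; scikit-learn is assumed, and the label lists are placeholders rather than the evaluation data of Figure 5.

```python
from sklearn.metrics import confusion_matrix

labels = ["victim", "victim_leg", "victim_torso", "rescuer", "background"]

# Hypothetical ground-truth and predicted classes collected over a test set.
y_true = ["victim", "victim_leg", "rescuer", "victim", "background"]
y_pred = ["victim", "victim_leg", "rescuer", "victim_torso", "background"]

cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)  # rows: true classes, columns: predicted classes
```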
Figure 5.
Confusion matrix for CNN-trained models. Source: Authors.
3.3.3. Proposed Metrics for Method Analysis
A set of metrics is proposed, as described in Table 3, to assess the effectiveness of each vision system in victim detection. Following the state of the art and the authors’ framework, these metrics encompass the primary evaluation criteria in the field of search and rescue for each system individually, addressing both functional conditions and the general performance of the proposed detection system.
Table 3.
Quantitative metrics proposed for the evaluation of vision systems.
In a general setting, a direct evaluation of the three types of sensors is proposed for the generic conditions present in diverse environments, incorporating coefficients normalized to [0–100] according to functional efficacy, as delineated by Equation (3) (Thermal–RGB–Multispectral). Given the generality of this application framework, the time parameter was not considered in this instance, and the coefficients relating to victims were applied generically to objects.
In a SAR-oriented approach, on the other hand, the systems are evaluated concerning their correct functionality in each envisaged scenario. Specific parameters, such as processing time, are penalized because time is a critical parameter in exploration, whereas others, such as robustness under changing light conditions, contribute more significantly to the total score. Likewise, critical situations, such as the identification of a concealed victim, are of particular interest, as is the ability to identify victims in poor lighting conditions, a recurrent situation in post-disaster environments.
The expressions encapsulating the weighting relationship for the coefficients are summarized in Equation (4) for each type of image detection. The modified coefficients were established following the conducted experiments, assigning greater weight to those deemed more pertinent in Search and Rescue operations.
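As an illustration of how the generic and SAR-specific scores of Equations (3) and (4) can be aggregated, the sketch below applies an unweighted sum for the generic case and non-uniform weights for the SAR case; the coefficient names and weight values are placeholders, not the exact factors of Table 3.

```python
# Normalized [0-100] coefficients for one sensor (values are placeholders).
coefficients = {
    "covered_victim": 90, "poor_light": 95, "changing_light": 80,
    "heat_sources": 40, "indoor": 85, "outdoor": 82,
    "camouflage_clothing": 75, "processing_time": 60,
}

def generic_score(coeffs):
    """Equation (3)-style score: unweighted sum of the coefficients."""
    return sum(coeffs.values())

def sar_score(coeffs, weights):
    """Equation (4)-style score: SAR-specific weighting, in which processing
    time penalizes the total and robustness terms contribute more (the
    weights shown are illustrative, not the published ones)."""
    return sum(weights.get(name, 1.0) * value for name, value in coeffs.items())

sar_weights = {"poor_light": 1.5, "covered_victim": 1.5,
               "changing_light": 1.3, "processing_time": -0.5}
print(generic_score(coefficients), sar_score(coefficients, sar_weights))
```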
4. Results and Discussion
4.1. Mission Execution in Indoor–Outdoor Environments
To evaluate this study, the proposed metrics, and the overall system presented, a series of missions were conducted indoors and outdoors, as synthesized in Figure 6. This figure illustrates various instances of traversal in different environments over distinct time intervals, along with perspectives on the progression of the data collection phases.
Figure 6.
Indoors–outdoors exploration process at the University of Malaga and Center for Automation and Robotics facilities. Source: authors.
Figure 6b–e depict the traversal sequence for the indoor scenario, spanning from time 0 to a median time of 200 s. A blue circle denotes the robot’s position at each instant. In this scenario configuration, 20 tests were conducted, with five victims distributed throughout, including covered and partially covered cases. On the other hand, Figure 6e–h showcase the tunnel exploration carried out during the “XVI Jornadas Internacionales de la Universidad de Málaga sobre Seguridad, Emergencias y Catástrofes”. This exercise was executed once in a single pass due to the logistical complexity involved. The figures highlight the individual explorations conducted by ARTU-R and Spot robots.
Figure 6i,k provide the perspective captured by a UAV during the mission’s development. In contrast, Figure 6j,l depict the corresponding thermal footprints captured by the aerial vehicle. Finally, Figure 6m,p illustrate the robot capturing images of a person acting as a victim for processing. These images were acquired from various perspectives assigned by the operator.
4.2. System Performance Evaluation
4.2.1. Evaluation of Victim Identification in SAR Missions Performed
Figure 7 illustrates the detections conducted in the missions of the scenarios presented in Figure 6. The figure highlights a notably high precision rate, exceeding an average of 87%, for identifying both victims and first responders.
Figure 7.
Evaluation of three vision methods for victim detection in different scenarios. Source: authors.
Figure 7a,d showcase the detection results with a high average precision rate exceeding 91% for indoor scenarios, employing thermal and multispectral images. Notably, the system demonstrates the effective detection of torsos and legs. On the other hand, Figure 7b,c,e pertain to outdoor scenarios, emphasizing bounding boxes that highlight various body parts of identified victims, achieving a precision rate exceeding 50%.
4.2.2. Individual Evaluation of Systems Using the Proposed Metrics
The metrics outlined in Table 3 have been evaluated based on victim detection quality data in the conducted missions. The evaluation considers the mean precision values (mAP) for inferences under the ten specified conditions across repetitions of missions indoors and subsequent evaluations outdoors.
Figure 8 compiles the normalized percentage values from 0 to 100 obtained for each of the three types of images, according to the ten indices established in this study. This figure provides an intuitive and straightforward overview of the strengths of each image type relative to its counterparts. Significant differentials were observed, particularly in scenarios involving fully covered victims and poor light conditions, where thermal cameras exhibit a considerable advantage. In contrast, in aspects such as heat sources or summer weather conditions, RGB and multispectral images are better alternatives.
Figure 8.
Comparative radial graph of the different coefficients that evaluate the indication of mission success for each light spectrum range. Source: authors.
Concerning indoor, outdoor, and changing light conditions, the three cameras’ mean effectiveness values remain within a similar range. However, for scenarios involving partially covered victims or those wearing “camouflage” clothing that may be confused with the surroundings, the ranges differ moderately, with differences of up to 20 per cent.
The rounded mean values corresponding to the radial Figure 8, relative to the coefficients of situational analysis for environmental conditions, are synthesized in Table 4 under the Metrics Analysis section. Here, the results of Equation (3) are also presented, representing the values for each image type in a generic sense and specifically according to Equation (4) for the Search and Rescue case.
Table 4.
Metrics obtained from the coefficients proposed for the type of image in the three ranges of the light spectrum.
The highest to lowest calculated scores for generic detection scenarios in exploration missions are Thermal 755, Multispectral 714, and RGB 662. The variation between extremes is 93 points. A similar situation arises with the SAR Score, where the order remains unchanged. Still, the specific incidence difference for Search and Rescue is much higher (with a difference of up to 222 points), considering the weighting factors for the proposed coefficients.
In all three cases, as well as for experiments conducted indoors and outdoors, the best individual performer is the thermal range for victim detection, followed by multispectral and RGB.
4.2.3. Combined Evaluation of Systems Using the Proposed Metrics
Figure 9 presents a boxplot diagram of the percentage values of victim detection for each configuration of the proposed coefficients. The most significant differences were observed for the coefficients related to fully covered victims and poor light conditions, highlighting their pronounced impact on victim identification. In contrast, the coefficients associated with changing light conditions and indoor settings exhibit a minimal influence on victim identification with any of the camera types.
Figure 9.
Evaluation of the errors in the mean values of the proposed coefficients in Figure 8. Source: authors.
These findings underscore the sensitivity of the detection outcomes to variations in specific coefficients, particularly those associated with environmental and contextual considerations. The coefficients related to victim concealment, lighting conditions, and the presence of heat sources emerge as critical factors influencing the performance of victim detection algorithms, suggesting the need for the careful consideration and optimization of these parameters in the designed system.
On the other hand, the evaluation of processing time, measured in frames per second (fps) for the three cases using the YOLOv8-m version, shows a performance of approximately 26 to 28 fps for inference through the convolutional neural network (CNN) on thermal and RGB images, respectively. Meanwhile, the multispectral range runs at 8 fps, attributable to the preprocessing and the size of the images across the different spectral ranges. Although the processing times in the first two cases approach so-called real-time processing, the last case is an exception. To compensate for this latency, a pause sequence is executed, during which the robot remains stationary in the sample acquisition zone to gather and process data, effectively absorbing the delay.
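The sketch below shows one way to measure the inference rate and trigger the stationary acquisition pause when throughput falls below real time; the 10 fps threshold and the robot interface methods are assumptions, not part of the reported system.

```python
import time

def measure_fps(infer, frames):
    """Average frames per second of an inference callable over a batch."""
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    return len(frames) / (time.perf_counter() - start)

def acquire_at_waypoint(robot, infer, frames, realtime_fps=10.0):
    """If inference is slower than the assumed real-time threshold (e.g. the
    multispectral pipeline at ~8 fps), keep the robot stationary until the
    batch has been processed, then resume navigation."""
    if measure_fps(infer, frames[:5]) < realtime_fps:
        robot.stop()                 # hypothetical robot interface
    detections = [infer(f) for f in frames]
    robot.resume()                   # hypothetical robot interface
    return detections
```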
Although individual victim detection systems tend to perform well in specific cases, the results demonstrate that a system cannot be generalized for all scenarios. Therefore, once the individual impact of each coefficient is known, it is necessary to establish a robust and redundant system for victim detection. Figure 10 illustrates the maximized outcome of the areas covered by the implemented redundant system, reaching a success rate of up to 93.5% (following Table 4) in potential search and rescue scenarios.
Figure 10.
Approach to the maximized area, according to the best values of the proposed coefficients in Figure 8. Source: authors.
4.2.4. Discussion
The approach to victim detection, together with the development of a substantial comparison among the three types of images for this purpose, has allowed the establishment of criteria for examining the strengths and weaknesses of each image type under specific functionality conditions. While the systems individually exhibit acceptable functionality under particular conditions, the proposed synergy and selective use of image types based on environmental conditions result in a detection accuracy exceeding 93%, as observed in the conducted experiments.
In this context, the optimal functionality conditions for the different sensors are as follows: thermal sensors perform well under changing light conditions, detect individuals regardless of clothing colour, and effectively identify partially or fully covered victims in low-light conditions. Multispectral sensors demonstrate strength in scenarios involving heat sources in summer and fire conditions, both indoors and outdoors. Finally, RGB sensors excel in processing time.
Regarding the state of the art, no similar work has been found that compares the three image types by evaluating environment metrics and functionality. Most approaches to victim detection in search and rescue environments rely on thermal cameras, owing to the applicability of computer vision techniques for segmenting the thermal signature of a victim from the environment, in addition to the commercial availability of equipment oriented to this task. However, there are significant advances in the use of multispectral cameras to detect biometric characteristics, which may prove useful for feature extraction oriented to victim detection in the non-visible spectrum in post-disaster environments.
Likewise, almost all existing works focus on integrating sensors in UAVs operating from a top-view plane and working on offline datasets. Along these lines, references [22,38] propose a different approach based on detecting and localizing victims online from UGVs in navigation areas with low-light conditions. Table 1 provides a detailed overview of the most relevant works and presents a comparison of the methods and techniques employed in their development.
5. Conclusions
In the context of this study, it is deduced that the spectral ranges of light with the most significant impact on the detection of victims are primarily Green (GRE), Red (RED), Near-Infrared (NIR), and Infrared. Regarding the former two, their significance lies in their substantial contribution of environmental information. On the other hand, the latter two are noteworthy for their ability to capture information that eludes human visual perception.
The comparative analysis undertaken for victim detection across the three spectral ranges has facilitated the identification of key parameters with varying degrees of impact on system precision. Notably, the coefficients indicating a totally covered victim, poor light conditions, and the presence of heat sources have emerged as influential factors significantly affecting victim identification. In contrast, the coefficients associated with changing light conditions and indoor settings exhibit minimal impact on identifying victims with any of the camera types. The prominent influence of these identified factors accentuates the critical importance of careful consideration and optimization during system design.
Regarding thermal imaging dominance under low-light conditions, these images outperform RGB and multispectral methods in victim detection, leveraging their reliance on thermal footprints. The method detects individuals hidden by slender obstructions, demonstrating superior obstacle penetration compared to RGB-based techniques.
Whereas thermal imaging relies on a single measurement spectrum, the broader spectral combinations of the Multispectral method enhance its adaptability and robustness in diverse scenarios. Additionally, the Multispectral method demonstrates robust detection in challenging conditions, such as scenarios involving less distinctive clothing, where RGB methods face limitations. In scenarios with heat sources or incidents like fires, the combined use of the RGB and Multispectral methods offers a substantial advantage, surpassing the performance of the thermal method.
As prospective avenues for future research, exploring potential novel indices could be contemplated through integrating the diverse bands captured by the three types of sensors. These indices could be iteratively generated through repetitive loops, leveraging their operational versatility as matrices. Alternatively, optimization and error minimization techniques, facilitated by machine learning, could be applied to extract distinguishing elements from the new images. This approach aims to assess their relevance in the detection of victims.
Author Contributions
Conceptualization, A.B., C.C.U., D.O. and J.d.C.; methodology, A.B. and C.C.U.; software, C.C.U. and D.O.; validation, C.C.U. and D.O.; formal analysis, C.C.U. and A.B.; investigation, C.C.U., A.B., D.O. and J.d.C.; resources, A.B. and J.d.C.; data curation, C.C.U. and D.O.; writing—original draft preparation, C.C.U., D.O. and J.d.C.; writing—review and editing, C.C.U., A.B. and D.O.; visualization, A.B. and J.d.C.; supervision, A.B. and J.d.C.; project administration, A.B. and J.d.C.; funding acquisition, A.B. and J.d.C. All authors have read and agreed to the published version of the manuscript.
Funding
This research has been possible thanks to the financing of TASAR (Team of Advanced Search And Rescue Robots), funded by "Proyectos de I+D+i del Ministerio de Ciencia, Innovacion y Universidades" (PID2019-105808RB-I00), and "Proyecto CollaborativE Search And Rescue robots (CESAR)" (PID2022-142129OB-I00), funded by MCIN/AEI/10.13039/501100011033 and "ERDF A way of making Europe".
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Data are contained within the article.
Acknowledgments
We extend a special thanks to the colleagues at the LAENTIEC Laboratory at the University of Malaga who helped with some of the DJI Mavic drone shots.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| RGB | Red Green Blue |
| CNN | Convolutional Neural Network |
| ROS | Robot Operating System |
| SAR | Search and Rescue |
| IR | Infrared |
| NIR | Near-infrared |
| NIST | National Institute of Standards and Technology |
| UAV | Unmanned Aerial Vehicle |
| UGV | Unmanned Ground Vehicle |
| TASAR | Team of Advanced Search And Rescue Robots |
| FPS | Frames Per Second |
| VDIN | Victim Detection Index |
References
- Adamkiewicz, M.; Chen, T.; Caccavale, A.; Gardner, R.; Culbertson, P.; Bohg, J.; Schwager, M. Vision-Only Robot Navigation in a Neural Radiance World. IEEE Robot. Autom. Lett. 2022, 7, 4606–4613. [Google Scholar] [CrossRef]
- Wilson, A.N.; Gupta, K.A.; Koduru, B.H.; Kumar, A.; Jha, A.; Cenkeramaddi, L.R. Recent Advances in Thermal Imaging and its Applications Using Machine Learning: A Review. IEEE Sens. J. 2023, 23, 3395–3407. [Google Scholar] [CrossRef]
- Zhang, H.; Lee, S. Robot Bionic Vision Technologies: A Review. Appl. Sci. 2022, 12, 7970. [Google Scholar] [CrossRef]
- Rizk, M.; Bayad, I. Human Detection in Thermal Images Using YOLOv8 for Search and Rescue Missions. In Proceedings of the 2023 Seventh International Conference on Advances in Biomedical Engineering (ICABME), Beirut, Lebanon, 12–13 October 2023; pp. 210–215. [Google Scholar]
- Lai, Y.L.; Lai, Y.K.; Yang, K.H.; Huang, J.C.; Zheng, C.Y.; Cheng, Y.C.; Wu, X.Y.; Liang, S.Q.; Chen, S.C.; Chiang, Y.W. An unmanned aerial vehicle for search and rescue applications. J. Phys. Conf. Ser. 2023, 2631, 012007. [Google Scholar] [CrossRef]
- Deng, L.; Mao, Z.; Li, X.; Hu, Z.; Duan, F.; Yan, Y. UAV-based multispectral remote sensing for precision agriculture: A comparison between different cameras. ISPRS J. Photogramm. Remote. Sens. 2018, 146, 124–136. [Google Scholar] [CrossRef]
- Blekanov, I.; Molin, A.; Zhang, D.; Mitrofanov, E.; Mitrofanova, O.; Li, Y. Monitoring of grain crops nitrogen status from uav multispectral images coupled with deep learning approaches. Comput. Electron. Agric. 2023, 212, 108047. [Google Scholar] [CrossRef]
- AlAli, Z.T.; Alabady, S.A. A survey of disaster management and SAR operations using sensors and supporting techniques. Int. J. Disaster Risk Reduct. 2022, 82, 103295. [Google Scholar] [CrossRef]
- Karasawa, T.; Watanabe, K.; Ha, Q.; Tejero-De-Pablos, A.; Ushiku, Y.; Harada, T. Multispectral object detection for autonomous vehicles. In Proceedings of the Thematic Workshops of ACM Multimedia 2017, Mountain View, CA, USA, 23–27 October 2017; pp. 35–43. [Google Scholar]
- Sharma, K.; Doriya, R.; Pandey, S.K.; Kumar, A.; Sinha, G.R.; Dadheech, P. Real-Time Survivor Detection System in SaR Missions Using Robots. Drones 2022, 6, 219. [Google Scholar] [CrossRef]
- Haq, H. Three Survivors Pulled Alive from Earthquake Rubble in Turkey, More Than 248 Hours after Quake. 2023. Available online: https://edition.cnn.com/2023/02/16/europe/turkey-syria-earthquake-rescue-efforts-intl/index.html (accessed on 14 January 2024).
- Pal, N.; Sadhu, P.K. Post Disaster Illumination for Underground Mines. TELKOMNIKA Indones. J. Electr. Eng. 2015, 13, 425–430. [Google Scholar]
- Safapour, E.; Kermanshachi, S. Investigation of the Challenges and Their Best Practices for Post-Disaster Reconstruction Safety: Educational Approach for Construction Hazards. In Proceedings of the Transportation Research Board 99th Annual Conference, Washington, DC, USA, 12–16 January 2020. [Google Scholar]
- Jacoff, A.; Messina, E.; Weiss, B.; Tadokoro, S.; Nakagawa, Y. Test arenas and performance metrics for urban search and rescue robots. In Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No.03CH37453), Las Vegas, NV, USA, 27–31 October 2003; Volume 4, pp. 3396–3403. [Google Scholar]
- Kleiner, A.; Brenner, M.; Bräuer, T.; Dornhege, C.; Göbelbecker, M.; Luber, M.; Prediger, J.; Stückler, J.; Nebel, B. Successful search and rescue in simulated disaster areas. In RoboCup 2005: Robot Soccer World Cup IX 9; Springer: Berlin/Heidelberg, Germany, 2006; pp. 323–334. [Google Scholar]
- Katsamenis, I.; Protopapadakis, E.; Voulodimos, A.; Dres, D.; Drakoulis, D. Man Overboard Event Detection from RGB and Thermal Imagery: Possibilities and Limitations. In Proceedings of the 13th ACM International Conference on PErvasive Technologies Related to Assistive Environments, PETRA ’20, New York, NY, USA, 30 June–3 July 2020. [Google Scholar]
- De Oliveira, D.C.; Wehrmeister, M.A. Using Deep Learning and Low-Cost RGB and Thermal Cameras to Detect Pedestrians in Aerial Images Captured by Multirotor UAV. Sensors 2018, 18, 2244. [Google Scholar] [CrossRef]
- Corradino, C.; Bilotta, G.; Cappello, A.; Fortuna, L.; Del Negro, C. Combining Radar and Optical Satellite Imagery with Machine Learning to Map Lava Flows at Mount Etna and Fogo Island. Energies 2021, 14, 197. [Google Scholar] [CrossRef]
- Cruz Ulloa, C.; Garcia, M.; del Cerro, J.; Barrientos, A. Deep Learning for Victims Detection from Virtual and Real Search and Rescue Environments. In ROBOT2022: Fifth Iberian Robotics Conference; Tardioli, D., Matellán, V., Heredia, G., Silva, M.F., Marques, L., Eds.; Springer: Cham, Switzerland, 2023; pp. 3–13. [Google Scholar]
- Cruz Ulloa, C.; Prieto Sánchez, G.; Barrientos, A.; Del Cerro, J. Autonomous Thermal Vision Robotic System for Victims Recognition in Search and Rescue Missions. Sensors 2021, 21, 7346. [Google Scholar] [CrossRef]
- Ulloa, C.C.; Llerena, G.T.; Barrientos, A.; del Cerro, J. Autonomous 3D Thermal Mapping of Disaster Environments for Victims Detection. In Robot Operating System (ROS): The Complete Reference; Koubaa, A., Ed.; Springer International Publishing: Cham, Switzerland, 2023; Volume 7, pp. 83–117. [Google Scholar]
- Ulloa, C.C.; Garrido, L.; del Cerro, J.; Barrientos, A. Autonomous victim detection system based on deep learning and multispectral imagery. Mach. Learn. Sci. Technol. 2023, 4, 015018. [Google Scholar] [CrossRef]
- Sambolek, S.; Ivasic-Kos, M. Automatic person detection in search and rescue operations using deep CNN detectors. IEEE Access 2021, 9, 37905–37922. [Google Scholar] [CrossRef]
- Lee, H.W.; Lee, K.O.; Bae, J.H.; Kim, S.Y.; Park, Y.Y. Using Hybrid Algorithms of Human Detection Technique for Detecting Indoor Disaster Victims. Computation 2022, 10, 197. [Google Scholar] [CrossRef]
- Lygouras, E.; Santavas, N.; Taitzoglou, A.; Tarchanidis, K.; Mitropoulos, A.; Gasteratos, A. Unsupervised human detection with an embedded vision system on a fully autonomous UAV for search and rescue operations. Sensors 2019, 19, 3542. [Google Scholar] [CrossRef]
- Domozi, Z.; Stojcsics, D.; Benhamida, A.; Kozlovszky, M.; Molnar, A. Real time object detection for aerial search and rescue missions for missing persons. In Proceedings of the SOSE 2020—IEEE 15th International Conference of System of Systems Engineering, Budapest, Hungary, 2–4 June 2020; pp. 519–524. [Google Scholar]
- Quan, A.; Herrmann, C.; Soliman, H. Project vulture: A prototype for using drones in search and rescue operations. In Proceedings of the 15th Annual International Conference on Distributed Computing in Sensor Systems, DCOSS 2019, Santorini Island, Greece, 29–31 May 2019; pp. 619–624. [Google Scholar]
- Perdana, M.I.; Risnumawan, A.; Sulistijono, I.A. Automatic Aerial Victim Detection on Low-Cost Thermal Camera Using Convolutional Neural Network. In Proceedings of the 2020 International Symposium on Community-Centric Systems, CcS 2020, Tokyo, Japan, 23–26 September 2020. [Google Scholar]
- Arrazi, M.H.; Priandana, K. Development of landslide victim detection system using thermal imaging and histogram of oriented gradients on E-PUCK2 Robot. In Proceedings of the 2020 International Conference on Computer Science and Its Application in Agriculture, ICOSICA 2020, Bogor, Indonesia, 16–17 September 2020; pp. 2–7. [Google Scholar]
- Gupta, M. A Fusion of Visible and Infrared Images for Victim Detection. In High Performance Vision Intelligence: Recent Advances; Nanda, A., Chaurasia, N., Eds.; Springer: Singapore, 2020; pp. 171–183. [Google Scholar]
- Seits, F.; Kurmi, I.; Bimber, O. Evaluation of Color Anomaly Detection in Multispectral Images for Synthetic Aperture Sensing. Eng 2022, 3, 541–553. [Google Scholar] [CrossRef]
- Dawdi, T.M.; Abdalla, N.; Elkalyoubi, Y.M.; Soudan, B. Locating victims in hot environments using combined thermal and optical imaging. Comput. Electr. Eng. 2020, 85, 106697. [Google Scholar] [CrossRef]
- Dong, J.; Ota, K.; Dong, M. UAV-Based Real-Time Survivor Detection System in Post-Disaster Search and Rescue Operations. IEEE J. Miniaturization Air Space Syst. 2021, 2, 209–219. [Google Scholar] [CrossRef]
- Zou, X.; Peng, T.; Zhou, Y. UAV-Based Human Detection with Visible-Thermal Fused YOLOv5 Network. IEEE Trans. Ind. Inform. 2023, 1–10. [Google Scholar] [CrossRef]
- Wang, X.; Zhao, L.; Wu, W.; Jin, X. Dynamic Neural Network Accelerator for Multispectral detection Based on FPGA. In Proceedings of the International Conference on Advanced Communication Technology, ICACT, Pyeongchang, Republic of Korea, 19–22 February 2023; pp. 345–350. [Google Scholar]
- McGee, J.; Mathew, S.J.; Gonzalez, F. Unmanned Aerial Vehicle and Artificial Intelligence for Thermal Target Detection in Search and Rescue Applications. In Proceedings of the 2020 International Conference on Unmanned Aircraft Systems, ICUAS 2020, Athens, Greece, 1–4 September 2020; pp. 883–891. [Google Scholar]
- Goian, A.; Ashour, R.; Ahmad, U.; Taha, T.; Almoosa, N.; Seneviratne, L. Victim localization in USAR scenario exploiting multi-layer mapping structure. Remote. Sens. 2019, 11, 2704. [Google Scholar] [CrossRef]
- Petříček, T.; Šalanský, V.; Zimmermann, K.; Svoboda, T. Simultaneous exploration and segmentation for search and rescue. J. Field Robot. 2019, 36, 696–709. [Google Scholar] [CrossRef]
- Gallego, A.J.; Pertusa, A.; Gil, P.; Fisher, R.B. Detection of bodies in maritime rescue operations using unmanned aerial vehicles with multispectral cameras. J. Field Robot. 2019, 36, 782–796. [Google Scholar] [CrossRef]
- Qi, F.; Zhu, M.; Li, Z.; Lei, T.; Xia, J.; Zhang, L.; Yan, Y.; Wang, J.; Lu, G. Automatic Air-to-Ground Recognition of Outdoor Injured Human Targets Based on UAV Bimodal Information: The Explore Study. Appl. Sci. 2022, 12, 3457. [Google Scholar] [CrossRef]
- NIST, National Institute of Standards and Technology: Gaithersburg, MD, USA, 2022. Available online: https://www.nist.gov/ (accessed on 14 January 2024).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).