Application of the Infrared Thermography and Unmanned Ground Vehicle for Rescue Action Support in Underground Mine—The AMICOS Project

Abstract: Extraction of raw materials, especially in extremely harsh underground mine conditions, is inherently associated with a high risk of accidents. Natural hazards, the use of heavy-duty machines, and other technologies may result in an accident even when operations are perfectly organized. In such critical situations, rescue actions may require advanced technologies such as autonomous mobile robots and sensory systems including gas detectors, infrared thermography, image acquisition, and advanced analytics. In this paper, we describe several scenarios related to rescue actions in underground mines, under the assumption that the search for victims must account for potential hazards such as seismic events, gas, and high temperature, which make the activities of a rescue team in such areas highly risky. This work reports the results of testing, in an underground mine, a UGV robotic system developed within the AMICOS project. The system consists of a UGV with a sensory system and an image processing module based on an adaptation of the You Only Look Once (YOLO) and Histogram of Oriented Gradients (HOG) algorithms. The experiment was successful, and the human detection efficiency was promising. Future work will involve testing the AMICOS technology in deep copper ore mines.


Introduction
Underground mining operations are inextricably linked to various natural hazards that pose an increasing risk to the health and life of underground workers. As the depth of excavation and the concentration of mining processes grow, the significance of natural hazards, namely the occurrence of poisonous gases, climatic hazards, roof collapse risk, etc., will increase, creating a demand for new tools for safety investigation and new methods for rescue actions. The machinery used in deep underground mines also contributes to these hazards: as its power increases, it radiates more heat and emits more exhaust fumes. Increased temperature and humidity are harmful to the health of underground mine workers. They may disrupt many body functions, causing health issues such as weakness, fainting, cramps, vomiting, dehydration, or heart attacks [1]. Because the area of active workings is vast, with widely dispersed workplaces, and because sectors with insufficient air supply or unacceptable concentrations of dangerous gases may occur after an accident, there is a need to develop robotics for rescue operations and safety assessment. Automation and technical advancement clearly contribute to increased efficiency, allowing raw materials to be extracted profitably even as easily accessible ore deposits are depleted. However, their potential should also be used to the fullest to make miners' workplaces safer until fully remote or autonomous operation of whole mines becomes possible.
A teleoperated or autonomous robot equipped with, for instance, basic first-aid supplies, an additional oxygen apparatus, and water can be sent to dangerous zones to locate a person in danger and provide help without exposing the rescue team to undue risk.
Many examples of the application of mobile robots to rescue actions can be found in the literature (they are also discussed later). In [2], an example of small, high-powered robots designed for rescue actions related to collapsed buildings and traffic accidents is presented. The five robots described in [3] comprised a fire brigade-aid system capable of detecting people, providing an overview of the accident site, performing some handling tasks, and creating virtual three-dimensional maps of underground or inside-building areas. In [4], the unsuccessful use of unmanned ground vehicles at mudslides was studied, which contributed to the identification of some crucial problems to be faced when implementing such technology in new domains. Strictly mining-related implementations of rescue robotics can be found in [5]. The authors highlighted some significant features of the underground environment to be taken into consideration when designing robots for this purpose and, based on the similarities with other subterranean applications, presented thirty-three requirements for the design of an effective rescue solution.
To perform a rescue operation or an investigation of underground workings, the robot has to be equipped with an appropriate mobile platform, steered by a control system and adapted to underground conditions (humidity-resistant, able to move on uneven ground), and preferably lightweight to enable long battery-driven operation (Section 3.1). It must be equipped with reliable sensors, for example, an infrared camera for human detection and a depth camera enabling navigation, indoor localization of the robot, and recognition of a human based on image processing (Section 3.3). Additionally, if possible, a localization system based on an underground wireless communication network should be included. Where such a network is absent or temporarily out of service, data storage has to be ensured, with the possibility of quick retrieval and interpretation when the UGV returns from a mission.
This paper presents the design of a remotely controlled UGV for rescue action support in underground environments and the results of field tests of its human detection functionality. The method of detection, based on open-access algorithms, uses infrared (IR) thermovision for so-called "hot-spot detection" and RGB (colors visible to the human eye) depth images to verify human presence. The structure of the paper is as follows. First, we recall some important work in this dynamically developing area; then we describe the proposed inspection robot for image-based human detection. The inspection robot has been validated in the inactive underground gold and arsenic mine "Zloty Stok" in Poland. Several potential scenarios have been defined and executed. The acquired inspection data have been processed using adapted image processing algorithms, YOLO and HOG, developed in [6,7] and [8], respectively. The conclusions close the paper.

State-of-the-Art
What had only been an ambitious vision, limited to prototypes struggling with limited computational power and other technical problems (e.g., supplying sufficient energy for prolonged operation while maintaining a relatively light and mobile form), started to flourish after the first reported use of robotics for urban search and rescue, the operations following the World Trade Center disaster [9,10].
Murphy [11] presented a brief tutorial on the application of unmanned vehicles for urban health and rescue, with a focus on problems related to human-robot interaction. In addition to the grounded domain theory consisting of workflow and information flow models, helpful for the analysis of human-robot interaction, some issues to be faced in preparing skilled operators were underlined. In [12], tools for data association and navigation in uneven surroundings were presented, which made it possible to create 3D maps with satisfactory accuracy. The robot tested by the authors managed to perform fully autonomous image acquisition in some of the corridors of an abandoned mine, where the path was relatively simple to navigate. Much research effort has been devoted to enabling autonomous operation of robots, a crucial part of which is obstacle avoidance. A specific example of an obstacle avoidance system designed for mine rescue robots can be found in [13], which presents the structure of a sensor system and a fuzzy reasoning-based obstacle avoidance technique. A rescue multi-robot system was presented in [14]; it used a set of node robots to create a communication path based on the Zigbee standard. To improve the negotiation of obstacles, an articulated snake-like design was suggested for one of the robots. In the project record paper of the Center for Mining Innovation in South Africa from 2012 [15], the so-called Mine Safety Platform was introduced. Both tracked and legged robots were tested in order to identify the design criteria for a reconfigurable robot whose role could be to inspect hanging walls after blasting (from the safety point of view) or to assess the gold grade in mining faces. The problem of visualizing data in the most interpretable and understandable way was also raised in that work.
The authors of [16] focused on the navigation of rescue robots in underground mining, contributing an improved algorithm capable of fusing multisensor data. A Back Propagation neural network (BPNN) was proposed as a tool for improving the matching degree of data from various sensors, and a combination of the Extended Kalman Filter (EKF) with the BPNN was shown to be effective in solving the mismatching problem in data fusion of the classic EKF approach. A few significant examples of UGVs used in underground mines were presented and analyzed in [17], with the aim of deriving functional requirements for this type of technology. The article in [18], which studies the development of mechatronics and information and communication technology in the mining industry, provides examples of both inspection mobile robots for use in catastrophe-affected areas of mines and robots that can perform some basic labor. The review article in [19] drew conclusions from the analysis of multiple rescue robotics applications in Chinese coal mines, highlighting technological, market, and policy license limitations and problems to be solved in order to create a practical solution. An example of recent local research at the Wrocław University of Science and Technology can also be given. The authors of [20] proposed applying an Unmanned Ground Vehicle (UGV) equipped with RGB and thermal imaging cameras and a localization system to the maintenance of belt conveyors. The main task of the robot presented in that paper was to identify overheated idlers based on the detection of so-called "hot-spots"; however, it was suggested that the functionality could be extended by equipping it with gas and environmental parameter sensors (temperature, humidity, air velocity).
More examples of the implementation of robotics in the raw materials industry can be found in [21], where legged robots for inspection purposes are presented, among others, and in [22][23][24], which give examples of micro-aerial vehicles.
It is of high importance to find an appropriate technique for the first step of the human detection process, which is the generation of so-called "regions of interest" (ROI) or "hot-spots". The simplest, but least flexible, way of doing this is to establish an intensity threshold that distinguishes a warmer person from colder surroundings. A few possible solutions providing more flexibility and increased accuracy of ROI generation are mentioned below. In [25], instead of using rigid intensity thresholds, the authors generated regions of interest by combining image segments, using the low-frequency features of infrared images. This made the method more flexible and less dependent on brightness. Another example can be found in [26], where a statistical tool, adaptive intensity-oriented projection, was presented. Another approach to the identification of candidate regions was proposed in [27]. The shape-independent procedure used in this case utilizes vertical/horizontal variance curves.
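The simplest approach mentioned above, a fixed intensity threshold, can be illustrated with a minimal sketch. It is not taken from any of the cited works: the function name `hot_spot_roi`, the toy frame, and the 25 °C threshold are assumptions chosen only for illustration.

```python
from typing import List, Optional, Tuple

def hot_spot_roi(image: List[List[float]],
                 threshold: float) -> Optional[Tuple[int, int, int, int]]:
    """Return the bounding box (top, left, bottom, right) enclosing all
    pixels warmer than `threshold`, or None when no pixel exceeds it."""
    rows = [r for r, row in enumerate(image) if any(v > threshold for v in row)]
    cols = [c for row in image for c, v in enumerate(row) if v > threshold]
    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))

# Toy 4x5 thermal frame in degrees Celsius (illustrative values only).
frame = [
    [12.0, 12.5, 12.1, 12.0, 11.9],
    [12.2, 30.5, 33.0, 12.3, 12.0],
    [12.1, 31.0, 34.2, 12.4, 12.1],
    [12.0, 12.2, 12.3, 12.1, 12.0],
]
roi = hot_spot_roi(frame, threshold=25.0)
```

A single rigid threshold of this kind merges all warm pixels into one box and depends on the ambient temperature, which is the inflexibility that the methods in [25], [26], and [27] try to avoid.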
Real-time human recognition has been intensively developed in the last two decades, mainly for military applications and the autonomous car industry. In [28], a solution for detecting and tracking pedestrians with a single digital infrared camera was shown. In this case, a support vector machine (SVM) was used to determine whether the objects initially detected as hot-spots are humans. In [29], a 3D Laser Detection and Ranging (LADAR)-based detection and tracking strategy is presented, whose discriminating ability relies on fusing data from multiple line and 3D scanners. In order to distinguish human from non-human objects, the Strength of Detection (SOD) measure was used. An example of a detection system module for the recognition of pedestrians, together with some suggestions for improving the evaluation of human presence in both nocturnal and daylight environments, can be found in [30]. The authors of [31] highlighted the need to combine high-resolution imagery with range data in order to increase the performance of detection systems. In a further work devoted to a system enabling the detection, localization, and tracking of humans for military UGVs [32], the authors showed a method to extract shape features from segmented stereo vision scenes. A review of infrared camera-based feature detection methods for military UGVs, including human recognition, can be found in [33]. Other detectable objects that are significant for the navigation of such vehicles are other vehicles, water and mud, tree trunks, and narrow negative obstacles. The authors underlined an important benefit of IR-based feature perception, namely the ability to perceive through obscurants, which is extremely important for rescue actions in underground environments.
A feature descriptor called the Histogram of Oriented Gradients (HOG), used together with a support vector machine to localize moving humans, was presented in [34]. The authors of [35] contributed to increasing the real-time efficiency and accuracy of IR pedestrian detection methods by proposing three improvements: the introduction of two candidate filtering units to decrease the probability of false alarms; an improved head identification method based on brightness and gradient magnitude; and a multiframe approval matching rule to obtain a higher detection rate. Another example of an advanced fast detection system, which can be used to identify humans (among 9000 object categories), was presented in [6]. The authors focused on improving the localization ability and recall of the previous version of the so-called "you only look once" (YOLO) framework. To better distinguish a person from the background, an adaptive Boolean map-based saliency model was later appended, which improved the detection of pedestrians in thermal images [36]. In one of the most recent works, the authors of [37] presented an advanced pedestrian detection method using a convolutional neural network, accurate in various weather conditions, and compared it with five other common methods.

UGV Platform: A Brief Description
The mobile platform's drive system comprises two 24 V, 250 W direct current motors and an integrated gearbox of cylindrical type. An additional gear wheel is mounted on the motor shaft, which, together with the one on the drive shaft, provides a gear ratio of 1.12, further increasing the torque and reducing the speed. The drive shaft is supported by two bearings, which join it to the frame. Each wheel is driven separately by an individual DC motor. The position of each wheel is measured by an encoder mounted on its axle. On this basis, the distance traveled by the robot is determined (see Figure 1).
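How a traveled distance can be derived from wheel encoders may be sketched as follows. The encoder resolution and wheel diameter are not given in the text, so `TICKS_PER_REV` and `WHEEL_DIAMETER_M` are hypothetical values, and averaging the two wheels is a common simplification rather than the documented AMICOS method.

```python
import math

TICKS_PER_REV = 1024      # hypothetical encoder resolution (ticks per wheel turn)
WHEEL_DIAMETER_M = 0.30   # hypothetical wheel diameter in metres

def wheel_distance(ticks: int) -> float:
    """Distance rolled by one wheel for a given encoder tick count."""
    return ticks / TICKS_PER_REV * math.pi * WHEEL_DIAMETER_M

def robot_distance(left_ticks: int, right_ticks: int) -> float:
    """Approximate path length of the robot centre as the mean of both wheels."""
    return 0.5 * (wheel_distance(left_ticks) + wheel_distance(right_ticks))

travelled = robot_distance(left_ticks=1024, right_ticks=2048)
```

Because the encoder is described as mounted on the wheel axle, the gear ratio does not enter this calculation; it would if the encoder were on the motor shaft.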

Control System
At the current phase of the project, the UGV platform is controlled by a remote controller (see Figure 2).
To control the robot, the signals are transmitted wirelessly using one of the 2.4 GHz band channels. The operator controls the robot by observing the area in front of it in the image provided by the image transmission module (Vid) at 5.8 GHz. The Pulse Position Modulation (PPM) signal is decoded in the microcontroller (uC) and transformed into speed and direction signals for the individual motors. It should be highlighted that, in parallel, the AMICOS (Autonomous MonitorIng and COntrol System for mining plants) consortium is working on fully autonomous mobility to fulfill the requirements defined in the project.
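The conversion of decoded PPM channels into per-motor commands can be sketched as below. The paper does not specify the mapping, so this assumes the common hobby radio convention of pulse widths between 1000 and 2000 µs with 1500 µs as neutral, and a simple differential-drive mix; the function names are hypothetical, and the firmware is not written in Python.

```python
from typing import Tuple

def normalize(pulse_us: float) -> float:
    """Map a pulse width in microseconds (assumed 1000..2000, neutral 1500)
    to a command in the range [-1, 1]."""
    value = (pulse_us - 1500.0) / 500.0
    return max(-1.0, min(1.0, value))

def mix(throttle_us: float, steering_us: float) -> Tuple[float, float]:
    """Combine throttle and steering channels into (left, right) motor
    commands for a differential-drive platform, clamped to [-1, 1]."""
    t = normalize(throttle_us)
    s = normalize(steering_us)
    left = max(-1.0, min(1.0, t + s))
    right = max(-1.0, min(1.0, t - s))
    return left, right

straight = mix(2000, 1500)   # full throttle, neutral steering
spin = mix(1500, 2000)       # neutral throttle, full steering
```

The sign of each command gives the direction of rotation and its magnitude the speed, which matches the "speed and direction signals for the individual motors" described above.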

Sensory System
The detection module contains two vision sensors: one RGB video camera and one infrared thermovision camera. The parameters of the RGB camera were selected so that the angle of view of both cameras is similar. Both devices were mounted on a rotating bracket which, by performing a swing movement, made it possible to acquire images in front of the robot and on both of its sides. The rotational speed was smoothly adjusted with a dedicated module. In addition, a stereo-vision camera was attached at the front of the vehicle to provide additional spatial information in the form of depth estimation (see Figure 3). The area in front of the robot was illuminated with 48 W LED lamps attached to a stand located behind the cameras (Figure 4). An additional camera mounted on the mobile platform served for observation of the surface in front of the vehicle. The vision data were transmitted wirelessly to an external remote control panel. The block diagram of the sensory system, together with its power supply, is shown in Figure 5.

The "Zloty Stok" Gold and Arsenic Underground Historic Mine
The Złoty Stok gold and arsenic mine is an inactive mining site located in Góry Złote (Eng: Golden Mountains; Cz: Rychlebské hory; Ger: Reichensteiner Gebirge) in the Eastern Sudetes (see Figure 6). The gold and arsenic deposit is located within the northern part of the Złoty Stok-Skrzynka Tectonic Zone, which contacts the younger Carboniferous granitoid rocks of the Kłodzko-Złoty Stok Massif from the west. The main rock types in the northwestern part of this mountain range in Poland are syenites, amphibolites, and crystalline limestones. In the southeastern part, metamorphic rocks such as crystalline shales and gneisses predominate, accompanied by some effusive rocks (basalts).
Figure 6. The map of the Złoty Stok ore mining area (from [38]). Legend: 1, mining area; 2, limestones; 3, ore nests; 4, mining waste heaps; 5, slants, galleries and adits; 6, shafts.
The rocks hosting the ore deposit are of polymetamorphic type with a well-developed blastomylonitic series [39]. The exploitation of the ore began in the thirteenth century and lasted until the early 1960s. It was the oldest Polish gold mine and a significant provider of arsenic oxide, which became its main product from the 18th century [38]. Many of the underground corridors, located in a strongly cracked and deformed massif, were excavated manually and, in the later years of exploitation, with the use of explosives. As mining activities developed, the exploitation of the ore concentrated in four mining fields: the Western Field (I), at the foot of Mount Haniak; the Biała Góra field (II), farthest south of the town of Złoty Stok; the Góra Krzyżowa/Eastern field (III), on the western hillsides of the Złoty Jar valley; and the Góra Sołtysia field (IV), east of the valley (see Figure 6).
The orebody was accessed gradually, from above, through a system of shafts and adits (see Figure 7). In 1920, the two-kilometer-long Gertruda adit was built, connecting all the mining fields. During World War II and in the period shortly after, the mine was closed, and most of the adits and chambers were flooded and buried. Then, after about 30 years, some elements of the infrastructure, including adits, were made available to tourists. One of them is the above-mentioned Gertruda adit, and the second is the so-called Sztolnia Czarna (the Black Adit), whose entrance is located in the upper part of the Złoty Jar valley. It gives access to former exploitation chambers and is connected with a 300 m long underground corridor located at the lower level, which has its exit near the Gertruda adit.

The Scenarios of Experiments
Since the underground mine environment is complex, rich in elements of wall and roof support, with uneven corridor walls and other local changes in the cross sections of the excavations, distinguishing a human from the background is a challenging task. In an emergency, an injured or scared person is very likely not to be standing and may be partially covered or hidden in a recess. Taking these issues into consideration, the experiment covered multiple cases of human detection, including those with a non-obvious human-surroundings layout. The method of human detection was tested on the cases shown in Figure 8a-d. Results are presented in Section 6.

General Concept
The first element of the proposed algorithm is the efficient detection of the human. The second significant issue is ensuring the method's immunity to interference and obtaining an unambiguous classification. In order to increase the certainty of human localization, a fusion of sensors and object detection methods has been used. Additionally, rotation of the input data has been applied to allow human detection in emergency situations when the person is lying down. A machine learning approach, namely pretraining of the classifier, would probably be more appropriate; however, at the current stage of testing, it was difficult to obtain enough images to prepare the reference data.

HOG Algorithm
As the first step of the vision-based human recognition process, the Histogram of Oriented Gradients (HOG) feature descriptor has been employed, which is available in the OpenCV library [40]. The method involves computing the gradient orientation in each section of the image, which is divided by a dense grid of evenly distributed cells (detection windows/ROIs) [8]. HOG-based algorithms are used, among other things, for pedestrian detection [41,42], including applications based on thermovision digital images [43]. The sequence of operations performed to detect objects with the HOG descriptor is presented in the right dashed frame in Figure 9. The image acquired by the camera can be subjected to preprocessing: normalization, brightness and contrast correction, or gamma correction. In the first phase of the experiment, no preprocessing was performed, in order to evaluate the performance of the algorithm on the raw data.
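The core of the descriptor, an orientation histogram computed per cell, can be illustrated with a simplified pure-Python sketch. It is not the OpenCV implementation used in the experiment: it assumes 9 unsigned orientation bins, centred-difference gradients, and hard binning without the interpolation and block normalization of the full method, and the function name `cell_histogram` is hypothetical.

```python
import math
from typing import List

def cell_histogram(cell: List[List[float]], bins: int = 9) -> List[float]:
    """Gradient-orientation histogram of one grayscale cell.

    Orientations are unsigned (0..180 degrees); each interior pixel votes
    into a single bin with a weight equal to its gradient magnitude."""
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]   # horizontal gradient
            gy = cell[y + 1][x] - cell[y - 1][x]   # vertical gradient
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist

# 8x8 cell with a vertical edge: dark left half, bright right half.
cell = [[0, 0, 0, 0, 10, 10, 10, 10] for _ in range(8)]
hist = cell_histogram(cell)
```

For the vertical edge in this toy cell, all gradient energy falls into the first bin (orientations near 0°), which is the kind of shape cue a classifier placed on top of the descriptor relies on.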

YOLO Algorithm
For the detection and further recognition of the human, a neural network-based approach can also be used, for instance, the recently intensively upgraded real-time object detection system You Only Look Once (YOLO) [7]. YOLO is a real-time object detection system that uses a wide range of ideas to achieve the results of standard fast convolutional neural networks (CNN) not only with equal efficiency but also faster and with less input needed. In contrast to most classifiers, which are trained and used on smaller resolutions (256 × 256), YOLO can be trained on a resolution of 224 × 224, which is increased to 448 × 448 for detection purposes; this alone gives better results in terms of mAP (mean Average Precision) [6].
In addition, YOLO shows high flexibility in the resolution used for training, allowing multi-scale training, which makes it possible to train on datasets with a wide range of resolutions. In this way, the final network can be customized to suit different tasks. For example, lower resolutions make most sense for systems that have little computing power or that focus on the analysis of multiple streams. At the opposite end of the spectrum, higher resolutions give better detection accuracy while retaining a computing speed suitable for real-time detection.
Other ideas, such as the use of anchor boxes for bounding box prediction and the use of more tall, thin boxes than short, wide ones, allow the model to start with a better representation, so the network learns patterns faster. This is directly connected to learning through dimension clusters instead of hand-picked priors [44].
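The idea of dimension clusters can be sketched as a k-means procedure over box widths and heights in which similarity is measured by intersection over union (IoU) rather than Euclidean distance. The sketch below is an illustration under simplifying assumptions (boxes aligned at a common corner, deterministic initialization from the first k boxes, toy data); it is not the training code of YOLO.

```python
from typing import List, Tuple

Box = Tuple[float, float]  # (width, height)

def iou_wh(a: Box, b: Box) -> float:
    """IoU of two boxes that share the same corner, so only size matters."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def cluster_priors(boxes: List[Box], k: int, iterations: int = 20) -> List[Box]:
    """k-means on box sizes, assigning each box to the centroid with the
    highest IoU; centroids start from the first k boxes (naive choice)."""
    centroids = list(boxes[:k])
    for _ in range(iterations):
        groups: List[List[Box]] = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            groups[best].append(box)
        centroids = [
            (sum(b[0] for b in g) / len(g), sum(b[1] for b in g) / len(g))
            if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

# Toy data: three tall, thin boxes and three short, wide ones.
boxes = [(10, 30), (30, 10), (12, 36), (36, 12), (11, 33), (33, 11)]
priors = cluster_priors(boxes, k=2)
```

With person-dominated data, a clustering of this kind would be expected to yield mostly tall, thin priors, which is consistent with the remark above.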
Performance, mostly in terms of accuracy and flexibility, is further improved by hierarchical classification, which also allows multiple datasets to be merged. Labels in the tree follow the ImageNet labels, which are drawn from WordNet, a lexical database describing relations between concepts [45]. The latest version of the algorithm (YOLOv3) further improves the already established methods and introduces further changes, for example in bounding box and class prediction [7].
This system uses a pretrained model to classify an object as one of the thousands of classes available in its sets [46]. Both pretrained models and models trained on one's own set of images can be used; however, the training process demands large datasets and is time-consuming.
In this work, a pretrained model based on the YOLOv3 algorithm was used for a task in a novel environment, namely human detection in an underground mine. The model uses a 416 × 416 input and was trained on the COCO dataset, which allows detection of 80 different, mostly dynamic, object classes. This allows testing on an already proven network, eliminating the possibility of errors arising from training a new one. At the same time, the model was not trained with this environment in mind, which can lead to lower accuracy.

Decision-Making
It turned out that even such powerful algorithms may produce false detections or fail to detect the object at all. The idea in this paper is to combine the information from four sources: the results of RGB-based and IR-based human detection, each obtained by the two algorithms. It is proposed to use Formula (1), which provides the final decision as a weighted sum of partial decisions:
$$C = \sum_{i=1}^{4} W_i D_i \qquad (1)$$
where $W_i$ are the weights, and $D_1$ and $D_3$ are the results of processing by the YOLOv3 algorithm of the IR and RGB image, respectively. Similarly, $D_2$ and $D_4$ are the results of processing by the HOG algorithm of the IR and RGB image, respectively. The weights corresponding to YOLOv3 have higher values because the empirical analysis indicates that it returns correct results more frequently. A similar idea is applied to the data source: IR images are more likely to reveal a human in difficult lighting conditions, so configurations using the IR source are weighted higher. These rules are summarized in the weight matrix presented in Table 1. It should be noted that the results obtained by the HOG algorithm are in binary form (0, 1), while YOLOv3 returns a percentage. In the presented procedure, the percentage values are transformed into a binary decision: a value below 50% is set to 0; otherwise, it is set to 1.

Table 1. Weights assigned to every configuration (columns: YOLO, HOG).
Classifier C defined in Equation (1) takes a value between 0 and 8 and can be interpreted as follows.
• If C ≥ 5, the procedure recognizes the detected object as a human.
• If C ∈ [3, 4], the procedure recognizes the detected object as probably a human.
• If C ∈ [1, 2], the result of the procedure cannot be classified. The results of the procedure are transmitted to the remote operator, who makes the decision.
• If C = 0, the procedure has not detected any object.
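The decision rule can be sketched in a few lines, assuming Formula (1) is a weighted sum of the four binary partial decisions. The values of Table 1 are not available in this text, so the weights below are hypothetical: they were chosen only to satisfy the stated constraints (YOLOv3 weighted above HOG, IR weighted above RGB, and C ranging from 0 to 8). The function names are likewise illustrative.

```python
# Hypothetical weights, NOT the values of Table 1: they only respect the
# stated ordering (YOLOv3 > HOG, IR > RGB) and sum to the maximum C = 8.
WEIGHTS = {"yolo_ir": 3, "hog_ir": 2, "yolo_rgb": 2, "hog_rgb": 1}

def binarize_yolo(confidence_percent: float) -> int:
    """Turn a YOLOv3 confidence (in percent) into a 0/1 decision at 50%."""
    return 1 if confidence_percent >= 50 else 0

def classifier_c(yolo_ir_pct: float, hog_ir: int,
                 yolo_rgb_pct: float, hog_rgb: int) -> int:
    """Weighted sum of the four partial decisions D1..D4."""
    return (WEIGHTS["yolo_ir"] * binarize_yolo(yolo_ir_pct)      # D1
            + WEIGHTS["hog_ir"] * hog_ir                          # D2
            + WEIGHTS["yolo_rgb"] * binarize_yolo(yolo_rgb_pct)   # D3
            + WEIGHTS["hog_rgb"] * hog_rgb)                       # D4

def interpret(c: int) -> str:
    """Map the classifier value to the four outcomes listed above."""
    if c >= 5:
        return "human"
    if c >= 3:
        return "probably human"
    if c >= 1:
        return "unclassified, refer to operator"
    return "no object"

label = interpret(classifier_c(yolo_ir_pct=40, hog_ir=1, yolo_rgb_pct=60, hog_rgb=0))
```

In this example YOLOv3 misses the person in the IR image, yet the agreement of HOG on IR and YOLOv3 on RGB still yields C = 4 under the assumed weights, that is, "probably human".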

Results
A few specific cases with different positions of the human body were considered in order to obtain a realistic testing set for the image analysis algorithms (YOLOv3 and HOG). The algorithms were set up to work on the input data from both the IR and RGB cameras. The detection scenarios are presented in this section with the RGB camera image on the left and the IR camera image on the right. The results are presented consistently in the same order (the YOLOv3 algorithm first, HOG second). Standing people, facing the camera or turned away from it, crouching people, and partially covered bodies were detected with both methods or with one of them. The results of the experiment are presented in the form of figures in which, in the case of successful detection, the human distinguished from the background is enclosed in a rectangle.

• Human standing, front view. Figures 10 and 11 present the results of detection of a standing person in the front view. Both algorithms properly detect the standing person in the IR image. Unfortunately, when the RGB image is used, only the YOLO algorithm is successful; HOG was not able to detect the human in this case.
• Human standing, rear view. Figures 12 and 13 show the results of detection of a standing person in the rear view. In this case, regardless of the image type used (RGB or IR), correct human detection was obtained.
• Human lying, case 1. Figures 18 and 19 present the detection of a lying person (case 1). On the IR image, the YOLO algorithm detected the human, whereas HOG gave a false detection: it generated an ROI that does not contain a human. Notably, on the RGB image YOLO did not detect the human, while HOG identified two ROIs (both containing the human).
• Human lying, case 2. Figures 20 and 21 show the results of detection of a lying person (case 2). Both algorithms properly detected ROIs in images of both types (RGB and IR). However, these results are somewhat ambiguous because the algorithms have been trained to detect humans in the most typical position (standing). For a lying person, the algorithms did not provide proper results on the unrotated image. To overcome this problem, the methodology presented in this paper includes rotating the image by 90°, 180°, and 270°. Such a transformation makes detection possible. The HOG algorithm pointed out the human; however, the identified ROI does not fully cover the object.
It is possible that the algorithms will not properly recognize the human even when the IR image is used, on which the person's presence is obvious to an observer (see Figure 22). We have found that basic preprocessing consisting of contrast and brightness adjustments or histogram equalization may help (see Figure 23).
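The rotation step used for lying persons can be sketched as a loop that retries a detector on the image rotated by 0°, 90°, 180°, and 270°. The detector here is a hypothetical stand-in (a callable returning True or False); in the experiment this role is played by YOLOv3 and HOG. Images are represented as plain nested lists to keep the sketch self-contained.

```python
from typing import Callable, List, Optional

Image = List[List[int]]

def rotate90(image: Image) -> Image:
    """Rotate a 2D image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def detect_with_rotations(image: Image,
                          detector: Callable[[Image], bool]) -> Optional[int]:
    """Return the first clockwise rotation angle (0, 90, 180, 270) at which
    the detector fires, or None if it never does."""
    rotated = image
    for angle in (0, 90, 180, 270):
        if detector(rotated):
            return angle
        rotated = rotate90(rotated)
    return None

# Stand-in detector: fires only on "upright" inputs (taller than wide).
def upright_only(img: Image) -> bool:
    return len(img) > len(img[0])

lying = [[0, 9, 9, 0],
         [0, 9, 9, 0]]          # 2 rows x 4 columns: a "lying" silhouette
angle_found = detect_with_rotations(lying, upright_only)
```

The returned angle would also be needed to map a detected ROI back to the coordinates of the original frame, a step omitted here.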
The result of detection depends on many factors, including lighting and the distance between the object and the camera. YOLO's neural network generally detects a human with a certainty of 85-100 percent. However, it may also produce a false detection (or no detection).
As presented in Figure 24, RGB image analysis may also yield a misleading ROI. To prevent such situations, the final decision on classifying an object should not be based on a single frame; instead, a sequence of images should be loaded and analyzed to confirm the detection event. Moreover, the classifier C proposed in Section 5.4 takes into consideration the results of both algorithms and both types of digital images to improve the reliability of the decision.
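Confirmation over a sequence of frames can be sketched as a sliding-window vote. The window length and the number of required positive frames are hypothetical parameters, and the class name `DetectionConfirmer` is illustrative; the paper does not specify how the sequence should be evaluated.

```python
from collections import deque

class DetectionConfirmer:
    """Confirm a detection only when at least `required` of the last
    `window` frames contained a positive per-frame result."""

    def __init__(self, window: int = 5, required: int = 3) -> None:
        self.history = deque(maxlen=window)  # oldest frames drop out automatically
        self.required = required

    def update(self, detected: bool) -> bool:
        """Add the newest per-frame result and return the confirmed state."""
        self.history.append(bool(detected))
        return sum(self.history) >= self.required

confirmer = DetectionConfirmer(window=5, required=3)
per_frame = [True, False, True, True, False, False, False]
confirmed = [confirmer.update(d) for d in per_frame]
```

A single spurious ROI therefore does not trigger a report, while a person seen in most recent frames does.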

Conclusions
In order to increase the certainty of identification, the method should be based not on an instantaneous detection event but on a sequence of frames. Currently, it is assumed that the robot will provide information about the location of an injured or unconscious human in the form of markers on its known path when it returns to the base. It can be equipped with basic first-aid equipment, an additional oxygen apparatus, and water, which may significantly increase the survival time before the rescue team can reach the place of the accident. The rescue and help functionality can be upgraded in the future by means of a wireless communication path, created by dropping RFID or Bluetooth nodes along the way to maintain a constant connection with the base, which would enable real-time reporting of human detection.
The AMICOS UGV robotic system for supporting rescue actions in deep underground mines has been presented in this paper. The mechanical part, control system, and sensory system are an original concept developed in the AMICOS project. The procedure for human detection is based on the YOLOv3 and HOG image processing algorithms. These algorithms have been adapted and combined into a decision-making procedure that allows humans to be detected in RGB/IR images. A classifier has been proposed that combines the results of both algorithms with different weights in order to obtain higher detection reliability. Additionally, several scenarios that may happen in a real mine have been presented. The system has been tested in the inactive underground mine "Zloty Stok" in southwest Poland. The trial was successful, and the detection ability is promising ahead of tests in an active deep underground copper mine. The authors would like to recall that this is a continuation of already published work focused on hot-spot detection for belt conveyor maintenance.
The image classification systems used are freely available tools trained to detect pedestrians in everyday settings (street, home, etc.). During the experiments conducted in the scope of this work, they were not able to detect a lying person; thus, the authors decided to rotate the image, which significantly improved the detection ability. The use of an existing neural network model made it possible to avoid a complicated training process based on underground mining-related data.
Future work will comprise the collection of data for training the image classification system. This requires numerous pictures of humans in various positions in the underground environment and a time-consuming labeling process. Moreover, autonomous operation is planned, possibly with simultaneous 3D mapping of excavations.