MineBL: A Battery-Free Localization Scheme with Binocular Camera for Coal Mine

Accurate localization of underground targets is a key challenge for safe production in coal mines. This paper proposes MineBL, a low-cost, battery-free localization scheme based on depth images. The main idea is to use low-cost, battery-free reflective balls as position nodes and to localize underground targets with a series of algorithms. In particular, the paper designs a data enhancement strategy based on small-target reorganization to increase the recognition accuracy of tiny position nodes. Moreover, a novel ranging algorithm based on multi-filter cooperative denoising is proposed, and an optimized weighted centroid localization algorithm based on multilateral localization errors is designed to minimize underground localization errors. Extensive experiments in indoor laboratories and underground coal mine laboratories verify that MineBL achieves good localization performance, with localization errors below 30 cm in 95% of cases. Therefore, MineBL has great potential to provide a low-cost and effective solution for precise target localization in complex underground environments.


Introduction
Coal, as an energy mineral resource, occupies an important position in the production and consumption of primary energy. However, because of the harsh underground mining environment and the constantly changing working face, effectively ensuring both the safety of underground personnel and production efficiency is particularly crucial [1]. Localization technology in underground coal mines plays a prominent role in safe mining, personnel monitoring and scheduling, and post-disaster rescue [2,3]. Owing to complex underground environments and technical challenges, the wireless localization of underground targets remains unsatisfactory [4,5]. Therefore, precise localization of underground targets has always been important to the safe production and intelligent construction of coal mines.
(2) Low localization accuracy: Harsh underground coal mine environments and complex, changeable tunnel structures cause significant multipath effects and severe attenuation of wireless signals, resulting in low accuracy for underground localization techniques based on wireless signals [22][23][24]. (3) High deployment cost: Deploying active localization base stations for the above underground localization techniques incurs high installation and maintenance costs, which hinders their application in large-scale underground coal mines [25][26][27].
The underground localization technique based on vision sensors has recently become a research hotspot for several reasons, e.g., freedom from electromagnetic interference and low cost [28]. This technique localizes underground targets by identifying position nodes in images and obtaining the location information of those nodes [29][30][31][32]. However, traditional visual localization techniques based on monocular [33,34] or binocular cameras [35] are easily affected by poor underground illumination, resulting in low target recognition accuracy and large distance calculation errors. Compared with monocular or binocular cameras, depth-sensing cameras can perform active ranging and are robust to dark environments [36][37][38], so they have good application prospects in underground coal mines. Nevertheless, most depth-image-based localization techniques require the target to carry a camera [39], which may incur high deployment costs given the high mobility of underground personnel and equipment. Other localization schemes use fixed cameras [40]. However, because of multiple production processes and frequent changes in the working face, such solutions require fixed cameras to be deployed several meters apart or to be redeployed as the working face advances. The resulting deployment and maintenance costs may limit their application underground. Therefore, a low-cost, easily deployed, battery-free underground localization system is urgently needed.
To solve the above problems, this paper proposes MineBL, a low-cost battery-free localization scheme based on depth images for underground coal mines. Low-cost battery-free reflective balls are used as position nodes, and underground targets are localized by combining a series of localization algorithms. Figure 1 illustrates the main idea of the proposed localization method. A depth camera identifies the position nodes; the nodes with known coordinates serve as base stations, while those carried by the localization targets are regarded as unknown nodes. The distances between the base stations and the unknown nodes are measured to obtain the locations of the targets. However, implementing this scheme raises several challenges. First, the complex underground environment makes it extremely difficult for the system to recognize the tiny deployed position nodes. Second, the depth images collected by depth cameras often contain considerable noise, which can make the ranging accuracy of the system unstable. Third, the accuracy of traditional multilateral localization methods depends heavily on ranging accuracy, which leads to low localization accuracy of the system. To address the first challenge, this paper proposes a data enhancement strategy based on small-target reorganization and further designs a micro-position node recognition algorithm for complex underground environments. For the second challenge, a multi-filter cooperative denoising algorithm is designed to obtain the relative distance between the base stations and the targets. For the third challenge, we present a weighted centroid localization algorithm based on multilateral localization errors, which determines the weighting factors according to the error of each set of location data. In summary, the main contributions of this paper are as follows.

• The paper proposes an underground battery-free localization scheme called MineBL. To the best of our knowledge, it is the first paper to propose a low-cost battery-free localization scheme based on depth images for underground coal mines, and the scheme achieves accurate and safe localization of underground targets. MineBL can be deployed on a wide range of mobile devices, such as inspection robots, which gives it good mobility and allows it to migrate with underground working faces.
• The paper also proposes a novel ranging algorithm based on the collaborative denoising of multiple filters and designs an optimized weighted centroid localization algorithm based on multilateral localization errors to minimize underground localization errors. These methods can also be applied to other underground localization systems.

System Overview
This section provides an overview of the overall MineBL framework. The MineBL hardware consists of position nodes (base stations and unknown nodes), depth cameras, and an edge computing node. As shown in Figure 2, localization base stations with known coordinates are pre-deployed underground, and the targets carrying unknown nodes move underground. The position node recognition algorithm uses deep learning to identify the position nodes in images captured by the depth cameras. In particular, a data enhancement strategy based on small-target reorganization is proposed to increase the recognition accuracy of small position nodes in complex coal mine environments. A multi-filter collaborative denoising algorithm then denoises the depth information to obtain accurate depth, and the relative distance between the base stations and the targets is calculated by combining it with the two-dimensional image information. Finally, an optimized weighted centroid algorithm based on multilateral localization is proposed, which takes the reciprocal of the coordinate errors as the weights and fuses the localization results of multiple sets of data to locate the targets accurately.

[Figure 2: MineBL framework, with panels for position node recognition (data enhancement and recognition model), the ranging algorithm, and the localization algorithm; base stations are labeled Base Station #3 ... Base Station #n.]

Position Node Recognition
Detecting tiny position nodes in complex coal mine environments is difficult. On the one hand, small targets contain few pixels and contribute little to the overall loss function. On the other hand, small targets are prone to being mislabeled or missed. To solve these problems, this paper proposes a data enhancement strategy based on small-target reorganization to improve the recognition accuracy of position nodes. The YOLOv5 model is then used to identify the position nodes in the images.

Data Collection
Position node images were collected in indoor corridors and in mines at different angles, distances, and light intensities. A total of 2200 images were collected for the dataset: 1100 images of position nodes in the indoor corridor and 1100 images of position nodes in the mine. Annotation software was used to label the position nodes in each image, and the labeled dataset was divided into a training set of 1760 images and a test set of 440 images.

Data Enhancement
To enhance the robustness of the model, a data augmentation strategy was applied to the training dataset. First, basic data enhancement techniques, such as geometric transformation and color-domain transformation, were employed to expand the dataset. Geometric transformation includes flipping, rotation, random cropping, deformation, scaling, and other operations [41]; it reduces the influence of the camera angle and of the object's distance from the camera on the final recognition results. Color-domain transformation changes the pixel values of the images; common operations include adjusting brightness or saturation and injecting noise [42]. Randomly adjusting the brightness or saturation of an image within a certain range makes the dataset suitable for different lighting conditions. Noise injection adds normally distributed (Gaussian) noise to the images so that the model learns more stable features of the dataset.
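The basic augmentation steps above can be sketched as follows. This is a minimal NumPy example, not the authors' pipeline: the specific flip and rotation choices, the brightness range, and the noise level `noise_sigma` are illustrative assumptions, and `basic_augment` is a hypothetical name.

```python
import numpy as np

def basic_augment(image: np.ndarray, rng: np.random.Generator,
                  brightness_range=(0.7, 1.3), noise_sigma=8.0) -> np.ndarray:
    """Random flip, 90-degree rotation, brightness scaling and Gaussian noise.

    image: H x W x 3 uint8 array. All parameter values are illustrative assumptions.
    """
    out = image.astype(np.float32)
    # Geometric transforms: random horizontal flip, then a random multiple-of-90 rotation
    if rng.random() < 0.5:
        out = out[:, ::-1, :]
    out = np.rot90(out, k=int(rng.integers(0, 4)), axes=(0, 1))
    # Color-domain transform: scale brightness by a random factor
    out = out * rng.uniform(*brightness_range)
    # Noise injection: add zero-mean Gaussian noise
    out = out + rng.normal(0.0, noise_sigma, size=out.shape)
    return np.clip(out, 0, 255).astype(np.uint8)
```

In a real detection pipeline the position-node bounding boxes must be flipped and rotated together with the image; that bookkeeping is omitted here for brevity.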
Although the above methods expand the training data to a certain extent, the number of positive samples of small targets is still much smaller than that of large targets, which leads to imbalanced model training [43]. To solve this problem, this paper proposes a data enhancement algorithm based on small-target reorganization, consisting of copy-paste and histogram matching, which improves the recognition accuracy of position nodes by increasing how often position nodes appear in the images. Specifically, the copy-paste method generates additional position nodes, increasing both the number of position nodes per image and the number of images containing position nodes, so that training is more balanced. However, because illumination levels differ between images, simply inserting the position nodes of one image into another would make the added nodes inconsistent with the background and introduce considerable noise into the newly generated images. Therefore, the paper combines the copy-paste strategy with histogram matching. Histogram matching transforms one image so that its histogram matches that of another image, making the histogram distributions of the two images consistent. Before the position nodes are copied and pasted, the image containing them is histogram-matched to the image into which they will be inserted, so that the position nodes blend better into the target image.
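A minimal NumPy sketch of the small-target reorganization step, assuming boxes given in pixel coordinates (x1, y1, x2, y2): the whole source image is first histogram-matched to the destination image (classic CDF-based histogram specification, applied per channel), and the position-node patch is then cut out and pasted. The names `match_histogram` and `copy_paste_node` are hypothetical, and since the paste-location policy is not specified, the caller supplies `top_left`.

```python
import numpy as np

def match_histogram(source: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Remap each channel of `source` (H x W x C) so its histogram follows `reference`."""
    matched = np.empty_like(source)
    for c in range(source.shape[2]):
        src = source[..., c].ravel()
        ref = reference[..., c].ravel()
        _, src_idx, src_counts = np.unique(src, return_inverse=True, return_counts=True)
        ref_values, ref_counts = np.unique(ref, return_counts=True)
        src_cdf = np.cumsum(src_counts) / src.size
        ref_cdf = np.cumsum(ref_counts) / ref.size
        # Map every source grey level to the reference level with the matching CDF value
        mapped = np.interp(src_cdf, ref_cdf, ref_values)
        matched[..., c] = mapped[src_idx].reshape(source.shape[:2]).round().astype(source.dtype)
    return matched

def copy_paste_node(src_img, box, dst_img, top_left):
    """Histogram-match src_img to dst_img, cut the node patch `box`, paste it at top_left=(x, y).

    Returns the new image and the pasted node's box. Bounds checking is omitted.
    """
    x1, y1, x2, y2 = box
    patch = match_histogram(src_img, dst_img)[y1:y2, x1:x2]
    out = dst_img.copy()
    x, y = top_left
    h, w = patch.shape[:2]
    out[y:y + h, x:x + w] = patch
    return out, (x, y, x + w, y + h)
```

A practical version would also check that the pasted patch stays inside the destination image and does not overlap existing nodes, and would add the returned box to that image's labels.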

Recognition Model
Because objects vary in appearance, shape, and pose, and are affected by illumination and occlusion, target recognition is one of the most challenging problems in computer vision. The YOLO algorithm is one of the most popular target recognition algorithms [44][45][46], and YOLOv5 was selected in this paper to recognize position nodes. The YOLOv5-based position node detection model is shown in Figure 3. The model is mainly composed of the Focus module, the convolution, batch normalization, and Leaky ReLU (CBL) module, the cross-stage partial (CSP) module, the spatial pyramid pooling (SPP) module, the standard convolution (Conv) module, and the tensor concatenation (Concat) module. CSP1_x denotes a CSP1 module with x residual units, CSP2_x denotes a CSP2 module with 2x residual units, and x determines the depth of the network. According to the processing stages, the detection model is divided into the input, backbone, neck, and head. Specifically, the input stage preprocesses the image, the backbone extracts feature information from the input image, the neck fuses the features extracted by the backbone, and the head computes the target category probabilities and position coordinates, trained through the loss functions, to obtain the target prediction results. The recognition process of position nodes is as follows. First, the input image is divided into N × N grid cells, and each cell generates prior (anchor) boxes for targets of different scales, such as large, medium, and small. If the center of a target falls within a grid cell, the prior boxes of that cell are responsible for identifying the target. As shown in Equation (1), YOLOv5 uses the confidence c to represent both the probability that a target is present and how well the prediction box matches the target.
$$ c = P \times H \tag{1} $$
where P is the probability that a target is in the prediction box: P is 0 if there is no target in the prediction box and 1 otherwise. H is the intersection over union (IoU) between the predicted box and the ground-truth box. Then, the images are normalized, and the normalized data are fed into the subsequent feature extraction network. Next, the prior box sizes are determined by k-means clustering, and the position of the prediction box is calculated: from the predicted coordinate offsets, the location of the target center point and the width and height of the prediction box are obtained. Finally, the position node identification result is output.
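To make the confidence definition concrete, the sketch below computes the IoU H between a predicted and a ground-truth box, the confidence c = P × H, and the grid cell responsible for a target under the one-cell rule described above. It is a simplified illustration with hypothetical function names; the actual YOLOv5 code differs in details, for example in how targets are assigned to cells and anchors.

```python
def iou(box_a, box_b):
    """Intersection over union H of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def confidence(pred_box, true_box, has_object):
    """c = P * H, with P = 1 if a target is present in the box and 0 otherwise."""
    p = 1.0 if has_object else 0.0
    return p * iou(pred_box, true_box)

def responsible_cell(true_box, image_w, image_h, n):
    """(row, col) of the N x N grid cell that contains the target centre."""
    cx = (true_box[0] + true_box[2]) / 2.0
    cy = (true_box[1] + true_box[3]) / 2.0
    col = min(int(cx / image_w * n), n - 1)
    row = min(int(cy / image_h * n), n - 1)
    return row, col
```

For two 2 × 2 boxes offset by one pixel diagonally, the overlap is 1 and the union is 7, so H = 1/7; with no target present the confidence is 0 regardless of H.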

Ranging Algorithm
After the deep learning model detects the position nodes, the relative distance between the base stations and the localization targets can be obtained from the depth images. However, the raw depth data contain considerable noise, which degrades localization accuracy. To solve this problem, the paper proposes a ranging algorithm based on multi-filter collaborative denoising. The noise in the raw depth images is first removed to obtain the accurate absolute depth of the position nodes; the relative distance between the base stations and the localization targets is then obtained by combining this depth with the two-dimensional image coordinates.

Multi-Filter Cooperative Denoising
First, an edge-preserving spatial filter is adopted to remove noise while preserving edge details. For depth images produced by triangulation, depth noise grows with the square of the distance between the object and the camera, so a filter with fixed parameters in the depth domain over-smooths the edges of nearby objects and under-smooths those of distant objects. Therefore, the depth images are converted back to disparity images before filtering: since disparity is approximately inversely proportional to depth, the noise level in the disparity domain is roughly independent of distance, which preserves the edge details of distant objects.
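The two facts used above can be illustrated with the sketch below: disparity is inversely proportional to depth (d = fB/Z for a rectified stereo pair with focal length f in pixels and baseline B), and a constant disparity error therefore produces a depth error that grows with Z². The edge-preserving filter here is a simplified recursive smoother applied along rows and then columns in the disparity domain; it is a stand-in for, not a reproduction of, the camera SDK's spatial filter, and `alpha`, `delta`, and all function names are illustrative assumptions.

```python
import numpy as np

def depth_to_disparity(depth_m, focal_px, baseline_m):
    """d = f * B / Z; holes (Z = 0) stay 0."""
    depth_m = np.asarray(depth_m, dtype=np.float64)
    disp = np.zeros_like(depth_m)
    valid = depth_m > 0
    disp[valid] = focal_px * baseline_m / depth_m[valid]
    return disp

def disparity_to_depth(disp, focal_px, baseline_m):
    """Z = f * B / d is the same reciprocal relation, so the conversion is reused."""
    return depth_to_disparity(disp, focal_px, baseline_m)

def depth_noise_std(z_m, focal_px, baseline_m, disp_noise_px):
    """First-order depth noise sigma_Z = Z^2 * sigma_d / (f * B): grows with Z squared."""
    return z_m ** 2 * disp_noise_px / (focal_px * baseline_m)

def _smooth_rows(img, alpha, delta):
    out = np.asarray(img, dtype=np.float64).copy()
    rows, cols = out.shape
    for r in range(rows):
        # Forward pass (left to right), then backward pass (right to left)
        for order in (range(1, cols), range(cols - 2, -1, -1)):
            step = order.step
            for c in order:
                prev, cur = out[r, c - step], out[r, c]
                # Blend only valid neighbours whose difference is below delta (no edge)
                if cur > 0 and prev > 0 and abs(cur - prev) < delta:
                    out[r, c] = alpha * cur + (1.0 - alpha) * prev
    return out

def edge_preserving_smooth(disp, alpha=0.5, delta=1.0):
    """Smooth along rows, then along columns, without crossing jumps larger than delta."""
    return _smooth_rows(_smooth_rows(disp, alpha, delta).T, alpha, delta).T
```

Because sigma_Z scales with Z², doubling the distance quadruples the depth noise for the same disparity noise, which is why the smoothing threshold is applied to disparity rather than depth.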
Second, a hole-filling filter is employed to fill the black spots in the depth images. Because the depth map is referenced to the left camera, and the left and right cameras do not see exactly the same scene, some objects seen by the left camera never appear in the right camera. The depth at these locations cannot be calculated because disparity information is missing, so they appear in the depth images as black dots with a depth value of 0. Black dots also appear when regions are overexposed or underexposed, or when objects are closer to the camera than the minimum detectable distance. As shown in Figure 4, the paper fills each black spot with the maximum (farthest) value among the surrounding valid ("gray") pixels.
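A minimal sketch of this hole-filling rule, assuming a depth map in which 0 marks missing pixels and a 3 × 3 neighbourhood (the neighbourhood actually used is not specified here): each black pixel takes the farthest, i.e. largest, valid depth among its neighbours. The function name is hypothetical.

```python
import numpy as np

def fill_holes_farthest(depth):
    """Fill zero-valued (black) pixels with the largest valid depth in their 3x3 neighbourhood.

    Single pass over the original values; pixels with no valid neighbour stay 0.
    """
    depth = np.asarray(depth, dtype=np.float64)
    padded = np.pad(depth, 1, mode="constant", constant_values=0.0)
    out = depth.copy()
    for r, c in zip(*np.nonzero(depth == 0)):
        window = padded[r:r + 3, c:c + 3]  # 3x3 neighbourhood centred on (r, c)
        valid = window[window > 0]
        if valid.size:
            out[r, c] = valid.max()  # farthest = maximum depth
    return out
```

Choosing the farthest neighbour is a conservative choice: stereo occlusion holes usually lie on the background next to a foreground edge, so filling them with the nearer foreground value would tend to enlarge objects.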

[Figure 4: Hole filling, in which black pixels are filled with the MAX of the surrounding "gray" pixels.]
Finally, a temporal filter is used to remove noise caused by changing conditions. Conditions such as lighting, scene dynamics, and the noise of the infrared emitter change over time, and because the camera provides each frame independently, with no connection between frames, previously valid information may be lost and excessive noise may be introduced. Thus, a temporal filter based on the exponential moving average (EMA) is used to "remember" the depth value of each pixel in previous frames and average it. The EMA, also known as the exponentially weighted moving average, gives higher weight to recent data. The EMA value at time t is given by Equation (2).
$$
S_t =
\begin{cases}
Y_1, & t = 1 \\
\alpha Y_t + (1 - \alpha) S_{t-1}, & t > 1 \text{ and } |Y_t - S_{t-1}| < \delta_{\text{thresh}} \\
Y_t, & t > 1 \text{ and } |Y_t - S_{t-1}| \ge \delta_{\text{thresh}}
\end{cases}
\tag{2}
$$
where $\alpha$ controls the rate at which the weights of older values decrease, $Y_t$ is the instantaneous disparity or depth value, $S_{t-1}$ is the EMA value at time $t - 1$, and $\delta_{\text{thresh}}$ is a user-defined threshold on the depth difference between adjacent values of a pixel. If the threshold is exceeded, an edge is assumed to be present, and the smoothing of the temporal filter is temporarily disabled so that the edge is not smoothed out. The combined effect of the three filters is shown in Figure 5: the quality of the depth images is improved significantly and the black holes are greatly reduced, which can improve the ranging accuracy.
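The per-pixel EMA with an edge threshold can be sketched as below, assuming frames arrive as NumPy arrays of a fixed size; `alpha` and `delta_thresh` correspond to α and δ_thresh, and their default values are illustrative only. The class name `TemporalFilter` is hypothetical, and holes (zero values) are not treated specially in this sketch.

```python
import numpy as np

class TemporalFilter:
    """Per-pixel exponential moving average with a reset threshold.

    S_t = alpha * Y_t + (1 - alpha) * S_{t-1}   if |Y_t - S_{t-1}| < delta_thresh
    S_t = Y_t                                   otherwise (edge: smoothing disabled)
    """

    def __init__(self, alpha=0.4, delta_thresh=20.0):
        self.alpha = alpha
        self.delta = delta_thresh
        self.state = None  # S_{t-1}; None before the first frame

    def update(self, frame):
        frame = np.asarray(frame, dtype=np.float64)
        if self.state is None:
            self.state = frame.copy()  # t = 1: S_1 = Y_1
            return self.state.copy()
        smooth = np.abs(frame - self.state) < self.delta
        self.state = np.where(smooth,
                              self.alpha * frame + (1.0 - self.alpha) * self.state,
                              frame)
        return self.state.copy()
```

With α = 0.5 and δ_thresh = 10, a pixel reading 100 then 104 is smoothed to 102, while a subsequent jump to 200 exceeds the threshold and is passed through unchanged.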

Target-To-Base Station Ranging
After the absolute depth of the position nodes is obtained, the relative distance between the base stations and the targets can be computed by combining it with the two-dimensional image information. As shown in Figure 6, according to the pinhole camera model, a 3D point $P(X, Y, Z)$ is projected onto the point $p(u, v)$ on the 2D image, with the relationship shown in Equation (3):
$$ p = K\,[R \mid T]\,P \tag{3} $$
where $p$ is the point in the 2D image plane (in homogeneous coordinates) and $K$ is the camera's intrinsic parameter matrix.
$[R \mid T]$ is the extrinsic parameter matrix, which describes the Euclidean transformation between the world coordinate system and the camera coordinate system, and $P = [X, Y, Z, 1]^{\mathsf T}$ gives the world coordinates of the 3D point. The matrix form of Equation (3) is as follows:
$$
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
=
\begin{bmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}
[R \mid T]
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
\tag{4}
$$
where $s$ is a scale factor equal to the depth of the point in the camera coordinate system. Without considering the conversion between world and camera coordinates (that is, working directly in the camera coordinate system), $R$ and $T$ can be expressed as shown in Equation (5):
$$ R = I_{3 \times 3}, \qquad T = [0, 0, 0]^{\mathsf T} \tag{5} $$
Here $f_x$ and $f_y$ are the focal lengths expressed in pixels along the $u$ and $v$ axes, and $u_0$ and $v_0$ are the translations of the image origin, i.e., the pixel coordinates of the principal point. The 2D and 3D coordinates can then be converted into each other using the depth $Z$ provided by the depth images:
$$ X = \frac{(u - u_0)\,Z}{f_x}, \qquad Y = \frac{(v - v_0)\,Z}{f_y} \tag{6} $$
After the 3D coordinates of every position node in the camera coordinate system are calculated, the relative distance $\rho_i$ between the base station at $(x_i, y_i, z_i)$ and the target at $(x_0, y_0, z_0)$ can be calculated as shown in Equation (7):
$$ \rho_i = \sqrt{(x_i - x_0)^2 + (y_i - y_0)^2 + (z_i - z_0)^2} \tag{7} $$
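Writing Equation (7) for each base station, squaring, and subtracting one of the resulting equations from the others removes the quadratic terms in the unknown coordinates and yields a linear system AX = L, where A is the coefficient matrix, L is the constant vector, and X = (x_0, y_0, z_0)^T is the unknown vector. A and L can be expressed as in Equation (10).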
As shown in Equation (11), the estimated coordinate of the unknown node can be calculated by the least-squares method:
$$ X = (A^{\mathsf T} A)^{-1} A^{\mathsf T} L \tag{11} $$
As Equation (11) shows, when the error in L on the right-hand side is small, the localization is accurate and the estimated coordinate X is close to the true coordinate. In the actual localization process, however, environmental interference and other factors cause inaccurate ranging, so the error in L becomes large and X deviates seriously from the true position of the unknown node. Therefore, the localization algorithm needs further improvement.
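The ranging and multilateration steps can be sketched as follows: back-project each detected node centre with the pinhole relation of Equation (6), measure each ρ_i with Equation (7), and solve AX = L in the least-squares sense of Equation (11). Because the exact form of Equations (8) to (10) is not given here, the sketch assumes the common linearization that subtracts the last base station's squared-range equation from the others. The function names are hypothetical, and `np.linalg.lstsq` is used instead of forming (AᵀA)⁻¹ explicitly for numerical stability.

```python
import numpy as np

def back_project(u, v, z, fx, fy, u0, v0):
    """Pixel (u, v) with depth z -> 3D point in the camera coordinate system."""
    return np.array([(u - u0) * z / fx, (v - v0) * z / fy, z], dtype=np.float64)

def ranges_to(points, target):
    """Euclidean distances rho_i from each base-station point to the target."""
    return np.linalg.norm(np.asarray(points, float) - np.asarray(target, float), axis=1)

def multilaterate(anchors, ranges):
    """Least-squares solution of A X = L.

    Assumed linearization: subtract the last base station's squared-range equation
    from each of the others, giving
        2 (a_i - a_n) . X = |a_i|^2 - |a_n|^2 - rho_i^2 + rho_n^2.
    Needs at least four non-coplanar base stations for a unique 3D solution.
    """
    anchors = np.asarray(anchors, dtype=np.float64)
    ranges = np.asarray(ranges, dtype=np.float64)
    ref, r_ref = anchors[-1], ranges[-1]
    A = 2.0 * (anchors[:-1] - ref)
    L = np.sum(anchors[:-1] ** 2, axis=1) - ref @ ref - ranges[:-1] ** 2 + r_ref ** 2
    X, *_ = np.linalg.lstsq(A, L, rcond=None)
    return X, A, L
```

With exact ranges the solution reproduces the true position; with noisy ranges the error propagates through L, which is the problem the centroid algorithms below address.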

Traditional Centroid Algorithm
In range-based localization, the error of a result obtained from only one set of data may be large, so it is necessary to combine multiple sets of data. In the traditional centroid algorithm for range-based localization with N sets of data, the average of the N estimated locations is taken as the final result:
$$ (x_0, y_0, z_0) = \frac{1}{N} \sum_{i=1}^{N} (\hat{x}_i, \hat{y}_i, \hat{z}_i) $$
where $(\hat{x}_i, \hat{y}_i, \hat{z}_i)$ is the location estimated from the i-th set of data. The traditional centroid algorithm assigns equal weights to the estimates of all groups and therefore fails to reflect their different reliability.

Weighted Centroid Algorithm
The paper proposes an improved weighted centroid algorithm based on multilateral localization that combines multiple sets of data. When the location equation is solved, if L is exact, the accurate coordinate X is obtained and the two sides of AX = L are equal. Conversely, the larger the error in L, the larger the residual of the least-squares solution and the less reliable the data. Therefore, the improved weighted centroid algorithm uses the reciprocal of this coordinate error as the weight. The algorithm is given by Equations (12), (13), and (14):
$$ (x_0, y_0, z_0) = \frac{\sum_{i=1}^{N} W_i\,(\hat{x}_i, \hat{y}_i, \hat{z}_i)}{\sum_{i=1}^{N} W_i} \tag{12} $$
$$ W_i = \frac{1}{\operatorname{norm}\!\left(A_i\,(\hat{x}_i, \hat{y}_i, \hat{z}_i)^{\mathsf T} - L_i\right)} \tag{13} $$
$$ \operatorname{norm}(X) = \sqrt{X^{\mathsf T} X} \tag{14} $$
where $(x_0, y_0, z_0)$ is the target location determined by the weighted centroid algorithm, $W_i$ is the weight of the i-th group of data, $(\hat{x}_i, \hat{y}_i, \hat{z}_i)$ is the coordinate estimated by the multilateral localization algorithm from the i-th group of data, $A_i$ and $L_i$ are the parameters of the linear equations determined by the i-th group of data, and norm(X) denotes the 2-norm (Euclidean norm) of X.
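The difference between the traditional centroid and the residual-based weighting can be sketched as below, assuming each group i has already produced an estimate X_i together with its linear system (A_i, L_i). The small constant `eps` is our addition, to avoid division by zero when a group's residual is exactly zero; the function names are hypothetical.

```python
import numpy as np

def traditional_centroid(estimates):
    """Equal-weight average of the per-group estimates."""
    return np.mean(np.asarray(estimates, dtype=np.float64), axis=0)

def weighted_centroid(estimates, systems, eps=1e-9):
    """Weighted average with W_i = 1 / norm(A_i X_i - L_i).

    estimates: sequence of per-group estimates X_i (each of length 3)
    systems:   sequence of (A_i, L_i) pairs that produced each X_i
    eps:       assumption, guards against a zero residual
    Returns the fused location and the weights.
    """
    est = np.asarray(estimates, dtype=np.float64)
    residuals = np.array([np.linalg.norm(A @ x - L) for x, (A, L) in zip(est, systems)])
    weights = 1.0 / (residuals + eps)
    return (weights[:, None] * est).sum(axis=0) / weights.sum(), weights
```

One property worth noting: if a group uses exactly four base stations, A_i is 3 × 3, so (when it is invertible) the least-squares residual is zero regardless of ranging noise and the weights no longer distinguish between groups; the weighting carries information only when a group contains at least five base stations.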

Simulation Results
Simulation experiments were first carried out to verify the feasibility of the proposed algorithms. They mainly evaluated the performance of the localization algorithm and the influence of ranging errors and of the number of localization base stations on that performance. The simulation area was 40 m long, 5 m wide, and 3 m high. Each setup was run 500 times, and the average results are reported.
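The simulation setup can be sketched as a small Monte Carlo harness. Several details are assumptions, not taken from the paper: base stations and targets are drawn uniformly in the 40 m × 5 m × 3 m volume, ranging errors are zero-mean Gaussian with standard deviation `sigma` (in metres), and each "set of data" is an independent noisy ranging round over all selected base stations. The harness compares the plain mean of the per-set estimates with the residual-weighted centroid; it does not reproduce the paper's figures.

```python
import numpy as np

# Simulation volume in metres (length, width, height)
AREA = np.array([40.0, 5.0, 3.0])

def multilaterate(anchors, ranges):
    """Least-squares position; assumes subtraction of the last anchor's equation."""
    ref, r_ref = anchors[-1], ranges[-1]
    A = 2.0 * (anchors[:-1] - ref)
    L = np.sum(anchors[:-1] ** 2, axis=1) - ref @ ref - ranges[:-1] ** 2 + r_ref ** 2
    X = np.linalg.lstsq(A, L, rcond=None)[0]
    return X, A, L

def simulate(n_anchors=6, n_targets=10, n_sets=5, sigma=0.10, runs=500, seed=0):
    """Return (mean error of plain centroid, mean error of weighted centroid) in metres."""
    rng = np.random.default_rng(seed)
    err_plain, err_weighted = [], []
    for _ in range(runs):
        anchors = rng.uniform(0.0, AREA, size=(n_anchors, 3))
        targets = rng.uniform(0.0, AREA, size=(n_targets, 3))
        for target in targets:
            true_ranges = np.linalg.norm(anchors - target, axis=1)
            estimates, residuals = [], []
            for _ in range(n_sets):
                # One assumed "set of data": a noisy ranging round to every anchor
                noisy = true_ranges + rng.normal(0.0, sigma, size=n_anchors)
                X, A, L = multilaterate(anchors, noisy)
                estimates.append(X)
                residuals.append(np.linalg.norm(A @ X - L))
            estimates = np.array(estimates)
            weights = 1.0 / (np.array(residuals) + 1e-9)
            plain = estimates.mean(axis=0)
            weighted = (weights[:, None] * estimates).sum(axis=0) / weights.sum()
            err_plain.append(np.linalg.norm(plain - target))
            err_weighted.append(np.linalg.norm(weighted - target))
    return float(np.mean(err_plain)), float(np.mean(err_weighted))
```

Varying `n_anchors` and `sigma` gives the two sweeps studied in the following subsections. Random anchor placement can produce poorly conditioned geometries, so a more faithful reproduction would use the paper's actual base station layout.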

Localization Performance
Six localization base stations and 10 unknown nodes were deployed in the simulation area. With a ranging-error standard deviation of 10 cm, the simulated localization results of the traditional weighted centroid algorithm and of the optimized weighted centroid algorithm proposed in this paper are shown in Figure 8. The average localization error was 34.4 cm for the traditional weighted centroid algorithm and 24.7 cm for the proposed algorithm. Thus, the proposed algorithm achieves higher localization accuracy than the traditional weighted centroid algorithm.

Impact of Range Error
This section evaluates the effect of ranging errors on localization accuracy. The ranging error was varied evenly between 0 and 21 cm, and the results of a large number of simulation runs are shown in Figure 9. The localization accuracy of the two algorithms is similar when the ranging errors are small, and the localization errors of both algorithms increase as the ranging errors grow. However, the localization errors of the proposed algorithm remain smaller, indicating higher localization accuracy. This is because the larger the ranging error, the more the final result depends on the accuracy of the weights, and the proposed algorithm sets the weights according to the accuracy of each localization result.

Impact of the Number of Localization Base Stations
This section evaluates the effect of the number of base stations participating in multilateral localization on localization accuracy. Ten localization base stations were deployed in the simulation area, and the localization accuracy was tested when 4, 5, 6, 7, 8, and 9 of them were selected. As Figure 10 shows, when few base stations participate in multilateral localization, the errors are generally large and the difference in localization accuracy between the traditional centroid algorithm and the weighted centroid algorithm is small. When five or more base stations are used, the error is smaller and the weighted centroid algorithm outperforms the traditional centroid algorithm. This is because the more reference points there are, the greater the cumulative errors and the higher the demands on the accuracy of the weights. The improved weighted centroid algorithm chooses the weights according to the accuracy of the localization results, which limits the accumulation of errors and yields higher localization accuracy. Meanwhile, with five or six base stations, the localization error is smaller than with four. However, with seven to nine base stations, the localization accuracy does not improve but declines, because the additional points introduce more unpredictable errors. Considering the cost of base stations and other factors, the optimal number of base stations is five to six.

Experimental Results
As shown in Figures 11 and 12, the MineBL prototype system was built in an indoor corridor and in an underground coal mine laboratory, respectively. The localization base stations were placed in the experimental space, and the unknown nodes were mounted on the localization targets. We used an Intel RealSense D455 depth camera, which combines a binocular (stereo) camera pair with active infrared imaging. Its maximum depth resolution is 1280 × 720, and its farthest detection range is 10 m. Edge computing nodes received and processed the camera images, and the location of the robot was then estimated on a computer equipped with an Intel Core i5 2.3 GHz CPU and 8 GB of RAM. Before the experiments, the cameras were calibrated to obtain the intrinsic parameters needed for localization. In addition, a motion capture system tracked the actual (ground-truth) location of the robot in real time. A total of 1000 experiments were performed under each experimental condition to evaluate the overall performance of the system. Because the localization performance of the system is closely related to the node identification accuracy and to the ranging accuracy of the depth cameras, the paper also evaluates the influence of factors such as ambient light and distance on both.

[Figures 11 and 12: MineBL prototype setup, showing the localization base station, the depth camera, and the moving node.]

The Overall Localization Performance
This section evaluates the overall localization performance of MineBL. Figures 13 and 14 show the cumulative distribution functions (CDFs) of the MineBL localization errors in the indoor corridor and in the underground coal mine. In the indoor corridor, the median localization error is about 12.7 cm, and 90% of the localization errors are below 25.6 cm. In the underground coal mine, the median localization error is about 14.8 cm, and 90% of the localization errors are below 27.6 cm. These results show that MineBL performs well in both indoor and coal mine environments. We also measured the time required to complete one localization process and found that MineBL takes only 85 ms.
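The statistics quoted above (median, 90th percentile, and the fraction of errors below a threshold) can be computed from a list of per-trial localization errors with the sketch below. The function names are hypothetical, and the percentile uses NumPy's default linear interpolation, which may differ slightly from the method used to produce the reported numbers.

```python
import numpy as np

def error_cdf(errors_cm):
    """Sorted errors and their empirical CDF values (fraction of trials <= each error)."""
    e = np.sort(np.asarray(errors_cm, dtype=np.float64))
    p = np.arange(1, e.size + 1) / e.size
    return e, p

def summarize(errors_cm):
    """Median and 90th/95th percentiles of the localization errors."""
    e = np.asarray(errors_cm, dtype=np.float64)
    return {"median": float(np.median(e)),
            "p90": float(np.percentile(e, 90)),
            "p95": float(np.percentile(e, 95))}

def frac_below(errors_cm, threshold_cm):
    """Fraction of trials whose error is strictly below the threshold."""
    return float(np.mean(np.asarray(errors_cm, dtype=np.float64) < threshold_cm))
```

For example, `frac_below(errors_cm, 30.0)` gives the fraction of trials with an error below 30 cm, the quantity reported in the abstract, and plotting `error_cdf(errors_cm)` produces a curve of the kind shown in Figures 13 and 14.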

Accuracy of Position Node Recognition
This section reports the identification accuracy of position nodes at different distances and under different ambient light levels. First, the influence of the distance between the nodes and the depth cameras was evaluated: in the indoor scene, we varied the distance between the nodes and the cameras and tested the node identification accuracy at each distance. As Figure 15 shows, within the effective detection range of the depth cameras, the average identification accuracy decreased as the distance increased. The average identification accuracy was above 90% for nodes within 5 m and above 80% for nodes beyond 5 m. Next, the identification accuracy was evaluated under different ambient light intensities. The ambient light intensity was changed by switching different numbers of LED lights on and off, and seven illumination levels were selected (1, 50, 250, 1250, 2500, 3750, and 5000 lux), where 1 lux represents a dark environment. Figure 16 shows that the identification accuracy remained at about 97% on average, and changes in ambient light had no obvious influence on it.

Accuracy of Ranging
Next, the influence of the distance between the nodes and the depth cameras on the ranging accuracy of the system was evaluated. In the indoor scene, we first placed three unknown nodes and tested the ranging accuracy of the depth cameras at each distance as the distance between the nodes and the cameras changed. As Figure 17 shows, within the effective detection range of the depth cameras, the ranging errors increased with distance. The ranging error was below 10 cm within 5 m and no more than 25 cm within 10 m. In addition, the ranging accuracy was evaluated under different ambient light intensities. In the indoor scene, we placed three unknown nodes, changed the ambient light intensity by switching different numbers of LED lights on and off, and selected seven illumination levels (1, 50, 250, 1250, 2500, 3750, and 5000 lux), where 1 lux represents a dark environment. As shown in Figure 18, ambient light had no obvious influence on the ranging errors. This is because the multi-filter collaborative denoising method adopted in this paper obtains more accurate depth information and is more robust to changes in ambient light.

Conclusions
In summary, this paper proposes MineBL, a low-cost battery-free localization scheme based on depth images that has good mobility and can migrate with underground working faces. MineBL uses low-cost battery-free reflective balls as position nodes and locates underground targets with a series of localization algorithms. First, we designed a data enhancement strategy based on small-target reorganization and further proposed a micro-position node recognition algorithm for complex underground coal mine environments. Second, we designed a multi-filter cooperative denoising algorithm to obtain accurate relative distances between the base stations and the targets. Finally, we proposed a weighted centroid localization algorithm based on multilateral localization errors, in which the weighting factor of each group of location data is determined by its error to improve the localization accuracy of the system. Extensive experiments in an indoor laboratory and an underground coal mine laboratory verify that the MineBL underground localization scheme has good localization performance, with localization errors below 30 cm in 95% of cases.
Author Contributions: Method design and experiment, S.Q. and Z.B.; data processing and analysis, S.Q. and Y.Y.; supervision, X.Y.; writing and review, S.Q. All authors have read and agreed to the published version of the manuscript.