Autonomous Search of Radioactive Sources through Mobile Robots

The research of robotic autonomous radioactivity detection or radioactive source search plays an important role in the monitoring and disposal of nuclear safety and biological safety. In this paper, a method for autonomously searching for radioactive sources through mobile robots was proposed. In the method, by using a partially observable Markov decision process (POMDP), the search of autonomous unknown radioactive sources was realized according to a series of radiation information measured by mobile robot. First, the factors affecting the accuracy of radiation measurement during the robot’s movement were analyzed. Based on these factors, the behavior set of POMDP was designed. Secondly, the parameters of the radioactive source were estimated in the Bayesian framework. In addition, through the reward strategy, autonomous navigation of the robot to the position of the radiation source was achieved. The search algorithm was simulated and tested, and the TurtleBot robot platform was used to conduct a real search experiment on the radio source Cs-137 with an activity of 37 MBq indoors. The experimental results showed the effectiveness of the method. Additionally, from the experiments, it could been seen that the robot was affected by the linear velocity, angular velocity, positioning accuracy and the number of measurements in the process of autonomous search for the radioactive source. The proposed mobile robot autonomous search method can be applied to the search for lost radioactive sources, as well as for the leakage of substances (nuclear or chemical) in nuclear power plants and chemical plants.


Introduction
For more than half a century, nuclear energy and nuclear technology have been steadily developed in the world. Nuclear energy plays an important role in optimizing the energy structure, ensuring energy security, promoting pollution reduction and responding to climate change, etc. Radioactive materials have been widely used in the fields of industry, agriculture, national defense, medical treatment and scientific research, which have effectively promoted national production and economic and social development. However, in the process of nuclear energy and nuclear technology application, if a nuclear radiation accident occurs, it will pose a great threat to social security and the country's political economy, easily causing large-scale casualties and widespread social panic.
According to statistics from the International Atomic Energy Agency (IAEA) [1], as of 31 December 2019, more than 3686 nuclear accidents have been confirmed by the Illegal Traffic Database (ITDB) on theft of nuclear and radioactive materials and other illegal activities. Therefore, it is an important Sensors 2020, 20, 3461 2 of 17 safety issue in the application of nuclear technology to detect the radioactive distribution of radioactive sources in space, and quickly search for and remove nuclear radioactive materials in scattered areas.
Although researchers have been paying attention to this topic for more than two decades, the detection and search of unknown radioactive sources is still a challenging operation in real environment. The main difficulties come from: (1) If the missing or stolen radioactive source is searched by operation personnel, it will increase the operation time and operator's health risk [2]; (2) if it is searched by a robot, due to the unknown position and direction of the radioactive source and limited perspective of the robot's observation, the search process is very difficult [3].Therefore, in this paper, an autonomous search algorithm for unknown radioactive sources is designed by using a partially observable Markov decision process (POMDP). The main contributions of this paper are as follows: (1) The robot radiation measurement error model is established, and the factors that affect the radiation measurement results during the movement of the robot are found, which helps the design of the autonomous search algorithm in the later stage. (2) The coordinate, direction of the detection point and detector count of the robot at the current moment are taken as current knowledge, and the posterior probability density function (PDF) of radioactive source parameters is used as the information status in the POMDP. The next action of the robot is selected through the reward strategy of information entropy. (3) The Markov chain Monte Carlo (MCMC) method is used to improve the particle filter to approximate the calculation of PDF, thereby completing the importance sampling of particles.
To study the above topics, this paper is organized as follows: In Section 2, related research in this area is investigated. In Section 3, the error model analysis of robotic radiation measurement is performed. In Section 4, based on the error measurement model, an autonomous search algorithm for unknown radioactive sources is designed. In Section 5, simulation and real experiments are performed. Finally, the conclusions and shortcomings are described.

Prior Work
In this section, a survey of literature on radioactive source parameter estimation and search strategies is conducted. The search of search strategies is not only about radioactive sources in nuclear physics, but also gas diffusion sources (including biochemistry).

Estimation of Source Parameter
The essence of parameter estimation is to calculate the position and Source Term Parameter of the radioactive source. The traditional method uses measured values of nuclear radiation detectors to determine whether there is a radioactive source in a certain area, and then measures multiple measurement points, and uses the least square method [4,5] or geometric method [5][6][7] to estimate the position and the intensity of the radioactive source. In addition, in [4], the position is correlated with the count rate of the detector; and the change of radiation source position and the deviation between the measured value and the model prediction are also considered. In [5], several probabilistic methods are also provided to estimate the position and intensity of the radioactive source, such as maximum likelihood estimation. In [6], a combined geometric positioning method and sequential probability ratio test are used for radioactive source positioning. Traditional methods rely on the sensitivity of the detector.
In addition, according to the principle of mathematical statistics, radioactive decay occurs randomly, but follows a certain statistical distribution (Poisson distribution or normal distribution), so when estimating the parameters of a radioactive source, the measured value of the detector can be described as a random variable, just like the maximum likelihood estimation method in [5,8]. But when there are three or more radioactive sources at the same time, the maximum likelihood estimation method is not applicable. In addition, the higher SNR thresholds are also considered to be its shortcomings. In [8][9][10][11][12][13], the Bayesian estimation algorithm was used to make up for the shortcomings of the maximum likelihood estimation method. According to Bayesian theory, the posterior probability distribution of the parameter vector of the radioactive source is constructed from the observation Sensors 2020, 20, 3461 3 of 17 data in the radiation field. Therefore, the information of the radioactive source is obtained by solving the posterior probability distribution, which can be approximated and represented by importance sampling [8], progressively corrected importance sampling [11], particle filtering [9,10] and MCMC methods [12,13]. It is worth noting that [12] revealed that the accuracy of the measurement count rate is closely related to the orientation of the detector, which provides a good idea for the error analysis of robotic radiation measurement.
In addition to the above two methods, a Gaussian mixture model is also used to characterize the radiation field and establish a radiation map. Using a logarithmic gradient classifier, the local graph is segmented into several positions of interest to perform radioactive source localization [14]. In the chaotic 3D environment [15], a method of drawing a spatial distribution map of radiation in the environment using a gamma camera was proposed. The source can be effectively found and located according to the spatial distribution map.

Search Strategy
The simplest method for a search strategy is to search in a fixed manner within a specific area. For example, in [7] the UAV was used to detect radioactive materials after the nuclear power plant accident. The UAV moved in a circular trajectory in the air, and estimated the position of the radioactive source as a centroid, which is calculated by using the detected dose of three points. The three-point positioning method has very high requirements on the detector's response time. If the response time is long, the estimation error will be very large. In [16][17][18][19] the detector was mounted on the UAV to search in a traversal manner. The advantage of this method is that it does not require prior estimation of the radioactive source's parameters, and the search accuracy was high, but the search efficiency was low. In order to save time and maximize the search efficiency, [19] also proposed the binary search method and the successive approximation search method. The binary search method is suitable for radioactive sources with high activity. Compared with the traversal search algorithm, it was implemented by discarding half of the area at a time, which greatly reduces search time. The successive approximation search method required the activity of the radioactive source to be higher, because it needs to detect a significant change in dose rate at the boundary of the region.
The second method of a search strategy is based on behavior and criteria [20][21][22]. For example, [20] used behavior-based navigation, combining data detected from previously visited areas. This allows the robot to detect safely and effectively. The study in [21] introduced a new behavior-based method for navigating mobile robots in an unknown environment. In this method, each behavior was implemented by a fuzzy controller and executed independently. The authors in [22] constructed a behavior-based control system architecture for the detection of radioactive hot spots under the limitation of sensors. Such methods need to define robot-related behaviors before searching, such as avoiding collisions, detecting radioactive sources and so on. The robot took corresponding actions based on the perceived environmental characteristics and defined criteria until the source item was searched. Such methods are difficult to apply to unknown environments because it is difficult to design behaviors and norms in that conditions. The third method is as follows. The robot moves randomly or in a fixed manner for a period of time in the search area. With the sensor sensing the radiation information, the source position is calculated by using the source parameter estimation method during this period. Then, combining the methods of information gain [9,10,23], information entropy [24], artificial potential field [25,26], etc., the robot could move to the target point. It is worth noting that in [26], when detecting gamma rays, the incidence angle of the sensor affected the measured radiation intensity. This method required very high accuracy of the source item estimation. If the estimation is not accurate, the robot cannot find the source. The dynamically updated search method [27][28][29][30][31][32] can make up for this shortcoming well. For example, in [27] the dynamic programming method for robot chemical source tracking was proposed. The robot searched for chemical sources based on real-time wind speed and the estimated chemical gas concentration. In [28], in the Bayesian framework, the search efficiency was improved Sensors 2020, 20, 3461 4 of 17 by using the semantics between the detected and recognized gas and the objects in the environment. In addition, the probability of detection and recognition was correlated with the robot's current position and target distance though using the Markov decision process, to minimize the search time. In [29][30][31][32], POMDP was used to design the gas source search algorithm and the Bayesian frame was used to estimate the gas source parameters. PDF acted as the information state in POMDP, and the gas source search was completed through the reward mechanism and behavior selection. In [29,30] three reward mechanisms were provided: information entropy, Infotaxic II and Bhattacharyya distance. In [31], relative entropy (also known as Kullback-Leibler divergence) was used as a reward for POMDP. The authors in [32] combined the potential energy and the entropy into free energy to be minimized as the reward of POMDP. The studies in [29][30][31][32] provided good ideas for the autonomous search algorithm for radioactive sources in this paper.
This paper uses POMDP to design a method for robots to autonomously search for unknown radioactive sources, but the differences from the above reference are: (1) This paper establishes an error model of the nuclear radiation measurement of the robot during the search process and analyzes the influence of the robot's speed, positioning accuracy and detection angle on the unknown radioactive source position estimation. (2) In this paper, the speed and angular velocity of the robot are related together as a limited set of actions for POMDP. However, in reference [29][30][31][32] the POMDP behavior set was a single direction indicating behavior, such as {., ↑, →, ↓, ←}. The limited action set method provided in this paper can make the robot more maneuverable in the search process. (3) The current knowledge is related to the direction of robot movement, rather than a single coordinate and detector count, and the MCMC method was used to improve the particle filter to approximate the posterior, and then complete the importance sampling of the particles.

Radiation Measurement Principle
Assuming that the lost or stolen radioactive source is approximately a point, so only the source activity and spatial location are considered in the study, and the spatial volume is not considered. In homogeneous air, the exposure dose rate of the γ-ray source at distance R is [33]: Among them, X is the exposure; Г and A is the exposure dose rate constant and the activity of the point radiation source, respectively; R 2 = (x i − x 0 ) 2 + (y i − y 0 ) 2 , where x 0 , y 0 is the source coordinate. The dose equivalent rate . H (µSv/h) of γ radiation source at distance R can be obtained from Equation (1), where w is the radiation weighting factor, and the value of photon and electron is 1; D is the absorbed dose; f is the conversion factor for converting exposure into absorbed dose. The radioactive decay of radioactive materials generally occurs randomly, but within a certain time interval, through the statistics of a large number of atoms, it can be found that the decay process is subject to certain statistical laws. In radiation measurement, the number of radioactive particles emitted by a radioactive source in a unit time can be detected. This process has statistical fluctuations in radioactive counts and obeys the Poisson distribution [9], where C pm ∈ N + is the count rate per minute (min −1 ) of the detector, which represents the detected count value within a minute; λ = ηM, where M is the average value of multiple measurements of N Sensors 2020, 20, 3461 5 of 17 particles generated by the decay of the radioactive source in a certain time interval; η is the efficiency of the detector. C pm can be calculated from Equation (3), where Energynumber is energy response constant [5].

Error Analysis of Robotic Radiation Measurement Model
The position of the radiation source is estimated by using Equation (1) after the robot measures multiple points. However, factors such as the detection angle, detection distance, detection time and environmental media [34] bring difficulty to the position estimation. Therefore, it is necessary to find the factors that interfere with the robot's radiation measurement and reduce the measurement uncertainty during the movement.
Assume that the motion model of the mobile robot is differential motion model, that is: where x, y, θ are robot position and direction; v is linear velocity; ω is angular velocity. Assume that the robot is at point A (x 1 , y 1 ) at time t 0 , and moves to point B (x 2 , y 2 ) at time t 1 (shown in Figure 1), where t 1 = β∆t. According to the principle of triangle, we can get Sensors 2020, 20, x 5 of 16

Error Analysis of Robotic Radiation Measurement Model
The position of the radiation source is estimated by using Equation (1) after the robot measures multiple points. However, factors such as the detection angle, detection distance, detection time and environmental media [34] bring difficulty to the position estimation. Therefore, it is necessary to find the factors that interfere with the robot's radiation measurement and reduce the measurement uncertainty during the movement.
Assume that the motion model of the mobile robot is differential motion model, that is: where , , are robot position and direction; is linear velocity; ω is angular velocity. Assume that the robot is at point A ( , ) at time , and moves to point B ( , ) at time (shown in Figure 1), where = ∆ . According to the principle of triangle, we can get Among them, is the distance from point B to the radioactive source; is the distance between point A and point B, ( ) is the angle between the line connecting the point (A or B) and the radiation source and the direction of the robot's movement. It should be noted here that the detector is installed directly in front of the robot and there is no relative displacement between the detector and the robot, so the robot's moving direction is the detection orientation. Robot movement is a continuous process, and changes with time, so can be approximated by the least squares polynomial form as where is polynomial coefficient,  Among them, d 2 is the distance from point B to the radioactive source; l is the distance between point A and point B, is the angle between the line connecting the point (A or B) and the radiation source and the direction of the robot's movement. It should be noted here that the detector is installed directly in front of the robot and there is no relative displacement between the detector and the robot, so the robot's moving direction is the detection orientation. Robot Sensors 2020, 20, 3461 6 of 17 movement is a continuous process, and ϕ 2 changes with time, so ϕ 2 can be approximated by the least squares polynomial form as where a j is polynomial coefficient, According to Equation (5), the error model ∆d 2 of the linear approximation of the distance d 2 from point B to the radiation source is, where ∆l, ∆ϕ 1 , ∆ϕ 2 are the errors of l, ϕ 1 , ϕ 2 , respectively. It is not difficult to see from Equation (7) that the positioning error of the radiation source is related to the robot self-positioning, the angle between the line connecting the γ-ray source and the sensor and the direction of the sensor's movement, and whether other factors are related needs further discussion. Assume that the anticipation error of positioning and incidence angle is zero and they are not related to each other, so Among them, σ 2 ∆l , σ 2 ϕ 2 , σ 2 ϕ 1 are the measurement variance of the corresponding parameters, and their value are unknown. The distance l between point A and point B is a function of x 1 , y 1 , x 2 , y 2 . The linear approximation model error ∆l is obtained as ∆l = ∂l so the variance σ 2 ∆l is calculated as According to [35], σ 2 where σ 2 ϕ is the measurement variance of the incident angle between the γ-ray and the sensor. From Equation (6), when p = 2, combined with the literature [35], Equation (10) can be rewritten as According to the Lagrange multiplier, Equation (11) can be obtained in the final form as In addition, the distance l between point A and point B can be determined according to the robot's speed v, that is, During the measurement, σ 2 ∆d 2 needs to be minimized, so use the following formula to get the value of β: Bring Equation (15) into Equation (14), so From Equation (16), we can see that the robot's positioning accuracy of the radioactive source during the movement is affected by the number of measurements, the measurement accuracy of position sensor, the angle between the line connecting the gamma ray and the measurement sensor and the direction of the sensor's movement, and the speed of the robot. By analyzing the factors that affect the positioning of the radioactive source, it provides help for the following mobile robot radioactive source positioning strategies.

Search Strategy
The position of the radioactive source is initially unknown for a searching robot. Based on its own perception of the surrounding environment, by using the local information for path planning, the robot could finally find the unknown radioactive sources. Furthermore, since radioactive source is strictly controlled items, if it gets lost, its number is generally known. Therefore, in this paper, we just consider the case of a single lost radioactive source.
Robot autonomous search strategy was designed using POMDP [36] in this paper. In POMDP model, the robot needs to collect environmental information (observation values) through sensors to update its credibility on the current state, which is the basis for the robot's decision on the next action choice. According to Equations (3) and (4), the observation sequence of the sensor when the robot moves to the target point is established, that is, z 1:k = {z 1 , z 2 , z 3 , · · · , z k }, where z k = (x k , y k , θ k , C pmk ) T represents the measurement value C pmk obtained by the robot at a certain position (x k , y k ) and in a certain direction θ k . Assume the parameter vector I of the unknown radioactive source is I = (x 0 , y 0 , θ 0 , I) T , where x 0 , y 0 is the coordinate of the plane position, θ 0 is the included angle of the radiation source with respect to the movement direction of the starting point of the robot, I is the activity of radioactive source and the unit is Bq. According to Bayes' rule, the likelihood function of the observation sequence z 1:k is, among them, p( z i |I) = P C pmi , λ i . Furthermore, according to Bayesian theory, after obtaining the observation sequence z 1:k in the radiation field, the posterior probability distribution function of the radiation source's parameter vector I can be obtained as p(I|z 1:k ). Since the searching action of radioactive source is a gradual process, the asymptotic calculation formula of the posterior probability distribution function is as follows: Then, according to Equation (18), the information state of discrete time k + 1 can be obtained as In addition, when k = 0, p(I) is the prior distribution of the source's parameters. At first, it was thought that unknown radioactive sources could be located anywhere in the search area and could be set to be evenly distributed. Next, after making certain judgments based on local information, the robot needs to continue moving to make further observation records until the radioactive source is searched. Then the robot's finite action set is defined as  (20), According to Equation (20), the measurement value z k+1 of the sensor at time k + 1 depends on the position and direction at time k and the selected action a (a ∈ A). Within each time step ∆t, the robot should move in the direction where the expected count rate is maximum. Therefore, this paper uses information entropy to describe the reward of action a. Then the robot's information entropy during the search process changes to [37], where S k is the information entropy of the radioactive source, that is, S k = p(I|z 1:k ) log p(I|z 1:k )dI and Pr k can be calculated using the kernel density estimation method [38]. The physical meaning of Equation (21) is explained in [37]. The first term on the right side of the formula indicates that the robot has found the radioactive source at time k + 1, while the second term indicates that the robot has not yet searched the source at time k + 1, and needs further searching until the source is found, so S k+1 = 0, and in the search process, only the second term of Equation (21) is used to represent the reward. According to the Equation (2), the expectation of the Shannon entropy S k+1 of measured value z k+1 at the time k + 1 can be obtained as p z k+1 I p I z 1:k+1 log p I z 1:k+1 dI .
Then, while the robot is looking for a radioactive source, according to the optimal strategy π k * , D(a k ) is minimized, that is, π k * = argmin a k ∈A D(a k ).

Parameter Estimation of Radioactive Sources
Using a mobile robot to search for radioactive sources is a gradual process, and the radioactive source parameters are gradually estimated according to the search strategy of Section 4.1. However, in the search strategy, the posterior probability distribution function is difficult to obtain analytically. Therefore, the MCMC method is used to improve the particle filter to approximate the posterior probability distribution function. The basic idea of the particle filtering algorithm is to use a group of n particles with their own weights I i k , w i k , i = 1, · · · , n to approximate the posterior probability density p(I|z 1:k ) of the radioactive source parameter I. In addition, the weight of the particles satisfies n i=1 w i k = 1. Then the posterior probability density is approximately as follows: where δ(·) is a Dirac function. Based on this approximation, complex integration operations can be converted into sum operations, such as S k ≈ − n i=1 w i k log w i k . Assume that the importance sampling density is q(I|z 1:k ), the prior probability transfer distributionis is chosen as q(I k |I, z 1:k ) = p(I k |I k−1 ), then according to the first-order Markov hypothesis, the unnormalized weight of each particle at time k can be obtained as, Among them, C pm is the radiation dose counting rate detected by the detector at time k, λ i is the estimated value of the radiation intensity of the i-th particle at the observation point (x k , y k ).
The normalized weight of each particle is w i k = w * i k n j=1 w * j k , so we can get a set of weighted particles. Then the estimated values of the radiation source's parameters are obtained according to Equation (24). However, in particle filtering, due to frequent resampling, the particles will lack diversity, mainly concentrated on some particles with higher weight. In this paper, a valid number of samples N th is set as the threshold. When the weight value n i=1 w i k of the current particle set is less than N th , random resampling is performed by the roulette selection method, and a little noise is added to the particle set after sampling. If without resampling, the particle set is retained directly. In order to improve the diversity of the obtained particle set, the resampled particles are continuously filtered by using the Metropolis-Hastings sampling algorithm [39], so that the sampling points are gradually moved to the central region of the posterior probability distribution. That is, taking I i k−1 as the mean and τ 2 as the importance sampling function for variance q I k I 0:k, z 1:k , then q I k I 0:k, z 1:k = N I i k−1 , τ 2 . New particle I i * 0:k is extracted from the importance function q I k I 0:k, z 1:k , and new particles are accepted with a probability value α I i * 0:k = min 1, p(z 1:k I i * 0:k ) p(z 1:k I i 0:k ) until a new particle after the k-th observation is obtained.

Simulation Experiment and Analysis
The simulation preliminarily verified the feasibility of autonomous search of the radioactive source by the robot. The simulation assumptions are as follows: The measured value of environmental background radiation count is 1 cps;

5.
The calculation method of Pr k refers to the method of [29].
According to the robot's differential motion model (4), within the duration ∆t = 1 s, the robot's pose at the time k + 1 is (x k , y k , θ k ), the initial pose of the robot is (30,20,0), and the robot's motion set is A. Combining the factors that affect the positioning accuracy of the radioactive source in Section 3, simulation experiments prove the performance of the algorithm from different schemes.
(1) Algorithm parameters: The number of particles is n = 5000, the initialized particle information x i , y i are random numbers in the range of [0, 200], and I i is a random number in the range of 3 × 10 6 , 1 × 10 9 . The robot's linear velocity v = 5 m/s, angular velocity ω = 0.1 rad/s. Within the duration ∆t, the measurement times N are 1, 5, 10, 15, 20 respectively. The condition for the end of the algorithm is that the estimated position error of the radioactive source is less than 1 m (x ≤ 1 m and y ≤ 1 m).
When N = 1, the simulation results are shown in Figure 2. Figure 2a-d shows the estimated position of the radiation source (represented by green circles), the position of the robot (represented by triangles) and the search trajectory of the robot at time k = 11, k = 30, k = 40, k = 48, respectively. When the robot is at the starting point (30,20), the observation value z 1 is obtained according to simulation assumption (3) (as shown in Figure 2e). Combining the observation value z 1 and the particle information (x i , y i , I i ), the normalized weight w i 1 of each particle is calculated. The position and size of the radioactive source are estimated by bringing the normalized weight w i 1 and particle information (x i , y i , I i ) into formula (24), and the information entropy is further obtained through Equations (21), (22) and (25). The information entropy is used to select the next movement behavior of the robot. Then, the particle information is updated with the effective sample number N th = 2n/3, position variance The error statistics of different measurement times within ∆t are shown in Table 1. It can be seen in Table 1 that as the number of measurements increases, the accuracy of the radioactive source intensity estimation is higher. From Figure 2e and Figures S1-S4, it can be found that the greater the number of measurements, the smaller the random error of the observed value. Due to the influence of the robot linear velocity and angular velocity, as the number of measurements increases, the iteration number k decreases to 36 and does not change. Sensors 2020, 20, x 10 of 16 number of measurements, the smaller the random error of the observed value. Due to the influence of the robot linear velocity and angular velocity, as the number of measurements increases, the iteration number decreases to 36 and does not change.  (2) Algorithm parameters: The number of particles = 5000, the robot's linear velocity = 5 m/s, the angular velocity = 0.1 rad/s. The number of measurements within ∆ is = 20. The angular velocity of the robot = 0.1 rad/s, = 0.3 rad/s, = 0.5 rad/s, = 0.7 rad/s, = 0.9 rad/s. The condition for the end of the algorithm is that the estimated position error of the radioactive source is less than 1 m (x ≤ 1 m and y ≤ 1 m). The simulation results are shown in Figure  3 when = 0.7 rad/s. The error statistics at different angular velocities are shown in Table 2. As the angular velocity increases, the number of iterations decreases first and then increases. It is because as the angular velocity increases, the robot's rotation amplitude increases, which affects the observation value and particle weight, thus affecting the robot's behavior choice. It can be seen from  (2) Algorithm parameters: The number of particles n = 5000, the robot's linear velocity v = 5 m/s, the angular velocity ω = 0.1 rad/s. The number of measurements within ∆t is N = 20. The angular velocity of the robot ω = 0.1 rad/s, ω = 0.3 rad/s, ω = 0.5 rad/s, ω = 0.7 rad/s, ω = 0.9 rad/s. The condition for the end of the algorithm is that the estimated position error of the radioactive source is less than 1 m (x ≤ 1 m and y ≤ 1 m). The simulation results are shown in Figure 3 when ω = 0.7 rad/s. The error statistics at different angular velocities are shown in Table 2. As the angular velocity increases, the number of iterations k decreases first and then increases. It is because as the angular velocity increases, the robot's rotation amplitude increases, which affects the observation value and particle weight, thus affecting the robot's behavior choice. It can be seen from When the number of measurements and the angular velocity of the robot are constant within the duration ∆t, as the robot linear velocity increases during simulation, the number of iterations decreases and the error of the estimated radioactive source remains basically unchanged. However, in the actual process, the robot linear velocity increases. Within constant duration ∆t, the number of measurements should be gradually reduced. It is because in the actual test process the sensor needs a response time, but the simulation process does not. In addition, when the number of measurements, the linear velocity and the angular velocity of the robot are constant within ∆t, the larger the positioning error of the position sensor, the greater the cumulative position error according to Equation (20). Therefore, the observation value obtained by Equation (3) is smaller, resulting in a larger weight value for each particle, so that the estimated position and size of the radiation source are larger.  When the number of measurements and the angular velocity of the robot are constant within the duration ∆ , as the robot linear velocity increases during simulation, the number of iterations decreases and the error of the estimated radioactive source remains basically unchanged. However, in the actual process, the robot linear velocity increases. Within constant duration ∆ , the number of measurements should be gradually reduced. It is because in the actual test process the sensor needs a response time, but the simulation process does not. In addition, when the number of measurements, the linear velocity and the angular velocity of the robot are constant within ∆ , the larger the positioning error of the position sensor, the greater the cumulative position error according to Equation (20). Therefore, the observation value obtained by Equation (3) is smaller, resulting in a larger weight value for each particle, so that the estimated position and size of the radiation source are larger.

Real Experiment and Analysis
To test the effectiveness of the search algorithm, the experimental design is as follows: 1. The energy of the gamma rays generated when the Cs-137 radioactive source (as shown in Figure  4b), with 37 MBq activity stored in the lead (as shown in Figure 4a), is 0.662 MeV. 2. The parameters of the G-M radiation detector are as follows: Measurement range-0.01 − 5000 uSv/h, energy response-40 -3 , sensitivity-≥ 3000 cpm/mR/h, relative error: ≤15%. The five sides of the cuboid are aluminum alloy shells, and the front panel material is plastic.

Real Experiment and Analysis
To test the effectiveness of the search algorithm, the experimental design is as follows: 1.
The energy of the gamma rays generated when the Cs-137 radioactive source (as shown in Figure 4b), with 37 MBq activity stored in the lead (as shown in Figure 4a), is 0.662 MeV.

2.
The parameters of the G-M radiation detector are as follows: Measurement range-0.01-5000 uSv/h, energy response-40KeV-3 MeV, sensitivity-≥ 3000 cpm/mR/h, relative error: ≤15%. The five sides of the cuboid are aluminum alloy shells, and the front panel material is plastic.

3.
Use the TurtleBot robot as the experimental search robot, as shown in Figure 4c. The robot is equipped with an Acer notebook, and the notebook is installed with the Ubuntu + ROS system. 4.
The hokuyo UTM-30 lidar sensor is used to establish an environment map in advance, that is, the environment map already exists during the radioactive source search experiment.

5.
Obtain the robot's coordinates by reading the robot's odometer information.
6. Equation (1) is simplified as . H = Y A R 2 , and k is constant. The dose equivalent rate measured by the nuclear radiation detector at a distance of 1 m from the radioactive source is 10.45 uSv/h and thus Y = 2.824 × 10 −8 . 7.
According to literature [5], for Cs-137 with an average γ-ray energy of 0.662 MeV, the energy response constant is 12,200 cpm/µSv/h.

8.
The position coordinate of the radioactive source in the experiment is (6.4 m, 4.0 m, 0.5586 rad), the initial position of the robot is (0, 0, 0), and the number of particles is n = 1000. 9.
Due to the control of radioactive sources in universities, the experiment is conducted in 12 m × 8 m indoors.
the environment map already exists during the radioactive source search experiment. 5. Obtain the robot's coordinates by reading the robot's odometer information. 6. Equation (1) is simplified as ̇= Υ , and k is constant. The dose equivalent rate measured by the nuclear radiation detector at a distance of 1 m from the radioactive source is 10.45 uSv/h and thus Υ = 2.824 × 10 . 7. According to literature [5], for Cs-137 with an average γ-ray energy of 0.662 MeV, the energy response constant is 12,200 cpm/μSv/h. 8. The position coordinate of the radioactive source in the experiment is (6.4 m, 4.0 m, 0.5586 rad), the initial position of the robot is (0, 0, 0), and the number of particles is = 1000. 9. Due to the control of radioactive sources in universities, the experiment is conducted in 12 m × 8 m indoors.
It can be seen from Figure S8 and Table 3 that when the number of measurements is constant, the estimated error of the radioactive source coordinates increases with the increase of the robot linear velocity. This is because in nuclear radiation measurements the G-M tube detector has a response time of tens of milliseconds (some G-M tubes even require a response time of a few seconds). When the response time is constant, the greater the linear velocity of the robot, the greater the distance the robot moves during measurement and the greater the positioning error. When the robot linear velocity is constant, the number of measurements increases and the accuracy of estimating the position of the radiation source also increases. Due to the limitation of the detector's response time, the number of measurements within the duration ∆t cannot be increased indefinitely. In addition, it can be seen from the experimental results that the method proposed in this paper is not suitable for estimating the activity of radioactive sources.  It can be seen from Figure S8 and Table 3 that when the number of measurements is constant, the estimated error of the radioactive source coordinates increases with the increase of the robot linear velocity. This is because in nuclear radiation measurements the G-M tube detector has a response time of tens of milliseconds (some G-M tubes even require a response time of a few seconds). When the response time is constant, the greater the linear velocity of the robot, the greater the distance the robot moves during measurement and the greater the positioning error. When the robot linear velocity is constant, the number of measurements increases and the accuracy of estimating the position of the radiation source also increases. Due to the limitation of the detector's response time, the number of measurements within the duration ∆ cannot be increased indefinitely. In addition, it can be seen from the experimental results that the method proposed in this paper is not suitable for estimating the activity of radioactive sources.

Conclusions
In this paper, a method for autonomous search of radioactive sources by mobile robots was proposed. The next behavior of the robot was chosen through POMDP based on the locally observed information. In the Bayesian framework, the improved MCMC method is used to estimate the parameters of the radioactive source, and the robot's behavior selection is driven by the reward of information entropy. In addition, this paper also analyzes the factors that affect the accuracy of the radiation measurement during the movement of the robot, and designs the behavior set of POMDP according to these factors. Simulation and experimental results show that the algorithm has excellent performance in a barrier-free environment. The estimated accuracy of the radioactive source is related to the behavior of the robot, the number of measurements, the accuracy of the position sensor and

Conclusions
In this paper, a method for autonomous search of radioactive sources by mobile robots was proposed. The next behavior of the robot was chosen through POMDP based on the locally observed information. In the Bayesian framework, the improved MCMC method is used to estimate the parameters of the radioactive source, and the robot's behavior selection is driven by the reward of information entropy. In addition, this paper also analyzes the factors that affect the accuracy of the radiation measurement during the movement of the robot, and designs the behavior set of POMDP according to these factors. Simulation and experimental results show that the algorithm has excellent performance in a barrier-free environment. The estimated accuracy of the radioactive source is related to the behavior of the robot, the number of measurements, the accuracy of the position sensor and the angle of incidence of the radiation and the sensor. In the future, we will study the autonomous search of radioactive sources in obstacle environments, and consider the case where there are multiple radioactive sources or a coordinated search using multiple land robots or a combination of air and land robots.
Supplementary Materials: The following are available online at http://www.mdpi.com/1424-8220/20/12/3461/s1. Figure S1: The estimation results of radioactive sources when the number of measurements N = 5 and angular velocity ω = 0.1 rad/s. Figure S2: The estimation results of radioactive sources when the number of measurements N = 10 and angular velocity ω = 0.1 rad/s. Figure S3: The estimation results of radioactive sources when the number of measurements N = 15 and angular velocity ω = 0.1 rad/s. Figure S4: The estimation results of radioactive sources when the number of measurements N = 20 and angular velocity ω = 0.1 rad/s. Figure S5: The estimation results of radioactive sources when the number of measurements N = 20 and angular velocity ω = 0.3 rad/s. Figure S6: The estimation results of radioactive sources when the number of measurements N = 20 and angular velocity ω = 0.5 rad/s. Figure S7: The estimation results of radioactive sources when the number of measurements N = 20 and angular velocity ω = 0.9 rad/s. Figure S8: Real experiment results.