Moving-Target Position Estimation Using GPU-Based Particle Filter for IoT Sensing Applications

Abstract: A particle filter (PF) has been introduced for effective position estimation of moving targets in non-Gaussian and nonlinear systems. The time difference of arrival (TDOA) method using an acoustic sensor array is normally used to estimate the concealed location of a moving target, especially underwater. In this paper, we propose a graphics processing unit (GPU)-based acceleration of target position estimation using a PF, together with an efficient system and software architecture. Because its parallelization is scalable, the proposed GPU-based algorithm is well suited to applying PF signal processing to a target system that consists of large-scale Internet of Things (IoT)-driven sensors. For the TDOA measurement from the acoustic sensor array, we use the generalized cross correlation phase transform (GCC-PHAT) method to obtain the correlation coefficient of the signals using the fast Fourier transform (FFT), and we accelerate the GCC-PHAT-based TDOA calculation using the FFT on the GPU with the compute unified device architecture (CUDA). The proposed approach parallelizes the target position estimation algorithm using GPU-based PF processing. In addition, it can efficiently estimate sudden changes in the movement of the target using GPU-based parallel computing, which can also be used for multiple target tracking. It also scales as the detection algorithm is extended to a larger number of sensors. Therefore, the proposed architecture can be applied in IoT sensing applications with a large number of sensors. The target estimation algorithm was verified using MATLAB and implemented using GPU CUDA. We implemented the proposed signal processing acceleration system on the target GPU and analyzed its execution time. The execution time of the algorithm is reduced by 55% compared with CPU standalone operation on the target embedded board, the NVIDIA Jetson TX1. To address large-scale IoT sensing applications, we also use the NVIDIA Tesla K40c as a target GPU. The execution time of the proposed multi-state-space model-based algorithm is similar to that of the one-state-space model algorithm because of GPU-based parallel computing. Experimental results show that the proposed architecture is a feasible solution in terms of high performance and area efficiency.


Introduction
In this paper, we propose a GPU-accelerated target position tracking system that uses an acoustic sensor array and a particle filter (PF) for effective tracking of moving targets. We focus on using the parallel processing of the GPU to track sudden changes in target movement by applying multiple system state equations within the existing PF. The proposed parallel processing scales with the number of sensors and with the number of targets tracked, so the proposed architecture can be used in systems such as Internet of Things (IoT) applications. We analyzed the execution time of the algorithm in actual operation on the GPU. Through this analysis, we identified suitable design elements for the memory buffer and for the algorithm that performs the signal processing.
Position estimation of moving targets using an acoustic signal is applicable not only in air but also in water, given the special characteristics of the underwater environment [1]. The sensor array receives the acoustic signal from the target and estimates its position by measuring three or more time difference of arrival (TDOA) values in three-dimensional space [2]. TDOA can be measured in the frequency domain as well as in the time domain, for example with the generalized cross correlation phase transform (GCC-PHAT) method. GCC-PHAT obtains a correlation coefficient by converting the signals into the frequency domain, and it is well suited to real-time systems because it requires relatively little computation compared with a general correlation calculation. In this paper, we propose an algorithm that uses a parallel fast Fourier transform (FFT) on the GPU to increase the processing speed of the TDOA measurement for each acoustic sensor in three-dimensional space.
In general, various filters such as the Kalman filter (KF), extended KF, unscented KF, and PF are used to estimate the state of the target [3]. However, when the target state is estimated from TDOA measurements, the nonlinearity between the system state and the measured value makes the estimation difficult. Therefore, in this paper, we propose an accelerated system that estimates the target position using the PF, which has an advantage in nonlinear and non-Gaussian systems.
The PF is a sequential Monte Carlo (SMC) method that estimates the state of a system by observing an error. With a sufficient number of particles, an optimal estimate can be obtained; with too few particles, the estimate may be unreliable. However, increasing the number of particles to obtain an optimal estimate increases the amount of computation, which inevitably affects the operating speed of the system. Therefore, in this paper, we propose accelerating the processing by performing the state update and the weight calculation of each particle simultaneously on the GPU. Additionally, we propose a PF system with multiple state equations so that the target is tracked accurately even when its sudden movement deviates from the system state equation used in the Markov chain based PF.
We describe the target system with multiple state-space models, which are processed by the parallel processing algorithm on the GPU. In the experiments, we present the results of implementing the proposed algorithm on the GPU and analyze the execution time required for target position estimation, including the partial execution time of each stage of the proposed algorithm.

Related Works
PFs using TDOA measurements have been widely studied, but these studies have emphasized noise handling in the algorithm and noisy environments rather than high-speed processing [4,5]. A target estimation method using TDOA underwater has also been studied [6]. In this paper, we focus on implementing and verifying an algorithm that is accelerated through parallel processing on the GPU and that estimates sudden movements of the target using multiple system state equations.
Various methods based on the GCC algorithm have been introduced for measuring TDOA [7,8]. They have a relative advantage in real-time applications, in which performance is more important. A notable approach based on the steered response power-phase transform (SRP-PHAT) provides robust signal processing for sound localization [9], and GPU-based acceleration of TDOA measurement using SRP-PHAT has also been proposed [10]. Our paper builds on an initial approach [11] that uses GCC-PHAT-based TDOA measurement, and in particular presents our experience in implementing GPU-based acceleration to guarantee real-time performance. In addition, GPU-based acceleration of the TDOA measurement for a specific number of sensors has been presented as a case study [12].
Various studies on the GPU-based PF [13,14] and on the parallelization of multi-target tracking have been presented. A scheduling algorithm and processor selection for parallel PF processing were introduced in [15]. Another notable approach [16] parallelizes a memetic algorithm and a PF for tracking multiple objects. Compared with these approaches, we additionally consider uncertainty in the model of the target system state, which arises when the target suddenly moves abnormally. We therefore adopted multiple state-space models that are selectively applied in the PF processing to determine a sufficiently good estimation result. This requires more computational resources or a longer calculation time, so we accelerated the PF computation by allocating the models to independent calculation unit blocks in the GPU.
There are many studies of SMC methods that use uncertain state-space models [17]. Some approaches adopt multiple state-space models to overcome the weakness of PF processing under an uncertain model [18,19]. These studies consider PF processing using multiple state-space models in a single system. Our approach differs in that we allocate the individual PF processing kernels to multiple processing units in the GPU. This allows a specific state-space model to be applied to each PF processing kernel independently, so the proposed architecture scales when tracking moving targets at large scale, which is important in IoT-driven applications. Target localization for IoT-driven applications using radio-frequency identification (RFID) tags, including PF processing, has already been studied [20,21,22]. Therefore, this study aims to implement a scalable GPU-based algorithm that can be used in IoT sensing applications.

Proposed Architecture
The overall architecture proposed in this paper is as follows. We focus on accelerating, on a GPU, the PF-based position estimation of a sound source using an acoustic sensor array. Sound source localization requires TDOA measurements, and the current location of the target can be determined from three TDOA measurements in three-dimensional space. Obtaining three TDOA measurements requires four acoustic sensors, including one reference sensor. In this study, we used GCC-PHAT to obtain the TDOA measurements. To obtain them at high speed, the GCC-PHAT processing of the acoustic signals input by multiple sensors is performed in parallel on the GPU.
The proposed method uses a PF to estimate the current state of a target from TDOA measurements, and we accelerate this PF. The PF process used in the study is as follows. Using the computed TDOA measurements and the state equations of the system, we predict the current state from the previous state and obtain the predicted observation from the updated state. Then, the weight of each particle is obtained by comparing the predicted observation with the actual measurement. We propose accelerating the state update and weight calculation by processing each particle in parallel in a GPU kernel. In addition, the PF operation is customized with multiple system state equations on the GPU to track sudden movements of moving targets, which can also be used for tracking multiple targets. The proposed architecture is shown in Figure 1. This architecture can also be used in IoT applications consisting of many sensors, as shown in Figure 2. Given the amount of signal data produced by many sensors, it is reasonable to use a GPU for high-performance computing so that signal processing remains fast while the target is detected more accurately by increasing the number of sensors.

Particle Filter (PF) with Time Difference of Arrival (TDOA) Measurement
To estimate the movement of the acoustic signal source, we use the TDOA measurements obtained from the acoustic sensors. Four acoustic sensors, including one reference sensor, are used to estimate the position of the target in three-dimensional space. The proposed target position estimation is based on a PF that uses the TDOA measurements obtained in this structure. The PF probabilistically estimates the state of a target using N particles and is also known as a Monte Carlo method. Unlike the extended KF and unscented KF, the PF performs well in non-Gaussian and nonlinear systems. The proposed algorithm scales with the number of sensors and remains applicable as the number of sensors increases, as shown in Figure 3. The proposed architecture uses four sensors, which is the minimum number required to detect a target in three-dimensional space, but it can also use more. The PF processing can operate on the GPU for each sensor node set, and the proposed architecture can detect multiple targets in parallel using acoustic signals.
First, the state of the system used in PF processing is defined as follows. The moving target is modeled as moving at uniform speed. In this case, the system state equation for estimating the current state s_k from the previous state s_{k-1} is expressed by Equation (1), where s_k denotes a position vector in three-dimensional space at time k, A denotes the state transition matrix, and p_k denotes process noise. The observation model, which represents the relationship between the state of the system and the measured value, is defined by Equation (2), where z_k denotes the measured value and v_k denotes measurement noise. In this study, we accelerate the position estimation of the target using TDOA measurements, so the measured value z_k is a TDOA measurement obtained from the signals input to the acoustic sensors. In Equation (2), the relation between the system state and the TDOA measurements is represented by h, which is obtained by the following procedure. The distance between the target and a sensor is defined by r_a(s_k) in Equation (3), where u_k is the position vector of the target and s_a is the position vector of the a-th sensor; the reference sensor for obtaining the TDOA measurement is denoted s_a. As shown in the second line of Equation (3), r_ab is the difference between the distances from the target to s_a and to s_b, which equals the propagation speed of the source wave, c, multiplied by the TDOA value between sensors s_a and s_b. The left panel of Figure 3 shows the reference sensor and the other sensor. The TDOA measurements used for the PF are obtained between the reference sensor s_a and each other sensor. Equation (4) gives the TDOA measurement value from the distances between the target and the reference sensor s_a and sensor s_b. The velocity c is the speed of sound in water, taken as 1500 m/s. Equation (5) then defines the relation h_ab between the system state and the TDOA measurement, and the target location is estimated by applying three TDOA measurement values. The relations for three or more TDOA measurement values are collected in Equation (6) as a matrix H, where the index 'a' represents the reference sensor and the index 'X' represents the relative sensors.
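The relations referred to as Equations (1) to (6) can be sketched as follows. This is a reconstruction from the symbol definitions in the text, not a verbatim copy of the typeset equations: the sign convention of r_ab, the labels X_1, X_2, X_3 for the non-reference sensors, and the stacking order in H are assumptions.

```latex
\begin{align}
s_k &= A\, s_{k-1} + p_k \tag{1}\\
z_k &= h(s_k) + v_k \tag{2}\\
r_a(s_k) &= \lVert u_k - s_a \rVert, \nonumber\\
r_{ab} &= r_a(s_k) - r_b(s_k) = c\, T_{ab} \tag{3}\\
T_{ab} &= \frac{r_a(s_k) - r_b(s_k)}{c} \tag{4}\\
h_{ab}(s_k) &= \frac{\lVert u_k - s_a \rVert - \lVert u_k - s_b \rVert}{c} \tag{5}\\
H(s_k) &= \begin{bmatrix} h_{aX_1}(s_k) & h_{aX_2}(s_k) & h_{aX_3}(s_k) \end{bmatrix}^{\top} \tag{6}
\end{align}
```

Under this reading, each row of H is nonlinear in the target position u_k through the two Euclidean norms, which is the nonlinearity that motivates the PF over the KF family.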
The operation of the PF for estimating the position of the moving target using the relation between the system state and the measurement is shown in Algorithm 1. The overall algorithm of the PF using the TDOA measurement is as follows. First, the system state of each particle is updated. Then, for each particle, the observation is predicted from the updated state using the relation between the system state and the measurement. Next, the weight of each particle is calculated according to the probability distribution of the actual observed TDOA measurement and the predicted measurement. After that, the effective sample size (ESS) is calculated and compared with a threshold N_th [23,24], which is a predefined threshold for resampling. In our implementation, N_th is set to 0.8 times the number of particles. If the ESS falls below N_th, the particles are resampled according to their weights. The resampling process removes particles with low weights and replicates particles with high weights to reduce errors due to the probability distributions. The resampling process allows the PF to estimate the position of the moving target more accurately.
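The steps just described can be sketched in plain Python as below. This is a minimal sequential illustration under stated assumptions, not the authors' Algorithm 1 or their CUDA implementation: the identity state transition with Gaussian process noise, the Gaussian likelihood with standard deviation `sigma`, and all function names are hypothetical. The ESS formula 1/sum(w_i^2) and systematic resampling are the standard textbook forms.

```python
import math
import random

C_WATER = 1500.0  # speed of sound in water (m/s), as stated in the text


def tdoa(u, s_ref, s_other, c=C_WATER):
    """Predicted TDOA between the reference sensor and another sensor."""
    return (math.dist(u, s_ref) - math.dist(u, s_other)) / c


def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for normalized weights."""
    return 1.0 / sum(w * w for w in weights)


def systematic_resample(weights, u0):
    """Return particle indices drawn by systematic resampling.

    u0 is a single uniform draw in [0, 1); sample positions are (u0 + j) / N.
    """
    n = len(weights)
    indices, cumulative, i = [], weights[0], 0
    for j in range(n):
        position = (u0 + j) / n
        while position > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        indices.append(i)
    return indices


def pf_step(particles, weights, z, sensors, sigma, rng, q=0.0, ratio=0.8):
    """One predict / weight / resample step (assumed identity transition)."""
    ref, others = sensors[0], sensors[1:]
    # State update: identity transition plus process noise (assumed model).
    particles = [tuple(x + rng.gauss(0.0, q) for x in p) for p in particles]
    # Weight update from a Gaussian likelihood of the TDOA residuals.
    new_w = []
    for p, w in zip(particles, weights):
        err2 = sum((zi - tdoa(p, ref, s)) ** 2 for zi, s in zip(z, others))
        new_w.append(w * math.exp(-0.5 * err2 / sigma ** 2))
    total = sum(new_w)
    weights = [w / total for w in new_w]
    # Resample only when the ESS falls below N_th = ratio * N.
    if effective_sample_size(weights) < ratio * len(particles):
        idx = systematic_resample(weights, rng.random())
        particles = [particles[i] for i in idx]
        weights = [1.0 / len(particles)] * len(particles)
    return particles, weights
```

The per-particle loop in `pf_step` has no dependency between particles until the normalization sum, which is why the state update and weight calculation are natural candidates for one-thread-per-particle execution.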

TDOA Measurement Using Generalized Cross Correlation Phase Transform (GCC-PHAT)
To estimate the location of sound sources from TDOA observations, we use the GCC-PHAT method. As mentioned before, GCC-PHAT obtains the correlation coefficient of the signals using the phase transform, and it operates in the frequency domain because that requires less calculation than the time domain. Algorithm 2 shows the procedure for obtaining the TDOA measurement from the acoustic signals input by two acoustic sensors. The TDOA measurements are obtained by applying the Fourier transform and the inverse Fourier transform to the two acoustic signals, as in the algorithm. The position of the sound source in three-dimensional space can then be obtained from these measurements. In this paper, we propose performing this calculation in parallel to accelerate the operation of the entire algorithm.
TDOA measurement T_ab = argmax_p (I_GPHAT(p)) / sampling rate
7: end for
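As an illustration of the GCC-PHAT procedure described above, the following pure-Python sketch estimates the delay between two equal-length signals. It is not the authors' Algorithm 2: the recursive radix-2 FFT stands in for the GPU FFT, the signal length is assumed to be a power of two, the delay is treated as circular, and the function names `fft` and `gcc_phat_tdoa` are hypothetical.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two (unnormalized)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def gcc_phat_tdoa(sig_a, sig_b, sampling_rate):
    """Estimate the delay of sig_a relative to sig_b, in seconds."""
    n = len(sig_a)
    fa, fb = fft(sig_a), fft(sig_b)
    # Cross-power spectrum of the two signals.
    cross = [a * b.conjugate() for a, b in zip(fa, fb)]
    # PHAT weighting: keep only the phase of the cross-spectrum.
    phat = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]
    # Inverse transform gives the generalized cross-correlation.
    corr = [v.real / n for v in fft(phat, inverse=True)]
    peak = max(range(n), key=lambda p: corr[p])
    lag = peak if peak < n // 2 else peak - n  # map the index to a signed lag
    return lag / sampling_rate
```

For example, two unit impulses separated by five samples at a 1 kHz sampling rate give a delay estimate of 5 ms, with the sign indicating which signal arrives later.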

Accelerated Algorithm Based on GPU
Our approach accelerates the proposed position estimation of a moving target using a GPU. The GPU is an architecture that excels at parallel computing and is used for high-speed computation on large amounts of data. The proposed GPU-based moving target position estimation can be divided into three operations: obtaining the TDOA measurements from the sensors, PF processing, and tracking the moving target using multiple system state equations.
The TDOA measurements are calculated in the frequency domain using GCC-PHAT. We therefore use the cuFFT library provided with CUDA to accelerate the fast Fourier transform (FFT) in the TDOA calculation. To accelerate Fourier transforms that apply the same coefficients to different data, the Fourier transform operations for multiple sensors are processed in parallel, as shown in Figure 4, using the batch function of cuFFT. When the number of sensors is K and the amount of data obtained from each sensor is N, the amount of data copied to the GPU device is K*N. Because position estimation of the moving target using TDOA measurements becomes more accurate as more sensors are used, the proposed accelerated TDOA calculation becomes more advantageous as the amount of data increases. As mentioned before, the proposed algorithm scales with the number of sensors, and so does the proposed parallel FFT. We also propose parallel computing of the PF on the GPU, as shown in Figure 5. Because the cost of PF processing depends on the number of particles, the entire process can be accelerated by processing each particle in parallel on the GPU. In particular, estimating the position of a moving target benefits from acceleration when the state update of each particle, the calculation of the predicted measurement from that state, and the weight calculation are all performed in parallel across particles.
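The K*N data layout of the batched transform described above can be illustrated with a short pure-Python sketch. It does not call cuFFT; the naive DFT stands in for a single transform of the batch, the sensor-major ordering of the buffer is an assumption, and the names `pack_batch` and `batched_dft` are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(N^2) DFT, standing in for one transform of the batch."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def pack_batch(sensor_signals):
    """Flatten K signals of N samples into one K*N buffer (sensor-major)."""
    n = len(sensor_signals[0])
    assert all(len(s) == n for s in sensor_signals)
    return [v for s in sensor_signals for v in s], n


def batched_dft(buffer, n):
    """Transform each length-n slice independently, as a batched FFT would."""
    k = len(buffer) // n
    return [dft(buffer[b * n:(b + 1) * n]) for b in range(k)]
```

Because every slice has the same length, all K transforms share the same twiddle factors and have no data dependencies on one another, which is what makes a single batched call over the contiguous buffer attractive on a GPU.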

Figure 5. N particles, with parallel computing for each particle.

Target Tracking Using Multiple State-Space Models
The KF and PF, which can be used for tracking the position of a target, have system state equations based on Markov chains. In other words, if the target moves differently from the predefined system state equation, the estimation result is inaccurate. Therefore, in this paper, we define several additional system state equations for expected movements and evaluate them alongside the current system state equation. The proposed structure can also be used for multiple target tracking with multiple system models. A predicted measurement is obtained from each state update model. The predicted measurements are then compared with the actual measurements, and the particles whose predicted measurements differ least from the actual ones are selected for the remaining PF process. With this method, sudden movements of a target can be tracked using multiple system state equations on the GPU.
The operation of the GPU-based PF using the proposed system state equations is shown in Figure 6. The state of each particle is updated under the multiple system state equations in parallel by a kernel function on the GPU. For all particles obtained from the multiple state-space models, the state update, weight calculation, and normalization are performed in parallel. To obtain the particles of the state-space model that best suits the current system state, the predicted measurements are compared with the actual measurements to find the smallest difference. The particles with the smallest differences are then selected and resampled. In our implementation, we use the systematic resampling algorithm [25,26], which requires random number generation during resampling. One study [27] accelerates resampling in a GPU-based PF. In our study, to reduce the additional overhead of random number calculation during resampling, we used a thread-based method on the host CPU, which runs during the process of copying data to the GPU. Algorithm 3 describes the PF processing using multiple state-space models on the GPU. The transition matrix A_m is used to formulate the multiple state-space models. The PF processing for each state-space model is performed in an individual block of the GPU, and this block runs the threads to which the corresponding particles are allocated. Through synchronization between threads, the average measurement can be estimated from the updated states, so that the appropriate state-space model can be determined and the remaining PF processing can continue incrementally.
Algorithm 3 PF using multiple state-space models in GPU
1: Set initial state s^m_0 (m = 1, 2, ..., L)
2: Generate particles s^m_{0(i)} (i = 1, 2, ..., N), (m = 1, 2, ..., L)
3: Set initial weights W^m_{0(i)} = 1/N (i = 1, 2, ..., N), (m = 1, 2, ..., L)
4: for k = 1, 2, 3, ..., i ← calculated by thread index, m ← calculated by block index
5: __syncthreads();
8: __syncthreads();
10: if e^m is minimum value where p(z
Resampling
17: end if
18: end if
20: end for
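The model-selection idea described above can be sketched sequentially in Python as follows. This is an illustration under stated assumptions rather than the authors' Algorithm 3: the six-dimensional state [x, y, z, vx, vy, vz], the `motion_matrix` family of transition matrices, and the Euclidean error between the actual and the particle-averaged predicted TDOAs are all hypothetical choices. The outer loop plays the role of the GPU block index (model m) and the inner comprehensions play the role of the thread index (particle i).

```python
import math

C_WATER = 1500.0  # speed of sound in water (m/s), as stated in the text


def mat_vec(a, s):
    """Multiply a square matrix (list of rows) by a state vector."""
    return [sum(aij * sj for aij, sj in zip(row, s)) for row in a]


def motion_matrix(dt, scale):
    """6x6 transition: position += scale * dt * velocity (hypothetical family)."""
    a = [[1.0 if i == j else 0.0 for j in range(6)] for i in range(6)]
    for i in range(3):
        a[i][i + 3] = scale * dt
    return a


def predict_tdoas(state, sensors, c=C_WATER):
    """Predicted TDOAs from the position part (first 3 entries) of a state."""
    u = state[:3]
    ref = sensors[0]
    return [(math.dist(u, ref) - math.dist(u, s)) / c for s in sensors[1:]]


def select_model(particles, models, z, sensors):
    """Propagate every particle under every model; return the best fit.

    Returns (model index, error e_m, updated particles of that model).
    """
    best = None
    for m, a_m in enumerate(models):                    # block index -> model m
        updated = [mat_vec(a_m, s) for s in particles]  # thread index -> particle i
        preds = [predict_tdoas(s, sensors) for s in updated]
        # Average predicted measurement over the particles of this model.
        mean = [sum(p[j] for p in preds) / len(preds) for j in range(len(z))]
        e_m = math.sqrt(sum((zi - mi) ** 2 for zi, mi in zip(z, mean)))
        if best is None or e_m < best[1]:
            best = (m, e_m, updated)
    return best
```

In a GPU version, the averaging step is where thread synchronization within a block would be needed, since every particle of a model must finish its state update before the model's mean prediction and error can be formed.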

Experimental Results
Figure 7 shows the experimental flows for MATLAB and for the GPU. We implemented the proposed GPU-based target position estimation algorithm and analyzed it by running it on a target board. The NVIDIA Jetson TX1 and Tesla K40c (NVIDIA Corporation, Santa Clara, CA, USA) were used as the target GPUs, as shown in Table 1. We verified the implemented algorithm and analyzed its execution time. The target position estimation algorithm using the PF and the TDOA measurement was implemented in MATLAB (Mathworks, Natick, MA, USA), and its operation was verified as shown in Figure 8. The black dots indicate the positions of the acoustic sensors, the blue line indicates the actual path of the target, and the red line is the target position estimated by the algorithm. The right graph of Figure 8 shows the target position estimated as the state of the target changes, using the multiple state-space models proposed in this study. Even though the state equation used in the PF changed after a certain time because of the target's sudden movement, the target position was confirmed to be estimated accurately with the proposed PF technique.
We compared the execution time of the proposed PF-based position estimation between a CPU-based algorithm and a GPU-based algorithm on an embedded system. Target movement was measured from the signals input to the sensors at 1 s intervals. We used the NVIDIA Jetson TX1, as shown in Figure 7, and the Fastest Fourier Transform in the West (FFTW3) library was used to compute the FFT in the CPU-based algorithm. With 2000 particles, the execution time of the GPU-based algorithm was 6.19 ms and that of the CPU-based algorithm was 15.06 ms. The target position can therefore be estimated more quickly with the proposed GPU-based algorithm on the embedded system. The algorithm execution time as a function of the number of particles is shown in Figure 9a, where the horizontal axis is the number of particles and the vertical axis is the execution time. As shown in the graph, the GPU-based algorithm reduced the execution time needed to estimate the current state of the target by about 55% on average. This indicates that the proposed GPU-based architecture is feasible for embedded systems. Figure 9b shows the partial execution times of the TDOA measurement calculation, the PF processing, and the copying of data to the GPU in the proposed multiple state-space model based algorithm. As in Figure 9a, the total execution time increases with the number of particles. The total execution time is also shorter than that of the CPU-based algorithm, even though it includes the time required to copy data to the GPU.
We compared the execution time of the proposed GPU-based algorithm and the CPU-based algorithm according to the sampling rate of the input data when using 2000 particles. For an accurate comparison with the CPU-based algorithm, the GPU algorithm used in this experiment is a simple GPU-based PF algorithm rather than the multiple state-space model based GPU PF algorithm. Table 2 shows the execution time measured at different sampling rates. As the sampling rate increases, the time required to execute the entire algorithm increases, because more data are input during the same period and more time is needed to calculate the TDOA measurement. With a signal sampled at 44 kHz, the CPU-based algorithm took about 62.60 ms and the GPU-based algorithm took about 18.24 ms. The reduction in execution time of the GPU-based algorithm relative to the CPU-based algorithm grew as the sampling rate increased. In other words, the execution time of the GPU-based algorithm is less sensitive to the sampling rate.
To estimate sudden state changes of the system, we implemented the proposed GPU-based PF using multiple state-space models that can be predicted in advance. The multiple state-space model based PF, with its added parallel processing on the GPU, performs similarly to the PF that uses the GPU with a single model. Figure 10 shows the execution time of the proposed GPU-based multiple system model PF according to the number of state-space models. In this experiment, we used the NVIDIA Tesla K40c to observe the GPU parallelism across multiple state-space models. In the GPU kernel, we implemented a block-thread architecture in which the particles are divided among blocks according to the state-space model, and each block has 1024 threads in two dimensions. The implemented algorithm therefore uses the same number of GPU blocks as the number of state-space models. Because of the GPU's parallelism, for each number of particles from 1000 to 5000, the execution time differs by less than 0.16 ms across the numbers of state-space models. Parallel execution on the GPU thus does not significantly affect the execution time even though the total number of particles grows with the number of system models: with X state-space models and N particles per model, the total number of particles is X*N. Table 3 shows the number of blocks and threads, the execution time for PF processing, and the execution time for copying particles to the GPU according to the number of state-space models when the number of particles is 5000. Between 1 and 100 state-space models, the PF processing time differs by only 0.16 ms, while the time for copying data to the GPU differs by 10.99 ms. This shows that a large share of the execution time is spent copying data to the GPU, and that the PF processing time does not change significantly as the number of state-space models increases. With the proposed GPU-based algorithm, the state-space model suited to the sudden movement of the target was found through parallel computing, consistent with the MATLAB verification. This parallel architecture can also be applied to multiple target tracking with different state-space models in IoT sensing applications.
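As a quick check on the timing figures quoted in this section, the reduction rate for a single data point can be computed directly. The helper name `reduction_percent` is hypothetical; the input values are the execution times stated in the text.

```python
def reduction_percent(cpu_ms, gpu_ms):
    """Percentage by which the GPU time undercuts the CPU time."""
    return (1.0 - gpu_ms / cpu_ms) * 100.0


# Execution times quoted in the text (ms).
jetson_2000_particles = reduction_percent(15.06, 6.19)  # Jetson TX1, 2000 particles
rate_44khz = reduction_percent(62.60, 18.24)            # 44 kHz sampling rate
print(round(jetson_2000_particles, 1), round(rate_44khz, 1))  # 58.9 70.9
```

The single data point at 2000 particles gives a reduction of about 59%, slightly above the reported average of about 55% over all particle counts, and the 44 kHz case gives about 71%, which is consistent with the statement that the reduction grows with the sampling rate.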

Conclusions
In this paper, we implemented an accelerated GPU-based algorithm that estimates the position of a moving target using multiple state-space models and that can be applied to IoT sensing applications. The PF was used to estimate the position of the target, and we parallelized the calculation for each particle in the GPU kernel. We also accelerated the TDOA calculation through a parallel FFT on the GPU. Additionally, we extended the PF processing algorithm with GPU-based multiple state-space models to estimate sudden changes in the movement of the target. The proposed algorithm was first simulated using MATLAB, and the GPU-based implementation was then verified on the target GPU. As a result, the execution time of the proposed GPU algorithm was reduced by about 55% compared with the CPU-based algorithm on the target embedded board, the NVIDIA Jetson TX1. On the NVIDIA Tesla K40c, the execution time of the multiple state-space model based PF with parallel processing varies by less than 0.16 ms with the number of state-space models when estimating sudden changes in the movement of the target. Based on these results, the proposed architecture is effective in terms of high performance and in detecting sudden movement changes of the target with large amounts of sensor data in large-scale sensing applications. The proposed architecture can therefore be applied effectively in such IoT applications because of its scalability and parallelism. In a future study, we plan to extend the GPU-based high-speed processing algorithm with a view to real-time processing in big data sensing applications.

Figure 6. PF using multiple state-space models in GPU.

Figure 9. (a) Execution time of algorithm according to the number of particles (b) Partial execution time of algorithm according to the number of particles.

Figure 10. Execution time of algorithm according to the number of particles and state-space models.

Table 2. Execution time of algorithm according to the sampling rate.

Table 3. Number of threads and blocks and partial execution time according to number of state-space models.