The overall architecture proposed in this paper is as follows. This paper focuses on accelerating PF-based position estimation of a sound source on a GPU, using an array of acoustic sensors. The proposed localization requires TDOA measurements, and the location of the current target can be determined from three TDOA measurements in three-dimensional space. Obtaining three TDOA measurements requires four acoustic sensors, one of which serves as the reference sensor. In this study, we use GCC-PHAT to obtain the TDOA measurements. To obtain them at high speed, the GCC-PHAT processing of the acoustic signals from the multiple sensors is performed in parallel on the GPU.
The proposed method uses a PF to estimate the current state of a target from TDOA measurements, and we accelerate this PF. The PF process used in the study is as follows. Using the computed TDOA measurements and the state equations of the system, we predict the current state from the previous state and compute the predicted observation from the updated state. Then, the weight of each particle is obtained by comparing its predicted observation with the actual measurement. We accelerate the state update and the weight calculation by processing each particle in parallel in a GPU kernel. In addition, the PF is customized to use multiple system state equations on the GPU in order to track sudden movements of a moving target; the same structure can also be used to track multiple targets. The proposed architecture is shown in Figure 1. Moreover, the proposed architecture can be used in IoT applications that consist of many sensors, as shown in Figure 2. Because many sensors produce a large amount of signal data, and because increasing the number of sensors allows the target to be detected more accurately, it is reasonable to use the high-performance computing capability of a GPU to accelerate the signal processing.
3.1. Particle Filter (PF) with Time Difference of Arrival (TDOA) Measurement
To estimate the movement of the acoustic signal source, we use the TDOA measurements obtained from the acoustic sensors. Four acoustic sensors, including one reference sensor, are used to estimate the position of the target in three-dimensional space. The proposed PF estimates the target position from the TDOA measurements obtained in this structure. The PF is a sequential Monte Carlo method that probabilistically estimates the state of a target using $N$ particles. Unlike the extended KF and the unscented KF, the PF is well suited to non-Gaussian and nonlinear systems. The proposed algorithm scales with the number of sensors and remains applicable when the number of sensors is increased, as shown in Figure 3. The proposed architecture uses four sensors, which is the minimum number required to locate a target in three-dimensional space, but it can also use more. The PF processing can be run on the GPU for each set of sensor nodes, and the proposed architecture can detect multiple targets in parallel using acoustic signals.
First, the state of the system used in the PF processing is defined as follows. The moving target is modeled as being in uniform (constant-velocity) motion. In this case, the system state equation for estimating the current state ${s}_{k}$ from the previous state ${s}_{k-1}$ is expressed by Equation (1), where ${s}_{k}$ denotes the position vector in three-dimensional space at time $k$, $A$ denotes the state transition matrix, and ${p}_{k}$ denotes the process noise.
$${s}_{k}=A{s}_{k-1}+{p}_{k}\qquad (1)$$
The observation model, which represents the relationship between the state of the system and the measured value, is defined by Equation (2), where ${z}_{k}$ denotes the measured value and ${v}_{k}$ denotes the measurement noise.
$${z}_{k}=h\left({s}_{k}\right)+{v}_{k}\qquad (2)$$
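The prediction step of the state equation can be illustrated with a minimal Python sketch. The six-element state layout `[x, y, z, vx, vy, vz]`, the time step `dt`, and the function names `make_transition` and `predict` are assumptions made for illustration; the paper does not list the entries of $A$.

```python
import random

def make_transition(dt):
    """Build a 6x6 constant-velocity matrix A for the assumed state [x, y, z, vx, vy, vz]."""
    A = [[1.0 if r == c else 0.0 for c in range(6)] for r in range(6)]
    for axis in range(3):
        A[axis][axis + 3] = dt  # position advances by velocity * dt
    return A

def predict(state, A, noise_std=0.0, rng=random):
    """Apply s_k = A s_{k-1} + p_k with zero-mean Gaussian process noise p_k."""
    return [
        sum(A[r][c] * state[c] for c in range(6)) + rng.gauss(0.0, noise_std)
        for r in range(6)
    ]

A = make_transition(dt=0.5)
s_prev = [10.0, -4.0, 2.0, 1.0, 0.0, -2.0]
s_next = predict(s_prev, A)  # noise disabled here, so the result is deterministic
```

With the noise disabled, the position moves by half of the velocity in each axis while the velocity itself is unchanged. In a PF, the same call would be made once per particle with a nonzero `noise_std`.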
In this study, we accelerate the position estimation of the target using TDOA measurements, and the measured value ${z}_{k}$ is the TDOA measurement obtained from the signals input to the acoustic sensors. In Equation (2), the relation between the system state and the TDOA measurements is represented by $h$, which is obtained as follows. The distance between the target and the $a$-th sensor is defined by ${r}_{a}\left({s}_{k}\right)$ in Equation (3), where ${u}_{k}$ is the position vector of the target and ${s}_{a}$ is the position vector of the $a$-th sensor. The reference sensor for obtaining the TDOA measurements is denoted ${s}_{a}$. As shown in the second line of Equation (3), ${r}_{ab}$ is the difference between the distances from the target to ${s}_{a}$ and to ${s}_{b}$, which equals the propagation speed $c$ of the source wave multiplied by the TDOA between sensors ${s}_{a}$ and ${s}_{b}$.
The left panel of Figure 3 shows the reference sensor and the other sensors. The TDOA measurements used for the PF are obtained between the reference sensor ${s}_{a}$ and each of the other sensors. Equation (4) gives the TDOA measurement in terms of the distances from the reference sensor ${s}_{a}$ and from sensor ${s}_{b}$ to the target. The velocity $c$ is the speed of sound in water, taken as 1500 m/s.
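One way to write out the range and range-difference relations described above is sketched below. It assumes Euclidean distance and the sign convention ${r}_{ab}={r}_{a}-{r}_{b}$; the symbol $\tau_{ab}$ for the TDOA is introduced here for illustration, and the sign convention of the original equations may differ.

```latex
\begin{aligned}
r_a(s_k)    &= \lVert u_k - s_a \rVert, \\
r_{ab}(s_k) &= r_a(s_k) - r_b(s_k) = c\,\tau_{ab}, \\
\tau_{ab}   &= \frac{r_a(s_k) - r_b(s_k)}{c}, \qquad c = 1500~\text{m/s}.
\end{aligned}
```

For example, under this convention a target that is 300 m from the reference sensor and 450 m from the other sensor gives $\tau_{ab} = (300 - 450)/1500 = -0.1$ s.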
Target location estimation from three TDOA measurements is therefore described as follows. Equation (5) defines the relation ${h}_{ab}$ between the system state and a single TDOA measurement. The relation for three or more TDOA measurements is represented in Equation (6) by the matrix $H$, where the index $a$ denotes the reference sensor and the index $X$ denotes the other (relative) sensors.
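The mapping from a target position to the stacked TDOA vector can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the function names, the sensor layout, and the sign convention (reference distance minus other distance, divided by $c$) are assumptions.

```python
import math

SOUND_SPEED = 1500.0  # m/s, the underwater value quoted in the text

def distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def predicted_tdoas(target, sensors, ref=0, c=SOUND_SPEED):
    """Return the TDOA in seconds between the reference sensor and every other sensor.

    Assumed sign convention: tau_ab = (r_a - r_b) / c, with a the reference sensor.
    """
    r_ref = distance(target, sensors[ref])
    return [
        (r_ref - distance(target, s)) / c
        for b, s in enumerate(sensors)
        if b != ref
    ]

# Hypothetical layout: reference sensor at the origin, three sensors 100 m away on the axes.
sensors = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0), (0.0, 100.0, 0.0), (0.0, 0.0, 100.0)]
tdoas = predicted_tdoas((300.0, 400.0, 0.0), sensors)
```

Four sensors yield three TDOA values, which matches the minimum needed for a three-dimensional fix. Adding sensors simply lengthens the returned vector.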
The operation of the PF for estimating the position of the moving target using the relation between the system state and the measurement is shown in Algorithm 1. The overall algorithm of the PF using the TDOA measurements is as follows. First, the state of each particle is updated. Then, for each particle, the predicted observation is computed from its updated state using the relation between the system state and the measurements. Next, the weight of each particle is calculated from the likelihood of the actual TDOA measurement given the particle's predicted measurement. After that, the effective sample size (ESS) is calculated and compared with the threshold ${N}_{th}$ [23,24], a predefined threshold for resampling. In our implementation, ${N}_{th}$ is set to 0.8 times the number of particles. If the ESS is at or below ${N}_{th}$, the particles are resampled. The resampling process discards particles with low weights and duplicates particles with high weights, which reduces the errors caused by a degenerate weight distribution. With resampling, the PF estimates the position of the moving target more accurately.
Algorithm 1 PF
 1: Set initial state ${s}_{0}$
 2: Generate particles ${s}_{0(i)}$ $(i=1,2,\dots ,N)$
 3: Set initial weights ${W}_{0(i)}=1/N$ $(i=1,2,\dots ,N)$
 4: for $k=1,2,3,\dots$
 5:   ${s}_{k(i)}=A{s}_{k-1(i)}+{p}_{k}$ $(i=1,2,\dots ,N)$
 6:   $z$ = TDOA measurement values
 7:   Reweight ${w}_{k(i)}={W}_{k-1(i)}\,p(z\mid {s}_{k(i)})$
 8:     where $p(z\mid {s}_{k(i)})\sim \mathcal{N}\left(H\left({s}_{k(i)}\right),{Q}_{v}\right)$
 9:   Normalize ${W}_{k(i)}={w}_{k(i)}/{\sum}_{i=1}^{N}{w}_{k(i)}$
10:   $ESS=1/{\sum}_{i=1}^{N}{\left({W}_{k(i)}\right)}^{2}$
11:   if $ESS\le {N}_{th}$
12:     Resampling
13:   end if
14:   ${s}_{k}={\sum}_{i=1}^{N}{W}_{k(i)}{s}_{k(i)}$
15: end for
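The reweighting, normalization, and ESS test of Algorithm 1 can be sketched in Python as follows. The sketch assumes a diagonal measurement covariance ${Q}_{v}=\sigma^{2}I$ and drops the Gaussian normalizing constant, which cancels during normalization; the function names are hypothetical.

```python
import math

def gaussian_likelihood(z, z_pred, sigma):
    """Unnormalised likelihood of z under independent Gaussians centred on z_pred."""
    sq = sum((a - b) ** 2 for a, b in zip(z, z_pred))
    return math.exp(-0.5 * sq / sigma ** 2)

def reweight(prev_weights, z, predictions, sigma):
    """Multiply each weight by its particle's likelihood, then normalise to sum to one."""
    raw = [
        w * gaussian_likelihood(z, zp, sigma)
        for w, zp in zip(prev_weights, predictions)
    ]
    total = sum(raw)
    return [w / total for w in raw]

def effective_sample_size(weights):
    """ESS = 1 / sum(W_i^2), computed on normalised weights W_i."""
    return 1.0 / sum(w * w for w in weights)

def needs_resampling(weights, ratio=0.8):
    """Compare the ESS with N_th = ratio * N (0.8 * N in the text)."""
    return effective_sample_size(weights) <= ratio * len(weights)

# One particle predicts the measurement exactly; the other three are off by one unit.
weights = reweight([0.25] * 4, [0.0], [[0.0], [1.0], [1.0], [1.0]], sigma=1.0)
```

Uniform weights give an ESS equal to the number of particles, so no resampling is triggered. A weight vector dominated by one particle drives the ESS toward one and triggers resampling. A production version would also guard against all likelihoods underflowing to zero.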

3.3. Accelerated Algorithm Based on GPU
Our approach accelerates the proposed position estimation of a moving target using a GPU. The GPU is an architecture that excels at parallel computing and is used for high-speed computation on large amounts of data. The proposed GPU-based position estimation can be divided into three operations: obtaining the TDOA measurements from the sensor signals, PF processing, and tracking the moving target using multiple system state equations.
The TDOA measurements are calculated in the frequency domain using GCC-PHAT. We therefore use the cuFFT library provided with CUDA to accelerate the fast Fourier transform (FFT) needed for the TDOA computation. Because the transform of every sensor's data uses the same coefficients, the Fourier transforms for the multiple sensors are processed in parallel using the batched execution of cuFFT, as shown in Figure 4. When the number of sensors is $K$ and the number of samples obtained from each sensor is $N$, the amount of data copied to the GPU device is $K\times N$. Because position estimation using TDOA measurements becomes more accurate as more sensors are used, the proposed accelerated TDOA computation becomes more advantageous as the amount of data increases. As mentioned above, the proposed algorithm scales with the number of sensors, and so does the proposed parallel FFT.
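For reference, the GCC-PHAT computation for one sensor pair can be sketched in plain Python. This is a CPU illustration only: a naive $O(n^{2})$ DFT stands in for the batched cuFFT transforms, and the function names, the regularizer `eps`, and the sampling rate in the usage lines are assumptions.

```python
import cmath
import random

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, used here in place of an FFT library."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out

def gcc_phat_delay(sig, ref, eps=1e-12):
    """Estimate the circular delay of sig relative to ref, in samples."""
    n = len(sig)
    S, R = dft(sig), dft(ref)
    cross = [s * r.conjugate() for s, r in zip(S, R)]
    whitened = [c / (abs(c) + eps) for c in cross]  # PHAT weighting keeps only the phase
    corr = [v.real for v in dft(whitened, inverse=True)]
    peak = max(range(n), key=lambda i: corr[i])
    return peak if peak <= n // 2 else peak - n  # map the peak index to a signed lag

rng = random.Random(0)
ref = [rng.uniform(-1.0, 1.0) for _ in range(32)]
delayed = [ref[(t - 5) % 32] for t in range(32)]  # reference delayed by 5 samples
lag = gcc_phat_delay(delayed, ref)
tdoa_seconds = lag / 48000.0  # hypothetical sampling rate of 48 kHz
```

In the GPU version described above, the forward transforms of all $K$ sensor signals would be issued as one batched call rather than looped over one by one. The whitening and peak search are element-wise and therefore parallelize in the same way.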
We also propose parallel computation of the PF on the GPU, as shown in Figure 5. Because the cost of PF processing depends on the number of particles, the entire process can be accelerated by processing the particles in parallel on the GPU. In particular, position estimation of a moving target is accelerated by performing, in parallel for each particle, the state update, the calculation of the predicted measurement from the state, and the weight calculation.
3.4. Target Tracking Using Multiple State-Space Models
The KF and PF, which can be used for tracking the position of a target, have system state equations based on Markov chains. Consequently, if the target moves differently from the predefined system state equation, the estimation result is inaccurate. Therefore, in this paper, we define several additional system state equations for the expected motions and evaluate each of them in the same way as the current system state equation. This structure can also be used for multiple-target tracking with multiple system models. From each state update model, we obtain a predicted measurement. The predicted measurements are then compared with the actual measurement, and the particles whose predicted measurements differ least from it are selected for the remaining PF process. With the proposed method, sudden movements of a target can be tracked using multiple system state equations on the GPU.
The operation of the GPU-based PF using the proposed system state equations is shown in Figure 6. The state of each particle is updated under the multiple system state equations in parallel using a kernel function on the GPU. For all particles obtained from the multiple state-space models, the state update, weight calculation, and normalization are performed in parallel. To obtain the particles of the state-space model that best fits the current system state, the predicted measurements are compared with the actual measurement, and the particles with the smallest difference are selected and resampled. In our implementation, we use the systematic resampling algorithm [25,26], which requires random number generation during resampling. A prior study [27] accelerates resampling in a GPU-based PF. In our study, to reduce the additional overhead of random number generation during resampling, we use a thread-based method that runs on the host CPU while data are being copied to the GPU.
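Systematic resampling itself can be sketched in Python as follows. This is a generic illustration of the technique, not the authors' GPU code; the function name and the optional `u0` argument, which allows a precomputed random offset to be supplied, are assumptions.

```python
import random

def systematic_resample(weights, u0=None, rng=random):
    """Return the indices of the particles kept by systematic resampling.

    weights must be normalised; u0 in [0, 1) is the single random offset.
    """
    n = len(weights)
    if u0 is None:
        u0 = rng.random()
    indices = []
    cumulative = weights[0]
    i = 0
    for j in range(n):
        pointer = (u0 + j) / n  # evenly spaced pointers that share one random offset
        while pointer > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        indices.append(i)
    return indices

kept = systematic_resample([0.1, 0.2, 0.3, 0.4], u0=0.5)
```

Only one random number is needed per resampling step. This is why generating it on the host ahead of time, as the text describes, adds little overhead. With the weights above and `u0=0.5`, the lightest particle is dropped and the heaviest is duplicated.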
Algorithm 3 describes the PF processing using multiple state-space models on the GPU. The transition matrix ${A}_{m}$ is used to formulate the $m$-th state-space model. The PF processing for each state-space model is performed in an individual GPU block, and the block runs the threads to which the corresponding particles are allocated. After the threads are synchronized, the average predicted measurement is computed from the updated states, so that the appropriate state-space model can be determined and the remaining PF processing can continue for that model.
Algorithm 3 PF using multiple state-space models in GPU
 1: Set initial state ${s}_{0}^{m}$ $(m=1,2,\dots ,L)$
 2: Generate particles ${s}_{0(i)}^{m}$ $(i=1,2,\dots ,N),\ (m=1,2,\dots ,L)$
 3: Set initial weights ${W}_{0(i)}^{m}=1/N$ $(i=1,2,\dots ,N),\ (m=1,2,\dots ,L)$
 4: for $k=1,2,3,\dots$, $i\leftarrow$ calculated by thread index, $m\leftarrow$ calculated by block index
 5:   ${s}_{k(i)}^{m}={A}_{m}{s}_{k-1(i)}^{m}+{p}_{k}^{m}$ $(i=1,2,\dots ,N)$
 6:   $z$ = TDOA measurement values
 7:   __syncthreads();
 8:   Calculate difference ${e}^{m}=\left|z-\left({\sum}_{i=1}^{N}{H}_{k}({s}_{k(i)}^{m})\right)/N\right|$
 9:   __syncthreads();
10:   if ${e}^{m}$ is minimum value
11:     Reweight ${w}_{k(i)}^{m}={W}_{k-1(i)}^{m}\,p(z\mid {s}_{k(i)}^{m})$
12:       where $p(z\mid {s}_{k(i)}^{m})\sim \mathcal{N}(H({s}_{k(i)}^{m}),{Q}_{v})$
13:     Normalize ${W}_{k(i)}^{m}={w}_{k(i)}^{m}/{\sum}_{i=1}^{N}{w}_{k(i)}^{m}$
14:     $ESS=1/{\sum}_{i=1}^{N}{({W}_{k(i)}^{m})}^{2}$
15:     if $ESS\le {N}_{th}$
16:       Resampling
17:     end if
18:     ${s}_{k}^{m}={\sum}_{i=1}^{N}{W}_{k(i)}^{m}{s}_{k(i)}^{m}$
19:   end if
20: end for
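The model-selection step can be illustrated with a serial Python sketch. It assumes an L1 distance between the measurement and the mean predicted measurement, and it uses a toy one-dimensional state `(x, v)` with three hypothetical motion models; in the GPU version each model would map to a block and each particle to a thread.

```python
def mean_prediction(particles, transition, measure):
    """Propagate every particle with one model and average the predicted measurements."""
    preds = [measure(transition(p)) for p in particles]
    dim = len(preds[0])
    return [sum(p[d] for p in preds) / len(preds) for d in range(dim)]

def select_model(particles_per_model, transitions, measure, z):
    """Return the index of the model whose mean prediction is closest to z, plus all errors."""
    errors = []
    for particles, transition in zip(particles_per_model, transitions):
        mean = mean_prediction(particles, transition, measure)
        errors.append(sum(abs(a - b) for a, b in zip(z, mean)))  # assumed L1 distance
    best = min(range(len(errors)), key=lambda m: errors[m])
    return best, errors

# Toy models for a state (x, v): stay in place, move forward, move backward.
transitions = [
    lambda s: (s[0], s[1]),
    lambda s: (s[0] + s[1], s[1]),
    lambda s: (s[0] - s[1], s[1]),
]
measure = lambda s: [s[0]]  # toy measurement: observe the position directly
particles = [[(0.0, 1.0)] * 4 for _ in transitions]
best, errors = select_model(particles, transitions, measure, z=[1.0])
```

Here the measurement matches the forward-motion model, so that model is selected and only its particles would proceed to reweighting and resampling.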
