# Multi-Target Tracking Using Windowed Fourier Single-Pixel Imaging


## Abstract

## 1. Introduction

## 2. Principle and Method

#### 2.1. Single-Target Tracking Method Using FSI

#### 2.2. Multi-Target Tracking Method Using WFSI

#### 2.3. Independent Estimation Approach

#### 2.4. Joint Estimation Approach

## 3. Results

#### 3.1. Simulation

#### 3.1.1. Multi-Target Locating

#### 3.1.2. Multi-Target Tracking Using the Independent Estimation Approach

#### 3.1.3. Multi-Target Tracking Using the Joint Estimation Approach

#### 3.2. Experiment

## 4. Discussion

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## Abbreviations

Abbreviation | Definition |
---|---|
SPI | single-pixel imaging |
FSI | Fourier single-pixel imaging |
WFSI | windowed Fourier single-pixel imaging |
STFT | short-time Fourier transform |
DMD | digital micromirror device |

## References

**Figure 1.** Sketch of multi-target tracking using WFSI. First, the initial positions of the targets are located. Second, the window functions are designed according to these positions. Third, $6K$ measurements are taken to calculate $2K$ Fourier coefficients. The displacements are then estimated with the independent or the joint estimation approach. Finally, the window functions are redesigned according to the estimated displacements, and tracking continues.
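The displacement step can be illustrated with the Fourier shift property: translating an image by $(\Delta x,\Delta y)$ multiplies its coefficient at spatial frequency $({f}_{x},{f}_{y})$ by ${e}^{-j2\pi ({f}_{x}\Delta x+{f}_{y}\Delta y)}$, so one coefficient with ${f}_{y}=0$ and one with ${f}_{x}=0$ are in principle enough per target. The following is a minimal sketch under that assumption. The function name `estimate_displacement`, the frequency $1/N$, and the use of `np.fft.fft2` in place of single-pixel measurements are illustrative choices, not taken from the source.

```python
import numpy as np

def estimate_displacement(coeff_before, coeff_after, freq):
    """Displacement along one axis from the phase change of one Fourier coefficient.

    Assumes a pure translation, so coeff_after = coeff_before * exp(-2j*pi*freq*shift).
    The result is unambiguous only while |shift| < 1 / (2 * freq).
    """
    phase_change = np.angle(coeff_after * np.conj(coeff_before))
    return -phase_change / (2 * np.pi * freq)

# Illustrative check on a synthetic 64 x 64 scene (hypothetical data).
n = 64
scene = np.zeros((n, n))
scene[20:26, 30:37] = 1.0                        # a single bright target
moved = np.roll(scene, (3, 5), axis=(0, 1))      # shift 3 rows (y) and 5 columns (x)

spec_before = np.fft.fft2(scene)
spec_after = np.fft.fft2(moved)
dx = estimate_displacement(spec_before[0, 1], spec_after[0, 1], 1 / n)
dy = estimate_displacement(spec_before[1, 0], spec_after[1, 0], 1 / n)
print(dx, dy)
```

A low frequency such as $1/N$ gives the widest unambiguous range, at the cost of lower sensitivity to small shifts.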

**Figure 2.** Illustration of multi-target locating via the projected-curves method. ${f}_{Y}\left(x\right)$ and ${f}_{X}\left(y\right)$ are the projections of the scene image onto the X axis and the Y axis, respectively. $({x}_{1},{x}_{2})$ and $({y}_{1},{y}_{2})$ are the regions on the X axis and the Y axis where targets exist, and $({L}_{x1},{L}_{x2},{L}_{y1},{L}_{y2})$ are the lengths of these regions. $({w}_{1},{w}_{2})$ are the window functions designed for the two targets.
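The locating idea in this caption can be sketched as follows: sum the scene along each axis, threshold the two projected curves, and treat every pairing of an X region with a Y region as a candidate window. This is a hedged illustration only. It assumes a background-subtracted image is available as an array, whereas the source obtains the curves from single-pixel measurements. The names `find_regions` and `candidate_windows` and the threshold are hypothetical, and deciding which candidates actually contain a target is not shown.

```python
import numpy as np

def find_regions(profile, threshold=0.0):
    """Return (start, stop) index pairs, stop exclusive, where profile > threshold."""
    mask = profile > threshold
    padded = np.concatenate(([False], mask, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])   # rising and falling edges
    return [(int(a), int(b)) for a, b in zip(edges[0::2], edges[1::2])]

def candidate_windows(image, threshold=0.0):
    """Build one binary window per (X region, Y region) pairing."""
    proj_on_x = image.sum(axis=0)    # f_Y(x): sum over y for each column x
    proj_on_y = image.sum(axis=1)    # f_X(y): sum over x for each row y
    x_regions = find_regions(proj_on_x, threshold)
    y_regions = find_regions(proj_on_y, threshold)
    windows = []
    for x0, x1 in x_regions:
        for y0, y1 in y_regions:
            w = np.zeros(image.shape)
            w[y0:y1, x0:x1] = 1.0    # region lengths set the window size
            windows.append(w)
    return x_regions, y_regions, windows

# Illustrative scene with two rectangular targets (hypothetical data).
demo = np.zeros((32, 32))
demo[4:8, 10:15] = 1.0
demo[20:26, 2:5] = 1.0
x_regions, y_regions, windows = candidate_windows(demo)
print(x_regions, y_regions, len(windows))
```

With two targets that overlap on neither axis, the pairing yields four candidates, of which only two contain a target; the projections alone cannot tell them apart.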

**Figure 3.** Illustration of the measurement process: (**a**) the target image with $256\times 256$ pixels; (**b**) the designed window function with $256\times 256$ pixels; (**c**–**e**) the binary Fourier patterns ${P}_{bin}(\frac{2}{N},0,\phi )$ with $512\times 512$ pixels used to calculate the Fourier coefficient ${F}_{I2}(\frac{2}{N},0)$.
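Three patterns per coefficient is consistent with three-step phase shifting, in which the phases $0$, $2\pi /3$, and $4\pi /3$ yield one complex Fourier coefficient. The sketch below assumes that scheme and, for simplicity, uses grayscale patterns at the image resolution; the upsampling and binarization that produce ${P}_{bin}$ are omitted. The function names and the pattern parameters `a` and `b` are hypothetical.

```python
import numpy as np

def fourier_pattern(shape, fx, fy, phase, a=0.5, b=0.5):
    """Grayscale pattern a + b*cos(2*pi*(fx*x + fy*y) + phase)."""
    rows, cols = shape
    y, x = np.mgrid[0:rows, 0:cols]
    return a + b * np.cos(2 * np.pi * (fx * x + fy * y) + phase)

def windowed_coefficient(scene, window, fx, fy, b=0.5):
    """Fourier coefficient of scene*window from three simulated detector readings."""
    phases = (0.0, 2 * np.pi / 3, 4 * np.pi / 3)
    # Each reading models the single-pixel detector: total light passing
    # through the product of the window and the Fourier pattern.
    d0, d1, d2 = (
        np.sum(scene * window * fourier_pattern(scene.shape, fx, fy, p, b=b))
        for p in phases
    )
    # Three-step combination; the constant term a cancels out.
    return ((2 * d0 - d1 - d2) + np.sqrt(3) * 1j * (d1 - d2)) / (3 * b)

# Illustrative check against a direct FFT of the windowed scene (hypothetical data).
rng = np.random.default_rng(0)
n = 64
scene = rng.random((n, n))
window = np.zeros((n, n))
window[10:30, 20:44] = 1.0
coeff = windowed_coefficient(scene, window, 2 / n, 0.0)
reference = np.fft.fft2(scene * window)[0, 2]
print(np.allclose(coeff, reference))
```

Because the window multiplies the pattern before projection, light from outside the window never reaches the detector, which is what isolates one target from the others.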

**Figure 4.** Simulation scenes: (**a**) a complex background, (**b**) a scene image at moment ${t}_{1}$, and (**c**) a scene image at moment ${t}_{2}$. There are four moving targets in the scene, denoted as ${T}_{1},{T}_{2},{T}_{3},{T}_{4}$.

**Figure 5.** Projected curves of the image with four targets: (**a**) the projected curve of the scene image on the X axis; (**b**) the projected curve on the Y axis. $(\widehat{{x}_{1}},\widehat{{x}_{2}},\widehat{{x}_{3}})$ are the three regions on the X axis where a target may exist, and $(\widehat{{y}_{1}},\widehat{{y}_{2}},\widehat{{y}_{3}})$ are the three regions on the Y axis. The lengths ${L}_{x}$ and ${L}_{y}$ of the regions determine the size of the window function.

**Figure 6.** The window functions for the four targets: (**a**) the four-window function, (**b**) the windowed image at moment ${t}_{1}$, and (**c**) the windowed image at moment ${t}_{2}$.

**Figure 7.** The four window functions: (**a**–**d**) are the images at moment ${t}_{1}$, and (**e**–**h**) are the images at moment ${t}_{2}$; (**a**,**e**) represent target ${T}_{1}$, (**b**,**f**) represent target ${T}_{2}$, (**c**,**g**) represent target ${T}_{3}$, and (**d**,**h**) represent target ${T}_{4}$.

**Figure 8.** Windowed images at moments ${t}_{2}$ and ${t}_{3}$: (**a**–**d**) are the images at moment ${t}_{2}$, and (**e**–**h**) are the images at moment ${t}_{3}$; (**c**,**g**) are the images of target ${T}_{2}$, and (**d**,**h**) are the images of target ${T}_{3}$. (**h**) contains interference caused by target ${T}_{2}$.

**Figure 9.** Experiment scene: (**a**) the 240th frame of the video; (**b**) the 240th frame after background subtraction.

**Figure 10.** Estimated results in five different frames from the 260th frame to the 340th frame. The first row shows the scene images, and the second row shows the results of multi-target tracking. The real-time estimated position of Target 1 (the yellow ball on the left in the 240th frame) is marked with a red dot, and that of Target 2 (the red ball on the right in the 240th frame) is marked with a red triangle.

**Figure 12.** Estimated coordinates between the 240th frame and the 630th frame: (**a**) the yellow ball; (**b**) the red ball.

Targets | ${\mathit{T}}_{1}$ | ${\mathit{T}}_{2}$ | ${\mathit{T}}_{3}$ | ${\mathit{T}}_{4}$ |
---|---|---|---|---|
Position coordinate | $(20,20)$ | $(20,200)$ | $(100,200)$ | $(200,300)$ |
Estimation result | $(19,20)$ | $(20,199)$ | $(98,199)$ | $(198,299)$ |

Targets | ${\mathit{T}}_{1}$ | ${\mathit{T}}_{2}$ | ${\mathit{T}}_{3}$ | ${\mathit{T}}_{4}$ |
---|---|---|---|---|
True value | $(5,5)$ | $(20,0)$ | $(-10,20)$ | $(20,-10)$ |
Estimation result | $(4.82,4.60)$ | $(19.67,0.08)$ | $(-8.89,19.65)$ | $(21.15,-8.55)$ |
Error | 0.43 | 0.34 | 1.16 | 1.85 |
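
The Error row appears consistent with the Euclidean distance between the true and the estimated displacement of each target. The metric is not stated here, so this reading is an assumption; the short check below recomputes it from the tabulated values and agrees with the Error row to within about 0.01, which may reflect rounding of the tabulated estimates.

```python
import numpy as np

# Displacements copied from the table above, in target order T1..T4.
true_values = np.array([(5, 5), (20, 0), (-10, 20), (20, -10)], dtype=float)
estimates = np.array([(4.82, 4.60), (19.67, 0.08), (-8.89, 19.65), (21.15, -8.55)])
tabulated_errors = np.array([0.43, 0.34, 1.16, 1.85])

# Assumed metric: Euclidean norm of (estimate - true value), one value per target.
errors = np.linalg.norm(estimates - true_values, axis=1)
print(np.round(errors, 2))
print(np.max(np.abs(errors - tabulated_errors)))
```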

Targets | Independent Estimation | Joint Estimation | True Value |
---|---|---|---|
T2 | (19.66, −0.14) | (19.48, 0.15) | (20, 0) |
Error of T2 | 0.37 | 0.29 | |
T3 | (−23.34, −6.92) | (−11.29, 19.82) | (−10, 20) |
Error of T3 | 30.04 | 1.30 | |

