Article

Multi-Target Tracking with Collaborative Roadside Units Under Foggy Conditions

1 State Key Laboratory of Intelligent Vehicle Safety Technology, Chongqing 400023, China
2 Intelligent Connected Vehicle Inspection Center (Hunan) of CAERI Co., Ltd., Changsha 410205, China
3 Jiangsu CAERI Automotive Engineering Research Institute Co., Ltd., Suzhou 215151, China
4 School of Automation, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
* Author to whom correspondence should be addressed.
Sensors 2026, 26(3), 998; https://doi.org/10.3390/s26030998
Submission received: 1 December 2025 / Revised: 30 December 2025 / Accepted: 13 January 2026 / Published: 3 February 2026
(This article belongs to the Section Vehicular Sensing)

Abstract

The Intelligent Road Side Unit (RSU) is a crucial component of Intelligent Transportation Systems (ITSs), where roadside LiDAR sensors are widely utilized for their high precision and resolution. However, water droplets and atmospheric particles in fog significantly attenuate and scatter LiDAR beams, posing a challenge to multi-target tracking and ITS safety. To enhance the accuracy and reliability of RSU-based tracking, a collaborative RSU method that integrates denoising and tracking for multi-target tracking is proposed. The proposed approach first removes noisy points using a modified bilateral filter that dynamically adjusts the filtering kernel scale based on local noise levels. Subsequently, a multi-RSU cooperative tracking framework is designed, which employs a particle Probability Hypothesis Density (PHD) filter to estimate target states via measurement fusion. A multi-target tracking system for intelligent RSUs in foggy scenarios was designed and implemented. Extensive experiments were conducted using an intelligent roadside platform in real-world fog-affected traffic environments to validate the accuracy and real-time performance of the proposed algorithm. Experimental results demonstrate that, after fog-noise removal, the proposed method improves target detection accuracy over statistical filtering methods by 8% under thin fog and 29% under thick fog. The method also performs well in tracking multi-class targets, surpassing existing state-of-the-art methods, especially on high-order evaluation metrics such as HOTA, MOTA, and IDs.

1. Introduction

In recent years, breakthrough advances in artificial intelligence have accelerated the development of ITS [1], facilitating their transition from theoretical research to practical implementation. However, single-vehicle perception systems are often inadequate in complex traffic environments due to limitations in the field of view of onboard sensors and constrained computational resources. To address these challenges, Vehicle-Infrastructure Cooperative Systems (VICS) [2] have been introduced, which leverage roadside units (RSUs) [3] and vehicle-mounted terminals for collaborative environmental sensing. Such systems provide autonomous vehicles with beyond-line-of-sight environmental information, thereby substantially enhancing safety redundancy and scene adaptability.
Within VICS, RSUs employ a variety of sensors for environmental perception. Although visual sensors (e.g., cameras) offer advantages in terms of low cost and high resolution, their imaging quality is susceptible to abrupt illumination changes and adverse weather conditions. Although millimeter-wave radar is capable of all-weather operation and exhibits strong anti-interference characteristics, it is limited by its relatively low angular resolution, making it difficult to accurately discern the geometric features of targets. In contrast, LiDAR (Light Detection and Ranging), with its wide-area detection capability and centimeter-level ranging accuracy, enables the construction of high-resolution 3D environmental point clouds and has thus become a core sensor in RSU-based environmental perception.
Fog is a frequently encountered meteorological condition, particularly in mountainous regions. Taking Chengdu as an example, fog occurs on approximately 125 days annually, posing significant challenges to target detection and tracking by RSUs. For instance, cameras suffer from reduced image contrast and a marked increase in noise due to light scattering in fog [4]. Millimeter-wave radar is affected by water vapor absorption peaks near harmonics such as 60 GHz, leading to considerable attenuation in signal strength and angular resolution [5,6]. As for LiDAR, suspended particles in fog, such as water droplets and aerosols, induce Mie scattering of laser beams, resulting in path loss and directional deviation of the transmitted signals [7,8]. This physical phenomenon leads to spatially heterogeneous degradation of point cloud quality: at long ranges, scattering significantly reduces point density, causing target contours to become blurred or even undetectable; at close ranges, multiple scattering generates numerous “ghost” noise points. These spurious points interweave with legitimate returns, substantially complicating target detection and motion state estimation, thereby imposing more stringent requirements on the robustness of multi-object tracking algorithms.
Subsequently, we critically review classical LiDAR point cloud denoising methods (e.g., statistical outlier removal and radius-based filtering), highlighting their fundamental limitation in handling dynamic noise under high-clutter scenarios. Thereafter, we introduce Multi-Object Tracking (MOT) technology through two distinct paradigms: data association frameworks (e.g., Kalman filter-based tracking) and Random Finite Set (RFS) theory. This theoretical foundation directly motivates the particle PHD filter design in Section 2.3, which resolves trajectory fragmentation while retaining real-time performance.

1.1. LiDAR Point Cloud Denoising Method in Foggy Conditions

The adverse effects of fog on LiDAR systems primarily manifest as signal attenuation and point cloud noise [9]. Dense fog particles cause significant laser signal attenuation due to Mie scattering, with severity increasing at higher fog densities and shorter wavelengths. Backscattering from fog particles introduces false returns that corrupt point cloud data, while multiple scattering extends optical paths, distorts waveforms, and increases ranging errors. These phenomena collectively degrade point cloud quality and measurement accuracy, posing fundamental challenges to reliable LiDAR perception in adverse weather conditions. For this reason, point cloud denoising methods must be used to dynamically suppress fog-induced noise points and guarantee the robustness of downstream sensing algorithms. Commonly used point cloud denoising methods are as follows.
Point cloud denoising in perception systems primarily relies on three categories of methods: statistical filtering, deep learning, and multi-sensor fusion. Statistical filtering techniques, such as Gaussian [10], mean [11], and median [12,13] filtering, smooth data by leveraging statistical properties within local neighborhoods. They are computationally simple and efficient, yet exhibit limited adaptability to complex noise patterns. Deep learning approaches, including CNN [14] and GNN [15]-based models like PointNet [16] and PointNet++ [17], automatically learn both local and global features from point clouds, effectively handling diverse noise types and demonstrating strong generalization capabilities. However, they often require large annotated datasets and entail higher computational costs. Multi-sensor fusion methods improve robustness and accuracy under low-visibility conditions by integrating complementary information from heterogeneous sensors such as LiDAR, cameras, and millimeter-wave radar, often supported by filtering techniques like Kalman [18] or particle [19] filtering. These strategies collectively highlight a trend toward leveraging complementary information to overcome the limitations of single-modality sensing.
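As a concrete baseline for the statistical-filtering family discussed above, the sketch below implements statistical outlier removal with NumPy. It is a minimal, brute-force illustration under assumed parameter values (`k`, `std_ratio`), not the configuration used in this paper:

```python
import numpy as np

def statistical_outlier_removal(points, k=8, std_ratio=1.0):
    """Classical statistical filter: drop points whose mean distance to
    their k nearest neighbours exceeds the global mean of that statistic
    by std_ratio standard deviations. Brute-force O(N^2), for clarity."""
    diff = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diff, axis=2)
    # mean distance to the k nearest neighbours (index 0 is the point itself)
    knn_mean = np.sort(dists, axis=1)[:, 1:k + 1].mean(axis=1)
    thresh = knn_mean.mean() + std_ratio * knn_mean.std()
    return points[knn_mean <= thresh]
```

Isolated fog artifacts far from any surface receive a large mean neighbour distance and fall above the threshold, which is exactly the failure mode this filter targets and the reason it struggles when noise density varies locally.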

1.2. Multi-Target Tracking Methods

MOT [20] serves as a critical technology in ITS, designed to achieve continuous localization, identity maintenance, and trajectory estimation of multiple targets in dynamic environments using sequential sensor data. Existing research methodologies can be broadly categorized into two frameworks: data association and Random Finite Set (RFS)-based approaches [21].

1.2.1. Data Association

MOT based on data association establishes correspondences between sensor measurements and target states, facing challenges such as occlusion, clutter, and dynamic scenarios. Classical approaches are divided into probabilistic and optimal assignment categories. Probabilistic methods include Nearest Neighbor (NN) [22], Probabilistic Data Association (PDA) [23], and Joint Probabilistic Data Association (JPDA) [24], with the latter suffering from combinatorial complexity. Multi-Hypothesis Tracking (MHT) [25] maintains multiple trajectory hypotheses but demands substantial computation. Optimal assignment methods, primarily using the Hungarian algorithm, achieve global matching, with recent improvements addressing occlusion through Kalman prediction and scene partitioning. Deep learning has promoted end-to-end frameworks like FairMOT [20] and TransTrack [26,27], which integrate detection and Re-ID tasks, reducing identity switches. However, data association methods remain constrained by their dependency on detection accuracy, limited nonlinear motion handling, and scalability issues in multi-RSU edge deployments.

1.2.2. Random Finite Set

In contrast to traditional multi-target tracking methods that rely on data association and fixed target numbers—often leading to errors in roadside LiDAR monitoring due to occlusion, noise, and dynamic changes—Random Finite Set (RFS)-based approaches model all target states as a set, enabling joint estimation of target states and cardinality without pre-defined target numbers. This allows adaptive handling of target appearance, disappearance, and partial occlusion.
The RFS framework, pioneered by Mahler, recursively updates the multi-target state within a Bayesian formulation, treating target states as a whole rather than associating measurements individually. The Probability Hypothesis Density (PHD) [28] filter propagates the intensity function of target states, avoiding explicit data association and offering robustness under occlusion and noisy LiDAR observations. Its extension, the Cardinalized PHD (CPHD) [29,30], jointly estimates target states and their number distribution, improving cardinality accuracy at increased computational cost. To address this, methods such as linear-complexity multi-sensor CPHD, Gaussian mixture [31,32] implementations, and gamma cardinality modeling have been proposed to enhance efficiency and adaptability.
For nonlinear and non-Gaussian scenarios, particle PHD filters approximate state distributions via weighted particles, while recent studies integrate deep reinforcement learning within a POMDP [33] framework, showing significant gains in tracking accuracy (23.6–41.8% OSPA improvement). These learning-augmented methods demonstrate potential in handling high-dimensional point cloud data and complex motion patterns.
Overall, existing point cloud denoising methods predominantly rely on the static scene assumption, which proves inadequate for modeling the spatiotemporal correlations of dynamic noise, such as dust raised by moving vehicles or the trajectories of raindrops. This shortcoming often leads to suboptimal filter threshold settings. Furthermore, current roadside unit (RSU)-based tracking solutions primarily depend on local sensor measurements. However, in foggy conditions, the Mie scattering effect from airborne particulates introduces substantial noise into the point clouds. This phenomenon compromises the accuracy of local measurements and consequently inflates the observation error variance in tracking filters, thereby jeopardizing the safety and stability of vehicle-infrastructure cooperative systems.
Therefore, in response to the challenge of multi-object tracking degradation in roadside LiDAR systems under foggy conditions owing to fog-induced noise, this study proposes a collaborative multi-object tracking method for RSUs tailored to fog-affected environments. The proposed approach aims to enhance tracking accuracy and reliability under such adverse conditions.

2. Main Methods

To address the challenges of LiDAR point cloud noise and multi-target tracking instability in foggy conditions, this paper proposes an integrated approach combining adaptive point cloud denoising with multi-RSU collaborative tracking.

2.1. System Framework: Multi-Target Tracking of Roadside Unit Coordination

The proposed system framework is illustrated in Figure 1.
The system comprises a target detection module, a fusion tracking module, and a communication module. The target detection module preprocesses raw point clouds through background filtering and ground segmentation to derive non-ground points with residual noise, applies spatiotemporal denoising to eliminate fog-induced artifacts, and extracts targets via clustering. The fusion tracking module performs spatiotemporal fusion of local road-target measurements with measurement sets received from neighboring RSUs via the communication module to produce an augmented measurement set, which is processed by a PHD filter to update target trajectories. The communication module enables inter-RSU measurement-set exchange and delivers road traffic target information to vehicles.

2.2. Adaptive LiDAR Point Cloud Denoising Method

Raw point clouds acquired from roadside LiDAR sensors are typically dense, unstructured, and contain a significant number of outliers and irrelevant points due to atmospheric interference, sensor noise, and reflection artifacts, especially in foggy conditions. In response to the aforementioned influence of fog on point clouds, this paper proposes an adaptive LiDAR point cloud denoising method. The framework of the method is shown in Figure 2.

2.2.1. Point Cloud Preprocessing

The point cloud data used in this study was scanned with an RS-RubyLite LiDAR (v. 23071401), where each point corresponds to the actual position of a physical laser reflection on an object surface. The RS-RubyLite is an 80-channel mechanical spinning LiDAR specifically engineered for medium-to-high-speed autonomous driving applications. The sensor achieves a vertical angular resolution of 0.1° and delivers a detection range of 160 m against targets with 10% reflectivity, thereby providing sufficient environmental perception capability for diverse operational scenarios, including autonomous passenger vehicles, heavy-duty mining trucks, commercial haulage vehicles, and vehicle-infrastructure cooperative systems.
The detailed technical specifications are shown in Table 1:
Point cloud preprocessing includes region of interest segmentation, point cloud segmentation, and point cloud integration. First, interest segmentation is explained from a practical perspective as background filtering—the process of removing useless point cloud points belonging to background elements (such as road surfaces, buildings, and atmospheric particles) to isolate points from relevant targets. Pass-through filtering is applied to define a 3D region of interest, effectively narrowing the processing scope while preserving critical spatial data. Subsequently, radius filtering is implemented to eliminate isolated noise points, thereby improving point cloud reliability. Building on this, a ground segmentation method integrating grid-based height difference analysis and local plane fitting is proposed: by constructing a grid map to compute height characteristics within each unit and combining threshold-based determination, preliminary separation between ground and non-ground points is achieved. Low-height seed points are then selected from the retained areas, and iterative plane fitting is performed using Principal Component Analysis (PCA) [34] to accomplish accurate ground segmentation in complex terrain.
Point cloud density is crucial for object detection quality, as sparse point clouds result in insufficient geometric detail capture and degraded detection performance for small or distant objects. Therefore, a multi-frame point cloud fusion method is further introduced, which enhances the point density of real targets through the fusion of consecutive temporal scans while effectively suppressing random noise. Collectively, these preprocessing steps establish a high-quality data foundation for subsequent target detection and tracking tasks. The result after preprocessing is shown in Figure 3.

2.2.2. Voxelisation and Local Noise Estimation

Point cloud data, comprising irregularly distributed 3D points with spatial coordinates and intensity attributes, presents challenges in computational complexity and storage. Voxelization addresses this by discretizing the 3D space into uniform volumetric grids (voxels) of size $d_x \times d_y \times d_z$, effectively converting irregular point clouds into a structured representation. The voxelization process is shown in Figure 4. Within the roadside LiDAR coordinate system, each point $p$ is mapped to a specific voxel using index coordinates calculated as Equation (1).
$i = \lfloor x / d_x \rfloor, \quad j = \lfloor y / d_y \rfloor, \quad k = \lfloor z / d_z \rfloor$
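Equation (1) reduces to a floor division per axis. A minimal NumPy sketch, where the voxel size `d` is an illustrative assumption:

```python
import numpy as np

def voxel_indices(points, d=(0.2, 0.2, 0.2)):
    """Equation (1): map each point to an integer voxel index by
    floor-dividing its coordinates by the voxel size along each axis."""
    return np.floor(np.asarray(points) / np.asarray(d)).astype(int)
```

Note that `floor` (rather than truncation) keeps indices consistent for negative coordinates, which occur routinely in a roadside LiDAR frame.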
Leveraging the voxel structure, local noise estimation is performed efficiently. For each point $p_i$ within voxel $V$, its neighborhood $N(p_i)$ is defined as all points within a fixed radius $r$, with the voxel grid accelerating neighbor retrieval. The neighborhood point set $N(p_i)$ can be calculated by Equation (2):

$N(p_i) = \{ p \in V \mid \| p - p_i \| \le r \}$
The local noise level is quantitatively evaluated through the spatial distribution and intensity characteristics of the neighborhood. The centroid $\mu_v$ and spatial standard deviation $\sigma_v$ are computed as

$\mu_v = \frac{1}{N} \sum_{i=1}^{N} p_i$

$\sigma_v = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left\| p_i - \mu_v \right\|^2 }$
respectively. Simultaneously, the intensity mean $\mu_I$ and standard deviation $\sigma_I$ are derived as

$\mu_I = \frac{1}{|N(p_i)|} \sum_{k \in N(p_i)} I_k$

$\sigma_I = \sqrt{ \frac{1}{|N(p_i)|} \sum_{k \in N(p_i)} (I_k - \mu_I)^2 }$
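The four neighborhood statistics above reduce to a few array operations. The sketch below assumes the neighborhood points and intensities have already been gathered via the voxel grid:

```python
import numpy as np

def local_noise_stats(nbr_pts, nbr_int):
    """Equations (3)-(6): spatial centroid/standard deviation and
    intensity mean/standard deviation over one point's neighbourhood."""
    mu_v = nbr_pts.mean(axis=0)
    sigma_v = np.sqrt(np.mean(np.sum((nbr_pts - mu_v) ** 2, axis=1)))
    mu_i = nbr_int.mean()
    sigma_i = nbr_int.std()
    return mu_v, sigma_v, mu_i, sigma_i
```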

2.2.3. Parameter Adaptive Adjustment

In foggy conditions, LiDAR point clouds exhibit a non-linear surge in noise density within close-range regions. Specifically, the heightened probability of laser scattering by suspended particles leads to an approximately exponential increase in noise points as distance decreases. Meanwhile, distant regions suffer from signal attenuation that causes noise to interlace with valid points. This section proposes a piecewise exponential mapping function defined as Equation (7):
$\sigma_f = \begin{cases} \alpha_1 \exp(\lambda \sigma_v) + \beta_1, & \sigma_v < \Upsilon \\ \alpha_2 \sigma_v + \beta_2, & \sigma_v \ge \Upsilon \end{cases}$
Remark 1.
In contrast to a linear mapping function, the proposed piecewise function explicitly accounts for the distinct representational characteristics of point clouds at varying distances, thereby constructing a more accurate and adaptive filtering kernel.
The parameters α, β, γ, and τ in Equations (7) and (10) were determined through a two-step methodology: initial values were selected based on empirical experience from similar cooperative control applications reported in the literature, ensuring fundamental stability requirements were met. Subsequently, systematic fine-tuning was performed via experimental validation across multiple scenarios, where a grid search over physically meaningful ranges identified the final parameter set that optimally balanced tracking accuracy, string stability, and computational efficiency.
In low-noise regions ($\sigma_v < \Upsilon$), where noise primarily manifests as isolated outliers, the exponential term $\exp(\lambda \sigma_v)$ (with $\lambda < 0$) restrains excessive kernel expansion to preserve fine details. Parameters $\alpha_1$ and $\beta_1$ constrain the lower bound of the kernel width, ensuring a fundamental denoising capability. In high-noise regions ($\sigma_v \ge \Upsilon$), where noise becomes deeply coupled with valid points, the linear term $\alpha_2 \sigma_v$ rapidly increases the kernel width through slope $\alpha_2$ to enhance smoothing, while the intercept $\beta_2$ compensates for background noise interference.
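The piecewise mapping of Equation (7) can be sketched as follows. All parameter values here are illustrative placeholders, since the paper tunes them by grid search rather than publishing a single fixed set:

```python
import math

def adaptive_kernel_scale(sigma_v, alpha1=0.5, beta1=0.1, lam=-2.0,
                          alpha2=0.8, beta2=0.05, thr=0.3):
    """Equation (7): exponential branch below the noise threshold
    (lam < 0 restrains kernel growth), linear branch above it.
    Parameter values are illustrative, not the paper's tuned set."""
    if sigma_v < thr:
        return alpha1 * math.exp(lam * sigma_v) + beta1
    return alpha2 * sigma_v + beta2
```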

2.2.4. Edge Updates

In 3D point cloud data structures, edge points typically exhibit significantly larger local gradient magnitudes. To preserve these critical edge details during filtering and prevent excessive smoothing, an edge-aware updating scheme based on local gradient computation is introduced.
Robustness Enhancement via Pre-Smoothing: The gradient calculation in Equation (9), while computationally efficient, can be sensitive to high-frequency noise (common in foggy or low-visibility environments), as noisy points introduce random directional variations. To enhance robustness, a fast neighborhood-averaging pre-processing step is applied prior to gradient computation. For each point $p_i$, its position is temporarily updated to the centroid of its local neighborhood $N(p_i)$:

$\tilde{p}_i = \frac{1}{|N(p_i)|} \sum_{k \in N(p_i)} p_k$
The local gradient $\nabla p_i$ is then computed using the smoothed point $\tilde{p}_i$ and its smoothed neighborhood:

$\nabla p_i = \sum_{k \in N(p_i)} \frac{\tilde{p}_i - \tilde{p}_k}{\| \tilde{p}_i - \tilde{p}_k \|}$
The gradient magnitude $\| \nabla p_i \|$ is then computed and compared with a predefined threshold $\tau$ to determine the edge weight:

$\omega_i = \begin{cases} 1 + \gamma \cdot (\| \nabla p_i \| - \tau), & \| \nabla p_i \| > \tau \\ 1, & \| \nabla p_i \| \le \tau \end{cases}$
Remark 2.
This equation enables a soft, gradient-aware transition between smoothing and preservation, which directly enhances denoising performance in two critical respects: it significantly improves the retention of sharp geometric features and edges, while simultaneously preventing over-smoothing.
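The edge-update steps can be sketched compactly. This is a simplified sketch: the point is pre-smoothed to its neighborhood centroid as in Equation (8), unit vectors are summed as in Equation (9), and the weight is boosted as in Equation (10); `tau` and `gamma` are illustrative values:

```python
import numpy as np

def edge_weight(neighbors, tau=0.3, gamma=1.0):
    """Eqs. (8)-(10), simplified: pre-smooth the point to its
    neighbourhood centroid, sum unit vectors toward that centroid,
    and boost the weight where the gradient magnitude exceeds tau."""
    p_s = neighbors.mean(axis=0)              # pre-smoothing, Eq. (8)
    diffs = p_s - neighbors
    norms = np.linalg.norm(diffs, axis=1)
    units = diffs[norms > 1e-12] / norms[norms > 1e-12, None]
    g = np.linalg.norm(units.sum(axis=0))     # gradient magnitude, Eq. (9)
    return 1.0 + gamma * (g - tau) if g > tau else 1.0  # Eq. (10)
```

A symmetric neighborhood (interior point) yields a near-zero gradient and weight 1, while a one-sided neighborhood (edge point) yields a boosted weight.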

2.2.5. Bilateral Filtering

In complex foggy conditions, conventional convolution kernels struggle to simultaneously achieve noise suppression and detail preservation. To address this challenge, a bilateral filter [35] is employed for point cloud denoising, effectively balancing smoothing performance with edge protection during the filtering process. This approach enhances traditional weighted-average filtering by incorporating both spatial distance and intensity characteristics, thereby preventing the loss of edge details while removing noise.
For each target point $p_i$ (with spatial coordinates $p_i = (x_i, y_i, z_i)$ and intensity $I_i$) and its neighborhood set $N(p_i)$, the filter assigns a composite weight $\omega_{ik}$ to each neighboring point $p_k$. This weight integrates the spatial and intensity domains, formulated as:

$\omega_{ik} = \omega_i \cdot \exp\left( -\frac{\| p_i - p_k \|^2}{2 \sigma_f^2} - \frac{(I_i - I_k)^2}{2 \sigma_I^2} \right)$
where $\| p_i - p_k \|$ denotes the Euclidean distance between the target point $p_i$ and its neighbor $p_k$, and $\omega_i$ is the edge weight from Equation (10).
After computing the bilateral filtering weights, the spatial coordinates of the target point $p_i$ are updated through weighted averaging to obtain the denoised coordinates. The updating formula is given by:

$p_i' = \frac{ \sum_{k \in N(p_i)} \omega_{ik} \, p_k }{ \sum_{k \in N(p_i)} \omega_{ik} }$
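The bilateral update for a single point can be sketched as follows; `sigma_f`, `sigma_i`, and `w_edge` are illustrative inputs (in the full method, `sigma_f` comes from the adaptive mapping and `w_edge` from the edge-update step):

```python
import numpy as np

def bilateral_update(p, I_p, nbrs, I_nbrs, sigma_f=0.5, sigma_i=10.0,
                     w_edge=1.0):
    """Composite spatial/intensity weight per neighbour, then the
    weighted-average position update for one point. w_edge stands in
    for the edge weight omega_i of the edge-update step."""
    d2 = np.sum((nbrs - p) ** 2, axis=1)
    w = w_edge * np.exp(-d2 / (2 * sigma_f ** 2)
                        - (I_nbrs - I_p) ** 2 / (2 * sigma_i ** 2))
    return (w[:, None] * nbrs).sum(axis=0) / w.sum()
```

Neighbors that are spatially close and similar in intensity dominate the average, so isolated fog returns with anomalous intensity contribute little to the denoised position.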

2.3. Multi-Object Tracking Method with Roadside Unit Collaboration

The challenges of cross-domain multi-target tracking in LiDAR systems are primarily attributed to adverse weather conditions and long-range sensing. While Section 2.2 mitigates the former via an adaptive point cloud denoising approach that effectively suppresses noise by fusing local noise statistics and intensity information, the latter persists as a critical limitation. Specifically, LiDAR's angular resolution degrades with increasing range, yielding excessively sparse point clouds for distant targets that impede precise geometric reconstruction and stable motion characterization. Moreover, the signal-to-noise ratio deteriorates significantly at extended ranges. To overcome these limitations, we propose a cooperative multi-target tracking framework leveraging multiple RSUs.
Each RSU integrates three functional modules: perception, cooperative tracking, and communication. Within this cross-domain cooperative architecture, RSUs exchange local measurement data with adjacent units. The received measurements are subsequently fused with local observations to generate a unified observation set. A Particle PHD filter is then deployed locally at each RSU to execute multi-target tracking. Each particle is assigned a unique identifier (UID), enabling discrete target identity management. A predicted particle set is generated based on a predefined state transition model to estimate prospective target states. Particle weights are subsequently updated via the observation likelihood function, thereby ensuring accurate representation of the correspondence between particles and actual measurements.

2.3.1. Measurement Fusion

In the proposed tracking framework, target measurements are provided by roadside LiDAR units. Thus, the coordinate system of the LiDAR group is adopted as the reference, and conventional spatial synchronization methods are employed to perform spatial registration among multiple LiDAR devices, thereby ensuring the continuity of target measurement coordinates.
To facilitate subsequent particle filtering, the point cloud distribution is represented using a particle model during measurement fusion (i.e., particle measurement), thereby minimizing additional computational overhead in the tracking process.
$P = \{ (x_i, \omega_i) \}_{i=1}^{N}, \quad x_i \in \mathbb{R}^3, \quad \omega_i \in [0, 1]$
where x i denotes the particle position and ω i denotes the weight.
Let the measurement particle set from a neighboring RSU be denoted as $P_A = \{ (x_A^i, \omega_A^i) \}$, and the local particle set as $P_B = \{ (x_B^j, \omega_B^j) \}$. The fused particle set is then obtained as:

$P = \{ (x_k, \omega_k) \}, \quad \omega_k = \frac{ \omega_A^i \omega_B^j }{ \sum_{i,j} \omega_A^i \omega_B^j }$
Remark 3.
This fusion mechanism overcomes the limitations of simple weighted averaging in conventional multi-sensor systems by formulating particle weights as a normalized product of probabilities from collaborating RSUs. This probabilistic integration enhances measurement consistency while maintaining computational efficiency in the particle filtering framework.
Given that the coordinate systems of the two RSUs have been aligned, the fused particle position $x_k$ can be directly computed via a weighted average:

$x_k = \alpha_A \sum_{i=1}^{N_A} x_A^i \omega_A^i + \alpha_B \sum_{j=1}^{N_B} x_B^j \omega_B^j$
Here, $\alpha_A$ and $\alpha_B$ represent the confidence levels of the two RSUs, both set to 0.5 in this context.
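One way to realize Equations (14) and (15) in code, assuming both particle sets are already registered in the common coordinate frame; the pairwise-product weights and the confidence-weighted position average are computed separately:

```python
import numpy as np

def fuse_measurements(xA, wA, xB, wB, alphaA=0.5, alphaB=0.5):
    """Sketch of the fusion step: normalised pairwise weight products
    across the two RSU particle sets (Eq. 14) and a confidence-weighted
    average of each RSU's weighted position estimate (Eq. 15)."""
    W = np.outer(wA, wB)
    W /= W.sum()                               # normalised product weights
    x_fused = (alphaA * (np.asarray(wA)[:, None] * xA).sum(axis=0)
               + alphaB * (np.asarray(wB)[:, None] * xB).sum(axis=0))
    return x_fused, W
```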
Subsequently, the effective sample size $N_{\mathrm{eff}}$ is calculated to determine the necessity of resampling:

$N_{\mathrm{eff}} = \frac{1}{ \sum_{i=1}^{N} (\omega_i)^2 }$
If $N_{\mathrm{eff}} < N/2$, resampling is performed, generating a new particle set $P_n = \{ (x_i^{\mathrm{new}}, 1/N) \}_{i=1}^{N}$.
Remark 4.
This resampling criterion introduces an adaptive threshold based on effective sample size, overcoming the limitations of conventional fixed-interval resampling.
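Equation (16) and the resampling step can be sketched as follows. Systematic resampling is an assumption here (the paper does not specify the scheme); any weight-proportional resampler fits the same slot:

```python
import numpy as np

def effective_sample_size(w):
    """Equation (16): N_eff = 1 / sum(w_i^2), with normalised weights."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

def systematic_resample(x, w, rng=None):
    """Draw N particles with probability proportional to weight and
    reset every weight to 1/N (one common low-variance scheme)."""
    rng = rng or np.random.default_rng(0)
    n = len(w)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    positions = (rng.random() + np.arange(n)) / n
    idx = np.searchsorted(np.cumsum(w), positions)
    return np.asarray(x)[idx], np.full(n, 1.0 / n)
```

N_eff ranges from 1 (all mass on one particle) to N (uniform weights), which is why a threshold of order N/2 is the natural trigger.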
An experimental evaluation of the measurement fusion process was conducted in a road scenario equipped with two RSUs. The comparative results, illustrated in Figure 5, demonstrate that the fused point clouds exhibit sharper contours and significantly increased density for multiple vehicle targets compared to the unfused local measurements (Figure 5a). This improvement (Figure 5b) provides a richer point cloud for subsequent target state estimation.

2.3.2. State Estimation

To address the issue of discontinuity and lack of ordering in the multi-target state sets generated by Random Finite Set (RFS) filtering, which hinders the formation of continuous target trajectories, this paper introduces a particle labeling strategy applied to the particle set.
The core concept of the particle labeling method is as follows: during each iteration of the particle PHD filter, particles are categorized within the spatial domain, and particles belonging to the same category are assigned an identical label. During the resampling process, offspring particles inherit the label from their parent particles. Following resampling, particles are clustered again. Within each cluster, the predominant label—shared by the majority of particles—is used to associate the cluster with its corresponding cluster from the previous time step. Ultimately, by linking targets that share the same label across iterations, a complete trajectory for each target is constructed. The specific procedural steps of the particle labeling method are described as follows (for k ≥ 2):
(1)
Prediction
Based on the target motion model, the current particle set is propagated to obtain the predicted particle set, incorporating process noise to enhance particle diversity. The state of each particle is expressed as:

$x_k^{(i)} = f(x_{k-1}^{(i)}) + v_{k-1}^{(i)}$
where $f(\cdot)$ denotes the state transition function (e.g., Constant Velocity (CV) or Coordinated Turn (CT) models), and $v_{k-1}^{(i)}$ represents the process noise.
Given the updated intensity $v_{k-1}(x)$ at time $k-1$, the predicted intensity $v_{k|k-1}(x)$ at time $k$ is formulated as:

$v_{k|k-1}(x) = \sum_{i=1}^{L_{k-1}} \omega_{k-1}^{(i)} \delta(x - x_{k-1}^{(i)})$
Furthermore, considering both surviving and newborn targets, the overall predicted intensity becomes:
$v_{k|k-1}(x) = \sum_{i=1}^{L_{k-1}} \omega_{P,k|k-1}^{(i)} \delta(x - x_{P,k|k-1}^{(i)}) + \sum_{i=1}^{L_{\gamma,k}} \omega_{\gamma,k}^{(i)} \delta(x - x_{\gamma,k}^{(i)})$
with the corresponding components defined by:
$x_{P,k|k-1}^{(i)} \sim q_k(\cdot \mid x_{k-1}^{(i)}, Z_k), \quad i = 1, 2, \ldots, L_{k-1}$

$\omega_{P,k|k-1}^{(i)} = \omega_{k-1}^{(i)} \, \frac{P_{S,k}(x_{k-1}^{(i)}) \, f_{k|k-1}(x_{P,k|k-1}^{(i)} \mid x_{k-1}^{(i)})}{q_k(x_{P,k|k-1}^{(i)} \mid x_{k-1}^{(i)}, Z_k)}$

$x_{\gamma,k}^{(i)} \sim b_k(\cdot \mid Z_k), \quad i = 1, 2, \ldots, L_{\gamma,k}$

$\omega_{\gamma,k}^{(i)} = \frac{1}{L_{\gamma,k}} \, \frac{\gamma_k(x_{\gamma,k}^{(i)})}{b_k(x_{\gamma,k}^{(i)} \mid Z_k)}$

Here, $q_k(\cdot \mid x_{k-1}^{(i)}, Z_k)$ is the importance density for surviving targets, and $b_k(\cdot \mid Z_k)$ is the importance density for newborn targets.
Concurrently, for $i = 1, 2, \ldots, M$, each particle is assigned a label:

$L_k^P(\tilde{x}_k^{(i)}) = L_k^{\mathrm{NEW}}$
(2)
Update
Assuming the predicted intensity at time $k$ is $v_{k|k-1}(x)$, expressed as:

$v_{k|k-1}(x) = \sum_{i=1}^{L_{k|k-1}} \omega_{k|k-1}^{(i)} \delta(x - x_{k|k-1}^{(i)})$
then the updated intensity $v_k(x)$ at time $k$ is given by:

$v_k(x) = \sum_{i=1}^{L_{k|k-1}} \omega_k^{(i)} \delta(x - x_k^{(i)})$
The weight update formula is:
$\omega_k^{(i)} = \left[ 1 - p_{D,k}(x_k^{(i)}) + \sum_{z \in Z_k} \frac{ \psi_{k,z}(x_k^{(i)}) }{ \kappa_k(z) + \sum_{j=1}^{L_{k|k-1}} \psi_{k,z}(x_k^{(j)}) \, \omega_{k|k-1}^{(j)} } \right] \omega_{k|k-1}^{(i)}$
$\psi_{k,z}(x_k^{(i)}) = p_{D,k}(x_k^{(i)}) \, g_k(z \mid x_k^{(i)})$
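The weight update vectorizes naturally if the detection-weighted likelihoods are precomputed into a matrix. In the sketch below, `p_d` and `kappa` are illustrative constants standing in for the state-dependent detection probability and clutter intensity of the full filter:

```python
import numpy as np

def phd_weight_update(w_pred, psi, p_d=0.9, kappa=0.1):
    """PHD weight update sketch: psi[m, i] = p_D(x_i) * g(z_m | x_i)
    is the detection-weighted likelihood of measurement m under
    particle i; kappa approximates the clutter intensity per measurement."""
    denom = kappa + psi @ w_pred            # per-measurement normaliser
    return (1.0 - p_d + (psi / denom[:, None]).sum(axis=0)) * w_pred
```

For a single well-detected target with negligible clutter, the total weight mass stays near 1, reflecting the PHD property that the integral of the intensity equals the expected target number.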
(3)
Resample
Let $L_k = N \times \hat{N}_k$, where $N$ is the number of particles allocated per target and $\hat{N}_k$ is the estimated number of targets at the current time. The updated particle set $\{ x_k^{(i)}, \omega_k^{(i)} \}_{i=1}^{L_{k|k-1}}$ is resampled. During resampling, particles are selected with probabilities proportional to their weights $\omega_k^{(i)}$, yielding a new particle set $\{ x_k^{(i)}, \hat{N}_k / L_k \}_{i=1}^{L_k}$, where $\hat{N}_k / L_k$ is the new weight of each resampled particle.

For the new particles after resampling, labels are inherited from their parent particles. Specifically, if $x_k^{(j)} \in \mathrm{child}(\tilde{x}_k^{(i)})$, then the label is assigned as $L_k^R(x_k^{(j)}) = L_k^U(\tilde{x}_{k-1}^{(i)})$. This process ensures temporal continuity of labels, thereby maintaining target tracks. The classification after resampling is determined by $\{ P_k^{(R,1)}, P_k^{(R,2)}, \ldots, P_k^{(R,\hat{T}_{k-1}+1)} \}$.
(4) Target Number and State Estimation
The target number estimate in the SMC-PHD filter is given by:
$$\hat{N}_k = \operatorname{int}\left(\sum_{i=1}^{L_{k|k-1}} \omega_k^{(i)}\right)$$
Target states and covariances are determined via k-means clustering applied to the set of weighted particle states and their associated covariances:
$$\left\{\left(\bar{x}_k^{(1)}, S_k^{(1)}\right), \ldots, \left(\bar{x}_k^{(\hat{T}_k)}, S_k^{(\hat{T}_k)}\right)\right\}$$
During clustering, if two state estimates x ¯ k ( i ) and x ¯ k ( j ) satisfy the following condition, they are considered too close and potentially belong to the same target, prompting re-clustering based on velocity information:
$$\exp\left\{\frac{1}{2}\left(H\bar{x}_k^{(i)} - H\bar{x}_k^{(j)}\right)^{\mathrm{T}}\left(H^{\mathrm{T}} S_k^{(i)} H\right)^{-1}\left(H\bar{x}_k^{(i)} - H\bar{x}_k^{(j)}\right)\right\} < \gamma$$
where H is the coefficient matrix from the second-order Taylor expansion of the measurement function h ( · ) , and γ is a predefined threshold.
Finally, labels { L k ( 1 ) , L k ( 2 ) , , L k ( T ^ k ) } are assigned to the clusters, resulting in the estimated classification { P k ( 1 ) , P k ( 2 ) , , P k ( T ^ k ) } . For each cluster, the weighted mean is computed as the target state estimate:
$$\hat{x}_j = \frac{\sum_{i \in C_j} \omega_k^{(i)} x_k^{(i)}}{\sum_{i \in C_j} \omega_k^{(i)}}$$
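Once k-means has produced cluster assignments, the per-cluster weighted mean is only a few lines; the helper below takes 1-D states and precomputed assignments as a simplification (the paper clusters full state vectors and re-clusters by velocity when estimates are too close).

```python
def weighted_means(states, weights, assignments, k):
    """Cluster-wise weighted mean x_hat_j = sum_{i in C_j} w_i x_i / sum_{i in C_j} w_i.

    assignments[i] is the cluster index of particle i (e.g., from k-means);
    clusters that received no weight return NaN.
    """
    num = [0.0] * k
    den = [0.0] * k
    for x, w, j in zip(states, weights, assignments):
        num[j] += w * x
        den[j] += w
    return [n / d if d > 0.0 else float("nan") for n, d in zip(num, den)]
```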

2.3.3. Trajectory Extraction

At this stage, two cluster sets are obtained:
  • the combined set of the previous time step's clusters and the newborn-target particle cluster, $\{P_k^{(R,1)}, P_k^{(R,2)}, \ldots, P_k^{(R,\hat{T}_{k-1}+1)}\}$;
  • the cluster set at the current time $k$, $\{P_k^{(1)}, P_k^{(2)}, \ldots, P_k^{(\hat{T}_k)}\}$.
To characterize the particle associations between clusters, two key matrices A and B are defined:
Matrix $A_{g,h}$ captures the particle overlap between the resampled clusters and the current-time clusters:
$$A_{g,h} = \#\left\{i : x_k^{(i)} \in P_k^{(R,g)} \cap P_k^{(h)}\right\}$$
It counts the number of particles that belong to both the resampled cluster $P_k^{(R,g)}$ and the current cluster $P_k^{(h)}$.
Matrix $B_{g,h}$ reflects the distribution of offspring particles after resampling:
$$B_{g,h} = \#\left\{i : \operatorname{child}\left(x_k^{(i)}\right) \in P_k^{(R,g)} \cap P_k^{(h)}\right\}$$
It counts the number of particles in $P_k^{(R,g)}$ whose parent particles came from the cluster $P_k^{(h)}$.
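Given particle-index sets for the two cluster families, both matrices reduce to simple counts. The sketch below assumes clusters are Python sets of particle indices and `parent_of` maps each particle to its pre-resampling parent; its B counts particles of a current cluster whose parents lie in a given resampled cluster, which is one plausible reading of the definition above, not the paper's exact formulation.

```python
def association_matrices(resampled_clusters, current_clusters, parent_of):
    """Build the overlap matrix A and the offspring-origin matrix B.

    A[g][h]: particles shared by resampled cluster g and current cluster h.
    B[g][h]: particles of current cluster h whose parent (pre-resampling)
             lies in resampled cluster g.
    """
    A = [[len(rc & cc) for cc in current_clusters] for rc in resampled_clusters]
    B = [[sum(1 for i in cc if parent_of[i] in rc) for cc in current_clusters]
         for rc in resampled_clusters]
    return A, B
```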
Using these matrices, trajectory extraction is performed as follows:
(1) Surviving Target Identification
Ideal Threshold: Under accurate clustering, the number of particles corresponding to each target should satisfy:
$$\sum_{g=1}^{\hat{T}_{k-1}} A_{g,h} \approx N$$
If $A_{g,h} \approx N$, it indicates that target $g$ from the previous time step likely remains alive.
Threshold-based Judgment: A threshold $\epsilon_1$ is set. If, for a previous target $g$,
$$\sum_{h=1}^{\hat{T}_k} A_{g,h} \leq \epsilon_1 N$$
then target $g$ is considered to have disappeared.
(2) Newborn Target Identification
A threshold ϵ 2 is defined. If the number of newborn target particles within any current cluster exceeds ϵ 2 N , a new target is declared.
(3) Spawned Target Handling
If the particles from a previous target g split into multiple clusters after resampling (e.g., due to target spawning), the corresponding elements in matrix A might exhibit similar particle counts for these clusters. Matrix B is utilized for further discrimination: offspring particles of a surviving target should predominantly originate from itself (indicated by a larger B g , h ), whereas offspring particles associated with a spawned target are likely to be fewer.
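The survival, disappearance, and newborn decisions above can be applied row by row over A; the sketch below uses illustrative thresholds ($\epsilon_1 = 0.1$, $\epsilon_2 = 0.3$, both fractions of N), not the paper's tuned values, and leaves the B-based spawn discrimination out of scope.

```python
def track_decisions(A, newborn_counts, N=100, eps1=0.1, eps2=0.3):
    """Apply the survival / disappearance / birth thresholds to matrix A.

    A previous target g survives when its row sum of A stays near N and is
    declared disappeared when the row sum falls to eps1*N or below; a current
    cluster holding more than eps2*N newborn-labelled particles is declared
    a new target.
    """
    survived, disappeared = [], []
    for g, row in enumerate(A):
        if sum(row) <= eps1 * N:
            disappeared.append(g)      # almost no particles carried over
        else:
            survived.append(g)         # particle mass persists at time k
    newborn = [h for h, c in enumerate(newborn_counts) if c > eps2 * N]
    return survived, disappeared, newborn
```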
In summary, by applying Equations (33), (35) and (36), the target state estimates and their associated trajectories are obtained.
Following the detailed exposition of the core principles and sequential phases—Prediction, Update, Resample, and Trajectory Extraction—the particle-labeled SMC-PHD tracking algorithm is summarized in Algorithm 1. This formulation encapsulates the key procedures and data flow, providing a clear blueprint for implementation.
Algorithm 1 Particle-Labeled PHD Filter
1: procedure MAIN(InitialParticleSet, MeasurementSequence)
2:   for k = 2 to K do
3:     PREDICTION(k)
4:     UPDATE(k)
5:     RESAMPLE(k)
6:     ESTIMATE_TARGETS(k)
7:     EXTRACT_TRAJECTORIES(k)
8:   end for
9: end procedure
10: procedure PREDICTION(k)
11:   for i = 1 to L_{k−1} do
12:     Draw x_{P,k|k−1}^{(i)} ∼ q_k(·|x_{k−1}^{(i)}, Z_k)
13:     ω_{P,k|k−1}^{(i)} = ω_{k−1}^{(i)} P_{S,k} f_{k|k−1} / q_k
14:   end for
15:   for i = 1 to L_{γ,k} do
16:     Draw x_{γ,k}^{(i)} ∼ b_k(·|Z_k)
17:     ω_{γ,k|k−1}^{(i)} = (1/L_{γ,k}) γ_k / b_k
18:     L_k^P(x̃_k^{(i)}) = L_k^{NEW}
19:   end for
20:   Combine the surviving and newborn particle sets
21: end procedure
22: procedure UPDATE(k)
23:   for i = 1 to L_{k|k−1} do
24:     Compute ψ_{k,z}(x_k^{(i)}) = p_{D,k} g_k(z|x_k^{(i)}) for each z ∈ Z_k
25:     ω_k^{(i)} = [1 − p_{D,k} + Σ_{z∈Z_k} ψ_{k,z}(x_k^{(i)}) / (κ_k(z) + Σ_j ψ_{k,z}(x_k^{(j)}) ω_{k|k−1}^{(j)})] ω_{k|k−1}^{(i)}
26:   end for
27: end procedure
28: procedure RESAMPLE(k)
29:   L_k = N × round(Σ_i ω_k^{(i)})
30:   Resample L_k particles with probability ∝ ω_k^{(i)}
31:   Assign each resampled particle the weight N̂_k / L_k
32:   for each resampled particle x_k^{(j)} do
33:     Inherit label: L_k^R(x_k^{(j)}) = L_k^U(parent particle)
34:   end for
35:   Cluster resampled particles into P_k^{(R,g)}
36: end procedure
37: procedure ESTIMATE_TARGETS(k)
38:   Perform k-means clustering on the weighted particles
39:   for each cluster C_j do
40:     if cluster states are too close then re-cluster by velocity
41:     Compute estimated state x̂_j = Σ_{i∈C_j} ω_k^{(i)} x_k^{(i)} / Σ_{i∈C_j} ω_k^{(i)}
42:     Assign label L_k^{(j)} to cluster C_j
43:   end for
44:   Output the estimated clusters
45: end procedure
46: procedure EXTRACT_TRAJECTORIES(k)
47:   Compute matrices A and B using Equations (33) and (34)
48:   for each previous cluster g do
49:     if Σ_h A_{g,h} ≈ N then target g survives
50:     if Σ_h A_{g,h} ≤ ε_1 N then target g disappears
51:   end for
52:   for each current cluster h do
53:     if newborn particles in h > ε_2 N then declare a new target
54:   end for
55:   Use matrix B to resolve target spawning cases
56:   Link targets with identical labels across time steps
57: end procedure

3. Results

This section presents the roadside collaborative multi-target tracking system developed for foggy conditions, the intelligent roadside hardware platform constructed for it, and experiments verifying the effectiveness of the system in noise suppression and tracking continuity.

3.1. Experimental Platform

To validate the denoising efficacy and tracking performance of the proposed intelligent roadside multi-target tracking system under real-world foggy conditions, field experiments were conducted in actual traffic scenarios using the intelligent roadside platform depicted in Figure 7. Each RSU integrates a LiDAR, a camera, a computational module, a GNSS/RTK receiver, and a communication unit. The LiDAR timestamp, at a 10 Hz sampling rate, serves as the timing benchmark. Using the Precision Time Protocol (PTP), the atomic-clock time received via GPS is relayed to a time-synchronization box acting as the PTP master clock, thereby achieving time synchronization across all sensors. The computational module is built on the NVIDIA Jetson AGX Orin Developer Kit, a high-performance, power-efficient processor capable of executing computationally intensive algorithms in real time. On the software side, the platform runs Ubuntu 20.04 and uses the Robot Operating System (ROS1) as the development framework for programming and system integration.
The experimental testbed is illustrated in Figure 6, comprising four RSUs deployed at the four corners of the road segment. The baseline separation between RSU1 and RSU2 is 90 m, while the distance between RSU1 and RSU4 is 150 m. The intelligent roadside perception platform is depicted in Figure 7, integrating an 80-channel LiDAR, a 32-channel blind-spot LiDAR, and IP cameras. All sensors are mounted at a uniform height of 6 m above the roadway centerline. The blind-spot and primary roadside LiDARs are vertically collocated to enable streamlined parameter calibration and point cloud fusion. Furthermore, the sensors are configured within a unified IP subnet and interfaced with edge computing units via Gigabit Ethernet for real-time data processing. Inter-RSU coordination is achieved through V2X communication. This experimental configuration was employed to validate the feasibility of the proposed tracking framework.

3.1.1. Point Cloud Denoising

To validate the effectiveness of the proposed noise reduction method, point cloud data collected under three distinct fog concentrations during the roadside platform experiments were employed. Following meteorological standards [36], light fog (visibility: 1000–10,000 m), moderate fog (500–1000 m), and heavy fog (<500 m) were systematically evaluated. The meteorological parameters acquired on the day of the experiment are presented in Table 2, and the experimental scene is shown in Figure 8.
A comparative evaluation was conducted between the proposed method and statistical filtering for point cloud denoising using 120 frames of data. As shown in Figure 9, the raw fog-affected point cloud (a) contains significant noise-induced artifacts. While basic preprocessing (b) fails to eliminate false targets (with only ID 1, 2, 5, 7, 9, 11 being valid objects), statistical filtering (c) shows limited improvement. The proposed method (d) demonstrates superior denoising performance, effectively suppressing false targets while preserving true detections.
As observed in Figure 9b, only IDs 1, 2, 5, 7, 9, and 11 are valid targets. The proposed denoising method successfully eliminates all false detections, whereas the statistical filtering approach, despite removing some false targets, fails to eliminate others (e.g., targets 3, 8, and 10). This demonstrates the clear improvement in denoising effectiveness achieved by the proposed method.
For quantitative evaluation of the denoising performance, the total number of targets is defined as all objects (including both valid and false detections) present in the road area per frame. Let N e represent the number of false detections and N the total number of targets. The target detection accuracy n is defined as:
$$n = \frac{N - N_e}{N} \times 100\%$$
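As a sanity check, the metric can be computed directly from per-frame counts; the helper below is a straightforward transcription of the formula, with hypothetical argument names.

```python
def detection_accuracy(n_total, n_false):
    """Detection accuracy n = (N - N_e) / N * 100%, for one frame or
    for counts aggregated over a whole sequence of frames."""
    return (n_total - n_false) / n_total * 100.0
```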
We selected 120 frames of data, and the final statistical results are shown in Table 3 and Table 4.
Compared with the statistical filtering method, the target detection accuracy after denoising with the proposed method increases by 8%.
Subsequently, experiments were conducted using the same method under heavy fog conditions, with the corresponding experimental results presented as follows.
It is observed from the experimental results in Table 4 that the detection accuracy rate of the proposed method is 29% higher than that of statistical filtering under heavy fog conditions. This can be explained by the fact that the statistical filtering denoising method filters out outliers by analyzing the distance distribution characteristics in the neighborhood of point clouds and dynamically setting the standard deviation threshold. However, as the fog concentration increases, valid points in low-density point cloud areas tend to be over-filtered. In contrast, the proposed method benefits from the introduction of intensity features by the bilateral filter, enabling better preservation of details in low-density point cloud areas and distinguishing fog noise point clouds in high-noise areas.
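The modified bilateral filter itself is detailed earlier in the paper; as a simplified stand-in, the 1-D sketch below shows the core bilateral idea this comparison relies on: weights combine spatial closeness and intensity similarity, so sparse fog returns with atypical intensity are down-weighted rather than removed by a global statistical threshold. The kernel scales `sigma_s` and `sigma_r` are fixed illustrative constants, not the paper's adaptively tuned values.

```python
import math

def bilateral_filter_points(positions, intensities, sigma_s=0.5, sigma_r=10.0):
    """Intensity-aware bilateral filtering of a 1-D scan line (illustrative).

    Each point's intensity is replaced by an average of its neighbours,
    weighted by spatial closeness (sigma_s) and intensity similarity (sigma_r),
    so isolated returns with outlier intensities contribute little to, and
    receive little support from, their neighbourhood.
    """
    filtered = []
    for x_i, v_i in zip(positions, intensities):
        num = den = 0.0
        for x_j, v_j in zip(positions, intensities):
            w = math.exp(-((x_i - x_j) ** 2) / (2.0 * sigma_s ** 2)
                         - ((v_i - v_j) ** 2) / (2.0 * sigma_r ** 2))
            num += w * v_j
            den += w
        filtered.append(num / den)
    return filtered
```

Points whose intensity differs strongly from all spatial neighbours keep essentially their own value and can then be flagged as fog noise, which mirrors the preservation-of-detail behaviour discussed above.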

3.1.2. Multi-Target Tracking Experiments and Analysis

To evaluate the performance of the proposed 3D multi-object tracking algorithm, we conducted systematic experimental validation on a dataset collected by the vehicle-road collaboration platform described above. The dataset was divided into straight and curved sections, and experiments were conducted under different road conditions, with vehicle speeds ranging from 10 km/h to 80 km/h, to verify the effectiveness of the proposed tracking method.
A rigorously validated multi-dimensional evaluation framework is employed to holistically quantify the precision and robustness of the tracking algorithms. Central to this methodology is Higher-Order Tracking Accuracy (HOTA) [37], which delivers a balanced performance characterization under complex operational conditions by jointly assessing detection, association, and localization fidelity. As a supplement to this core indicator, auxiliary verification is carried out through several other key metrics: Multi-Object Tracking Accuracy (MOTA), Multi-Object Tracking Precision (MOTP), identity F1 score (IDF1), mostly tracked trajectories (MT), mostly lost trajectories (ML), and identity switches (IDs). This dual-index approach ensures a comprehensive evaluation while reducing the scenario-specific bias inherent in single-index evaluation.
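Of these metrics, MOTA has a simple closed form that is easy to compute from per-sequence error counts; the sketch below assumes the standard CLEAR-MOT definition (false negatives, false positives, and identity switches normalized by the number of ground-truth instances), which is the usual reading of the metric, with hypothetical argument names.

```python
def mota(fn, fp, idsw, num_gt):
    """CLEAR-MOT accuracy: MOTA = 1 - (FN + FP + IDSW) / GT.

    fn, fp, idsw are error counts summed over all frames; num_gt is the total
    number of ground-truth object instances over the sequence.  MOTA can be
    negative when errors outnumber ground-truth instances.
    """
    return 1.0 - (fn + fp + idsw) / num_gt
```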
Given that existing research typically focuses on optimizing for specific road scenarios, we compared our method in detail with current state-of-the-art and classic solutions on both the straight and curved categories. The results indicate that our method outperforms existing techniques on multiple key indicators. The detailed evaluation results are shown in Table 5 and Table 6; an up arrow (↑) indicates that higher values are better, and a down arrow (↓) indicates that lower values are better.
  • Straight Scenario
We selected one of the experimental segments for display. The experimental results at different time instants for the JPDA-based method and the proposed method are illustrated in Figure 10 and Figure 11, respectively. Point clouds were fused from detections by RSU2 and RSU3, each with a LiDAR perception range of (0, 50 m). The vehicle traveled east to west at 10 km/h. The yellow lines in the figures indicate the transition zone between RSU2 and RSU3, spanning approximately 20 m, and the green boxes represent the target tracking boxes.
As shown in Figure 10a–d, the ID of the experimental vehicle changes or disappears intermittently, and the pedestrian target ID also fluctuates. This indicates that targets are not consistently tracked within the transition zone, demonstrating the poor stability of the JPDA-based tracking method during cross-domain tracking in straight-road scenarios.
In contrast, Figure 11a–d show that both vehicle and pedestrian targets maintain consistent IDs within the transition zone, confirming the proposed method’s capability for continuous cross-domain tracking. This demonstrates the superior performance of the proposed method over JPDA in straight-road scenarios.
As shown in the comprehensive comparison in Table 5, our method achieves highly competitive results on the test set, with significant advantages on multiple key indicators, particularly HOTA, MOTA, IDF1, and IDs. This performance is mainly attributed to the synergistic effect of multiple RSUs: multi-object state estimation based on the particle Probability Hypothesis Density filtering framework uses particle labels to associate target states and to accurately distinguish occluded objects from objects exiting the scene. As a result, the HOTA score improves significantly while the number of identity switches is reduced.
  • Curved Scenario
Similarly, a representative sequence is selected to demonstrate the multi-object tracking performance of the proposed method. Figure 12 depicts the inter-RSU handoff tracking performance of the JPDA method. In this curved-road scenario, the test vehicle traversed the bend at a constant velocity of 20 km/h, while the pedestrian exited the curve at a walking speed of 4 km/h. The solid yellow line denotes the curved segment, which corresponds to the transition zone.
In Figure 12a–f, both the vehicle and pedestrian undergo ID switches within the transition zone, indicating the JPDA method's failure to maintain consistent tracking through the curved section.
In Figure 13a, ID 3 corresponds to a passenger vehicle, while ID 7 and ID 12 represent two pedestrians.
In contrast, Figure 13a–f demonstrate the operational mechanism of the proposed method: RSU4 continuously broadcasts the vehicle's measurement data to neighboring units. When the target enters the detection range of RSU2, the received measurements are correlated with local observations. Upon successful association, the target state estimate is updated using the new measurements, achieving seamless cross-RSU cooperative tracking. The results confirm that the proposed method maintains stable and continuous tracking throughout the entire curve negotiation process.
In Figure 13a–f, both vehicle and pedestrian targets maintain consistent IDs within the transition zone, demonstrating continuous tracking capability throughout the curved section. These results validate the superior performance of the proposed multi-target tracking method over JPDA in curved road scenarios.
Curved segments involve complex motion models, frequent occlusions, and limited sensor perspectives, making it a major challenge to distinguish occluded targets from objects exiting the scene. As shown in Table 6, for the curve tracking tasks the proposed method achieves first place in the HOTA, MOTA, MOTP, MT, and IDs metrics, and second place in the IDF1 and ML metrics. These results demonstrate the effectiveness of the method in tracking nonlinear, small-scale, and easily occluded targets in curved scenes.

3.2. Computational Performance and Analysis

To verify the timeliness of the proposed multi-target tracking method, experiments were conducted in actual scenarios. Specifically, the intelligent roadside experimental platform was deployed in a real traffic environment, the proposed RSU collaborative multi-target tracking algorithm was run, 500 cross-domain multi-target tracking tasks were performed, and the average processing time of each task was recorded. The experimental results are shown in Figure 14. Given that the LiDAR operates at 10 Hz with a 100 ms sampling period, the average running time of the proposed system is only 57.33 ms, well below the LiDAR sampling period. This demonstrates that the proposed method meets the strict real-time requirements of intelligent roadside systems for multi-target tracking while maintaining processing accuracy, ensuring real-time perception of environmental changes by roadside units and providing a reliable basis for subsequent decision-making and control.
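The timing experiment can be reproduced in outline with a small harness; `step_fn` is a hypothetical stand-in for one full tracking iteration, and the 100 ms budget reflects the 10 Hz LiDAR period stated above.

```python
import time

def profile_tracker(step_fn, n_frames=500, budget_ms=100.0):
    """Measure the mean per-frame latency of a tracking step against a budget.

    Runs step_fn for n_frames and compares the mean processing time with the
    100 ms period of a 10 Hz LiDAR.  Returns (mean latency in ms, True if the
    budget is met).
    """
    start = time.perf_counter()
    for _ in range(n_frames):
        step_fn()
    avg_ms = (time.perf_counter() - start) / n_frames * 1000.0
    return avg_ms, avg_ms < budget_ms
```

In a real deployment, per-task timestamps would be logged instead of a single aggregate, so that tail latencies, and not only the mean, can be checked against the sampling period.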

4. Discussion

While the proposed adaptive point cloud denoising and multi-RSU cooperative tracking framework has demonstrated substantial improvements in noise suppression, trajectory continuity, and inter-RSU label consistency under foggy conditions, several limitations remain.
First, the dynamic parameter tuning within the multi-constraint filtering model relies on local noise statistics, which exhibits limited capability in capturing the inherent non-uniform noise characteristics of fog, particularly within transition zones where fog density varies. Consequently, the real-time efficacy and accuracy of Gaussian kernel scaling may be constrained by simplifying assumptions, potentially resulting in residual noise or over-smoothing of target details. Future work could investigate a self-supervised learning framework grounded in explicit fog noise modeling, leveraging deep neural networks to autonomously extract local topological features and implicit noise distribution patterns directly from fog-corrupted point clouds.
Second, although the identifier mechanism based on measurement fusion and particle filtering enhances the robustness of cross-RSU target association, challenges persist under extreme conditions, such as severely degraded visibility or extended occlusions, where observation gaps may still induce trajectory fragmentation. Furthermore, the computational overhead and latency of the current approach in dense-traffic scenarios require optimization for scalable deployment. A promising avenue involves modeling the spatio-temporal topology across RSUs using Graph Neural Networks (GNNs), thereby exploiting perceptual context to reinforce target identity reasoning within the sensor network. Additionally, future efforts should focus on developing lightweight particle filter variants or harnessing edge computing acceleration techniques to meet the stringent real-time constraints of large-scale intelligent transportation system deployments.

5. Conclusions

As a critical component of ITS, RSUs can provide continuous and wide-area environmental information for vehicles through multi-node collaborative perception. However, their perceptual performance in adverse weather conditions such as fog is often compromised by low visibility and sensor noise interference, leading to degraded tracking accuracy and ineffective cross-domain data fusion. To address challenges including limited sensing range, low tracking efficiency, and insufficient robustness in multi-target tracking under foggy conditions, this paper proposes a method integrating adaptive point cloud denoising and multi-RSU collaboration. First, localized noise modeling is employed to dynamically adjust filtering parameters, and a multi-constraint filtering model is constructed by integrating point cloud spatial distribution, intensity gradient, and edge features. This enables rain-fog noise to be effectively suppressed while target details are preserved, thereby reducing the false detection rate. Second, to overcome limitations such as low tracking accuracy and a restricted perception range in roadside sensing systems under foggy conditions, cross-domain fusion of heterogeneous measurements is utilized for long-range perception. This is followed by the incorporation of an identifier inheritance mechanism from particle filtering and cross-unit state transfer, ensuring continuous target trajectory tracking and significantly enhancing cross-domain tracking robustness in complex scenarios.
A multi-target tracking system for intelligent RSUs in foggy scenarios was designed and implemented. Extensive experiments were conducted using an intelligent roadside platform in real-world fog-affected traffic environments to validate the accuracy and real-time performance of the proposed algorithm. Experimental results demonstrate that the proposed method improves the target detection accuracy by 8% and 29%, respectively, compared to statistical filtering methods after removing fog noise under thin and thick fog conditions. At the same time, the method performs well in tracking multi-class targets, surpassing existing state-of-the-art methods, especially on high-order evaluation indicators such as HOTA, MOTA, and IDs.
Nevertheless, the proposed adaptive point cloud denoising and multi-RSU cooperative tracking framework still exhibits certain limitations. For instance, dynamic parameter tuning based on local noise statistics may inadequately capture fog’s inherent non-uniformity—particularly in density transition zones—potentially constraining Gaussian kernel scaling accuracy and causing residual noise or target detail over-smoothing. Additionally, while the measurement fusion and particle filtering identifier enhances cross-RSU association robustness, trajectory fragmentation may persist under extreme visibility degradation or prolonged occlusion. To maintain lightweight design and align with practical deployment constraints, we do not currently implement advanced noise modeling or optimize computational overhead for dense traffic. Future work will focus on enhancing low-quality data processing and association resilience: specifically, developing self-supervised fog noise modeling to refine point cloud denoising, and leveraging graph neural networks to exploit spatio-temporal topology for robust identity reasoning under complex environmental conditions.

Author Contributions

Conceptualization, T.S. and M.C.; Methodology, T.S., X.W., W.J. and M.C.; Software, X.W., T.S., W.J., S.C. and H.Z.; Validation, T.S., X.W., W.J. and M.C.; Formal analysis, T.S., X.W., W.J. and M.C.; Investigation, X.W., W.J., X.H., S.C. and H.Z.; Resources, M.C.; Data curation, X.W. and X.H.; Writing—original draft preparation, T.S.; Writing—review and editing, T.S., X.W. and X.H.; Visualization, T.S., X.W., X.H., S.C. and H.Z.; Supervision, M.C.; Project administration, M.C. and W.J.; Funding acquisition, W.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Intelligent Connected Vehicle Inspection Center (Hunan) of CAERI Co., Ltd.: KY24001 and Intelligent Connected Vehicle Inspection Center (Hunan) of CAERI Co., Ltd.: KY25003.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to legal and privacy restrictions.

Conflicts of Interest

Author Tao Shi was employed by the State Key Laboratory of Intelligent Vehicle Safety Technology, Chongqing 400023, China. Authors Tao Shi, Xuan Wang, Wei Jiang, Xiansheng Huang, Shuai Cao, and Hao Zhou were employed by the company Intelligent Connected Vehicle Inspection Center (Hunan) of CAERI Co., Ltd. Author Xuan Wang was employed by the company Jiangsu CAERI Automotive Engineering Research Institute Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. This research was funded by Intelligent Connected Vehicle Inspection Center (Hunan) of CAERI Co., Ltd., grant numbers KY24001 and KY25003. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Song, Y. Connected Autonomous Vehicles in Cooperative Vehicle-Infrastructure System. Master’s Thesis, University of Windsor, Windsor, ON, Canada, 2024. [Google Scholar]
  2. Yamada, S. The strategy and deployment plan for VICS. IEEE Commun. Mag. 1996, 34, 94–97. [Google Scholar] [CrossRef]
  3. Ko, B.; Liu, K.; Son, S.H.; Park, K.-J. RSU-Assisted Adaptive Scheduling for Vehicle-to-Vehicle Data Sharing in Bidirectional Road Scenarios. IEEE Trans. Intell. Transp. Syst. 2021, 22, 977–989. [Google Scholar] [CrossRef]
  4. Ogunrinde, I.; Bernadin, S. Deep Camera-Radar Fusion with an Attention Framework for Autonomous Vehicle Vision in Foggy Weather Conditions. Sensors 2023, 23, 6255. [Google Scholar] [CrossRef] [PubMed]
  5. Qian, K.; Zhu, S.; Zhang, X.; Li, L.E. Robust Multimodal Vehicle Detection in Foggy Weather Using Complementary Lidar and Radar Signals. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021. [Google Scholar]
  6. Xiong, M.; Xu, X.; Yang, D.; Steinbach, E. Robust Depth Estimation in Foggy Environments Combining RGB Images and mmWave Radar. In Proceedings of the 24th IEEE International Symposium on Multimedia (IEEE ISM), Naples, Italy, 5–7 December 2022. [Google Scholar]
  7. Li, Y.; Duthon, P.; Colomb, M.; Ibanez-Guzman, J. What happens for a ToF LiDAR in fog? IEEE Trans. Intell. Transp. Syst. 2020, 22, 6670–6681. [Google Scholar] [CrossRef]
  8. Hevisov, D.; Liemert, A.; Reitzle, D.; Kienle, A. Impact of Multi-Scattered LiDAR Returns in Fog. Sensors 2024, 24, 5121. [Google Scholar] [CrossRef]
  9. Usmani, K.; O’Connor, T.; Wani, P.; Javidi, B. Overview of 3D object detection through fog and occlusion: Passive integral imaging vs active LiDAR sensing. In Proceedings of the Conference on Three-Dimensional Imaging, Visualization, and Display, National Harbor, MD, USA, 22–24 April 2024. [Google Scholar]
  10. Singh, A.K. Major Development Under Gaussian Filtering Since Unscented Kalman Filter. IEEE-Caa J. Autom. Sin. 2020, 7, 1308–1325. [Google Scholar] [CrossRef]
  11. Hu, J.; Fu, Y.; Kang, J.; Zhong, Q.; Wang, X. An improved wavelet domain mean filtering algorithm. Sci. Surv. Mapp. 2021, 46, 55–60. [Google Scholar]
  12. Tsurkan, O.; Kupchuk, L.; Polievoda, Y.; Wozniak, O.; Hontaruk, Y.; Prysiazhniuk, Y. Digital processing of one-dimensional signals based on the median filtering algorithm. Prz. Elektrotechniczny 2022, 98, 51–56. [Google Scholar] [CrossRef]
  13. Shelke, S.K.; Sinha, S.K.; Patel, G.S. Study of Improved Median Filtering using adaptive window architecture. In Proceedings of the 11th International Conference of Computer Communication and Informatics (ICCCI), Coimbatore, India, 27–29 January 2021. [Google Scholar]
  14. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef]
  15. Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The Graph Neural Network Model. IEEE Trans. Neural Netw. 2009, 20, 61–80. [Google Scholar] [CrossRef]
  16. Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. IEEE PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  17. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet plus plus: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. In Proceedings of the 31st Annual Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  18. Guo, G.; Zhao, S. 3D Multi-Object Tracking with Adaptive Cubature Kalman Filter for Autonomous Driving. IEEE Trans. Intell. Veh. 2023, 8, 512–519. [Google Scholar] [CrossRef]
  19. Arulampalam, M.; Maskell, S.; Gordon, N.; Clapp, T. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Process. 2002, 50, 174–188. [Google Scholar] [CrossRef]
  20. Zhang, Y.; Wang, C.; Wang, X.; Zeng, W.; Liu, W. FairMOT: On the Fairness of Detection and Re-identification in Multiple Object Tracking. Int. J. Comput. Vis. 2021, 129, 3069–3087. [Google Scholar] [CrossRef]
  21. Da, K.; Li, T.; Zhu, Y.; Fan, H.; Fu, Q. Recent advances in multisensor multitarget tracking using random finite set. Front. Inf. Technol. Electron. Eng. 2021, 22, 5–24. [Google Scholar] [CrossRef]
  22. Yu, K.; Ji, L.; Zhang, X. Kernel nearest-neighbor algorithm. Neural Process. Lett. 2002, 15, 147–156. [Google Scholar] [CrossRef]
  23. Bu, L.; Rao, B.; Song, D. A group target track-before-detect approach using two-stage strategy with maximum-likelihood probabilistic data association. IET Radar Sonar Navig. 2024, 18, 1351–1363. [Google Scholar] [CrossRef]
  24. Svensson, L.; Svensson, D.; Guerriero, M.; Willett, P. Set JPDA Filter for Multitarget Tracking. IEEE Trans. Signal Process. 2011, 59, 4677–4691. [Google Scholar] [CrossRef]
  25. Yang, H.; Sun, Z.; Qi, G.; Li, Y.; Sheng, A. A multi-group target tracking algorithm based on probabilistic multi-hypothesis tracking. In Proceedings of the 43rd Chinese Control Conference (CCC), Kunming, China, 28–31 July 2024. [Google Scholar]
  26. Zeng, F.; Dong, B.; Zhang, Y.; Wang, T.; Zhang, X.; Wei, Y. MOTR: End-to-End Multiple-Object Tracking with Transformer. In Proceedings of the 17th European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022. [Google Scholar]
  27. Sun, P.; Cao, J.; Jiang, Y.; Zhang, R.; Xie, E.; Yuan, Z.; Wang, C.; Luo, P. Transtrack: Multiple object tracking with transformer. arXiv 2020, arXiv:2012.15460. [Google Scholar]
  28. Cao, X.; Zhu, C.; Yi, W. PHD Filter Based Traffic Target Tracking Framework with FMCW Radar. In Proceedings of the 11th International Conference on Control, Automation and Information Sciences (ICCAIS), Hanoi, Vietnam, 21–24 November 2022. [Google Scholar]
  29. Zhang, M.; Zhao, Y.; Niu, B. Trajectory PHD and CPHD Filters for the Pulse Doppler Radar. Remote Sens. 2024, 16, 4671. [Google Scholar] [CrossRef]
  30. Wei, S.; Zhang, B.; Yi, W. Trajectory PHD and CPHD Filters with Unknown Detection Profile. IEEE Trans. Veh. Technol. 2022, 71, 8042–8058. [Google Scholar] [CrossRef]
  31. Vo, B.-N.; Ma, W.-K. The Gaussian mixture probability hypothesis density filter. IEEE Trans. Signal Process. 2006, 54, 4091–4104. [Google Scholar] [CrossRef]
  32. Yang, Z.; Li, X.; Yao, X.; Sun, J.; Shan, T. Gaussian Process Gaussian Mixture PHD Filter for 3D Multiple Extended Target Tracking. Remote Sens. 2023, 15, 3224. [Google Scholar] [CrossRef]
  33. Zhang, X.; Xi, L.; Chen, H.; Zhang, W.; Fan, Y.; Li, T.; Liu, J. Multi-objective Sensor Management Method Based on Twin Delayed Deep Deterministic policy gradient algorithm. In Proceedings of the 43rd Chinese Control Conference (CCC), Kunming, China, 28–31 July 2024. [Google Scholar]
  34. Marukatat, S. Tutorial on PCA and approximate kernel PCA. Artif. Intell. Rev. 2023, 56, 5445–5477. [Google Scholar] [CrossRef]
  35. Lv, H.; Shan, P.; Shi, H.; Zhao, L. An adaptive bilateral filtering method based on improved convolution kernel used for infrared image enhancement. Signal Image Video Process. 2022, 16, 2231–2237. [Google Scholar] [CrossRef]
  36. WMO. WMO Guide to Meteorological Instruments and Methods of Observation; WMO: Geneva, Switzerland, 2008. [Google Scholar]
  37. Luiten, J.; Osep, A.; Dendorfer, P.; Torr, P.; Geiger, A.; Leal-Taixé, L.; Leibe, B. HOTA: A Higher Order Metric for Evaluating Multi-object Tracking. Int. J. Comput. Vis. 2021, 129, 548–578. [Google Scholar] [CrossRef]
  38. Zhang, K.; Liu, Y.; Mei, F.; Jin, J.; Wang, Y. Boost Correlation Features with 3D-MiIoU-Based Camera-LiDAR Fusion for MODT in Autonomous Driving. Remote Sens. 2023, 15, 874. [Google Scholar] [CrossRef]
  39. Miah, M.; Bilodeau, G.A.; Saunier, N. Learning data association for multi-object tracking using only coordinates. Pattern Recognit. 2025, 160, 111169. [Google Scholar] [CrossRef]
Figure 1. Illustration of the proposed system framework.
Figure 2. Framework of the adaptive LiDAR point cloud denoising method.
Figure 3. (a) Original point cloud. (b) Point cloud after preprocessing.
Figure 4. Point cloud voxelization.
Figure 5. (a) Original point cloud. (b) Point cloud after measurement fusion.
Figure 6. Bird’s-eye view (BEV) of the experimental site.
Figure 7. Intelligent Roadside Experimental Platform.
Figure 8. Experimental scenario.
Figure 9. Experimental results: (a) original point cloud; (b) clustering results without denoising; (c) statistical filtering results; (d) filtering results of the proposed method.
Figure 10. Tracking result of JPDA: (a) Target tracking status at 8:52:49. (b) Target tracking status at 8:52:50. (c) Target tracking status at 8:52:51. (d) Target tracking status at 8:52:52.
Figure 11. Tracking result of proposed method: (a) Target tracking status at 8:52:49. (b) Target tracking status at 8:52:50. (c) Target tracking status at 8:52:51. (d) Target tracking status at 8:52:52.
Figure 12. Tracking result of data association: (a) Target tracking status at 7:19:12. (b) Target tracking status at 7:19:13. (c) Target tracking status at 7:19:14. (d) Target tracking status at 7:19:15. (e) Target tracking status at 7:19:16. (f) Target tracking status at 7:19:17.
Figure 13. Tracking result of proposed method: (a) Target tracking status at 7:19:12. (b) Target tracking status at 7:19:13. (c) Target tracking status at 7:19:14. (d) Target tracking status at 7:19:15. (e) Target tracking status at 7:19:16. (f) Target tracking status at 7:19:17.
Figure 14. System runtime.
Table 1. RS-RubyLite Performance Parameter.
Parameter | Specifications
--- | ---
Lines | 80
Range | 230 m (160 m @ 10% NIST)
Range Precision (Typical) | Up to ±3 cm
Frame Rate | 10 Hz / 20 Hz
Horizontal FOV | 360°
Vertical FOV | 40° (−25°–+15°)
Horizontal Resolution | [Balance] 0.2°/0.4°; [High performance] 0.1°/0.2°
Vertical Resolution | Up to 0.1°
Table 2. Meteorological Parameters.
Parameter | Value
--- | ---
Horizontal Visibility/m | 500–3000
Temperature Range/°C | 6–13
Relative Humidity/% | 65–76
Wind Speed/(km/h) | 8–14
Table 3. Denoising results of actual scenes under mist.
Noise Reduction Method | Total Target Number | Number of False Positives | Accuracy Rate (%)
--- | --- | --- | ---
Unprocessed | 757 | 210 | 72
Statistical filtering | 671 | 124 | 82
Proposed method | 564 | 57 | 90
Table 4. Denoising Results of Actual Scenes in Thick Fog.
Noise Reduction Method | Total Target Number | Number of False Positives | Accuracy Rate (%)
--- | --- | --- | ---
Unprocessed | 384 | 313 | 18
Statistical filtering | 415 | 249 | 40
Proposed method | 336 | 104 | 69
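The accuracy rates reported in Tables 3 and 4 follow directly from the detection counts: accuracy = (total target number − false positives) / total target number. A minimal sketch reproducing the reported percentages from the tabulated counts (variable names are illustrative, not from the original system):

```python
# Reproduce the accuracy rates of Tables 3 and 4 from the detection counts.
# Each entry: method -> (total detected targets, false positives).
results = {
    "mist": {
        "Unprocessed": (757, 210),
        "Statistical filtering": (671, 124),
        "Proposed method": (564, 57),
    },
    "thick fog": {
        "Unprocessed": (384, 313),
        "Statistical filtering": (415, 249),
        "Proposed method": (336, 104),
    },
}

for condition, methods in results.items():
    for method, (total, false_pos) in methods.items():
        # Accuracy rate = correct detections / total detections, in percent.
        accuracy = 100 * (total - false_pos) / total
        print(f"{condition:>9} | {method:<21} | {accuracy:.0f}%")
```

Rounding to whole percent recovers the table values (e.g., (564 − 57)/564 ≈ 90% for the proposed method under mist, and (336 − 104)/336 ≈ 69% under thick fog).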
Table 5. Evaluation result of Straight Scenario.
Methods | HOTA ↑ | MOTA ↑ | MOTP ↑ | IDF1 ↑ | MT ↑ | ML ↓ | IDs ↓
--- | --- | --- | --- | --- | --- | --- | ---
BcMODT [38] | 72.64 | 83.18 | 85.82 | 62.96 | 44.69 | 12.58 | 89
C-TwiX [39] | 78.91 | 88.56 | 85.46 | 56.82 | 49.32 | 14.92 | 241
JPDA | 69.88 | 75.23 | 72.14 | 51.19 | 41.51 | 30.05 | 23
Ours | 82.76 | 89.69 | 87.30 | 66.57 | 46.74 | 11.63 | 52
↑ indicates that a higher value is better; ↓ indicates that a lower value is better.
Table 6. Evaluation result of Curved Scenario.
Methods | HOTA ↑ | MOTA ↑ | MOTP ↑ | IDF1 ↑ | MT ↑ | ML ↓ | IDs ↓
--- | --- | --- | --- | --- | --- | --- | ---
BcMODT [38] | 54.07 | 55.13 | 75.87 | 65.46 | 35.98 | 22.65 | 167
C-TwiX [39] | 58.94 | 57.95 | 73.61 | 72.63 | 33.74 | 21.07 | 239
JPDA | 42.73 | 48.72 | 65.42 | 62.88 | 31.85 | 29.37 | 380
Ours | 61.42 | 62.73 | 78.47 | 70.71 | 38.46 | 22.41 | 129
Share and Cite

Shi, T.; Wang, X.; Jiang, W.; Huang, X.; Cen, M.; Cao, S.; Zhou, H. Multi-Target Tracking with Collaborative Roadside Units Under Foggy Conditions. Sensors 2026, 26, 998. https://doi.org/10.3390/s26030998