1. Introduction
With the proliferation of smart transportation infrastructure, an increasing number of studies have investigated traffic perception using fixed sensors deployed at intersections and along roadsides. In roadside LiDAR-based perception, preprocessing techniques such as background removal and ground removal have been actively explored to suppress false detections caused by static structures (e.g., poles, curbs, and roadside facilities) [
1,
2,
3]. Building on these preprocessing steps, several methods have been proposed to extract object trajectories and estimate vehicle speed [
4,
5]. Recently, deep-learning-based approaches for object detection and tracking using roadside LiDAR have also been reported [
6].
In parallel, vehicle speed estimation and traffic surveillance/analysis based on infrastructure-mounted traffic cameras have been continuously studied [
7,
8,
9]. LiDAR-based infrastructure systems for object detection and tracking have also been introduced [
10]. Furthermore, comparative evaluations of camera and LiDAR approaches have been conducted [
3], and sensor-fusion-based perception and tracking—incorporating radar—have been actively investigated [
11].
Despite these advances, LiDAR measurements become increasingly sparse as the sensor-target distance grows, leading to reduced point-cloud density for each object. Under long-range observation conditions typical of intersection infrastructure, such sparsity can significantly degrade the reliability of object detection and geometric modeling. Studies that explicitly address sparse point clouds and partial observations remain relatively limited. In this context, separate efforts have focused on object detection and estimation using low-channel roadside LiDAR sensors [
12]. In addition, deep-learning-based classification and speed estimation using traffic camera imagery have also been reported [
13].
Cooperative perception, in which infrastructure sensors provide estimated object states (e.g., position and velocity) to connected vehicles or traffic management systems, is gaining importance because it can compensate for the limitations of vehicle-only sensing, including occlusions, limited field of view, and sparsification at long range [
14]. In such applications, vehicle speed is a key input to higher-level modules such as collision-risk assessment, trajectory prediction, and safe-distance decision-making. Therefore, robust real-time speed estimation under real-world infrastructure conditions is essential.
In practical roadside deployments, vehicle speed can also be measured or inferred using other sensing modalities, including inductive loops, microwave radar, video cameras, and RFID-based infrastructure. Video-based approaches are attractive due to low hardware cost and wide-area coverage; however, they are sensitive to illumination and weather, depend on calibration quality, and may raise privacy concerns [
9]. RFID-based systems can provide reliable event-driven identification or coarse motion cues in instrumented zones, but require tag/reader deployment and offer limited spatial coverage [
15,
16]. Radar and loop detectors are mature and widely adopted in practice; nevertheless, they are often lane-dependent and provide limited shape information, which can complicate multi-object separation in dense traffic. In contrast, infrastructure-mounted LiDAR provides explicit 3D geometry for robust object separation and trajectory extraction, yet in infrastructure scenarios, frequent occlusions among vehicles, changes in viewing geometry, and range-induced sparsification can make representative-point estimation unstable, which is a key challenge targeted in this work.
Therefore, we focus on stable geometric modeling and lightweight matching that remain reliable under sparse and partially observed LiDAR returns.
2. Related Works
With the growing body of research on vehicle detection and tracking using stationary LiDAR sensors, continuous efforts have been made to provide vehicle state information from infrastructure sensors and to derive vehicle speed from tracking results. In roadside LiDAR-based detection and tracking, speed is typically computed from the inter-frame displacement of an object centroid or a representative point, and this estimated speed is then used to update trajectories and to support inter-frame association [
2,
6,
10,
17]. However, in stationary LiDAR settings, vehicle-to-vehicle occlusion occurs frequently, and point clouds tend to become sparse as the sensor–vehicle distance increases. As a result, representative-point estimation can be unreliable; in particular, when the true inter-frame displacement is small, even minor localization noise may be amplified during speed computation, causing large fluctuations in the estimated speed.
To mitigate these limitations, registration-based speed estimation methods have been proposed to more accurately estimate inter-frame correspondences in point clouds. Among point-cloud registration techniques, ICP and normal distributions transform (NDT) are widely used [
18,
19]. Nevertheless, due to iterative optimization and the cost of correspondence search or distribution modeling, these methods are computationally expensive, and numerous variants and efficient implementations have been studied to accelerate them [
20,
21]. Zhang et al. proposed a method that tracks vehicle clusters, projects 3D point clouds onto a 2D plane, and performs inter-frame template matching to precisely estimate subtle motion, thereby improving speed accuracy [
5]. In addition, approaches that exploit multi-frame information within a time window and combine detection and tracking (e.g., joint detection and tracking) can be interpreted as strengthening temporal consistency across frames, ultimately improving the stability of displacement estimation required for speed computation [
22]. Since computational efficiency is also critical for real-time processing, especially when speed estimation is included in the pipeline, methods for low-channel roadside LiDAR have been reported that estimate motion using limited features such as corner points or sparse keypoint matching [
12].
Separately, approximating a vehicle’s boundary with a simple geometric model and estimating its heading and contour via rectangle (or L-shape) fitting has been a representative approach for LiDAR cluster-based modeling. Zhang et al. proposed an efficient L-shape fitting method for 2D laser scanner clusters based on angular search [
23], and Zhao et al. presented an L-shape-based pose estimation and tracking approach for 3D LiDAR environments to improve performance [
24]. However, in infrastructure scenarios, sparse observations and occlusions often prevent two orthogonal edges of a vehicle from being sufficiently observed at the same time; instead, only a single dominant line component may be captured. In such cases, if the fitting results fluctuate across frames, noise can be amplified in displacement and speed estimation. Therefore, there is a demand for a speed estimation method that remains stable under sparse and partial observations while computing inter-frame correspondences in a computationally efficient manner.
While L-shape/rectangle fitting and lightweight feature matching have been explored for LiDAR-based vehicle modeling and tracking, their performance can degrade when the observation degenerate observations (e.g., a near single-line contour) as often encountered in long-range infrastructure sensing [
23,
24]. Compared with prior approaches, this work introduces two key elements. First, we introduce Hessian-level stabilization for Gauss–Newton rectangle fitting, which directly regulates ill-conditioned updates near boundary states and suppresses abnormal width/height deformation without adding extra penalty terms to the objective. Second, instead of relying on dense point-cloud registration or multiple independent control points, we design a deterministic two-point correspondence (nearest corner plus an auxiliary point) and perform 2D SVD-based alignment on this minimal set, enabling stable inter-frame matching with low computational overhead under sparse and partially observed returns.
In our infrastructure LiDAR setting, observations are often sparse and partially occluded, and the visible boundary may be dominated by a single edge. To mitigate instability in representative-point displacement estimation under such conditions, we (i) robustly fit a rectangle from extracted boundary points with Hessian-level stabilization, and (ii) perform computationally efficient inter-frame alignment using a deterministic two-point correspondence derived from the fitted rectangle, avoiding full point-cloud registration.
3. Methods
Infrastructure-mounted LiDAR sensors cannot directly measure a vehicle’s state; therefore, it is essential to robustly estimate vehicle motion from sequentially acquired point clouds. Conventional tracking approaches often represent a vehicle’s position using a single point derived from a clustered point cloud, such as the centroid or a small set of boundary/feature points. However, due to LiDAR observation characteristics, point loss frequently occurs because of occlusions by surrounding vehicles and changes in viewing geometry, which introduces significant noise into the estimated feature-point locations. This issue is particularly critical for infrastructure LiDAR, where the scan interval is relatively short and the inter-frame displacement is consequently small; under such conditions, noise contained in the representative points can be greatly amplified during speed computation, leading to large errors.
As an alternative, one may consider estimating vehicle motion by accurately registering the entire point cloud (point-cloud matching). However, LiDAR typically produces thousands to tens of thousands of points per frame, and precise registration generally requires substantial computation time. This limits practical applicability in real-world road environments where real-time processing is required.
To address these challenges, this paper proposes a method that fits a rectangle to each vehicle and then performs inter-frame matching at the rectangle level. Compared with point-wise matching, the proposed approach significantly reduces computational cost, and it is designed to achieve matching accuracy comparable to feature-point-based estimation while enabling real-time operation.
In addition, LiDAR data are commonly provided to the proposed method after standard preprocessing steps for vehicle estimation. In the preprocessing stage, background removal (or ground removal and dynamic object separation), clustering, and inter-frame object association are performed, and the subsequent processing takes as input the point cloud of each object with an assigned ID.
3.1. Point-Cloud Filtering
Accurate displacement estimation generally requires point-cloud matching; however, performing matching over the entire object point cloud leads to excessive computational cost, making it impractical for real-world roadside deployment. To address this issue, we approximate each vehicle point cloud with a rectangle so that the key information required for registration can be represented in a compact geometric form.
To fit a rectangle to the input vehicle point cloud, we first apply filtering to extract only boundary points. Specifically, we compute the relative position and bearing of each point with respect to the centroid of the vehicle cluster, and partition the angular space into fixed-size sectors. For each sector, we select the point with the maximum radial distance from the centroid, and these selected points constitute the boundary point set.
The filtered boundary points may still include outliers, such as isolated points that are excessively distant from neighboring points, sensor noise, or points that do not match the true vehicle shape due to transient events (e.g., door opening). To remove such outliers, we first compute, for each point
, the distance to its neighboring points.
In Equation (1),
and
denote the Euclidean distances from the current point
to the previous point
and the next point
, respectively. If both distances simultaneously exceed the distance threshold
,
Points that satisfy the condition in (2) are designated as outlier candidates.
Then, for each candidate point, we define the angle formed with its two neighboring points
and
as follows.
In Equation (3), represents the local change in boundary curvature. Using the angle threshold , we keep the point when the contour is smoothly connected (), and remove it only when a sharp bend is observed ().
3.2. Rectangle Fitting
In this study, we approximate the vehicle’s 2D boundary as a rectangle, define the corresponding rectangle parameters, and estimate them by solving a nonlinear least-squares optimization problem.
Equation (4) represents the rectangle parameters, where
is the rectangle center,
is the rotation angle, and
and
denote the width and height, respectively. The filtered point
is transformed into the rectangle’s local coordinate frame as
, defined as follows.
Therefore, it can be expressed as
For each point, we select the closest one among the four rectangle edges (edge index
) and construct the corresponding residual. Here,
corresponds to the width-direction edges (left/right), and
corresponds to the height-direction edges (top/bottom). The residual is defined in a normalized form such that it becomes zero on the boundaries
or
, as follows.
Assuming a Gaussian error model with standard deviation
, the nonlinear least-squares problem is formulated as follows.
To minimize (8), we adopt the Gauss-Newton method. The gradient and Hessian of the objective function are approximated from the residual Jacobian as follows.
Here, is composed of the partial derivatives with respect to the local coordinates and the rectangle parameters . It is computed in a case-wise manner depending on whether each point is associated with a width-direction edge or a height-direction edge.
In addition, to stabilize the updates so that the points remain within the rectangle and to mitigate abnormal shrinkage or growth of the width and height parameters during iterations, we do not introduce any additional penalty term into the objective function. Instead, we apply a Hessian-level augmentation term constructed from the gradients of boundary-related functions. Specifically, for each point, we define the following boundary-related functions.
These functions are normalized such that they become zero on the rectangle boundaries or . They are designed so that when a point lies inside the rectangle and when it lies outside. Therefore, the sign of indicates the inside/outside status, and reflects the relative degree of deviation from the boundary.
In addition, to modulate the convergence behavior of the update depending on the inside/outside status and the degree of deviation, we define a sign-inverted weight
as follows and use it in the Hessian augmentation term.
Equations (11) and (12) limit the magnitude of the weight to the range to prevent numerical instability. By including , the Hessian augmentation term is designed to act differently depending on whether a point lies inside the boundary () or outside (). As a result, during iterative optimization, it suppresses updates that would excessively violate the boundary or drive abnormal deformation of the width and height parameters, thereby assisting more stable boundary alignment.
Accordingly, based on the above definitions, we add the following augmentation term to the Hessian.
In Equation (13), is an augmentation term that prevents the point cloud from excessively deviating beyond the rectangle boundary.
Next, to prevent excessive updates of
and
, we introduce an additional size-stabilization term. In the rectangle coordinate frame, let
and
denote the points attaining the maximum and minimum
, respectively, and let
and
denote the points attaining the maximum and minimum
, respectively. Accordingly, we define the point-cloud extents as
In Equation (14), and denote the maximum and minimum values of , respectively, and and denote the maximum and minimum values of , respectively.
To mitigate the tendency of the rectangle parameters to change excessively during the iterative process once the point cloud lies within the rectangle region, we construct the following vectors only when
or
.
Here,
and
are defined as the partial derivatives with respect to
, computed using the coordinate differences between the extremal points in the local frame:
Accordingly, the additional Hessian stabilization term is defined as follows.
Note that is activated only when , and is activated only when ; otherwise, the corresponding term is set to zero. These conditions indicate that the point cloud is sufficiently covered by the current and . In this case, limits excessive updates of and , thereby improving numerical stability.
Finally, the Hessian is given by the following.
The Gauss-Newton iterative update is computed as follows.
For the initial yaw, we consider two candidates, , and initialize the size parameters from the min–max extent of the point cloud as and . For each initialization, we run the Gauss–Newton optimization and select the solution with the smaller final objective value as the final estimate.
Overall, the proposed optimization is based on a residual-driven Gauss–Newton update, while incorporating a boundary-state-based Hessian augmentation term () and a size-stabilization term () to limit excessive updates near the boundary and in the components, thereby ensuring numerical stability and robust convergence behavior.
3.3. Rectangle Matching
Finally, we compute accurate displacement and speed via matching between rectangles. This stage performs rectangle-to-rectangle alignment using the rectangles already fitted in each frame by the optimization procedure in
Section 3.2, and does not rely on ICP-based registration of the full point cloud. In addition, the SVD-based alignment using two representative points can be interpreted as estimating the rotation
by aligning the direction vectors defined by the two point pairs, and then computing the translation
from the resulting rotation.
It is computed from the two point sets
and
as follows.
In Equation (21), denotes the cross-covariance matrix, and (22) computes the optimal rotation by performing SVD on .
For computational efficiency, we perform matching in 2D using only two points. The matching is carried out between two rectangles from consecutive frames, and .
Under normal vehicle trajectories, the nearest corner does not switch to the opposite corner between adjacent frames; therefore, we consider only transitions to adjacent corners. First, for each rectangle, we define the corner closest to the LiDAR origin as a feature point, denoted by and . Then, based on the relationship between the corner indices of the selected feature points in the two frames, we construct an auxiliary point . Finally, the two-point pair is used as the representative set for matching.
First, when the selected corners in the two frames correspond to the same index, we determine the corner index of the auxiliary point
from the corners adjacent to
according to a predefined rule (e.g., the next corner in the counterclockwise direction), as illustrated in
Figure 1. The same indexing rule is applied to both rectangles to form the correspondence. In addition, to prevent unstable correspondence of the auxiliary point caused by differences in side lengths between the two rectangles, the auxiliary points
and
are placed at the same relative position using the shorter of the two side lengths along the corresponding direction as a reference.
Second, when the selected corner indices differ between the two frames, we fix the feature point
selected in frame
as the reference point
, as shown in
Figure 2, and reassign the corresponding corner in frame
as the auxiliary point
to form the representative pair
. We then define the auxiliary point
in frame
using the same adjacent-corner rule so that a consistent pair of representative points is obtained for both rectangles. Again, to reduce correspondence errors caused by differences in side lengths, the offsets of
and
are normalized to the same relative position using the shorter of the two side lengths along the corresponding direction as the reference.
Using the two representative point sets constructed as above, we apply SVD-based alignment to estimate and , and then compute the 2D inter-frame displacement and speed from the resulting transformation.
4. Results
To validate the proposed algorithm, we conducted experiments at K-City, located in Hwaseong-si, Gyeonggi-do, Republic of Korea. The processing pipeline consisted of background subtraction via static point removal, clustering using density-based spatial clustering of applications with noise (DBSCAN) [
25], and inter-frame association using the Hungarian algorithm (linear assignment) [
26]; the proposed method was then applied as the final stage of the pipeline. The test site was a bidirectional five-lane intersection with a traffic island, and the infrastructure LiDAR was installed near roadside structures at approximately 6 m above the ground (
Figure 3). The experimental setup included a 40-channel 3D LiDAR (Pandar40P, Hesai Technology Co., Ltd., Shanghai, China), an industrial PC (sourced from SignTelecom Co., Ltd., Seoul, Republic of Korea), a test vehicle (Genesis G70, Hyundai Motor Company, Seoul, Republic of Korea), and an RTK system (MRP-2000, MBC Broadcasting Co., Ltd., Seoul, Republic of Korea). Detailed specifications of the sensors and equipment are summarized in
Table 1. The parameter settings used in the experiments are listed in
Table 2.
The industrial PC used in the experiments was equipped with a GPU for potential future deep-learning-based classification; however, the speed estimation algorithm proposed in this paper was executed solely on the CPU without GPU acceleration.
Because the infrastructure LiDAR observes objects from a long distance, the shape of the acquired point cloud can vary significantly depending on the object’s position and orientation. Typically, objects are observed in the representative forms shown in
Figure 4.
In
Figure 4a, the vehicle is observed under nominal conditions with sufficient returns.
Figure 4b illustrates a partial-observation case in which only the side surface is predominantly visible; nevertheless, the representative points are preserved and the rectangle remains stable.
Figure 4c demonstrates that the proposed method can still fit a plausible rectangle even when both the front (bonnet) and rear (trunk) portions are occluded by another vehicle or an obstacle, as long as enough side-boundary evidence is retained. In contrast,
Figure 4d shows a challenging case where the visible boundary becomes severely degraded (e.g., the side surface is largely missing), making the geometry underconstrained and resulting in an unstable or incorrect fit.
Following the qualitative fitting results, we conduct an ablation study to validate the numerical stabilization effect of the proposed Hessian-level augmentation. We compare the proposed method against a variant in which the Hessian stabilization terms are removed. A fitting failure is defined as the case where the Gauss–Newton optimization reaches the maximum number of iterations without convergence, or when the optimization encounters a numerical issue and the update cannot be computed. In addition, we report the average point-to-rectangle distance between the observed point cloud and the fitted rectangle to quantify fitting quality. As summarized in
Table 3, removing the Hessian stabilization leads to higher failure rates and larger mean point-to-rectangle distances.
The system clocks of all devices were synchronized to a common reference time using NTP (Network Time Protocol) with a Google NTP server, and the logs were aligned based on the synchronized timestamps. To evaluate performance under various speed conditions, data were collected while driving along a straight segment at 30, 50, 70, and 90 km/h. In addition, to examine whether speed estimation remains feasible under high curvature, we included an extra scenario in which the vehicle passed the traffic island and then executed a sharp right turn. As the reference speed, we used the vehicle speed obtained from the CAN bus, while the RTK-based speed—computed by differentiating RTK position over time—was used as an auxiliary metric.
Performance evaluation was conducted only within the region of interest (ROI) of the infrastructure LiDAR because clustering and tracking reliability can degrade outside the ROI due to reduced point density at longer ranges. During data collection, other vehicles were either absent or present up to a maximum of two, located in front of/behind the target vehicle or in non-adjacent lanes. The ROI was defined as the region where the vehicle is continuously intersected by at least two LiDAR rings.
Figure 5 illustrates the scenario trajectories and the LiDAR ROI.
In
Figure 5, the blue marker indicates the LiDAR installation location, the red solid line denotes the ROI, the red dashed line represents the trajectory for the straight-driving scenario, and the green solid line represents the trajectory for the turning scenario.
The straight-line segment available within the ROI was approximately 90 m. Due to site constraints, each speed setting was tested once. The nominal speed settings were 30, 50, and 70 kph, as well as high-speed (~90 kph); for the high-speed run, the driver was asked to drive as fast as feasible, resulting in a larger speed variation.
Table 3 reports the root mean square error (RMSE) and mean absolute error (MAE) of the vehicle speed estimated by the proposed method with respect to the CAN-based reference speed.
Table 4 summarizes the speed estimation performance on the straight segment under driving conditions of 30, 50, 70, and ~90 kph. All metrics were computed using only the samples within the ROI, and the reference speed was obtained from the vehicle CAN. Under the tested speed conditions, the proposed method achieved an MAE of 0.76–1.37 kph and an RMSE of 0.90–1.58 kph.
Table 5 further compares the proposed method with a centroid-based baseline, in which the inter-frame correspondence is computed from the cluster centroid displacement while the rest of the pipeline is kept unchanged. The centroid baseline shows larger errors than the proposed method under the same ROI and CAN-referenced evaluation. The centroid baseline also results in wider 95% confidence intervals, which suggests higher variability and sensitivity under sparse and partially occluded observations. Temporal smoothing could be applied to either method; in this study, we report the unfiltered estimates to focus on the stability of the underlying displacement measurements.
Figure 6a–e present the speed estimation results for straight driving at 30, 50, 70, and ~90 kph and for the turning scenario, respectively, showing that the estimated speed generally follows the CAN-based reference.
Figure 7 further summarizes the absolute error distributions for these scenarios using histograms. As speed increases,
Figure 7 indicates a more dispersed error distribution with larger outliers; the corresponding histograms appear flatter, which may also be influenced by the relatively smaller sample size in high-speed segments.
The RTK-based speed is used only as an auxiliary reference and is obtained by time-differentiating the RTK positions. Because the RTK update rate is relatively low and measurements are missing in some segments, the resulting speed signal can be delayed and the number of valid samples can be limited. In this study, the RTK-based speed is computed as the displacement between two consecutive RTK positions divided by the corresponding time interval, and the errors with respect to RTK are summarized for reference in
Table 6. Due to the limited number of valid RTK samples, we do not report confidence intervals for the RTK-referenced errors.
The errors with respect to RTK showed an MAE of 0.53–3.10 km/h and an RMSE of 0.62–3.75 km/h.
To assess cross-vehicle generalization, we conducted an additional experiment using the same cornering scenario with an SUV (Seltos, Kia Corporation, Seoul, Republic of Korea). Due to practical constraints, the additional evaluation was limited to a single scenario, and the vehicle was assessed using RTK only. The results are summarized in
Table 7. Although the scope is limited, the proposed method did not exhibit catastrophic degradation when applied to a different vehicle type, indicating the potential for applicability beyond a single vehicle model.
In addition, to assess the real-time feasibility of the proposed algorithm, we measured the processing time of the tracking stage (association and speed estimation) using infrastructure LiDAR data collected at the Hwaseong intersection shown in
Figure 8, excluding preprocessing steps such as background subtraction and clustering. All experiments were conducted on the same industrial PC.
The measurement was conducted during the evening rush hour when traffic volume was high, and the sample data, as shown in
Figure 9, depict more than ten vehicles traveling through the scene.
The processing-time distribution of the tracking stage is shown in
Figure 10. The maximum and average processing times were 34.736 ms and 18.458 ms, respectively, confirming real-time operation at the LiDAR rate of 10 Hz.
5. Discussion
Experimental results confirm that the proposed method consistently preserves representative points and robustly fits a rectangle even when the shape of point clusters varies due to changes in infrastructure LiDAR observation conditions. In particular, under geometrically challenging observation conditions—such as when the vehicle is observed from an elevated infrastructure viewpoint at longer ranges—the returned points often become sparse and biased toward a side view, producing point clouds that frequently resemble a single line segment. In addition, the method remains effective when the target vehicle is partially occluded by other objects or obstacles. Even in such cases, the proposed approach maintains the representative points and fits a rectangle, thereby improving the stability of speed estimation. Moreover, the proposed method reduces the computational cost of matching by leveraging rectangle-based representative points, while achieving high speed accuracy with a low error of approximately 1 kph in terms of MAE. These results demonstrate that the proposed method can stably estimate vehicle speed across the tested speed and trajectory scenarios, even when the object is observed in sparse regions where only about two LiDAR rings are returned, as in our experimental setup. In addition, a supplemental experiment using an SUV (Kia Seltos) was included, and the method did not show catastrophic degradation under a limited vehicle-type change. The tracking stage also achieved real-time performance during high-traffic commuting hours, with an average processing time of approximately 18 ms.
While the CAN-referenced errors tend to increase with speed, the RTK-referenced errors do not necessarily follow a monotonic trend. This difference is mainly because the RTK-based speed is obtained by time-differentiating discrete position measurements, which amplifies position noise and becomes sensitive to the RTK update rate and missing samples. In addition, when the vehicle speed is not perfectly constant within the ROI (e.g., slight acceleration/deceleration), the differencing-based RTK speed can reflect an interval-averaged value over the time step, whereas the CAN signal is closer to an instantaneous speed reference. Therefore, the RTK-referenced errors can appear to increase and then decrease across nominal speed settings, and we report RTK results only as an auxiliary reference.
In future work, we plan to extend the proposed algorithm to dynamic sensing environments, including not only fixed infrastructure LiDAR but also moving sensors such as vehicle-mounted LiDAR.