Sensors
  • Article
  • Open Access

12 September 2025

Robust Pose Estimation and Size Classification for Unknown Dump Truck Using Normal Distribution Transform

1 Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Tokyo 113-8656, Japan
2 Komatsu Ltd., Tsu 23 Futsu-machi, Komatsu-shi 923-0392, Ishikawa, Japan
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Intelligent Point Cloud Processing, Sensing and Understanding—Third Edition

Abstract

Labor shortage has been a severe problem in the Japanese construction industry, and the automation of construction work has been in high demand. One of the needs is the automation of soil loading onto dump trucks. This task requires pose estimation and size classification of the dump trucks to determine the appropriate loading position and volume. At actual construction sites, the specifications of dump trucks are not always known in advance, and most existing methods cannot robustly estimate the pose and size of such unknown dump trucks. To address this issue, we propose a two-stage method that estimates the pose of dump trucks and then classifies their size categories. We use the Normal Distribution Transform (NDT) for pose estimation. Specifically, we utilize NDT templates of dump trucks that distinguish global differences among size categories while absorbing local shape variations within each category. The proposed method is evaluated with data collected in a real-world environment. It appropriately estimates the pose of dump trucks under various settings of position and orientation, and it correctly classifies observed dump trucks into all three predefined size categories. Furthermore, the computation time is approximately 0.13 s, which is sufficiently short for practical operation. These results indicate that the method will contribute to the automation of soil loading onto dump trucks with unknown specifications.

1. Introduction

In Japan, the construction industry has faced a labor shortage due to an aging population and a declining number of skilled workers. Specifically, the number of construction technicians has steadily decreased over the past ten years []. According to the statistical survey on the working population of the construction industry in 2024 [], those aged 60 and above account for over 25% of the total, while those under 30 account for about 12%. Against this social background, the automation of construction tasks has become necessary. One of the major unsolved issues is automating soil loading onto dump trucks with wheel loaders. Figure 1a illustrates the component technologies and the pipeline for soil loading. In this case, the automation comprises several processes, such as bucket filling [,], autonomous locomotion [,], and soil loading onto dump trucks [,]. Among these processes, we focus on the recognition of dump trucks for soil loading based on point cloud data. To be more precise, the wheel loaders require pose estimation and size classification of the dump trucks to determine the appropriate loading position and capacity. Figure 1b illustrates the assumed layout of the wheel loader, dump truck, and soil. In this setup, the automatic loading system needs to estimate the pose and classify the size of the dump truck located in front of the wheel loader for automatic soil loading.
Figure 1. Assumed situation in automated soil loading. (a) Component technologies and the pipeline. (b) Layout of the wheel loader, dump truck, and soil.
One of the key challenges in actual construction sites is that the specifications of dump trucks are not always known in advance. Because dump trucks from various contractors enter and leave actual construction sites, it is difficult for the automatic loading system to fully manage their specifications. This situation raises two issues. First, classification of size categories is required to estimate the loading capacity of each truck. Second, pose estimation becomes more difficult due to the increased variability in truck shape within each size category. As for the second issue, Figure 2 illustrates the local shape variations in the unknown dump trucks due to the design of the vessel. Because various contractors retrofit base vehicles with specific vessels, local shape variations occur even within the same size category depending on the vessel design. Therefore, it is necessary to develop a method that performs both pose estimation and size classification for dump trucks with unknown specifications. To address this challenge, the method must distinguish global differences among size categories in size classification and also absorb local shape variations within each category in pose estimation. In addition, for practical implementation on construction machinery, the method should perform fast under limited computational resources.
Figure 2. Local shape variations in dump trucks due to vessel design.
Based on the above considerations, this study proposes a two-stage method based on the Normal Distribution Transform (NDT) [], a point cloud registration method. In existing research on the recognition of dump trucks using point clouds, truck poses have often been estimated through point cloud registration between an observed point cloud and a reference point cloud. Among such approaches, the Iterative Closest Point (ICP) algorithm [] has been widely used [,]. ICP matches a pair of point clouds by minimizing the distance between each point in the observed point cloud and its nearest neighbor in the reference point cloud. This approach works well when the specifications of the observed dump truck are known in advance and a corresponding reference point cloud can be prepared. However, as discussed above, such assumptions are not always met at actual construction sites. In addition, although deep-learning-based methods have also been investigated for dump trucks [], they still face the cost of collecting sufficient training data. On the other hand, NDT represents a reference point cloud as a set of normal distributions and minimizes the distance between the observed point cloud and these distributions. Because NDT approximates local shape variations with probability distributions, it is expected to be more robust against shape differences between the observed dump truck and the reference point cloud in the first step. For size classification, NDT alone does not provide a direct solution. Accordingly, we extend it to a parallel comparison based on the fact that NDT optimizes a matching score to determine the transformation. Specifically, we prepare multiple reference point clouds that represent different size categories and conduct NDT-based pose estimations in parallel.
At least in Japan, dump truck sizes can be roughly categorized into a limited number of predefined categories, and it is practical to prepare the corresponding reference point clouds. Size classification is then performed by selecting the size category that provides the highest optimized score in the second step.
Figure 3 shows the conceptual diagram of the proposed method. First, an NDT template is constructed for each size category of dump trucks. These templates are constructed to distinguish global differences across size categories while absorbing local shape variations within each size category. Then, the pose estimation is performed by matching the input point cloud with the template of each size category by NDT. After that, the size classification is achieved by comparing the optimized scores across size categories and selecting the one with the highest score. In this way, the proposed method achieves robust pose estimation and size classification for unknown dump trucks.
Figure 3. Conceptual diagram of the proposed method.
The main contributions of this study are as follows:
  • Proposal of a two-stage method for pose estimation and size classification of dump trucks with unknown specifications using NDT templates;
  • Experimental validation of the proposed method using real-world data under various settings of different positions and size categories.
The remainder of this paper is organized as follows: Section 2 reviews the existing research on the recognition of dump trucks using point cloud data. Section 3 explains the details of the proposed method. In Section 4 and Section 5, the proposed method is evaluated with real-world data for pose estimation and size classification, respectively. Section 6 discusses the results and limitations of this study. Finally, Section 7 presents the conclusions.

3. Method

3.1. Overview

Figure 4 shows the pipeline of the proposed method. In advance, a set of templates is constructed from reference point clouds for each size category (Section 3.2). During online operation, first, the point cloud of the dump truck is observed by the LiDAR sensors mounted on the wheel loader. The observed point cloud is then preprocessed with a rectangle fitting method to obtain an initial transformation for the following NDT process (Section 3.3). Next, NDT iteratively updates the transformation parameters by matching the observed point cloud with the pre-constructed templates (Section 3.4). This preprocessing and the NDT-based pose estimation are performed in parallel for multiple templates. After the pose estimation for each size category, the method compares scores designed for size classification and selects the category with the highest value (Section 3.5). To be more precise, we introduce negative point clouds into the conventional NDT score to correctly distinguish between different size categories.
Figure 4. Pipeline of proposed method. Before online operation, template construction is performed in advance (Section 3.2). Online operation consists of preprocess (Section 3.3), NDT-based pose estimation (Section 3.4), and score calculation (Section 3.5).
In this study, pose estimation is formulated as the problem of estimating a coordinate transformation that matches the observed point cloud with a template whose pose is already known. Because both the wheel loader and the dump truck are positioned horizontally on flat ground at the construction site, we consider the 2-D transformation parameter p defined as follows in this study:
$$\mathbf{p} = (t_x,\ t_y,\ \phi_z)$$
where $t_x$ and $t_y$ represent translational parameters along the x-axis and y-axis, respectively, and $\phi_z$ denotes the yaw angle. When a point $(x_j, y_j, z_j)$ in the observed point cloud is transformed by this transformation parameter $\mathbf{p}$, the position of the point after the transformation $(x_j', y_j', z_j')$ is expressed as follows:
$$\begin{pmatrix} x_j' \\ y_j' \\ z_j' \end{pmatrix} = \begin{pmatrix} \cos\phi_z & -\sin\phi_z & 0 \\ \sin\phi_z & \cos\phi_z & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_j - x_0 \\ y_j - y_0 \\ z_j - z_0 \end{pmatrix} + \begin{pmatrix} x_0 \\ y_0 \\ z_0 \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \\ 0 \end{pmatrix}$$
where $(x_0, y_0, z_0)$ indicates the center of the template. In other words, the transformation is defined as a translation in the xy-plane and a rotation around the z-axis that passes through the center of the template.
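The 2-D transformation above can be sketched in NumPy as follows; the function name and array layout are our own choices for illustration, not the authors' implementation:

```python
import numpy as np

def transform_points(points, p, center):
    """Apply the 2-D transformation p = (tx, ty, phi_z): a rotation about
    the z-axis passing through the template center, followed by a
    translation in the xy-plane, as in Equation (2)."""
    tx, ty, phi = p
    c, s = np.cos(phi), np.sin(phi)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    shifted = points - center            # rotate about the template center
    rotated = shifted @ R.T + center
    return rotated + np.array([tx, ty, 0.0])
```

For example, a point at (1, 0, 0) rotated by 90 degrees about the origin and translated by (1, 2) lands at (1, 3, 0).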

3.2. Construction of Normal Distribution Template

Before the online operation, normal distribution templates are constructed for different size categories using reference point clouds. These reference point clouds represent the typical shape of dump trucks in each category, and they are prepared in advance. Specifically, they are created by merging point clouds of a dump truck captured from all directions. We emphasize that the dump trucks which the reference point clouds represent are not always identical to those which will be observed during actual operation. Local shape variations may exist between the reference and observed dump trucks, even within the same size category.
The following part describes the process of constructing a single template from a reference point cloud. First, the reference point cloud is divided into a grid of voxels. Then, the mean vector $\mu_k$ and covariance matrix $\Sigma_k$ are computed for the subset of points within each voxel $k$ as follows:
$$\mu_k = \frac{1}{n_k} \sum_{i=1}^{n_k} x_{k,i}$$
$$\Sigma_k = \frac{1}{n_k} \sum_{i=1}^{n_k} (x_{k,i} - \mu_k)(x_{k,i} - \mu_k)^T$$
where $n_k$ represents the number of points within voxel $k$, and $x_{k,i}$ $(i = 1, \ldots, n_k)$ represents the $i$-th point within voxel $k$.
Figure 5a illustrates the reference point cloud and the constructed template. As shown in Figure 5a, an ellipsoid represents the 95% confidence region of the normal distribution in each voxel. In addition, Figure 5b illustrates how the template is divided into a voxel grid. The voxel grid is defined by six parameters: the voxel sizes $(l_x, l_y, l_z)$ and the offsets $(dl_x, dl_y, dl_z)$. As an example, the placement of voxels along the x-axis proceeds as follows:
  • The point with the smallest x-coordinate is extracted from the reference point cloud.
  • The starting line of the grid is determined by subtracting the offset $dl_x$ from the x-coordinate of the extracted point.
  • From this starting line, the space is divided along the x-axis at intervals of $l_x$.
Figure 5. Visualization of the template. In this case, $(l_x, l_y, l_z)$ is (0.4, 0.8, 0.4) [m] and $(dl_x, dl_y, dl_z)$ is (0.2, 0.2, 0.0) [m]. (a) Reference point cloud and constructed template. (b) Top view and side view of the template with associated parameters.
The same process is applied to the y-axis and z-axis using voxel sizes $l_y$ and $l_z$, and offsets $dl_y$ and $dl_z$, respectively.
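The template construction described above (an offset voxel grid followed by per-voxel mean and covariance, Equations (3) and (4)) can be sketched as follows; the dictionary-based storage is a simplification of ours, not the authors' data structure:

```python
import numpy as np
from collections import defaultdict

def build_ndt_template(ref_points, voxel_size, offset):
    """Construct a normal-distribution template from a reference point
    cloud: the grid origin is the minimum coordinate minus the offset,
    and each occupied voxel stores the mean and (biased) covariance of
    its points, as in Equations (3) and (4)."""
    voxel_size = np.asarray(voxel_size, dtype=float)
    origin = ref_points.min(axis=0) - np.asarray(offset, dtype=float)
    keys = np.floor((ref_points - origin) / voxel_size).astype(int)
    buckets = defaultdict(list)
    for key, pt in zip(map(tuple, keys), ref_points):
        buckets[key].append(pt)
    template = {}
    for key, pts in buckets.items():
        pts = np.asarray(pts)
        mu = pts.mean(axis=0)
        diff = pts - mu
        sigma = diff.T @ diff / len(pts)   # divides by n_k, matching Eq. (4)
        template[key] = (mu, sigma)
    return origin, template
```

In practice, voxels containing too few points are often skipped or regularized so that the covariance is invertible; that detail is omitted here for brevity.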

3.3. Preprocess

To avoid convergence to an unfavorable local optimum in the NDT-based pose estimation, a preprocess is conducted to roughly estimate the truck’s pose before the NDT. Although applications that deal with large-scale point clouds often require downsampling or compression [,] to improve data transfer speed or execution efficiency, this study uses only a single frame and does not include such processing in the preprocess. This preprocess consists of the following two steps:
  • Filtering of the observed point cloud based on the predefined parking area and height threshold.
  • Rectangle fitting to approximate the truck’s shape with a rectangle for estimation of an initial transformation.
The details of each step are explained in the following parts.

3.3.1. Filtering Based on Parking Area and Height

First, the observed point cloud is filtered by position. Specifically, points outside the predetermined parking area are removed. In addition, to remove noise and ground points, only points above a certain height threshold are retained. This process ensures that the filtered point cloud primarily represents the dump truck.
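As a rough sketch, this filtering step might look like the following, assuming the parking area is an axis-aligned rectangle in the xy-plane (the paper does not specify the area's shape):

```python
import numpy as np

def filter_points(points, area_min, area_max, z_min):
    """Keep only points inside the predefined parking area (assumed here
    to be an axis-aligned xy-rectangle) and above a height threshold,
    so the result mainly covers the dump truck."""
    inside = np.all((points[:, :2] >= area_min) & (points[:, :2] <= area_max),
                    axis=1)
    high = points[:, 2] > z_min
    return points[inside & high]
```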

3.3.2. Initial Transformation Using Rectangle Fitting

After the filtering, the rectangle fitting method [] is applied to the filtered point cloud to obtain a 2-D bounding rectangle. This rectangle approximates the horizontal projection of the filtered points. Figure 6 illustrates the process for deriving the initial transformation. Here, the coordinate system is defined such that the long side of the template is aligned with the x-axis. First, the center of the bounding rectangle is computed from its four vertices. A translational transformation is then applied to align this center with the center of the template, as shown in Figure 6a. Next, a rotational transformation is performed to align the orientation of the bounding rectangle with that of the template. However, because this initial transformation is based only on the geometric shape of the rectangle, it cannot distinguish between forward and backward orientations of the dump truck, as shown in Figure 6b. To address this ambiguity, two types of initial transformations are used in the following NDT-based pose estimation. One transformation is a 180-degree rotation of the other. In other words, one corresponds to the correct orientation and the other to the reversed orientation.
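The derivation of the two candidate initial transformations can be illustrated as below. Because the cited rectangle-fitting method is not reproduced here, the rectangle orientation is approximated by the principal axis of the projected points; this is a stand-in of ours, not the authors' algorithm:

```python
import numpy as np

def initial_transforms(filtered_xy, template_center_xy):
    """Derive two candidate initial transformations (tx, ty, phi_z) that
    move the cloud's center to the template center and align its long
    axis with the template x-axis. The second candidate is the first
    rotated by 180 degrees, reflecting the forward-backward ambiguity."""
    center = filtered_xy.mean(axis=0)
    # Principal axis of the horizontal projection as a proxy for the
    # fitted rectangle's long side.
    cov = np.cov(filtered_xy.T)
    eigvals, eigvecs = np.linalg.eigh(cov)
    long_axis = eigvecs[:, np.argmax(eigvals)]
    yaw = np.arctan2(long_axis[1], long_axis[0])
    tx, ty = template_center_xy - center
    # Rotating by -yaw aligns the long axis with the x-axis.
    return [(tx, ty, -yaw), (tx, ty, -yaw + np.pi)]
```

Both candidates are then passed to the NDT stage, which keeps whichever yields the higher score.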
Figure 6. Process of pre-transformation using rectangle fitting. Green dots indicate filtered point cloud, and red dotted lines represent diagonals of fitted rectangle. (a) Translational transformation. (b) Ambiguity of forward-backward orientation estimation.

3.4. NDT-Based Pose Estimation

In this study, we assume that the yaw angle is estimated with sufficiently high accuracy in the preprocess, and only the translational transformation parameters are updated in the NDT-based pose estimation to reduce computational cost. First, the observed point cloud is transformed by a parameter $\mathbf{p}$ according to Equation (2). The probability density function is evaluated for all transformed points $x_j'$ $(j = 1, \ldots, N)$ and all voxels in the template. Here, $N$ denotes the total number of points in the point cloud. For each point $x_j'$, let $k_j$ denote the nearest voxel in the template. Then, the likelihood of $x_j'$ is expressed as follows:
$$f(x_j') = \frac{1}{\sqrt{(2\pi)^n |\Sigma_{k_j}|}} \exp\left( -\frac{(x_j' - \mu_{k_j})^T \Sigma_{k_j}^{-1} (x_j' - \mu_{k_j})}{2} \right)$$
where $\mu_{k_j}$ and $\Sigma_{k_j}$ denote the mean vector and covariance matrix of voxel $k_j$, and $n$ denotes the dimension of $x_j'$, which is 3 in this study.
This study employs the NDT formulation proposed by Biber et al. [], who introduced a mixture model of a normal distribution and a uniform distribution. By incorporating this mixture model and applying further approximations, their method improves both robustness and computational efficiency. Following their formulation, the evaluation function for parameter $\mathbf{p}$ becomes computable as a sum of exponential functions [] as follows:
$$E(\mathbf{p}) = \frac{1}{N} \sum_{j=1}^{N} d_1 \exp\left( -\frac{d_2}{2} (x_j' - \mu_{k_j})^T \Sigma_{k_j}^{-1} (x_j' - \mu_{k_j}) \right)$$
where $d_1$ and $d_2$ are constants.
To optimize E p , the Newton method has been generally used in existing research. However, in this study, we employ the Euler method instead. This decision is based on two reasons. First, we found that the Newton method exhibits unstable behavior near inflection points, which can lead to divergence in the update of transformation parameters. Second, the Newton method requires the computation of the Hessian matrix, resulting in a high computational cost. Based on these considerations, we employed the Euler method to improve stability and efficiency in NDT-based pose estimation for dump trucks.
In the Euler method, the increment $\Delta\mathbf{p}$ of the transformation parameter is given as follows:
$$\Delta\mathbf{p} = h \cdot \mathbf{g}$$
where $h$ represents the step size and $\mathbf{g}$ is the gradient vector of $E(\mathbf{p})$. Because the preprocess provides a rough initial transformation and the NDT aims to refine this transformation, we limit the maximum value of $\Delta\mathbf{p}$ by setting $h$ as follows:
$$h = \begin{cases} 1 & \text{if } \|\mathbf{g}\| \le 0.01 \\ \dfrac{0.01}{\|\mathbf{g}\|} & \text{if } \|\mathbf{g}\| > 0.01 \end{cases}$$
This ensures a maximum movement of 0.01 m within one iteration.
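The clamped Euler update can be written compactly as follows (a minimal sketch of one iteration; the gradient of $E(\mathbf{p})$ is assumed to be computed elsewhere):

```python
import numpy as np

def euler_step(p, grad, max_step=0.01):
    """One Euler update of the translational parameters. The step size h
    scales the gradient so that a single iteration never moves the
    cloud by more than max_step metres, as in Equations (7) and (8)."""
    g = np.asarray(grad, dtype=float)
    norm = np.linalg.norm(g)
    h = 1.0 if norm <= max_step else max_step / norm
    return np.asarray(p, dtype=float) + h * g
```

For a gradient of norm 0.5, the step is scaled down to length 0.01; for a gradient already shorter than 0.01, it is applied unchanged.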
As described in Section 3.1, the preprocess and the NDT-based pose estimation are performed with multiple templates corresponding to different size categories. In addition, as described in Section 3.3, to determine the correct forward-backward orientation, two different initial transformations are input to the NDT-based pose estimation. Therefore, for the observed point cloud, the system conducts NDT-based pose estimation in parallel for all combinations of size categories and forward-backward orientations. In other words, if $S$ represents the number of size categories, the system performs $2S$ NDT-based evaluations in parallel, considering both possible orientations for each category.

3.5. Size Classification with Negative Point Cloud

For the 2 S NDT-based pose estimations which are conducted in parallel, the system performs a two-step score selection. First, for each size category, it compares the two NDT scores with different initial orientations and selects the higher score. This first step determines the correct forward-backward orientation within each size category. Next, among the S NDT results that correspond to different size categories, the system selects the highest score. This second step determines the correct size category. Through this two-step process, the system determines both the forward-backward orientation and the size category of the observed dump truck.
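The two-step score selection can be sketched as follows; the dictionary layout for the $2S$ parallel results is our own choice for illustration:

```python
def select_pose_and_size(results):
    """Two-step score selection over the 2S parallel NDT results.
    `results` maps (size_category, orientation) -> (score, pose).
    Step 1 keeps the better orientation within each size category;
    step 2 picks the category with the highest surviving score."""
    best_per_category = {}
    for (category, orientation), (score, pose) in results.items():
        current = best_per_category.get(category)
        if current is None or score > current[0]:
            best_per_category[category] = (score, pose, orientation)
    best_category = max(best_per_category,
                        key=lambda c: best_per_category[c][0])
    score, pose, orientation = best_per_category[best_category]
    return best_category, orientation, pose, score
```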
However, in the second step of the score selection, the size classification cannot be reliably performed only by comparing the conventional NDT scores. Because NDT maximizes the overlap between the template and the transformed point cloud, it may result in an invalid higher score when the template corresponds to a larger dump truck than the observed point cloud. Figure 7a,b illustrate valid and invalid overlap between the point cloud and the template, respectively. In the figures, the green points represent the observed point cloud, and a set of red ellipsoids represents the template. To mitigate the issue of this invalid overlap, we propose a method to reduce the score when the template size is larger than the actual dump truck. Specifically, in the second step of the score selection, we introduce a virtual “negative” point cloud around the observed point cloud, which overlaps with the larger templates as shown in Figure 7c. In the figure, the blue points represent the negative point cloud. This negative point cloud contributes as a negative term when calculating the NDT score. The NDT score incorporating the negative point cloud is expressed as
$$\mathrm{score} = \frac{1}{N} \sum_{j=1}^{N+M} \omega_j\, d_1 \exp\left( -\frac{d_2}{2} (x_j' - \mu_{k_j})^T \Sigma_{k_j}^{-1} (x_j' - \mu_{k_j}) \right)$$
where $N$ and $M$ represent the number of points in the actual and negative point clouds, respectively. Here, the negative term is incorporated into the score by setting $\omega_j$ as follows:
$$\omega_j = \begin{cases} 1 & \text{if } x_j' \text{ belongs to the actual point cloud} \\ -1 & \text{if } x_j' \text{ belongs to the negative point cloud} \end{cases}$$
Figure 7. Template and actual point cloud in an invalid higher-score situation. (a) Valid overlap between observed point cloud and template. (b) Invalid overlap between observed point cloud and template. (c) Negative point cloud incorporated into (b).
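The score with negative weights in Equation (9) can be sketched as follows; `template_lookup` is a hypothetical helper standing in for the nearest-voxel search, which the paper leaves implicit:

```python
import numpy as np

def ndt_score_with_negatives(actual, negative, template_lookup,
                             d1=1.0, d2=1.0):
    """NDT score with a negative point cloud (Equations (9) and (10)):
    actual points contribute with weight +1, negative points with -1,
    normalised by the number of actual points N. `template_lookup` maps
    a point to (mu, sigma_inv) of its nearest voxel, or None if no voxel
    is nearby (such points contribute zero)."""
    total = 0.0
    for points, w in ((actual, 1.0), (negative, -1.0)):
        for x in points:
            hit = template_lookup(x)
            if hit is None:
                continue
            mu, sigma_inv = hit
            diff = x - mu
            total += w * d1 * np.exp(-0.5 * d2 * diff @ sigma_inv @ diff)
    return total / len(actual)
```

When a negative point falls inside an oversized template's Gaussians, its $-1$ weight cancels part of the score, which is exactly the penalization the method relies on.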
Figure 8 illustrates the positioning of the negative point cloud. In the figure, the green points represent the observed point cloud, while the blue points represent the negative point cloud. The negative point cloud is assigned to the areas in front of and behind the observed point cloud, as well as above the vessel. The spatial range of the negative point cloud in front of and behind the dump truck is defined by the fitted rectangle obtained in the preprocess and the parameters $x_{\mathrm{gap}}$ and $x_{\mathrm{len}}$, as shown in Figure 8a. Here, $x_{\mathrm{gap}}$ represents the gap distance between the negative point cloud and the fitted rectangle, and $x_{\mathrm{len}}$ represents the length of the negative point cloud range. In addition, the range in the y-direction is defined by the minimum and maximum y-coordinates of the fitted rectangle. Similarly, the range of the negative point cloud above the vessel is defined by $z_{\mathrm{gap}}$ and $z_{\mathrm{len}}$, as shown in Figure 8b, and the range in the x-direction is defined by the x-coordinate of the center of the fitted rectangle and the maximum x-coordinate. Furthermore, the interval between the negative points, $d_{\mathrm{neg}}$, is also a parameter of the assignment process. These five parameters $x_{\mathrm{gap}}$, $x_{\mathrm{len}}$, $z_{\mathrm{gap}}$, $z_{\mathrm{len}}$, and $d_{\mathrm{neg}}$ are designed based on the requirements for size classification, and their numerical values are given in Section 5.1.
Figure 8. Visualization of transformed point cloud and negative point cloud. White lines around green points represent fitted rectangle, and white lines around blue points represent assignment area for negative points. (a) Top view. (b) Side view.

4. Evaluation of Pose Estimation

4.1. Experimental Setup

This section evaluates the performance of the NDT-based pose estimation. To isolate the evaluation of pose estimation from that of size classification, the experiments used a single template belonging to the same size category as the observed dump truck. It should be noted that the template and the observed dump truck were not identical, and local shape variations existed between them. For this evaluation, point cloud data of a dump truck were collected under 12 different settings. Specifically, the dump truck was parked at four different positions for each of three orientations. These settings are labeled 1-A to 3-D, as shown in Figure 9. The purpose of this experiment is to evaluate the method's ability to estimate poses under two types of variation: (1) local shape differences between the template and the observed dump truck within the same size category, and (2) variations in the truck's position and orientation within the parking area.
Figure 9. Position and angle conditions of observed data. A–D represent four different dump truck positions.
The whole process was conducted on a laptop computer with an AMD® Ryzen 7 7730U processor (AMD, Santa Clara, CA, USA) [], using C++ nodes implemented in ROS Noetic. The data were acquired using two Livox HAP LiDARs mounted on the front of the wheel loader. The specifications of the LiDAR sensor are summarized in Table 1 []. In addition, the maximum number of NDT update iterations was set to 20.
Table 1. Specifications of the Livox HAP LiDAR used in this evaluation [].

4.2. Evaluation Method

The evaluation is conducted from three aspects: (1) correctness of the forward-backward orientation, (2) error in the estimated yaw angle $\phi_z$, and (3) errors in the estimated translational parameters $t_x$ and $t_y$. Each aspect is described below.

4.2.1. Forward-Backward Orientation

As described in Section 3.3, the proposed method performs pose estimation for both forward and backward orientations of the dump truck and selects the one with the higher NDT score. The orientation is considered correct if the selected result matches the actual orientation of the observed dump truck.

4.2.2. Error in Yaw Angle $\phi_z$

To evaluate the rotational accuracy, two points are manually selected in advance, and the line connecting these two points is used as a landmark line. Specifically, the landmark lines are selected on the side surface of the vessel for both the reference and observed point clouds. After applying the estimated transformation, the yaw angle error is calculated as the difference between the slopes of the reference line and the transformed line.
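This yaw-error metric can be computed as follows. Since the paper does not state how angle wrap-around is handled, we wrap the difference to a half-open interval of width $\pi$ (a line has no direction), which is an assumption of ours:

```python
import numpy as np

def yaw_error(ref_line, est_line):
    """Yaw-angle error as the difference between the slopes (angles) of
    the landmark line in the reference cloud and the same line in the
    transformed observed cloud. Each line is given by its two manually
    selected endpoints."""
    def angle(line):
        (x1, y1), (x2, y2) = line
        return np.arctan2(y2 - y1, x2 - x1)
    err = angle(est_line) - angle(ref_line)
    # Wrap to [-pi/2, pi/2): the line is undirected, so errors that
    # differ by pi are equivalent.
    return (err + np.pi / 2) % np.pi - np.pi / 2
```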

4.2.3. Error in Translational Parameters $t_x$ and $t_y$

To evaluate translational accuracy, a feature point is manually selected as a landmark in advance. In this study, the left mirror of the dump truck is selected as the landmark point. To cancel out the rotational error, the observed point cloud is first rotated by the error in $\phi_z$ after pose estimation. Then, the landmark position in the transformed observed point cloud is compared with its position in the reference point cloud. The errors in $t_x$ and $t_y$ are computed as the differences between these two positions.

4.3. Results of Single Setting

First, we summarize the results of case 1-A, which is shown in Figure 9, for a detailed analysis of one setting. Figure 10 shows the observed point cloud after transformation by the NDT-based pose estimation. Specifically, Figure 10a,b show the template and the transformed observed point cloud in the top and side views, respectively. In the figures, a set of red ellipsoids represents the template, and green points represent the observed point cloud. In this case, the forward-backward orientation is correctly estimated, and the errors in $t_x$, $t_y$, and $\phi_z$ are 0.01 m, 0.10 m, and −0.012 rad, respectively.
Figure 10. Template and observed point cloud after transformation in case 1-A. (a) Top view. (b) Side view.

4.4. Results of All Settings

Next, we analyze the results of all settings for comprehensive performance. Table 2 summarizes the pose estimation results for the 12 settings. In the table, the "Orientation" column indicates whether the forward-backward orientation was correctly estimated. Here, "✓" denotes a correct estimation, while "×" denotes an incorrect estimation. The "Score" column indicates the maximum NDT score. In addition, the "Error $t_x$", "Error $t_y$", and "Error $\phi_z$" columns indicate the errors in $t_x$, $t_y$, and $\phi_z$, respectively. These estimation errors are evaluated for all rows except cases 2-D and 3-C, where the orientation was incorrectly estimated.
Table 2. Evaluation score and error results for 12 settings.
Figure 11 shows box plots of absolute errors for the ten cases where the orientation was correctly estimated. In these cases, the maximum absolute errors were 0.10 m in $t_x$, 0.16 m in $t_y$, and 0.019 rad (approximately 1.089°) in $\phi_z$. These results indicate that the proposed method can estimate the pose with reasonable accuracy when the correct orientation is estimated.
Figure 11. Box plot of absolute errors in (a) $t_x$, (b) $t_y$, and (c) $\phi_z$. Blue area indicates interquartile range.
As an example of incorrect orientation estimation, Figure 12 shows the result for case 3-C. In this case, the method failed to estimate the correct forward-backward orientation of the dump truck. Figure 13 illustrates a side view of the transformed point cloud overlaid with the reference point cloud. It shows that the top part of the dump truck was outside the LiDAR's observable area due to the positional relationship between the sensor and the truck. This limited observation likely contributed to the incorrect pose estimation; such failures are primarily due to limitations in sensor coverage rather than the method itself.
Figure 12. Template and transformed point cloud in case 3-C. (a) Top view. (b) Side view.
Figure 13. Observed point cloud and reference point cloud in case 3-C. Green points represent observed point cloud, and red points represent reference point cloud.
In summary, although the results show certain limitations in cases where the LiDAR observation is insufficient, the proposed method can estimate the pose with reasonable accuracy even under variations in the truck’s position and orientation within the parking area.

5. Evaluation of Size Classification

5.1. Size Categories of Dump Trucks

In this section, we evaluate the size classification performance using multiple templates representing different dump truck size categories. Table 3 summarizes the four dump trucks used as observed point clouds, which are categorized into small, medium, and large. For each size category, one template was constructed from a representative truck: the Small A truck for the small template, the Medium truck for the medium template, and the Large truck for the large template. Figure 14 shows the reference point clouds used to construct these templates. To evaluate classification performance, we examined all combinations of the four observed point clouds and the three templates. This setting allows us to evaluate the method's ability to distinguish between size categories, as well as its robustness to local shape variations within the same category.
Table 3. Size categories of dump truck and their length and vessel height.
Figure 14. The reference point clouds of small, medium, and large categories. Red points represent reference point clouds.
As described in Section 3.5, the parameters for placing negative point clouds are determined according to the size specifications of dump trucks. In this evaluation, the parameters were set for classification based on both truck length and vessel height. Specifically, $x_{\mathrm{gap}}$ and $x_{\mathrm{len}}$ were both set to 0.3 m for classification by truck length, and $z_{\mathrm{gap}}$ and $z_{\mathrm{len}}$ were set to 0.4 m and 0.5 m, respectively, for classification by vessel height. In addition, the interval between points $d_{\mathrm{neg}}$ was set to 0.1 m.

5.2. Results

Table 4 shows the results of the size classification before incorporating negative point clouds. The table displays the score computed for each combination of template and observed point clouds. The highest score for each observed point cloud is highlighted in bold and marked with an asterisk. The rightmost column indicates whether the classification was correct. These results show that when a small dump truck was observed, it was frequently misclassified as a larger size. This misclassification occurred because the computed score was incorrectly high when the size category of the template was larger than the observed point cloud. Without negative point clouds, the computed score lacked sufficient penalization for incorrectly overlapped regions between the observed point cloud and the template.
Table 4. Confusion matrix of evaluation score before incorporating negative point cloud.
In contrast, Table 5 shows the results after incorporating negative point clouds. The results demonstrate that appropriate size classification was achieved, even for observed dump trucks belonging to the small category. Figure 15 illustrates the spatial relationship between the template and the observed point cloud after incorporating a negative point cloud, for the combination of the large-size template and the point cloud of the Small A truck. In the figure, the set of red ellipsoids indicates the template, the green points indicate the observed point cloud after transformation, and the blue points indicate the negative point cloud. The score decreases where the template overlaps with the negative point cloud; as a result, the score was reduced from 1.02 to 0.52. On the other hand, Figure 16 shows the combination of the medium-size template and the observed point cloud of the Medium truck. In this case, the score decreased only slightly, from 1.34 to 1.33, because the template and the negative point cloud are far apart, as shown in the figure.
Table 5. Confusion matrix of evaluation score after incorporating negative point cloud.
Figure 15. Template, actual point cloud, and negative point cloud in case where dump truck is Small A while the template is Large. (a) Top view. (b) Side view.
Figure 16. Template, actual point cloud, and negative point cloud in case where both the dump truck and template are Medium. (a) Top view. (b) Side view.
These results demonstrate that incorporating negative point clouds into the conventional NDT score enables the system to distinguish global differences between size categories. Consequently, the proposed method can achieve size classification even with the presence of local shape variations within the same category.
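The idea of penalizing the conventional NDT score with negative points can be sketched as follows. This is a simplified stand-in, not the paper's implementation: every point is scored against its nearest Gaussian component instead of a voxel lookup, and the normalization and penalty weighting are assumptions made for this example.

```python
import numpy as np

def ndt_score(points, means, covs):
    """Sum of Gaussian likelihoods of points under an NDT template.

    `means`/`covs` are the per-voxel normal distributions of the
    template. Simplified: each point is evaluated against its nearest
    component rather than via the usual voxel lookup.
    """
    inv_covs = [np.linalg.inv(c) for c in covs]
    score = 0.0
    for p in points:
        d2 = [(p - m) @ ic @ (p - m) for m, ic in zip(means, inv_covs)]
        score += np.exp(-0.5 * min(d2))
    return score

def classification_score(obs, neg, means, covs):
    """NDT fitness of the observed points, minus a penalty contributed
    by negative points that fall inside the template's distributions.
    Both terms are normalized by the observed point count here, which
    is an assumption for this sketch."""
    return (ndt_score(obs, means, covs)
            - ndt_score(neg, means, covs)) / len(obs)
```

When the template is too large for the observed truck, negative points land inside its Gaussians and the penalty term drives the score down, mirroring the 1.02 to 0.52 drop reported above; when template and negative points are far apart, the penalty is negligible.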
Finally, we discuss the computational time of the proposed method. As described in Section 3, when an observed point cloud of a dump truck is input, S pipelines are executed in parallel, one per size category; in this study, S = 3. Table 6 shows the computation time of each process when the observed point cloud corresponds to the Small A dump truck: Preprocessing (Section 3.3), NDT-based pose estimation (Section 3.4), Size classification (Section 3.5), and Total. Even the slowest pipeline, which determines the overall latency, required 0.132 s in total. This result demonstrates that the proposed method is fast enough for practical operation.
Table 6. Computation time of proposed method when Small A dump truck is observed.
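The per-category pipelines are independent, so running them concurrently and keeping the best-scoring category is straightforward. The sketch below shows one way to structure this; the function names are illustrative, and the caller supplies a `pipeline(observed, template)` callable wrapping preprocessing (Section 3.3), NDT pose estimation (Section 3.4), and scoring (Section 3.5).

```python
from concurrent.futures import ThreadPoolExecutor

def classify(observed, templates, pipeline):
    """Run one pipeline per size-category template in parallel and
    return the category whose pipeline produced the highest score.

    `templates` maps a category name to its template; `pipeline`
    must return a (pose, score) pair for one template.
    """
    with ThreadPoolExecutor(max_workers=len(templates)) as pool:
        results = list(pool.map(
            lambda item: (item[0], pipeline(observed, item[1])),
            templates.items()))
    # the winning pipeline yields both the size category and the pose
    name, (pose, score) = max(results, key=lambda r: r[1][1])
    return name, pose, score
```

Because the slowest pipeline bounds the latency, this structure matches the 0.132 s bottleneck figure reported for the slowest of the three pipelines.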

6. Discussion and Limitations

This study proposed an NDT-based two-stage framework that addresses pose estimation and size classification of dump trucks with unknown specifications. The method utilizes the probabilistic representation of point clouds to handle local shape variations and extends NDT to parallel comparisons across multiple templates for size classification. The experimental results demonstrated that the method could estimate truck poses with sufficient accuracy while correctly classifying trucks into three predefined size categories. The computational time is approximately 0.13 s per estimation, which is suitable for practical operation under limited computational resources. These results support the feasibility of applying the proposed framework to construction machinery in practice.
In existing studies, two main approaches have been explored for pose estimation of dump trucks using point cloud registration: ICP-based methods and deep-learning-based methods. The former generally assume that the observed dump truck and the reference point cloud correspond to the same vehicle. In contrast, this study verified that pose estimation can be achieved even when the observed dump truck and the template are from different vehicles. Deep-learning-based approaches, although they can potentially achieve high accuracy and robustness, require large amounts of training data, which are costly to collect, and are difficult to deploy on construction machinery with limited computational resources. In contrast, the proposed method only requires representative reference point clouds as templates, which is more practical for construction machinery. Furthermore, by extending NDT to parallel comparisons, the proposed method also achieves size classification of dump trucks. To the best of our knowledge, this function has not been addressed in existing studies.
Despite these contributions, several limitations remain. First, the experiments covered only three size categories, which does not fully reflect the diversity of dump truck sizes in practice. Although the number of size categories is finite and small, it exceeds the three considered in the experiments, so further validation with additional categories is necessary. Second, the robustness of the method under adverse environmental conditions, such as rain, fog, or dust, was not evaluated; further experiments under real or simulated adverse environments are needed. Third, the proposed method depends on the coverage of the LiDAR sensors, and pose errors may occur if parts of the truck lie outside this coverage. Although the issue could be mitigated by careful selection of the LiDAR mounting position, in actual applications the sensor placement is determined not only for dump truck recognition but also in consideration of other functions, such as recognition of piled soil or autonomous locomotion. For example, mounting the LiDAR facing upward so that the top of the dump truck can be observed would reduce the problem addressed in this study, but it would also enlarge the blind area on the ground and make surrounding objects harder to detect. A comprehensive design discussion on sensor placement is therefore required. Finally, the parameters related to negative point clouds were tuned for specific scenarios, and their adaptability to other cases was not validated. For future work, we intend to develop an automatic parameter search approach to improve adaptability across datasets.

7. Conclusions

In this study, we proposed a two-stage method for pose estimation and size classification of dump trucks with unknown specifications. The proposed method performs pose estimation and size classification by NDT in parallel with multiple templates that represent different size categories. For appropriate size classification, we incorporate negative point clouds into the conventional NDT score. The performance of the proposed method was evaluated using data acquired in a real-world environment. The results demonstrated that the proposed method could estimate the pose of dump trucks robustly and classify their size by incorporating negative point clouds. Based on these results, the proposed method will contribute not only to wheel loader operation but also to the overall automation of soil loading onto dump trucks at construction sites.
However, as discussed in Section 6, several limitations remain: the limited number of size categories, the lack of evaluation under adverse environments, the dependence on LiDAR placement, and the need for more adaptable parameter settings. Addressing these issues in future work will enhance the robustness and applicability of the proposed method in real construction environments.

Author Contributions

Conceptualization, K.I., K.W., A.S. and T.I.; Methodology, K.I., K.W. and T.I.; Software, K.I. and K.W.; Validation, K.I. and K.W.; Formal analysis, K.I., K.W. and T.I.; Investigation, K.I., K.W., H.O. and T.M.; Writing—original draft, K.I. and K.W.; Writing—review & editing, H.O., T.M., A.S. and T.I.; Visualization, K.I.; Supervision, A.S. and T.I.; Project administration, A.S. and T.I. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article.

Acknowledgments

This work was supported by KOMATSU Ltd.

Conflicts of Interest

K.I., K.W., H.O., T.M. and T.I. are inventors of pending patent on this work. H.O., T.M. and A.S. are employees of Komatsu Ltd. T.I. is an employee of the University of Tokyo. The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ICP: Iterative Closest Point
NDT: Normal Distribution Transform
