3D Pose Recognition System of Dump Truck for Autonomous Excavator

The purpose of an excavator is to dig up materials and load them onto heavy-duty dump trucks. Typically, an excavator is positioned at the rear of the dump truck when loading. In order to automate this process, this paper proposes a system that employs a combined stereo camera and two LiDAR sensors to determine the three-dimensional (3D) position of the truck's cargo box and to analyze its loading space. Sparse depth information acquired from the two LiDAR sensors is used to detect points on the door of the cargo box and establish the plane of its rear side. Dense depth information of the cargo box acquired from the stereo camera is projected onto this plane to estimate the box's initial 3D position. In the next step, a point cloud sampled along the long axis of the cargo box's edges is used as the input of the Iterative Closest Point algorithm to calculate a more accurate cargo box position. The data collected from the stereo camera are then used to determine the 3D position of the cargo box and to provide the excavator with an estimate of the load volume along with the 3D position of the loading space. In order to demonstrate the efficiency of the proposed method, a mock-up of a heavy-duty truck cargo box was created, and the volume of the load in the cargo box was analyzed.


Introduction
There is a shortage of new construction workers in the industry due to the trend of laborers wanting to avoid extreme environments [1]. In particular, the retirement of older workers has created a shortage of skilled workers. Therefore, research is being conducted on autonomous trucks and construction robots that can replace humans at construction sites. Excavators are considered low risk because they do not travel at high speeds, and they perform repetitive tasks. Hence, their automation is highly feasible. For this reason, major construction equipment manufacturers are incorporating various types of automation [2,3]. Diverse research has been carried out in line with this trend. For example, several publications examined the automated determination of excavation spaces and related control techniques [4-11]. An excavator digs up and transports loads into the cargo box of heavy-duty dump trucks. Excavators feature long joints and are often positioned at the rear of a truck to load materials. After the cargo box has been loaded sufficiently, the process is repeated for additional trucks.
To automate this process, it is vital that the excavator acquires accurate information about the cargo box's three-dimensional (3D) position. Stentz introduced a method for recognizing a truck's cargo box after segmenting the truck from dense LiDAR data [12]. The truck's cargo box is scanned using a dense LiDAR sensor; after the cargo box is segmented from the dense LiDAR data, the 3D position of the dump truck is determined from the upper plane of the cargo box. However, suppose the excavator is positioned higher than the …
The contributions of this study are as follows:
1. A novel dump truck cargo box sensing device;
2. A method of automatically estimating the 3D position of the loading space;
3. A method of automatically estimating the volume of the load.
The sensing device combines two two-dimensional (2D) LiDAR sensors and a stereo camera. The scenario includes an excavator operating in an outdoor environment, where relying on a stereo camera alone would be unreliable; hence, two LiDAR sensors complement it (Section 2). The distance values acquired by each sensor are projected onto one coordinate system using a calibration board and a line laser for accuracy (Section 3). As the cargo box door resembles a plane, the 3D position of the cargo box is estimated by determining the plane of the rear of the cargo box from the data received from the two LiDAR sensors installed vertically on the sensing device (Section 4.1). Furthermore, the 3D position of the box is determined by projecting the dense 3D location of the door acquired from the camera onto the plane and matching it with the actual 3D model of the cargo box. The 3D position of the box can be obtained by utilizing one virtual matching point and four matching points of the rear plane (Section 4.2).
If the 3D position of the cargo box is estimated solely using the plane, errors may occur. Therefore, points along the long axis of the box are used for a more accurate determination. The distance data for the axis obtained from the stereo camera, together with points sampled along the long axis of the actual 3D model, are fed to an iterative closest point (ICP) algorithm (Section 4.3). Lastly, after the 3D position of the cargo box is estimated, the volume of the load is calculated. Moreover, the 3D position information for the loading space is transmitted to the excavator.
The general outline of this paper is as follows. First, Section 2 describes the sensing device developed for this study. In Section 3, the method of calibrating the sensors of the sensing device is discussed. Section 4 outlines the method for estimating the truck cargo box position using information acquired from the sensing device. Section 5 examines the accuracy of the proposed method for estimating load volume and determining the loading space. The study concludes with Section 6.

Sensing Device
Figure 1 illustrates the sensing device used to detect the dump truck cargo box. Its dimensions were 20 × 10 × 30 cm in width, length, and height, respectively. The device consisted of a stereo camera (ZED-1) and two LiDAR (Hokuyo UST-10LX1-CH) sensors and was placed on top of the excavator's driver seat to estimate the plane of the cargo box's rear. The stereo camera provides one color image and one 3D depth map. The data acquired by the stereo camera were used to recognize the 3D position of the dump truck's cargo box, determine the available loading space, and calculate the volume of the load. Because most excavators operate outdoors, a commercial outdoor camera was selected. It had a resolution of 1920 × 1080 pixels and provided red-green-blue video information as well as a depth map at a rate of 30 fps. The depth map had sufficient sensitivity between 0.5 and 20 m.
An excavator's working environment comprises dust and solar reflections. Consequently, the information acquired from a camera in such an environment contains noise; hence, errors occur in the distance estimates of the stereo algorithm. Therefore, for the purposes of this research, it was not possible to rely solely on the information captured by the camera. The two LiDAR sensors were attached perpendicular to the device. LiDAR sensors are widely used in automated driving technologies, drones, and robots. One disadvantage of LiDAR is its price, which increases with resolution. Although LiDAR acquires sparse depth information, it has the advantage of consistent accuracy. When combined with camera data, superior dense depth information is acquired. The LiDAR sensor has a 270° field of view and can measure distances between 0.02 and 10 m. It provides users with a relatively high level of depth accuracy at ±40 mm (Table 1).

Sensor Calibration
During the scanning process, the depth information from each sensor can be retrieved in real-time. However, the 3D depth information depends on the coordinate system of the sensor. Therefore, it is essential to define a method for representing depth information in a single coordinate system via sensor calibration. For ease of calibration, the line lasers and a flat calibration object were used, as shown in Figure 2.

Generally, low-cost, mass-produced LiDAR sensors have a bandwidth of 905 nm. The scanning area with the infrared (IR) filter removed captures the laser lines [16]. Hence, when the line laser and LiDAR scan areas are matched, the area to be scanned by LiDAR can also be determined from the stereo camera image, as shown in Figure 3.
With T defined as the Euclidean transformation relation between two coordinate systems, the transformation matrix, T, is represented by a single (3 × 3) rotation transformation matrix, R, and a (3 × 1) translation transformation vector, t. When the 3D matching points obtained from the stereo camera and the LiDAR sensor are expressed as Q_i and P_i, respectively, and three matching points theoretically exist, the transformation matrix, T_(L→C), between the two coordinate systems can be obtained as follows:

Q_i = T_(L→C) · P_i  (1)

If there are more than three matching points, the transformation matrix between the two point sets can be obtained using the singular value decomposition (SVD) or the Levenberg-Marquardt (LM) algorithm to minimize the error [17], as shown in Equation (2):

T_(L→C) = argmin_T Σ_i ‖ Q_i − T · P_i ‖²  (2)

In order to facilitate calibration, the LiDAR sensors were positioned so that they sensed the calibration board, and the center point of the line was detected by each LiDAR sensor and camera as a match point, as shown in Figure 4. The distance information of the calibration board scanned by the LiDAR sensor was segmented as a distance value for detection, and the stereo camera used the images to detect the calibration board. Based on the random sample consensus algorithm (RANSAC), the detected lines were fitted more precisely, and the center point was determined [18].
Appl. Sci. 2022, 12, x FOR PEER REVIEW
After placing the calibration board at various positions and obtaining the transformation matrices using Equation (2), an accurate projection of the data obtained from the LiDAR sensor onto the position of the line laser was available, as shown in Figure 5. The accuracy of the calibration increases with the number of match points, but this study confirmed that six were sufficient. Figure 6 shows the results when inter-device calibration was performed and when it was not.
With the calibrated device in the outdoor scan experiment, it is possible to obtain three-dimensional depth information expressed in one coordinate system. In this study, the reference coordinate system of the device was defined as the stereo camera's coordinate system.
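The minimization in Equation (2) has a closed-form SVD solution (Kabsch/Umeyama). The sketch below illustrates it with synthetic match points; the function name and the demo values are illustrative, not measured data from the paper.

```python
import numpy as np

def rigid_transform_svd(P, Q):
    """Closed-form solution of Equation (2): find R, t minimizing
    sum_i ||Q_i - (R P_i + t)||^2 for matched (N, 3) point sets
    P (LiDAR frame) and Q (stereo-camera frame)."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)      # centroids
    H = (P - cP).T @ (Q - cQ)                    # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t

# Illustrative check with six match points (six were found sufficient
# in this study); the values below are synthetic.
rng = np.random.default_rng(0)
P = rng.random((6, 3))
R_true = np.array([[0.0, -1.0, 0.0],
                   [1.0,  0.0, 0.0],
                   [0.0,  0.0, 1.0]])
t_true = np.array([0.1, 0.2, 0.3])
Q = P @ R_true.T + t_true
R, t = rigid_transform_svd(P, Q)
assert np.allclose(R, R_true) and np.allclose(t, t_true)
```

With noisy match points the same function returns the least-squares transform rather than an exact one, which is why additional board positions improve the calibration.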

Location Estimation of Truck Cargo Box
In order to load the excavator's content into the truck's cargo box, it is first necessary to determine the box's 3D position. Heavy-duty dump truck cargo box doors are flat and generally mounted on the rear of the box, and the box is characterized by long corner edges running along its longitudinal axis. In this research, the accuracy of the location of the plane was prioritized, assuming that the device senses the cargo box door of the truck and that the rear of the cargo box is flat.
In Figure 7, a flowchart of the method of estimating the 3D position of the truck cargo box is presented. After the rear of the cargo box is scanned by the device, dense and sparse 3D point clouds are generated by the stereo camera and the two vertically mounted LiDAR sensors, respectively. Based on the feature points associated with the left and right images, the stereo algorithm generates the 3D depth value. Thus, the more distinct the feature points detected by the left and right cameras, the more accurate the 3D depth [19]. Meanwhile, the depth value of a single-color object with few features will have a larger error because of inaccurate matching. It is possible to scatter random patterns to overcome this problem, but they are ineffective in the presence of strong light, such as sunlight [20,21].

Commercial dump trucks usually have a single-color cargo box, making it difficult for the left and right cameras to identify similar features. Consequently, the distance information on the rear side of the dump truck cargo box will be inaccurate. To overcome this issue, two LiDAR sensors that provide constant depth information were used to identify points on the cargo box's door. The normal vector of each 2D point obtained from the LiDAR sensors is first calculated using the surrounding points. The points on the door are determined using the normal vector and the distance information from the two LiDAR sensors, and noise points are removed using a line-fitting algorithm.
They are then formulated into a plane corresponding to the rear of the cargo box.
A stereo camera provides a dense 3D point cloud. Points close to the plane are points on the door. These points are projected onto a plane and represented as a 2D ROI. Conversely, points that do not exist on the plane are considered noise. These 3D points exist on the ground and are removed using ground plane fitting. As a result, a 3D point cloud of the truck cargo box is obtained.
In the next step, we create one 3D hexahedron model with the same dimensions as the cargo box. The initial 3D position of the cargo box is determined by matching the planes of the two models. After estimating the initial 3D position of the box, the estimation error is corrected by using the long axis of the cargo box. A detailed description of each step can be found in the following subsections.

Cargo Box Rear Detection Using a LiDAR Sensor
Due to the vertical placement of the two LiDAR sensors, data from the cargo box door and the ground are merged, and the point cloud for the cargo door must be separated. In theory, a 3D point cloud on a plane can be described by an identical normal vector. Furthermore, when the sensing device is constructed so that both identical LiDAR sensors are located on the same plane, the index, i, of the point clouds obtained from the two sensors may be considered identical. Therefore, a simple method was used to detect the point cloud on the cargo box door using the two LiDAR sensors. Figure 8 illustrates the process of obtaining the point cloud on the cargo box door. The 3D point clouds acquired from the left and right LiDAR sensors are denoted as P^L_i and P^R_i, and their normal vectors as n^L_i and n^R_i, respectively. The normal vector n^L_i is determined by the cross product of two direction vectors: A, from P^L_i to P^R_i, and B, from P^L_i to P^L_(i+k) (k is 3-5). The normal vector n^R_i of the right-side LiDAR data, P^R_i, is obtained using the same method. When the normal vectors of all point clouds are determined, their dot product with the z-axis (0, 0, 1) of the stereo camera coordinate system is obtained.
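The door-point test described above can be sketched as follows. This is a minimal NumPy illustration; the function name, array shapes, and the sign handling are assumptions (the paper keeps dot products near −1, but the sign of the cross product depends on scan ordering, so the absolute value is tested here).

```python
import numpy as np

def door_indices(PL, PR, k=4, thresh=0.9):
    """Return indices of door-candidate points from the two vertical LiDARs.

    PL, PR: (N, 3) index-aligned point clouds (left/right LiDAR) in the
    camera frame. For index i, A = PR[i] - PL[i] spans the two scan lines
    and B = PL[i+k] - PL[i] runs along the left scan line (k is 3-5); their
    cross product approximates the surface normal. Door points have a unit
    normal (anti)parallel to the camera z-axis (0, 0, 1)."""
    n = len(PL) - k
    A = PR[:n] - PL[:n]
    B = PL[k:k + n] - PL[:n]
    normals = np.cross(A, B)
    # Assumes non-degenerate neighboring points (no zero-length normals).
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    dots = normals @ np.array([0.0, 0.0, 1.0])
    return np.flatnonzero(np.abs(dots) > thresh)
```

For points on a plane facing the camera, every normal is (anti)parallel to the camera z-axis, so all indices pass; ground points, whose normals are roughly perpendicular to z, are rejected.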
Figure 8. A method to detect the door of a dump truck cargo box using two LiDAR sensors. Points on the plane have the same normal vector, and their dot product with the camera z-axis is close to −1. The normal vector is computed from the two LiDAR point clouds.
As the sensor mounted on the excavator is placed on the rear side of the cargo box, a point whose inner product value is closer to −1 is considered part of the point cloud on the cargo box door. The point cloud is filtered using the magnitude of the direction vector, A. Points with significant differences in distance cannot be considered part of the point cloud and are therefore removed as noise. Because the points are discriminated using only the normal vector and the distance between the point clouds, there may be outliers that are not removed by this filtering.
Lastly, outliers are eliminated by line modeling of the sorted 3D point cloud using the random sample consensus algorithm, and the final 3D point cloud existing on the cargo box door is sorted. Figure 9 shows the LiDAR point cloud with outliers removed together with the three-dimensional point cloud of the stereo camera. Point clouds that are not line-fitted are considered noise.
This point cloud is used to define the plane of the rear of the cargo box, π_L, as in Equation (3), and the plane equation is obtained using the least-squares method shown in Equation (4). The matrix form is a linear equation, AX = 0. Therefore, the general solution, X, is computed using SVD, as in Equation (5). Finally, the back-plate plane is computed with Equation (6).
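The homogeneous least-squares plane fit described above (AX = 0, solved with SVD) can be sketched as follows; the function name is illustrative.

```python
import numpy as np

def fit_plane(points):
    """Fit the plane a*x + b*y + c*z + d = 0 to (N, 3) door points.

    Stacks rows [x_i, y_i, z_i, 1] into the homogeneous system A X = 0
    and takes X as the right singular vector for the smallest singular
    value (the SVD solution described for Equations (3)-(6)). Returns
    (a, b, c, d) scaled so the normal (a, b, c) has unit length."""
    A = np.hstack([points, np.ones((len(points), 1))])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                      # right singular vector of smallest value
    return X / np.linalg.norm(X[:3])
```

For noisy data this minimizes the algebraic residual of A X; rescaling by the normal's length makes the residual of each point equal to its geometric point-plane distance.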

Recognition of Initial 3D Position Using a Stereo Camera
Although the box is plane-shaped, the 3D distance provided by the stereo camera does not provide an accurate representation of the shape if the box is single-colored. Consequently, the plane obtained by the LiDAR sensors on the rear of the cargo box is used to correct the acquired depth values. In the dense depth map obtained from the stereo camera, the 3D point cloud is expressed as Q_i (x_i, y_i, z_i). Hence, the points closer to the cargo box plane must be sorted in advance, as they are likely to be situated on the cargo box door. It is possible to calculate the distance between point Q_i and plane π_L using Equation (7). Generally, only points with a difference in distance within 30 cm are considered cargo box door points; all other points are regarded as noise and are eliminated. The sorted points can also be projected onto plane π_L using Equation (8). Figure 10 shows a projection of the dense 3D point cloud acquired by the stereo camera onto the cargo box plane obtained by the LiDAR sensors. The distance errors are corrected as the measured values are projected onto the plane. The projected 3D points can be used to estimate the initial 3D position of the cargo box. A 3D hexahedron model based on the actual dimensions of the box is generated, as shown in Figure 11. Next, four points near the vertices in the projected point cloud are detected to match the hexahedron model to the actual cargo box plane, as shown in Figure 12.
The points corresponding to the hexahedron model are determined in the same order. Figure 13 illustrates the method of matching the 3D hexahedron model and the truck cargo box.
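The distance test and projection of Equations (7) and (8) can be sketched as follows. This is a minimal NumPy illustration; the function name is an assumption, while the 30 cm threshold follows the text.

```python
import numpy as np

def split_and_project(Q, plane, max_dist=0.3):
    """Keep stereo points within 30 cm of the LiDAR plane and project them.

    Q: (N, 3) dense stereo points; plane: (a, b, c, d) with unit normal
    from the LiDAR fit. The signed distance (Equation (7)) of a point is
    a*x + b*y + c*z + d; the projection (Equation (8)) moves each kept
    point along the normal by its signed distance."""
    n, d = np.asarray(plane[:3]), plane[3]
    dist = Q @ n + d                   # signed point-plane distance
    keep = np.abs(dist) <= max_dist    # door candidates; the rest is noise
    projected = Q[keep] - dist[keep, None] * n
    return projected, keep
```

The projected points then all lie exactly on π_L, which removes the stereo depth error in the plane-normal direction before the initial pose is estimated.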

First, the four vertices on the 3D hexahedron model and the four corresponding vertices on the cargo box plane are set as points of convergence, together with a fifth, virtual point of convergence used to eliminate the symmetry ambiguity: the cargo box has a rectangular parallelepiped form, so the calculation is otherwise subject to errors. The virtual point of convergence is determined from the cross product of each point cloud's unit edge vectors, A/|A| and B/|B|, which become direction vectors of identical magnitude.
When the five points of convergence are defined, the 3D transformation matrix, T_(M→C), is obtained with its error minimized, as in the following equation. Figure 14 shows the result of matching the cargo box to the hexahedron model using the transformation matrix. The figure shows errors, but a match was made based on the cargo box plane.
Figure 14. Example of initial 3D position and error of the truck cargo box.
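The construction of the virtual fifth point of convergence can be sketched as follows; a counter-clockwise corner ordering and the function name are illustrative assumptions.

```python
import numpy as np

def add_virtual_point(corners):
    """Append the virtual fifth point of convergence to four plane corners.

    corners: (4, 3) ordered vertices of the rear plane. The unit edge
    vectors A and B from the first corner give, via their cross product,
    an offset of identical (unit) magnitude normal to the plane; this
    out-of-plane fifth point removes the reflection ambiguity of the
    symmetric rectangle when the five correspondences are fed to the
    rigid least-squares fit."""
    A = corners[1] - corners[0]
    B = corners[3] - corners[0]
    A = A / np.linalg.norm(A)
    B = B / np.linalg.norm(B)
    return np.vstack([corners, corners[0] + np.cross(A, B)])
```

Applying the same construction to both the model and the measured plane yields five consistent pairs, so a reflected (flipped) match no longer minimizes the error.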

Pose Estimation Refinement

Estimating the truck cargo box position with only five points of convergence results in the errors presented in Figure 14. This is caused by differences between the distance information estimated by the stereo camera and the generated 3D hexahedron model. The initial matching results show that more errors occurred at the longitudinal corner than at the rear of the cargo bed. Therefore, to minimize the error, points of convergence are sampled at regular intervals from the two corners along the longitudinal axis of the 3D hexahedron model, as shown in Figure 15. An iterative algorithm is used so that these points of convergence and the longitudinal axis of the cargo box obtained by the camera will match.
Figure 15. Example of 5 3D points sampled for each corner along the long axis of a cargo box.
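The regular-interval sampling along each longitudinal corner can be sketched as linear interpolation between the corner's two endpoints. The endpoint coordinates below are hypothetical, chosen only to illustrate the shape of the data.

```python
import numpy as np

def sample_corner_points(start, end, n=5):
    """Sample n 3D points at regular intervals along a hexahedron
    corner (edge) running from `start` to `end`."""
    s, e = np.asarray(start, float), np.asarray(end, float)
    ts = np.linspace(0.0, 1.0, n)[:, None]   # interpolation factors 0..1
    return s + ts * (e - s)                  # (n, 3) sampled points

# Hypothetical top-edge endpoints of a cargo-box model (metres):
left = sample_corner_points([0, 0.0, 1.5], [6, 0.0, 1.5])
right = sample_corner_points([0, 2.3, 1.5], [6, 2.3, 1.5])
```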

A 3D transformation matrix, T M→C , is used to project the sampled 3D point, M i , of the corner onto the camera's 2D image using Equation (9):

m i = k T M→C M i (9)

Here, k is the internal parameter matrix of the stereo camera, and m i is the point projected onto the 2D camera image, as shown in Figure 16. If the initial position transformation matrix, T M→C , is accurate, the projected points are identical to the box corner; otherwise, they do not match, as shown in Figure 16. Thus, the longitudinal axis of the cargo box is used to solve this issue. Figure 17 presents the method for detecting points of convergence. When they are projected onto the 2D image, the left corner of the cargo box is searched from the right side of the x-axis, and the right corner is searched from the left side. When a 3D value exists at the searched point, it is designated as a point of convergence. The initial transformation matrix, T M→C , is refined by the ICP algorithm, which iteratively minimizes the errors of the points added to the corner and the plane [22,23]. The pseudocode of the iterative refinement is shown below:
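A hedged Python sketch of this projection-and-refinement loop follows. The helper names (fit_rigid, project_points, refine_pose) and the correspondence-search callback are illustrative assumptions, not the authors' implementation, and the intrinsic matrix k is treated as a plain 3×3 array.

```python
import numpy as np

def fit_rigid(M, C):
    """Least-squares rigid fit (Kabsch/SVD) mapping points M onto C."""
    cm, cc = M.mean(axis=0), C.mean(axis=0)
    U, _, Vt = np.linalg.svd((M - cm).T @ (C - cc))
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # avoid a reflection solution
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, cc - R @ cm
    return T

def project_points(T_mc, k, M_pts):
    """Equation (9): project 3D model points into the 2D image."""
    M_h = np.hstack([M_pts, np.ones((len(M_pts), 1))])  # homogeneous coords
    cam = (T_mc @ M_h.T)[:3]                            # camera-frame points
    uv = k @ cam
    return (uv[:2] / uv[2]).T                           # (n, 2) pixel coords

def refine_pose(T_mc, k, corner_pts, find_correspondence, n_iter=10):
    """ICP-style refinement: project the sampled corner points, look up the
    matching 3D observations along the corner, and re-fit T_{M->C}."""
    for _ in range(n_iter):
        m = project_points(T_mc, k, corner_pts)  # 2D projections of corners
        C_obs = find_correspondence(m)           # matched 3D camera points
        T_mc = fit_rigid(corner_pts, C_obs)      # updated transform
        # (a residual-based convergence check would normally stop early)
    return T_mc
```

With exact correspondences the loop converges in one iteration; in practice the correspondence search along the corner is what the iterations refine.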

Results and Analysis
In order to validate the performance of the proposed system, a model of the cargo box of a common heavy-duty dump truck was developed, followed by outdoor experiments. The excavator sensor was installed on the upper section of the driver's seat of the excavator as shown in Figure 18, and the loading procedure was repeated 20 times as the sensor sensed the cargo box while the excavator was moving. The data, as shown in Figure 19, were obtained each time, and the distance values were derived from the stereo sensor.

Figure 20 illustrates the detectability of the truck cargo box door based on distance measurements from the LiDAR sensor. When the distance data obtained by the LiDAR sensor and the stereo camera were visualized, only those points that were estimated to represent the cargo box door were visible. Notably, in addition to the experiment model, the widely used dump truck cargo boxes of Volvo and Scania were sensed, and an experiment for detecting their cargo box doors was conducted [24,25]. Using the plane data obtained from the LiDAR sensor, the 3D point cloud obtained from the camera was projected, and the cargo box doors of the two regular trucks were also detected.

Figure 21 shows the experimental results of 3D location estimation using the experimental model. The initial results are indicated in blue, and the more precise results using the refined transformation matrix are shown in yellow. When the initial position was estimated using only the plane data, numerous differences were found along the longitudinal axis of the cargo box. The 3D position estimation using the refined transformation matrix showed that the longitudinal axes of the cargo box and the hexahedral model were similar, indicating that the 3D location estimation was successful. Estimation of cargo space volume can increase work efficiency [26,27]. The 3D location of the estimated cargo box enabled loading space data for the cargo box to be provided.
Figure 22 illustrates the visualized results of the normalized data based on box height. The load can be placed anywhere in the cargo box except the part shown in red.

In this study, the load was loaded five times into the cargo box interior using the excavator, as shown in Figure 23, and volume estimations were made using the image data obtained from the camera. The cargo box interior was detected using the image data when the 3D hexahedral model was projected onto the image, as shown in Figure 24.
As such, the outcome of the projected 3D points was expressed in the shape of a trapezoid (near-far effect), for which the distance per pixel was considered to estimate an accurate value. The distance per pixel, p d , is calculated using Equation (10). The internal volume of cargo box A obtained from the 2D image is then determined using Equations (11)-(13).

Here, P i is the height value from the model's floor. The volume of the load was measured using the data from five sets of experiments. The quantitative values were estimated for each loading operation using the difference in volume between frames. The volume of the excavator's bucket was 0.8 m³; hence, the volume of each load was estimated as the difference between successive accumulated-volume estimates over the five sets of experiments.

Table 2 summarizes the measurement results of load volume. Accuracy was determined by the proximity of the estimate to the ground truth, and the experiment results showed large errors. The sensing device mounted on the excavator was positioned diagonally rather than providing a top view. However, as the cargo box is filled, the load comes closer to the sensing device, and more accurate results would be possible if a more precise camera were used.
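A minimal sketch of this kind of per-pixel volume computation, assuming a per-row metres-per-pixel scale p_d and per-pixel heights P_i above the floor. The function names and array shapes are illustrative, not the paper's exact Equations (10)-(13).

```python
import numpy as np

def estimate_load_volume(height_map, metres_per_pixel):
    """Approximate load volume from a per-pixel height map (metres above the
    cargo-box floor) and a per-row metres-per-pixel scale p_d, which varies
    with distance because the projected box appears as a trapezoid."""
    h = np.asarray(height_map, float)
    p = np.asarray(metres_per_pixel, float)[:, None]  # one scale per image row
    return float(np.sum(h * p * p))                   # sum of P_i * p_d^2

def per_load_volumes(accumulated):
    """Per-load volume as the difference of successive accumulated estimates,
    matching the frame-difference procedure described above."""
    return np.diff(np.asarray(accumulated, float))
```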

Conclusions
This study applied vision technology that combines 3D LiDAR and stereo distance sensors to estimate a heavy-duty dump truck's cargo box location and load volume. The sensing device was installed on the upper section of the excavator's driver seat. First, the plane information for the door of the dump truck's cargo box was extracted, and the stereo distance data were combined to obtain the 3D plane data of the rear of the cargo box. Next, a 3D model of the cargo box was generated to match the actual measurements, followed by the determination of the initial 3D position of the box so that the two planes would match. After this initial adjustment, the 3D coordinates of the longitudinal-axis corners were detected iteratively for more precise position estimation. Lastly, the estimated location data were used to map the stereo distance values onto the 3D cargo box model so that heights above the model's floor surface could be measured. The cargo box interior was then visualized so that loading space data could be delivered to the excavator. Significant errors were detected during the load analysis owing to the distance estimation errors of the stereo camera; a camera with higher precision is therefore necessary to improve the outcome.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.