Dai et al. [4] proposed a method for detecting over-height cargo vehicles using computer vision. They collected video streams from roadside cameras and processed them to measure vehicle height. Although the study reports experiments on local roads and expressways, the method is applicable only under limited occlusion and restricted lighting conditions. Iqbal et al. [
5] proposed a computer vision method for vehicle height detection using a Gaussian mixture model and blob detection, and demonstrated the accuracy of their measurement method. However, the method does not account for vehicle occlusion. If occlusion occurs, blob detection may fail to extract the vehicle contour coordinates, which limits the adaptability and robustness of the algorithm. Yu et al. [
6] proposed a method for high-speed three-dimensional shape and deformation measurement using digital image correlation and a single high-speed color camera, and verified its effectiveness and accuracy. However, the measurement process involves multiple manual operations, which increases the risk of human error, reduces measurement efficiency, and raises the operational threshold, making the system difficult for non-professionals to use. Zhang et al. [
7] reviewed high-speed and high-precision 3D shape measurement technology based on structured light, covering mainstream methods such as statistical patterns, binary coding, sinusoidal phase coding, and binary defocus, and analyzed their performance. From the perspective of system practicality and scalability, the miniaturization progress of high-precision 3D sensing technology is limited. Commercial devices (such as iPhone X and Intel RealSense) cannot match advanced structured light methods in terms of resolution and accuracy. At the same time, they face the problem of efficient storage of massive 3D data, and the existing compression methods have not been popularized. Moreover, the 3D shape measurement technology has a low degree of automation and is far from the ease of use of 2D imaging. It lacks rapid optimization tools for non-professional users, and it is difficult to meet the personalized, low-cost, and high-efficiency demands of different field applications. Nguyen et al. [
8] compared the accuracy of two measurement methods based on fringe projection profilometry and 3D digital image correlation technology, both of which can achieve sub-micron accuracy. This experiment was only carried out under controlled conditions in the laboratory and did not involve the influence of dust, shading, and light in the actual environment, making it difficult to verify its stability in extreme conditions. Ngo et al. [
9] proposed a three-dimensional measurement system based on a measurement algorithm and perspective transformation. However, the experimental scenarios are limited: the study focuses only on specific hole structures and truck compartments and does not cover broader industrial measurement scenarios such as complex curved surfaces, highly reflective or transparent surfaces, or tiny parts, which limits its application scope. Environmental interference typical of industrial settings, such as changes in light intensity, vibration, and dust, was also not taken into account. Lu et al. [
10] proposed a vehicle height measurement method based on Mask R-CNN. A three-dimensional bounding box is established for the measured vehicle to achieve the measurement of the vehicle height. This scheme has a single experimental scenario, based only on one surveillance video, and has not been fully tested in an actual engineering environment. The height estimation relies on the known length of reference objects in the scenario, which limits the scenario adaptability of the method. Pu et al. [
11] proposed a method for measuring the size of objects using a conventional digital camera. If the scene contains no suitable reference objects, or the displacement is difficult to measure precisely, calibration and size estimation cannot be completed, which limits scene adaptability. The method also does not consider environmental factors: complex lighting conditions such as low light, backlight, and haze can degrade image quality and thereby reduce the boundary extraction accuracy of the active contour model. Wang et al. [
12] developed a portable automatic measurement system for pig body dimensions based on Xtion depth cameras to address the problems of low efficiency in traditional manual measurement, significant stress on pigs, and susceptibility to interference from light and other factors in existing digital imaging methods. Itoh et al. [
13] aimed to develop a real-time measurement algorithm for aggregate size based on image processing to solve the problem that traditional screening methods cannot measure in real-time online. A multiple regression equation was established based on texture features to estimate the aggregate size, which proved that this non-contact real-time measurement method could still maintain high accuracy under different lighting conditions. Zhai et al. [
14] focused on the real-time detection of the size of moving objects, taking image processing as the core, and proposed a processing flow including median filtering, gain correction, image segmentation and binarization, corner detection, and edge fitting. The human–computer interaction interface is designed based on VC++ to achieve real-time image acquisition, processing, display, and calculation of object dimensions (area, length, and width). Experiments show that the measurement error is less than 1%, meeting industrial requirements. Khasnobish et al. [
15] aimed to achieve the recognition of object shape and size by artificial tactile sensing systems to meet the application requirements of human–computer interaction (HCI). The research collected tactile images of different objects and extracted statistical features from them. The results showed that the average accuracy rate of shape recognition among the subjects was 93%, and that of size recognition was 87%. The accuracy rate of shape recognition within the subjects was 94%, and that of size recognition was 88%. Moreover, the classification accuracy was less affected by the type of classifier, verifying the effectiveness of identifying the shape and size of objects through tactile image analysis. It makes up for the deficiencies of visual recognition in scenarios such as occlusion and low light, providing a new technical path for object recognition. Liu et al. [
16] proposed a measurement method for large aviation components using a global data registration method based on dynamic coding points. Laboratory experiments were verified with standard scales, achieving an accuracy of 0.0150%. Field experiments also proved that this method meets the measurement requirements of large aviation components, and the dynamic coding point matching is accurate, with high robustness and the ability to eliminate cumulative errors. Zhao et al. [
17] proposed a 3D object surface boundary perimeter measurement scheme based on binocular stereo vision systems, innovatively combining B-spline active contours with binocular stereo vision to reduce computational complexity and simultaneously enhance the accuracy of contour edge extraction. Moreover, the system has low construction costs and strong portability, making it suitable for industrial online measurement scenarios. Pu et al. [
18] proposed a structure recognition method based on mobile laser scanning point clouds, achieving efficient reconstruction by fusing the information of ground laser point clouds and close-range images. This research fully exploits the complementarity between laser data and optical images, enhancing the reliability and automation level of the reconstruction results. Jia et al. [
19] proposed a method for on-site measurement of large objects based on a multi-view stereo vision system to address the issue of monocular or binocular vision having difficulty measuring large objects with high precision. This research strikes a balance between accuracy and practicality. Its effectiveness has been verified through both laboratory and industrial field experiments. The measurement accuracy meets the requirements of large objects and can handle extreme environments such as forging workshops, with a wide range of applications. This method is low-cost and highly practical, with high measurement efficiency and high edge extraction accuracy. By using the linear and polynomial approximation of the least square method to filter out edge noise and combining sub-pixel-level calculations, it ensures that the dimensional measurement accuracy reaches ±0.02 mm, meeting the conventional tolerance requirements of the protector. Xiang et al. [
20] proposed a high-precision measurement method for the dimensions of large automotive brake pads using binocular machine vision technology. This method takes into account both large-scale and high-precision measurements and has a high degree of automation. However, the mechanical installation accuracy requirements are strict, and the measurement range is limited. It is only designed for brake pad mounts of a specific specification (130.9 mm), and the compatibility with mounts of other sizes or irregular shapes has not been verified. The experiment was carried out in a controlled laboratory environment. The interference of common environments in brake pad production sites, such as high temperature and dust, on image quality and measurement accuracy was not tested. The stability in actual industrial environments needs further verification. Barnea et al. [
21] used RGB and range data to analyze the shape of objects in the image plane and 3D space. By detecting highlights and 3D shape features, the problem of detection failure caused by color confusion and unstable lighting is solved. However, it has strong hardware dependence and insufficient real-time performance. The optimal solution takes an average of 197 s for a single image, and no code optimization has been carried out, which cannot meet the real-time picking requirements of the harvesting robot. Sun et al. [
22] proposed a non-contact volume measurement method for irregular objects based on 3D reconstruction technology using a linear laser and a camera. Experimental verification shows that at a distance of 2 m from the measuring equipment, the measurement error of this method is less than 4.5%, and it can achieve precise 3D reconstruction and volume measurement of irregular objects. However, it is sensitive to environmental interference. Line lasers are easily affected by environmental factors such as strong light, dust, and reflections from object surfaces, which may lead to the failure or deviation of stripe extraction. The research did not mention anti-interference measures for complex environments.
Guo Yi et al. [23] proposed a real-time three-dimensional tracking technology based on non-imaging single-pixel LiDAR to address the large data volume, high computational load, and limited detection range of traditional LiDAR when tracking long-distance moving targets. This technology removes the traditional reliance on imaging: it adopts non-imaging single-pixel detection, does not require reconstruction of a complete three-dimensional image, and extracts only the key features of the target position, significantly reducing data volume and computing cost. However, the system is highly sensitive to environmental interference. It relies on detecting the intensity and peak of the laser echo signal, and strong light and atmospheric scattering (such as haze and dust) can lower the signal-to-noise ratio of the echo, which may lead to peak positioning deviation. Massoud et al. [
24] proposed a real-time safe landing zone (SLZ) recognition method based on airborne LiDAR point cloud data to address the difficulty aircraft have in accurately identifying an SLZ in low-visibility environments such as sandstorms, fog, and darkness. This method can accurately identify SLZs in stadiums, on roads, on rooftops, and in similar settings, and is compatible with multiple scenarios. It has been integrated into helicopter simulators by industrial partners, and the output SLZ is displayed with color coding (dark green for high certainty, light green for low certainty, and red for uncertainty) to assist pilots in making decisions. Zhang et al. [
25] proposed a real-time, anchor-free vehicle-tracking method based on the “track-by-point” idea to address the large computational load, dependence on anchor box parameters, and frequent ID switching of traditional bounding-box-based LiDAR vehicle tracking. The vehicle motion state is predicted with Kalman filtering (linear constant-velocity model), and inter-frame target matching is achieved through the Hungarian algorithm. The method was evaluated on 32-line LiDAR data (10 Hz) collected at three signalized intersections (Reno and Rabke, NV, USA). The results showed that the MOTA (Multiple Object Tracking Accuracy) reached 0.9253 in the low-traffic Reno scene, the number of ID switches was reduced by 40%, and the tracking speed reached 2566 FPS, which is 23% higher than the traditional bounding box method and meets the requirements of real-time applications. Shi et al. [
26] proposed an improved multimodal decision-level fusion 3D vehicle-detection algorithm to address the insufficient feature utilization and limited accuracy of the traditional CLOCs (Camera-LiDAR Object Candidates Fusion) algorithm in multimodal fusion. The improved algorithm fuses features more fully and balances real-time performance and accuracy. However, the experiments were not conducted under adverse weather conditions, so the robustness of the algorithm in such conditions remains unverified.
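The prediction-and-matching step described for Zhang et al. [25] can be illustrated with a minimal sketch. It is not the authors' implementation: it shows only the constant-velocity state prediction of a Kalman filter (covariance propagation and the measurement update are omitted), and it replaces the Hungarian algorithm with a brute-force search over permutations, which yields the same minimum-cost assignment for small sets. The function names `predict_cv` and `match_tracks`, the state layout, and the sample values are assumptions made for illustration.

```python
import math
from itertools import permutations


def predict_cv(state, dt):
    """Constant-velocity prediction for a state (x, y, vx, vy).

    Only the state-prediction step of a Kalman filter is shown here;
    covariance propagation and the update step are omitted.
    """
    x, y, vx, vy = state
    return (x + vx * dt, y + vy * dt, vx, vy)


def match_tracks(predicted, detections):
    """Assign one detection index to each track, minimizing total distance.

    Brute-force stand-in for the Hungarian algorithm: same optimal result,
    but only practical for small sets. Assumes
    len(predicted) <= len(detections).
    """
    best_cost, best = math.inf, ()
    for perm in permutations(range(len(detections)), len(predicted)):
        cost = sum(
            math.dist(predicted[i][:2], detections[j])
            for i, j in enumerate(perm)
        )
        if cost < best_cost:
            best_cost, best = cost, perm
    return list(best)


# Hypothetical example: two tracked vehicles, one frame at 10 Hz (dt = 0.1 s).
tracks = [(0.0, 0.0, 10.0, 0.0), (5.0, 3.0, -10.0, 0.0)]
detections = [(4.1, 3.0), (0.9, 0.1)]
predicted = [predict_cv(s, 0.1) for s in tracks]
assignment = match_tracks(predicted, detections)  # detection index per track
```

In practice, the permutation search would be replaced by a polynomial-time solver, since its cost grows factorially with the number of targets.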
Although existing research has made progress in specific scenarios, there are still the following limitations in the automated detection of truck carriage dimensions:
Based on the deficiencies identified in the literature above, this study makes three contributions. (1) It proposes a multi-parameter hierarchical collaborative optimization framework that applies differentiated algorithm logic (ICP dynamic weights, MLS + plane constraints, and GMM-EM secondary screening) to different detection parameters, addressing the parameter coupling problem of traditional methods. (2) It develops a dust-adaptive point cloud preprocessing module that dynamically adjusts the Statistical Outlier Removal (SOR) filter threshold based on PM2.5 concentration, maintaining an effective point cloud rate of 95% even at PM2.5 = 500 μg/m³. (3) It builds a low-cost LiDAR detection system: a dual Livox HAP deployment reduces hardware cost by 60%, and the single-frame processing time is ≤ 2.1 s, meeting real-time industrial requirements.
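To make the idea of a dust-adaptive SOR threshold concrete, the following is a minimal sketch under stated assumptions. The SOR step follows the standard formulation (a point is kept if its mean distance to its k nearest neighbors does not exceed the global mean plus a multiple of the standard deviation). The mapping from PM2.5 concentration to that multiple, implemented in the hypothetical function `sor_multiplier`, is purely illustrative: the linear form, its direction (stricter filtering as dust increases), and the constants are assumptions, not the mapping used in this study.

```python
import math
import statistics


def sor_multiplier(pm25, base=2.0, floor=1.0, pm_max=500.0):
    """Hypothetical linear mapping from PM2.5 (ug/m^3) to the SOR multiplier.

    Assumption for illustration only: the multiplier shrinks from `base`
    to `floor` as PM2.5 rises from 0 to `pm_max`, so filtering gets stricter.
    """
    ratio = min(max(pm25, 0.0), pm_max) / pm_max
    return base - (base - floor) * ratio


def sor_filter(points, k, std_mult):
    """Standard statistical outlier removal with brute-force neighbor search."""
    mean_d = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_d.append(sum(dists[:k]) / k)
    limit = statistics.fmean(mean_d) + std_mult * statistics.pstdev(mean_d)
    return [p for p, d in zip(points, mean_d) if d <= limit]


# Hypothetical example: a flat 3 x 3 patch plus one isolated "dust" return.
cloud = [(float(x), float(y), 0.0) for x in range(3) for y in range(3)]
cloud.append((10.0, 10.0, 10.0))
kept = sor_filter(cloud, k=3, std_mult=sor_multiplier(500.0))
```

The brute-force neighbor search is quadratic in the number of points; a real pipeline would use a spatial index such as a k-d tree.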