Long-Tail Learning for Three-Dimensional Pavement Distress Segmentation Using Point Clouds Reconstructed from a Consumer Camera

Cheng, Pengjian; Yi, Junyan; Pei, Zhongshi; Liu, Zengxin; Jiang, Dayong; Abdukadir, Abduhaibir

doi:10.3390/rs18071008



School of Transportation Science and Engineering, Harbin Institute of Technology, Harbin 150090, China

Corresponding author: Junyan Yi.

Remote Sens. 2026, 18(7), 1008; https://doi.org/10.3390/rs18071008

Submission received: 11 February 2026 / Revised: 19 March 2026 / Accepted: 24 March 2026 / Published: 27 March 2026

(This article belongs to the Special Issue Point Cloud Data Analysis and Applications)


Highlights

What are the main findings?

A low-cost, readily deployable point cloud-based workflow encompassing data acquisition, automated multi-class pavement distress segmentation, and geometric information extraction.
An end-to-end network for segmenting distress from pavement point clouds, incorporating a long-tail class imbalance mitigation strategy and a dual-stream feature fusion module.

What are the implications of the main findings?

Enables scalable, low-cost 3D pavement inspection and monitoring using consumer-grade imaging, reducing reliance on expensive scanning systems and improving deployability for routine inspections.
Provides engineering-ready 3D distress outputs to support condition assessment, maintenance prioritization, and integration into intelligent pavement management workflows.

Abstract

The application of 3D data in pavement inspection is an emerging trend. Acquiring and measuring the 3D information of pavement distress enables a more comprehensive assessment of severity, thereby allowing accurate monitoring and evaluation of the pavement's technical condition. Existing methods are limited by the high cost of pavement scanning and by insufficient research on automated 3D distress segmentation. This study employed a consumer-grade action camera for data acquisition and constructed an engineering-aligned 3D point cloud dataset of pavements. A long-tail class imbalance mitigation strategy was then introduced, integrating adaptive re-sampling with a weighted fusion loss function to balance minority class representation. The proposed network, PointPaveSeg, is a dedicated point cloud processing architecture. A dual-stream feature fusion module was designed for the encoder layer, which decouples geometric and semantic features to improve distress extraction capability. The network incorporates a hierarchical feature propagation structure enhanced by edge reinforcement, global interaction, and residual connections. Experimental results demonstrated that PointPaveSeg achieved an mIoU of 78.45% and an accuracy of 95.43%. In the field evaluation, post-processing and geometric information extraction were performed on the segmented point clouds, and the results showed high consistency with manual measurements. Testing confirmed the method's practical applicability in real-world projects, offering a new lightweight alternative for intelligent pavement monitoring and maintenance systems.

Keywords:

point cloud; pavement; distress segmentation; deep learning; remote sensing

1. Introduction

The acquisition of 3D pavement data through advanced technological means enables more comprehensive evaluation of pavement conditions, representing a critical direction in intelligent inspection development [1,2]. In current practice, pavement distress detection primarily relies on 2D images. While this approach ensures high accuracy in detection and localization, it inherently fails to capture the 3D characteristics of distress. Existing technical standards in China assess the severity of potholes and other types of distress based solely on projected area. However, potholes of different depths have significantly different effects on driving safety, ride comfort, and structural integrity. This limitation has attracted increasing attention in recent years, leading to more studies that use 3D data to evaluate pavement conditions.

Current methods for acquiring 3D pavement data mainly fall into two categories: laser-based systems and stereo vision techniques. Among laser-based systems, LiDAR (Light Detection and Ranging) is the predominant method, offering high accuracy, rapid data acquisition, and robust performance under varying lighting conditions. However, its widespread adoption is constrained by prohibitive costs, with initial investments often exceeding $1 million for high-precision mobile mapping systems [3]. Stereo vision techniques reconstruct 3D pavement by processing overlapping 2D images to generate point clouds. Although these methods are less accurate than laser-based systems, they reduce costs by an order of magnitude. This cost-effectiveness is particularly significant for China's 4 million kilometers of highways below Class II, which are in poor technical condition but lack funding for high-end inspections. Lightweight systems enable large-scale assessment at lower equipment cost and can complement high-accuracy laser-based systems.

Recent research has demonstrated significant progress in pavement distress analysis using 3D data. By leveraging 3D spatial features, various distress types, including potholes and cracks, can be effectively segmented and characterized. This approach also enables comprehensive geometric quantification, including length, area, and volume. Zhang et al. used 3D laser profiling to acquire pavement cross-sections and applied PCA to separate deformation from abrupt residual variations for crack detection [4]. Chen et al. achieved pixel-level fusion between point clouds and images and employed a modified Otsu's algorithm for crack detection; their results demonstrated that the fused data yielded higher accuracy than either point clouds or images alone [5]. Liu et al. extracted pavement cross-sections and longitudinal profiles from laser point clouds, used the Douglas–Peucker algorithm and integral invariant operators to identify pothole features, and then delineated pothole boundaries from local depth to measure depth and area [6]. Song et al. detected sub-centimeter-level urban pavement settlement by comparing multi-temporal point clouds; the workflow involved point cloud preprocessing, mesh construction, Gaussian kernel smoothing, and finally the determination of settlement areas through height difference calculations [7]. Fan et al. proposed a pothole detection algorithm based on road disparity map estimation and segmentation. The method employed semi-global matching (SGM) to estimate road disparities, used a disparity transformation algorithm to distinguish damaged road areas, and then applied simple linear iterative clustering (SLIC) to group the transformed disparities into superpixels for final pothole detection [8]. Khan et al. compared terrestrial LiDAR-derived raw point clouds, DEMs, and hillshade maps for pavement distress assessment and found that hillshade maps were the most suitable representation [9]. Sun et al. used an RGB-D camera to acquire pothole point clouds, applied voxel-based preprocessing, segmented potholes with RANSAC and Euclidean clustering, and measured pothole volume using Alpha Shapes [10]. Faisal et al. proposed a downsampled point cloud-based method for pothole detection and assessment that uses curvature as a feature descriptor to segment potholes in low-density point clouds [3]. Li et al. used binocular stereo vision and point cloud processing to extract the morphological features of potholes and achieved high depth accuracy [11].

For complex tasks in which latent features are difficult to describe explicitly, deep learning has been widely used for point cloud classification and segmentation [12,13]. Several scholars have employed multi-view methods [14,15] or voxelized representations [16,17] to address the irregularity of point clouds, thereby enabling convolutional operations. PointNet, a pioneering deep learning architecture that directly processes raw point clouds, employs shared MLPs to extract per-point features and a global max-pooling layer to aggregate global features. This design circumvents the irregularity of point clouds while achieving efficient and concise per-point segmentation [18]. PointNet++ extended PointNet with hierarchical sampling and multi-scale neighborhood aggregation, improving learning under varying point densities and for fine-grained patterns [19]. Graph neural networks were later introduced into point cloud processing. Wang et al. proposed EdgeConv, which captures local neighborhood information and improves global feature representation, achieving strong performance in classification and segmentation [20]. Inspired by natural language processing, self-attention mechanisms have also been introduced into deep learning for point clouds. Transformer-based approaches are inherently permutation-invariant, which suits unordered point sets; by combining farthest point sampling (FPS) with k-NN search, one such approach enhanced local context extraction and demonstrated state-of-the-art performance [21]. Qian et al. revisited PointNet++ by optimizing its training and scaling strategies without changing its main structure, and further proposed PointNeXt by introducing an inverted residual bottleneck block [22].

Deep learning methods have also been explored in point cloud-based pavement distress detection. Shim et al. proposed a crack segmentation network to improve the recognition of crack images captured under diverse environmental conditions, and achieved 3D reconstruction of crack regions through stereo vision techniques, enabling comprehensive spatial analysis of distress [23]. Ma et al. developed a saliency-based dilated graph convolution network, named SD-GVN, to extract cracks from MLS point clouds [24]. The network included four modules: non-ground point removal, height-intensity saliency map generation, cylinder-based dilated convolution for multi-scale feature extraction, and MLP-based feature refinement [24]. In subsequent research, they further optimized the GNN architecture, significantly reducing the computational cost of point cloud-based crack recognition [25]. Faris et al. employed a point cloud-oriented deep learning network for crack segmentation in unstructured environments, using LiDAR-scanned colored point clouds as the data source [26]. They also proposed a performance evaluation metric based on the recognition rate of crack entities to assess model effectiveness. Fan et al. enhanced PointNet++ by incorporating an attention module to improve crack segmentation and adopted Poly Loss to address class imbalance [27]. Pascucci et al. compared DBSCAN (Density-Based Spatial Clustering of Applications with Noise) with traditional classification methods (e.g., random forests) for crack detection in point clouds [28]. By integrating open-source WebGIS and pavement condition index (PCI) data, they improved the feasibility and efficiency of point cloud-based pavement crack detection. Feng et al. proposed a semi-supervised GNN framework for pavement crack detection using MLS point clouds [29]. Compared with conventional deep learning methods, it achieved higher accuracy while reducing annotation effort and training cost. In subsequent research, they introduced a two-level nested U-Net architecture to enhance multi-scale feature perception in GNNs while reducing memory and computational resource consumption [30]. Wang et al. captured multi-view images of pavement potholes and reconstructed point clouds [31]. They further developed a Transformer-based deep learning network to automatically segment and extract potholes from the point clouds. In subsequent research, they employed depth cameras to capture 3D information of potholes and proposed a GAN for dataset augmentation. Furthermore, they developed a deep learning architecture combining 3D fused convolutions with Transformer modules to achieve pothole segmentation in point clouds [32].

In automated pavement inspection, distressed areas typically account for a much smaller proportion than intact pavement, so minority class samples are far outnumbered by majority class samples. This phenomenon, known as the long-tail problem in deep learning, is common in object detection and segmentation tasks and causes models to favor majority classes and neglect minority classes [33,34]. Yet in pavement inspection, the minority classes, i.e., the distress, are of greatest concern. Prevailing solutions to the long-tail problem span data-level, loss function-level, and model architecture-level strategies. Data-level solutions aim to address class imbalance by ensuring that the model encounters more minority class instances during training. Key implementations include re-sampling [35], data augmentation [32,36,37], and transfer learning [38]. In long-tailed learning, loss function modification is regarded as a low-cost and efficient solution: by assigning different weights to the training losses of different classes, it directs greater model attention toward minority categories. Panella et al. showed in crack segmentation experiments that effective loss functions can address long-tail problems better than simply deepening or widening the network [39]. Fan et al. proposed a dynamic loss function that enhanced the performance of a hybrid CNN–Transformer network in crack segmentation tasks [40]. From a model architecture perspective, a common strategy is to integrate global features with local fine-grained features to enhance the model's ability to learn minority class characteristics. Other tuning strategies have also been explored to enable faster prediction and more compact models [41,42]. In pavement inspection tasks, designing network structures around the salient features of defects has proven to be an effective solution [43].

Automated pavement distress detection using point clouds shows considerable promise, yet several technical limitations persist. Although vehicle-mounted LiDAR systems enable efficient point cloud acquisition, their high cost limits widespread adoption. In contrast, lightweight and cost-effective computer vision solutions are better suited to inspections of low-grade highways and routine maintenance. Current point cloud-based automated detection methods predominantly focus on single distress types or localized analysis of small pavement sections, which significantly limits their practical utility in real-world inspection scenarios. This study proposes a point cloud segmentation method for the two most common types of distress: cracks and potholes. The proposed network includes a long-tail class imbalance mitigation strategy, a dual-stream fusion encoder that separates geometric and semantic cues using seven-channel inputs, and a hierarchical global–local–edge feature propagation framework. This method promotes the development of cost-effective automated distress detection and enables more refined pavement condition assessment.

2. Data Acquisition and Processing

2.1. Point Cloud Generation

The data acquisition device was a consumer-grade DJI Osmo Action4 action camera (Dajiang Innovation Co., Ltd., Shenzhen, China) with a 1/1.3-inch image sensor, a 155° FOV, and a maximum resolution of 3648 × 2736, chosen for low-cost pavement inspection. The built-in HorizonSteady stabilization technology effectively reduced vibration during vehicle motion and ensured stable image capture. The camera was mounted at the rear of a vehicle, 1.5 m above the pavement, with a backward viewing angle of 30°, as shown in Figure 1. The mounting device consisted of a main bracket and an auxiliary stabilizing bracket, both with multiple adjustable joints and ball heads for angle adjustment, allowing installation on different vehicles. Videos were recorded at a resolution of 2688 × 1512 pixels, with each frame covering a pavement area of 11.25 m². The pavement area represented by each pixel ranged from 0.019 to 0.036 cm², depending on its position in the image. To increase dataset diversity and improve generalization in practical applications, images were collected from multiple road sections, including Grade I to Grade III highways and municipal roads.
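As a quick consistency check on these figures, dividing the frame footprint by the pixel count gives the mean ground area per pixel. The result, about 0.028 cm² (roughly a 1.7 mm square), falls inside the reported 0.019 to 0.036 cm² range, as expected for a tilted view in which near-field pixels cover less ground than far-field ones.

```python
# Mean ground footprint of one video pixel, from the figures reported above.
frame_area_m2 = 11.25              # pavement area covered by one 2688 x 1512 frame
width_px, height_px = 2688, 1512

frame_area_cm2 = frame_area_m2 * 1e4                  # 1 m^2 = 10,000 cm^2
n_pixels = width_px * height_px                        # 4,064,256 pixels
mean_pixel_area_cm2 = frame_area_cm2 / n_pixels
mean_pixel_side_mm = 10 * mean_pixel_area_cm2 ** 0.5   # side of an equivalent square pixel

print(f"{mean_pixel_area_cm2:.4f} cm^2 per pixel, about {mean_pixel_side_mm:.1f} mm on a side")
```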

During frame extraction, one out of every three frames was retained from video recorded at 48 fps, resulting in an overlap rate of over 70% between consecutive images. Typical distressed sections were selected for 3D reconstruction using SfM. The workflow consisted of three key stages: (1) feature extraction and matching, (2) camera pose and trajectory estimation (i.e., estimating the rotation and translation of each image in a common reference frame), and (3) depth estimation with 3D reconstruction. The pipeline was implemented through open-source tools including OpenCV (v4.6.0), Open3D (v0.19.0), and OpenSfM (v0.5.2). Camera poses were initialized from two-view geometry and then refined by bundle adjustment using Ceres Solver to minimize reprojection error. Figure 2 illustrates representative pavement images and reconstructed point clouds.
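Keeping one frame in three from 48 fps video leaves 16 frames per second. Whether this yields more than 70% overlap depends on driving speed and on the along-track length of the ground footprint, neither of which is reported. The sketch below therefore uses hypothetical values (10 m/s and a 3 m footprint) and a simple straight-line, constant-speed approximation; the helper names are illustrative only.

```python
def retained_frame_indices(n_frames: int, step: int = 3) -> list:
    """Keep one frame out of every `step` frames (indices 0, 3, 6, ...)."""
    return list(range(0, n_frames, step))


def forward_overlap(speed_mps: float, fps: float, step: int, footprint_m: float) -> float:
    """Approximate along-track overlap between consecutive retained frames.

    Assumes straight driving at constant speed and a fixed along-track footprint
    length; both inputs are hypothetical, not values reported in the study.
    """
    baseline_m = speed_mps * step / fps      # distance travelled between retained frames
    return max(0.0, 1.0 - baseline_m / footprint_m)


effective_fps = 48 / 3                       # 16 retained frames per second
print(retained_frame_indices(10))            # [0, 3, 6, 9]
print(effective_fps)                         # 16.0
# Hypothetical case: 36 km/h (10 m/s) with a 3 m along-track footprint
print(round(forward_overlap(10.0, 48, 3, 3.0), 3))
```

Under these assumed values the overlap is about 79%; halving the footprint length or doubling the speed would push it below the 70% reported in the text, which is why the frame step is best chosen per survey speed.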

2.2. Point Cloud Processing

To ensure geometric consistency between the reconstructed point cloud and the pavement coordinate system, point cloud correction was performed using PCA-based plane fitting. In pavement scenes, height variation is typically limited, so most points can be approximated as lying on a dominant plane. This property facilitates coordinate axis alignment through plane fitting.
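The plane-fitting alignment can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the authors' code: the dominant-plane normal is taken as the direction of least variance (last right-singular vector of the centered cloud), and a Rodrigues-type rotation maps it onto +Z. The synthetic tilted plane is only test data.

```python
import numpy as np


def align_to_dominant_plane(points):
    """Rotate an (N, 3) cloud so that its best-fit plane becomes horizontal.

    The plane normal is the principal direction of least variance, i.e. the last
    right-singular vector of the centered cloud. Returns the centered, rotated
    points and the 3 x 3 rotation matrix.
    """
    centroid = points.mean(axis=0)
    centered = points - centroid
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]
    if normal[2] < 0:                     # orient upward so the rotation angle stays <= 90 degrees
        normal = -normal

    z_axis = np.array([0.0, 0.0, 1.0])
    v = np.cross(normal, z_axis)          # rotation axis scaled by sin(angle)
    c = float(normal @ z_axis)            # cos(angle)
    vx = np.array([[0.0, -v[2], v[1]],
                   [v[2], 0.0, -v[0]],
                   [-v[1], v[0], 0.0]])   # skew-symmetric cross-product matrix
    # Rodrigues-type formula for the rotation taking `normal` onto +Z (valid while c > -1)
    rot = np.eye(3) + vx + vx @ vx / (1.0 + c)
    return centered @ rot.T, rot


# Synthetic tilted, slightly noisy plane standing in for a reconstructed pavement patch
rng = np.random.default_rng(0)
xy = rng.uniform(-2.0, 2.0, size=(2000, 2))
tilted = np.c_[xy, 0.3 * xy[:, 0] - 0.1 * xy[:, 1]] + rng.normal(0.0, 0.002, size=(2000, 3))
aligned, rot = align_to_dominant_plane(tilted)
print(aligned[:, 2].std())   # residual height scatter, close to the injected noise level
```

Because the fit uses all points, large potholes or curbs inside the scene slightly bias the normal; a robust variant (for example, refitting on inliers) is a natural extension.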

During image acquisition, the camera was intentionally tilted relative to the pavement to preserve vertical height information. This configuration inevitably introduced perspective distortion, and these perspective errors tended to accumulate when SfM was used for 3D reconstruction. An adaptive linear correction method for point clouds was therefore proposed. Because all images were captured at the same height and angle and covered the same pavement area, the perspective distortion in the reconstructed point cloud was assumed to follow a linear transformation. First, a reference pavement width W_ref was extracted from the convex hull of the point cloud. Next, the point cloud was divided along the Y axis into M segments with boundaries b_0, …, b_M. For each segment boundary b_j, the local width w_j was defined as the x-range of points near that boundary, yielding a node coefficient s_j. To avoid a single fixed correction, the coefficient s(y) at any position was computed by piecewise linear interpolation between adjacent node coefficients, as shown in Formula (1). The point clouds before and after correction are shown in Figure 3.

s(y_{i}) = s_{j} + \frac{y_{i} - b_{j}}{b_{j+1} - b_{j}} \left( s_{j+1} - s_{j} \right), \quad y_{i} \in [b_{j}, b_{j+1}), \quad j = 0, 1, \dots, M-1

(1)
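A minimal sketch of this width correction is given below. The text does not state how s_j is derived from W_ref and w_j, how W_ref is read from the convex hull, or about which line x is rescaled, so this sketch assumes s_j = W_ref / w_j, a user-supplied (or median) W_ref, and scaling about the local centerline. np.interp performs exactly the piecewise linear interpolation of Formula (1).

```python
import numpy as np


def piecewise_width_correction(points, n_segments=10, w_ref=None, band=0.25):
    """Sketch of the adaptive linear width correction around Formula (1).

    Assumptions (not stated in the text): s_j = W_ref / w_j, x is scaled about the
    local centerline, and W_ref defaults to the median node width.
    points: (N, 3) array already aligned so that Y runs along the road.
    band: half-height of the strip around each node b_j, as a fraction of segment length.
    """
    pts = np.asarray(points, dtype=float).copy()
    x, y = pts[:, 0], pts[:, 1]
    b = np.linspace(y.min(), y.max(), n_segments + 1)      # node positions b_0 ... b_M
    half = band * (b[1] - b[0])

    widths = np.empty_like(b)
    centers = np.empty_like(b)
    for j, bj in enumerate(b):
        strip = x[np.abs(y - bj) <= half]                  # points near node j
        widths[j] = strip.max() - strip.min()              # local width w_j (x-range)
        centers[j] = 0.5 * (strip.max() + strip.min())

    if w_ref is None:
        w_ref = float(np.median(widths))
    s_nodes = w_ref / widths                               # assumed s_j = W_ref / w_j

    s_y = np.interp(y, b, s_nodes)                         # Formula (1)
    c_y = np.interp(y, b, centers)
    pts[:, 0] = c_y + s_y * (x - c_y)
    return pts


def strip_width(pts, lo, hi):
    """x-range of the points whose y lies in [lo, hi)."""
    sel = (pts[:, 1] >= lo) & (pts[:, 1] < hi)
    return pts[sel, 0].max() - pts[sel, 0].min()


rng = np.random.default_rng(1)
n = 50_000
y = rng.uniform(0.0, 10.0, n)
true_width = 3.0 + 0.1 * y                     # synthetic distortion: width grows from 3 m to 4 m
x = (rng.uniform(size=n) - 0.5) * true_width
cloud = np.c_[x, y, np.zeros(n)]
fixed = piecewise_width_correction(cloud, n_segments=10, w_ref=3.5)

print(strip_width(cloud, 0, 0.5), strip_width(cloud, 9.5, 10))   # roughly 3.05 vs 4.0 before
print(strip_width(fixed, 0, 0.5), strip_width(fixed, 9.5, 10))   # both close to 3.5 after
```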

2.3. Dataset Construction

Existing publicly available point cloud datasets typically consist of indoor or outdoor scene scans, which do not meet the requirements of multi-type pavement distress detection. To address this gap, a dedicated pavement point cloud dataset was constructed using the acquisition and processing methods described above.

The annotation was performed using the open-source software CloudCompare (v2.13). Each point in the point cloud was classified into one of three categories, namely intact pavement, crack, or pothole, and assigned the corresponding label. Before annotation, obvious isolated outliers were removed. Each point in the resulting file contained XYZ coordinates, normalized RGB color values, and the category label. The dataset comprised 330 annotated point cloud files containing a total of 284 potholes and 765 cracks. Because the reconstructed scene sizes varied, each file contained 100 k to 1000 k points. Data augmentation, including random rotation, translation, and scaling of the point clouds as well as global offset and scaling of RGB values, tripled the dataset size. The final dataset was partitioned into training and test sets at an 8.5:1.5 ratio.
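The augmentation and split can be sketched as follows. All parameter ranges are hypothetical (the text does not report them), and rotation is assumed to be about the vertical axis, which suits horizontally aligned pavement clouds. Labels are per point and unchanged by these transforms. Splitting by source file, as done here, keeps augmented copies of one scene on the same side of the split.

```python
import numpy as np


def augment(xyz, rgb, rng):
    """Return one randomly augmented copy of a cloud; per-point labels are unchanged.

    All parameter ranges are hypothetical. Rotation is about the vertical (Z) axis.
    xyz: (N, 3) coordinates; rgb: (N, 3) colors normalized to [0, 1].
    """
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    scale = rng.uniform(0.9, 1.1)                       # isotropic geometric scaling
    shift = rng.uniform(-0.5, 0.5, size=3)              # translation
    xyz_aug = scale * (xyz @ rot.T) + shift

    color_scale = rng.uniform(0.9, 1.1)                 # global RGB scaling
    color_offset = rng.uniform(-0.05, 0.05)             # global RGB offset
    rgb_aug = np.clip(rgb * color_scale + color_offset, 0.0, 1.0)
    return xyz_aug, rgb_aug


def split_files(n_files, train_ratio=0.85, seed=0):
    """Shuffle file indices and split them 8.5:1.5 into training and test sets."""
    idx = np.random.default_rng(seed).permutation(n_files)
    n_train = int(round(train_ratio * n_files))
    return idx[:n_train], idx[n_train:]


rng = np.random.default_rng(0)
xyz = rng.uniform(0.0, 5.0, size=(1000, 3))
rgb = rng.uniform(0.0, 1.0, size=(1000, 3))
copies = [augment(xyz, rgb, rng) for _ in range(2)]     # original + 2 copies = 3x data
train_idx, test_idx = split_files(330)
print(len(train_idx), len(test_idx))                    # 280 50
```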

3. Methodology

3.1. Network Structure

A new network structure named PointPaveSeg was proposed. The network consisted of four encoder layers and four decoder layers. For feature extraction, the first encoder layer incorporated a dual-stream feature fusion module (Geo-Sem DSFF). The second encoder layer was a Set Abstraction layer with a channel attention mechanism to adaptively recalibrate feature importance and suppress irrelevant channel responses. The third encoder layer was a Set Abstraction layer, and the fourth performed global feature aggregation. The decoder integrated four feature propagation layers that combined EdgeConv, global interaction, edge-aware gradient boost, and residual connection. This multi-scale fusion framework enhanced point cloud features at the global, local, and edge levels. The network structure of PointPaveSeg is shown in Figure 4.

The network accepted 7-channel point cloud data (XYZ coordinates + RGB colors + color gradient) as input, which underwent progressive feature abstraction through four encoder layers with expanding receptive fields. This hierarchical processing transformed raw point clouds into high-dimensional tensors with reduced point counts while preserving critical geometric and semantic information. The decoder section employed a progressive interpolation strategy to restore the spatial resolution of point clouds, while incorporating skip connections to preserve low-level features and leveraging classification labels to guide feature reconstruction. The network ended with a fully connected classification head for per-point classification and pavement distress segmentation.

(1) Geometry and Semantic Dual-Stream Feature Fusion (Geo-Sem DSFF)

Geo-Sem DSFF is an attention module designed for RGB point clouds and pavement distress segmentation. It uses dual-stream processing to separate geometric and semantic features, followed by global context aggregation and spatial guidance for feature weighting. This study focuses on cracks and potholes, which show distinct features in point clouds. Potholes exhibit stronger geometric saliency due to measurable depth variations, and the spatial features dominate their detection. In contrast, cracks usually lack clear 3D structures in large-scale pavement point clouds, making semantic features the main discriminative cues. By decoupling and independently processing geometric and semantic features, the module improves segmentation accuracy and robustness while reducing confusion between cracks and potholes.

The structure of Geo-Sem DSFF is shown in Figure 5. The input features were first decoupled into geometric coordinates and supplementary attributes, which in this study correspond to RGB and color gradient. Two separate MLPs processed the geometric and semantic features independently to extract multimodal representations. A channel attention mechanism was then applied to both feature types: global average pooling compressed each feature type, channel-wise attention weights were computed independently for each, and the features were adaptively fused through a weighted combination and nonlinear activation. In the spatial attention module, geometric structure was extracted from the covariance matrix of the input point cloud, and Singular Value Decomposition (SVD) was used to derive surface normals. These normals were combined with centered (mean-subtracted) spatial coordinates to construct comprehensive geometric features, which were further encoded to generate spatial attention weights. For semantic feature processing, max-pooled and average-pooled features were first fused, and the resulting representation was combined with the explicitly encoded geometric weights to generate geometry-sensitive spatial attention. The final attention weights were obtained by adaptively fusing channel and spatial attention.
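The geometric cue described above (SVD-derived normals plus centered coordinates) can be sketched as below. The neighborhood size k, the use of k-nearest-neighbor covariance, and per-cloud centering are assumptions for illustration; the attention layers that consume these features are not reproduced.

```python
import numpy as np
from scipy.spatial import cKDTree


def geometric_features(xyz, k=16):
    """Per-point normal (from SVD of the local covariance) plus centered coordinates.

    A minimal sketch of the Geo-Sem DSFF geometric cue; k and per-cloud centering
    are assumptions. Returns an (N, 6) array: [nx, ny, nz, x - cx, y - cy, z - cz].
    """
    xyz = np.asarray(xyz, dtype=float)
    _, nbr = cKDTree(xyz).query(xyz, k=k)                   # (N, k) neighbor indices
    local = xyz[nbr] - xyz[nbr].mean(axis=1, keepdims=True)
    cov = np.einsum("nki,nkj->nij", local, local) / k       # (N, 3, 3) local covariance
    _, _, vt = np.linalg.svd(cov)                           # batched SVD over N matrices
    normals = vt[:, -1, :]                                  # direction of least variance
    normals = normals * np.where(normals[:, 2:3] < 0, -1.0, 1.0)   # orient toward +Z
    centered = xyz - xyz.mean(axis=0)
    return np.hstack([normals, centered])


rng = np.random.default_rng(0)
pts = np.c_[rng.uniform(size=(500, 2)), rng.normal(0.0, 1e-4, size=500)]   # nearly flat patch
feats = geometric_features(pts, k=16)
print(feats.shape)            # (500, 6)
```

On flat pavement the normals point almost straight up, while pothole walls tilt them, which is the geometric saliency the module relies on.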

(2) Edge Feature Propagation

To improve the detection of crack and pothole edges, a feature propagation framework was proposed that builds on the inverse distance-weighted interpolation upsampling of PointNet++ and integrates EdgeConv, global interaction, and edge gradient enhancement. Each feature propagation layer took as input the coordinates and features of the point sets before and after downsampling, denoted as X_i, X_{i+1} and F_i, F_{i+1}, respectively. Through inverse distance-weighted interpolation, it estimated the features F'_{i+1} for the denser point set X_i. F'_{i+1} was then concatenated with F_i, and the combined features were processed by an MLP to obtain the fused output feature. Subsequently, EdgeConv was employed to dynamically capture local geometric structures. Unlike traditional methods that process point features independently, EdgeConv constructs edge features from the feature differences between each point and its neighbors, thereby emulating a graph convolution. This approach enhanced the model's sensitivity to edges and shape variations in point clouds. In implementation, the KNN algorithm was first applied to identify the neighboring points of each query point. Edge features were then constructed for each point by combining its own features with the feature differences relative to its neighbors. This process is formally expressed as Formula (2):

f_{i}^{'} = \mathrm{MaxPool}_{j \in N(i)} \left( h_{\theta}(f_{i}, f_{j} - f_{i}) \right)

(2)

where f_i' is the updated feature of query point i, N(i) represents the set of nearest neighbors of i, and h_θ is a nonlinear mapping function, implemented as an MLP composed of Conv2D, Batch Normalization (BN), and ReLU activation.
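The propagation step can be sketched in NumPy as follows: PointNet++-style interpolation (inverse squared-distance weights over k = 3 neighbors, the PointNet++ default), concatenation with the skip features F_i, and one EdgeConv pass implementing Formula (2). This is a simplified stand-in, not the PointPaveSeg implementation: h_θ is a single linear layer with ReLU instead of Conv2D-BN-ReLU, the MLP after concatenation is omitted, neighbors are searched in coordinate space (an assumption), and the every-fourth-point subset only mimics FPS.

```python
import numpy as np
from scipy.spatial import cKDTree


def idw_interpolate(xyz_dense, xyz_sparse, feat_sparse, k=3, eps=1e-8):
    """PointNet++-style upsampling with inverse squared-distance weights over k neighbors."""
    dist, idx = cKDTree(xyz_sparse).query(xyz_dense, k=k)       # (N, k) each
    w = 1.0 / (dist ** 2 + eps)
    w /= w.sum(axis=1, keepdims=True)
    return np.einsum("nk,nkc->nc", w, feat_sparse[idx])          # F'_{i+1} on the dense set


def edge_conv(xyz, feat, weight, bias, k=8):
    """Formula (2): f_i' = max over j in N(i) of h(f_i, f_j - f_i).

    h is one linear layer + ReLU standing in for the Conv2D-BN-ReLU MLP.
    weight: (2C, C_out), bias: (C_out,).
    """
    _, idx = cKDTree(xyz).query(xyz, k=k + 1)
    idx = idx[:, 1:]                                             # drop the query point itself
    center = np.repeat(feat[:, None, :], k, axis=1)              # f_i, (N, k, C)
    edge = np.concatenate([center, feat[idx] - center], axis=-1) # (f_i, f_j - f_i)
    h = np.maximum(edge @ weight + bias, 0.0)                    # (N, k, C_out)
    return h.max(axis=1)                                         # max-pool over neighbors


rng = np.random.default_rng(0)
dense = rng.uniform(size=(200, 3))            # X_i
sparse = dense[::4]                           # X_{i+1}, a stand-in for FPS output
f_sparse = rng.normal(size=(50, 16))          # F_{i+1}
f_skip = rng.normal(size=(200, 8))            # F_i
f_up = idw_interpolate(dense, sparse, f_sparse)
fused = np.concatenate([f_up, f_skip], axis=1)
W, b = 0.1 * rng.normal(size=(48, 32)), np.zeros(32)
out = edge_conv(dense, fused, W, b)
print(f_up.shape, out.shape)                  # (200, 16) (200, 32)
```

Because the neighbor search and the max-pool ignore point order, permuting the input points simply permutes the output rows.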

Point clouds have an unordered structure, and relying only on local operations may lead to insufficient global semantic understanding. To address this issue, a global interaction module was introduced to use global features to enhance the semantic understanding of local points. The core of this module was a residual block with 1D convolutions, as expressed in Formulas (3) and (4). This design shared conceptual similarities with the self-attention mechanism in Transformers but employed lightweight convolutions for improved efficiency.

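F_{i\text{-}\mathrm{global}} = F_{i} + h_{\varphi}(F_{i})

(3)

h_{\varphi}(\cdot) = \mathrm{Conv1d}\left( \delta\left( \mathrm{BN}\left( \mathrm{Conv1d}(\cdot) \right) \right) \right)

(4)

where δ denotes a nonlinear activation function.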
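Formulas (3) and (4) can be sketched as below, assuming kernel size 1 for both Conv1d layers, inference-mode BN, and δ = ReLU. With kernel size 1 the block acts on each point independently; the text does not state how global context enters (for example, a larger kernel or an appended pooled descriptor), so that part is deliberately left out rather than guessed. All parameter names are hypothetical.

```python
import numpy as np


def conv1d_k1(x, w, b):
    """Conv1d with kernel size 1 over the point axis: a shared per-point linear map."""
    return x @ w + b


def bn_eval(x, p, eps=1e-5):
    """Batch normalization in inference mode, using stored running statistics."""
    return p["gamma"] * (x - p["mean"]) / np.sqrt(p["var"] + eps) + p["beta"]


def global_interaction(F, p):
    """Formulas (3)-(4): F_global = F + Conv1d(delta(BN(Conv1d(F)))), delta = ReLU assumed.

    F: (N, C) point features; p: dict of hypothetical layer parameters.
    """
    h = conv1d_k1(F, p["w1"], p["b1"])
    h = np.maximum(bn_eval(h, p), 0.0)
    h = conv1d_k1(h, p["w2"], p["b2"])
    return F + h                                    # residual connection, Formula (3)


rng = np.random.default_rng(0)
N, C, H = 100, 32, 64
p = {
    "w1": 0.1 * rng.normal(size=(C, H)), "b1": np.zeros(H),
    "gamma": np.ones(H), "beta": np.zeros(H), "mean": np.zeros(H), "var": np.ones(H),
    "w2": 0.1 * rng.normal(size=(H, C)), "b2": np.zeros(C),
}
F = rng.normal(size=(N, C))
out = global_interaction(F, p)
print(out.shape)            # (100, 32)
```

The residual form means that if the second convolution learns nothing useful, the block falls back to the identity, which is what keeps it cheap to insert into every decoder stage.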

To capture finer edge details and improve segmentation accuracy, an edge-aware gradient enhancement module named GradientConv was introduced. This module enhanced boundary segmentation by explicitly modeling edge information. Specifically, first-order differences were used to approximate gradients, and lightweight convolutions strengthened the edge-aware feature representation, as expressed in Formulas (5) and (6). To preserve low-level features and stabilize gradient flow, a residual connection added the original input features to the enhanced output.

F_{i}^{'} = F_{i} + h_{\gamma}(f_{j} - f_{j-1})

(5)

h_{\gamma}(\cdot) = \mathrm{Conv1d}\left( \delta\left( \mathrm{BN}(\cdot) \right) \right)

(6)
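A minimal sketch of Formulas (5) and (6) follows. The text leaves several details open, so this sketch assumes that differences are taken between consecutive points in the current point order, that the first point's difference is zero, that δ is ReLU, and that the Conv1d has kernel size 1 and no bias. A step in the features (an "edge") is the only place the module alters its input.

```python
import numpy as np


def gradient_conv(F, w, bn, eps=1e-5):
    """Formulas (5)-(6): F' = F + Conv1d(delta(BN(f_j - f_{j-1}))).

    Assumptions: differences between consecutive points in the current order,
    zero difference for the first point, delta = ReLU, bias-free 1x1 Conv1d (w: (C, C)).
    F: (N, C) point features; bn: dict of inference-mode BN parameters.
    """
    diff = np.zeros_like(F)
    diff[1:] = F[1:] - F[:-1]                                     # first-order difference
    h = bn["gamma"] * (diff - bn["mean"]) / np.sqrt(bn["var"] + eps) + bn["beta"]
    h = np.maximum(h, 0.0)                                        # delta
    return F + h @ w                                              # lightweight conv + residual


rng = np.random.default_rng(0)
N, C = 64, 16
bn = {"gamma": np.ones(C), "beta": np.zeros(C), "mean": np.zeros(C), "var": np.ones(C)}
w = 0.1 * rng.normal(size=(C, C))
F_step = np.vstack([np.zeros((32, C)), np.ones((32, C))])        # a sharp "edge" at row 32
out = gradient_conv(F_step, w, bn)
changed_rows = np.flatnonzero(~np.isclose(out, F_step).all(axis=1))
print(changed_rows)   # only the row where the features jump
```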

To achieve comprehensive feature propagation, a hierarchical global–local–edge framework was constructed by integrating EdgeConv and global interaction modules at higher-level layers, while employing GradientConv at lower-level layers. This architecture enhanced segmentation accuracy for pavement distress, particularly for crack-like defects with subtle spatial structures, by jointly modeling global context, local geometry, and fine-grained edge cues.

3.2. Re-Sampling

In the constructed dataset, intact pavement points account for the vast majority, whereas cracks and potholes account for only about 3.4% and less than 1% of the data, respectively. Under real-world inspection scenarios, the proportion of defective areas would be even lower. This extreme class imbalance hinders the model’s ability to effectively learn discriminative features for accurate segmentation. To mitigate the adverse effects of the long-tail problem, a strategy incorporating color gradient (G_C) augmentation and targeted re-sampling techniques was proposed.

Defective areas typically show noticeable color differences from the surrounding intact pavement. However, distress cannot be identified from color differences alone because of pavement markings, stains, and other disturbances. Nevertheless, such color information can still serve as valuable prior knowledge for deep learning models. The color dissimilarity was quantified as the average Euclidean distance in color space between each point and its neighbors, calculated as Formula (7).

G_{Ci} = \frac{1}{k} \sum_{j \in N(i)} \left\lVert RGB_{j} - RGB_{i} \right\rVert_{2}

(7)

where G_Ci is the color gradient of point i, N(i) denotes the neighborhood of point i, k is the number of points in N(i), and RGB_i ∈ ℝ³ represents the color vector of point i; all three channels are processed with equal weights.
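Formula (7) can be computed with a k-d tree as sketched below, followed by assembling the 7-channel input (XYZ + RGB + color gradient) that the network accepts. The neighborhood is assumed to be the k nearest neighbors in 3D (the text does not specify kNN versus a radius search), and k = 8 is a hypothetical value chosen for the small test grid.

```python
import numpy as np
from scipy.spatial import cKDTree


def color_gradient(xyz, rgb, k=16):
    """Formula (7): mean Euclidean RGB distance from each point to its k nearest neighbors."""
    _, idx = cKDTree(xyz).query(xyz, k=k + 1)
    idx = idx[:, 1:]                                       # exclude the point itself
    diffs = rgb[idx] - rgb[:, None, :]                     # (N, k, 3)
    return np.linalg.norm(diffs, axis=-1).mean(axis=1)     # equal channel weights


# Uniform gray 10 x 10 patch with one off-color point, e.g. a crack pixel
xs, ys = np.meshgrid(np.arange(10.0), np.arange(10.0))
xyz = np.c_[xs.ravel(), ys.ravel(), np.zeros(100)]
rgb = np.full((100, 3), 0.5)
rgb[55] = [1.0, 0.0, 0.0]
gc = color_gradient(xyz, rgb, k=8)
features = np.hstack([xyz, rgb, gc[:, None]])   # 7-channel input: XYZ + RGB + color gradient
print(features.shape)                           # (100, 7)
```

Excluding the first neighbor assumes no duplicate coordinates; exact duplicates would need deduplication first.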

As illustrated in Figure 6, the color gradient transformation effectively amplifies the contrast between intact pavement and distress. When incorporated as an additional input channel during training, it provides richer learnable information for the deep neural network. However, high gradient magnitudes may also originate from non-damage artifacts such as pavement markings or illumination variations during data acquisition. Consequently, while gradient features offer improved damage localization cues, they cannot be relied upon alone for precise segmentation and must be combined with more sophisticated multimodal analysis.

Conventional re-sampling methods assign sampling weights by semantic labels, artificially increasing the proportion of minority classes. However, for point cloud-based pavement distress detection, such methods have a key limitation: point density strongly affects feature representation. Although label-based re-sampling can improve segmentation in densely sampled distress regions, it introduces an unrealistic density bias and weakens generalization to real-world unlabeled data. As a result, the model may perform worse on unseen pavement point clouds. To address this issue, a color gradient-guided re-sampling strategy was proposed, in which sampling weights are assigned according to gradient magnitude. Without relying on labels, the method exploits intrinsic distress cues and alleviates the long-tail problem by increasing the sampling probability of gradient-salient regions.

The re-sampling strategy assigned higher weights to color gradient intervals with higher damage concentrations. These intervals were identified by analyzing the gradient distribution of the entire dataset. To reduce the effect of illumination variation, color gradient magnitudes were normalized and binned by quantiles. As illustrated in Figure 7, the number of damage-associated points increased progressively with higher gradient quantiles, and over 50% of damage-associated points lay above the 75th percentile of the gradient distribution.
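The quantile analysis behind Figure 7 can be sketched as follows: normalize the gradients, split them into equal-count quantile bins, and report the share of damage points in each bin. With four bins, the last entry is the share above the 75th percentile. The synthetic data (damage more likely at high gradient) only illustrates the computation and is not the study's data.

```python
import numpy as np


def damage_share_by_quantile(grad, is_damage, n_bins=4):
    """Fraction of all damage points that fall in each gradient-quantile bin.

    Gradients are min-max normalized first; bins are equal-count quantile bins.
    """
    g = (grad - grad.min()) / (grad.max() - grad.min() + 1e-12)
    edges = np.quantile(g, np.linspace(0.0, 1.0, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, g, side="right") - 1, 0, n_bins - 1)
    counts = np.bincount(bins[is_damage], minlength=n_bins)
    return counts / counts.sum()


rng = np.random.default_rng(0)
grad = rng.uniform(size=20_000)
is_damage = rng.uniform(size=20_000) < grad ** 4     # synthetic: damage likelier at high gradient
share = damage_share_by_quantile(grad, is_damage)
print(share.round(3))
```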

Leveraging color gradients as prior knowledge, an adaptive interval-based re-sampling strategy was implemented. This strategy dynamically adjusted sampling probabilities to emphasize gradient ranges that are more relevant to distress and practical in engineering applications. First, the enhancement interval (0.75, 0.99) in gradient quantiles was identified through empirical analysis of the gradient distribution; the top 1% was excluded to mitigate sensor noise. For each point cloud, the gradient values corresponding to the bounds of this interval were then calculated. Finally, the enhancement factor h_i was computed with sigmoid smoothing at both bounds, as expressed in Formulas (8)–(11).

\mathrm{sig}_{l} = \frac{1}{1 + e^{-k\,(G_{Ci} - G_{C\text{-}\mathrm{low}})}}

(8)

\mathrm{sig}_{r} = \frac{1}{1 + e^{-k\,(G_{C\text{-}\mathrm{up}} - G_{Ci})}}

(9)

h_{i} = \beta \times \mathrm{sig}_{l} \times \mathrm{sig}_{r} \times (1 + G_{Ci})

(10)

w_{i} = G_{Ci} \times (1 + h_{i})

(11)

where G_Ci is the normalized color gradient of point i; G_C-low and G_C-up are the gradient values at the lower and upper bounds of the enhancement interval; sig_l and sig_r are the left and right sigmoid factors; k is the steepness factor, which controls the slope of the sigmoid curves; β is the base enhancement factor; and w_i is the enhanced sampling weight of point i, which is normalized before sampling.
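Formulas (8)–(11) translate directly into vectorized NumPy. In this sketch the bounds G_C-low and G_C-up are taken as the 0.75 and 0.99 quantiles of each cloud's normalized gradients, and points are drawn without replacement in proportion to the normalized weights. Both choices, the tiny floor that keeps zero-gradient points drawable, and the function names are assumptions, not the authors' implementation.

```python
import numpy as np


def adaptive_interval_weights(g, k=15.0, beta=5.0, q_low=0.75, q_up=0.99):
    """Sampling probabilities following Formulas (8)-(11).

    g: normalized color gradients G_Ci of one point cloud, assumed in [0, 1].
    The interval bounds are per-cloud quantiles (an assumption).
    """
    g = np.asarray(g, dtype=float)
    g_low, g_up = np.quantile(g, [q_low, q_up])
    sig_l = 1.0 / (1.0 + np.exp(-k * (g - g_low)))   # Formula (8): rises past G_C-low
    sig_r = 1.0 / (1.0 + np.exp(-k * (g_up - g)))    # Formula (9): falls past G_C-up
    h = beta * sig_l * sig_r * (1.0 + g)             # Formula (10): enhancement factor
    w = g * (1.0 + h)                                # Formula (11): enhanced weight
    total = w.sum()
    if total <= 0:                                   # e.g. a flat, gradient-free cloud
        return np.full(len(g), 1.0 / len(g))
    return w / total


def resample(g, n_samples, rng=None, **kwargs):
    """Draw point indices without replacement according to the weights."""
    rng = np.random.default_rng() if rng is None else rng
    p = adaptive_interval_weights(g, **kwargs) + 1e-12  # keep zero-gradient points drawable
    p /= p.sum()
    return rng.choice(len(p), size=n_samples, replace=False, p=p)
```

Because the two sigmoids are multiplied, h is large only where both are close to 1, that is, inside the interval; k sets how sharply the enhancement switches on and off at the bounds, and β scales its height.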

3.3. Re-Weighting

Adjusting class weights in the loss function during training is an effective approach to the long-tail problem in deep learning. The standard cross-entropy loss (CE Loss) ignores differences in class frequency, so its gradients are dominated by majority classes. To increase the model's sensitivity to minority classes, the loss computation was modified to steer gradient updates toward minority classes, thereby improving performance on long-tail distributions.

In this paper, four loss functions were tested: Focal Loss [44], LDAM Loss [45], weighted CE Loss, and a weighted IoU fusion Loss designed for pavement distress segmentation. Focal Loss is a widely used re-weighting loss that extends the balanced cross-entropy loss with a modulating factor. This factor adjusts each sample's weight according to its classification difficulty, directing more attention to hard examples. Its formulation is given in Formula (12):

L_{\mathrm{FL}}(p_{t}) = -\alpha_{t} (1 - p_{t})^{\gamma} \log(p_{t})

(12)

where p_t represents the model’s predicted probability for the true class, α_t is the class-balancing weight, and γ is the focusing parameter.
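Formula (12) for per-point logits can be sketched in NumPy as follows. Averaging over points and indexing α by class are assumptions. With γ = 0 and every α equal to 1, the expression reduces to ordinary cross-entropy, which gives a quick sanity check.

```python
import numpy as np


def log_softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)   # shift for numerical stability
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def focal_loss(logits, labels, alpha, gamma=2.0):
    """Mean focal loss over points, Formula (12).

    logits: (N, K) raw scores; labels: (N,) class indices;
    alpha: (K,) class-balancing weights, looked up per point as alpha_t.
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    log_p_t = log_softmax(logits)[np.arange(len(labels)), labels]
    p_t = np.exp(log_p_t)
    alpha_t = np.asarray(alpha, dtype=float)[labels]
    # (1 - p_t)^gamma shrinks the contribution of well-classified points.
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * log_p_t))
```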

LDAM Loss addresses class imbalance by explicitly enforcing larger decision margins for minority classes during training, as shown in Formula (13). Unlike Focal Loss, which focuses on suppressing easy samples, LDAM prioritizes inter-class discriminability, reducing the tendency of the model to misclassify minority samples as majority classes.

L_{\mathrm{LDAM}}(x_{i}, y_{i}) = -\log \frac{\exp(z_{y_{i}} - \Delta_{y_{i}})}{\exp(z_{y_{i}} - \Delta_{y_{i}}) + \sum_{j \neq y_{i}} \exp(z_{j})}

(13)

where x_i and y_i are the input sample and its label, z_j is the predicted logit (unnormalized score) for class j, and Δ_j is the class-dependent margin,

\Delta_{j} = \frac{C}{n_{j}^{1/4}}

where n_j is the number of samples in class j and C is a hyperparameter.
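A NumPy sketch of Formula (13) follows, treating z as raw logits and computing Δ_j from per-class point counts. The value C = 0.5 is a placeholder, and the function name is hypothetical. Formula (13) as written has no logit-scaling factor, so none is applied here.

```python
import numpy as np


def ldam_loss(logits, labels, class_counts, C=0.5):
    """Mean LDAM loss, Formula (13), with margins Delta_j = C / n_j**(1/4).

    The margin is subtracted from the true-class logit only, so rare classes
    (small n_j) must win by a larger margin.
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    margins = C / np.asarray(class_counts, dtype=float) ** 0.25
    rows = np.arange(len(labels))
    z = logits.copy()
    z[rows, labels] -= margins[labels]
    z -= z.max(axis=1, keepdims=True)                 # numerical stability
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_p[rows, labels].mean())
```

Setting C = 0 removes all margins and recovers plain cross-entropy, which makes the effect of the margin easy to isolate.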

Besides these two loss functions, Weighted Cross-Entropy (WCE) Loss was also used, as expressed in Formula (14). It is a simple yet effective method for handling class imbalance: it assigns a fixed weight to each class in the standard cross-entropy loss, forcing the model to pay more attention to minority classes during training.

L_{\mathrm{WCE}} = -\frac{1}{N} \sum_{i=1}^{N} w_{y_{i}}\, y_{i} \log(p_{i})

(14)
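Formula (14) with fixed class weights is sketched below. The text does not state how the weights were chosen, so inverse class frequency is used purely as an example, and the function names are hypothetical. Note that the formula averages over the N points, whereas PyTorch's `CrossEntropyLoss` with a `weight` argument and `reduction='mean'` divides by the summed weights of the targets, so the two differ by a per-batch constant.

```python
import numpy as np


def inverse_frequency_weights(labels, n_classes):
    """Example class weights (assumed): total / (K * count_c)."""
    counts = np.bincount(np.asarray(labels), minlength=n_classes).astype(float)
    return counts.sum() / (n_classes * np.maximum(counts, 1.0))


def weighted_ce(logits, labels, class_weights):
    """Formula (14): -(1/N) * sum_i w_{y_i} * log p_i(y_i)."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    z = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    w = np.asarray(class_weights, dtype=float)[labels]
    return float(-(w * log_p[np.arange(len(labels)), labels]).mean())
```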

In point cloud segmentation, conventional accuracy metrics are easily dominated by majority classes, whereas IoU provides a more comprehensive evaluation because it penalizes both false positives and false negatives. Therefore, a weighted IoU fusion Loss (WIoU Loss) was designed, which augments WCE with a class-weighted IoU term to directly optimize segmentation overlap and boundary quality. Its formulation is given in Formula (15).

L_{\mathrm{W\text{-}IoU}} = (1 - \lambda) L_{\mathrm{WCE}} + \lambda \frac{1}{K} \sum_{k=1}^{K} w_{k} (1 - \mathrm{IoU}_{k})

(15)

where λ is the fusion coefficient that balances the WCE and IoU terms, K is the number of classes, IoU_k is the IoU of class k, and w_k is the weight of class k in the IoU term.
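Formula (15) needs a differentiable IoU during training. The sketch below assumes a "soft" IoU computed from softmax probabilities and one-hot targets; that choice, the placeholder λ = 0.5, and the function names are assumptions rather than details given in the text.

```python
import numpy as np


def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def wiou_loss(logits, labels, ce_weights, iou_weights, lam=0.5, eps=1e-6):
    """Formula (15): (1 - lam) * L_WCE + lam * (1/K) * sum_k w_k * (1 - IoU_k)."""
    p = softmax(np.asarray(logits, dtype=float))
    labels = np.asarray(labels)
    n, K = p.shape
    y = np.eye(K)[labels]                                  # one-hot targets
    p_true = np.clip(p[np.arange(n), labels], 1e-12, 1.0)
    l_wce = -(np.asarray(ce_weights, dtype=float)[labels] * np.log(p_true)).mean()
    inter = (p * y).sum(axis=0)                            # soft intersection per class
    union = (p + y - p * y).sum(axis=0)                    # soft union per class
    iou = (inter + eps) / (union + eps)
    l_iou = np.mean(np.asarray(iou_weights, dtype=float) * (1.0 - iou))
    return float((1.0 - lam) * l_wce + lam * l_iou)
```

Because the IoU term is computed per class and then averaged, a thin class such as cracks contributes to the loss on an equal footing with the dominant background class, regardless of its point count.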

3.4. Post-Processing and Geometric Measurement

After segmentation, each point in the pavement point cloud was assigned a class label. However, extracting instances directly from the point cloud led to fragmented instances and unstable geometric estimates. To improve the robustness of geometric measurement, the segmented distress points were projected onto a 2D grid in the pavement coordinate system, forming a 2.5D measurement framework. For cracks, the initial mask was dilated on the grid to reduce point cloud voids. A closing operation was then applied to connect short gaps and broken segments, using oriented line closing first for elongated structures, followed by disk closing with a smaller radius. Small holes were subsequently removed to eliminate discrete noise. Finally, endpoint bridging reconnected skeleton endpoints within a preset distance to repair discontinuities, as shown in Figure 8. This post-processing mainly restores topological connectivity for stable length estimation; width estimation remained based on the original segmentation to avoid the systematic widening introduced by post-processing.

Potholes are areal distresses, so their post-processing focuses on boundary completion and hole filling. After closing and hole filling, noisy connected components were removed to obtain the final pothole instances. For each distress instance mask, depth statistics, including the maximum depth, were computed on the 2.5D depth field.
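A minimal NumPy sketch of the crack clean-up steps (rasterization, dilation, oriented line closing, smaller disk closing) follows. The cell size (1 cm), line length (7 cells), the four orientations, and the radii are hypothetical values; taking the union of closings over several orientations is one assumed way of applying oriented line closing, and the hole-removal and skeleton endpoint-bridging steps are omitted for brevity. In practice, `scipy.ndimage` offers equivalent and faster binary morphology routines.

```python
import numpy as np


def rasterize(xy, cell=0.01):
    """Project (N, 2) pavement-plane coordinates of crack points onto a boolean grid."""
    ij = np.floor((xy - xy.min(axis=0)) / cell).astype(int)
    mask = np.zeros(tuple(ij.max(axis=0) + 1), dtype=bool)
    mask[ij[:, 0], ij[:, 1]] = True
    return mask


def disk(r):
    """Offsets of a digital disk of radius r (symmetric)."""
    return [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)
            if dy * dy + dx * dx <= r * r]


def line(length, angle_deg):
    """Offsets of a line through the origin, mirrored so it stays symmetric."""
    a = np.deg2rad(angle_deg)
    offsets = set()
    for t in range(length // 2 + 1):
        dy, dx = int(round(t * np.sin(a))), int(round(t * np.cos(a)))
        offsets |= {(dy, dx), (-dy, -dx)}
    return sorted(offsets)


def dilate(mask, offsets):
    """Binary dilation: OR of the mask shifted by every offset (False outside)."""
    r = max(max(abs(dy), abs(dx)) for dy, dx in offsets)
    h, w = mask.shape
    padded = np.pad(mask, r)
    out = np.zeros_like(mask)
    for dy, dx in offsets:
        out |= padded[r + dy:r + dy + h, r + dx:r + dx + w]
    return out


def erode(mask, offsets):
    """Binary erosion as the complement of dilating the complement."""
    return ~dilate(~mask, offsets)


def close(mask, offsets):
    """Closing (dilate, then erode), with a margin so the border adds no artifacts."""
    m = 2 * max(max(abs(dy), abs(dx)) for dy, dx in offsets)
    padded = np.pad(mask, m)
    closed = erode(dilate(padded, offsets), offsets)
    h, w = mask.shape
    return closed[m:m + h, m:m + w]


def clean_crack_mask(mask, pre_dilate=1, line_len=7,
                     angles=(0, 45, 90, 135), disk_r=1):
    """Dilate, take the union of oriented line closings, then a smaller disk closing."""
    m = dilate(mask, disk(pre_dilate)) if pre_dilate > 0 else mask.copy()
    lines = np.zeros_like(m)
    for angle in angles:
        lines |= close(m, line(line_len, angle))
    return close(lines, disk(disk_r))
```

A line element only bridges gaps aligned with it, which is why several orientations are tried before the small isotropic disk closing smooths what remains.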

4. Experimental Setup

The experiments were conducted using Python 3.9 and PyTorch 1.10 on an Intel® Xeon® Gold 5218 CPU and an NVIDIA GeForce RTX 2080 Ti GPU. During training, the initial learning rate was set to 0.001, with a lower bound of 1 × 10⁻⁵ to ensure non-zero updates. The initial momentum was set to 0.1 with a decay factor of 0.5 to regulate gradient update inertia and mitigate training oscillations; the momentum decayed periodically during training, gradually reducing its influence. The evaluation metrics were precision, recall, accuracy, F1-score, and mean intersection over union (mIoU), defined in Formulas (16)–(20).
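The text gives the initial values and lower bound but not the decay intervals. The sketch below assumes step-wise decay every 10 epochs, a learning-rate decay rate of 0.7, and a momentum floor of 0.01, all hypothetical. This pattern resembles schedules in widely shared PointNet++ PyTorch training scripts, where the decayed "momentum" is the batch-normalization momentum; reading the paper's momentum that way is itself an assumption.

```python
def learning_rate(epoch, lr0=1e-3, decay=0.7, step=10, lr_min=1e-5):
    """Step decay clipped at the stated lower bound of 1e-5 (decay, step assumed)."""
    return max(lr0 * decay ** (epoch // step), lr_min)


def momentum(epoch, m0=0.1, decay=0.5, step=10, m_min=0.01):
    """Periodic decay from the stated initial value 0.1 with factor 0.5 (step, floor assumed)."""
    return max(m0 * decay ** (epoch // step), m_min)


for epoch in (0, 10, 20, 60):
    print(epoch, learning_rate(epoch), momentum(epoch))
```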

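\mathrm{Precision} = \frac{TP}{TP + FP}

(16)

\mathrm{Recall} = \frac{TP}{TP + FN}

(17)

\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}

(18)

\mathrm{F1\text{-}score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}

(19)

\mathrm{mIoU} = \frac{1}{n} \sum_{i=1}^{n} \frac{TP_{i}}{TP_{i} + FP_{i} + FN_{i}}

(20)

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively, and n is the number of classes.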
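Formulas (16)–(20) can all be read off a confusion matrix. The sketch below computes per-class precision, recall, F1, and IoU, then mIoU and overall accuracy. For more than two classes, overall accuracy is taken as the trace of the confusion matrix over the total number of points, the usual multi-class counterpart of Formula (18). The function name and input layout are assumptions.

```python
import numpy as np


def segmentation_metrics(y_true, y_pred, n_classes):
    """Per-class precision/recall/F1/IoU, overall accuracy, and mIoU (Formulas 16-20)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Confusion matrix: rows = ground truth, columns = prediction.
    cm = np.bincount(n_classes * y_true + y_pred,
                     minlength=n_classes ** 2).reshape(n_classes, n_classes)
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp          # predicted as class c but labeled otherwise
    fn = cm.sum(axis=1) - tp          # labeled class c but predicted otherwise
    precision = tp / np.maximum(tp + fp, 1)
    recall = tp / np.maximum(tp + fn, 1)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    iou = tp / np.maximum(tp + fp + fn, 1)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "iou": iou,
        "miou": float(iou.mean()),
        "accuracy": float(tp.sum() / cm.sum()),   # multi-class overall accuracy
    }


metrics = segmentation_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_classes=3)
print(metrics["miou"], metrics["accuracy"])
```

Since intact pavement dominates the point count, overall accuracy stays high even when crack IoU is poor, which is why mIoU and per-class F1 carry most of the comparison in this study.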

5. Results

5.1. Re-Sampling and Re-Weighting

Beyond network architecture optimization, a dual strategy of re-sampling and re-weighting was adopted. The two re-sampling hyperparameters, k and β, were determined by a grid search over k ∈ {10, 15, 20} and β ∈ {3, 5}, using the baseline network without Geo-Sem DSFF or the improved feature propagation layers. As shown in Table 1, the combination k = 15, β = 5 was chosen because it yielded the most sampled distress points and the highest mIoU. Compared with random sampling, the number of sampled pothole and crack points increased by 90.0% and 60.8%, respectively. Note that k and β are not intended to be universal constants; they should be adapted to the target dataset because gradient distributions and imbalance ratios differ.
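The selection of k = 15, β = 5 can be checked directly against Table 1. The snippet copies the six adaptive-interval rows and confirms that one configuration maximizes mIoU and both sampled-point counts; the variable names are illustrative only.

```python
# (k, beta): (pothole points, crack points, mIoU %), copied from Table 1.
table1 = {
    (10, 3): (74_436, 194_452, 67.55),
    (15, 3): (75_720, 203_616, 68.26),
    (20, 3): (76_778, 202_289, 68.03),
    (10, 5): (77_445, 213_447, 67.96),
    (15, 5): (79_746, 217_162, 68.81),
    (20, 5): (78_844, 209_452, 67.82),
}

best_miou = max(table1, key=lambda kb: table1[kb][2])
best_pothole = max(table1, key=lambda kb: table1[kb][0])
best_crack = max(table1, key=lambda kb: table1[kb][1])
print(best_miou, best_pothole, best_crack)  # (15, 5) (15, 5) (15, 5)
```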

Five loss functions were evaluated using the baseline network and the adaptive interval sampling strategy, as shown in Figure 9. Overall accuracy differed only slightly among them. WIoU Loss achieved the highest mIoU, followed by Focal Loss. Compared with CE Loss, all four alternative losses improved pothole segmentation and increased the pothole F1-score. For crack segmentation, only WIoU Loss outperformed CE Loss; the other loss functions did not improve recognition of this minority class. Although cracks contained more points than potholes, their thin and discontinuous structures made them harder to segment. The superior performance of WIoU Loss suggests that IoU-based optimization is more effective than purely point-wise losses for this task and that its class-weighting mechanism better handles class imbalance.

5.2. Model Testing

PointPaveSeg was trained with the re-sampling and re-weighting strategies and evaluated on the test set. It achieved 78.45% mIoU, F1-scores of 88.93% for potholes and 74.20% for cracks, and 95.43% overall accuracy. As shown in Figure 10a, the model segmented potholes successfully, identifying their depressed interior regions while keeping edge delineation clear and intact. For the more challenging crack segmentation task, PointPaveSeg segmented most crack points (Figure 10b). Although cracks show little variation in spatial coordinates, the model captured their morphology and dimensions reasonably well, though minor discontinuities persisted in some segmented cracks.

As illustrated in Figure 10c, the model remained robust under complex conditions involving interconnected cracks and potholes with large scale variations. Notably, it distinguished adjacent cracks and potholes at their interfaces without visible misclassification, with pothole segmentation remaining more accurate than crack segmentation. These results indicate that the model can delineate damaged areas in large-scale pavement scenes while separating neighboring distresses, supporting its use in practical inspection applications.

5.3. Ablation Test

The hierarchical design, comprising Geo-Sem DSFF and the improved feature propagation layers, aimed to strengthen the network's feature extraction and representation. An ablation test was conducted to validate each component; the results are listed in Table 2, and the same network numbers are used in Figures 11 and 12. Because the long-tailed dataset is dominated by intact pavement, all models achieved high overall accuracy, so this metric does not reflect true segmentation performance. The results show that crack segmentation was considerably more challenging than pothole segmentation. The proposed network improved performance for both distress categories, most notably a 7.88 percentage-point gain in crack F1-score, and segmented more complete crack morphology in the 3D point clouds.

The proposed Geo-Sem DSFF module improved multiple metrics, raising mIoU by 3.73 percentage points. By extracting and explicitly encoding geometric features, the model captured subtle spatial variations in crack point clouds, thereby enhancing crack perception. EdgeConv operated at higher feature levels, reinforcing global structural representation and compensating for missing details, while GradientConv strengthened low-level features using local geometric properties. Both modules improved mIoU and the F1-scores for potholes and cracks, though to different degrees.

Figure 11a shows the training loss. The baseline network showed rapid initial loss reduction but converged to a higher loss than the optimized versions, with significant oscillations, indicating instability and insufficient capacity for the target task. The successive improvements reduced the training loss substantially; with the IoU loss incorporated, the final loss stabilized at around 0.1. Adding the global interaction module to the feature propagation layers effectively reduced loss oscillations, suggesting that global feature fusion promotes training stability. Figure 11b shows the mIoU, the primary evaluation metric in this study, which reflects the effect of each optimization. Compared with the baseline network, the improved feature extraction raised mIoU by 3.73 percentage points, and the additional edge-aware feature propagation layers further increased it to 78.45%.

As illustrated in Figure 12, the baseline network had two critical limitations: (1) false-positive identification of intact pavement adjacent to potholes as damaged areas, and (2) scale-sensitive misclassification, where smaller potholes were erroneously segmented as cracks. In addition, it captured only macroscopic cracks and failed to characterize fine crack morphology. The task-specific improvements proposed for pavement distress segmentation substantially enhanced the model's ability to distinguish pavement distress, partially mitigated the adverse effects of severe class imbalance in the dataset, and enabled automated segmentation.

5.4. Comparative Test

For 3D pavement data, multi-class distress segmentation corresponds most closely to point cloud semantic segmentation in deep learning. To evaluate the proposed model, experiments were conducted on the pavement point cloud dataset, benchmarking against classical and state-of-the-art models including PointNet [18], PointNet++ [19], PointNeXt [22], PointTransformer [21], and PointMamba [46]. PointNet is the pioneering work in point cloud deep learning, learning directly from raw points with shared per-point MLPs and symmetric max pooling. PointNet++ enhanced local feature aggregation through hierarchical spatial partitioning. PointNeXt revisited PointNet++ and improved its training strategies and architectural robustness via the InvResMLP block. PointTransformer applies self-attention mechanisms, originally developed for natural language processing, to point clouds. PointMamba applies state space models to serialized point sequences.

As shown in Table 3, PointPaveSeg performed competitively against both classical and recent point cloud deep learning models under both evaluation settings. Without the long-tail class imbalance mitigation strategy, PointPaveSeg achieved the best mIoU (73.64%), the highest crack F1-score (71.51%), and the highest overall accuracy (94.64%), indicating that the architecture itself is well suited to pavement distress segmentation. With the mitigation strategy, all models gained performance to varying extents, confirming the importance of addressing the severe class imbalance in pavement scenes. In this setting, PointPaveSeg achieved the best mIoU (78.45%) and the highest crack F1-score (74.20%), while PointMamba obtained a slightly higher pothole F1-score and overall accuracy. These results suggest that PointPaveSeg is particularly effective for the more challenging crack segmentation task; as shown in Figure 13, its crack F1-score exceeded that of PointNet by more than 20%. PointPaveSeg also has fewer parameters than the similarly performing PointMamba and a shorter inference time under the current task conditions (batch size = 4; 102,400 points). Overall, the comparative results support the effectiveness of the proposed network and its practical potential for pavement inspection.

5.5. Field Evaluation

Automated segmentation of pavement distress ultimately serves pavement inspection, so a field experiment was conducted to evaluate the proposed method on a 600 m section of Huanansi Road in Harbin with significant pavement damage. An on-site survey and manual measurements recorded the locations and geometric characteristics of the distresses, as shown in Figure 14. The test section contained 17 potholes and 40 cracks at the centimeter level. Data were collected with the method described above: continuous pavement images were captured from a moving vehicle and used to reconstruct the pavement point cloud. PointPaveSeg was then applied for distress segmentation, and the labeled point cloud was post-processed to extract the geometric features of the distresses. All cracks and potholes were detected, with no missed instances. In our measurement protocol, the surrounding intact pavement at the pothole rim served as the local reference surface for each pothole. In the manual field measurement, pothole depth was taken as the maximum of the five deepest sampled points. In the point cloud, the depth of each point within a pothole instance was computed and rasterized onto a grid, and the maximum depth was extracted from the resulting depth grid. Figure 15 compares the manually measured data with the data extracted automatically by the proposed method.
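The point cloud side of the depth protocol can be sketched as follows. The sketch assumes that the cloud has already been plane-corrected so that z points away from the pavement surface, that rim points are available for each pothole, that the local reference surface is a least-squares plane through those rim points, and that per-point depths are averaged within each grid cell before the maximum is taken. The 1 cm cell size and the function names are hypothetical.

```python
import numpy as np


def fit_reference_plane(rim_xyz):
    """Least-squares plane z = a*x + b*y + c through intact rim points."""
    A = np.column_stack([rim_xyz[:, 0], rim_xyz[:, 1], np.ones(len(rim_xyz))])
    (a, b, c), *_ = np.linalg.lstsq(A, rim_xyz[:, 2], rcond=None)
    return a, b, c


def max_pothole_depth(pothole_xyz, rim_xyz, cell=0.01):
    """Per-point depth below the rim plane, averaged per grid cell, then maximized."""
    a, b, c = fit_reference_plane(rim_xyz)
    x, y, z = pothole_xyz[:, 0], pothole_xyz[:, 1], pothole_xyz[:, 2]
    depth = (a * x + b * y + c) - z                  # positive below the reference surface
    xy = pothole_xyz[:, :2]
    ij = np.floor((xy - xy.min(axis=0)) / cell).astype(int)
    flat = ij[:, 0] * (ij[:, 1].max() + 1) + ij[:, 1]   # linear cell index
    sums = np.bincount(flat, weights=depth)
    counts = np.bincount(flat)
    occupied = counts > 0
    return float((sums[occupied] / counts[occupied]).max())
```

Averaging within a cell before taking the maximum damps single-point outliers, which matters where the pothole bottom is sparsely sampled.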

The largest absolute errors were observed for cracks, with a maximum exceeding 0.2 m, attributable to the variability in crack morphology and to errors introduced by the connection steps in post-processing. In terms of relative error, crack width and pothole depth showed the highest values, with interquartile ranges of 6.11–15.19% and 9.62–16%, respectively; these larger relative errors arise mainly from the inherently small magnitudes being measured. Bland–Altman analysis was used to assess agreement (Figure 16), examining systematic bias and the limits of agreement (LoA). For crack length, the 95% LoA was (−10.80%, 7.03%), indicating slight underestimation, with most differences within approximately ±10%. Crack width showed higher uncertainty, with considerably wider LoA than crack length, owing to its small scale and sensitivity to segmentation boundary noise. In the current framework, crack width is therefore more suitable as an auxiliary indicator for severity assessment than as a standalone criterion. For potholes, most measurements lay within the 95% LoA, with length and width limits of around (−10%, 15%). Pothole depth tended to be slightly overestimated, influenced by the small magnitude of depth values and the sparsity of points at the pothole bottom. Overall, the automated results agreed well with manual measurements.
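The Bland–Altman statistics reported above (bias and 95% limits of agreement) can be computed as shown. The sketch expresses each difference as a percentage of the mean of the two methods and uses 1.96 sample standard deviations; the text does not state whether differences were normalized by the mean or by the manual value, so this convention, like the function name, is an assumption.

```python
import numpy as np


def bland_altman_percent(automatic, manual):
    """Bias and 95% limits of agreement of percentage differences (auto - manual)."""
    auto = np.asarray(automatic, dtype=float)
    man = np.asarray(manual, dtype=float)
    # Difference relative to the mean of both methods (assumed convention).
    diff_pct = 100.0 * (auto - man) / ((auto + man) / 2.0)
    bias = float(diff_pct.mean())
    sd = float(diff_pct.std(ddof=1))          # sample standard deviation
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)
```

A negative bias with the interval shifted below zero corresponds to the slight underestimation reported for crack length.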

6. Discussion

This work shows that point cloud-based analysis can provide 3D geometric metrics such as pothole depth and volume. These metrics are useful for severity grading and maintenance planning because they describe actual surface deformation rather than appearance alone. The field evaluation indicates that crack measurements have larger errors than pothole measurements, mainly because cracks have diverse shapes and thin boundaries, and the post-processing connection step can introduce additional uncertainty. Crack width is also sensitive to segmentation boundary noise, and its small magnitude amplifies the relative error. From an engineering perspective, crack width is currently more reliable as an auxiliary severity indicator used together with crack length and morphology than as a standalone criterion. In contrast, potholes usually have clearer boundaries, so their length and width estimates are more stable; depth errors come mainly from sparse points near the bottom and the small absolute depth values. The current setting targets centimeter-level cracks and potholes, because very small cracks approach the effective resolution of low-cost acquisition and are more affected by noise. This is a practical trade-off between deployability and measurement precision.

Limitations remain. A direct point cloud-to-point cloud comparison with a LiDAR-based reference system could further quantify geometric accuracy under different conditions. This study focused on engineering usability and validated the extracted distress geometry against manual field measurements, which provide a practical reference for maintenance-related metrics; LiDAR-based benchmarking and hybrid evaluation will be addressed in future work. Segmentation boundary quality and crack width estimation still need improvement, and more pavement and distress types should be included. Further improving segmentation performance is a key direction for future research, including expanding the dataset and exploring more powerful feature extractors and hybrid architectures. On the acquisition side, future work will integrate a compact vehicle-mounted LiDAR sensor with a camera to form a lightweight pavement data acquisition system, aiming for a better balance between acquisition accuracy and cost and offering a cost-effective alternative to expensive mobile mapping platforms.

7. Conclusions

This paper presents a lightweight workflow for 3D pavement distress inspection using consumer-grade imaging and point cloud learning. An engineering-oriented pavement point cloud dataset was constructed. The proposed PointPaveSeg network for multi-class distress segmentation uses a dual-stream encoder to separate geometric and semantic features and a hierarchical decoder for feature propagation. A long-tail class imbalance mitigation strategy, combining adaptive re-sampling with a weighted fusion loss, was introduced to strengthen the representation of distress points. Experiments show that PointPaveSeg achieves 78.45% mIoU and 95.43% overall accuracy, with F1-scores of 88.93% for potholes and 74.20% for cracks. Field evaluation demonstrates good agreement with manual measurements; most pothole measurements lay within the 95% LoA, with length and width limits of around (−10%, 15%). This work advances 3D pavement inspection and distress measurement by providing a lightweight, intelligent solution with significantly reduced hardware costs. Future research will expand dataset diversity and distress categories and optimize the network architecture for better segmentation performance.

Author Contributions

P.C.: Software, Data Curation, Formal Analysis, and Writing—Original Draft. J.Y.: Conceptualization, Methodology, Funding Acquisition, and Supervision. Z.P.: Methodology and Validation. Z.L.: Investigation and Data Curation. D.J.: Data Curation and Visualization. A.A.: Data Curation. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Jilin Provincial Transport Scientific Research Institute. Program name: Long-Life Maintenance Technologies for Pavement of Ordinary Trunk Highways in Jilin Province (No. GNCWSSJH20240036).

Data Availability Statement

The datasets generated in this study are available from the corresponding author upon reasonable request.

Acknowledgments

The authors express their sincere appreciation for the funding support.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Kothai, R.; Prabakaran, N.; Srinivasa Murthy, Y.V.; Reddy Cenkeramaddi, L.; Kakani, V. Pavement Distress Detection, Classification, and Analysis Using Machine Learning Algorithms: A Survey. IEEE Access 2024, 12, 126943–126960.
2. Mathavan, S.; Kamal, K.; Rahman, M. A Review of Three-Dimensional Imaging Technologies for Pavement Distress Detection and Measurements. IEEE Trans. Intell. Transp. Syst. 2015, 16, 2353–2362.
3. Faisal, A.; Gargoum, S. Cost-effective LiDAR for pothole detection and quantification using a low-point-density approach. Autom. Constr. 2025, 172, 106006.
4. Zhang, D.; Zou, Q.; Lin, H.; Xu, X.; He, L.; Gui, R.; Li, Q. Automatic pavement defect detection using 3D laser profiling technology. Autom. Constr. 2018, 96, 350–365.
5. Chen, X.; Li, J.; Huang, S.; Cui, H.; Liu, P.; Sun, Q. An Automatic Concrete Crack-Detection Method Fusing Point Clouds and Images Based on Improved Otsu’s Algorithm. Sensors 2021, 21, 1581.
6. Liu, R.; Yang, J.; Ren, H.; Cong, B.; Chang, C. Research on a pavement pothole extraction method based on vehicle-borne continuous laser scanning point cloud. Meas. Sci. Technol. 2022, 33, 115204.
7. Song, H.; Zhang, J.; Zuo, J.; Liang, X.; Han, W.; Ge, J. Subsidence Detection for Urban Roads Using Mobile Laser Scanner Data. Remote Sens. 2022, 14, 2240.
8. Fan, R.; Ozgunalp, U.; Wang, Y.; Liu, M.; Pitas, I. Rethinking Road Surface 3-D Reconstruction and Pothole Detection: From Perspective Transformation to Disparity Map Segmentation. IEEE Trans. Cybern. 2022, 52, 5799–5808.
9. Khan, N.H.R.; Kumar, S.V. Terrestrial LiDAR derived 3D point cloud model, digital elevation model (DEM) and hillshade map for identification and evaluation of pavement distresses. Results Eng. 2024, 23, 102680.
10. Sun, Q.; Qiao, L.; Shen, Y. Pavement Potholes Quantification: A Study Based on 3D Point Cloud Analysis. IEEE Access 2025, 13, 12945–12955.
11. Li, J.; Liu, T.; Wang, X. Advanced pavement distress recognition and 3D reconstruction by using GA-DenseNet and binocular stereo vision. Measurement 2022, 201, 111760.
12. Zhang, H.; Wang, C.; Tian, S.; Lu, B.; Zhang, L.; Ning, X.; Bai, X. Deep learning-based 3D point cloud classification: A systematic survey and outlook. Displays 2023, 79, 102456.
13. Tang, X.; Huang, F.; Li, C.; Ban, D. A survey on end-to-end point cloud learning. IET Image Process. 2023, 17, 1307–1321.
14. Su, H.; Maji, S.; Kalogerakis, E.; Learned-Miller, E. Multi-View Convolutional Neural Networks for 3D Shape Recognition. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 945–953.
15. Yu, T.; Meng, J.; Yuan, J. Multi-View Harmonized Bilinear Network for 3D Object Recognition. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 186–194.
16. Li, Y.; Pirk, S.; Su, H.; Qi, C.R.; Guibas, L.J. FPNN: Field Probing Neural Networks for 3D Data. Adv. Neural Inf. Process. Syst. 2016, 29, 9.
17. Wang, L.; Huang, Y.; Shan, J.; He, L. MSNet: Multi-Scale Convolutional Network for Point Cloud Classification. Remote Sens. 2018, 10, 612.
18. Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation; Cornell University Library: Ithaca, NY, USA, 2017.
19. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. arXiv 2017, arXiv:1706.02413.
20. Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic Graph CNN for Learning on Point Clouds. ACM Trans. Graph. 2018, 38, 1–12.
21. Guo, M.; Cai, J.; Liu, Z.; Mu, T.; Martin, R.R.; Hu, S. PCT: Point cloud transformer. Comput. Vis. Media 2021, 7, 187–199.
22. Qian, G.; Li, Y.; Peng, H.; Mai, J.; Hammoud, H.A.A.K.; Elhoseiny, M.; Ghanem, B. PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies. Adv. Neural Inf. Process. Syst. 2022, 35, 23192–23204.
23. Shim, S.; Kim, J.; Cho, G.; Lee, S. Stereo-vision-based 3D concrete crack detection using adversarial learning with balanced ensemble discriminator networks. Struct. Health Monit. 2023, 22, 1353–1375.
24. Ma, L.; Li, J. SD-GCN: Saliency-based dilated graph convolution network for pavement crack extraction from 3D point clouds. Int. J. Appl. Earth Obs. 2022, 111, 102836.
25. Feng, H.; Ma, L.; Yu, Y.; Chen, Y.; Li, J. SCL-GCN: Stratified Contrastive Learning Graph Convolution Network for pavement crack detection from mobile LiDAR point clouds. Int. J. Appl. Earth Obs. 2023, 118, 103248.
26. Azhari, F.; Sennersten, C.; Milford, M.; Peynot, T. PointCrack3D: Crack Detection in Unstructured Environments using a 3D-Point-Cloud-Based Deep Neural Network. arXiv 2021, arXiv:2111.11615.
27. Fan, J.; Song, W.; Zhang, J.; Sun, S.; Jia, G.; Jin, G. PAN: Improved PointNet++ for Pavement Crack Information Extraction. Electronics 2024, 13, 3340.
28. Pascucci, N.; Dominici, D.; Habib, A. LiDAR-Based Road Cracking Detection: Machine Learning Comparison, Intensity Normalization, and Open-Source WebGIS for Infrastructure Maintenance. Remote Sens. 2025, 17, 1543.
29. Feng, H.; Li, W.; Luo, Z.; Chen, Y.; Fatholahi, S.N.; Cheng, M.; Wang, C.; Junior, J.M.; Li, J. GCN-Based Pavement Crack Detection Using Mobile LiDAR Point Clouds. IEEE Trans. Intell. Transp. Syst. 2022, 23, 11052–11061.
30. Feng, H.; Li, W.; Ma, L.; Chen, Y.; Guan, H.; Yu, Y.; Junior, J.M.; Li, J. Crack-U2 Net: Multiscale Feature Learning Network for Pavement Crack Detection From Large-Scale MLS Point Clouds. IEEE Trans. Intell. Transp. Syst. 2024, 25, 17952–17964.
31. Wang, N.; Dong, J.; Fang, H.; Li, B.; Zhai, K.; Ma, D.; Shen, Y.; Hu, H. 3D reconstruction and segmentation system for pavement potholes based on improved structure-from-motion (SFM) and deep learning. Constr. Build. Mater. 2023, 398, 132499.
32. Dong, J.; Wang, N.; Fang, H.; Lu, H.; Ma, D.; Hu, H. Automatic augmentation and segmentation system for three-dimensional point cloud of pavement potholes by fusion convolution and transformer. Adv. Eng. Inform. 2024, 60, 17.
33. Zhang, Y.; Kang, B.; Hooi, B.; Yan, S.; Feng, J. Deep Long-Tailed Learning: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 10795–10816.
34. Chen, W.; Yang, K.; Yu, Z.; Shi, Y.; Chen, C.L.P. A survey on imbalanced learning: Latest research, applications and future directions. Artif. Intell. Rev. 2024, 57, 51.
35. Zhang, Z.; Pfister, T. Learning Fast Sample Re-weighting Without Reward Data. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 705–714.
36. Mazzini, D.; Napoletano, P.; Piccoli, F.; Schettini, R. A Novel Approach to Data Augmentation for Pavement Distress Segmentation. Comput. Ind. 2020, 121, 103225.
37. Xu, Z.; Dai, Z.; Sun, Z.; Zuo, C.; Song, H.; Yuan, C. Enhancing Pavement Distress Detection Using a Morphological Constraints-Based Data Augmentation Method. Coatings 2023, 13, 764.
38. Wei, T.; Shi, J.; Tu, W.; Li, Y. Robust Long-Tailed Learning under Label Noise. Front. Comput. Sci. 2026, 20, 2001321.
39. Panella, F.; Lipani, A.; Boehm, J. Semantic segmentation of cracks: Data challenges and architecture. Autom. Constr. 2022, 135, 104110.
40. Fan, Y.; Hu, Z.; Li, Q.; Sun, Y.; Chen, J.; Zhou, Q. CrackNet: A Hybrid Model for Crack Segmentation with Dynamic Loss Function. Sensors 2024, 24, 7134.
41. Shi, J.; Wei, T.; Zhou, Z.; Shao, J.; Han, X.; Li, Y. Long-Tail Learning with Foundation Model: Heavy Fine-Tuning Hurts. arXiv 2024, arXiv:2309.10019.
42. Li, M.; Liu, Y.; Lu, Y.; Zhang, Y.; Cheung, Y.; Huang, H. Improving Visual Prompt Tuning by Gaussian Neighborhood Minimization for Long-Tailed Visual Recognition. Adv. Neural Inf. Process. Syst. 2024, 37, 103985–104009.
43. Zhang, J.; Sun, S.; Song, W.; Li, Y.; Teng, Q. A novel convolutional neural network for enhancing the continuity of pavement crack detection. Sci. Rep. 2024, 14, 20.
44. Lin, T.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal Loss for Dense Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327.
45. Kang, H.; Vu, T.; Yoo, C.D. Learning Imbalanced Datasets With Maximum Margin Loss. In Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 19–22 September 2021; pp. 1269–1273.
46. Liang, D.; Zhou, X.; Xu, W.; Zhu, X.; Zou, Z.; Ye, X.; Tan, X.; Bai, X. PointMamba: A Simple State Space Model for Point Cloud Analysis. Adv. Neural Inf. Process. Syst. 2024, 37, 32653–32677.

Figure 1. Equipped DJI Osmo Action 4.

Figure 2. Point cloud generation using SfM.

Figure 3. Point cloud plane fitting and perspective correction.

Figure 4. Network structure of PointPaveSeg.

Figure 5. Structure of Geo-Sem DSFF.

Figure 6. Color gradient distributions of typical point clouds.

Figure 7. Class proportion of damage-associated points.

Figure 8. Post-processing of crack instances.

Figure 9. Comparative test results of loss functions.

Figure 10. Segmentation results of PointPaveSeg.

Figure 11. Metric curves in ablation test.

Figure 12. Segmentation results in the ablation test.

Figure 13. Crack segmentation results: comparison of different networks.

Figure 14. Test road and distress measuring.

Figure 15. Measurement error distribution.

Figure 16. Bland–Altman plot.

Table 1. Determination of hyperparameters for the re-sampling strategy.

Sampling Strategy	Hyperparameters	Potholes (Points Sampled in Testing Set)	Cracks (Points Sampled in Testing Set)	Accuracy (%)	mIoU (%)
Random sampling		41,962	135,020	96.70	67.84
Adaptive interval sampling	k = 10, β = 3	74,436	194,452	96.21	67.55
	k = 15, β = 3	75,720	203,616	95.88	68.26
	k = 20, β = 3	76,778	202,289	95.32	68.03
	k = 10, β = 5	77,445	213,447	96.92	67.96
	k = 15, β = 5	79,746	217,162	94.56	68.81
	k = 20, β = 5	78,844	209,452	95.48	67.82
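Table 1 can be read as a trade-off between minority-class coverage and overall accuracy. The short Python sketch below re-uses only the tabulated values (the dictionary and function names are illustrative, not from the paper) to find the highest-mIoU setting and to compute how many more pothole and crack points it samples than random sampling.

```python
# Values transcribed from Table 1: points sampled in the testing set and metrics.
RANDOM = {"potholes": 41_962, "cracks": 135_020, "accuracy": 96.70, "miou": 67.84}

# Adaptive interval sampling, keyed by (k, beta):
# (pothole points, crack points, accuracy %, mIoU %)
ADAPTIVE = {
    (10, 3): (74_436, 194_452, 96.21, 67.55),
    (15, 3): (75_720, 203_616, 95.88, 68.26),
    (20, 3): (76_778, 202_289, 95.32, 68.03),
    (10, 5): (77_445, 213_447, 96.92, 67.96),
    (15, 5): (79_746, 217_162, 94.56, 68.81),
    (20, 5): (78_844, 209_452, 95.48, 67.82),
}


def sampling_gain(setting):
    """Ratio of minority-class points sampled relative to random sampling."""
    potholes, cracks, _, _ = ADAPTIVE[setting]
    return potholes / RANDOM["potholes"], cracks / RANDOM["cracks"]


best = max(ADAPTIVE, key=lambda s: ADAPTIVE[s][3])       # highest mIoU
worst_acc = min(ADAPTIVE, key=lambda s: ADAPTIVE[s][2])  # lowest accuracy
pot_gain, crack_gain = sampling_gain(best)
print(f"best mIoU at (k, beta) = {best}: "
      f"{pot_gain:.2f}x pothole points, {crack_gain:.2f}x crack points")
print(f"lowest accuracy at (k, beta) = {worst_acc}")
```

For these values, the highest-mIoU setting (k = 15, β = 5) draws about 1.90× as many pothole points and 1.61× as many crack points as random sampling, yet it also has the lowest accuracy in the table (94.56%), plausibly because overall accuracy is dominated by the majority non-distress class.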

Table 2. Results of the ablation test.

Number	Network	mIoU (%)	F1-Pot (%)	F1-Crack (%)	Accuracy (%)
①	Baseline	71.30	82.75	66.32	94.47
②	+Geo-Sem DSFF	75.03	85.72	71.11	94.54
③	+Geo-Sem DSFF + GradientConv	76.89	87.19	73.79	95.27
④	+Geo-Sem DSFF + GradientConv + EdgeConv	77.62	88.51	74.08	95.06
⑤	+Geo-Sem DSFF + GradientConv + EdgeConv + Global interaction	78.45	88.93	74.20	95.43
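Because each row of Table 2 adds one module on top of the previous row, the contribution of each step is the difference between consecutive mIoU values. The Python sketch below computes those increments from the tabulated numbers only; the row labels are shortened and the names are illustrative.

```python
# Cumulative ablation rows from Table 2: (added module, mIoU %).
# Labels are shortened; each row includes all modules from earlier rows.
ABLATION = [
    ("Baseline", 71.30),
    ("+Geo-Sem DSFF", 75.03),
    ("+GradientConv", 76.89),
    ("+EdgeConv", 77.62),
    ("+Global interaction", 78.45),
]


def incremental_gains(rows):
    """Return (module, mIoU gain over the previous row) for each added module."""
    return [
        (name, round(miou - prev_miou, 2))
        for (_, prev_miou), (name, miou) in zip(rows, rows[1:])
    ]


gains = incremental_gains(ABLATION)
total = round(ABLATION[-1][1] - ABLATION[0][1], 2)
for name, gain in gains:
    print(f"{name:<22} +{gain:.2f} mIoU ({gain / total:.0%} of total)")
print(f"{'total':<22} +{total:.2f} mIoU")
```

With the tabulated values, the Geo-Sem DSFF step contributes 3.73 of the 7.15 mIoU points (about 52%), while the remaining three modules add 1.86, 0.73, and 0.83 points. Since the rows are cumulative, these increments depend on the order in which modules were added and should not be read as independent contributions.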

Table 3. Results of comparative tests.

(a) Comparative tests without the long-tail class imbalance mitigation strategy
Network	mIoU (%)	F1-Pot (%)	F1-Crack (%)	Accuracy (%)
PointNet	64.25	74.15	52.01	93.52
PointNet++	66.89	76.62	62.15	93.99
PointNeXt	69.60	78.98	66.06	94.49
PointTransformer	70.49	80.39	66.65	94.61
PointMamba	72.27	84.76	68.10	94.51
PointPaveSeg	73.64	83.03	71.51	94.64
(b) Comparative tests with the long-tail class imbalance mitigation strategy
Network	mIoU (%)	F1-Pot (%)	F1-Crack (%)	Accuracy (%)	Parameters (M)	Inference Time (s/item)
PointNet	66.41	80.49	54.17	94.95	3.625	1.04
PointNet++	71.47	82.12	67.26	94.70	1.744	0.87
PointNeXt	74.65	85.13	69.38	94.66	1.807	0.94
PointTransformer	73.91	85.72	68.43	95.09	3.094	3.26
PointMamba	77.46	89.74	70.22	95.65	12.558	2.06
PointPaveSeg	78.45	88.93	74.20	95.43	3.880	1.11
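Comparing Table 3(a) and 3(b) row by row shows how much each network benefits from the long-tail strategy, and the last two columns of 3(b) allow a cost comparison with the strongest baseline. The Python sketch below does this arithmetic on the tabulated values only; the dictionary names are illustrative.

```python
# mIoU (%) from Table 3: (without strategy, with strategy) per network.
MIOU = {
    "PointNet": (64.25, 66.41),
    "PointNet++": (66.89, 71.47),
    "PointNeXt": (69.60, 74.65),
    "PointTransformer": (70.49, 73.91),
    "PointMamba": (72.27, 77.46),
    "PointPaveSeg": (73.64, 78.45),
}

# Parameters (M) and inference time (s/item) from Table 3(b).
COST = {"PointMamba": (12.558, 2.06), "PointPaveSeg": (3.880, 1.11)}

# mIoU gain from the long-tail strategy, per network.
gains = {net: round(after - before, 2) for net, (before, after) in MIOU.items()}
for net, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{net:<17} +{gain:.2f} mIoU with the long-tail strategy")

# Relative cost of PointMamba versus PointPaveSeg.
param_ratio = COST["PointMamba"][0] / COST["PointPaveSeg"][0]
time_ratio = COST["PointMamba"][1] / COST["PointPaveSeg"][1]
print(f"PointMamba vs PointPaveSeg: {param_ratio:.2f}x parameters, "
      f"{time_ratio:.2f}x inference time")
```

Every network gains mIoU from the strategy, from 2.16 points (PointNet) to 5.19 points (PointMamba); PointPaveSeg gains 4.81. In Table 3(b), PointMamba has about 3.24× the parameters and 1.86× the per-item inference time of PointPaveSeg, while scoring higher on F1-Pot (89.74% vs. 88.93%) and accuracy (95.65% vs. 95.43%) but lower on mIoU and F1-Crack.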

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Cheng, P.; Yi, J.; Pei, Z.; Liu, Z.; Jiang, D.; Abdukadir, A. Long-Tail Learning for Three-Dimensional Pavement Distress Segmentation Using Point Clouds Reconstructed from a Consumer Camera. Remote Sens. 2026, 18, 1008. https://doi.org/10.3390/rs18071008

AMA Style

Cheng P, Yi J, Pei Z, Liu Z, Jiang D, Abdukadir A. Long-Tail Learning for Three-Dimensional Pavement Distress Segmentation Using Point Clouds Reconstructed from a Consumer Camera. Remote Sensing. 2026; 18(7):1008. https://doi.org/10.3390/rs18071008

Chicago/Turabian Style

Cheng, Pengjian, Junyan Yi, Zhongshi Pei, Zengxin Liu, Dayong Jiang, and Abduhaibir Abdukadir. 2026. "Long-Tail Learning for Three-Dimensional Pavement Distress Segmentation Using Point Clouds Reconstructed from a Consumer Camera" Remote Sensing 18, no. 7: 1008. https://doi.org/10.3390/rs18071008

APA Style

Cheng, P., Yi, J., Pei, Z., Liu, Z., Jiang, D., & Abdukadir, A. (2026). Long-Tail Learning for Three-Dimensional Pavement Distress Segmentation Using Point Clouds Reconstructed from a Consumer Camera. Remote Sensing, 18(7), 1008. https://doi.org/10.3390/rs18071008

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
