Incremental Pavement Distress Classification in UAV-Based Remote Sensing via Analytic Geometric Alignment

Wang, Quanziang; Li, Xin; Peng, Jiangjun; Jia, Xixi; Wang, Renzhen

doi:10.3390/rs18081141

by Quanziang Wang ¹, Xin Li ², Jiangjun Peng ³, Xixi Jia ⁴ and Renzhen Wang ¹,*

¹ School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an 710049, China
² School of Software Engineering, Xi’an Jiaotong University, Xi’an 710049, China
³ School of Mathematics and Statistics, Northwestern Polytechnical University, Xi’an 710072, China
⁴ School of Mathematics and Statistics, Xidian University, Xi’an 710071, China
* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(8), 1141; https://doi.org/10.3390/rs18081141

Submission received: 25 February 2026 / Revised: 7 April 2026 / Accepted: 9 April 2026 / Published: 12 April 2026

(This article belongs to the Special Issue Advances in Artificial Intelligence (AI) and Deep Learning (DL) in UAV-Based Remote Sensing)

Highlights

What are the main findings?

We propose a novel Analytic Geometric Alignment (AGA) framework for class-incremental pavement distress classification in UAV-based remote sensing, which integrates three key components: Subspace-Aware Analytic Initialization (SAI) to mathematically bridge the optimization gap for novel classes, a Decoupled Geometric Adapter (DGA) to decouple the global geometric alignment from local feature adaptation, and the Memory-Prioritized Regression (MPR) loss to enhance inter-class feature separability against complex UAV remote sensing backgrounds.
On the UAV-PDD2023 dataset, AGA achieves state-of-the-art accuracy and stability in fine-grained pavement distress classification, which is also demonstrated on the auxiliary RDD2022 dataset. Notably, the model maintains robust performance even under extreme low-memory conditions (e.g., retaining only 100–200 exemplar samples), significantly alleviating catastrophic forgetting without incurring massive computational overhead.

What are the implications of the main findings?

The exceptional resource efficiency and anti-forgetting capability of AGA provide a highly deployable technical solution for continuous, long-term infrastructure monitoring on edge devices within UAV air–ground collaborative systems.
This framework establishes a novel, data-efficient paradigm for processing streaming UAV imagery, offering robust support for dynamic remote sensing applications addressing critical challenges such as memory constraints, evolving target categories, and background interference.

Abstract

Automated pavement distress classification using high-resolution Unmanned Aerial Vehicle (UAV) imagery is pivotal for intelligent transportation systems. However, long-term UAV monitoring faces a continuous stream of evolving distress types and changing remote sensing background textures, necessitating Class-Incremental Learning (CIL) capabilities. Existing methods struggle to balance stability and plasticity, especially under the severe storage limitations typical of local edge stations in air–ground collaborative systems. This data scarcity leads to catastrophic forgetting and confusion among fine-grained distress categories. To address these challenges, we propose a data-efficient approach named Analytic Geometric Alignment (AGA). Our framework consists of three key components. First, to overcome the optimization gap between the feature extractor and the fixed geometric target, we introduce a Subspace-Aware Analytic Initialization (SAI) that computes a closed-form projection to instantly align the feature subspace with the Simplex Equiangular Tight Frame (ETF) manifold before training on each task. Second, on this aligned basis, a Decoupled Geometric Adapter (DGA) is incorporated to facilitate continuous non-linear adaptation to complex aerial textures. Finally, for stable incremental training, we design a Memory-Prioritized Regression (MPR) loss to enforce tighter geometric constraints on replay samples, significantly enhancing model stability. Extensive experiments on the UAV-PDD2023 dataset demonstrate that AGA significantly outperforms state-of-the-art methods, showcasing excellent robustness and data efficiency.

Keywords:

pavement distress classification; UAV; class incremental learning; simplex equiangular tight frame; neural collapse

1. Introduction

Regular inspection and maintenance of road infrastructure are critical for ensuring traffic safety and transport efficiency. With the rapid advancement of computer vision and intelligent sensing technologies, automated pavement distress classification using high-resolution Unmanned Aerial Vehicle (UAV) imagery has progressively replaced traditional manual visual inspections. These UAV-based remote sensing technologies have become an integral component of Intelligent Transportation Systems (ITSs) [1,2,3,4]. As a novel and highly mobile remote sensing platform, UAVs offer unprecedented spatial resolution and operational flexibility, allowing for rapid, large-scale capture of fine-grained structural anomalies [5,6,7,8]. To process such imagery, early methods relied on hand-crafted features [9,10], which have now been largely superseded by deep Convolutional Neural Networks (CNNs) [5,11,12,13] and Vision Transformers (ViTs) [14,15,16]. These architectures have been successfully adapted for various remote sensing tasks, including road network extraction [17], multi-type distress segmentation [18], and surface inspection [19].

However, in practical long-term monitoring applications, detection and classification systems face a highly dynamic environment. UAVs continuously capture novel distress types, such as different shapes of cracks, potholes, and repair patches, as they fly over various regions. Additionally, road surface textures vary significantly due to regional differences [5,20]. This dynamic nature requires detection models to possess class-incremental learning (CIL) capabilities. Specifically, the models must continuously learn new distress categories from streaming remote sensing data while effectively retaining knowledge of previously learned features to avoid “catastrophic forgetting” [21,22,23].

Despite significant progress in CIL for general image classification, UAV-based pavement distress classification presents a unique set of challenges that intensifies the stability-plasticity dilemma. Various existing CIL strategies have shown promise, including regularization [24,25,26], prompt learning [27,28,29,30], and replay-based methods [31,32,33,34]. However, they still exhibit limitations in this specific domain. First, this is inherently a fine-grained incremental classification problem. Different distress classes in remote sensing imagery often share extremely similar textures, such as transverse versus longitudinal cracks. During incremental learning, the decision boundaries for these similar classes tend to drift, leading to confusion between old and new classes. Second, to process the massive and continuously growing high-resolution UAV data, an air–ground collaboration architecture is typically employed. While local maintenance stations or edge servers possess adequate computational power to fine-tune deep models, they face severe storage limitations and bandwidth bottlenecks. Transmitting or storing all historical UAV imagery for offline retraining is highly inefficient and often impractical. Consequently, systems must operate under low-buffer conditions where they cannot retain large amounts of historical data. This scarcity of data exacerbates feature distribution shifts and makes it exceptionally difficult to maintain model stability.

Existing solutions exhibit certain limitations when addressing these specific issues. Specifically, under strict storage limitations, replay-based methods suffer from severe class imbalance, which inevitably leads to biased classifiers and task-recency bias. To effectively address this classifier bias, recent geometric methods [35,36,37,38,39] attempt to fix the classifier’s structure based on the Neural Collapse (NC) phenomenon [40,41,42,43] to ensure stability. However, they often overlook the substantial optimization gap between the pre-trained feature extractor and the fixed geometric targets. As illustrated in Figure 1a, conventional random initialization results in severe spatial misalignment, leading to convergence difficulties and suboptimal overall performance.

To achieve both high stability and plasticity under these resource-constrained conditions, we propose a novel framework named Analytic Geometric Alignment (AGA). First, to eliminate the drift of decision boundaries caused by similar textures, we replace the learnable classifier with a fixed Simplex Equiangular Tight Frame (Simplex ETF). Second, to address the optimization gap, we introduce a Subspace-Aware Analytic Initialization (SAI) to mathematically calculate the optimal projection matrix before training starts. This provides a “warm start” that instantly aligns features with the ETF vertices, thereby maintaining excellent geometric stability. Furthermore, to maintain plasticity, the backbone needs to be fine-tuned to fit new knowledge. Recently, Parameter-Efficient Fine-Tuning (PEFT) strategies based on the pretrained backbone, such as prompt-based methods [27,28,29], have shown great success in CIL for natural images. However, we observe that in the context of pavement distress classification, PEFT methods often lack sufficient capacity to capture the fine-grained, complex texture variations typical of remote sensing imagery. As a result, we maintain full fine-tuning to maximize plasticity. To effectively prevent this backbone fine-tuning from disrupting the geometric stability established by SAI, we propose a Decoupled Geometric Adapter (DGA). By decoupling the global geometric alignment and the local feature adaptation, this paradigm provides the maximal plasticity required for fine-grained remote sensing imagery without compromising stability. Finally, to protect the old classes stored in the small buffer, we design a Memory-Prioritized Regression (MPR) loss. This loss applies stricter geometric constraints to replay memory buffer samples, ensuring that these samples remain tightly clustered around their prototypes despite the data imbalance.

The main contributions of this paper are summarized as follows:

To address the classifier drift caused by fine-grained similarity in pavement distresses, we propose the Analytic Geometric Alignment (AGA) framework. By replacing learnable classifiers with a fixed Simplex ETF, we provide a stable geometric anchor that maximizes inter-class separability under strict memory constraints.
To bridge the optimization gap between pre-trained features and the fixed geometric target, we propose the Subspace-Aware Analytic Initialization (SAI) to analytically align feature subspaces before training. On this aligned basis, we introduce a Decoupled Geometric Adapter (DGA) to maintain plasticity for complex aerial textures.
To mitigate catastrophic forgetting under extreme data imbalance, we propose a Memory-Prioritized Regression (MPR) loss. It imposes asymmetric geometric constraints on replay samples, ensuring that historical knowledge remains robust despite the scarcity of memory data.
Extensive experiments on the UAV-PDD2023 dataset demonstrate that AGA achieves state-of-the-art accuracy and stability. The results confirm its superior data efficiency, making it highly suitable for resource-constrained UAV edge deployment.

2. Materials and Methods

2.1. Dataset and Materials

To rigorously evaluate our continual learning framework in aerial scenarios, we utilize the UAV-PDD2023 dataset as our primary benchmark. The dataset comprises 2440 UAV-captured images containing over 11,158 annotated instances of pavement distresses. To ensure real-world practicality, the data encompasses diverse environmental scenarios, including clear and post-rainfall weather conditions, across various road types with different construction qualities (e.g., highways, provincial, and county roads). Specifically, the dataset includes six distinct categories: Longitudinal Crack, Transverse Crack, Oblique Crack, Alligator Crack, Pothole, and Repair. Although covering a wide range of realistic scenarios, the instances exhibit a distinctly long-tailed and imbalanced distribution across these categories, forming a challenging foundation for our incremental task sequence. As illustrated in Figure 2, UAV-PDD2023 presents two inherent challenges for classification. First, it exhibits strong fine-grained characteristics with high inter-class visual similarity; for instance, longitudinal, transverse, and oblique cracks share highly confusable local textures, while the irregular and diverse appearances of repaired areas make them exceptionally difficult to identify. Second, the images capture severe scale variations of the distress targets. These specific properties make it an ideal testbed for evaluating a model’s geometric stability and anti-forgetting capabilities under confusable conditions.

2.2. Motivation and Framework Overview

Real-world UAV pavement inspection is inherently dynamic, facing a continuous data stream with novel distress categories and varying backgrounds. This renders static models obsolete and necessitates the Class-Incremental Learning (CIL) paradigm. Formally, a model learns from a sequential task stream $\{T_{1}, T_{2}, \dots, T_{T}\}$. At step $t$, it must incrementally assimilate new classes using only the current dataset $D_{t}$ and a strictly size-constrained memory buffer $M$ available at local edge stations. The objective is to minimize classification error across the cumulative label space $Y_{1:t}$, continuously integrating new concepts while retaining previously acquired knowledge.

During the CIL process, standard models often suffer from catastrophic forgetting and classification bias [21,32,34]. To mitigate these issues and enhance fine-grained distress recognition under data-scarce edge conditions, we propose the Analytic Geometric Alignment (AGA) framework. As illustrated in Figure 3, AGA adopts a fixed Simplex ETF as a geometric anchor that keeps the feature distribution from drifting drastically. To better adapt this rigid structure to specific UAV data distributions, we introduce the Subspace-Aware Analytic Initialization (SAI), which analytically aligns the pre-trained feature subspace with ETF targets prior to training to eliminate the initial optimization gap. During the subsequent gradient training phase, a Decoupled Geometric Adapter (DGA) is fine-tuned to capture complex aerial textures, ensuring sufficient plasticity. Simultaneously, a Memory-Prioritized Regression (MPR) loss imposes strict geometric constraints on replay samples to robustly anchor historical knowledge. This cohesive pipeline allows the model to capture novel remote sensing patterns while maintaining classification stability.

2.3. Geometric Stability: The Simplex ETF Framework

To clearly explain our proposed method, we first introduce the application of the Simplex ETF framework in continual learning. Inspired by the phenomenon of Neural Collapse (NC) [40,41,42], this fixed geometric structure effectively mitigates the severe drift of feature distributions during the incremental learning process. Specifically, theoretical studies reveal that at the terminal phase of training, classifier weights naturally collapse into a rigid, symmetric geometric structure that maximizes inter-class separability, known as a Simplex Equiangular Tight Frame (Simplex ETF). We hypothesize that pre-defining and anchoring the classifier to this terminal geometric structure provides the ideal stability for CIL scenarios. Formally, assuming the maximum number of classes to be learned across all incremental tasks is $K$, and the feature dimension is $d$ (where $d \geq K - 1$), a Simplex ETF classifier can be defined as a matrix $W = [w_{1}, \dots, w_{K}] \in \mathbb{R}^{d \times K}$. These prototype vectors satisfy two critical properties: (1) Equal Norm: all prototypes share the same norm, eliminating magnitude-based bias. (2) Maximal Equiangular Separation: every pair of distinct prototypes has the same cosine similarity, $-1/(K - 1)$, which is the largest angular separation that $K$ equal-norm vectors can attain simultaneously.

To implement this theoretical concept prior to any training phase, we analytically construct the fixed ETF matrix $W^{*}$ step by step. First, we generate a semi-orthogonal matrix $U \in \mathbb{R}^{d \times K}$ (typically via QR decomposition of a random Gaussian matrix), which satisfies $U^{T} U = I_{K}$. Next, we define a centering matrix $C = I_{K} - \frac{1}{K} 1_{K} 1_{K}^{T} \in \mathbb{R}^{K \times K}$, where $1_{K}$ is the $K$-dimensional vector of all ones. By projecting the centered structure into the feature space and scaling it to achieve unit norm, the final ETF classifier is explicitly formulated as

$$W^{*} = \sqrt{\frac{K}{K - 1}}\, U C. \tag{1}$$

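It is straightforward to verify that this constructed matrix $W^{*}$ satisfies the optimal Simplex ETF geometry. Because $U^{T} U = I_{K}$ and the centering matrix $C$ is symmetric and idempotent ($C^{T} = C$, $C^{2} = C$), its Gram matrix is

$${W^{*}}^{T} W^{*} = \frac{K}{K - 1} C = \frac{K}{K - 1} I_{K} - \frac{1}{K - 1} 1_{K} 1_{K}^{T}, \tag{2}$$

so every prototype has unit norm (diagonal entries equal to 1) and every pair of distinct prototypes has inner product $-1/(K - 1)$.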
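The construction in Equation (1) and the Gram-matrix property in Equation (2) can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: the function name `simplex_etf`, the seed, and the sizes `d=768` (the ViT-B/16 embedding width) and `K=6` (the six UAV-PDD2023 categories) are illustrative assumptions. Note that the QR step needs $d \geq K$ so that $U$ can have $K$ orthonormal columns, which is slightly stricter than the $d \geq K - 1$ condition stated above.

```python
import numpy as np

def simplex_etf(d: int, K: int, seed: int = 0) -> np.ndarray:
    """Build a d x K Simplex ETF as W* = sqrt(K/(K-1)) * U @ C (Equation (1))."""
    if d < K:
        # Assumption of this sketch: U needs K orthonormal columns in R^d.
        raise ValueError("the QR-based construction requires d >= K")
    rng = np.random.default_rng(seed)
    # Reduced QR of a random Gaussian matrix yields U with U^T U = I_K.
    U, _ = np.linalg.qr(rng.standard_normal((d, K)))
    # Centering matrix C = I_K - (1/K) 1 1^T.
    C = np.eye(K) - np.ones((K, K)) / K
    return np.sqrt(K / (K - 1)) * U @ C

# Hypothetical sizes: ViT-B/16 feature width and six distress categories.
d, K = 768, 6
W = simplex_etf(d, K)

# Target Gram matrix from Equation (2).
expected = (K / (K - 1)) * np.eye(K) - np.ones((K, K)) / (K - 1)
print("max deviation from Eq. (2):", np.abs(W.T @ W - expected).max())
```

Because the matrix depends only on `d`, `K`, and the random seed, it can be generated once and kept frozen for the whole task stream.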

In our continual learning paradigm, this generated $W^{*}$ is assigned as the weights of the final classification layer and remains strictly frozen (i.e., no gradient updates) throughout the entire sequential learning stream. The model optimizes the feature extractor $f_{θ}(\cdot)$ using the standard Cross-Entropy loss: $L_{CE} = -\log \frac{\exp({w_{y}^{*}}^{T} f_{θ}(x))}{\sum_{i} \exp({w_{i}^{*}}^{T} f_{θ}(x))}$. By eliminating the optimization freedom of the classifier, the network is compelled to align incoming UAV remote sensing features onto these pre-set, maximally separated geometric anchors. This fundamentally prevents the classifier drift that typically overwrites old knowledge. Nevertheless, a critical issue arises: aligning representations from a highly structured pre-trained Vision Transformer to a randomly initialized geometric frame $W^{*}$ introduces a severe optimization gap, significantly impeding the model’s plasticity for new tasks. To bridge this gap without sacrificing geometric stability, we introduce the Subspace-Aware Analytic Initialization (SAI) in the following section.

2.4. Bridging the Optimization Gap in Simplex ETF

While the Simplex ETF provides a theoretically stable target, applying it directly to CIL introduces a severe optimization gap. Specifically, the pre-trained backbone is typically agnostic to the randomly generated, rigid ETF vertices. This misalignment is particularly pronounced in UAV-based pavement inspection, where a significant domain gap exists between the pre-training data (typically natural images) and the specific remote sensing distress patterns. As illustrated in Figure 1a, forcing the backbone to adapt to these misaligned targets via standard SGD leads to gradient instability and the distortion of pre-trained features. To resolve this, we introduce two main components. First, we employ a closed-form Subspace-Aware Analytic Initialization (SAI) to analytically align the feature subspace with the ETF targets. Subsequently, a Decoupled Geometric Adapter (DGA) is incorporated to enable continuous plasticity for fine-grained adjustments.

2.4.1. Subspace-Aware Analytic Initialization (SAI): Global Realignment

At the onset of each new task $t$, rather than starting optimization from a random state, we seek to mathematically “reset” the system to a globally aligned geometric state. We introduce a linear transformation from the feature space to the ETF classifier, serving as a Geometric Alignment Matrix $P \in \mathbb{R}^{d \times d}$. Theoretically, achieving globally optimal alignment would require access to the features of all seen classes $D_{1:t}$. However, under the strict CIL protocol dictated by the storage limits of local edge stations, historical UAV data are unavailable except for the stored memory buffer. To approximate this global distribution, we construct a joint feature matrix by aggregating features from the current task $D_{t}$ and the exemplar memory $M$. Although $M$ is only a sparse approximation of the historical data distribution, it provides critical anchors for the old feature subspace.

To formulate the alignment rigorously, let $N = |D_{t}| + |M|$ denote the total number of available samples. We construct the joint feature matrix $H_{joint} \in \mathbb{R}^{d \times N}$, where each column represents a feature vector $h_{i} \in \mathbb{R}^{d \times 1}$. Let $W_{joint} \in \mathbb{R}^{d \times N}$ denote the corresponding ETF prototype matrix, where the $i$-th column is the assigned ETF vertex $w_{y_{i}} \in \mathbb{R}^{d \times 1}$ for the sample’s ground-truth label $y_{i}$.

To minimize the difference between the transformed subspace and the Simplex ETF classifier, we formulate the alignment as a Ridge Regression problem. We solve for the optimal matrix $P^{*}$ that minimizes the regularized projection error:

$$P^{*} = \arg\min_{P} \| P H_{joint} - W_{joint} \|_{F}^{2} + λ \| P \|_{F}^{2}, \tag{3}$$

where $λ > 0$ is a regularization coefficient. Setting the derivative with respect to $P$ to zero yields the closed-form solution:

$$P^{*} = W_{joint} H_{joint}^{T} (H_{joint} H_{joint}^{T} + λ I)^{-1}. \tag{4}$$
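For readers who want the intermediate step, the closed form in Equation (4) follows from the stationarity condition of Equation (3). The derivation below is the standard ridge-regression argument written in the notation above; it abbreviates $H_{joint}$ and $W_{joint}$ as $H$ and $W$ and takes $I$ to be the $d \times d$ identity, which is an assumption consistent with the shape of $H H^{T}$.

```latex
\begin{aligned}
J(P) &= \| P H - W \|_{F}^{2} + \lambda \| P \|_{F}^{2}, \\
\nabla_{P} J &= 2 (P H - W) H^{T} + 2 \lambda P = 0, \\
P \left( H H^{T} + \lambda I \right) &= W H^{T}, \\
P^{*} &= W H^{T} \left( H H^{T} + \lambda I \right)^{-1}.
\end{aligned}
```

Since $H H^{T}$ is positive semi-definite and $λ > 0$, the matrix $H H^{T} + λ I$ is positive definite and therefore invertible, and the objective is strictly convex, so this stationary point is the unique minimizer.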

We use this analytically derived matrix $P^{*}$ to align the feature space with the ETF classifier. To prevent misalignment drift during the subsequent gradient-based training, we fix $P^{*}$ immediately after calculation. Note that this operation serves as an instant rectification of the misalignment left by the random initialization of the Simplex ETF prototypes: it linearly maps the feature subspace onto the rigid ETF structure, in the regularized least-squares sense, before the training of the current task begins. By eliminating the initial misalignment, SAI significantly reduces the optimization difficulty of the backbone under the CIL setting and further helps prevent the task-recency bias typically caused by the struggle of SGD to find this equilibrium from scratch.
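The SAI step amounts to a single linear solve. The following is a minimal NumPy sketch under stated assumptions, not the authors' code: the names `sai_alignment` and `ridge_objective`, the toy sizes, and the random stand-in features are all hypothetical, and in practice `H` would hold backbone features of the current task and the memory buffer. The sketch solves the linear system rather than forming the inverse explicitly, which gives the same $P^{*}$ as Equation (4).

```python
import numpy as np

def sai_alignment(H: np.ndarray, W_joint: np.ndarray, lam: float = 1e-3) -> np.ndarray:
    """Closed-form ridge solution P* = W H^T (H H^T + lam I)^{-1} (Equation (4))."""
    d = H.shape[0]
    A = H @ H.T + lam * np.eye(d)   # d x d, symmetric positive definite
    B = W_joint @ H.T               # d x d
    # P A = B  <=>  A P^T = B^T because A is symmetric.
    return np.linalg.solve(A, B.T).T

def ridge_objective(P, H, W_joint, lam):
    """Objective of Equation (3): squared fit error plus Frobenius penalty."""
    return np.sum((P @ H - W_joint) ** 2) + lam * np.sum(P ** 2)

# Toy stand-ins (hypothetical sizes): d-dim features, K classes, N samples.
rng = np.random.default_rng(0)
d, K, N = 8, 3, 60
U, _ = np.linalg.qr(rng.standard_normal((d, K)))
W_etf = np.sqrt(K / (K - 1)) * U @ (np.eye(K) - np.ones((K, K)) / K)

labels = rng.integers(0, K, size=N)   # ground-truth labels y_i
H = rng.standard_normal((d, N))       # columns play the role of features h_i
W_joint = W_etf[:, labels]            # column i is the ETF vertex of y_i

P = sai_alignment(H, W_joint, lam=1e-3)
print("objective at P*:", ridge_objective(P, H, W_joint, 1e-3))
```

The cost is dominated by one $d \times d$ solve per task, independent of the number of training epochs that follow.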

2.4.2. Decoupled Geometric Adapter (DGA): Continuous Plasticity

While SAI provides a linear global alignment, pavement distress patterns in high-resolution UAV imagery often exhibit complex, non-linear variations (e.g., irregular crack branching) that a simple linear projection cannot fully capture. The feature extractor requires sufficient plasticity to map these fine-grained manifolds onto the ETF vertices.

To this end, we introduce the Decoupled Geometric Adapter (DGA) appended to the backbone, which is designed to absorb the non-linear residual geometric distortions that remain after the linear alignment. Let $h = f_{θ}(x) \in \mathbb{R}^{d \times 1}$ denote the feature column vector extracted by the backbone. The DGA, parameterized by $ϕ$, applies a residual transformation:

$$h_{dga} = h + \mathrm{MLP}_{ϕ}(h), \tag{5}$$

where $\mathrm{MLP}_{ϕ}(\cdot)$ denotes a Multi-Layer Perceptron. This architecture incorporates a residual connection to facilitate optimization and gradient flow [44]. Subsequently, the final embedding $z \in \mathbb{R}^{d \times 1}$ is obtained by projecting the adapted feature through the fixed SAI alignment matrix $P^{*} \in \mathbb{R}^{d \times d}$:

$$z = P^{*} h_{dga}. \tag{6}$$

In the combined framework, the training process is decoupled: the global geometric orientation is locked by the fixed $P^{*}$, while the backbone parameters $θ$ and DGA parameters $ϕ$ are fine-tuned via gradients. This design allows the model to focus on learning discriminative local semantics without disrupting the global geometric stability established by SAI.

2.5. Optimization with Memory-Prioritized Constraints

To fully fine-tune the backbone and the proposed DGA module, one could adopt the standard Cross-Entropy (CE) loss, which is widely used in classification problems. However, CE loss suffers from task-recency bias under the CIL protocol [32,34]. Instead, we adopt the Dot-Product Regression (DR) loss of [35] to enforce the Neural Collapse geometry. Specifically, DR loss directly maximizes the cosine similarity between the normalized feature $\hat{z} = z / \|z\|_{2}$ and the assigned class prototype $w_{y}$.

However, in the replay-based CIL setting deployed at local edge stations, the model encounters a severe data imbalance between the massive new UAV data streams in the current task $D_{t}$ and the highly constrained memory buffer $M$. Since the size of the new task dataset far exceeds that of the buffer ($|D_{t}| \gg |M|$), the optimization landscape is dominated by gradients derived from new classes. Consequently, the feature extractor tends to prioritize learning new patterns while neglecting the alignment of buffered exemplars, causing the features of old classes to drift away from their assigned ETF vertices.

To counteract this gradient dominance, we propose the Memory-Prioritized Regression (MPR) loss. The key idea is to impose an asymmetric geometric constraint that places a higher priority on the structural stability of memory samples. Specifically, for samples from the new task $(x, y) \in D_{t}$, we employ the standard regression target of 1, encouraging the feature to align with the prototype direction:

$$L_{new} = \frac{1}{2} (\hat{z}^{T} w_{y} - 1)^{2}. \tag{7}$$

This standard constraint allows the model sufficient flexibility to learn new semantic representations. Conversely, for replay samples $(x, y) \in M$, we introduce a stricter constraint with a margin-enhanced target $1 + δ$ (where $δ > 0$):

$$L_{mem} = \frac{1}{2} (\hat{z}^{T} w_{y} - (1 + δ))^{2}. \tag{8}$$

Since the cosine similarity $\hat{z}^{T} w_{y}$ is upper-bounded by 1, the residual $\hat{z}^{T} w_{y} - (1 + δ)$ is at most $-δ$ and therefore never reaches zero. This formulation ensures that the residual factor scaling the gradient for memory samples never vanishes, whereas under the standard target of 1 it shrinks to zero as a feature approaches its prototype. It generates a persistent force that continuously pulls old class features towards their ETF vertices, effectively enforcing a tighter clustering for historical knowledge to compensate for the scarcity of memory data.

The final objective function balances these two terms as follows:

$$L_{total} = \frac{γ}{|B_{mem}|} \sum_{j \in B_{mem}} L_{mem}^{(j)} + \frac{1 - γ}{|B_{new}|} \sum_{i \in B_{new}} L_{new}^{(i)}, \tag{9}$$

where $B_{new}$ and $B_{mem}$ denote the mini-batches sampled from the current task and memory buffer, respectively, and $γ$ is a hyperparameter that controls the strength of the memory constraint.
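Equations (7) to (9) translate directly into a few lines of array code. The sketch below is a NumPy illustration under stated assumptions rather than the authors' implementation: the function name `mpr_loss`, the column-per-sample layout, and the toy prototypes are hypothetical, and the prototypes passed in are assumed to be unit-norm so that the dot product with the normalized feature is a cosine similarity.

```python
import numpy as np

def mpr_loss(z_new, w_new, z_mem, w_mem, delta=0.1, gamma=0.5):
    """Memory-Prioritized Regression loss (Equations (7)-(9)).

    z_*: (d, batch) embeddings; w_*: (d, batch) unit-norm prototypes,
    where column i of w_* is the prototype assigned to column i of z_*.
    """
    def cosine(z, w):
        z_hat = z / np.linalg.norm(z, axis=0, keepdims=True)
        return np.sum(z_hat * w, axis=0)

    loss_new = 0.5 * (cosine(z_new, w_new) - 1.0) ** 2             # Eq. (7)
    loss_mem = 0.5 * (cosine(z_mem, w_mem) - (1.0 + delta)) ** 2   # Eq. (8)
    # Eq. (9): each term is averaged over its own mini-batch.
    return gamma * loss_mem.mean() + (1.0 - gamma) * loss_new.mean()

# Toy check with perfectly aligned features (hypothetical unit prototypes).
w = np.eye(4)[:, :2]                      # two unit-norm prototypes in R^4
aligned_loss = mpr_loss(3.0 * w, w, 2.0 * w, w, delta=0.1, gamma=0.5)
print(aligned_loss)
```

In this toy case the new-task term is exactly zero while each memory sample still contributes $\frac{1}{2} δ^{2} = 0.005$, so the total is $γ \cdot 0.005 = 0.0025$. This illustrates the asymmetry: even a perfectly aligned replay sample retains a non-zero loss.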

In summary, our proposed Analytic Geometric Alignment (AGA) framework fundamentally relies on three core designs: SAI, DGA, and the MPR loss. The entire training pipeline is detailed in Algorithm 1.

Algorithm 1 Analytic Geometric Alignment (AGA) Training Process

Require: Sequence of tasks $\{T_{1}, \dots, T_{T}\}$, pre-trained backbone $f_{θ}$, DGA parameters $ϕ$, Geometric Alignment Matrix $P$, memory buffer $M = \emptyset$.

1: Initialize the fixed Simplex ETF classifier $W^{*}$ by Equation (1).
2: for task $t = 1$ to $T$ do
3:     // Subspace-Aware Analytic Initialization (SAI)
4:     Extract joint features $H_{joint}$ from $T_{t} \cup M$ using the current backbone $f_{θ}$.
5:     Analytically compute the optimal alignment matrix $P^{*}$ via Equation (4) and keep it fixed.
6:     // Gradient Fine-Tuning via DGA and MPR Loss
7:     for epoch $= 1$ to $E$ do
8:         for minibatch $(x, y)$ in $T_{t} \cup M$ do
9:             Calculate the embedding $z$ via Equations (5) and (6).
10:            Calculate the total loss $L_{total}$ via Equation (9).
11:            Update backbone parameters $θ$ and DGA parameters $ϕ$ via SGD using $\nabla_{θ, ϕ} L_{total}$.
12:        end for
13:    end for
14:    Update $M$ by randomly selecting exemplars from $T_{t}$.
15: end for

3. Experiments

In this section, we conduct comprehensive experiments to evaluate the effectiveness of our proposed Analytic Geometric Alignment (AGA) framework. We perform a comparative analysis against state-of-the-art methods, followed by detailed ablation studies and visualization results to validate the specific contributions of our framework.

3.1. Experimental Setup

To rigorously verify our method, we adopt a primary-and-auxiliary evaluation strategy. As detailed in Section 2, we select UAV-PDD2023 as our primary validation benchmark, which perfectly aligns with our target scenario of UAV-based remote sensing. Furthermore, to comprehensively demonstrate the universality and cross-platform robustness of our framework, we incorporate the RDD2022 dataset as an auxiliary benchmark.

3.1.1. Implementation Details

For all experiments, we adopt the standard Class-Incremental Learning (CIL) protocol, where the model is evaluated on a unified test set containing all classes seen so far after each training session. We employ the Vision Transformer (ViT-B/16) [45] pre-trained on ImageNet-21K [46] as the backbone feature extractor for both our AGA and all comparison methods. The input images for both datasets are resized to $224 \times 224$. All models are trained using the SGD optimizer with a batch size of 64. The learning rate is set to 0.01 for UAV-PDD2023.

To simulate the severe storage constraints of local edge stations in air–ground collaborative inspection systems, we strictly limit the memory buffer size (e.g., $|M| \in \{100, 200, 500\}$ images in total). For the proposed AGA, the specific hyperparameters are configured as follows: the regularization parameter $λ$ in the SAI is set to 0.001; for the MPR loss, the margin is set to $δ = 0.1$ and the loss weight is set to $γ = 0.5$. To demonstrate the robustness of our method, these AGA-specific hyperparameters $\{λ, δ, γ\}$ are kept fixed across all experimental settings and datasets without task-specific tuning.

3.1.2. Evaluation Metrics

We employ five quantitative metrics to comprehensively assess model performance:

Average Accuracy (ACC ↑): This metric reports the mean classification accuracy on all encountered classes after the final task. Let $a_{i, j}$ denote the accuracy on task $j$ after training on task $i$. The Average Accuracy is defined as $\mathrm{ACC} = \frac{1}{T} \sum_{j = 1}^{T} a_{T, j}$. The symbol ↑ indicates that higher scores denote better performance.
Forgetting Measure (FM ↓): This metric quantifies the degradation of performance on previous tasks. It is defined as $\mathrm{FM} = \frac{1}{T - 1} \sum_{j = 1}^{T - 1} \max_{l \in \{j, \dots, T - 1\}} (a_{l, j} - a_{T, j})$. Conversely, ↓ signifies that lower values indicate better stability.
Macro Metrics: To account for potential class imbalances in dynamic inspection data, we also report the Macro-Precision (↑), Macro-Recall (↑), and Macro-F1 Score (↑), which are calculated by averaging the respective metric across all classes independently. High values in these metrics indicate that the model achieves a reliable balance between minimizing false positives (Precision) and avoiding missed detections (Recall), ensuring comprehensive classification capability across all distress types.
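The metric definitions above can be sketched as follows. This is a minimal NumPy illustration, not the evaluation code used in the paper: the function names, the toy accuracy matrix, and the toy label vectors are invented for demonstration and are not experimental results. The sketch also assumes that a class with no predicted or no true samples contributes 0 to the corresponding macro average, which is a convention the text does not specify.

```python
import numpy as np

def average_accuracy(a: np.ndarray) -> float:
    """ACC: mean of a[T, j] over tasks j, with a[i, j] the accuracy on task j after task i."""
    return float(np.mean(a[-1, :]))

def forgetting_measure(a: np.ndarray) -> float:
    """FM: mean over old tasks of (best earlier accuracy - final accuracy)."""
    T = a.shape[0]
    # 0-indexed: for task j, the maximum runs over sessions j .. T-2.
    drops = [np.max(a[j:T - 1, j]) - a[T - 1, j] for j in range(T - 1)]
    return float(np.mean(drops))

def macro_prf(y_true, y_pred, num_classes: int):
    """Macro-Precision, Macro-Recall, Macro-F1 from a confusion matrix."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)  # rows: true, cols: predicted
    tp = np.diag(cm).astype(float)
    pred_count, true_count = cm.sum(axis=0), cm.sum(axis=1)
    # Assumption: undefined ratios (empty denominator) are counted as 0.
    precision = np.divide(tp, pred_count, out=np.zeros_like(tp), where=pred_count > 0)
    recall = np.divide(tp, true_count, out=np.zeros_like(tp), where=true_count > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return float(precision.mean()), float(recall.mean()), float(f1.mean())

# Toy accuracy matrix for T = 3 tasks (made-up numbers; lower triangle is meaningful).
a = np.array([[0.90, 0.00, 0.00],
              [0.80, 0.70, 0.00],
              [0.60, 0.65, 0.75]])
acc, fm = average_accuracy(a), forgetting_measure(a)

# Toy predictions for three classes.
p, r, f1 = macro_prf([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
print(acc, fm, p, r, f1)
```

For the toy matrix, ACC is $(0.60 + 0.65 + 0.75)/3 \approx 0.667$ and FM is $((0.90 - 0.60) + (0.70 - 0.65))/2 = 0.175$.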

3.2. Comparative Results

3.2.1. Baselines and Comparison Methods

To rigorously evaluate the proposed AGA in the context of remote sensing applications, we compare it against a diverse set of representative CIL methods categorized into four main streams. For a fair comparison, all methods utilize the same ViT-B/16 backbone.

Replay-based Methods: We include classic methods like ER (Experience Replay) [31] and iCaRL [32], as well as the recent state-of-the-art DGR [47]. ER maintains a memory buffer to rehearse old samples via cross-entropy loss, while iCaRL combines representation learning with a Nearest Class Mean (NCM) classifier to mitigate catastrophic forgetting. DGR specifically tackles the severe class imbalance issue in incremental tasks by employing gradient reweighting and distribution-aware knowledge distillation. All these methods utilize a memory buffer to rehearse old knowledge.
Geometric Methods: We compare with NC-FSCIL [35], which leverages the geometry of Neural Collapse. Unlike replay-based baselines, NC-FSCIL adopts a prototype-based strategy, storing only the mean feature vector for each class rather than raw images. This offers extreme memory efficiency but lacks the capability to rehearse fine-grained data distribution details.
Parameter-Efficient Fine-Tuning (PEFT) Methods: This stream includes prompt-based methods (L2P [27] and DualPrompt [28]) and adapter-based methods (TUNA [48]). Prompt learning methods were originally designed for exemplar-free settings and achieved sound performance on incremental natural image classification. To ensure a fair comparison under our replay protocol ( $| M | = 200$ ), we integrate them with the same rehearsal mechanism used in our method, denoted as L2P* and DualPrompt*. Conversely, TUNA integrates task-specific and universal adapters to capture both specialized and shared knowledge. We retain TUNA’s original rehearsal-free design to evaluate its innate anti-forgetting capacity, as it operates entirely without storing historical exemplars.
Incremental Pavement Distress Methods: We also include DML-PDI [5], a state-of-the-art method specifically designed for incremental UAV pavement anomaly detection. DML-PDI adopts a training-free, metric-learning paradigm. However, it requires storing features of all encountered samples to compute precise prototypes for inference, resulting in high storage costs that scale linearly with the dataset size.
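To make the Nearest Class Mean (NCM) rule used by iCaRL concrete, the sketch below classifies a query by the closest class mean in feature space. It is a simplified illustration: the helper names and toy 2-D features are assumptions, and details of the original iCaRL procedure (such as feature normalization and herding-based exemplar selection) are omitted.

```python
import numpy as np

def class_means(features, labels):
    """Return the sorted class ids and the mean feature vector of each class."""
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, means

def ncm_predict(queries, classes, means):
    """Assign each query to the class whose mean is nearest in Euclidean distance."""
    dists = np.linalg.norm(queries[:, None, :] - means[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

# Toy 2-D features standing in for backbone embeddings (illustrative values only).
feats = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
labels = np.array([0, 0, 1, 1])
classes, means = class_means(feats, labels)
preds = ncm_predict(np.array([[0.2, 0.3], [5.0, 4.0]]), classes, means)
```

Because the means are recomputed from whatever features the current backbone produces, the prototypes move whenever the features drift, which is the behaviour contrasted with fixed geometric anchors in this paper.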

3.2.2. Performance on UAV-PDD2023

The quantitative results of our method and all comparison methods on the primary UAV-PDD2023 dataset are summarized in Table 1.

Consistent Superiority across Buffer Sizes. As shown in Table 1, our proposed AGA demonstrates consistent superiority over all competitors across different memory buffer sizes. Notably, under the most constrained setting where the buffer size is extremely small (simulating the strict storage limits of local edge stations), our method achieves a remarkable improvement compared to the classic strong baseline, iCaRL. Specifically, AGA boosts the Average Accuracy (ACC) by 3.69% while simultaneously reducing the Forgetting Measure (FM) by 5.72% when the buffer size is only 100. This validates the high data efficiency of our analytic geometric alignment strategy in handling high-resolution UAV imagery.

Analysis of Comparison Methods. The classic replay method ER [31], which relies on a standard linear classifier with Softmax, suffers significantly from task-recency bias [32,34,49], resulting in mediocre performance. Although iCaRL [32] demonstrates strong competitiveness, it relies on a computationally expensive herding strategy for exemplar selection. The recent gradient-reweighting method DGR [47] delivers commendable performance, particularly at the extreme buffer size of 100, where it exhibits strong anti-forgetting capabilities (achieving an FM of 31.65%). However, its overall classification accuracy across all memory settings still falls short of our proposed method. In contrast, our AGA achieves superior performance using simple random sampling, effectively reducing the computational burden on edge devices. Regarding parameter-efficient methods, L2P* [27] and DualPrompt* [28] yield unsatisfactory results even when augmented with replay. Similarly, the latest adapter-based method TUNA [48] fine-tunes only a minimal number of parameters. While this strategy demands low computational resources, its overall performance is noticeably restricted. This suggests that freezing the backbone and relying on lightweight modules makes it difficult to adequately handle such complex, fine-grained continual classification tasks. Their limited learnable parameters restrict the model’s plasticity, which our results indicate is critical for capturing the fine-grained, complex textures inherent in UAV remote sensing imagery. The geometric baseline NC-FSCIL [35], despite its extreme memory efficiency, exhibits poor performance. This indicates that a simple linear mapping is insufficient to bridge the optimization gap between the pre-trained feature extractor and the fixed, randomly initialized ETF classifier. Consequently, it fails to handle the significant domain shift present in downstream UAV pavement data.
Finally, the SOTA incremental pavement distress method DML-PDI [5] achieves decent performance by storing extensive feature data for inference. However, its training-free design inherently prevents it from learning task-specific discriminative features. This results in a severe lack of plasticity, leading to suboptimal performance compared to our full fine-tuning approach.

More Metrics for Fair Evaluation. Beyond accuracy, AGA significantly lowers the FM, indicating a superior trade-off between stability and plasticity in the continual learning setting. Furthermore, our method shows substantial improvements in Macro-F1, Precision, and Recall. These comprehensive gains suggest that AGA effectively minimizes both false positives and missed detections, ensuring reliable classification across all distress categories in dynamic UAV inspections.
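For reference, the macro-averaged metrics discussed here can be computed from a confusion matrix as sketched below. The sketch assumes Macro-F1 is the unweighted mean of per-class F1 scores (one common convention); the function name and the toy labels are illustrative, not taken from our evaluation code.

```python
import numpy as np

def macro_metrics(y_true, y_pred, num_classes):
    """Return (macro precision, macro recall, macro F1) as unweighted class averages."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)          # rows: true class, columns: predicted class
    tp = np.diag(cm).astype(float)
    pred_count = cm.sum(axis=0)                 # samples predicted as each class
    true_count = cm.sum(axis=1)                 # samples that truly belong to each class
    # Classes with an empty denominator get a score of zero instead of NaN.
    precision = np.divide(tp, pred_count, out=np.zeros_like(tp), where=pred_count > 0)
    recall = np.divide(tp, true_count, out=np.zeros_like(tp), where=true_count > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return float(precision.mean()), float(recall.mean()), float(f1.mean())

# Toy labels for three classes (illustrative only).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
macro_p, macro_r, macro_f1 = macro_metrics(y_true, y_pred, num_classes=3)
```

Each class contributes equally to the average regardless of how many samples it has, which is why these metrics are informative under class imbalance.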

3.2.3. Performance on RDD2022

To further verify the cross-platform generalization of our method across different data distributions, we conduct supplementary experiments on the terrestrial RDD2022 dataset [2]. Unlike the aerial imagery of our primary UAV-PDD2023 benchmark, RDD2022 consists of close-range images captured by vehicular cameras and smartphones. We select 5 common damage categories from this dataset and partition them into 3 incremental tasks (a base task of 3 classes, followed by 2 incremental tasks introducing 1 new class each). The evaluation results under memory buffer sizes of 100, 200, and 500 are summarized in Table 2. Overall, our proposed AGA consistently outperforms the classic baseline ER [31] and the strong competitor iCaRL [32] across all buffer settings. Specifically, under the strict 100-buffer regime, AGA improves the average accuracy (ACC) by 4.45% and reduces the forgetting measure (FM) by 7.03% compared to iCaRL. While ER suffers from severe task-recency bias during incremental updates, AGA effectively maintains historical knowledge and achieves a superior balance of stability and plasticity. These supplementary results confirm that our geometric alignment framework is not over-fitted to specific aerial imagery, but possesses robust generalizability for various visual sensing platforms and environments.
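The incremental protocol described above (a base task of 3 classes followed by 2 tasks of 1 class each) can be written down as a simple class partition. The helper below is a sketch with a hypothetical name, and it uses integer class ids because the ordering of the selected RDD2022 categories is not specified here.

```python
def split_classes(class_ids, base_size, increment):
    """Partition class ids into a base task followed by fixed-size incremental tasks."""
    tasks = [list(class_ids[:base_size])]
    for start in range(base_size, len(class_ids), increment):
        tasks.append(list(class_ids[start:start + increment]))
    return tasks

# Five classes, base task of 3, then one new class per incremental task.
tasks = split_classes(list(range(5)), base_size=3, increment=1)

# Number of classes the model must distinguish after each stage.
seen_per_stage = []
total = 0
for task in tasks:
    total += len(task)
    seen_per_stage.append(total)
```

After each stage the classifier is evaluated on all classes seen so far, so the label space grows from 3 to 4 to 5 classes.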

3.3. Detailed Analysis and Additional Experiments

In this section, we conduct a detailed analysis to verify the effectiveness of individual components within our AGA framework. We further provide visual evidence of the learned geometric structures and the temporal dynamics of the learning process in the context of UAV remote sensing.

3.3.1. Ablation Study

To investigate the contribution of each module, we perform a step-by-step ablation study on the primary UAV-PDD2023 dataset with a restricted buffer size of 200 (simulating edge station limits), with detailed results reported in Table 3.

The baseline model, employing a standard ViT-B/16 backbone with Experience Replay and a learnable linear classifier, achieves an ACC of 52.05% and an FM of 63.18%. These results reveal that the standard linear classifier with Softmax struggles severely with catastrophic forgetting when handling fine-grained remote sensing data under the CIL setting. Replacing the learnable head with a Fixed Simplex ETF classifier improves the accuracy to 77.95% and reduces FM to 33.06%, indicating that a predefined, maximally separated geometric structure provides a more stable optimization target against the extreme data imbalance inherent in local edge buffers.
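To illustrate what a fixed Simplex ETF classifier looks like, the sketch below builds one with the standard construction from the Neural Collapse literature, $W = \sqrt{K/(K-1)}\, U (I_K - \frac{1}{K}\mathbf{1}\mathbf{1}^{\top})$, where $U$ has orthonormal columns. The function name, the random seed, and the choice of $K = 6$ classes are assumptions for illustration; 768 is the ViT-B/16 embedding width.

```python
import numpy as np

def simplex_etf(num_classes, feat_dim, seed=0):
    """Return a (feat_dim, num_classes) matrix whose columns form a simplex ETF."""
    if feat_dim < num_classes:
        raise ValueError("this construction needs feat_dim >= num_classes")
    rng = np.random.default_rng(seed)
    # Reduced QR of a random Gaussian matrix gives feat_dim x K orthonormal columns.
    U, _ = np.linalg.qr(rng.standard_normal((feat_dim, num_classes)))
    K = num_classes
    centering = np.eye(K) - np.ones((K, K)) / K
    return np.sqrt(K / (K - 1)) * (U @ centering)

K, d = 6, 768                      # illustrative class count; ViT-B/16 feature width
W = simplex_etf(K, d)
gram = W.T @ W                     # pairwise inner products between class vertices
off_diag = gram[~np.eye(K, dtype=bool)]
```

Every vertex has unit norm and every pair of vertices has cosine similarity $-1/(K-1)$, which is the maximal equal separation referred to above. The vertices are never updated during training.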

Building upon this, the introduction of SAI yields the most substantial performance boost, increasing accuracy to 83.49% and significantly dropping FM to 25.85%. This confirms our core hypothesis that analytically aligning features with the fixed classifier at the onset of each task eliminates the initial optimization gap, providing a “warm start” that reduces the disruption to existing knowledge derived from previous UAV flights. To further balance stability and plasticity, we incorporate the DGA, helping the model gain better plasticity for new tasks with complex aerial textures while preserving general representations, leading to further accuracy improvements (84.67%). Finally, incorporating the MPR loss serves as the ultimate safeguard for stability. It constrains the replay samples to stay close to their prototypes, minimizing semantic drift and achieving the best overall performance across all evaluation metrics.
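The idea of an analytic "warm start" can be illustrated with the textbook closed-form ridge regression solution, $P = (F^{\top}F + \lambda I)^{-1} F^{\top} Y$, which maps features $F$ toward targets $Y$ without any gradient steps. This is only a sketch under that assumption: it does not reproduce the subspace-aware details of SAI, and the function name, shapes, and synthetic data are hypothetical. In the actual method the targets would be the ETF vertices assigned to each sample's class.

```python
import numpy as np

def ridge_alignment(features, targets, lam=1e-3):
    """Closed-form ridge solution P = (F^T F + lam * I)^{-1} F^T Y."""
    d = features.shape[1]
    gram = features.T @ features + lam * np.eye(d)
    # Solving the linear system is more stable than forming the explicit inverse.
    return np.linalg.solve(gram, features.T @ targets)

# Synthetic check: targets generated by a known linear map should be recovered.
rng = np.random.default_rng(0)
F = rng.standard_normal((50, 8))        # 50 samples, 8-dimensional features
P_true = rng.standard_normal((8, 4))    # hypothetical ground-truth projection
Y = F @ P_true                          # noise-free targets
P_hat = ridge_alignment(F, Y, lam=1e-3)
```

A single linear solve of this kind costs far less than an epoch of fine-tuning, which is why an analytic initialization can be recomputed at the start of every incremental task.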

3.3.2. Geometric Visualization

To visualize the impact of AGA on feature learning and discriminability for complex remote sensing imagery, we present the t-SNE embeddings of feature distributions and classifier prototypes on the UAV-PDD2023 dataset in Figure 4.

As illustrated in Figure 4a, ER reveals a distinct misalignment between the empirical feature centers (▴) and the classifier weights (★). Although the learnable classifier attempts to delineate boundaries, the standard Softmax loss lacks strong intra-class compactness constraints. This results in a significant gap between features and prototypes, causing severe overlap among visually similar categories. For instance, Class 2 (Transverse Crack) and Class 3 (Oblique Crack) are heavily entangled; since these categories differ primarily in orientation within complex road backgrounds, the weak constraints of ER fail to yield sufficiently discriminative, rotation-invariant features required for aerial imagery.

As observed in Figure 4b, iCaRL exhibits severe feature entanglement, not only among various linear cracks but also specifically between Class 1 (Longitudinal Crack) and Class 5 (Repair). This primarily occurs because iCaRL employs a Nearest Class Mean (NCM) classifier, where class prototypes are passively updated as the averages of drifting features. Lacking fixed geometric anchors, this classifier provides no corrective constraint against semantic drift. Consequently, since repair patches often follow the original crack paths and share highly similar road background textures typical of remote sensing imagery, the passive NCM classifier struggles to distinguish the subtle visual differences between a sealed crack and an active one.

In contrast, AGA (Figure 4c) exhibits a high degree of Neural Collapse. Driven by SAI, the feature centers (▴) align perfectly with the fixed Simplex ETF vertices (★). Furthermore, the MPR loss imposes strict geometric constraints, actively pulling features towards their assigned vertices to minimize the optimization gap. This results in highly compact and well-separated clusters. Notably, previously confused pairs (i.e., Transverse vs. Oblique Cracks and Longitudinal Cracks vs. Repair) are now clearly distinguishable with wide decision margins. This confirms that AGA’s fixed geometric anchors effectively mitigate interference in dynamic UAV data streams, ensuring robust classification for fine-grained distress types in air–ground collaborative systems.
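The alignment between feature centers and classifier vertices seen in the t-SNE plots can also be checked numerically. The sketch below reports, for each class, the cosine similarity between the empirical feature mean and its prototype; a value near 1 indicates close alignment. The function name and the toy data are assumptions for illustration and are not measurements from our models.

```python
import numpy as np

def center_prototype_cosine(features, labels, prototypes):
    """Cosine similarity between each class's feature mean and its prototype row."""
    cosines = {}
    for c in np.unique(labels):
        center = features[labels == c].mean(axis=0)
        proto = prototypes[c]
        cosines[int(c)] = float(
            center @ proto / (np.linalg.norm(center) * np.linalg.norm(proto))
        )
    return cosines

# Toy example: class 0 features lie on their prototype direction, class 1 features do not.
prototypes = np.eye(3)                                   # one prototype per row
features = np.array([[2.0, 0.0, 0.0], [4.0, 0.0, 0.0], [0.0, 1.0, 1.0]])
labels = np.array([0, 0, 1])
cos = center_prototype_cosine(features, labels, prototypes)
```

Here class 0 is perfectly aligned while class 1 sits at 45 degrees from its prototype, the kind of gap between ▴ and ★ markers visible for the baselines.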

3.3.3. Learning Dynamics and Stability

Figure 5 illustrates the task-wise test accuracy dynamics throughout the entire incremental training process. The results clearly indicate that AGA (solid line with circles) achieves superior stability and plasticity compared to iCaRL (dashed line with squares) and ER (dotted line with triangles) when updating models with new data batches. Specifically, regarding the initial Task 0 (blue lines), baselines suffer a sharp performance drop immediately after the introduction of Task 1 (Epoch 10), indicating severe catastrophic forgetting caused by the interference of new gradients introduced by newly captured UAV images, which is also known as the stability gap in continual learning [34,50]. In contrast, AGA exhibits a significantly milder decay during this transition phase and maintains the highest final accuracy. This confirms that the Fixed ETF initialized by SAI constructs a robust geometric prior that effectively withstands the shock of distribution shifts. Moreover, for the new Task 1, the effective analytic initialization enables AGA to achieve not only a higher starting accuracy but also superior final performance compared to competitors. This proves that our method does not compromise plasticity for stability. Specifically, through the synergy of SAI, DGA, and MPR loss, it enables effective full fine-tuning, allowing the model to capture novel remote sensing distress patterns accurately while rigorously preserving historical knowledge stored in the limited local buffer.

To provide a deeper insight into the task-recency bias, Figure 6 visualizes the confusion matrices after the final incremental stage. In these matrices, the horizontal and vertical axes represent the predicted and true categories, respectively, with the red bounding boxes highlighting all test samples that are predicted as newly introduced classes. For the baseline ER, a severe task-recency bias is evident, as a large portion of historical samples is misclassified into the new classes. In contrast, both iCaRL and our proposed AGA significantly alleviate this issue, drastically reducing the false-positive predictions within the red boxes and yielding better overall classification outcomes. Furthermore, a closer comparison between iCaRL and AGA reveals different learning tendencies: iCaRL leans towards stronger plasticity but weaker stability, effectively recognizing new classes but suffering from noticeable performance degradation on the old ones. Conversely, our method maintains the diagnostic accuracy on historical categories while accurately adapting to novel ones, demonstrating a better balance between stability and plasticity.
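The task-recency bias read off the confusion matrices can be summarized by a single number: the fraction of old-class test samples that are predicted as one of the newly introduced classes. The sketch below computes this rate with rows as true classes and columns as predicted classes, matching the axis convention above. The function name and the 3-class matrix are illustrative assumptions, not the counts shown in Figure 6.

```python
import numpy as np

def recency_bias_rate(cm, new_classes):
    """Fraction of old-class samples (rows) that are predicted as new classes (columns)."""
    cm = np.asarray(cm, dtype=float)
    new_classes = np.asarray(new_classes)
    old_mask = np.ones(cm.shape[0], dtype=bool)
    old_mask[new_classes] = False            # rows that belong to previously learned classes
    old_rows = cm[old_mask]
    total = old_rows.sum()
    if total == 0:
        return 0.0
    return float(old_rows[:, new_classes].sum() / total)

# Toy confusion matrix: classes 0 and 1 are old, class 2 is newly introduced.
cm = np.array([
    [6, 1, 3],
    [2, 5, 3],
    [0, 1, 9],
])
bias = recency_bias_rate(cm, new_classes=[2])
```

In this toy matrix 6 of the 20 old-class samples fall into the new-class column, giving a rate of 0.3; a lower rate corresponds to fewer predictions inside the red boxes.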

3.3.4. Hyperparameter Sensitivity Analysis

To evaluate the robustness of our framework, we conduct a parameter sensitivity analysis on three core hyperparameters: the SAI regularization coefficient ($\lambda$), the MPR margin ($\delta$), and the MPR loss weight ($\gamma$), as detailed in Table 4. Specifically, $\lambda$ acts as the penalty term in the ridge regression (Equation (3)). While a value of 0.0015 also shows competitive results, we adopt the standard value of 0.001 to avoid complex hyperparameter tuning. The parameter $\delta$ in Equation (8) defines the margin size, where a larger value puts a stronger constraint on old task exemplars. Our experiments indicate that setting $\delta$ to either 0.10 or 0.15 yields highly competitive results. Consequently, we empirically adopt 0.10 as the default configuration, which provides a sufficient margin to preserve historical knowledge without imposing excessive rigidity during the optimization of new tasks. Furthermore, $\gamma$ (Equation (9)) controls the overall regularization strength on historical data. A larger $\gamma$ reduces forgetting but limits plasticity (yielding lower F1 and ACC), whereas a smaller $\gamma$ worsens catastrophic forgetting. An intermediate value of 0.5 achieves the optimal balance. Overall, these results validate that our analytical geometric alignment mechanism maintains highly stable performance over a broad range of parameter values, effectively balancing plasticity and stability.

3.3.5. Computational Overhead Analysis

To evaluate the practical feasibility of deploying our model on resource-constrained edge devices (e.g., UAVs), we compare the computational overhead of AGA against ER and iCaRL (evaluated on a single NVIDIA RTX 4090 with a buffer size of 200). As detailed in Table 5, AGA introduces virtually no additional computational burden compared to the baseline ER. During the base task (Task 1), all methods exhibit similar memory usage and training times (∼333–345 s). However, in the incremental phase (Task 2), iCaRL’s memory footprint surges to 17,754 MB because its knowledge distillation mechanism requires maintaining the old model. In contrast, AGA consumes only 11,479 MB, which is comparable to the baseline ER, and thus remains highly memory-efficient. Furthermore, calculating the unique analytic projection matrix ($P^{*}$) in AGA requires merely ∼16–25 ms once per incremental phase. This millisecond-level overhead is negligible compared to the overall network fine-tuning, confirming that AGA achieves significant anti-forgetting improvements while maintaining exceptional efficiency for real-world edge deployment.

3.3.6. Attention Visualization

To further investigate the models’ ability to localize fine-grained pavement distresses under the class-incremental setting, we extract the class activation mapping (Grad-CAM) visualizations for various distress categories after the incremental training. As shown in Figure 7, the baseline replay method ER frequently misdirects its attention towards image edges or meaningless background padding, which can easily miss the actual structural damages. This severe distraction indicates that ER suffers from catastrophic forgetting, losing its plasticity to capture fine-grained textures in complex UAV imagery. While iCaRL exhibits relatively better localization than ER, it still occasionally focuses on erroneous or irrelevant regions, which consequently degrades its classification performance on visually similar anomalies. In contrast, our proposed AGA framework consistently and accurately anchors its attention onto the precise distress locations across both the initial task categories (Figure 7a–c) and the newly learned ones (Figure 7d–f). By maintaining robust geometric alignment and effectively mitigating semantic drift, AGA successfully distinguishes confusable damage types and provides highly reliable fine-grained classification results in dynamic remote sensing scenarios.

4. Discussion

Although our proposed continual learning framework demonstrates superior performance across incremental tasks on datasets like UAV-PDD2023, the robustness of vision-only models remains inherently constrained by physical limitations under extreme lighting (e.g., nocturnal environments) or adverse weather (e.g., dense fog), which exacerbate visual feature shifts in dynamic real-world deployments. Integrating multimodal sensor fusion technologies, such as infrared thermal imaging to capture heat radiation from UAV motors at night and radar to penetrate rain and fog while detecting moving targets, represents a pivotal strategy to overcome these perceptual bottlenecks. In this context, Synthetic Aperture Radar (SAR) can serve as a representative complementary modality for UAV perception, where advanced deep learning techniques have demonstrated significant potential [51,52], including paradigms such as triple-level sparsity awareness [53] and glance–focus–gaze mechanisms [54], which provide representations that are inherently robust to illumination and weather variations. Future research will aim to extend our method into a multimodal incremental perception framework, fusing thermal and spatiotemporal features with efficient knowledge update mechanisms to construct a robust, all-weather detection system capable of continually learning new tasks.

5. Conclusions

In this paper, we have addressed the critical challenges of catastrophic forgetting and feature–classifier misalignment inherent in class-incremental learning for UAV-based remote sensing inspection. To resolve the stability–plasticity dilemma under the severe storage constraints of local edge stations, we proposed the Analytic Geometric Alignment (AGA) framework, which fundamentally rethinks the optimization paradigm from a geometric perspective. By introducing Subspace-Aware Analytic Initialization (SAI), AGA analytically aligns the feature subspace with a fixed Simplex Equiangular Tight Frame (ETF) classifier, effectively bridging the initial optimization gap that plagues traditional gradient-based methods. This global geometric stability is further complemented by the Decoupled Geometric Adapter (DGA) and Memory-Prioritized Regression (MPR) loss, which synergistically enable the model to capture the fine-grained, complex textures of novel remote sensing distress patterns while rigorously preserving historical knowledge. Extensive experiments on the primary UAV-PDD2023 dataset and the auxiliary terrestrial RDD2022 dataset demonstrate that AGA significantly outperforms state-of-the-art competitors in both accuracy and stability, particularly in low-buffer regimes. Visualization results further confirm that our method successfully induces Neural Collapse, achieving optimal class separability for dynamic aerial imagery. We believe that this geometric-analytic perspective offers a highly promising direction for building robust, data-efficient, and cross-platform incremental learning frameworks in air–ground collaborative infrastructure monitoring. Furthermore, considering the limitations of vision-only models in extreme environments, our future work will focus on multi-modal incremental learning by incorporating more robust sensing modalities such as SAR and thermal imaging, so as to improve stability under diverse real-world conditions.

Author Contributions

Conceptualization, Q.W. and R.W.; data curation, Q.W. and X.L.; formal analysis, Q.W. and X.L.; investigation, Q.W. and X.L.; methodology, Q.W., X.L., J.P., X.J. and R.W.; project administration, Q.W. and X.L.; resources, Q.W. and X.L.; software, Q.W. and X.L.; supervision, Q.W., X.L., J.P., X.J. and R.W.; validation, Q.W., X.L., J.P., X.J. and R.W.; visualization, Q.W., X.L. and R.W.; writing—original draft, Q.W., X.L., J.P., X.J. and R.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grants 62306233, 12401674, and 62372359, the China Postdoctoral Science Foundation under Grant 2024M752550, and the Guangdong Basic and Applied Basic Research Foundation under Grant 2025A1515011453.

Data Availability Statement

The datasets used in the experiments are publicly available and can be downloaded from the following links: UAV-PDD2023 dataset: https://zenodo.org/records/8429208 (accessed on 10 February 2026). RDD2022 dataset: https://datasetninja.com/road-damage-detector (accessed on 10 February 2026).

Conflicts of Interest

The authors declare no conflicts of interest.

References

Arya, D.; Maeda, H.; Ghosh, S.K.; Toshniwal, D.; Mraz, A.; Kashiyama, T.; Sekimoto, Y. Deep Learning-based Road Damage Detection and Classification for Multiple Countries. Autom. Constr. 2021, 132, 103935.
Arya, D.; Maeda, H.; Ghosh, S.K.; Toshniwal, D.; Sekimoto, Y. RDD2022: A Multi-National Image Dataset for Automatic Road Damage Detection. Geosci. Data J. 2024, 11, 846–862.
Arya, D.; Maeda, H.; Ghosh, S.K.; Toshniwal, D.; Omata, H.; Kashiyama, T.; Sekimoto, Y. Crowdsensing-based Road Damage Detection Challenge (CRDDC’2022). In Proceedings of the IEEE International Conference on Big Data; IEEE: Piscataway, NJ, USA, 2022; pp. 6378–6386.
Yan, H.; Zhang, J. UAV-PDD2023: A Benchmark Dataset for Pavement Distress Detection based on UAV Images. Data Brief 2023, 51, 109692.
Li, Y.; Wang, J.; Lü, B.; Yang, H.; Wu, X. Deep Metric Learning-Based Classification for Pavement Distress Images. Sensors 2025, 25, 4087.
Guo, H.; Wu, Q.; Wang, Y. Auhf-detr: A Lightweight Transformer with Spatial Attention and Wavelet Convolution for Embedded UAV Small Object Detection. Remote Sens. 2025, 17, 1920.
Li, H.; Ma, J.; Zhang, J. ELNet: An Efficient and Lightweight Network for Small Object Detection in UAV Imagery. Remote Sens. 2025, 17, 2096.
Huang, J.; Jin, W.; Tao, H.; Feng, Y.; Shang, Y.; Wang, S.; Liu, A. EFPNet: An Efficient Feature Perception Network for Real-Time Detection of Small UAV Targets. Remote Sens. 2026, 18, 340.
Barburiceanu, S.; Terebes, R.; Meza, S. 3D Texture Feature Extraction and Classification Using GLCM and LBP-based Descriptors. Appl. Sci. 2021, 11, 2332.
Fujita, Y.; Shimada, K.; Ichihara, M.; Hamamoto, Y. A Method based on Machine Learning Using Hand-crafted Features for Crack Detection from Asphalt Pavement Surface Images. In Proceedings of the Thirteenth International Conference on Quality Control by Artificial Vision 2017; SPIE: Bellingham, WA, USA, 2017; Volume 10338, pp. 117–124.
Guan, S.; Liu, H.; Pourreza, H.R.; Mahyar, H. Deep Learning Approaches in Pavement Distress Identification: A Review. arXiv 2023, arXiv:2308.00828.
Li, D.; Duan, Z.; Hu, X.; Zhang, D.; Zhang, Y. Automated Classification and Detection of Multiple Pavement Distress Images based on Deep Learning. J. Traffic Transp. Eng. (Engl. Ed.) 2023, 10, 276–290.
Maeda, H.; Sekimoto, Y.; Seto, T.; Kashiyama, T.; Omata, H. Road Damage Detection and Classification Using Deep Neural Networks with Smartphone Images. Comput.-Aided Civ. Infrastruct. Eng. 2018, 33, 1127–1141.
Zhang, S.; Wang, K.; Liu, Z.; Huang, M.; Huang, S. The Fine Feature Extraction and Attention Re-Embedding Model Based on the Swin Transformer for Pavement Damage Classification. Algorithms 2025, 18, 369.
Chen, Y.; Gu, X.; Liu, Z.; Liang, J. A Fast Inference Vision Transformer for Automatic Pavement Image Classification and Its Visual Interpretation Method. Remote Sens. 2022, 14, 1877.
Ali, L.; Jassmi, H.A.; Khan, W.; Alnajjar, F. Crack45K: Integration of Vision Transformer with Tubularity Flow Field (TuFF) and Sliding-window Approach for Crack-segmentation in Pavement Structures. Buildings 2023, 13, 55.
Liu, B.; Ding, J.; Zou, J.; Wang, J.; Huang, S. LDANet: A Lightweight Dynamic Addition Network for Rural Road Extraction from Remote Sensing Images. Remote Sens. 2023, 15, 1829.
Song, W.; Zhang, Z.; Zhang, B.; Jia, G.; Zhu, H.; Zhang, J. ISTD-PDS7: A Benchmark Dataset for Multi-type Pavement Distress Segmentation from CCD Images in Complex Scenarios. Remote Sens. 2023, 15, 1750.
Han, F.; Gu, C. Surface Damage Detection in Hydraulic Structures from UAV Images Using Lightweight Neural Networks. Remote Sens. 2025, 17, 2668.
Peng, R.; Zhao, W.; Li, K.; Ji, F.; Rong, C. Continual Contrastive Learning for Cross-dataset Scene Classification. Remote Sens. 2022, 14, 5105.
French, R.M. Catastrophic Forgetting in Connectionist Networks. Trends Cogn. Sci. 1999, 3, 128–135.
Zhou, D.W.; Sun, H.L.; Ning, J.; Ye, H.J.; Zhan, D.C. Continual Learning with Pre-trained Models: A Survey. arXiv 2024, arXiv:2401.16386.
Wang, L.; Zhang, X.; Su, H.; Zhu, J. A Comprehensive Survey of Continual Learning: Theory, Method and Application. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 5362–5383.
Kirkpatrick, J.; Pascanu, R.; Rabinowitz, N.; Veness, J.; Desjardins, G.; Rusu, A.A.; Milan, K.; Quan, J.; Ramalho, T.; Grabska-Barwinska, A.; et al. Overcoming Catastrophic Forgetting in Neural Networks. Proc. Natl. Acad. Sci. USA 2017, 114, 3521–3526.
Li, Z.; Hoiem, D. Learning without Forgetting. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 2935–2947.
Zenke, F.; Poole, B.; Ganguli, S. Continual Learning through Synaptic Intelligence. In Proceedings of the International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2017; pp. 3987–3995.
Wang, Z.; Zhang, Z.; Lee, C.Y.; Zhang, H.; Sun, R.; Ren, X.; Su, G.; Perot, V.; Dy, J.; Pfister, T. Learning to Prompt for Continual Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2022; pp. 139–149.
Wang, Z.; Zhang, Z.; Ebrahimi, S.; Sun, R.; Zhang, H.; Lee, C.Y.; Ren, X.; Su, G.; Perot, V.; Dy, J.; et al. DualPrompt: Complementary Prompting for Rehearsal-free Continual Learning. In Proceedings of the European Conference on Computer Vision; Springer: Cham, Switzerland, 2022; pp. 631–648.
Wang, Y.; Huang, Z.; Hong, X. S-prompts Learning with Pre-trained Transformers: An Occam’s Razor for Domain Incremental Learning. Adv. Neural Inf. Process. Syst. 2022, 35, 5682–5695.
Liang, Y.S.; Li, W.J. InfLoRA: Interference-free Low-Rank Adaptation for Continual Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2024; pp. 23638–23647.
Rolnick, D.; Ahuja, A.; Schwarz, J.; Lillicrap, T.; Wayne, G. Experience Replay for Continual Learning. In Proceedings of the 33rd International Conference on Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2019; pp. 350–360.
Rebuffi, S.A.; Kolesnikov, A.; Sperl, G.; Lampert, C.H. iCaRL: Incremental Classifier and Representation Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2017; pp. 2001–2010.
Buzzega, P.; Boschini, M.; Porrello, A.; Abati, D.; Calderara, S. Dark Experience for General Continual Learning: A Strong, Simple Baseline. Adv. Neural Inf. Process. Syst. 2020, 33, 15920–15930.
Wang, Q.; Wang, R.; Wu, Y.; Jia, X.; Meng, D. CBA: Improving Online Continual Learning via Continual Bias Adaptor. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2023; pp. 19082–19092.
Yang, Y.; Yuan, H.; Li, X.; Lin, Z.; Torr, P.; Tao, D. Neural Collapse Inspired Feature-Classifier Alignment for Few-Shot Class-Incremental Learning. In Proceedings of the Eleventh International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023.
Zhang, E.; Li, C.; Geng, C.; Chen, S. All-around Neural Collapse for Imbalanced Classification. IEEE Trans. Knowl. Data Eng. 2025, 37, 4460–4470.
Seo, M.; Koh, H.; Jeung, W.; Lee, M.; Kim, S.; Lee, H.; Cho, S.; Choi, S.; Kim, H.; Choi, J. Learning Equi-angular Representations for Online Continual Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2024; pp. 23933–23942.
Dang, T.A.; Nguyen, V.; Vu, N.S.; Vrain, C. Memory-efficient Continual Learning with Neural Collapse Contrastive. In Proceedings of the 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV); IEEE: Piscataway, NJ, USA, 2025; pp. 7950–7959.
Zhou, Z.; Peng, Y.; Yi, P.; Zhu, M.; Shen, C. Fresh-CL: Feature Realignment through Experts on Hypersphere in Continual Learning. In Proceedings of the ICASSP 2025—2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); IEEE: Piscataway, NJ, USA, 2025; pp. 1–5.
Papyan, V.; Han, X.; Donoho, D.L. Prevalence of Neural Collapse During the Terminal Phase of Deep Learning Training. Proc. Natl. Acad. Sci. USA 2020, 117, 24652–24663.
Fang, C.; He, H.; Long, Q.; Su, W.J. Exploring Deep Neural Networks via Layer-peeled Model: Minority Collapse in Imbalanced Training. Proc. Natl. Acad. Sci. USA 2021, 118, e2103091118.
Kothapalli, V. Neural Collapse: A Review on Modelling Principles and Generalization. arXiv 2022, arXiv:2206.04041.
Tirer, T.; Huang, H.; Niles-Weed, J. Perturbation Analysis of Neural Collapse. In Proceedings of the International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2023; pp. 34301–34329.
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2016; pp. 770–778.
Dosovitskiy, A. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929.
Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A Large-scale Hierarchical Image Database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2009; pp. 248–255.
He, J. Gradient reweighting: Towards imbalanced class-incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2024; pp. 16668–16677.
Wang, Y.; Zhou, D.W.; Ye, H.J. Integrating Task-Specific and Universal Adapters for Pre-Trained Model-based Class-Incremental Learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2025; pp. 806–816.
Hou, S.; Pan, X.; Loy, C.C.; Wang, Z.; Lin, D. Learning a Unified Classifier Incrementally via Rebalancing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2019; pp. 831–839.
De Lange, M.; van de Ven, G.M.; Tuytelaars, T. Continual Evaluation for Lifelong Learning: Identifying the Stability Gap. arXiv 2023, arXiv:2205.13452.
Zhang, T.; Gao, G.; Ke, X.; Zhang, X. Swarm Learning: Perception-Retrieval-Localization for Ship Detection from Synthetic Aperture Radar Remote Sensing Imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2026, 1–11.
Zhang, T.; Zhang, X.; Liu, C.; Shi, J.; Wei, S.; Ahmad, I.; Zhan, X.; Zhou, Y.; Pan, D.; Li, J.; et al. Balance Learning for Ship Detection from Synthetic Aperture Radar Remote Sensing Imagery. ISPRS J. Photogramm. Remote Sens. 2021, 182, 190–208.
Zhang, T.; Zhang, X. Triple-Level Sparsity Awareness for Marine Ship Surveillance Using Satellite Synthetic Aperture Radar. IEEE Trans. Autom. Sci. Eng. 2026, 23, 5155–5166. [Google Scholar] [CrossRef]
Zhang, T.; Gao, G.; Zhang, X. Glance-Focus-Gaze: A Novel Eagle-Eye Vision-Inspired Panorama-Population-Individual Progressive Screening Paradigm to Capture Ships in SAR Images. ISPRS J. Photogramm. Remote Sens. 2026, 235, 241–260. [Google Scholar] [CrossRef]

Figure 1. Illustration of the motivation. On the normalized hypersphere (black ring), fixed Simplex ETF prototypes (★) act as rigid targets for the normalized features (•). (a) Conventional Random Initialization results in severe spatial misalignment, leading to a large optimization gap that hinders efficient learning. (b) Our Subspace-Aware Analytic Initialization (SAI) computes a closed-form projection to instantly align feature subspaces with ETF targets before training, effectively bridging the optimization gap and ensuring a stable starting point.

Figure 2. Representative samples of each category from the UAV-PDD2023 dataset. The images highlight the complex backgrounds, scale variations, and diverse environmental conditions that the proposed continual learning model must overcome.

Figure 3. Overview of the Analytic Geometric Alignment (AGA) framework. Before training, the simplex ETF targets are analytically initialized by SAI to bridge the optimization gap between features and targets. During training, the Decoupled Geometric Adapter (DGA) module enhances the plasticity of the model, and the Memory-Prioritized Regression (MPR) loss imposes tighter geometric constraints on replay samples to mitigate forgetting.

Figure 4. t-SNE visualization of the learned feature spaces and classifier prototypes after the final task. Solid triangles (▴) represent the empirical centers of the feature distributions, while stars (★) denote the classifier weights (prototypes). (a) ER uses a linear classifier with Softmax activation, leading to loose clusters and noticeable misalignment between the classifier weights and the actual feature centers. (b) iCaRL provides a more discriminative feature distribution, although some fine-grained categories are still prone to confusion. (c) AGA (ours): features are tightly aligned with the fixed, maximally separated ETF prototypes, which reduces the overlap between classes.

Figure 5. Trace of test accuracy for each individual task throughout the incremental learning process on UAV-PDD2023. The curves illustrate the performance of specific tasks as new tasks are introduced (marked by ▴). Compared to ER and iCaRL, our AGA (solid lines) demonstrates superior stability, maintaining consistently high accuracy on earlier tasks while effectively learning new concepts.

Figure 6. Confusion matrices of the compared methods.

Figure 7. Attention visualizations of different methods on the UAV-PDD2023 dataset after incremental training. Subfigures (a–c) display the pavement distress categories encountered in the initial base task (Task 0), while (d–f) show the novel categories introduced in the subsequent Task 1. Compared to the baselines, our AGA accurately localizes fine-grained damage textures without being distracted by complex remote sensing backgrounds.
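
The fixed simplex ETF prototypes referred to in the captions of Figures 1, 3, and 4 can be illustrated with a short NumPy sketch. It uses the standard textbook construction of a simplex equiangular tight frame and is not taken from the authors' code; the function name `simplex_etf`, the feature dimension of 64, and the class count of 6 are hypothetical values chosen only for illustration.

```python
import numpy as np


def simplex_etf(num_classes: int, feat_dim: int, seed: int = 0) -> np.ndarray:
    """Return a (feat_dim, num_classes) matrix whose columns form a simplex ETF.

    Standard construction: M = sqrt(K / (K - 1)) * U @ (I - 11^T / K),
    where U has orthonormal columns. This sketch assumes feat_dim >= num_classes.
    """
    if num_classes < 2:
        raise ValueError("a simplex ETF needs at least two classes")
    if feat_dim < num_classes:
        raise ValueError("this sketch assumes feat_dim >= num_classes")
    rng = np.random.default_rng(seed)
    # Reduced QR of a random Gaussian matrix gives orthonormal columns U.
    u, _ = np.linalg.qr(rng.standard_normal((feat_dim, num_classes)))
    k = num_classes
    centering = np.eye(k) - np.ones((k, k)) / k
    return np.sqrt(k / (k - 1)) * u @ centering


# Illustrative sizes only (assumed, not from the paper).
prototypes = simplex_etf(num_classes=6, feat_dim=64)
gram = prototypes.T @ prototypes

# Every prototype has unit norm, and every pair has cosine -1 / (K - 1).
print(np.round(np.diag(gram), 6))
print(np.round(gram[0, 1], 6))
```

Because the pairwise cosine is the same for every pair of classes, the prototypes are maximally and equally separated on the hypersphere, which is the property the captions describe as "rigid targets".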

Table 1. Comparison results on the UAV-PDD2023 dataset. ↑ and ↓ indicate that higher and lower values are better, respectively. Bold values represent the best results under each comparison condition. The superscript * denotes methods equipped with a replay mechanism.

Memory Cost	Method	Precision ↑	Recall ↑	F1-Score ↑	Acc ↑	FM ↓
Features	NC-FSCIL	0.4770	0.4605	0.4314	56.58	20.27
	DML-PDI	0.6022	0.7547	0.6185	74.27	16.99
	TUNA	0.5907	0.7407	0.6218	74.48	12.99
100 samples	ER	0.3861	0.4263	0.1609	43.79	82.99
	iCaRL	0.5794	0.7519	0.5042	74.81	44.73
	L2P*	0.5319	0.7265	0.4775	72.76	44.45
	DualPrompt*	0.5824	0.6892	0.4143	68.32	50.36
	DGR	0.5903	0.7923	0.5667	78.37	31.65
	AGA (ours)	0.5938	0.7856	0.5567	78.50	39.01
200 samples	ER	0.4144	0.4904	0.2868	52.05	63.18
	iCaRL	0.6283	0.8389	0.6314	84.30	26.01
	L2P*	0.5805	0.7982	0.5802	80.02	30.39
	DualPrompt*	0.5777	0.7860	0.5837	78.58	24.10
	DGR	0.6224	0.8286	0.6573	82.56	20.19
	AGA (ours)	0.6586	0.8578	0.6909	86.55	20.77
500 samples	ER	0.4973	0.5851	0.4511	62.96	31.94
	iCaRL	0.7288	0.8983	0.7732	90.72	10.67
	L2P*	0.6455	0.8519	0.6844	84.64	18.28
	DualPrompt*	0.6207	0.8350	0.6619	82.03	16.93
	DGR	0.6802	0.8484	0.7270	84.37	12.22
	AGA (ours)	0.7713	0.9134	0.8193	91.52	8.85

Table 2. Comparison results on the RDD2022 dataset.

Memory Cost	Method	Precision ↑	Recall ↑	F1-Score ↑	Acc ↑	FM ↓
100 samples	ER	0.8771	0.6666	0.7337	69.45	43.26
	iCaRL	0.8749	0.6968	0.7578	73.03	38.01
	AGA (ours)	0.8855	0.7325	0.7772	77.48	30.98
200 samples	ER	0.8939	0.7242	0.7794	75.59	33.97
	iCaRL	0.8919	0.7619	0.8121	80.09	27.36
	AGA (ours)	0.8962	0.8012	0.8300	83.28	21.98
500 samples	ER	0.9066	0.8299	0.8613	83.84	21.43
	iCaRL	0.9105	0.8471	0.8636	87.06	16.74
	AGA (ours)	0.9159	0.8529	0.8653	87.70	15.09

Table 3. Ablation study on UAV-PDD2023 with a memory buffer of 200 samples.

Method	Precision ↑	Recall ↑	F1-Score ↑	Acc ↑	FM ↓
Baseline (ER)	0.4144	0.4904	0.2868	52.05	63.18
+Fixed ETF	0.5878	0.7743	0.5831	77.95	33.06
+SAI	0.6494	0.8312	0.6472	83.49	25.85
+DGA	0.6287	0.8417	0.6470	84.67	25.06
+MPR Loss	0.6586	0.8578	0.6909	86.55	20.77
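
To make the contribution of each component easier to read, the following sketch computes the row-to-row change in Acc and FM from the values in Table 3. It assumes that each "+" row adds one component on top of the previous row, which the row labels suggest; the helper name `stepwise_deltas` is hypothetical.

```python
# (row label, Acc in %, FM) copied from Table 3 (200-sample buffer).
ablation = [
    ("Baseline (ER)", 52.05, 63.18),
    ("+Fixed ETF", 77.95, 33.06),
    ("+SAI", 83.49, 25.85),
    ("+DGA", 84.67, 25.06),
    ("+MPR Loss", 86.55, 20.77),
]


def stepwise_deltas(rows):
    """Return (label, delta Acc, delta FM) for each row relative to the previous row."""
    deltas = []
    for (_, prev_acc, prev_fm), (name, acc, fm) in zip(rows, rows[1:]):
        deltas.append((name, round(acc - prev_acc, 2), round(fm - prev_fm, 2)))
    return deltas


deltas = stepwise_deltas(ablation)
for name, d_acc, d_fm in deltas:
    # Positive Acc change and negative FM change both indicate improvement.
    print(f"{name:<12} Acc {d_acc:+.2f}  FM {d_fm:+.2f}")
```

Under that cumulative reading, the largest single jump comes from replacing the learnable classifier with fixed ETF targets, while SAI, DGA, and the MPR loss each add smaller gains on top.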

Table 4. Hyperparameter sensitivity analysis on $λ$, $δ$, and $γ$. (a) Effect of SAI coefficient $λ$. (b) Effect of MPR margin $δ$. (c) Effect of MPR weight $γ$.

(a)
$λ$	F1-Score ↑	Acc ↑	FM ↓
0.0000	0.4361	53.91	28.64
0.0005	0.6640	83.82	23.09
0.0010	0.6681	84.62	22.97
0.0015	0.6698	84.87	23.07
0.0020	0.6664	84.69	23.03
(b)
$δ$	F1-Score ↑	Acc ↑	FM ↓
0.00	0.6441	83.68	26.55
0.05	0.6498	84.55	24.80
0.10	0.6681	84.62	22.97
0.15	0.6675	84.87	23.05
0.20	0.6662	84.61	23.18
(c)
$γ$	F1-Score ↑	Acc ↑	FM ↓
0.1	0.5891	81.07	31.51
0.3	0.6521	85.01	24.24
0.5	0.6681	84.62	22.97
0.7	0.6677	84.39	21.30
0.9	0.6320	68.54	14.86

Table 5. Computational overhead comparison on an NVIDIA RTX 4090 GPU (Memory buffer = 200).

Method	Task 1 Training Time (s)	Task 1 Testing Time (s)	Task 1 Calculation of $P^{*}$ (ms)	Task 1 Memory Usage (MB)	Task 2 Training Time (s)	Task 2 Testing Time (s)	Task 2 Calculation of $P^{*}$ (ms)	Task 2 Memory Usage (MB)
ER	345.15	4.02	-	10,328	130.71	8.64	-	10,610
iCaRL	333.01	3.71	-	16,438	133.71	8.09	-	17,754
AGA (ours)	338.68	3.96	25.12	10,326	141.67	9.01	16.82	11,479
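
As a rough reading aid for Table 5, the sketch below expresses the time spent computing $P^{*}$ as a percentage of the per-task training time for AGA. The numbers are copied from the table; whether the reported training time already includes the $P^{*}$ computation is not stated here, so the ratio is only an order-of-magnitude comparison. The helper name `init_share_percent` is hypothetical.

```python
# AGA (ours) rows of Table 5: (training time in seconds, P* calculation in milliseconds).
aga_costs = {
    "Task 1": (338.68, 25.12),
    "Task 2": (141.67, 16.82),
}


def init_share_percent(train_seconds: float, init_milliseconds: float) -> float:
    """Return the P* calculation time as a percentage of the training time."""
    return 100.0 * (init_milliseconds / 1000.0) / train_seconds


shares = {task: init_share_percent(*cost) for task, cost in aga_costs.items()}
for task, share in shares.items():
    print(f"{task}: P* calculation is about {share:.4f}% of training time")
```

On both tasks the closed-form step takes tens of milliseconds against hundreds of seconds of training, so it accounts for well under 0.1% of the training time.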

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, Q.; Li, X.; Peng, J.; Jia, X.; Wang, R. Incremental Pavement Distress Classification in UAV-Based Remote Sensing via Analytic Geometric Alignment. Remote Sens. 2026, 18, 1141. https://doi.org/10.3390/rs18081141

