1. Introduction
Recent advances in remote sensing technologies have enabled the acquisition of large volumes of imagery from diverse platforms, including satellites, drones, and airborne systems. This has led to the extensive application of remote sensing data in various fields such as urban planning [1,2,3,4], environmental monitoring [5,6,7], and disaster management [8,9,10]. Furthermore, the introduction of deep learning techniques has significantly enhanced the efficiency and accuracy of remote sensing image analysis, achieving state-of-the-art performance across a wide range of applications [11,12,13,14,15]. For instance, tasks such as building segmentation and land use/land cover (LULC) classification, which previously relied on manual interpretation or basic analytical methods, have seen remarkable improvements in both accuracy and computational efficiency through the application of deep learning [16,17,18]. Consequently, the research paradigm in remote sensing has shifted from traditional rule-based approaches to data-driven, deep learning-based methods.
Most deep learning studies in remote sensing have focused on improving task performance using fixed datasets under static experimental settings, with the development of state-of-the-art models becoming a dominant research trend. However, in real-world scenarios, imagery is continuously collected from various platforms, leading to significant variations in visual characteristics—such as object size, shape, and texture—as illustrated in Figure 1. Due to these domain-specific discrepancies, models trained on a single domain (i.e., a set of images from a specific platform) often suffer from severe performance degradation when applied to new domains [19,20]. This necessitates either retraining the model for each new domain, which adapts it only to that domain, or performing joint learning across all domains, which is computationally expensive.
To alleviate these challenges, domain adaptation has been explored, aiming to transfer knowledge from a source domain to a target domain and thereby improve performance on the target domain [21,22]. While this approach can indeed enhance target-domain performance, it primarily focuses on forward knowledge transfer and does not explicitly preserve knowledge from previous domains, leading to catastrophic forgetting [23,24], where previously learned knowledge is lost during adaptation. Therefore, in real-world scenarios where new domains appear sequentially, more flexible and adaptive learning strategies are required to facilitate the practical deployment of deep learning-based remote sensing methods. In this context, continual learning has emerged as a promising approach capable of mitigating both domain shift and catastrophic forgetting.
Continual learning is a learning paradigm in which a model incrementally learns from a sequence of tasks, aiming to integrate new knowledge without forgetting what has already been learned [25,26]. In this context, a “task” refers to a specific learning objective that arises sequentially over time. Based on learning scenarios, continual learning is typically categorized into Task-Incremental Learning (TIL), Class-Incremental Learning (CIL), and Domain-Incremental Learning (DIL) [27,28]. In TIL, each task has a distinct label space, and a task identifier is available during both training and inference to specify which task the model should address. In contrast, CIL does not provide a task identifier during inference, requiring the model to classify all previously encountered classes jointly. DIL differs from the previous two scenarios in that the learning objective (i.e., the set of classes or the application) remains the same across tasks, while the input data distribution or domain characteristics vary. Among the three, DIL is particularly relevant to real-world remote sensing applications, where domain shifts frequently occur. Nevertheless, research on DIL remains relatively underexplored compared to TIL and CIL. Therefore, this study focuses on addressing challenges within the DIL scenario.
In addition to scenario-based categorization, continual learning can also be classified based on implementation strategies: regularization-based, architecture-based, and replay-based methods [29,30]. Regularization-based methods preserve prior knowledge by incorporating additional terms into the loss function that penalize changes to parameters important for previous tasks. Architecture-based methods maintain prior knowledge by isolating parameters for each task, either by fixing and masking them during the training of new tasks or by using dynamic architectures that freeze existing parameters while expanding the model for task-specific learning. Replay-based methods, on the other hand, preserve prior knowledge more intuitively by explicitly storing a subset of data from previous tasks in a memory buffer and reusing it when training on new tasks.
Most existing studies on DIL have primarily employed regularization-based or architecture-based approaches, often citing concerns regarding data storage costs and privacy issues associated with replay-based methods [31,32,33]. However, in the field of remote sensing, long-term data collection is standard practice, making data storage costs a less critical concern. Furthermore, remote sensing imagery typically does not contain personally identifiable information, reducing privacy risks compared to other fields—except in sensitive areas such as national defense. Instead, the limitations of regularization-based and architecture-based methods pose significant challenges in practical remote sensing applications. Regularization-based methods are inherently vulnerable to domain shifts, which can lead to catastrophic forgetting. Meanwhile, architecture-based methods tend to increase model complexity as tasks accumulate, requiring additional memory resources. Given these factors, replay-based methods present a more practical solution for DIL within remote sensing.
Therefore, this study focuses on developing a replay-based learning algorithm for DIL in remote sensing, where a model learns the same application from sequentially emerging domains collected from different platforms, while retaining knowledge from previously learned domains. To this end, we propose Experience Replay with Performance-Aware Submodular Sampling (ER-PASS), designed to improve adaptability across domains, mitigate catastrophic forgetting, and ensure computational efficiency. ER-PASS integrates the robustness of joint learning—by processing samples from multiple domains as a unified input—with the efficiency of replay-based learning in reducing training time. Furthermore, we introduce a performance-aware submodular sample selection strategy to enhance the stability of the learning process. To evaluate the effectiveness of our approach, we conduct experiments on two representative remote sensing applications: building segmentation and LULC classification. The main contributions of this study are as follows:
- (1) We propose a replay-based learning algorithm that incorporates a performance-aware submodular sample selection strategy, namely ER-PASS, which is model-agnostic and can be applied across various deep learning models.
- (2) We demonstrate that ER-PASS effectively mitigates catastrophic forgetting compared to existing methods, while requiring relatively low resource demands.
- (3) Experimental results on building segmentation and LULC classification demonstrate that ER-PASS exhibits generalizability across diverse remote sensing applications.
3. Methodology
3.1. Overview
This study is conducted under the DIL scenario, where data from multiple platforms are incrementally collected, and the model is continuously updated to accommodate newly acquired data. In this setting, learning for each domain is treated as an individual task, and the number of tasks increases as new domain data becomes available. A strict DIL setup assumes that class labels remain consistent across tasks. To align with this assumption, we adopt building segmentation as the primary downstream application. However, real-world scenarios often involve the emergence of new classes over time. To reflect such practical considerations, we additionally evaluate our approach under a more relaxed setting using LULC classification, where new classes can emerge in later tasks.
The overall learning process of ER-PASS proposed in this study is illustrated in Figure 2. In the training phase, model learning and sample selection are performed sequentially for each task. The segmentation model is trained on the current domain’s data in conjunction with the samples stored in the memory buffer from previous domains. After training, the sample selection algorithm identifies core samples based on scores derived from the trained model’s predictions. These samples are then stored in the memory buffer and carried over to the next task, enabling the model to retain knowledge from earlier domains while facilitating learning on subsequent ones. For evaluation, the model is assessed not only on the current domain but also on all previously encountered domains. We adopt UNet [63] and DeepLabV3+ [64] as baseline models for both building segmentation and LULC classification. Notably, ER-PASS is model-agnostic and can be readily integrated into other segmentation networks without requiring any architectural modifications.
3.2. Proposed Algorithm
ER-PASS is a replay-based continual learning algorithm tailored for DIL in remote sensing applications. It is motivated by the observation that joint learning has been shown to improve generalization across heterogeneous domains and that dataset-level memory integration—rather than batch-level—offers greater computational efficiency.
Let the continual learning process consist of a sequence of tasks $\{\mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_K\}$, each corresponding to a distinct remote sensing domain. For the k-th task $\mathcal{T}_k$, a labeled dataset $\mathcal{D}_k = \{(x_i^k, y_i^k)\}_{i=1}^{n_k}$ is provided, where $x_i^k$ and $y_i^k$ denote the input image and its corresponding label, and $n_k$ is the number of training samples in task $\mathcal{T}_k$.
Unlike conventional experience replay methods [45,46,47], which merge current task data and memory buffer samples within each mini-batch, ER-PASS performs dataset-level integration by combining the two into a unified training set. Specifically, for task $\mathcal{T}_k$, the new training set $\tilde{\mathcal{D}}_k$ is defined as:

$$\tilde{\mathcal{D}}_k = \mathcal{D}_k \cup \mathcal{M}_{k-1}$$

where $\mathcal{M}_{k-1}$ denotes the memory buffer containing representative samples from previous tasks. The model parameters $\theta_k$ for task $\mathcal{T}_k$ are then optimized by minimizing the expected loss over the combined dataset $\tilde{\mathcal{D}}_k$:

$$\theta_k = \arg\min_{\theta}\; \mathbb{E}_{(x,y)\sim\tilde{\mathcal{D}}_k}\left[\mathcal{L}\big(f_{\theta}(x), y\big)\right]$$

where $\mathcal{L}$ denotes the downstream segmentation loss (e.g., binary cross-entropy or cross-entropy), and $f_{\theta}$ represents the segmentation model.
Through this simple yet effective approach, ER-PASS computes a single gradient per iteration without requiring gradient aggregation, while maintaining a consistent batch size. This design reduces memory overhead compared to conventional replay-based methods and shortens training time compared to joint learning by relying only on a compact memory buffer rather than the full dataset of previous tasks.
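The dataset-level integration above can be sketched in a few lines of plain Python; `make_task_batches` is a hypothetical helper (not the authors' code) that merges the buffer into the training set once per task and then batches normally:

```python
import random

def make_task_batches(current_data, memory_buffer, batch_size, seed=0):
    """Dataset-level replay (sketch): merge the memory buffer into the
    current task's training set once, then draw ordinary mini-batches."""
    combined = list(current_data) + list(memory_buffer)  # D_k united with M_{k-1}
    random.Random(seed).shuffle(combined)
    # Each mini-batch yields a single gradient per iteration; no per-batch
    # buffer sampling or gradient aggregation is needed, and the batch size
    # stays constant regardless of how many tasks have been seen.
    return [combined[i:i + batch_size]
            for i in range(0, len(combined), batch_size)]

# Toy usage: 6 current samples plus 2 replayed samples, batch size 4
batches = make_task_batches(range(6), ["m1", "m2"], batch_size=4)
```

Because the merge happens once per task, every training iteration processes one ordinary mini-batch, in contrast to batch-level replay where each batch must be assembled from two sources.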
After training on task $\mathcal{T}_k$, a sample selection algorithm is employed to extract a representative subset of samples from $\tilde{\mathcal{D}}_k$ for updating the memory buffer. The selection is based on feature similarity and task-specific performance, as described in Section 3.3:

$$\mathcal{M}_k = \mathrm{Select}\big(\tilde{\mathcal{D}}_k, f_{\theta_k}\big)$$

The updated memory buffer $\mathcal{M}_k$ is then used for training on the next task $\mathcal{T}_{k+1}$. The overall learning process of ER-PASS is summarized in Algorithm 1.
Algorithm 1 Learning process of ER-PASS
Input: $\{\mathcal{D}_k\}_{k=1}^{K}$ ▹ Dataset corresponding to each task
Require: $f_{\theta}$ ▹ Neural network
Initialize: $\mathcal{M}_0 \leftarrow \emptyset$ ▹ Memory buffer
1: Define $\tilde{\mathcal{D}}_k$ as the training set used for task k
2: for task $k = 1, \ldots, K$ do
3:  if $k = 1$ then
4:   $\tilde{\mathcal{D}}_1 \leftarrow \mathcal{D}_1$
5:   $\theta \leftarrow \theta_{\text{init}}$ ▹ Initialize model parameters
6:  else
7:   $\tilde{\mathcal{D}}_k \leftarrow \mathcal{D}_k \cup \mathcal{M}_{k-1}$
8:   $\theta \leftarrow \theta_{k-1}$
9:  end if
10:  Define $B$ as the total number of mini-batches in $\tilde{\mathcal{D}}_k$
11:  for $b = 1, \ldots, B$ do
12:   $\theta \leftarrow \theta - \eta \nabla_{\theta}\mathcal{L}$ ▹ Update model
13:  end for
14:  $\mathcal{M}_k \leftarrow \mathrm{Select}(\tilde{\mathcal{D}}_k, f_{\theta_k})$ ▹ Update memory buffer
15: end for
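The control flow of Algorithm 1 can be condensed into the following skeleton; `train_fn` and `select_fn` are hypothetical stand-ins for mini-batch training and the selection procedure of Section 3.3:

```python
def er_pass(datasets, train_fn, select_fn, budget):
    """Skeleton of the ER-PASS task loop (a sketch, not the authors' code).
    train_fn(data, params) -> params and select_fn(data, params, budget)
    -> buffer samples are placeholders for the real routines."""
    memory, params = [], None
    for dataset in datasets:
        combined = list(dataset) + memory             # unified training set
        params = train_fn(combined, params)           # mini-batch training
        memory = select_fn(combined, params, budget)  # update memory buffer
    return params, memory

# Dummy stand-ins, used only to show the data flow
train = lambda data, p: (p or 0) + len(data)   # "training" counts samples seen
select = lambda data, p, b: list(data)[:b]     # keep the first b samples
params, memory = er_pass([[1, 2, 3], [4, 5]], train, select, budget=2)
```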
3.3. Performance-Aware Submodular Sample Selection
In ER-PASS, the samples stored in the memory buffer from previous tasks are essential to preserving knowledge. To ensure effective knowledge retention, we propose a performance-aware submodular sample selection strategy that jointly considers both feature-level diversity and task-specific prediction performance. This strategy promotes the retention of diverse, representative, and informative samples, thereby improving stability and mitigating catastrophic forgetting during the learning process.
For each candidate sample $(x_i, y_i) \in \tilde{\mathcal{D}}_k$, we extract the feature representation $z_i \in \mathbb{R}^d$ by applying global average pooling to the encoder output of the trained model $f_{\theta_k}$. Here, $d$ denotes the dimensionality of the feature space, which corresponds to 512 for UNet and 2048 for DeepLabV3+. Specifically, we use the output of the final downsampling block in the UNet encoder and the output of layer 4 in the ResNet50 backbone of DeepLabV3+ as the feature map. Each feature vector is then $\ell_2$-normalized to ensure consistent scaling, using the following equation:

$$\hat{z}_i = \frac{z_i}{\lVert z_i \rVert_2}$$
We also compute a task-specific evaluation score $e_i$, such as the Intersection-over-Union (IoU) or mean Intersection-over-Union (mIoU) between the model’s prediction and the ground truth. This score serves as a performance-based weight during sample selection.
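For the binary building segmentation setting, such a per-sample score might be computed as in the minimal sketch below; the convention of returning 1.0 for two empty masks is our assumption, not specified in the text:

```python
import numpy as np

def binary_iou(pred, target):
    """IoU between a predicted and a ground-truth binary mask (sketch)."""
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # assumed convention for two empty masks
    return np.logical_and(pred, target).sum() / union

pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
score = binary_iou(pred, target)  # intersection 1, union 2
```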
Let $V = \{\hat{z}_1, \ldots, \hat{z}_n\}$ denote the set of normalized feature vectors extracted from all candidate samples. The intra-similarity of a candidate sample $\hat{z}_i$ is defined as the total cosine similarity between its feature vector and those of all other vectors in $V$:

$$\mathrm{sim}_{\mathrm{intra}}(i) = \sum_{j=1}^{n} \hat{z}_i^{\top} \hat{z}_j$$
This term reflects the alignment of $\hat{z}_i$ with the candidate sample distribution in the feature space, promoting the selection of representative and diverse samples.
Let $S \subseteq V$ be the set of feature vectors corresponding to the samples already selected into the memory buffer. The inter-similarity of $\hat{z}_i$ with respect to $S$ is then defined as:

$$\mathrm{sim}_{\mathrm{inter}}(i) = \sum_{\hat{z}_j \in S} \hat{z}_i^{\top} \hat{z}_j$$
This measures the redundancy of $\hat{z}_i$ relative to the selected set. A lower inter-similarity indicates that the sample introduces novel information, contributing to buffer diversity.
We define the selection score for each candidate $\hat{z}_i$ as the submodular gain, i.e., the difference between intra- and inter-similarity, weighted by the task-specific evaluation score:

$$g_i = e_i \cdot \big(\mathrm{sim}_{\mathrm{intra}}(i) - \mathrm{sim}_{\mathrm{inter}}(i)\big)$$
At each iteration, the candidate $\hat{z}_{i^*}$ with the highest score is greedily selected and added to the memory buffer:

$$i^{*} = \operatorname*{arg\,max}_{i \notin S} g_i, \qquad \mathcal{M}_k \leftarrow \mathcal{M}_k \cup \{(x_{i^*}, y_{i^*})\}$$
This process continues until a predefined budget—such as a fixed number of samples or percentage of the dataset—is reached. By jointly optimizing representativeness (via intra-similarity), redundancy reduction (via inter-similarity), and task relevance (via evaluation score), the proposed strategy constructs a memory buffer that is both submodular-optimal and performance-sensitive. Notably, since samples are dynamically selected based on their scores after the completion of each task, there is no need to pre-allocate memory per task. This design ensures that important samples are naturally retained in the buffer and are not mechanically displaced due to memory constraints, thereby effectively supporting continual learning. The detailed procedure is described in Algorithm 2.
Algorithm 2 Performance-aware submodular sample selection
Input: $\tilde{\mathcal{D}}_k$, $f_{\theta_k}$ ▹ Dataset and trained model corresponding to $\mathcal{T}_k$
Require: N ▹ Memory budget (number of samples to select)
Output: $\mathcal{M}_k$ ▹ Updated memory buffer
1: Initialize $\mathcal{M}_k \leftarrow \emptyset$, $S \leftarrow \emptyset$
2: Extract features $z_i$ from $f_{\theta_k}$ for each sample
3: Compute normalized features: $\hat{z}_i = z_i / \lVert z_i \rVert_2$
4: Compute evaluation score $e_i$ between the prediction and ground truth for each sample
5: Let $V = \{\hat{z}_1, \ldots, \hat{z}_n\}$, where n is the total number of samples in $\tilde{\mathcal{D}}_k$
6: Compute intra-similarity: $\mathrm{sim}_{\mathrm{intra}}(i) = \sum_{j=1}^{n} \hat{z}_i^{\top} \hat{z}_j$
7: for $m = 1$ to N do
8:  for $i = 1$ to n do
9:   if $S = \emptyset$ then
10:   $g_i = e_i \cdot \mathrm{sim}_{\mathrm{intra}}(i)$ ▹ Only intra-similarity
11:  else
12:   Compute inter-similarity: $\mathrm{sim}_{\mathrm{inter}}(i) = \sum_{\hat{z}_j \in S} \hat{z}_i^{\top} \hat{z}_j$
13:   $g_i = e_i \cdot (\mathrm{sim}_{\mathrm{intra}}(i) - \mathrm{sim}_{\mathrm{inter}}(i))$
14:  end if
15:  Skip $i$ if $\hat{z}_i \in S$ ▹ Exclude already selected samples
16:  end for
17:  $i^{*} = \arg\max_{i} g_i$
18:  $\mathcal{M}_k \leftarrow \mathcal{M}_k \cup \{(x_{i^*}, y_{i^*})\}$, $S \leftarrow S \cup \{\hat{z}_{i^*}\}$
19: end for
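Assuming the features and per-sample evaluation scores have been precomputed, the greedy loop of Algorithm 2 can be sketched compactly with NumPy; `select_memory` is a hypothetical name, not the authors' implementation:

```python
import numpy as np

def select_memory(features, scores, budget):
    """Greedy performance-aware submodular selection (sketch of Algorithm 2).
    features: (n, d) encoder features; scores: (n,) IoU-like per-sample
    evaluation scores; returns the indices chosen for the memory buffer."""
    z = features / np.linalg.norm(features, axis=1, keepdims=True)  # l2-normalize
    sim = z @ z.T                    # pairwise cosine similarities
    intra = sim.sum(axis=1)          # alignment with all candidates
    selected = []
    for _ in range(budget):
        # redundancy w.r.t. already selected samples (zero on the first pass)
        inter = sim[:, selected].sum(axis=1) if selected else np.zeros(len(z))
        gain = scores * (intra - inter)  # performance-weighted submodular gain
        gain[selected] = -np.inf         # exclude already selected samples
        selected.append(int(np.argmax(gain)))
    return selected

# Toy usage: two tight feature clusters; with equal scores the greedy
# selection should cover both clusters rather than pick near-duplicates.
feats = np.array([[1.0, 0.0], [1.0, 0.05], [0.0, 1.0], [0.05, 1.0]])
chosen = select_memory(feats, np.ones(4), budget=2)
```

The inter-similarity term penalizes candidates close to what is already in the buffer, which is why the toy example ends up with one sample from each cluster.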
5. Results
5.1. Building Segmentation (Strict DIL Setting)
As previously described, our incremental learning setup employs a step-wise training procedure across four domains in the following order: Potsdam, LoveDA, DeepGlobe, and GID. The model is initially trained on Potsdam in Step 1 with randomly initialized weights. In Steps 2 to 4, the model is incrementally updated by leveraging the weights learned from the previous step, thereby facilitating knowledge transfer across domains. Under this setup, we conducted a downstream evaluation on building segmentation. Experiments were primarily conducted using UNet, with additional comparisons performed using DeepLabV3+.
Table 2 presents the IoU performance at each incremental step using UNet for building segmentation, along with evaluation metrics at Step 4. At each step, the reported IoU values reflect the model’s performance on all previously encountered domains, evaluated using the model trained on the current task. Models trained via single-task learning lack adaptability to unseen domains and demonstrate satisfactory performance only within the domain on which they were trained. This limitation becomes particularly evident at Step 4, where performance on other domains significantly degrades—primarily caused by substantial domain shifts. While fine-tuning facilitates some knowledge transfer, its performance gains on previous tasks over single-task learning remain marginal, suggesting the occurrence of catastrophic forgetting. Notably, even continual learning methods such as EWC and LwF fail to provide meaningful improvements, yielding results comparable to naive fine-tuning and demonstrating limited ability to mitigate forgetting. In contrast, ER demonstrates better performance in terms of BWT, suggesting a stronger capability to retain prior knowledge. It also achieves higher AIA values compared to other baseline methods. However, comparison with single-task learning reveals that ER exhibits lower performance on the current task at each step, indicating that the high AIA is primarily due to the preservation of previous knowledge rather than effective learning of new tasks.
In theory, joint learning serves as the performance upper bound for incremental learning approaches. Our experimental results demonstrate that the proposed method achieves performance comparable to joint learning and, in some cases, even surpasses it. Furthermore, our method consistently outperforms baseline methods. In particular, compared to ER, which achieved the best performance among the benchmarks, our method achieves improvements of 0.1613 in AIA and 0.332 in BWT at Step 4. These results indicate that our method not only mitigates catastrophic forgetting effectively but also maintains reasonable adaptability to new tasks. A similar trend is observed in Table 3, where DeepLabV3+ is used as the segmentation model instead of UNet. This consistent improvement across different architectures highlights that the effectiveness of the proposed algorithm is not limited to a specific model, demonstrating its model-agnostic nature.
To complement the quantitative results, Figure 4 and Figure 5 provide qualitative visualizations that further demonstrate the effectiveness of the proposed method.
Figure 4 shows segmentation results from each domain using the UNet model trained up to Step 4, illustrating representative samples across domains. As expected, the segmentation performance on GID is relatively accurate since the model was directly trained on that domain. However, all benchmark methods exhibit noticeable degradation on previously learned domains. In particular, EWC and LwF fail to produce meaningful predictions, indicating that severe catastrophic forgetting has occurred. In contrast, ER performs better than these methods and partially mitigates forgetting. The proposed method maintains segmentation performance on previous domains at a level comparable to joint learning, clearly demonstrating its effectiveness in mitigating catastrophic forgetting.
Beyond the cross-domain comparison at Step 4, Figure 5 presents step-by-step segmentation results for the first domain, analyzing how each method retains prior knowledge throughout the incremental learning process. In the case of EWC and LwF, prior knowledge is relatively well preserved up to Step 2. However, noticeable forgetting begins at Step 3, and severe catastrophic forgetting becomes evident by Step 4. ER maintains acceptable performance up to Step 3, with slight degradation observed at Step 4. Consistent with earlier findings, the proposed method maintains stable performance across all steps, further demonstrating its robustness against catastrophic forgetting. For completeness, the qualitative visualization results for the DeepLabV3+ model are provided in Appendix A.
5.2. LULC Classification (Relaxed DIL Setting)
To further evaluate the generalizability of the proposed method in practical applications, we conduct experiments on LULC classification. Unlike building segmentation, which follows a strict DIL setting with consistent class labels, LULC classification represents a more relaxed and realistic scenario where new classes may emerge over time.
Table 4 and Table 5 present LULC classification results for UNet and DeepLabV3+, respectively. Similar to the building segmentation experiments, these tables provide step-wise mIoU scores and final-step metrics, with overall trends following a comparable pattern. Nonetheless, several noteworthy differences can be observed. First, the single-task performance on LoveDA is relatively low. As observed in Figure 3, this can be attributed to the highly imbalanced class distribution in LoveDA, which causes certain classes, such as agriculture and barren, to be frequently misclassified as background. In addition, EWC and LwF, which failed to retain prior knowledge in the earlier building segmentation experiments (yielding near-zero scores at Step 4), show relatively better performance in LULC classification. This may be attributed to greater semantic consistency or overlap among LULC classes across domains, which facilitates better classification performance.
Another interesting observation concerns the behavior of ER. While ER showed substantial forgetting at Step 4 in building segmentation, its performance in LULC classification begins to degrade earlier, at Step 2, but then remains relatively stable in the subsequent steps. Building segmentation involves binary and consistent labels, allowing previous knowledge to be effectively reinforced through the memory buffer. In contrast, LULC classification involves many classes, with new classes introduced at each step. Because the batch-level replay mechanism in ER integrates previous and current samples within every batch, learning new classes persistently interferes with previously learned knowledge, which explains the severe forgetting observed at Step 2. In comparison, the proposed method integrates previous and current samples at the dataset level before batching, thereby reducing such interference and enabling more stable optimization even when new classes are introduced. This is supported by the experimental results, which show that the proposed method consistently mitigates forgetting, as also observed in the building segmentation experiments.
Figure 6 illustrates the LULC classification results obtained with the UNet model trained up to Step 4. The results for EWC and LwF on previous domains show that predictions are primarily focused on classes such as forest, grassland, water, and agriculture, which correspond to the classes present in the GID. However, both methods largely fail to predict classes absent from GID, such as cars, roads, and barren areas, suggesting substantial forgetting of these classes. Notably, despite the building class being commonly present across all domains, both EWC and LwF still perform poorly on this class. This is likely due to substantial domain discrepancies, which hinder consistent recognition. In contrast, ER partially retains classes absent from GID, demonstrating better knowledge preservation across diverse classes and domains.
Figure 7 provides a detailed visualization of the forgetting dynamics across incremental steps. As confirmed by the quantitative results, substantial forgetting occurs at Step 4 in the building segmentation, whereas in the LULC classification, it emerges earlier, starting from Step 2. Specifically, in the case of ER, prediction performance on the grassland class noticeably degrades at Step 2, which corresponds to the second domain, LoveDA, where this class is absent. This implies that classes not present in the current task are more susceptible to forgetting. Similarly, at Step 3, performance on the road class deteriorates, consistent with its absence in the DeepGlobe, and this trend continues through Step 4. Notably, at Step 4, a performance degradation is also observed for the building class, which is present in the GID, aligning with earlier observations in the building segmentation experiments. In contrast, the proposed method effectively preserves a broader range of class representations across incremental steps, mitigating forgetting not only for classes currently being learned but also for those learned in previous steps. These findings suggest that the proposed method generalizes well beyond binary segmentation to more general semantic segmentation problems such as LULC classification, underscoring its robustness and scalability under both strict and relaxed DIL settings. The corresponding DeepLabV3+ results are provided in Appendix A.
7. Conclusions
In this paper, we propose ER-PASS, an experience replay-based continual learning algorithm that leverages the strengths of joint learning and experience replay for DIL in remote sensing. ER-PASS integrates a performance-aware submodular sample selection strategy to enhance the stability of the learning process across evolving domains. We conducted experiments on two distinct remote sensing applications: building segmentation and LULC classification. In both applications, ER-PASS outperforms the benchmarks in terms of AIA and BWT, demonstrating its effectiveness in mitigating catastrophic forgetting and its generalizability across diverse applications. Furthermore, experiments with multiple model architectures, including UNet and DeepLabV3+, show that ER-PASS is model-agnostic and can be flexibly integrated into various network structures. Additional analyses confirm that the proposed sample selection strategy positively impacts adaptability and forgetting mitigation, with stable performance maintained when the sampling ratio is 0.5 or higher. Finally, ER-PASS exhibits reasonable efficiency in terms of training time and memory consumption. These results suggest that ER-PASS can be practically applied in real-world remote sensing scenarios.
Nevertheless, ER-PASS has several limitations. First, ER-PASS has only been evaluated on two representative applications—building segmentation and LULC classification—using solely optical imagery. Its applicability to a wider range of remote sensing tasks, such as change detection or super-resolution, as well as its generalizability to more heterogeneous remote sensing domains, such as hyperspectral or synthetic aperture radar imagery, remains to be validated. Second, the experiments were limited to domain sequences from high resolution to low resolution, and the impact of domain order was not fully analyzed. Therefore, performance across various domain orders should be evaluated in future work. Third, the submodular gain calculation involved in the sample selection process of ER-PASS includes matrix operations, which leads to increased computational complexity over successive incremental learning steps. Hence, efficient computation strategies for large-scale matrix operations need to be explored. Lastly, while ER-PASS was primarily compared with relatively classical methods, additional evaluations against recent state-of-the-art benchmarks are required for a more comprehensive assessment. Furthermore, as a model-agnostic learning algorithm, ER-PASS could be further explored for potential integration with architecture-based approaches proposed in recent studies.