Meta-Learning in Land Use and Land Cover Classification: Review and Perspective

He, Wei; Li, Lianfa; Wu, Haoxiong; Gao, Xilin; Yang, Yichen; Zhang, Zixuan; Yang, Xiaomei; Ge, Yong

doi:10.3390/rs18121879

Open Access Systematic Review

by Wei He ^1,2,†, Lianfa Li ^1,2,*,†, Haoxiong Wu ^1,2, Xilin Gao ^1,2, Yichen Yang ^1,2, Zixuan Zhang ^3, Xiaomei Yang ^1,2 and Yong Ge ^1,2

^1 State Key Laboratory of Resources and Environmental Information Systems, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Datun Road, Beijing 100101, China

^2 University of the Chinese Academy of Sciences, Beijing 100049, China

^3 Department of Mathematics, University of Manchester, Oxford Road, Manchester M13 9PL, UK

^* Author to whom correspondence should be addressed.

^† These authors contributed equally to this work.

Remote Sens. 2026, 18(12), 1879; https://doi.org/10.3390/rs18121879

Submission received: 22 April 2026 / Revised: 27 May 2026 / Accepted: 1 June 2026 / Published: 7 June 2026

(This article belongs to the Section AI Remote Sensing)

Highlights

What are the main findings?

Optimization-based and metric-based meta-learning dominate LULC classification research, with MAML and its variants being the most widely adopted, while memory-augmented methods remain underexplored due to computational overhead on high-dimensional remote sensing data.
Meta-learning consistently outperforms conventional pre-training followed by fine-tuning under significant domain shifts across multiple data modalities, by acquiring cross-task structural knowledge rather than reusing instance-level features.

What are the implications of the main findings?

Temporal dynamics modeling and multimodal data integration remain in early stages, calling for unified meta-learning frameworks that jointly address cross-regional, cross-temporal, and cross-modal generalization challenges arising from spatial heterogeneity.
The integration of meta-learning with remote sensing foundation models represents a promising pathway toward operationally deployable LULC systems, combining large-scale representation learning with rapid few-shot adaptation mechanisms.

Abstract

Deep learning has exhibited potential in land use and land cover (LULC) classification applications. However, the effectiveness of deep learning remains constrained by the availability and quality of annotated training data. The persistent scarcity of labeled samples and spatial heterogeneity of remote sensing imagery hinder the robustness and generalization of trained models. Meta-learning, commonly referred to as “learning to learn”, is a paradigm that trains models over a distribution of tasks to acquire transferable knowledge, enabling rapid adaptation to new tasks with only a few labeled samples. This cross-task learning capability makes meta-learning a promising solution to data scarcity and spatial heterogeneity in the remote sensing context. This paper provides a systematic review of meta-learning applications in LULC classification, identifying a total of 70 relevant studies between 2018 and 2025. Three mainstream meta-learning paradigms (memory-augmented, optimization-based, and metric-based) are reviewed, and the applications are analyzed across four core challenges in LULC remote sensing: label scarcity, cross-region and cross-domain distribution shifts, temporal dynamics modeling, and multimodal data integration. The review reveals that optimization-based and metric-based methods dominate current research, with MAML and its variants being the most widely adopted due to the model-agnostic property, while memory-augmented methods remain underexplored. A consistent finding is that meta-learning outperforms conventional pre-training followed by fine-tuning under significant domain shifts across multiple data modalities. Current limitations, including computational overhead, episodic training constraints, and the lack of standardized evaluation protocols, are discussed. Future directions in cross-domain generalization, integration with foundation models, novel architectures, and standardized benchmarks are identified.

Keywords:

meta-learning; land cover; land use; cross-domain learning; few-shot learning; data scarcity

1. Introduction

Land use and land cover (LULC) classification is a fundamental task in remote sensing, providing essential geospatial information for urban planning, environmental monitoring, agricultural management, and natural resource assessment [1]. Deep learning methods have advanced rapidly in recent years and have been increasingly applied to LULC classification and other remote sensing tasks [2,3], substantially enhancing the capabilities of geographic interpretation and feature extraction [4]. Despite these advancements, the performance of deep learning models remains highly dependent on the quality and quantity of annotated training data. Acquiring and labeling large volumes of data is both costly and labor-intensive in the remote sensing domain, where expert knowledge is often required for accurate annotation. Data scarcity and heterogeneity of remote sensing scenes continue to pose significant challenges for LULC classification.

Spatial heterogeneity, sensor variation and scale differences are the main factors contributing to the low generalization of trained models when applied across regions and times. Spatial heterogeneity is a fundamental geographic principle describing how environmental, social, or physical attributes vary systematically across space, creating distinct patterns of similarity and difference that operate at multiple scales from local to global [5]. Labeled datasets developed for one task or region typically lack transferability to new application settings. Differences in spectral bands, spatial resolution, and temporal characteristics of remote sensing imagery often lead existing studies to case-specific or scenario-dependent designs [6]. This limitation has fostered a prevailing “one case, one model” paradigm, hindering the scalability and generalization of current LULC classification approaches in remote sensing. To address these challenges, meta-learning, commonly referred to as “learning to learn”, has emerged as a promising approach that enables models to acquire transferable knowledge across a distribution of tasks, facilitating rapid adaptation to new scenarios with limited annotated samples [7]. Unlike conventional deep learning methods that optimize for a single fixed task, meta-learning emphasizes cross-task generalization, making it particularly well-suited for LULC applications characterized by pronounced regional variability and persistent label scarcity.

The suitability of meta-learning for LULC classification is grounded in a structural alignment between the formulation of meta-learning and three intrinsic properties of remote sensing data. First, the variability across geographic regions, sensor configurations, and acquisition periods can be naturally formalized as a distribution of related tasks rather than independent samples, which corresponds to the task-distribution assumption underlying meta-learning. Such variability has been widely documented in remote sensing, where differences in image acquisition conditions, sensor parameters, and regional landscape composition lead to systematic distribution shifts between source and target imagery [8]. While conventional supervised learning treats this variability as a source of distribution shift to be minimized, meta-learning treats it as the structural prior to be exploited. Second, the high annotation cost in remote sensing, which arises from the need for expert interpretation and field validation, constrains the size of labeled datasets achievable for any given region or sensor [9]. This constraint maps directly onto the few-shot setting that meta-learning is designed to address, where adaptation to a new task is performed with only a small support set. Third, although land cover types exhibit different spectral and textural appearances across regions, they share semantic structures that are spatially transferable. For instance, the spectral signatures of healthy vegetation, water bodies, or impervious surfaces remain identifiable across diverse environments despite local variations [2,10]. This shared structure provides the substrate of transferable meta-knowledge that the outer-level optimization in meta-learning seeks to capture. 
These three properties indicate that the alignment between meta-learning and LULC classification is not incidental but reflects a correspondence between the algorithmic formulation and the underlying data-generating process in remote sensing.

In recent years, an increasing number of studies have explored the integration of meta-learning techniques into LULC applications, aiming to improve few-shot classification [11], domain generalization [12], and cross-region adaptation [13]. These approaches encompass a variety of meta-learning paradigms, including optimization-based frameworks such as Model-Agnostic Meta-Learning (MAML) [14], metric-based methods, and memory-augmented methods, all aimed at enhancing adaptability across diverse tasks. Several studies have developed general feature extractors tailored to heterogeneous land cover types [15], while others incorporate multimodal fusion strategies to exploit spatial, spectral, and elevation information [16]. In the field of remote sensing, several recent reviews have addressed related topics. Sun et al. [10] provided an overview of few-shot learning methods for remote sensing image interpretation, encompassing meta-learning alongside transfer learning and data augmentation strategies. Gama et al. [17] reviewed meta-learning approaches for few-shot weakly supervised segmentation. However, these reviews address few-shot learning or weakly supervised segmentation broadly, rather than focusing specifically on the role and effectiveness of meta-learning in LULC classification. Despite growing interest in meta-learning applications, no existing review has systematically synthesized the current state, methodological advances, and effectiveness of meta-learning approaches specifically for LULC classification.

To address this gap, this paper presents a systematic review of the developments and applications of meta-learning in LULC classification. The main contributions of this review are as follows: (1) we provide a structured review of representative meta-learning methods organized by the three mainstream paradigms (memory-augmented, optimization-based, and metric-based) and analyze their characteristics in the context of LULC applications; (2) we systematically review 70 peer-reviewed studies published between 2018 and 2025, synthesizing methodological innovations and key findings across four application domains; (3) we critically discuss the advantages and limitations of meta-learning for LULC, and identify future research directions including the integration with foundation models and emerging architectures.

The remainder of this paper is organized as follows. Section 2 describes the review methodology, including the literature search strategy, screening criteria, and review framework. Section 3 categorizes mainstream meta-learning paradigms and summarizes representative algorithms, covering memory-augmented, optimization-based, and metric-based methods. Section 4 reviews meta-learning applications in LULC remote sensing, organized around four core challenges: label scarcity, cross-region and cross-domain distribution shifts, temporal dynamics modeling, and multimodal data integration. Section 5 discusses the key findings, compares meta-learning with related learning paradigms, examines current limitations, and identifies future research directions. Section 6 concludes the paper.

2. Review Methodology

This systematic review is reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines [18]. This review was not registered in any systematic review registry, and a formal review protocol was not prepared.

2.1. Literature Search Strategy

We conducted a systematic search for peer-reviewed journal articles, conference proceedings, and review papers published between 2018 and 2025. The primary database used was the Web of Science Core Collection. This time frame was chosen because applications of meta-learning in the LULC domain were limited before 2018, as many of the foundational methods now widely adopted in LULC research were originally proposed during this period. During the search process, we identified key research topics and refined relevant search terms. The search queries combined method-related terms (“Meta-learning”, “Few-shot learning”) with application-related terms (“Land Cover”, “Land Use”). The initial search, restricted to the publication period 2018–2025, yielded 436 candidate papers.

2.2. Screening and Selection Criteria

The initial search results underwent a rigorous screening process based on titles, keywords, and abstracts. Studies were included if they: (1) employed meta-learning as the principal or core methodology (rather than as a minor auxiliary component); (2) addressed land use, land cover, or closely related classification tasks using remote sensing data. Studies were excluded if they: (1) focused on general image classification without a remote sensing or LULC context; (2) mentioned “land cover” only as background without applying meta-learning techniques. Notably, several studies related to general image classification were initially retrieved due to the presence of the keyword “Land Cover”, despite not employing meta-learning techniques. After excluding such irrelevant entries, a total of 70 articles were identified as relevant to the application of meta-learning in remote sensing-based land cover and land use research. The search and selection process is summarized in the PRISMA flow diagram (Figure S4 in the Supplementary Material).

2.3. Review Framework

This review is organized around two complementary dimensions. Firstly, Section 3 establishes the conceptual foundation by reviewing the three mainstream meta-learning paradigms: memory-augmented, optimization-based, and metric-based, focusing on their core mechanisms and representative algorithms. The selection of representative methods in Section 3 follows a criterion of methodological significance: we prioritize methods that introduced key architectural innovations or that have been widely adopted as baselines in subsequent LULC studies.

Secondly, Section 4 provides a systematic review of the 70 identified studies, organized by application domain within LULC. For each reviewed study, we extracted the following information: meta-learning approach adopted, task type, data modality and dataset, key technical innovation, and reported performance. This structured extraction enables cross-study comparison and the identification of trends across the field. The relationship between Section 3 and Section 4 is thus complementary: Section 3 provides the methodological vocabulary, while Section 4 examines how these methods have been adapted and applied to address domain-specific challenges in LULC classification.

The annual distribution of the 70 identified publications is detailed in Section 4, showing a steady increase from a single article in 2018 to 26 in 2024, reflecting growing academic interest in leveraging meta-learning for LULC applications.

3. Meta-Learning Paradigms

3.1. Meta-Learning Fundamentals

The concept of “learning to learn” was first systematically articulated by Thrun and Pratt [19], who proposed that a learning system exposed to a sufficiently large number of related tasks should be able to extract structural commonalities that accelerate the acquisition of new tasks. This idea forms the foundation of modern meta-learning: rather than optimizing a model for a single task from scratch, the goal is to learn a learning strategy itself, which can be efficiently applied to novel or unseen tasks with minimal supervision [7]. In the context of LULC classification, this paradigm is particularly relevant. Instead of training a separate deep learning model for each geographic region or sensor configuration, meta-learning seeks to distill transferable knowledge from a collection of diverse classification tasks (e.g., land cover mapping across multiple regions) so that the model can rapidly adapt to a new region using only a small set of locally annotated samples.

Meta-learning can be conceptually distinguished from supervised learning by its operating scale (Figure 1). Standard supervised learning optimizes parameters $\theta$ on a single training dataset $D = \{(x_i, y_i)\}_{i=1}^{n}$, which in remote sensing practice corresponds to training and evaluating a classifier within one region or sensor configuration. Meta-learning instead operates over a distribution of tasks $p(T)$ [7,14], in which each task $T_i$ is associated with a support set $D_i^{support} = \{(x_k, y_k)\}_{k=1}^{K}$ for learning and a query set $D_i^{query} = \{(x_l, y_l)\}_{l=1}^{L}$ for evaluation [20]. In few-shot LULC classification scenarios, the support set typically contains a small number of labeled samples (e.g., 1 or 5 examples per land cover class) from a target region, and the query set measures how well the model generalizes after adapting to that region. This episodic structure reflects the real-world constraint of limited labeled data in remote sensing.
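
The episodic structure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a procedure taken from any reviewed study: the function name `sample_episode`, the synthetic 10-band pixel features, and the 5-way 5-shot configuration are all hypothetical choices made for the example.

```python
import numpy as np

def sample_episode(features, labels, n_way, k_shot, n_query, rng):
    """Draw one N-way K-shot episode: a support set and a disjoint query set."""
    # Pick the land cover classes that define this task.
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support_idx, query_idx = [], []
    for c in classes:
        # Shuffle the samples of class c, then split them into support and query.
        idx = rng.permutation(np.flatnonzero(labels == c))
        support_idx.extend(idx[:k_shot])
        query_idx.extend(idx[k_shot:k_shot + n_query])
    support_idx = np.array(support_idx)
    query_idx = np.array(query_idx)
    return (features[support_idx], labels[support_idx]), (features[query_idx], labels[query_idx])

rng = np.random.default_rng(0)
# Hypothetical data: 600 pixels, 10 spectral bands, 6 land cover classes (100 pixels each).
features = rng.normal(size=(600, 10))
labels = np.repeat(np.arange(6), 100)

(support_x, support_y), (query_x, query_y) = sample_episode(
    features, labels, n_way=5, k_shot=5, n_query=15, rng=rng
)
```

Each call yields one task: the model adapts on `support_x`, `support_y` and is scored on `query_x`, `query_y`. In a cross-region setting one would typically draw each episode from a single region or sensor rather than from a pooled array, which is an assumption this sketch does not model.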

The meta-learning process involves a bilevel optimization structure [14,21]. At the inner level, a base learner adapts to each individual task $T_i$ by optimizing task-specific parameters $\theta_i$ using the support set:

$$\theta_i^{*}(\omega) = \arg\min_{\theta_i} L_{T_i}(D_i^{support}; \theta_i, \omega) \qquad (1)$$

At the outer level, a meta-learner optimizes the meta-knowledge $\omega$, which may represent shared parameter initializations, learned distance metrics, or memory representations depending on the method, by evaluating the adapted models across multiple tasks:

$$\omega^{*} = \arg\min_{\omega} \sum_{T_i \sim p(T)} L_{meta}(D_i^{query}; \theta_i^{*}(\omega)) \qquad (2)$$

Here, $L_{T_i}$ and $L_{meta}$ denote the task-level and meta-level loss functions. The key insight is that $\omega$ is optimized not to perform well on any single task, but to produce models $\theta_i^{*}$ that collectively perform well across many tasks after task-specific adaptation. In the LULC context, each task can correspond to a land cover classification problem in a specific region, sensor configuration, or time period. The meta-knowledge $\omega$ captures cross-region structural patterns that are transferable, while the task-specific parameters $\theta_i^{*}$ encode region-specific adaptation.
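
As a worked illustration of how the two levels interact, assume (this specialization is not made in the text above) that the meta-knowledge $\omega$ is a shared parameter initialization and that the inner level is approximated by a single gradient step of size $\alpha$, as in MAML-style methods. The adapted parameters and the gradient that the outer level needs then take the following form, where the Hessian term is what makes the outer update second-order:

```latex
% Inner level: one gradient step on the support set, starting from \omega
\theta_i^{*}(\omega) = \omega - \alpha \, \nabla_{\omega} L_{T_i}\!\left(D_i^{support}; \omega\right)

% Outer level: chain rule through the inner step
\nabla_{\omega} L_{meta}\!\left(D_i^{query}; \theta_i^{*}(\omega)\right)
  = \left( I - \alpha \, \nabla_{\omega}^{2} L_{T_i}\!\left(D_i^{support}; \omega\right) \right)
    \nabla_{\theta} L_{meta}\!\left(D_i^{query}; \theta\right) \Big|_{\theta = \theta_i^{*}(\omega)}
```

Dropping the Hessian term (replacing the bracket with the identity $I$) gives the usual first-order approximation, which trades exactness of the meta-gradient for lower memory and compute.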

Based on the type of knowledge acquired by meta-learners, existing methodologies can be categorized into three mainstream paradigms [7,22]: (1) memory-augmented meta-learning, which encodes task-relevant information in external or internal memory modules that enable rapid pattern retrieval and adaptation; (2) optimization-based meta-learning, which focuses on efficient parameter initialization and adaptation strategies; and (3) metric-based meta-learning, which is dedicated to constructing effective distance metrics in feature space. The following subsections detail each paradigm, with a focus on architectural innovations relevant to LULC applications. For studies involving image classification or segmentation, the design of associated loss functions is additionally examined.

3.2. Memory-Augmented Meta-Learning

Memory-augmented meta-learning seeks to achieve rapid task adaptation by incorporating external or internal memory modules that store and retrieve task-relevant representations. The core premise is to encode the support set as a structured input and leverage memory-based mechanisms, such as content-addressable retrieval and attention-weighted updates, to produce contextually consistent predictions for query samples. The general procedure of this paradigm is illustrated in Figure 2, where support samples are first encoded and written into the memory module, and query samples subsequently interact with the stored representations through a read-out mechanism to generate predictions.
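
The write-then-read procedure can be illustrated with a deliberately simplified content-addressable read. The sketch below is an assumption-laden toy, not the MANN or NTM architecture: there is no learned controller and no learned write rule, the "write" simply stores support embeddings as keys and one-hot labels as values, and the function name `memory_read`, the `temperature` parameter, and the 2-D embeddings are hypothetical.

```python
import numpy as np

def memory_read(memory_keys, memory_values, query, temperature=1.0):
    """Content-based read: softmax over cosine similarity between the query and stored keys."""
    keys_n = memory_keys / np.linalg.norm(memory_keys, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    scores = keys_n @ query_n / temperature
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    # The read-out is the attention-weighted average of the stored values.
    return weights @ memory_values, weights

# "Write": three support embeddings (keys) with one-hot labels (values) for classes 0, 1, 0.
memory_keys = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
memory_values = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])

# "Read": a query embedding close to the class-0 keys.
query = np.array([1.0, 0.05])
readout, weights = memory_read(memory_keys, memory_values, query, temperature=0.1)
predicted_class = int(np.argmax(readout))
```

Because every read compares the query against all stored keys, the cost of a read grows with both the number of memory slots and the embedding dimension, which is the overhead discussed for this paradigm.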

The development of this paradigm has progressed along two main directions. The first direction concerns the design of memory architecture. Memory-Augmented Neural Networks (MANN) [23] pioneered the paradigm by coupling a neural controller with a content-addressable memory matrix based on the Neural Turing Machine (NTM) architecture [24]. Subsequent methods have explored alternative memory designs, including implicit attention-based memory in SNAIL [25], compressed task embeddings in Conditional Neural Processes (CNPs) [26], and dual-stream fast-weight generation in Meta Networks [27]. The second direction focuses on memory efficiency and selective retention, where recent methods such as Adaptive Posterior Learning (APL) [28], MATE [29], the Graph Augmentation Module (GAM) [30], EMO [31], and CSM [32] address how to prioritize informative samples, model task complexity, or transfer knowledge across tasks with reduced computational overhead. A representative subset of these methods, together with their core innovations, is summarized in Table 1. For a chronological overview of the development of these methods, readers are referred to Figure S1 in the Supplementary Material.

In the context of LULC remote sensing, memory-augmented methods remain the least explored among the three paradigms (see Section 4). This is likely attributable to the high computational cost of maintaining external memory modules when processing high-dimensional data such as hyperspectral imagery. Nonetheless, their capacity for incremental knowledge retention and cross-temporal pattern retrieval could be particularly valuable for dynamic LULC monitoring scenarios where new land cover classes emerge over time, such as urban expansion tracking or post-disaster land cover updating.

3.3. Optimization-Based Meta-Learning

Optimization-based meta-learning primarily aims to learn an initialization of model parameters that enables models to rapidly achieve optimal performance on new tasks. The meta-knowledge in this paradigm is the shared parameter initialization itself. By optimizing this initialization across a distribution of training tasks, the resulting model can quickly converge to good task-specific solutions with minimal labeled data. For image classification or segmentation tasks, cross-entropy loss remains the most widely adopted objective at both the inner and outer optimization levels, due to its effectiveness and general applicability. The general procedure and the bilevel optimization structure of this paradigm are illustrated in Figure 3.

Model-Agnostic Meta-Learning (MAML) [14] is the foundational and most widely adopted method in this category. MAML employs a dual-level parameter update strategy: an inner loop performs task-specific adaptation using the support set via one or a few gradient steps, while an outer loop optimizes the shared initialization by evaluating the adapted parameters on query sets across multiple tasks (Figure 4). This bilevel structure, which balances immediate task performance with long-term cross-task knowledge retention, has made MAML a reference framework for much of the subsequent research. Meta-LSTM [33] provides an alternative perspective by using an LSTM architecture to explicitly model the optimization trajectory, learning both task-specific update rules and a generic initialization (Figure 5).
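
To make the inner and outer loops concrete, the following sketch runs MAML on a toy problem where the second-order term can be written by hand. Everything here is an illustrative assumption rather than a setup from the reviewed literature: the model is a scalar linear regressor `w * x`, each task is a line with a different slope, the loss is mean squared error instead of cross-entropy, and the helper names (`maml_meta_gradient`, `make_task`) and learning rates are hypothetical.

```python
import numpy as np

def loss(w, x, y):
    return np.mean((w * x - y) ** 2)

def grad(w, x, y):
    # d/dw of the mean squared error for the scalar model w * x
    return 2.0 * np.mean(x * (w * x - y))

def hessian(x):
    # Second derivative of the loss with respect to w (independent of w and y here)
    return 2.0 * np.mean(x ** 2)

def maml_meta_gradient(w, task, alpha):
    (xs, ys), (xq, yq) = task
    # Inner loop: one gradient step on the support set.
    w_adapted = w - alpha * grad(w, xs, ys)
    # Outer loop: query-loss gradient, chained through the inner step (second-order term).
    return (1.0 - alpha * hessian(xs)) * grad(w_adapted, xq, yq)

def make_task(slope, rng, k_shot=5, n_query=15):
    xs, xq = rng.uniform(-1, 1, k_shot), rng.uniform(-1, 1, n_query)
    return (xs, slope * xs), (xq, slope * xq)

rng = np.random.default_rng(0)
w, alpha, beta = 0.0, 0.1, 0.05   # shared initialization, inner and outer learning rates
for step in range(500):
    tasks = [make_task(rng.uniform(1.0, 3.0), rng) for _ in range(4)]
    w -= beta * np.mean([maml_meta_gradient(w, t, alpha) for t in tasks])
```

With slopes drawn from the interval 1 to 3, the learned initialization `w` settles near the middle of that range, from which a single inner step reaches any task-specific slope most cheaply on average. For a neural network the Hessian factor is not available in closed form and is obtained through automatic differentiation of the inner step, which is the source of MAML's memory overhead.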

Building upon MAML, subsequent research has advanced along several distinct directions: (1) computational simplification, including first-order approximations such as Reptile [34] and partial-update strategies such as ANIL [35] and BOIL [36], which reduce the cost of second-order gradient computation; (2) probabilistic extensions, including LLAMA [37], PLATIPUS [38], and BMAML [39], which reformulate MAML within Bayesian frameworks to enable principled uncertainty quantification; (3) robustness and generalization, including TAML [40], LEO [41], and CAML [42], which address meta-overfitting, high-dimensional parameter spaces, and class-level dependencies, respectively; (4) meta-learned optimizers, including Meta-SGD [43] and Meta-Adam [44], which learn not only the initialization but also the optimization procedure itself; and (5) recent advances, including MT-net [45], DEML [46], GP-MAML/GP-ANIL/GP-BOIL [47], and XB-MAML [48], which extend MAML through subspace learning, concept-space meta-learning, pseudo-label augmentation, and dynamically expandable basis parameters. A representative subset of these methods, together with their core innovations, is summarized in Table 2. For a chronological overview, readers are referred to Figure S2 in the Supplementary Material.

Optimization-based methods, particularly MAML and its variants, have been the most widely adopted meta-learning paradigm in LULC applications (see Section 4). Their model-agnostic nature allows flexible integration with various backbone architectures commonly used in remote sensing, including CNNs and Transformers. However, the computational overhead of bilevel optimization, especially the second-order gradient computation in MAML, remains a practical concern for large-scale LULC mapping applications, which has motivated the adoption of lighter-weight alternatives such as Reptile and ANIL in several remote sensing studies.
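
For comparison with second-order MAML, a Reptile-style update can be sketched as follows. It adapts to each task with a few plain gradient steps and then moves the shared initialization toward the average of the adapted weights, so no derivative is taken through the inner loop. The toy setting is an assumption made for illustration only: a 2-parameter linear regressor, synthetic tasks whose true weights scatter around a hypothetical `center`, full-batch inner steps, and illustrative learning rates.

```python
import numpy as np

def inner_adapt(w, x, y, lr, n_steps):
    """Task adaptation: a few full-batch gradient steps on the mean squared error."""
    for _ in range(n_steps):
        w = w - lr * 2.0 * x.T @ (x @ w - y) / len(y)
    return w

def reptile_step(w, tasks, inner_lr, n_inner, meta_lr):
    """Move the initialization toward the mean of the task-adapted weights."""
    adapted = [inner_adapt(w.copy(), x, y, inner_lr, n_inner) for x, y in tasks]
    return w + meta_lr * (np.mean(adapted, axis=0) - w)

rng = np.random.default_rng(0)
center = np.array([1.0, -1.0])   # hypothetical structure shared by all tasks

def make_task(rng, n_samples=10):
    w_true = center + 0.3 * rng.normal(size=2)   # task-specific deviation
    x = rng.normal(size=(n_samples, 2))
    return x, x @ w_true

w = np.zeros(2)
for _ in range(300):
    batch = [make_task(rng) for _ in range(4)]
    w = reptile_step(w, batch, inner_lr=0.05, n_inner=5, meta_lr=0.1)
```

The meta-update only needs the start and end points of each inner loop, so its memory footprint is that of ordinary training. ANIL reduces cost differently, by keeping the feature extractor fixed during the inner loop and adapting only the final layer, which this sketch does not show.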

3.4. Metric-Based Meta-Learning

Metric-based meta-learning, also known as similarity-based learning, focuses on learning an embedding or distance metric that captures discriminative feature representations, enabling simple classifiers to perform well with limited samples. The meta-knowledge in this paradigm takes the form of a learned embedding function: both support and query samples are projected into a shared feature space, and classification is performed by comparing their pairwise distances without requiring task-specific parameter fine-tuning. This non-parametric inference procedure makes metric-based methods computationally efficient at test time and particularly well-suited for few-shot classification tasks. For this class of methods, the loss function is typically designed to ensure that the distance between an input sample and its corresponding positive class is smaller than the distance to negative classes. The general procedure of this paradigm is illustrated in Figure 6.

Three foundational methods define the core of this paradigm. Matching Network [20] introduces attention-based mapping from a small support set to label predictions, performing one-shot classification by comparing learned embeddings through a differentiable nearest-neighbor mechanism trained end-to-end. Prototypical Network [49] constructs a metric space in which each class is represented by its prototype (the mean embedding of its support examples) and classifies query samples by nearest-prototype distance. Relation Network [50] replaces fixed distance functions with a learned deep relation module that computes non-linear similarity scores between query and support embeddings, enabling richer relationship modeling.
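
The nearest-prototype rule of Prototypical Network is simple enough to sketch directly. The example below assumes the embeddings have already been produced by some trained encoder (the encoder itself is omitted) and uses squared Euclidean distance; the function names and the 2-D toy embeddings are hypothetical.

```python
import numpy as np

def prototypes(support_emb, support_y):
    """Class prototype = mean embedding of that class's support examples."""
    classes = np.unique(support_y)
    protos = np.stack([support_emb[support_y == c].mean(axis=0) for c in classes])
    return classes, protos

def classify(query_emb, classes, protos):
    """Assign each query to the nearest prototype; softmax over negative distances gives probabilities."""
    d2 = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    return classes[d2.argmin(axis=1)], probs

# Toy 2-way 2-shot episode with precomputed embeddings.
support_emb = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]])
support_y = np.array([0, 0, 1, 1])
query_emb = np.array([[1.0, 1.0], [4.0, 5.0]])

classes, protos = prototypes(support_emb, support_y)
preds, probs = classify(query_emb, classes, protos)
```

No parameters are updated at test time: adapting to a new region amounts to recomputing `protos` from that region's support samples. During meta-training, the cross-entropy of `probs` against the query labels would be backpropagated into the encoder.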

Subsequent research has extended these foundations in several directions: (1) prototype refinement, including Principal Characteristics Net [51] and IPNET [52], which improve prototype quality through contribution-based weighting and distributional-influence weighting via Maximum Mean Discrepancy; (2) relational modeling enhancement, including K-Tuple Network [53], Task-Adaptive Relation-Dependent Network [54], Attention-Enhanced Relation Network [55] and MLFRNet [56], which model richer multi-sample, task-adaptive, attention-based, or local-feature relationships beyond pairwise comparison; (3) cross-domain generalization, including LDP-Net [57], SS-Matching Networks [58], and Prototypical Siamese Networks [59], which improve transferability across domains through global-local knowledge distillation, scheduled sampling, and refined prototypical representations; and (4) hybrid and neighbor-based approaches, including TPN [60] and PNN [61], which combine prototype learning with NOTA calibration or KNN-inspired neighbor modeling for cross-domain generalization and complex distribution handling. A representative subset of these methods, together with their core innovations, is summarized in Table 3. For a chronological overview, readers are referred to Figure S3 in the Supplementary Material.

Metric-based methods are particularly prominent in LULC applications involving hyperspectral image classification, where the high-dimensional embedding space naturally favors distance-based classification strategies (see Section 4). Their simplicity and inference-time efficiency, requiring no fine-tuning on new tasks, make them attractive for operational remote sensing scenarios. However, as demonstrated by recent studies in HSI classification (discussed in Section 4.3), the choice of distance metric significantly impacts performance, with adaptive metrics (e.g., Mahalanobis distance, probabilistic metrics) increasingly replacing standard Euclidean measures to better capture spectral variability and class distribution complexity in remote sensing data.
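
Why the distance metric matters can be seen in a two-class toy case where Euclidean and Mahalanobis distance disagree. The numbers are invented for illustration and do not come from any reviewed study: class A is assumed to vary strongly along the first feature (as a spectrally variable class might), class B is assumed compact, and the per-class covariance matrices are given directly rather than estimated.

```python
import numpy as np

def euclidean_sq(q, mu):
    diff = q - mu
    return float(diff @ diff)

def mahalanobis_sq(q, mu, cov):
    # (q - mu)^T cov^{-1} (q - mu), computed with a linear solve instead of an explicit inverse
    diff = q - mu
    return float(diff @ np.linalg.solve(cov, diff))

# Hypothetical prototypes and class covariances in a 2-D embedding space.
mu_a, cov_a = np.array([0.0, 0.0]), np.diag([9.0, 0.25])   # class A: wide spread along axis 0
mu_b, cov_b = np.array([4.0, 0.0]), np.diag([0.25, 0.25])  # class B: compact
query = np.array([2.5, 0.0])

euclid = {"A": euclidean_sq(query, mu_a), "B": euclidean_sq(query, mu_b)}
mahal = {"A": mahalanobis_sq(query, mu_a, cov_a), "B": mahalanobis_sq(query, mu_b, cov_b)}

euclid_pred = min(euclid, key=euclid.get)   # nearest prototype ignoring class spread
mahal_pred = min(mahal, key=mahal.get)      # nearest prototype relative to class spread
```

The query is closer to B's prototype in raw distance but lies well within A's spread and far outside B's, so the two metrics assign different labels. In a real few-shot episode the covariance would have to be estimated from very few support samples, which usually requires regularization such as shrinkage toward a diagonal matrix; that step is assumed away here.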

3.5. Section Summary

This section has reviewed the representative approaches and developmental trajectories of the three mainstream meta-learning paradigms. Their complementary strengths and limitations in the context of LULC applications can be summarized as follows:

Memory-augmented meta-learning can dynamically adjust its internal memory state for task-specific adaptation, making it conceptually suitable for scenarios requiring incremental knowledge accumulation. However, it often exhibits limited transferability across substantially different domains and demands considerable computational resources and memory capacity, which constrains its practical applicability to multimodal remote sensing data.
Optimization-based methods possess the widest applicability across diverse tasks due to their model-agnostic nature, allowing flexible integration with diverse backbone architectures commonly used in remote sensing (e.g., CNNs and Transformers). However, they remain computationally intensive owing to bilevel optimization, particularly the second-order gradient computation required by MAML.
Metric-based methods are relatively straightforward and computationally efficient at inference time, yet their performance is sensitive to the quality of the learned embedding space and is largely confined to supervised classification settings.

Computational complexity of representative methods. Reviewing the full set of methods listed in Table 1, Table 2 and Table 3 in terms of computational complexity is impractical and would also be uninformative, since many variants share the same asymptotic order as their parent method (for example, IPNET, PA-SRM, and TPN inherit the complexity of Prototypical Network with only constant-factor modifications). We therefore restrict the analysis to eight representative methods selected by three principles: (1) the foundational method of each paradigm; (2) variants that introduce a substantive change in asymptotic order rather than constant-factor differences; and (3) methods that have been most widely adopted in the LULC literature reviewed in Section 4. Table 4 summarizes the resulting analysis. The complexity expressions are derived directly from the algorithmic descriptions in the corresponding original papers; implementation-level constants such as parallelism and framework overhead are abstracted away.

Three sources of computational overhead distinguish the paradigms. First, memory-augmented methods incur a cost on the order of $O(M \cdot D)$ per read/write operation, which grows with both memory capacity and embedding dimension. For high-dimensional remote sensing data such as hyperspectral imagery with hundreds of spectral bands, this cost becomes prohibitive, which partially explains the limited adoption of this paradigm in LULC applications. Second, optimization-based methods introduce overhead through bilevel optimization: the second-order gradient computation in canonical MAML scales as $O(T \cdot |\theta|)$ per task but requires storing the full inner-loop computational graph, which substantially increases memory consumption. Lighter variants such as Reptile and ANIL reduce this overhead by avoiding second-order derivatives (Reptile) or by restricting inner-loop updates to a subset of parameters (ANIL). ANIL reports over 4× inference speedup over MAML in its original benchmark [35], while Reptile achieves computational efficiency comparable to first-order MAML, both at modest accuracy cost. Third, metric-based methods are the most efficient at inference time because they require no task-specific parameter updates: classification reduces to embedding computation followed by distance comparison, with overall complexity typically $O(N \cdot D)$ for prototype construction and $O(C \cdot Q \cdot D)$ for query classification. This inference efficiency is a primary reason for the prevalence of metric-based methods in operational LULC scenarios where rapid deployment is required. It should be noted that reported speedups are dataset- and hardware-dependent, and absolute training times are not directly comparable across studies; the analysis above is intended to characterize relative scaling behavior rather than absolute performance.
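The two metric-based cost terms can be made concrete with a minimal sketch of prototype-based inference. It assumes that embeddings have already been computed by some backbone, and it reads N as the number of support samples, C as the number of classes, Q as the number of queries, and D as the embedding dimension; this reading is an interpretation, since the passage itself does not define the symbols. The function names and toy values are hypothetical.

```python
def build_prototypes(support_embeddings, support_labels):
    """Average the support embeddings of each class: one pass, O(N * D)."""
    sums, counts = {}, {}
    for emb, lab in zip(support_embeddings, support_labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(emb)
            counts[lab] = 0
        sums[lab] = [s + e for s, e in zip(sums[lab], emb)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def squared_distance(a, b):
    """Squared Euclidean distance between two D-dimensional vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query_embeddings, prototypes):
    """Assign each query to its nearest prototype: O(C * Q * D)."""
    return [
        min(prototypes, key=lambda lab: squared_distance(q, prototypes[lab]))
        for q in query_embeddings
    ]


# Toy 2-way, 2-shot task with 2-dimensional embeddings (hypothetical values).
support = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.8, 1.1]]
labels = ["water", "water", "forest", "forest"]
prototypes = build_prototypes(support, labels)
predictions = classify([[0.1, 0.2], [0.9, 0.8]], prototypes)
print(predictions)  # ['water', 'forest']
```

No parameters are updated at inference time in this sketch, which is the property the paragraph above credits for the efficiency of metric-based methods.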

In the LULC domain, each paradigm offers distinct practical advantages. Optimization-based methods (especially MAML and its first-order variants) provide model-agnostic flexibility suitable for diverse remote sensing data types and backbone architectures, making them the most widely adopted paradigm in current LULC research. Metric-based methods offer computational efficiency during deployment, which is beneficial for operational land cover mapping at scale. Memory-augmented methods, though currently underexplored, hold potential for scenarios requiring long-term knowledge retention or incremental class discovery. Overall, meta-learning has increasingly served as a flexible framework that can be seamlessly integrated with various deep learning architectures, offering practical potential for addressing the spatial heterogeneity and label scarcity challenges that are central to large-scale LULC mapping. To facilitate comparison, Table 5 summarizes the three meta-learning paradigms in terms of their core mechanisms, strengths, limitations, and suitability for LULC applications.

4. LULC Application in Remote Sensing

4.1. Overview of Meta-Learning Applications in LULC

The annual number of publications in this domain has shown a steady increase, rising from a single article in 2018 to 26 in 2024 (Figure 7), reflecting the growing academic interest in leveraging meta-learning for land cover and land use. This growth trajectory can be broadly characterized by three phases: an exploratory phase (2018–2020) in which foundational meta-learning methods were directly transferred to remote sensing tasks, a rapid development phase (2021–2023) marked by the emergence of domain-specific adaptations tailored to the unique characteristics of remote sensing data, and a convergence phase (2024–2025) in which meta-learning has begun to integrate with foundation models and other emerging methods. In the LULC domain, most existing studies have primarily adopted optimization-based and metric-based methods, while applications of memory-augmented meta-learning remain relatively scarce, likely due to its high computational complexity and limited scalability when handling multimodal, heterogeneous remote sensing data.

To characterize the research landscape, the distribution of existing studies across different data modalities and task settings is examined. Based on the 70 identified studies, four research categories are summarized: few-shot classification (48.6%), hyperspectral image classification (30.0%), time series classification (14.3%), and multimodal data fusion (7.1%) (Figure 8). This data-modality-based categorization directly reflects where meta-learning has been most actively adopted in the LULC domain and reveals the relative maturity of research across different data types. The dominance of few-shot classification and hyperspectral image classification indicates that label scarcity and multimodal spectral data remain the primary contexts driving meta-learning adoption, while the smaller proportions of time series and multimodal studies suggest emerging but less explored research fronts. To provide a complementary view of the research landscape organized by the challenges addressed in Section 4.2, Section 4.3, Section 4.4 and Section 4.5, Table 6 summarizes representative studies under each LULC challenge, together with the meta-learning paradigms adopted, the data modalities involved, and the main contribution of each study.

Figure 9 presents a tree-structured visualization that systematically maps the recent development of meta-learning approaches in LULC. The trunk symbolizes the overarching trajectory of meta-learning research, while the branches reflect specific application directions, such as regional heterogeneity and multimodal heterogeneity. Each leaf denotes a representative study, chronologically arranged to depict the technological evolution over time. The studies highlighted in this review adopt meta-learning as the principal strategy to address domain-specific challenges, rather than treating it as an auxiliary tool.

While the above data-modality-based overview effectively captures the empirical landscape of current research, understanding how meta-learning contributes to advancing LULC classification requires a complementary analytical perspective. Studies across different data modalities often address shared underlying challenges: a few-shot classification study on optical imagery and a hyperspectral image classification study may both fundamentally tackle label scarcity, while a cross-scene HSI study and a multimodal fusion study may both confront domain distribution shifts. To provide this deeper analytical understanding, the following subsections are organized around the core challenges that meta-learning addresses in LULC remote sensing: label scarcity (Section 4.2), cross-region and cross-domain distribution shifts (Section 4.3), temporal dynamics modeling (Section 4.4), and multimodal data integration (Section 4.5). This challenge-driven framework complements the data-modality perspective of Figure 8 by revealing the underlying mechanisms through which meta-learning advances LULC classification.

It should be noted that the following review does not attempt to exhaustively describe all 70 identified studies. Instead, representative and methodologically significant works are selected to best illustrate how meta-learning techniques address each core challenge and contribute to advancing LULC classification, with a focus on methodological innovations and practical implications for the remote sensing domain.

4.2. Label Scarcity in LULC

Few-shot learning is a learning paradigm designed to learn from a small number of labeled examples [76,77]. LULC classification aims to categorize satellite or aerial imagery into predefined classes, thereby providing essential geographic information. This task is particularly challenging due to substantial intra-class variability across varying regions and seasons, along with the scarcity of labeled data in many areas [9]. In remote sensing, acquiring and labeling training data demands specialized prior knowledge and fieldwork, making the construction of large-scale labeled datasets costly and time-consuming compared to the natural image domain. Traditional deep learning methods struggle to generalize to new geographic regions or novel classes when training samples are limited. Meta-learning, especially under the few-shot learning paradigm, addresses this constraint by enabling models to learn from many small training tasks so that they can rapidly adapt to new land cover/use classes using only a few labeled samples [13]. Recent studies have demonstrated progress in applying few-shot meta-learning to remote sensing tasks [10], including image classification [78], semantic segmentation [79], object detection [80], and scene classification [81].
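The "many small training tasks" mentioned above are commonly built as N-way K-shot episodes. The following is a minimal sketch of such an episode sampler, not a procedure taken from any of the cited studies; the function name `sample_episode`, its parameters, and the toy class names are hypothetical.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Draw one N-way K-shot episode.

    Returns (support, query), each a list of (sample_index, episode_label)
    pairs, where episode_label runs from 0 to n_way - 1.
    """
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)

    # Only classes with enough samples for support plus query are eligible.
    eligible = sorted(c for c, idxs in by_class.items()
                      if len(idxs) >= k_shot + n_query)
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + n_query samples")

    support, query = [], []
    for episode_label, cls in enumerate(rng.sample(eligible, n_way)):
        picked = rng.sample(by_class[cls], k_shot + n_query)
        support += [(i, episode_label) for i in picked[:k_shot]]
        query += [(i, episode_label) for i in picked[k_shot:]]
    return support, query


# Toy label list: four hypothetical land cover classes, ten samples each.
labels = [c for c in ["water", "forest", "cropland", "urban"] for _ in range(10)]
support, query = sample_episode(labels, n_way=3, k_shot=2, n_query=4,
                                rng=random.Random(0))
print(len(support), len(query))  # 6 12
```

Relabeling classes to 0..N-1 within each episode is a common design choice: it prevents the model from memorizing fixed class identities and pushes it toward learning how to separate whichever classes a task presents.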

Optimization-based meta-learning has been applied to address label scarcity in LULC classification, particularly in hyperspectral image (HSI) scenarios where labeled samples are scarce. Gao et al. (2021) introduced a MAML-based framework for HSI classification under limited labeled data [62]. This formulation enables fast adaptation to new classes with only a few samples and minimal gradient updates, addressing the label scarcity that typically constrains HSI classification. Amoako et al. (2025) proposed MLOSL, a meta-learning framework with an orthogonal Softmax layer that constructs unsupervised meta-tasks by extracting multi-view spectral features from different bands and spatial features through data augmentation [63]. The orthogonal Softmax promotes diverse representations for generalization, while label smoothing mitigates class imbalance and misclassification, reducing the structural and computational complexity of CNNs in small-sample HSI classification. These two studies indicate that optimization-based methods, through their model-agnostic property, can acquire transferable initializations for multimodal spectral data with limited annotations.
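The idea of a learned initialization that adapts within a few gradient steps can be illustrated on a toy problem. The sketch below uses Reptile, a first-order relative of MAML, rather than MAML itself or the frameworks of the studies above. It assumes a one-parameter model y = w·x and a hypothetical family of tasks that differ only in their true slope; all names and hyperparameters are illustrative.

```python
import random


def inner_adapt(w, xs, ys, lr, steps):
    """Task-specific adaptation: a few gradient steps on mean squared error."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w


def reptile(task_slopes, meta_lr=0.1, inner_lr=0.1, inner_steps=3,
            iterations=200, seed=0):
    """Meta-train an initialization w_meta over a family of regression tasks."""
    rng = random.Random(seed)
    w_meta = 0.0
    for _ in range(iterations):
        slope = rng.choice(task_slopes)              # sample a task
        xs = [rng.uniform(-1, 1) for _ in range(5)]  # 5-shot support set
        ys = [slope * x for x in xs]
        w_task = inner_adapt(w_meta, xs, ys, inner_lr, inner_steps)
        # Reptile outer update: move the initialization toward the adapted
        # weights; no second-order derivatives are needed.
        w_meta += meta_lr * (w_task - w_meta)
    return w_meta


task_slopes = [1.5, 2.0, 2.5]
w0 = reptile(task_slopes)

# Adapt to an unseen task (slope 2.2) from the meta-learned start and from zero.
xs_new = [-1.0, -0.5, 0.5, 1.0]
ys_new = [2.2 * x for x in xs_new]
err_meta = abs(inner_adapt(w0, xs_new, ys_new, 0.1, 3) - 2.2)
err_zero = abs(inner_adapt(0.0, xs_new, ys_new, 0.1, 3) - 2.2)
print(err_meta < err_zero)  # True
```

With the same three adaptation steps, the meta-learned starting point ends closer to the new task's slope than a zero initialization does, which is the few-sample adaptation benefit that the paragraph above describes.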

Metric-based meta-learning has also been applied to few-shot LULC classification, particularly for tasks where class prototypes can be constructed from limited support samples. Li et al. (2024) proposed SegLand, a few-shot segmentation framework that integrates a metric-based meta-learning architecture called Projection onto Orthogonal Prototypes (POP) [64]. POP enables the model to learn and incorporate new land-cover categories using only a small number of labeled samples, addressing the need to update baseline land cover maps as new classes emerge. Li et al. (2020) developed a few-shot pest recognition approach based on a prototypical network with triplet loss, which constructs prototypical representations for each class and classifies query samples by measuring their distance to the corresponding class prototypes [82]. The method achieves 96.2% accuracy on the NBAIR dataset and enables FPGA-based terminal deployment for field use. Jia et al. (2025) proposed PA-SRM, which combines a parameter-free region attention module with a local description classifier to dynamically emphasize discriminative regions [65]. This framework mitigates base-class bias through joint assessment of semantic similarity and spatial coherence. In the domain of crop selection, Swaminathan et al. (2022) proposed a dynamic ensemble learning framework that adopts a metric-based meta-learning dynamic ensemble selection (DES) approach, employing meta-classifiers to select the most competent classifiers based on multiple criteria [66]. The framework integrates the VIKOR ranking method to identify the most informative land samples and reduce ensemble complexity, addressing imbalanced multi-class nutrient data.

The above studies indicate that both optimization-based and metric-based meta-learning methods can address label scarcity in LULC applications, though through different mechanisms: the former learns transferable parameter initializations that enable gradient-based adaptation with few samples, while the latter constructs embedding spaces for distance-based classification without task-specific fine-tuning. However, most existing studies evaluate their methods on relatively small-scale benchmarks, and the scalability of these approaches to large-area LULC mapping with diverse land cover types has not been sufficiently examined. Additionally, current few-shot meta-learning methods primarily rely on optical or hyperspectral imagery, while their applicability to other data modalities under label-scarce conditions remains underexplored.

4.3. Cross-Region and Cross-Domain Generalization in LULC

Spatial heterogeneity is a fundamental geographic property describing how environmental, social, or physical attributes vary systematically across space [5]. In remote sensing, this property causes identical land cover types to exhibit different spectral and textural characteristics across regions, sensors, and acquisition conditions. Labeled datasets developed for one task or region typically lack transferability to new application settings. Differences in spectral bands, spatial resolution, and temporal characteristics of remote sensing imagery often lead existing studies to case-specific or scenario-dependent designs [6]. This limitation has fostered a prevailing “one case, one model” paradigm, hindering the scalability and generalization of LULC classification approaches. Meta-learning addresses this challenge by optimizing for cross-task generalization, enabling models trained on source regions to adapt to target regions with limited labeled data. Unlike the label scarcity discussed in Section 4.2, the focus of this section is on distribution shifts between domains, where models may fail due to the spectral and spatial discrepancies between source and target regions even when labeled data are available.

Several studies have demonstrated that meta-learning outperforms conventional pre-training followed by fine-tuning when substantial domain shifts exist between source and target regions. Rußwurm et al. (2020) framed regional variability in LULC as an inductive transfer learning problem and applied the MAML algorithm to few-shot land cover classification tasks using Sen12MS [13] and DeepGlobe datasets [83], where substantial differences existed between the source and target domains. The meta-learning-based adaptation outperformed conventional pre-training followed by fine-tuning, indicating that meta-learning is beneficial for Earth observation tasks with pronounced regional variability while traditional supervised learning remains suitable when feature or label shifts are minimal. Extending this work, Rußwurm et al. (2022) benchmarked MAML against human performance for few-shot land cover classification using Sentinel-2 imagery from the Sen12MS dataset [67]. The results showed that humans achieved lower accuracy and higher variability compared to the MAML-trained model when classifying globally distributed land cover imagery, suggesting that meta-learning models with minimal labeled data can yield more consistent classification results than multiple human annotators in cross-region settings. Wang et al. (2020) employed MAML to meta-train neural networks on land cover classification tasks from high-resource regions, enhancing performance in low-resource scenarios [68]. The meta-trained network significantly outperformed models trained from scratch and those pretrained on high-resource data followed by fine-tuning on limited samples. Across these studies, a consistent pattern can be observed: under significant domain shifts, meta-learning provides advantages over the pre-train-then-fine-tune paradigm by acquiring cross-task knowledge that captures shared task structures rather than reusing instance-level features.

In hyperspectral image classification, cross-domain distribution shifts are particularly pronounced due to sensor-specific spectral discrepancies, and meta-learning methods have been applied to enhance cross-scene generalization. Deng et al. (2019) proposed a metric-based feature embedding model for same-scene and cross-scene HSI classification tasks [69], which combines similarity learning with unsupervised domain adaptation to enable few-shot classification by comparing sample pairs and transferring knowledge from labeled source scenes to unlabeled target scenes. An adversarial mechanism ensures domain-invariant feature embeddings while promoting class-consistent clustering across domains. Xi et al. (2022) introduced CMFSL [70], a few-shot learning framework based on a class-covariance metric that learns global class representations by jointly utilizing base and novel class samples in each training episode and applies a synthesis strategy to novel classes to mitigate overfitting. Instead of relying on an external classifier, CMFSL employs Mahalanobis distance for label prediction, enabling task-adaptive class covariance estimation and more flexible decision boundaries than standard Euclidean-based methods. Wang et al. (2025) presented PDML, a Probabilistic Deep Metric Learning framework for hyperspectral image classification that learns global probabilistic distributions within image patches and computes class distances using a probabilistic metric, treating all pixels in a patch as training samples [71]. By modeling pixel-wise categorical uncertainty, PDML addresses spectral variability and spatial ambiguity and improves performance and robustness under low spatial resolution conditions. The progression of these three studies, from fixed similarity learning [69] to adaptive covariance-based metrics [70] to probabilistic distance functions [71], reflects an evolution in how the HSI classification community handles spectral variability within metric-based meta-learning frameworks.
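The difference between a fixed Euclidean metric and a covariance-aware metric can be seen in a small two-dimensional example. The sketch below is not the CMFSL procedure: the class means and covariances are fixed, hypothetical values rather than quantities estimated per task, and the helper names are illustrative. It only shows how a Mahalanobis distance can change the decision for a class whose spread is elongated along one feature.

```python
def inv2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular covariance matrix")
    return [[d / det, -b / det], [-c / det, a / det]]


def euclidean_sq(x, mean):
    """Squared Euclidean distance to a class mean."""
    return sum((xi - mi) ** 2 for xi, mi in zip(x, mean))


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance: (x - mean)^T cov^{-1} (x - mean)."""
    dx = [x[0] - mean[0], x[1] - mean[1]]
    inv = inv2x2(cov)
    return sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))


# Hypothetical classes: A is spread widely along the first feature, B is compact.
classes = {
    "A": ((0.0, 0.0), [[4.0, 0.0], [0.0, 0.25]]),
    "B": ((3.0, 0.0), [[0.25, 0.0], [0.0, 0.25]]),
}
query = (2.0, 0.0)

by_euclidean = min(classes, key=lambda c: euclidean_sq(query, classes[c][0]))
by_mahalanobis = min(classes, key=lambda c: mahalanobis_sq(query, *classes[c]))
print(by_euclidean, by_mahalanobis)  # B A
```

The query is closer to the mean of B in raw distance, but it lies well within the spread of A and four units of squared Mahalanobis distance from B, so the covariance-aware metric assigns it to A. This is one way to read the "more flexible decision boundaries" attributed to covariance-based metrics above.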

The reviewed studies suggest that cross-region and cross-domain generalization is a direction where meta-learning addresses a challenge specific to remote sensing, as spatial heterogeneity causes the same land cover type to exhibit different characteristics across regions, which is distinct from the category-recognition focus of few-shot learning in natural image domains. The results from optical imagery, time series data, and HSI classification collectively support that meta-learning provides advantages over conventional fine-tuning under significant domain shifts. However, as noted by Rußwurm et al. (2020) [13], these advantages diminish when feature or label shifts between domains are minimal, suggesting that the practical benefit of meta-learning is context-dependent. Furthermore, the effectiveness of meta-learning in processing multimodal HSI data depends on the capacity of the underlying model architecture, and the design of appropriate distance metrics for spectral data remains an active research area.

4.4. Temporal Dynamics Modeling in LULC

Time series image analysis is an important field of remote sensing, with time series classification serving as a core task related to land cover mapping [84], flood monitoring [85], and vegetation protection [86]. Effectively capturing multiscale temporal dependencies is essential for modeling seasonal and phenological dynamics, thereby enhancing the accuracy of time series land cover classification. Conventional deep learning methods often struggle to model complex temporal dynamics and maintain generalization across time periods when faced with limited training samples. Meta-learning addresses this constraint by framing classification tasks from different time periods or regions as separate tasks within an episodic training framework, enabling cross-temporal adaptation with limited labeled data.

One line of research focuses on improving temporal feature representation within meta-learning frameworks to better capture multiscale temporal dependencies. Park et al. (2023) introduced a meta-learning framework that encodes time series data into multiple types of images, enabling models to leverage diverse and informative features [16]. The proposed Temporal-Context Attention (TCA) and Meta-Feature Fusion (MFF) mechanisms integrate global contextual information from feature maps, highlighting pixels with significant informative correlations. Wu et al. (2025) proposed a meta-learning framework that integrates attention mechanisms with task-specific adaptation layers to address temporal dependencies and task heterogeneity in time series forecasting [72]. By capturing and transferring common temporal patterns across tasks, the proposed approach improves generalization and predictive performance.

Another line of research leverages meta-learning to mitigate sample scarcity and enable cross-regional or cross-temporal adaptation. Wang et al. (2020) employed MAML to meta-train neural networks on land cover classification tasks from high-resource regions, enhancing performance in low-resource scenarios [68]. The results demonstrated that the meta-trained network outperformed models trained from scratch and those pretrained on high-resource data followed by fine-tuning on limited samples, suggesting that meta-learning can bridge the data availability gap between regions in the temporal classification context. Building on this direction, subsequent studies have further tailored meta-learning to specific agricultural and environmental scenarios. Mohammadi et al. (2024) adapted eight meta-learning methods to address crop mapping challenges in label-scarce and agriculturally complex environments [11]. The study targeted infrequent crop mapping in selected French regions and diverse crop classification in a complex agricultural landscape in Ghana, providing a comparative evaluation of meta-learning approaches for time series crop classification. Jiang et al. (2025) proposed a meta-learning framework that integrates MAML with a Transformer architecture to address the scarcity of historical time series data in flood prediction [87]. This approach enables rapid adaptation with limited observations and captures complex temporal dependencies, suggesting that combining meta-learning with modern architectures can extend the applicability of temporal classification to environmental monitoring tasks.

Time series analysis currently accounts for approximately 14.3% of meta-learning studies in LULC, suggesting that this direction is still in the early stages of development compared to few-shot classification and HSI classification. The reviewed studies show that meta-learning can transfer temporal classification knowledge from data-rich to data-scarce regions and capture multi-scale temporal features through diverse encoding strategies. Time series data exhibit sequential characteristics across temporal dimensions, with seasonal variations observed in geographic objects such as vegetation, construction sites, and croplands. These temporal patterns can be naturally formulated as separate tasks within the meta-learning framework. However, current studies have primarily applied existing meta-learning methods to temporal data without substantial adaptation to the specific properties of time series, such as autocorrelation between adjacent time steps and varying temporal resolutions across sensors. How to design meta-learning frameworks that explicitly model these temporal properties remains an open question.

4.5. Multimodal Data Integration in LULC

Multimodal data fusion integrates data from various sensors, satellites, spatial resolutions, spectral bands, and data types to generate more informative insights for LULC classification. Numerous remote sensing systems provide diverse data streams, differing in spatial resolution and spectral characteristics. Combining these heterogeneous sources enriches feature representations, enhancing land cover/use classification performance. Although recent studies leveraging deep learning for multimodal remote sensing data fusion have demonstrated progress [88], these methods often encounter performance bottlenecks when analyzing strongly heterogeneous data, owing to poor adaptability, inadequate model capacity, and limited annotated samples. Meta-learning addresses these challenges by improving the generalization capability of deep learning models with limited samples and by enhancing their effectiveness in cross-modal, cross-resolution, and cross-platform data fusion tasks.

One line of research focuses on bridging semantic gaps across modalities and handling diverse input configurations. Dai et al. (2024) introduced MFRN-ML, a multimodal fusion relational network with meta-learning that consists of a cross-modality feature fusion module and a relation learning module, trained through a three-stage task-based meta-learning procedure [73]. The framework bridges semantic gaps between LiDAR and hyperspectral imagery and enables transferable representation learning for cross-scene classification with limited annotations. Rußwurm et al. (2024) proposed METEOR, a meta-learning framework for Earth observation problems involving multi-resolution data [15], which leverages knowledge extracted from global land cover information to rapidly adapt to unseen target problems with only a few training samples. The key innovation of METEOR lies in its ability to handle image data with varying numbers of spectral channels and to accommodate downstream tasks involving different numbers of classification categories.

Another line of research leverages meta-learning to address data scarcity and sample imbalance in multimodal fusion. Zhang et al. (2020) combined MAML and CNN with Particle Swarm Optimization (PSO) to optimize neural network parameter updates, leveraging positional and elevation features from LiDAR alongside spectral and textural information from remote sensing imagery [74]. The framework addresses the challenges of highly imbalanced and scarce annotated samples in urban ground feature classification. Qiao et al. (2023) proposed a multiscale feature fusion module that integrates information across multiple spatial scales to mitigate scale discrepancies and reduce negative transfer effects between ImageNet and remote sensing domains [75].

Multimodal data fusion currently constitutes approximately 7.1% of meta-learning studies in LULC, making it the least explored among the four challenge domains. The reviewed studies show that meta-learning can facilitate joint utilization of heterogeneous data sources under data-constrained conditions and bridge semantic gaps across modalities. However, some challenges remain unresolved. Cross-modal learning is a central issue: although optical imagery, synthetic aperture radar (SAR), and multispectral (MS) data are globally accessible, hyperspectral (HS) data remains difficult to obtain widely due to sensor constraints, leading to data scarcity in certain modalities for specific regions [89,90]. The capacity of meta-learning to transfer knowledge from data-rich to data-scarce settings suggests potential for alleviating modality-specific data shortages, but systematic evaluation of this capacity across diverse modal combinations has not been conducted. Additionally, most existing multimodal meta-learning studies focus on pairs of modalities, and the extension to scenarios involving three or more data sources remains underexplored.

5. Discussion

The preceding sections have reviewed the methodological foundations (Section 3) and application landscape (Section 4) of meta-learning in LULC classification. This section synthesizes the findings from the reviewed literature, compares meta-learning with related learning methods, examines the limitations of current research, and identifies directions for future investigation.

5.1. Key Findings

Based on the review of 70 studies in Section 4, several findings regarding the current state of meta-learning in LULC classification can be summarized. The field has evolved through three phases: an exploratory phase (2018–2020) characterized by direct transfer of general-purpose methods to remote sensing tasks [13,68], a rapid development phase (2021–2023) marked by domain-specific adaptations such as spectral-spatial feature fusion and cross-domain adaptation mechanisms, and a convergence phase (2024–2025) in which meta-learning has begun to integrate with foundation models [15]. In terms of method distribution, optimization-based and metric-based paradigms dominate current research, whereas memory-augmented methods remain underexplored, largely because the high computational and memory overhead of external memory modules limits their scalability to high-dimensional remote sensing data such as hyperspectral and multimodal imagery. The prevalence of the other two paradigms, by contrast, can be attributed to the model-agnostic flexibility of optimization-based methods (particularly MAML) for integration with diverse backbone architectures and the computational efficiency of metric-based methods at inference time. MAML and its variants are the most widely adopted algorithms, applied across optical imagery [13], hyperspectral data [62], time series [68], and multimodal fusion.

A consistent finding across multiple data modalities is that meta-learning outperforms conventional pre-training followed by fine-tuning in LULC classification scenarios characterized by significant domain shifts [13,15,68]. The underlying mechanism is that meta-learning acquires cross-task structural knowledge that captures shared task patterns, rather than reusing instance-level features as in conventional transfer learning. This property is particularly relevant for LULC applications, where spatial heterogeneity causes the same land cover type to exhibit different spectral and textural characteristics across regions. In addition, the metric design within meta-learning frameworks has become increasingly refined to accommodate the specific properties of remote sensing data. In hyperspectral image classification, for example, the research trajectory shows a progression from fixed similarity measures [69] to adaptive covariance-based metrics [70] and further to probabilistic distance functions [71], reflecting the need for uncertainty-aware metrics in multimodal spectral spaces where class distributions are non-spherical.

The four challenge domains examined in Section 4 exhibit different levels of development. Label scarcity (Section 4.2) and cross-region or cross-domain generalization (Section 4.3) are the most extensively studied, collectively accounting for most of the reviewed literature. Cross-domain generalization addresses a challenge specific to remote sensing that is not present in natural image few-shot learning, where the task is typically defined as recognizing new categories rather than adapting to distribution shifts caused by spatial heterogeneity. In contrast, temporal dynamics modeling (14.3%, Section 4.4) and multimodal data integration (7.1%, Section 4.5) remain in the early stages of exploration. Current temporal studies have primarily applied existing meta-learning methods without substantial adaptation to time series-specific properties such as temporal autocorrelation, while multimodal studies have focused on pairwise modality combinations without extension to more complex multi-source scenarios. These two directions represent areas where further research is needed.

5.2. Comparison with Related Paradigms

The scarcity of annotated datasets remains a foundational challenge in remote sensing LULC mapping. In contrast to natural image datasets such as ImageNet, which are readily available from public sources, the acquisition of remote sensing imagery is costly and influenced by atmospheric conditions, sensor specifications, and temporal revisit periods. Additionally, interpreting remote sensing imagery generally requires specialized prior knowledge and annotation skills [91]. Multiple learning paradigms have been proposed to address this challenge, each approaching the problem from a different perspective.

Self-supervised learning methods aim to learn latent representations directly from unlabeled data by designing pretext tasks such as input reconstruction, contrastive learning, or masked image modeling, thereby capturing inherent patterns within the data [92]. Semi-supervised learning leverages a limited set of labeled data combined with abundant unlabeled samples to discover underlying data structures through techniques such as pseudo-labeling and consistency regularization [93]. Weakly supervised learning reduces reliance on fully annotated datasets by training models using incomplete or coarse annotations, such as image-level labels for pixel-level segmentation tasks [94]. These three approaches primarily focus on exploiting the intrinsic feature distributions of the data, employing minimal or imprecise labels as guiding signals. In contrast, meta-learning emphasizes leveraging prior knowledge acquired across a distribution of tasks from source domains and facilitates rapid adaptation to new target tasks with relatively few labeled samples. The fundamental distinction lies in the source of generalization: self-supervised, semi-supervised, and weakly supervised methods generalize by learning representations within a single task distribution, whereas meta-learning generalizes by learning a transferable learning strategy across multiple task distributions.

Meta-learning can be regarded as an effective form of transfer learning, although transfer learning encompasses a broader range of approaches (Figure 10). Conventionally, transfer learning applies knowledge acquired from tasks or datasets in a source domain, typically through a pre-trained model, to related tasks or datasets in a target domain, with the aim of improving target-domain performance in terms of, for example, model accuracy and training efficiency [95]. While conventional transfer learning primarily transfers instance-level features from source to target tasks, meta-learning further acquires cross-task knowledge that captures shared task structures. This property enables meta-learning to achieve faster and more robust adaptation to novel scenarios. As demonstrated in the reviewed LULC studies (Section 4.3), meta-learning outperforms the pre-train-then-fine-tune paradigm when significant domain shifts exist between source and target regions. However, when feature or label shifts are minimal, conventional transfer learning may achieve comparable performance with lower computational overhead, as the additional cost of bilevel optimization in meta-learning may not be justified.
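The difference between the two paradigms can be made concrete by writing their objectives side by side. The following is a sketch in assumed notation (the symbols $\theta$, $\alpha$, $D_{\mathrm{src}}$, $D_{\mathrm{tgt}}$, the support and query losses $\mathcal{L}^{S}$ and $\mathcal{L}^{Q}$, and the task distribution $p(\mathcal{T})$ are introduced here for illustration), with fine-tuning reduced to a single gradient step for brevity.

```latex
\begin{align}
% Pre-train-then-fine-tune: fit the source data, adapt afterwards
\theta_{\mathrm{pre}} &= \arg\min_{\theta} \; \mathcal{L}_{D_{\mathrm{src}}}(\theta),
&
\theta_{\mathrm{tgt}} &= \theta_{\mathrm{pre}} - \alpha \nabla_{\theta} \mathcal{L}_{D_{\mathrm{tgt}}}(\theta_{\mathrm{pre}}) \\
% MAML-style meta-learning: optimize the initialization for its post-adaptation loss
\theta^{*} &= \arg\min_{\theta} \sum_{\mathcal{T}_i \sim p(\mathcal{T})}
\mathcal{L}^{Q}_{\mathcal{T}_i}\!\big(\theta - \alpha \nabla_{\theta} \mathcal{L}^{S}_{\mathcal{T}_i}(\theta)\big)
\end{align}
```

In the first line, adaptation plays no role when $\theta_{\mathrm{pre}}$ is chosen. In the second, the adaptation step sits inside the objective, which is the source of both the faster adaptation and the additional bilevel cost noted above.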

Foundation models have recently emerged as an important paradigm in remote sensing [96,97]. Trained on massive datasets in a task-agnostic manner, remote sensing foundation models such as RingMoE [98], RemoteCLIP [99], EarthGPT [100], MetaEarth [101], and GeoCLIP [102] have demonstrated generalization capabilities across tasks including semantic segmentation [103], scene classification [104], and object recognition [105]. Foundation models provide universal representations, but their pretraining is costly and resource-intensive. Meta-learning and foundation models are complementary along the representation–adaptation axis: foundation models supply general-purpose features, while meta-learning provides explicit mechanisms for rapid adaptation across tasks. Two technical pathways for integrating the two paradigms can be identified.

The first pathway applies meta-learning as the outer-loop optimization over parameter-efficient fine-tuning (PEFT) modules attached to a frozen foundation model. Rather than meta-learning over all parameters, which is computationally infeasible at foundation-model scale, the meta-learner updates only lightweight adapters such as Low-Rank Adaptation (LoRA) matrices [106] or prompt tokens. Recent work has shown that meta-learning over PEFT adapters can yield tuning-free few-shot adaptation in visual foundation models [107], enabling a single pretrained model to support multiple LULC tasks through task-specific adapter sets that are meta-trained jointly.
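To give a sense of why restricting the meta-learner to adapters is attractive, the short sketch below compares the number of trainable parameters in one dense weight matrix with the number in a LoRA update of the form W + BA, where B is d_out × r and A is r × d_in. The layer size (768) and rank (8) are illustrative assumptions, and the function names are hypothetical.

```python
def full_param_count(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_out * d_in


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters of a rank-r update B @ A (B: d_out x r, A: r x d_in)."""
    return rank * (d_out + d_in)


if __name__ == "__main__":
    d, r = 768, 8  # illustrative layer width and adapter rank (assumptions)
    full = full_param_count(d, d)
    lora = lora_param_count(d, d, r)
    print(full, lora, lora / full)  # 589824 12288 0.0208...
```

Under these assumptions the adapter holds 1/48 of the layer's parameters, so any meta-gradient computed over it involves a correspondingly smaller parameter vector. The same ratio also indicates the representational bottleneck that a low rank imposes.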

The second pathway designs lightweight foundation models from the outset using meta-learning as the training objective. METEOR [15] exemplifies this direction, using MAML to train a foundation model capable of rapid generalization across varying spectral channels, spatial resolutions, and geographic regions with only a few labeled samples. The advantage is end-to-end alignment between the meta-objective and the architecture; the limitation is that the resulting models are typically smaller than mainstream remote sensing foundation models and may not match their representational breadth.

Several challenges constrain these pathways. First, the second-order gradient computation in MAML becomes prohibitive at foundation-model scale, as explicitly forming the Hessian is infeasible for models with millions to billions of parameters [108]. Restricting meta-updates to PEFT modules partially addresses this but introduces a representational bottleneck limited by adapter rank. Second, catastrophic forgetting in the backbone during meta-training can erase the broad knowledge that motivates using a foundation model in the first place. Third, the diversity of remote sensing modalities (optical, SAR, hyperspectral, LiDAR) creates a more heterogeneous task distribution than natural-image domains, making the design of representative meta-training task distributions an open problem.
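The origin of the second-order cost mentioned in the first challenge can be seen from the meta-gradient of a single inner step. The derivation below is a sketch in assumed notation ($\theta$ for the shared initialization, $\alpha$ for the inner learning rate, $\mathcal{L}^{S}_i$ and $\mathcal{L}^{Q}_i$ for the support and query losses of task $i$), not the notation of any specific reviewed study.

```latex
\begin{align}
\theta_i' &= \theta - \alpha \nabla_{\theta} \mathcal{L}^{S}_i(\theta) \\
\nabla_{\theta} \mathcal{L}^{Q}_i(\theta_i')
  &= \big(I - \alpha \nabla^{2}_{\theta} \mathcal{L}^{S}_i(\theta)\big)\,
     \nabla_{\theta'} \mathcal{L}^{Q}_i(\theta_i') \\
\nabla_{\theta} \mathcal{L}^{Q}_i(\theta_i')
  &\approx \nabla_{\theta'} \mathcal{L}^{Q}_i(\theta_i')
  \qquad \text{(first-order approximation: Hessian term dropped)}
\end{align}
```

For a model with $n$ parameters the Hessian $\nabla^{2}_{\theta} \mathcal{L}^{S}_i$ has $n^2$ entries. Automatic differentiation can evaluate the Hessian-vector product without materializing the matrix, but the inner-loop computation graph must still be retained, which is where much of the memory overhead arises.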

5.3. Current Limitations

Despite the progress reviewed above, several limitations of current meta-learning applications in LULC require examination. Computational cost remains a practical concern for large-scale deployment. The bilevel optimization structure inherent in MAML and its variants requires computing second-order gradients, which increases training time and memory consumption. Although lighter-weight alternatives have been proposed to reduce this overhead, such as the first-order method Reptile [34] and ANIL [35], which restricts inner-loop adaptation to the network head, the trade-off between computational efficiency and adaptation quality in the context of multimodal remote sensing data has not been sufficiently investigated. This constraint is particularly relevant for operational LULC mapping applications that require processing large volumes of satellite imagery across extensive geographic areas.
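As a minimal sketch of how a first-order method avoids second-order gradients, the following toy example applies a Reptile-style update to one-dimensional linear regression tasks. The tasks (slopes 1.0, 2.0, and 3.0), learning rates, and function names are illustrative assumptions and are unrelated to any LULC dataset.

```python
import random

XS = [-1.0, -0.5, 0.5, 1.0]  # shared toy inputs for every task (assumption)


def adapt(theta, slope, steps=5, lr=0.05):
    """Inner loop: plain gradient descent on one task's mean squared error."""
    phi = theta
    for _ in range(steps):
        # d/dphi of mean((phi*x - slope*x)^2) over the toy inputs
        grad = sum(2.0 * (phi - slope) * x * x for x in XS) / len(XS)
        phi -= lr * grad
    return phi


def reptile(task_slopes, meta_steps=500, meta_lr=0.1, seed=0):
    """Outer loop: move the initialization toward each task's adapted weight."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(meta_steps):
        slope = rng.choice(task_slopes)   # sample one task
        phi = adapt(theta, slope)         # adapt to it with a few SGD steps
        theta += meta_lr * (phi - theta)  # first-order update, no Hessian
    return theta


if __name__ == "__main__":
    theta = reptile([1.0, 2.0, 3.0])
    print("meta-learned initialization:", theta)
    print("after adapting to slope 3.0:", adapt(theta, 3.0))
```

The outer update uses only the difference between adapted and initial weights, so no gradient is propagated through the inner loop. Whether this saving costs adaptation quality on multimodal remote sensing data is exactly the open trade-off described above.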

The episodic training paradigm, while effective for standard few-shot benchmarks, presents challenges when applied to LULC tasks. In conventional N-way K-shot settings, the number of classes per episode is typically small. However, real-world LULC classification often involves substantially more categories, and the optimal strategy for sampling tasks from a large and heterogeneous class space remains an open question. Furthermore, the assumption of balanced class distributions within episodes may not reflect the inherent class imbalance commonly encountered in LULC datasets, where certain land cover types occupy disproportionately small spatial extents.
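The interaction between episodic sampling and class imbalance can be illustrated with a small N-way K-shot sampler. This is a sketch under assumed conventions (a flat list of integer sample indices with string labels, and a hypothetical `sample_episode` function), and the class sizes are invented for illustration.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Draw one N-way K-shot episode as (support, query) lists of (index, label)."""
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)

    need = k_shot + n_query
    # Classes with too few samples cannot fill a balanced episode.
    eligible = sorted(c for c, idxs in by_class.items() if len(idxs) >= need)
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + n_query samples")

    support, query = [], []
    for c in rng.sample(eligible, n_way):
        picked = rng.sample(by_class[c], need)  # without replacement
        support += [(i, c) for i in picked[:k_shot]]
        query += [(i, c) for i in picked[k_shot:]]
    return support, query


if __name__ == "__main__":
    # Invented class sizes: "wetland" is rare, as small land cover types often are.
    labels = ["cropland"] * 50 + ["forest"] * 40 + ["water"] * 30 + ["wetland"] * 3
    support, query = sample_episode(labels, n_way=3, k_shot=5, n_query=10,
                                    rng=random.Random(0))
    print(sorted({c for _, c in support}))  # ['cropland', 'forest', 'water']
```

With a balanced sampler of this kind, the rare class never appears in any episode, which is one concrete way the balanced-episode assumption can diverge from the class distribution of a real LULC dataset.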

The lack of standardized evaluation protocols limits cross-study comparability. Different studies adopt varying N-way K-shot configurations, use different datasets with different spatial resolutions and geographic coverages, and report different evaluation metrics. This heterogeneity makes it difficult to draw definitive conclusions about the relative effectiveness of different meta-learning approaches. Moreover, a substantial portion of HSI classification studies rely on a limited set of benchmarks, such as the Indian Pines, Pavia University, and Houston datasets, which are relatively small in spatial extent and class diversity. Although larger-scale HSI datasets such as WHU-OHS [109] and OHID-1 [110] have recently been released, meta-learning methods have not yet been widely evaluated on them. The generalizability of findings obtained on the small benchmarks to large-area, operationally relevant LULC mapping scenarios therefore remains uncertain.
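One ingredient of a comparable protocol is reporting accuracy as a mean over many test episodes together with an uncertainty estimate. The sketch below assumes a normal-approximation 95% confidence interval (z = 1.96), which is a common convention in few-shot learning but is an assumption here and not a protocol prescribed by the reviewed studies. The function name and the accuracy values are hypothetical.

```python
import math
import statistics


def summarize_episodes(accuracies, z=1.96):
    """Return (mean, half-width of the z-based confidence interval)."""
    n = len(accuracies)
    if n < 2:
        raise ValueError("need at least two episodes to estimate spread")
    mean = statistics.mean(accuracies)
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    half_width = z * statistics.stdev(accuracies) / math.sqrt(n)
    return mean, half_width


if __name__ == "__main__":
    per_episode_acc = [0.8, 0.9, 0.7, 0.8]  # invented values for illustration
    mean, hw = summarize_episodes(per_episode_acc)
    print(f"{100 * mean:.1f} +/- {100 * hw:.1f} %")
```

With only a handful of episodes a t-based interval would be more appropriate. The normal approximation is intended for the hundreds of episodes typically sampled at test time.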

The boundary conditions under which meta-learning provides advantages over conventional methods have not been thoroughly delineated. As noted by Rußwurm et al. (2020) [13], traditional supervised learning remains suitable when feature or label shifts between domains are minimal. A characterization of when meta-learning provides meaningful improvements over simpler alternatives, and when the additional complexity is not warranted, would be valuable for guiding method selection in practical LULC applications.

The interpretability of meta-learned representations and adaptation processes also remains limited. While meta-learning models demonstrate empirical effectiveness across diverse LULC scenarios, the mechanisms through which cross-task knowledge is encoded, stored, and transferred during adaptation are not well understood. For remote sensing applications where model decisions may inform policy or resource allocation, improved interpretability could enhance both scientific understanding and practical trust in meta-learning-based classification systems.

Beyond model-side limitations, an emerging governance-side consideration is the absence of mechanisms for removing the influence of specific training data once a model has been deployed. Machine unlearning, motivated by privacy regulations and the right to be forgotten, has been actively studied in mainstream machine learning [111], but its applicability to meta-learned models has received little attention. Because meta-learning encodes knowledge across a distribution of tasks rather than in instance-level features, selectively removing the contribution of a particular region, sensor, or annotation source is non-trivial, and represents a relevant direction as meta-learning is increasingly considered for operational LULC systems.

5.4. Future Directions

Based on the findings and limitations discussed above, several directions can be identified for advancing meta-learning applications in LULC classification. The development of meta-learning frameworks designed to jointly address cross-regional, cross-temporal, and cross-modal generalization challenges arising from spatial heterogeneity represents an important direction. Current approaches predominantly address these challenges in isolation, yet real-world LULC mapping often requires adaptation across multiple dimensions at once. For example, a model deployed for global land cover monitoring must contend simultaneously with varying geographic regions, seasonal conditions, sensor configurations, and data modalities. Future research could explore unified meta-learning frameworks that jointly model these heterogeneous sources of variation, enabling models trained in one geographic and temporal context to adapt to diverse environmental settings.

Within this unified perspective, the temporal dimension deserves particular attention. Seasonal and phenological variations introduce systematic distribution shifts that cause the same land cover type to exhibit markedly different spectral and textural signatures across acquisition dates, which is one of the long-standing difficulties in time series LULC classification. A meta-learning formulation that treats different seasons, phenological stages, or acquisition periods as distinct tasks within an episodic training framework could allow models to extract season-invariant representations while preserving the capacity to rapidly specialize to the target date. Such a formulation would naturally extend the task construction strategies reviewed in Section 4.4 and could mitigate performance degradation caused by temporal inconsistency between training and inference data, which is frequently encountered in operational monitoring scenarios such as annual cropland mapping and multi-year land cover change detection.
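The task construction implied by this formulation can be sketched as follows. Each sample is assumed to carry `region` and `season` metadata (hypothetical field names), every (region, season) pair becomes one task, and all tasks from a held-out season are reserved for meta-testing. This is an illustrative sketch, not a construction taken from the reviewed studies.

```python
from collections import defaultdict


def build_temporal_tasks(samples, held_out_season):
    """Group samples into (region, season) tasks and hold out one season."""
    tasks = defaultdict(list)
    for s in samples:
        tasks[(s["region"], s["season"])].append(s)

    meta_train = {k: v for k, v in tasks.items() if k[1] != held_out_season}
    meta_test = {k: v for k, v in tasks.items() if k[1] == held_out_season}
    return meta_train, meta_test


if __name__ == "__main__":
    # Invented metadata records; real samples would also carry pixels and labels.
    samples = [
        {"id": 0, "region": "A", "season": "spring"},
        {"id": 1, "region": "A", "season": "summer"},
        {"id": 2, "region": "A", "season": "winter"},
        {"id": 3, "region": "B", "season": "summer"},
        {"id": 4, "region": "B", "season": "winter"},
    ]
    train, test = build_temporal_tasks(samples, held_out_season="winter")
    print(sorted(train))  # [('A', 'spring'), ('A', 'summer'), ('B', 'summer')]
    print(sorted(test))   # [('A', 'winter'), ('B', 'winter')]
```

Holding out an entire season means that meta-test performance measures adaptation to an unseen acquisition period and not interpolation among dates already seen during meta-training.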

The integration of meta-learning with foundation models, discussed in Section 5.2, represents another concrete direction. Future work could systematically investigate how to combine the two paradigms during both pretraining and fine-tuning to reduce training cost and resource requirements while maintaining generalization performance.

The adoption of novel architectures as the base learner for meta-learning also merits attention. Mamba [112], built upon structured state space models (SSMs), has been proposed as an alternative to the Vision Transformer (ViT) for remote sensing tasks. Although ViT has achieved notable success in remote sensing [113], the quadratic computational complexity of the self-attention mechanism results in high memory consumption and computational costs, limiting scalability for high-resolution imagery and large-area monitoring applications [114]. Mamba achieves linear computational complexity while retaining global contextual modeling through state space representations [115], enhancing efficiency in processing long sequences and high-resolution data. The ability to capture long-range spatial dependencies and produce efficient feature representations makes Mamba a candidate for next-generation base learners in meta-learning frameworks for remote sensing imagery.
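The contrast between quadratic and linear scaling can be illustrated with a deliberately simplified sketch. The recurrence below is a scalar, time-invariant state space model and is not Mamba itself, which uses input-dependent parameters and a parallel scan. The image size, patch size, and function names are illustrative assumptions.

```python
def attention_pairs(num_tokens: int) -> int:
    """Number of token-to-token scores in full self-attention."""
    return num_tokens * num_tokens


def ssm_scan(xs, a, b, c):
    """Linear recurrence h_t = a*h_{t-1} + b*x_t, y_t = c*h_t (one pass over xs)."""
    h, ys = 0.0, []
    for x in xs:
        h = a * h + b * x   # state carries context from all earlier inputs
        ys.append(c * h)
    return ys


if __name__ == "__main__":
    # A 1024 x 1024 image cut into 16 x 16 patches gives 64 * 64 = 4096 tokens.
    tokens = (1024 // 16) ** 2
    print(tokens, attention_pairs(tokens))  # 4096 16777216
    # Impulse response of the toy recurrence: work grows linearly with length.
    print(ssm_scan([1.0, 0.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0))  # [2.0, 1.0, 0.5, 0.25]
```

Doubling the image side length quadruples the token count, which multiplies the attention score count by sixteen but the number of recurrence steps only by four.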

Improving the computational efficiency of meta-learning itself represents another concrete direction, complementing the architectural advances discussed above. As characterized in Section 3.5, second-order gradient computation in MAML and memory-related overhead in memory-augmented methods scale poorly with model and data dimensionality, which limits the deployment of meta-learning in operational LULC scenarios involving high-resolution or hyperspectral imagery. Beyond lighter-weight variants such as Reptile and ANIL, recent work has explored scalable meta-learning algorithms that avoid explicit Hessian computation and exploit distributed training techniques to substantially reduce memory and runtime cost at large model scales [108]. Systematic adoption of such efficiency-oriented techniques, together with the parameter-efficient meta fine-tuning of foundation models discussed in Section 5.2, would help close the gap between methodological advances and operational deployment of meta-learning in LULC mapping.

The establishment of standardized evaluation frameworks is also needed. As discussed in Section 5.3, the current lack of unified evaluation protocols hinders cross-study comparison and impedes progress in the field. Future efforts could focus on developing community-agreed benchmark suites that encompass diverse geographic regions, multiple data modalities, and varying levels of domain shift, accompanied by standardized N-way K-shot configurations and evaluation metrics. Such benchmarks would facilitate rigorous and reproducible assessment of meta-learning methods and accelerate the identification of best practices for LULC applications.

6. Conclusions

This paper presents a systematic review of meta-learning for land use and land cover (LULC) classification, covering 70 peer-reviewed studies published between 2018 and 2025. Three representative meta-learning paradigms—memory-augmented, optimization-based, and metric-based methods—were examined, and their applications were analyzed across four core challenges in LULC remote sensing: label scarcity, cross-region and cross-domain distribution shifts, temporal dynamics modeling, and multimodal data integration.

Several key findings emerge from the review. Optimization-based and metric-based methods dominate current research, with MAML and its variants being the most widely adopted due to their model-agnostic property, whereas memory-augmented methods remain underexplored largely because of the high computational and memory overhead of external memory modules on high-dimensional remote sensing data. Across multiple data modalities, meta-learning consistently outperforms the conventional paradigm of pre-training followed by fine-tuning under significant domain shifts, owing to its capacity to acquire cross-task structural knowledge rather than reusing instance-level features. The four application areas are unevenly developed: label scarcity and cross-domain generalization are the most extensively studied, while temporal dynamics modeling (14.3%) and multimodal data integration (7.1%) remain in the early stages of exploration.

Current research is still constrained by computational overhead, the rigidity of episodic training, and the absence of standardized evaluation protocols that support reproducible cross-study comparison. To address these limitations, future research should pursue unified meta-learning frameworks that jointly handle cross-regional, cross-temporal, and cross-modal generalization, deeper integration of meta-learning with foundation models through both pretraining and fine-tuning, the adoption of efficient new architectures such as state-space models as base learners, and the establishment of community-agreed benchmarks spanning diverse geographic regions, data modalities, and levels of domain shift. Advancing along these directions is expected to yield more generalizable, sample-efficient, and deployment-ready meta-learning solutions for LULC classification.

Taken together, the body of work reviewed here indicates that meta-learning is more than a collection of few-shot techniques for LULC classification: it provides a structurally aligned response to the spatial heterogeneity and label scarcity that define this remote sensing problem, by treating the diversity of regions, sensors, seasons, and modalities as a distribution of tasks rather than as instance-level variability to be averaged out. Among the directions discussed above, the integration of meta-learning with remote sensing foundation models is particularly promising, because it combines large-scale representation learning with explicit mechanisms for rapid task adaptation, offering a path from methodological progress toward operational, globally deployable LULC systems. Realizing this prospect will require sustained progress on computational efficiency, standardized evaluation, and the design of task distributions that reflect the heterogeneity of real-world Earth observation data.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/rs18121879/s1, Figure S1: Chronological development of representative memory-augmented meta-learning methods (2014–2024); Figure S2: Chronological development of representative optimization-based meta-learning methods (2017–2024); Figure S3: Chronological development of representative metric-based meta-learning methods (2015–2023); Figure S4: PRISMA 2020 flow diagram illustrating the systematic search and study selection process.

Author Contributions

Conceptualization, W.H. and L.L.; methodology, H.W. and L.L.; software, W.H. and L.L.; validation, X.G., Y.Y., X.Y. and Y.G.; formal analysis, W.H., H.W. and Z.Z.; investigation, W.H. and L.L.; resources, L.L.; data curation, W.H.; writing—original draft preparation, W.H., L.L. and X.G.; writing—review and editing, W.H., L.L., H.W., Z.Z., Y.Y., X.Y. and Y.G.; visualization, W.H. and Z.Z.; supervision, L.L.; project administration, L.L.; funding acquisition, L.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Strategic Priority Research Program of the Chinese Academy of Sciences (grant number XDB0740100), the National Natural Science Foundation of China (grant number 42471500), the National Key Research and Development Program of China (grant number 2021YFB3900501), the LREIS Independent Innovation Project (grant number 05Z5006JYA), and the project Quality Evaluation and Demonstration Application Technology for Standardized Quantitative Common Products Based on Multi-Source Data and Multi-Domain Applications (grant number D040401).

Data Availability Statement

No new data were created or analyzed in this study. This article is a systematic review, and all data supporting the reported results are derived from previously published studies, which are cited throughout the manuscript. The literature search strategy, selection criteria, and screening process are fully described in the review methodology section.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

Alem, A.; Kumar, S. Deep Learning Methods for Land Cover and Land Use Classification in Remote Sensing: A Review. In Proceedings of the 2020 8th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, India, 4–5 June 2020; pp. 903–908.
Ma, Y.; Chen, S.; Ermon, S.; Lobell, D.B. Transfer Learning in Environmental Remote Sensing. Remote Sens. Environ. 2024, 301, 113924.
Vivone, G.; Deng, L.-J.; Deng, S.; Hong, D.; Jiang, M.; Li, C.; Li, W.; Shen, H.; Wu, X.; Xiao, J.-L.; et al. Deep Learning in Remote Sensing Image Fusion: Methods, Protocols, Data, and Future Perspectives. IEEE Geosci. Remote Sens. Mag. 2024, 13, 269–310.
Lou, C.; Al-Qaness, M.A.A.; AL-Alimi, D.; Dahou, A.; Abd Elaziz, M.; Abualigah, L.; Ewees, A.A. Land Use/Land Cover (LULC) Classification Using Hyperspectral Images: A Review. Geo-Spat. Inf. Sci. 2025, 28, 345–386.
Wang, J.; Haining, R.; Zhang, T.; Xu, C.; Hu, M.; Yin, Q.; Li, L.; Zhou, C.; Li, G.; Chen, H. Statistical Modeling of Spatially Stratified Heterogeneous Data. Ann. Am. Assoc. Geogr. 2024, 114, 499–519.
Lu, S.; Guo, J.; Zimmer-Dauphinee, J.R.; Nieusma, J.M.; Wang, X.; VanValkenburgh, P.; Wernke, S.A.; Huo, Y. Vision Foundation Models in Remote Sensing: A Survey. IEEE Geosci. Remote Sens. Mag. 2025, 13, 190–215.
Hospedales, T.M.; Antoniou, A.; Micaelli, P.; Storkey, A.J. Meta-Learning in Neural Networks: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 5149–5169.
Tong, X.-Y.; Xia, G.-S.; Lu, Q.; Shen, H.; Li, S.; You, S.; Zhang, L. Land-Cover Classification with High-Resolution Remote Sensing Images Using Transferable Deep Models. Remote Sens. Environ. 2020, 237, 111322.
Alzubaidi, L.; Bai, J.; Al-Sabaawi, A.; Santamaría, J.; Albahri, A.S.; Al-dabbagh, B.S.N.; Fadhel, M.A.; Manoufali, M.; Zhang, J.; Al-Timemy, A.H.; et al. A Survey on Deep Learning Tools Dealing with Data Scarcity: Definitions, Challenges, Solutions, Tips, and Applications. J. Big Data 2023, 10, 46.
Sun, X.; Wang, B.; Wang, Z.; Li, H.; Li, H.; Fu, K. Research Progress on Few-Shot Learning for Remote Sensing Image Interpretation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 2387–2402.
Mohammadi, S.; Belgiu, M.; Stein, A. Few-Shot Learning for Crop Mapping from Satellite Image Time Series. Remote Sens. 2024, 16, 1026.
Ma, Y.; Zhang, Z.; Yang, H.L.; Yang, Z. An Adaptive Adversarial Domain Adaptation Approach for Corn Yield Prediction. Comput. Electron. Agric. 2021, 187, 106314.
Rußwurm, M.; Wang, S.; Körner, M.; Lobell, D. Meta-Learning for Few-Shot Land Cover Classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 200–201.
Finn, C.; Abbeel, P.; Levine, S. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. In Proceedings of the 34th International Conference on Machine Learning (ICML), Sydney, Australia, 6–11 August 2017.
Rußwurm, M.; Wang, S.; Kellenberger, B.; Roscher, R.; Tuia, D. Meta-Learning to Address Diverse Earth Observation Problems across Resolutions. Commun. Earth Environ. 2024, 5, 37.
Park, S.-H.; Syazwany, N.S.; Lee, S.-C. Meta-Feature Fusion for Few-Shot Time Series Classification. IEEE Access 2023, 11, 41400–41414.
Gama, P.H.T.; Oliveira, H.; dos Santos, J.A.; Cesar, R.M. An Overview on Meta-Learning Approaches for Few-Shot Weakly-Supervised Segmentation. Comput. Graph. 2023, 113, 77–88.
Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71.
Thrun, S.; Pratt, L. Learning to Learn: Introduction and Overview. In Learning to Learn; Thrun, S., Pratt, L., Eds.; Springer: Boston, MA, USA, 1998; pp. 3–17.
Vinyals, O.; Blundell, C.; Lillicrap, T.; Kavukcuoglu, K.; Wierstra, D. Matching Networks for One Shot Learning. In Proceedings of the 30th International Conference on Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2016; pp. 3637–3645.
Franceschi, L.; Frasconi, P.; Salzo, S.; Grazzi, R.; Pontil, M. Bilevel Programming for Hyperparameter Optimization and Meta-Learning. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018.
Vettoruzzo, A.; Bouguelia, M.-R.; Vanschoren, J.; Rögnvaldsson, T.; Santosh, K. Advances and Challenges in Meta-Learning: A Technical Review. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 4763–4779.
Santoro, A.; Bartunov, S.; Botvinick, M.; Wierstra, D.; Lillicrap, T. Meta-Learning with Memory-Augmented Neural Networks. In Proceedings of the 33rd International Conference on Machine Learning (ICML 2016), New York, NY, USA, 20–22 June 2016.
Graves, A.; Wayne, G.; Danihelka, I. Neural Turing Machines. arXiv 2014, arXiv:1410.5401.
Mishra, N.; Rohaninejad, M.; Chen, X.; Abbeel, P. A Simple Neural Attentive Meta-Learner. In Proceedings of the 6th International Conference on Learning Representations (ICLR 2018), Vancouver, BC, Canada, 30 April–3 May 2018.
Garnelo, M.; Rosenbaum, D.; Maddison, C.J.; Ramalho, T.; Saxton, D.; Shanahan, M.; Teh, Y.W.; Rezende, D.J.; Eslami, S.M.A. Conditional Neural Processes. arXiv 2018, arXiv:1807.01613.
Munkhdalai, T.; Yu, H. Meta Networks. Proc. Mach. Learn. Res. 2017, 70, 2554–2563.
Ramalho, T.; Garnelo, M. Adaptive Posterior Learning: Few-Shot Learning with a Surprise-Based Memory Module. arXiv 2019, arXiv:1902.02527.
Chen, X.; Wang, Z.; Tang, S.; Muandet, K. MATE: Plugging in Model Awareness to Task Embedding for Meta Learning. In Proceedings of the Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2020.
Liu, X.; Tian, X.; Lin, S.; Qu, Y.; Ma, L.; Yuan, W.; Zhang, Z.; Xie, Y. Learn from Concepts: Towards the Purified Memory for Few-Shot Learning. In Proceedings of the International Joint Conference on Artificial Intelligence, Montréal, QC, Canada, 19–26 August 2021.
Du, Y.; Shen, J.; Zhen, X.; Snoek, C.G.M. EMO: Episodic Memory Optimization for Few-Shot Meta-Learning. In Proceedings of the CoLLAs, Montréal, QC, Canada, 22–25 August 2023.
Chen, X.; Shi, M. Memory-Guided Network with Uncertainty-Based Feature Augmentation for Few-Shot Semantic Segmentation. In Proceedings of the 2024 IEEE International Conference on Multimedia and Expo (ICME), Niagara Falls, ON, Canada, 15–19 July 2024; pp. 1–6.
Ravi, S.; Larochelle, H. Optimization as a Model for Few-Shot Learning. arXiv 2017, arXiv:1707.05041.
Nichol, A.; Achiam, J.; Schulman, J. On First-Order Meta-Learning Algorithms. arXiv 2018, arXiv:1803.02999.
Raghu, A.; Raghu, M.; Bengio, S.; Vinyals, O. Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML. arXiv 2019, arXiv:1909.09157.
Oh, J.; Yoo, H.; Kim, C.; Yun, S.-Y. BOIL: Towards Representation Change for Few-Shot Learning. arXiv 2021, arXiv:2008.08882.
Grant, E.; Finn, C.; Levine, S.; Darrell, T.; Griffiths, T. Recasting Gradient-Based Meta-Learning as Hierarchical Bayes. arXiv 2018, arXiv:1801.08930.
Finn, C.; Xu, K.; Levine, S. Probabilistic Model-Agnostic Meta-Learning. arXiv 2019, arXiv:1806.02817.
Kim, T.; Yoon, J.; Dia, O.; Kim, S.; Bengio, Y.; Ahn, S. Bayesian Model-Agnostic Meta-Learning. arXiv 2018, arXiv:1806.03836.
Jamal, M.A.; Qi, G.-J.; Shah, M. Task-Agnostic Meta-Learning for Few-Shot Learning. arXiv 2018, arXiv:1805.07722.
Rusu, A.A.; Rao, D.; Sygnowski, J.; Vinyals, O.; Pascanu, R.; Osindero, S.; Hadsell, R. Meta-Learning with Latent Embedding Optimization. arXiv 2019, arXiv:1807.05960.
Jiang, X.; Havaei, M.; Varno, F.; Chartrand, G.; Chapados, N.; Matwin, S. Learning to Learn with Conditional Class Dependencies. In Proceedings of the 7th International Conference on Learning Representations (ICLR 2019), New Orleans, LA, USA, 6–9 May 2019.
Li, Z.; Zhou, F.; Chen, F.; Li, H. Meta-SGD: Learning to Learn Quickly for Few-Shot Learning. arXiv 2017, arXiv:1707.09835.
Sun, S.; Gao, H. Meta-AdaM: A Meta-Learned Adaptive Optimizer with Momentum for Few-Shot Learning. In Proceedings of the 37th International Conference on Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2023.
Lee, Y.; Choi, S. Gradient-Based Meta-Learning with Learned Layerwise Metric and Subspace. arXiv 2018, arXiv:1801.05558.
Zhou, F.; Wu, B.; Li, Z. Deep Meta-Learning: Learning to Learn in the Concept Space. arXiv 2018, arXiv:1802.03596.
Liu, G.; Wang, T.; Zhang, S.; He, K. Generating Pseudo-Labels Adaptively for Few-Shot Model-Agnostic Meta-Learning. In Proceedings of the British Machine Vision Conference, London, UK, 21–24 November 2022.
Lee, J.-J.; Yoon, S.W. XB-MAML: Learning Expandable Basis Parameters for Effective Meta-Learning with Wide Task Coverage. arXiv 2024, arXiv:2403.06768.
Snell, J.; Swersky, K.; Zemel, R. Prototypical Networks for Few-Shot Learning. In Proceedings of the 31st International Conference on Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 4080–4090.
Sung, F.; Yang, Y.; Zhang, L.; Xiang, T.; Torr, P.H.S.; Hospedales, T.M. Learning to Compare: Relation Network for Few-Shot Learning. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1199–1208.
Zheng, Y.; Wang, R.; Yang, J.; Xue, L.; Hu, M. Principal Characteristic Networks for Few-Shot Learning. J. Vis. Commun. Image Represent. 2019, 59, 563–573.
Chowdhury, R.R.; Bathula, D.R. IPNET: Influential Prototypical Networks for Few Shot Learning. arXiv 2022, arXiv:2208.09345.
Li, X.; Yu, L.; Fu, C.-W.; Fang, M.; Heng, P.-A. Revisiting Metric Learning for Few-Shot Image Classification. Neurocomputing 2020, 406, 49–58.
He, X.; Li, F.; Liu, L. Task-Adaptive Relation Dependent Network for Few-Shot Learning. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN); IEEE: New York, NY, USA, 2021; pp. 1–8.
Li, J.; Tong, J.; Gao, G.; Xu, W. Attention-Enhanced Relation Network for Few-Shot Image Classification. In Proceedings of the 2023 6th International Conference on Image and Graphics Processing, Chongqing, China, 6–8 January 2023; pp. 197–203.
Ren, L.; Duan, G.; Huang, T.; Kang, Z. Multi-Local Feature Relation Network for Few-Shot Learning. Neural Comput. Appl. 2022, 34, 7393–7403.
Zhou, F.; Wang, P.; Zhang, L.; Wei, W.; Zhang, Y. Revisiting Prototypical Network for Cross Domain Few-Shot Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–23 June 2023; pp. 20061–20070.
Zhang, L.; Liu, J.; Luo, M.; Chang, X.; Zheng, Q.; Hauptmann, A.G. Scheduled Sampling for One-Shot Learning via Matching Network. Pattern Recognit. 2019, 96, 106962.
Wang, J.; Zhai, Y. Prototypical Siamese Networks for Few-Shot Learning. In Proceedings of the 2020 IEEE 10th International Conference on Electronics Information and Emergency Communication (ICEIEC); IEEE: New York, NY, USA, 2020; pp. 178–181.
Zhang, Y.; Kang, Z. TPN: Transferable Proto-Learning Network towards Few-Shot Document-Level Relation Extraction. In Proceedings of the 2024 International Joint Conference on Neural Networks (IJCNN); IEEE: New York, NY, USA, 2024; pp. 1–9.
Jiang, Z.; Feng, Z.; Niu, B. Prototype-Neighbor Networks with Task-Specific Enhanced Meta-Learning for Few-Shot Classification. Neural Netw. 2025, 190, 107761.
Gao, K.; Liu, B.; Yu, X.; Zhang, P.; Tan, X.; Sun, Y. Small Sample Classification of Hyperspectral Image Using Model-Agnostic Meta-Learning Algorithm and Convolutional Neural Network. Int. J. Remote Sens. 2021, 42, 3090–3122.
Amoako, P.Y.O.; Cao, G.; Shi, H.; Arthur, J.K.; Agyenim, Y.O.B. Meta-Learning with Orthogonal Softmax Layer (MLOSL) for Small Sample Hyperspectral Image Classification. Multimed. Tools Appl. 2025, 84, 1–30.
Li, Z.; Lu, F.; Zou, J.; Hu, L.; Zhang, H. Generalized Few-Shot Meets Remote Sensing: Discovering Novel Classes in Land Cover Mapping via Hybrid Semantic Segmentation Framework. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW); IEEE: New York, NY, USA, 2024; pp. 2744–2754.
Jia, Y.; Sun, C.; Gao, J.; Wang, Q. Few-Shot Remote Sensing Scene Classification via Parameter-Free Attention and Region Matching. ISPRS J. Photogramm. Remote Sens. 2025, 227, 265–275.
Swaminathan, B.; Palani, S.; Vairavasundaram, S. Meta Learning-Based Dynamic Ensemble Model for Crop Selection. Appl. Artif. Intell. 2022, 36, 2145646.
Rußwurm, M.; Wang, S.; Tuia, D. Humans Are Poor Few-Shot Classifiers for Sentinel-2 Land Cover. In Proceedings of the IGARSS 2022—2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 17–22 July 2022; pp. 4859–4862.
Wang, S.; Rußwurm, M.; Körner, M.; Lobell, D.B. Meta-Learning for Few-Shot Time Series Classification. In Proceedings of the IGARSS 2020—2020 IEEE International Geoscience and Remote Sensing Symposium; IEEE: New York, NY, USA, 2020; pp. 7041–7044.
Deng, B.; Jia, S.; Shi, D. Deep Metric Learning-Based Feature Embedding for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2019, 58, 1422–1435.
Xi, B.; Li, J.; Li, Y.; Song, R.; Hong, D.; Chanussot, J. Few-Shot Learning with Class-Covariance Metric for Hyperspectral Image Classification. IEEE Trans. Image Process. 2022, 31, 5079–5092.
Wang, C.; Zheng, W.; Sun, X.; Zhou, J.; Lu, J. Probabilistic Deep Metric Learning for Hyperspectral Image Classification. Pattern Recognit. 2025, 157, 110878.
Wu, G.; Cong, L.; Huang, C.; Ju, Y.; Jiang, J.; Chen, C. Meta-Learning Framework for Effective Few Shot Time Series Prediction. In Proceedings of the 2025 IEEE 5th International Conference on Power, Electronics and Computer Applications (ICPECA); IEEE: New York, NY, USA, 2025; pp. 18–22. [Google Scholar]
Dai, M.; Xing, S.; Xu, Q.; Wang, H.; Li, P.; Sun, Y.; Pan, J.; Li, Y. Learning Transferable Cross-Modality Representations for Few-Shot Hyperspectral and LiDAR Collaborative Classification. Int. J. Appl. Earth Obs. Geoinf. 2024, 126, 103640. [Google Scholar] [CrossRef]
Zhang, K.; Han, Y.; Chen, J.; Zhang, Z.; Wang, S. Semantic Segmentation for Remote Sensing Based on RGB Images and Lidar Data Using Model-Agnostic Meta-Learning and Partical Swarm Optimization. IFAC-PapersOnLine 2020, 53, 397–402. [Google Scholar] [CrossRef]
Qiao, X.; Xing, L.; Han, A.; Liu, W.; Liu, B. Multi-Scale Fusion for Few-Shot Remote Sensing Image Classification. Int. J. Remote Sens. 2023, 44, 6012–6032. [Google Scholar] [CrossRef]
Wang, Y.; Yao, Q.; Kwok, J.T.; Ni, L.M. Generalizing from a Few Examples: A Survey on Few-Shot Learning. ACM Comput. Surv. 2020, 53, 1–34. [Google Scholar] [CrossRef] [PubMed]
Lake, B.M.; Salakhutdinov, R.; Tenenbaum, J.B. Human-Level Concept Learning through Probabilistic Program Induction. Science 2015, 350, 1332–1338. [Google Scholar] [CrossRef] [PubMed]
Liu, B.; Yu, X.; Yu, A.; Zhang, P.; Wan, G.; Wang, R. Deep Few-Shot Learning for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2018, 57, 2290–2304. [Google Scholar] [CrossRef]
Jiang, X.; Zhou, N.; Li, X. Few-Shot Segmentation of Remote Sensing Images Using Deep Metric Learning. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6507405. [Google Scholar] [CrossRef]
Li, X.; Deng, J.; Fang, Y. Few-Shot Object Detection on Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5601614. [Google Scholar] [CrossRef]
Alajaji, D.; Alhichri, H.S.; Ammour, N.; Alajlan, N. Few-Shot Learning for Remote Sensing Scene Classification. In Proceedings of the 2020 Mediterranean and Middle-East Geoscience and Remote Sensing Symposium (M2GARSS); IEEE: New York, NY, USA, 2020; pp. 81–84. [Google Scholar]
Li, Y.; Yang, J. Few-Shot Cotton Pest Recognition and Terminal Realization. Comput. Electron. Agric. 2020, 169, 105240. [Google Scholar] [CrossRef]
Demir, I.; Koperski, K.; Lindenbaum, D.; Pang, G.; Huang, J.; Basu, S.; Hughes, F.; Tuia, D.; Raskar, R. DeepGlobe 2018: A Challenge to Parse the Earth through Satellite Images. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW); IEEE: New York, NY, USA, 2018; pp. 172–17209. [Google Scholar]
Yan, J.; Wang, L.; Song, W.; Chen, Y.; Chen, X.; Deng, Z. A Time-Series Classification Approach Based on Change Detection for Rapid Land Cover Mapping. ISPRS J. Photogramm. Remote Sens. 2019, 158, 249–262. [Google Scholar] [CrossRef]
Sajid, M.; Khan, H.H.; Khan, A.; Ahmad, R.; Khan, A.; Siraj, G.; Ansari, A.A. Application of Sentinel-1 SAR Data for Flood Monitoring in the Lower Ganges Basin: A Time-Series Analysis of 2021 Flood in Bihar. Discov. Sens. 2025, 1, 12. [Google Scholar] [CrossRef]
Peng, K.; Jiang, W.; Hou, P.; Wu, Z.; Cui, T. Detailed Wetland-Type Classification Using Landsat-8 Time-Series Images: A Pixel-and Object-Based Algorithm with Knowledge (POK). GIScience Remote Sens. 2024, 61, 2293525. [Google Scholar] [CrossRef]
Jiang, J.; Chen, C.; Lackinger, A.; Li, H.; Li, W.; Pei, Q.; Dustdar, S. MetaTrans-FSTSF: A Transformer-Based Meta-Learning Framework for Few-Shot Time Series Forecasting in Flood Prediction. Remote Sens. 2025, 17, 77. [Google Scholar] [CrossRef]
Li, J.; Hong, D.; Gao, L.; Yao, J.; Zheng, K.; Zhang, B.; Chanussot, J. Deep Learning in Multimodal Remote Sensing Data Fusion: A Comprehensive Review. Int. J. Appl. Earth Obs. Geoinf. 2022, 112, 102926. [Google Scholar] [CrossRef]
Gominski, D.; Gouet-Brunet, V.; Chen, L. Cross-Dataset Learning for Generalizable Land Use Scene Classification. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), New Orleans, LA, USA, 19–20 June 2022; pp. 1381–1390. [Google Scholar]
Lu, X.; Gong, T.; Zheng, X. Domain Mapping Network for Remote Sensing Cross-Domain Few-Shot Classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5606411. [Google Scholar] [CrossRef]
Xiong, Z.; Zhang, F.; Wang, Y.; Shi, Y.; Zhu, X.X. EarthNets: Empowering Artificial Intelligence for Earth Observation. IEEE Geosci. Remote Sens. Mag. 2024, 13, 45–78. [Google Scholar] [CrossRef]
Wang, Y.; Albrecht, C.M.; Braham, N.A.A.; Mou, L.; Zhu, X.X. Self-Supervised Learning in Remote Sensing: A Review. IEEE Geosci. Remote Sens. Mag. 2022, 10, 213–247. [Google Scholar] [CrossRef]
Li, L.; Zhang, W.; Zhang, X.; Emam, M.; Jing, W. Semi-Supervised Remote Sensing Image Semantic Segmentation Method Based on Deep Learning. Electronics 2023, 12, 348. [Google Scholar] [CrossRef]
Wang, S.; Chen, W.; Xie, S.M.; Azzari, G.; Lobell, D.B. Weakly Supervised Deep Learning for Segmentation of Remote Sensing Imagery. Remote Sens. 2020, 12, 207. [Google Scholar] [CrossRef]
Pan, S.J.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359. [Google Scholar] [CrossRef]
Bommasani, R.; Hudson, D.A.; Adeli, E.; Altman, R.; Arora, S.; von Arx, S.; Bernstein, M.S.; Bohg, J.; Bosselut, A.; Brunskill, E.; et al. On the Opportunities and Risks of Foundation Models. arXiv 2021, arXiv:2108.07258. [Google Scholar] [CrossRef]
Mai, G.; Huang, W.; Sun, J.; Song, S.; Mishra, D.; Liu, N.; Gao, S.; Liu, T.; Cong, G.; Hu, Y.; et al. On the Opportunities and Challenges of Foundation Models for Geoai (Vision Paper). ACM Trans. Spat. Algorithms Syst. 2024, 10, 1–46. [Google Scholar] [CrossRef]
Bi, H.; Feng, Y.; Tong, B.; Wang, M.; Yu, H.; Mao, Y.; Chang, H.; Diao, W.; Wang, P.; Yu, Y.; et al. RingMoE: Mixture-of-Modality-Experts Multi-Modal Foundation Models for Universal Remote Sensing Image Interpretation. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 48, 4388–4405. [Google Scholar] [CrossRef]
Liu, F.; Chen, D.; Guan, Z.; Zhou, X.; Zhu, J.; Ye, Q.; Fu, L.; Zhou, J. RemoteCLIP: A Vision Language Foundation Model for Remote Sensing. IEEE Trans. Geosci. Remote. Sens. 2024, 62, 1–16. [Google Scholar] [CrossRef]
Zhang, W.; Cai, M.; Zhang, T.; Zhuang, Y.; Mao, X. EarthGPT: A Universal Multi-Modal Large Language Model for Multi-Sensor Image Comprehension in Remote Sensing Domain. IEEE Trans. Geosci. Remote. Sens. 2024, 62, 1–20. [Google Scholar]
Yu, Z.; Liu, C.; Liu, L.; Shi, Z.; Zou, Z. MetaEarth: A Generative Foundation Model for Global-Scale Remote Sensing Image Generation. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 47, 1764–1781. [Google Scholar] [CrossRef]
Cepeda, V.V.; Nayak, G.K.; Shah, M. GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-Localization. arXiv 2023, arXiv:2309.16020. [Google Scholar]
Zhu, J.; Yao, L.; Liu, F.; Zhang, C.; Shen, C.; Zhou, J. A Encoder-Decoder Framework for Foundation Model-Based Remote Sensing Semantic Segmentation. In Proceedings of the 6th ACM International Conference on Multimedia in Asia Workshops; Association for Computing Machinery: New York, NY, USA, 2024. [Google Scholar]
Li, X.; Wen, C.; Hu, Y.; Zhou, N. RS-CLIP: Zero Shot Remote Sensing Scene Classification via Contrastive Vision-Language Supervision. Int. J. Appl. Earth Obs. Geoinf. 2023, 124, 103497. [Google Scholar] [CrossRef]
Huang, Z.; Yan, H.; Zhan, Q.; Yang, S.; Zhang, M.; Zhang, C.; Lei, Y.; Liu, Z.; Liu, Q.; Wang, Y. A Survey on Remote Sensing Foundation Models: From Vision to Multimodality. arXiv 2025, arXiv:2503.22081. [Google Scholar] [CrossRef]
Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. LoRA: Low-Rank Adaptation of Large Language Models. arXiv 2021, arXiv:2106.09685. [Google Scholar]
Hu, Z.; Wei, Y.; Shen, L.; Yuan, C.; Tao, D. LoRA Recycle: Unlocking Tuning-Free Few-Shot Adaptability in Visual Foundation Models by Recycling Pre-Tuned LoRAs. In Proceedings of the 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 11–15 June 2025. [Google Scholar]
Choe, S.K.; Mehta, S.V.; Ahn, H.; Neiswanger, W.; Xie, P.; Strubell, E.; Xing, E. Making Scalable Meta Learning Practical. arXiv 2023, arXiv:2310.05674. [Google Scholar] [CrossRef]
Li, J.; Huang, X.; Tu, L. WHU-OHS: A Benchmark Dataset for Large-Scale Hersepctral Image Classification. Int. J. Appl. Earth Obs. Geoinf. 2022, 113, 103022. [Google Scholar] [CrossRef]
Mani, A.; Gorbachev, S.; Yan, J.; Dixit, A.; Shi, X.; Li, L.; Sun, Y.; Chen, X.; Wu, J.; Deng, J.; et al. OHID-1: A New Large Hyperspectral Image Dataset for Multi-Classification. Sci. Data 2025, 12, 251. [Google Scholar] [CrossRef] [PubMed]
Nguyen, T.T.; Huynh, T.T.; Ren, Z.; Nguyen, P.L.; Liew, A.W.-C.; Yin, H.; Nguyen, Q.V.H. A Survey of Machine Unlearning. ACM Trans. Intell. Syst. Technol. 2024, 16, 1–46. [Google Scholar] [CrossRef]
Gu, A.; Dao, T. Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv 2024, arXiv:2312.00752. [Google Scholar] [CrossRef]
Aleissaee, A.A.; Kumar, A.; Anwer, R.M.; Khan, S.; Cholakkal, H.; Xia, G.-S.; Khan, F.S. Transformers in Remote Sensing: A Survey. arXiv 2022, arXiv:2209.01206. [Google Scholar] [CrossRef]
Bao, M.; Lyu, S.; Xu, Z.; Zhou, H.; Ren, J.; Xiang, S.; Li, X.; Cheng, G. Vision Mamba in Remote Sensing: A Comprehensive Survey of Techniques, Applications and Outlook. Remote. Sens. 2025, 18, 594. [Google Scholar] [CrossRef]
Wang, F.; Wang, H.; Wang, Y.; Wang, D.; Chen, M.; Zhao, H.; Sun, Y.; Wang, S.; Lan, L.; Yang, W.; et al. RoMA: Scaling up Mamba-Based Foundation Models for Remote Sensing. arXiv 2025, arXiv:2503.10392. [Google Scholar]

Figure 1. Training processes of meta-learning and supervised learning for LULC classification.

Figure 2. The general procedure of memory-augmented learning methods.

Figure 3. The overview of the principles of optimization-based learning. (The different colored dashed borders distinguish individual tasks within the support set. In the lower panel, colored lines illustrate the optimization trajectories for different tasks, where $\theta$ and $\theta'$ denote the initial and updated parameters, and $\phi$ represents the task-specific adapted parameters).

Figure 4. The architecture of MAML.

Figure 5. The architecture of Meta-LSTM.

Figure 6. The overview of the principles of metric-based learning.

Figure 7. Trend in published papers.

Figure 8. Number and proportion of meta-learning publications in land cover/use by research category.

Figure 9. Reviews of representative studies.

Figure 10. A comparison of meta-learning and conventional transfer learning.

Table 1. Representative memory-augmented meta-learning methods.

Method	Year	Memory Mechanism	Core Innovation
MANN [23]	2016	External content-addressable matrix (NTM-based)	Pioneered memory-augmented meta-learning with LRUA-based writing for persistent representation-label storage
SNAIL [25]	2018	Implicit attention-based memory	Replaces explicit memory with interleaved temporal convolution and soft attention for high-bandwidth retrieval
CNPs [26]	2018	Compressed task embedding	Aggregates support set into a single task representation; enables calibrated uncertainty estimation
Meta Networks [27]	2017	Dual-stream fast-weight generation	Meta-learner generates fast weights for cross-task adaptation; base learner handles task-specific objectives
APL [28]	2019	Surprise-based incremental buffer	Selectively retains high-information samples to mitigate memory growth
MATE [29]	2020	Kernel mean embedding with adaptive attention	Models inter-task distributional differences and task complexity jointly
GAM [30]	2021	Graph-based memory aggregation	Aggregates knowledge from past tasks via GNNs in a model-agnostic manner
EMO [31]	2023	Gradient-history memory	Stores prior-task gradient histories to assist parameter updates when current-task signals are weak
CSM [32]	2024	Learnable memory vectors	Extracts base-class object patterns and encodes query features to align base-novel distributions
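
The content-addressable access listed for MANN in Table 1 can be illustrated with a minimal sketch. It assumes cosine similarity between a query key and each memory row, followed by a softmax to obtain read weights; the function names (`cosine`, `content_read`) and the toy memory are hypothetical, and the LRUA-based write rule is not modeled here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-8)

def content_read(memory, key):
    """Content-based read: softmax over key-row similarities, then a weighted sum.

    memory: list of M rows, each a list of D floats (illustrative layout).
    key:    list of D floats produced by the controller (assumed given).
    """
    sims = [cosine(row, key) for row in memory]
    top = max(sims)
    exps = [math.exp(s - top) for s in sims]   # subtract max for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(memory[0])
    read = [sum(w * row[d] for w, row in zip(weights, memory)) for d in range(dim)]
    return read, weights

# Toy memory with M = 3 slots and D = 2 features.
memory = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
read, weights = content_read(memory, [0.9, 0.1])
```

Each read touches every one of the M slots across D features, which is one way to see why memory size and feature dimension both drive the cost of this paradigm on high-dimensional remote sensing data.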

Table 2. Representative optimization-based meta-learning methods.

Method	Year	Direction	Core Innovation
MAML [14]	2017	Foundational	Bilevel optimization with task-specific inner-loop adaptation and shared outer-loop initialization
Meta-LSTM [33]	2017	Foundational	Uses LSTM to explicitly model the optimization trajectory and learn task-specific update rules
Reptile [34]	2018	Computational simplification	First-order approximation of MAML through iterative task sampling; avoids second-order gradients
ANIL [35]	2020	Computational simplification	Restricts inner-loop updates to the classifier head, leveraging feature reuse
BOIL [36]	2021	Computational simplification	Updates only the feature extractor; emphasizes representation change for cross-domain scenarios
LLAMA [37]	2018	Probabilistic extension	Recasts MAML as hierarchical Bayesian inference with approximate curvature estimation
PLATIPUS [38]	2018	Probabilistic extension	Samples diverse task-specific models from a learned distribution via a probabilistic graphical model
BMAML [39]	2018	Probabilistic extension	Integrates MAML with Stein variational gradient descent for non-parametric Bayesian meta-learning
TAML [40]	2018	Robustness and generalization	Promotes unbiased initialization by maximizing label entropy to reduce meta-overfitting
LEO [41]	2019	Robustness and generalization	Performs adaptation in a learned low-dimensional latent embedding for efficient optimization
CAML [42]	2019	Robustness and generalization	Conditionally transforms feature representations based on class-level dependencies
Meta-SGD [43]	2017	Meta-learned optimizer	Meta-learns initialization, per-parameter learning rates, and update directions jointly
Meta-Adam [44]	2023	Meta-learned optimizer	Incorporates weight-update history and momentum to predict adaptive learning rates
MT-net [45]	2018	Recent advance	Introduces layer-wise subspace learning with meta-learned distance metrics
DEML [46]	2018	Recent advance	Enables meta-learning in concept space rather than instance space
GP-MAML/ANIL/BOIL [47]	2022	Recent advance	Leverages pseudo-labeled query samples to enrich the support set during training
XB-MAML [48]	2024	Recent advance	Dynamically expandable basis parameters linearly combined to form task-specific initializations
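
To make the inner-loop/outer-loop structure in Table 2 concrete, the following is a minimal sketch of a Reptile-style first-order update on a toy problem. It assumes each task is a one-dimensional quadratic loss 0.5 * (phi - center)^2 with a task-specific center; the function names, learning rates, and task centers are illustrative assumptions, not settings from any reviewed study. Following the notation of Figure 3, `theta` is the shared initialization and `phi` the task-adapted parameter.

```python
def inner_adapt(theta, center, lr=0.1, steps=5):
    """Inner loop: a few SGD steps on the toy loss 0.5 * (phi - center)**2."""
    phi = theta
    for _ in range(steps):
        grad = phi - center        # gradient of the toy quadratic loss
        phi -= lr * grad
    return phi

def reptile(centers, theta=0.0, meta_lr=0.1, iterations=200):
    """Outer loop: move theta toward each task's adapted parameters.

    Only first-order information is used; no gradient is propagated
    through the inner-loop steps.
    """
    for i in range(iterations):
        center = centers[i % len(centers)]   # deterministic task cycling
        phi = inner_adapt(theta, center)
        theta += meta_lr * (phi - theta)
    return theta

# Two toy tasks whose optima are at -1 and 3.
theta_star = reptile([-1.0, 3.0])
```

In this toy setting the learned initialization settles close to the midpoint of the two task optima, so a handful of inner steps suffices to adapt to either task. A full MAML implementation would instead differentiate the query loss through the inner-loop updates.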

Table 3. Representative metric-based meta-learning methods.

Method	Year	Direction	Core Innovation
Matching Network [20]	2016	Foundational	Attention-based mapping from support set to label predictions via differentiable nearest-neighbor mechanism
Prototypical Network [49]	2017	Foundational	Represents each class by the mean embedding of its support examples; classifies query samples by nearest-prototype distance
Relation Network [50]	2018	Foundational	Learned deep relation module replaces fixed distance functions for non-linear similarity scoring
Principal Characteristics Net [51]	2019	Prototype refinement	Contribution-based weighting of embedded vectors for more expressive class prototypes
IPNET [52]	2022	Prototype refinement	Weights support samples by Maximum Mean Discrepancy to reduce outlier influence
K-Tuple Network [53]	2020	Relational modeling	Captures multi-sample relational structures during episodic training
Task-Adaptive Relation-Dependent Network [54]	2021	Relational modeling	Addresses train–test distribution bias via distribution-shifting and fine-grained feature comparison
Attention-Enhanced Relation Network [55]	2023	Relational modeling	Integrates adaptive-kernel and cross-channel attention to encode multi-scale features
MLFRNet [56]	2022	Relational modeling	Models local feature relationships using cosine-distance-based attention
LDP-Net [57]	2023	Cross-domain generalization	Dual-branch global–local knowledge distillation with EMA updates
SS-Matching Networks [58]	2019	Cross-domain generalization	Difficulty-aware metric with scheduled sampling for progressive training
Prototypical Siamese Networks [59]	2020	Cross-domain generalization	Siamese architecture with dedicated module for refined prototypical representation
TPN [60]	2024	Hybrid and neighbor-based	Transferable proto-learner with NOTA calibration and virtual adversarial training
PNN [61]	2025	Hybrid and neighbor-based	Combines Prototypical Networks with KNN-inspired Neighbor Network and hybrid data augmentation
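
The Prototypical Network entry in Table 3 (class mean embedding plus nearest-prototype classification) can be sketched as follows. This is a minimal illustration that assumes embeddings have already been computed by some encoder and that squared Euclidean distance is used; the helper names and the two toy land cover labels are hypothetical.

```python
from collections import defaultdict

def prototypes(support):
    """Average the support embeddings of each class.

    support: list of (embedding, label) pairs; embeddings are lists of floats.
    Returns a dict mapping each label to its mean embedding.
    """
    groups = defaultdict(list)
    for emb, label in support:
        groups[label].append(emb)
    return {label: [sum(col) / len(embs) for col in zip(*embs)]
            for label, embs in groups.items()}

def sq_dist(u, v):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def classify(query, protos):
    """Assign the query to the class with the nearest prototype."""
    return min(protos, key=lambda label: sq_dist(query, protos[label]))

# Toy 2-way 2-shot episode with precomputed 2-D embeddings.
support = [([0.9, 0.1], "water"), ([1.1, -0.1], "water"),
           ([0.0, 1.0], "forest"), ([0.2, 1.2], "forest")]
protos = prototypes(support)
pred = classify([0.8, 0.3], protos)
```

No gradient step is taken at test time: adapting to a new episode only requires recomputing the class means, which is the property that makes this family inexpensive at inference.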

Table 4. Computational complexity of representative meta-learning methods.

Paradigm	Method	Complexity	Dominant Cost
Memory-augmented	MANN [23]	$O(L \cdot M \cdot D)$	Content-based read/write to external memory at every time step over sequence length L
Memory-augmented	CNPs [26]	$O(N \cdot D)$	Single deterministic aggregation of support set into a task representation; no per-step memory access
Optimization-based	MAML [14]	$O(T \cdot |\theta|)$	Bilevel optimization requires backpropagation through the inner-loop computation graph
Optimization-based	Reptile [34]	$O(T \cdot |\theta|)$	Avoids second-order gradients via iterative task sampling
Optimization-based	ANIL [35]	$O(T \cdot |\theta_{\mathrm{head}}|)$	Inner-loop updates restricted to the classifier head
Metric-based	Matching Network [20]	$O(N \cdot Q \cdot D)$	Attention over the entire support set for each query
Metric-based	Prototypical Network [49]	$O(N \cdot D) + O(C \cdot Q \cdot D)$	Prototype computation followed by distance comparison; no inner-loop adaptation
Metric-based	Relation Network [50]	$O(C \cdot Q \cdot |\theta_{\mathrm{rel}}|)$	Learned relation module replaces fixed distance, applied to each query–class pair

Notation: $N$: number of support samples per episode; $M$: memory size; $L$: sequence length; $D$: feature dimension; $C$: number of classes per episode; $Q$: number of query samples; $|\theta|$: number of model parameters; $|\theta_{\mathrm{rel}}|$, $|\theta_{\mathrm{head}}|$: parameter counts of the relation module and classifier head, respectively; $T$: number of inner-loop gradient steps.
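
As a worked illustration of the Table 4 expressions, the sketch below plugs an assumed 5-way 5-shot episode into the dominant-cost terms for Matching Network and Prototypical Network. The episode sizes (C = 5, N = 25, Q = 75, D = 64) are hypothetical, and the resulting counts ignore constant factors and the cost of the embedding network, so they indicate relative scale only.

```python
def matching_cost(n, q, d):
    """Dominant term N * Q * D: every query attends to every support sample."""
    return n * q * d

def prototypical_cost(n, c, q, d):
    """Dominant terms N * D + C * Q * D: build prototypes, then compare each query to C prototypes."""
    return n * d + c * q * d

# Assumed episode: 5 classes, 5 shots per class (N = 25), 15 queries per class (Q = 75), 64-D features.
C, N, Q, D = 5, 25, 75, 64
match_ops = matching_cost(N, Q, D)            # 25 * 75 * 64
proto_ops = prototypical_cost(N, C, Q, D)     # 25 * 64 + 5 * 75 * 64
ratio = match_ops / proto_ops
```

Under these assumed sizes the prototype-based comparison needs several times fewer feature-level operations than attending over the full support set, and the gap widens as the number of shots per class grows because N increases while C stays fixed.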

Table 5. Comparison of the three meta-learning paradigms in the context of LULC classification.

Paradigm	Core Mechanism	Strengths	Limitations	Suitability for LULC
Memory-augmented meta-learning	Encode support-set information into external or internal memory modules; retrieve relevant content for query prediction via attention or content-addressable access	Supports incremental knowledge retention; suitable for cross-temporal pattern retrieval	High computational and memory overhead; limited transferability across substantially different domains	Underexplored in LULC; potential for dynamic monitoring and incremental class discovery
Optimization-based meta-learning	Learn a shared parameter initialization that can be rapidly adapted to new tasks through a few gradient steps (bilevel optimization)	Model-agnostic; flexible integration with diverse backbones (CNN, Transformer); broad task applicability	Computationally intensive due to second-order gradients; sensitive to task distribution	Most widely adopted in LULC; applied across optical, hyperspectral, time series, and multimodal data
Metric-based meta-learning	Learn an embedding space in which query-to-support distance determines classification; non-parametric inference at test time	Computationally efficient at inference; no task-specific fine-tuning required	Performance sensitive to embedding quality; largely confined to supervised classification	Well-suited for hyperspectral image classification and operational large-scale mapping

Table 6. Representative studies addressing the four LULC challenges with meta-learning.

LULC Challenge	Meta-Learning Paradigms	Data Modalities	Representative References
Label scarcity (Section 4.2)	Optimization-based; Metric-based	Optical image; Hyperspectral image	Gao et al. (2021) [62]; Amoako et al. (2025) [63]; Li et al. (2024) [64]; Jia et al. (2025) [65]; Swaminathan et al. (2022) [66]
Cross-region and cross-domain shifts (Section 4.3)	Optimization-based; Metric-based	Multispectral (Sentinel-2); Hyperspectral image	Rußwurm et al. (2020) [13]; Rußwurm et al. (2022) [67]; Wang et al. (2020) [68]; Deng et al. (2019) [69]; Xi et al. (2022) [70]; Wang et al. (2025) [71]
Temporal dynamics modeling (Section 4.4)	Optimization-based	Satellite image time series	Park et al. (2023) [16]; Mohammadi et al. (2024) [11]; Wu et al. (2025) [72]; Jiang et al. (2025) [61]
Multimodal data integration (Section 4.5)	Optimization-based; Metric-based	LiDAR and hyperspectral image; Multi-resolution image; Multispectral image	Dai et al. (2024) [73]; Rußwurm et al. (2024) [15]; Zhang et al. (2020) [74]; Qiao et al. (2023) [75]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

He, W.; Li, L.; Wu, H.; Gao, X.; Yang, Y.; Zhang, Z.; Yang, X.; Ge, Y. Meta-Learning in Land Use and Land Cover Classification: Review and Perspective. Remote Sens. 2026, 18, 1879. https://doi.org/10.3390/rs18121879

