Advances in Machine Learning Approaches for UAV-Based Remote Sensing in Data-Deficient Antarctic Environments

Gorry, Brittany; Sandino, Juan; Moghadam, Peyman; Gonzalez, Felipe; Roberts, Jonathan

doi:10.3390/rs18030459

Open Access Review

by Brittany Gorry ^1,2,3,*, Juan Sandino ^1,2, Peyman Moghadam ^2,3, Felipe Gonzalez ^1,2 and Jonathan Roberts ^2

^1 Securing Antarctica’s Environmental Future (SAEF), Queensland University of Technology, 2 George St, Brisbane City, QLD 4000, Australia
^2 QUT Centre For Robotics, Queensland University of Technology (QUT), 2 George St, Brisbane City, QLD 4000, Australia
^3 CSIRO Robotics, Data61, CSIRO, 1 Technology Court, Pullenvale, QLD 4069, Australia
^* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(3), 459; https://doi.org/10.3390/rs18030459

Submission received: 30 October 2025 / Revised: 16 January 2026 / Accepted: 17 January 2026 / Published: 1 February 2026

(This article belongs to the Special Issue Advanced Machine Learning Models for Remote Sensing Applications and Data Analysis—Recent Developments)

Highlights

What are the main findings?

Task-specific methods continue to perform well in small-scale data regimes, but data scarcity means that Antarctic science continues to lag behind broader ML advances;
A paradigm shift in Earth Observation (EO) towards foundation models is underway, but Antarctic scenes remain largely absent from global datasets and benchmarking efforts.

What are the implications of the main findings?

Generalist EO models effectively support multimodal and multi-scale inputs, but remain limited by focusing on urban and agricultural satellite data;
Integrating UAV-based polar data is essential for effective cross-domain adaptation to data-scarce Antarctic environments.

Abstract

Remote sensing plays a vital role in monitoring environmental change in Antarctica, offering non-invasive insights into ice dynamics, biodiversity, and fragile ecosystems. Harsh conditions, limited field access, and logistical challenges result in sparse, noisy, and often unlabelled datasets, posing major obstacles for machine learning (ML) approaches. Data scarcity remains a fundamental challenge for uncrewed aerial vehicle (UAV)-based ecological monitoring. While ML models in other Earth observation domains demonstrate state-of-the-art performance, their applicability in Antarctic and other polar settings is limited. This paper reviews the intersection of ML and UAV-based remote sensing in Antarctica under extreme data constraints. We survey recent strategies designed to overcome these limitations, including self-supervised learning, physics-informed modelling, and foundation models. Results highlight a notable gap, as polar environments remain excluded from global datasets and benchmarks due to the extensive data requirements of large-scale models. Opportunities exist where multimodal and multi-scale generalisation can enhance cross-domain adaptation to data-scarce use cases. Unlike prior reviews on general remote sensing or task-specific polar studies, this work uniquely underscores the need for Antarctic representation in global ML advances, positioning Antarctica as a frontier testbed for machine learning in extreme, inaccessible, and under-resourced fields.

Keywords:

machine learning; deep learning; foundation models; Antarctica; polar; data scarcity; environmental monitoring; Earth observation; drones; conservation

1. Introduction

Antarctic ecosystems are among the most fragile and rapidly changing on Earth. Harsh conditions and climate change impacts make the continent highly susceptible to local and global processes [1]. Unique and isolated terrestrial communities that rely on ice-free areas and meltwater are threatened by changing ice dynamics, while vegetation, including mosses and lichens, plays a critical role in maintaining global ecological balance and biogeochemical cycling [2,3,4]. Monitoring these environments is essential to understand the impacts of climate change, biodiversity loss, and global ecological shifts [3,5].

Remote sensing (RS) provides scalable, non-invasive methods for observing environmental changes, from vegetation and wildlife to ground cover and terrain characteristics [6]. In particular, the use of uncrewed aerial vehicles (UAVs) equipped with RGB, multispectral, and hyperspectral sensors has enabled finer-grained environmental observations than satellite data or traditional in situ fieldwork alone. Figure 1 illustrates the resolution at which UAVs capture Antarctic moss and lichen, which cannot be easily distinguished in satellite imagery and are laborious to survey through in situ measurements. This is particularly important for Antarctic studies where vegetation is susceptible to human impacts [4,7,8,9,10].

Despite technological advances, harsh weather, short operational windows during field seasons, high logistical costs, and strict environmental and safety regulations drastically limit the amount of raw ground-truth data that can be collected using UAVs. As a result, Antarctic remote sensing datasets are often limited in size, noisy, and inconsistently labelled, with many remaining entirely unlabelled [8,11,12]. Further limitations, including insufficient spatial and spectral resolution, atmospheric interference, and sensor-specific discrepancies, reduce the flexibility of current Antarctic datasets, making them difficult to reuse for a wide range of applications [13,14]. In this context, data scarcity refers not only to the limited number of available samples, but also to restricted spatial and spectral coverage, inconsistent temporal frequency, and a high proportion of unlabelled or noisy data. Compared to global Earth observation (EO) datasets that contain millions of annotated samples, Antarctic datasets remain orders of magnitude smaller and less diverse, which constrains the applicability of many machine learning approaches.

Machine learning (ML) has emerged as a powerful tool for processing and interpreting complex RS data. Existing approaches, including conventional statistical methods and supervised learning, rely on labelled data that is often difficult to collect in harsh Antarctic environments. Recent advances in semi-supervised and self-supervised learning methods, such as those commonly used in vision transformer (ViT) architectures, have shown promise in addressing these data annotation limitations [15,16,17]. These methods have also been shown to support various downstream tasks, such as single-class and multi-class classification, semantic segmentation, and change detection.

Despite the reported potential and contribution of ViTs for processing imagery, various limitations have been observed. For instance, many models struggle to generalise across sensors, time periods, or ecological contexts, thus limiting their practical use [13,18,19]. More broadly, self-supervised methods require significant quantities of data, making them difficult to adapt to data-scarce domains such as Antarctic RS [14,18,20]. To overcome these challenges, studies have investigated techniques such as pretraining strategies, transfer learning, and physics-based priors [21,22,23]. Geospatial foundation models have targeted multimodal and multi-scale learning capabilities, aiming to utilise abundant satellite data for EO tasks [19,24,25]. These studies have illustrated enhanced model performance, but lack integration of fine-scale spatial and spectral resolution for data-scarce Antarctic environments.

This review explores the challenges and opportunities of integrating UAVs and ML for RS in Antarctic and sub-Antarctic environments through the lens of data scarcity. As a critical review, our synthesis highlights conceptual and methodological gaps rather than providing an exhaustive systematic survey. We position Antarctica as a representative frontier for developing and testing ML strategies in extreme data regimes and examine how ML approaches can be adapted to work under such constraints. The techniques discussed aim to reduce the dependence on labelled data, improve generalisation across diverse environments and sensors, and extract meaningful insights from existing datasets.

The remainder of the paper is structured as follows: Section 2 outlines the literature search strategy. Section 3 provides an overview of the challenges associated with data deficiency in Antarctic remote sensing, including limited accessibility and sensor constraints. Section 4 reviews the current state of machine learning for remote sensing under data limitations, tracing the progression from traditional task-specific approaches to emerging generalist foundation models. Section 5 provides an overview of publicly available datasets relevant to the aforementioned techniques, ranging from global-scale resources to Antarctic-specific databases. Section 6 discusses emerging themes and trends, highlights opportunities for advancement, and outlines potential challenges for future research directions. Finally, Section 7 summarises the key findings and their applicability to data-deficient Antarctic environments. This paper aims to inform researchers about the methods and techniques that enable data-scarce remote sensing studies in Antarctic and other polar environments. While existing reviews survey ML methods for remote sensing broadly and others examine applications in Antarctic science, none analyse the constraints and opportunities through a data scarcity lens. This review, therefore, presents a unique contribution by bridging global ML advances with Antarctic remote sensing, while explicitly identifying the lack of Antarctic representation in large-scale datasets and foundation model benchmarks. In doing so, we offer both a synthesis of transferable methods for cross-domain adaptation and a potential roadmap for addressing the unique challenges of data scarcity in Antarctic remote sensing.

2. Methodology

This paper is a critical review that positions Antarctica as a frontier for machine learning advancements. Its aim is to synthesise and evaluate emerging ML approaches relevant to data-scarce Antarctic remote sensing, drawing on transferable insights from broader remote sensing and ML research where Antarctic-specific studies remain limited. While the fields of ML, remote sensing, and Antarctic environmental monitoring intersect in places, state-of-the-art methods rarely span all three. We therefore adopt a forward-looking perspective to situate Antarctic remote sensing within global ML developments and to identify emerging opportunities and implications for future research.

2.1. Literature Search Strategy

An iterative search strategy was conducted using Scopus, ScienceDirect, Google Scholar, and IEEE Xplore. Given the rapid advancements in machine learning in recent years, this review primarily considered studies published after 2020, while seminal earlier works were retained where they established foundational concepts or methodological precedents.

Search queries combined relevant terms using Boolean operators and search modifiers. Each iterative search covered a small subset of topics. For instance, to the best of our knowledge, searching for UAV-based studies using foundation models in polar environments does not currently yield relevant results, even though UAVs are widely used in Antarctica and foundation models for remote sensing are a rapidly growing research field. Forward and backward citation tracking captured additional relevant literature. Example search terms included:

(“Antarctic*” OR “Arctic” OR “polar”);
AND (“machine learning” OR “artificial intelligence” OR “AI” OR “deep learning”);
AND (“remote sensing” OR “UAV” OR “uncrewed aerial vehicle” OR “unmanned aerial vehicle” OR “drone” OR “multispectral” OR “hyperspectral”).
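The three term groups above can also be assembled programmatically, which helps keep iterative searches consistent. The following is a minimal sketch only: the function name `build_query` and the generic quoting style are assumptions made for illustration, and each database (Scopus, IEEE Xplore, and so on) applies its own query syntax that may differ from this form.

```python
def build_query(groups):
    """Join terms with OR inside each group and AND between groups."""
    clauses = []
    for terms in groups:
        quoted = " OR ".join(f'"{term}"' for term in terms)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)


# Term groups copied from the example search terms listed above.
REGION = ["Antarctic*", "Arctic", "polar"]
METHOD = ["machine learning", "artificial intelligence", "AI", "deep learning"]
SENSING = [
    "remote sensing", "UAV", "uncrewed aerial vehicle",
    "unmanned aerial vehicle", "drone", "multispectral", "hyperspectral",
]

query = build_query([REGION, METHOD, SENSING])
print(query)
```

Swapping one group for a narrower list (for example, only foundation model terms) reproduces the kind of topic subset that each iterative search covered.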

The initial literature searches returned over 600 records after removing duplicates and filtering from 2020 to 2025. After screening titles and abstracts for relevance, 180 were retained for full-text review. Of these, 98 were included in the final synthesis, supplemented by additional papers identified through back-citation. Inclusion criteria required studies to maintain relevance to data-scarce Antarctic remote sensing with machine learning methods. Where studies were not applied to Antarctic domains or UAV-based RS, they were included for relevance to transferable EO insights. Exclusion criteria encompassed papers focused exclusively on mid-latitude agricultural or urban monitoring without transferable insights, studies without methodological detail, or those relying on large, fully labelled datasets with minimal discussion of data constraints.

Studies were subsequently categorised according to their relevance to data-deficient domains, for instance, through the use of small datasets, appropriate supervision techniques for few-shot or zero-shot learning, or multimodal approaches that could support a combination of small datasets into a larger dataset. For topics with rapid recent progress that have become relatively saturated, such as foundation models, newer studies were prioritised.

Peer-reviewed journal articles were prioritised for this review, with the exception of high-quality conference papers and recent significant contributions from leading research organisations and agencies.

2.2. Data Synthesis and Analysis

To form a thematic synthesis, studies were categorised by ML approach, organised from those using small datasets to those relying on large datasets for transfer learning techniques. Although polar environments were under-represented in global-scale models, results were organised to emphasise techniques that can inform opportunities for future work. The resulting synthesis emphasises methodological pathways for addressing the under-explored intersection of advanced ML and Antarctic data constraints.

The following three sections step through the key problems, challenges, and opportunities for addressing data constraints for Antarctic environmental monitoring, followed by a discussion and conclusion to describe potential future work.

Section 3 examines key factors relating to Antarctic RS, outlining the main problems that give rise to data scarcity.
Section 4 analyses the challenges and opportunities for various ML methods applicable to RS, ranging from simple, task-specific techniques to computationally expensive and complex generalist methods.
- Section 4.1 summarises manual feature engineering and rule-based strategies that work effectively with small Antarctic datasets;
- Section 4.2 covers traditional ML techniques that remain domain-specific to Antarctic environments and datasets;
- Section 4.3 analyses deep learning techniques adapted from natural language processing and computer vision for spectral data, including neural networks in Section 4.3.1 and transformers in Section 4.3.2; these studies use larger datasets and are discussed outside the polar domain;
- Section 4.4 summarises physics and prior-based methods that incorporate physical laws and domain knowledge for ML techniques;
- Section 4.5 discusses foundation models for EO, focusing on global generalist models that aim for transferability and generalisability across domains, sensors, and tasks.
Section 5 surveys existing publicly accessible datasets that may be leveraged for techniques discussed previously.
- Section 5.1 summarises large-scale satellite datasets primarily used for transformer-based foundation models that are intended to support pretraining;
- Section 5.2 introduces accessible databases that provide a practical entry point for browsing EO data across many domains;
- Section 5.3 overviews databases that provide access to Antarctic and polar data, including those hosted by Antarctic research organisations.
Section 6 provides a discussion of these challenges, as well as opportunities for future work.
Section 7 concludes with a summary of key points.

3. Data Collection for Antarctic Remote Sensing

The problem of data scarcity for Antarctic remote sensing stems from various aspects. In this section, we first summarise the strengths and limitations of key platforms available for Antarctic remote sensing, as they often vary according to the observational scales needed for the research question being investigated. We then focus on three primary sensor types: RGB, multispectral imagery (MSI), and hyperspectral imagery (HSI). These are the most commonly deployed on UAV platforms and provide a spectrum of spatial and spectral resolution suitable for vegetation and ecosystem monitoring. We conclude Section 3 by describing the main technical and environmental challenges inherent to Antarctic environments.

3.1. Platforms of Data Collection for Antarctic Remote Sensing

Remote sensing is crucial for monitoring Antarctica’s rapidly changing ecosystems. It enables large-scale, non-invasive observation of vegetation, wildlife, snow cover, and surface conditions in areas that are otherwise inaccessible and sensitive to human disturbance [2,26]. However, data acquisition is severely restricted by harsh weather, unpredictable wind and cloud cover, and logistical challenges of transporting equipment to remote, ice-covered terrain [12,27].

Handheld sensors, UAVs, manned aircraft, and satellites can be applied to conduct RS studies according to the aim of the expedition. As shown in Figure 2, the platforms differ in spatial resolution and survey altitude, and both must be considered when planning a study. These differences in coverage and accessibility are summarised in Table 1.

Ground-based surveys using handheld sensors provide the highest resolution data, but are time-consuming, labour-intensive, and spatially limited [8,28]. Satellite imagery, while offering continent-scale coverage and repeatable temporal data, lacks the resolution required to distinguish small-scale biological features, such as scattered moss beds; acquired pixels often contain mixed types of vegetation, leading to increased misclassification [8,12,29]. Additionally, low light levels near the poles, challenging latitudes, and long revisit intervals significantly limit the amount of satellite data available over Antarctica [30,31].

UAVs provide a balance between resolution and targeted deployment, and are flexible platforms that enable RGB, multispectral, and hyperspectral image acquisition for a range of important studies. Case studies show, for example, that UAVs can facilitate surveys of snow and ice landforms, providing information on snow cover, cryotic features, and ice movement [6,32]. Studies have also been conducted using UAVs to count wildlife and understand the behavioural and physiological responses of various Antarctic animals such as penguins, seals, and whales [6,32,33,34]. Through vegetation classification and mapping, UAVs have been used to identify vulnerable regions, thus informing policy makers on increasing conservation and long-term monitoring efforts [2,7,8].

Understanding the distribution of mosses and lichens is essential for protecting Antarctica’s delicate ecosystems and global health. As key components of terrestrial biodiversity, moss and lichen species provide habitats for microorganisms, act as environmental buffers by storing carbon and meltwater, and provide insight into species variation across the continent, thus indicating the efficacy of environmental protection methods [3]. Despite these far-reaching impacts, UAV operations are still subject to various limitations, as described in Table 2. As such, missions must be planned carefully to maximise the coverage and usefulness of UAV-based data collection during narrow seasonal windows.

3.2. Common Sensor Types for Antarctic Remote Sensing

Just as the RS method must be selected to suit the key objectives of each study, the resolution of spectral sensors must be planned accordingly to balance their respective benefits and trade-offs. RGB cameras, whether mounted on or built into UAVs, can be used for fine-scale mapping of moss beds or for building digital elevation models (DEMs) from ultra-high spatial resolution images [6,8,32]. However, RGB cameras lack spectral resolution and are limited in their use for distinguishing between lichen species and assessing moss health without obvious visual features [2,8]. Both multispectral imaging (MSI) and hyperspectral imaging (HSI) capture information across a broader range of wavelengths beyond visible light. Figure 3 shows this depth, which enables the detection of subtle differences in material composition and biological features based on unique spectral signatures.

MSI is particularly useful for vegetation monitoring using Near-Infrared (NIR) and Red-Edge spectral bands [6,8,32]. These bands provide important indicators of plant health and moisture, often used in the calculation of vegetation indices [2,8]. Although MSI can help detect changes in vegetation health that appear outside the visible spectrum, it lacks the spectral resolution required for finer-scale studies. In comparison with RGB and MSI sensors, HSI sensors capture hundreds of narrow bands for a more continuous spectral profile, as illustrated in Figure 4. This enriched spectral resolution supports more detailed segmentation maps and applications such as precise vegetation classification, water stress assessment, early plant disease detection, and chlorophyll detection [6,8,32,34,35].
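As an illustration of how NIR and red bands feed a vegetation index, the sketch below computes the Normalised Difference Vegetation Index, NDVI = (NIR - Red) / (NIR + Red), pixel by pixel. The function names and the reflectance values are hypothetical and chosen only for demonstration; real workflows would typically operate on calibrated raster arrays rather than nested lists.

```python
def ndvi(nir, red):
    """Return NDVI for one pixel; 0.0 when both reflectances are zero."""
    total = nir + red
    if total == 0:
        return 0.0
    return (nir - red) / total


def ndvi_map(nir_band, red_band):
    """Apply NDVI pixel-wise to two equally shaped 2-D lists."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Hypothetical reflectances: first pixel vegetation-like, second rock-like.
nir = [[0.50, 0.30]]
red = [[0.10, 0.30]]
print(ndvi_map(nir, red))
```

Higher values indicate a stronger contrast between NIR and red reflectance, which is the property that makes such indices useful as plant health indicators.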

High-resolution spectroscopy is useful for differentiating plant types and indicating plant health using subtle spectral differences that are present at the molecular level [37]. However, these spectrometers lack spatial resolution and coverage range, making them poorly suited for large-scale mapping. Furthermore, deployment of both HSI cameras and spectrometers is limited by cost, power consumption, and weight, making logistics an important consideration [2,8].

3.3. Environmental and Technical Challenges for Antarctic Remote Sensing

The most significant challenges for Antarctic remote sensing stem from environmental and operational constraints that persist regardless of the platform or sensor modality. Remote and complex terrain, low temperatures, high winds, and persistent overcast conditions can severely limit access for field crews operating hand-held sensors and constrain UAV deployment [38,39]. Field campaigns are frequently delayed, postponed, or suspended due to rainfall and snowfall, and UAV take-off and landing can be compromised by rapidly changing meteorological conditions [6]. Low temperatures often necessitate thermal insulation of payload components; however, the resulting mass increase reduces flight efficiency and endurance [40]. Icing is consistently reported as a major challenge for UAV operations in polar environments, as ice accretion alters aerodynamics, increases weight, and can rapidly degrade battery capacity [6,40]. Operational planning must also consider ecological impacts, including potential disturbance to wildlife and damage to fragile vegetation and terrain [4,41,42]. Collectively, these constraints limit attainable survey duration and spatial coverage, even when icing-prone regions are avoided and anti-icing coatings are applied.

Despite these limitations, satellite data alone are insufficient for many studies of Antarctic ecology. The spatial resolution of satellite imagery is often inadequate for capturing fine-scale vegetation that is dispersed over wide areas [38,41]. At coarse resolutions, atmospheric scattering and high surface albedo can further degrade image quality, and temporal variability near the sea-ice margin can introduce discontinuities [43,44,45]. Similar spectral signatures of snow and bare ice can also produce low-contrast maps; discriminating between these classes is challenging [43]. More broadly, landscape features, sea-ice characteristics, and wildlife can be difficult to identify reliably using satellite imagery alone. For example, seals may be visually confounded with rocks, and shadows can further complicate wildlife counts in densely clustered groups [42]. Related challenges have been reported in Arctic settings, where polar bears can be difficult to discern against snow in panchromatic imagery [42]. In addition, frequent cloud cover, low solar elevation angles, and strong seasonal variability substantially constrain the volume and consistency of satellite observations over Antarctica [30,43]. Collectively, these factors limit the capacity for detailed and continuous ecological monitoring [46].

As a result, Antarctic remote sensing datasets are often scarce, noisy, and minimally labelled. To better leverage existing ML techniques, studies may benefit from a complementary data strategy that combines satellite imagery with UAV observations and targeted ground-truth sampling [39,46,47]. Because large-scale Antarctic data collection is inherently difficult, the emphasis shifts from expanding acquisition across platforms to using available data more efficiently. Accordingly, recent work investigates ML strategies that perform under limited supervision, integrate heterogeneous sensors and resolutions, and leverage transfer learning for improved generalisation.

4. Machine Learning for Remote Sensing

The problems outlined in Section 3 underscore a key challenge for Antarctic remote sensing studies, which is the scarcity of data and the limited labelled training samples. Outside the Antarctic domain, researchers have investigated a variety of strategies that aim to either maximise the usefulness of limited samples or minimise the reliance on extensive labelling. Throughout this section, we organise commonly reported methods into five categories, as outlined in Table 3, and discuss each overarching strategy regarding its applicability for data- and label-constrained contexts. We describe the challenges faced by each method, as well as opportunities that offer transferable insights for Antarctic data scarcity. It is important to note that direct quantitative comparisons between methods and models are limited by the heterogeneity of datasets, evaluation metrics, and tasks across different studies. We therefore emphasise qualitative trends and contextual limitations, while highlighting representative challenges and strengths notable within the literature.

Each ML method differs in its suitability for data-scarce settings, so it is important to consider the availability of both domain-specific knowledge and large-scale datasets, as illustrated in Figure 5. Typically, methods developed for smaller datasets remain task-specific and often require retraining for new datasets or reconfiguration for new sensors. As methods scale to improve generalisability, they increasingly depend on large volumes of data.

These methods can be adapted to different contexts, with some targeted towards specific downstream tasks such as classification, change detection, or landscape monitoring. While the choice of data must be adapted to the intended downstream task, many modern approaches are designed to improve generalisability and flexibility, extending their use across multiple applications. Larger-scale models, particularly foundation models developed for a wide range of EO tasks, typically rely on satellite data rather than UAV-based samples. However, the learning strategies presented throughout this section provide valuable insights that can be adapted to UAV-based studies.

A key dimension in comparing these approaches is the degree of supervision required during training, which largely dictates the applicability of each method in different data scenarios. Supervised learning depends on labelled data to establish input–output mappings, yet Antarctic studies are often constrained by limited field campaigns and sparse ground-truth labels. Semi-supervised or weakly supervised learning leverages small amounts of labelled samples with larger volumes of unlabelled data, partially mitigating this bottleneck. Self-supervised learning extends further, learning representations from the data itself and enabling the use of vast amounts of unlabelled data. Framing these methods, particularly those applying deep learning and beyond, within the spectrum of training supervision highlights how different strategies navigate the two challenges of data scarcity and label scarcity, which remain central to Antarctic RS studies.
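To make the distinction between supervised and semi-supervised learning concrete, the sketch below runs a single round of self-training (pseudo-labelling) with a nearest-centroid classifier. It is an illustrative toy under stated assumptions, not a method drawn from any of the reviewed studies: the classifier choice, the function names, the `max_distance` confidence threshold, and the two-feature samples are all hypothetical.

```python
def centroid(points):
    """Mean vector of a non-empty list of equal-length feature vectors."""
    return [sum(values) / len(points) for values in zip(*points)]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_centroids(samples, labels):
    """Supervised step: one centroid per class from labelled samples."""
    grouped = {}
    for sample, label in zip(samples, labels):
        grouped.setdefault(label, []).append(sample)
    return {label: centroid(points) for label, points in grouped.items()}


def predict(centroids, sample):
    """Return (label, squared distance) of the nearest class centroid."""
    return min(
        ((label, squared_distance(sample, c)) for label, c in centroids.items()),
        key=lambda pair: pair[1],
    )


def self_train(labelled, labels, unlabelled, max_distance):
    """Semi-supervised step: add confident pseudo-labels, then refit once."""
    centroids = fit_centroids(labelled, labels)
    samples, targets = list(labelled), list(labels)
    for sample in unlabelled:
        label, distance = predict(centroids, sample)
        if distance <= max_distance:  # keep only confident pseudo-labels
            samples.append(sample)
            targets.append(label)
    return fit_centroids(samples, targets)


# Hypothetical two-feature samples: two labelled, three unlabelled.
labelled = [[0.1, 0.1], [0.9, 0.9]]
labels = ["rock", "moss"]
unlabelled = [[0.2, 0.2], [0.8, 0.8], [0.5, 0.5]]
model = self_train(labelled, labels, unlabelled, max_distance=0.05)
print(model)
```

In this toy case the ambiguous sample at [0.5, 0.5] is rejected by the threshold, so only the two confident unlabelled samples refine the class centroids. Self-supervised methods go further by learning representations without any class labels at all.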

4.1. Manual Feature Engineering and Rule-Based Methods

To efficiently utilise small, noisy datasets, many Antarctic RS studies have relied on classical methods, including vegetation indices, manual feature engineering, and statistical classifiers. These approaches are generally simpler and require fewer training samples, making them computationally inexpensive and accessible under the data constraints of polar research. However, their reliance on handcrafted features and task-specific indices limits their robustness and generalisability across different contexts.

Huovinen et al. [48] study snow algae and impurities by using spectral reflectance measurements to calculate spectral indices, including the Normalised Difference Snow Index (NDSI) and Snow Darkening Index (SDI). Features such as the presence of algae and albedo calculations are determined using Spectral Mixture Analysis (SMA), a linear mixture model, and red–green (R/G) band ratios. Huovinen et al. [48] report an overall accuracy of 61% using SMA, with a notable underestimation of snow algae. The R/G band ratio and SDI yield 46% and 57% overall accuracy, respectively. These limitations are attributed to complex and changing feedback processes within the snow, suggesting that more robust classification methods must consider interference on reflectance from dirt and mixed algal communities [48].
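The idea behind a linear mixture model can be sketched for the simplest case of two endmembers, where a pixel spectrum is modelled as f times one endmember plus (1 - f) times the other. The example below is an assumption-laden illustration rather than the workflow of Huovinen et al. [48]: the function name, the three-band spectra, and the "algae" and "clean snow" endmembers are hypothetical values chosen for demonstration.

```python
def unmix_two_endmembers(pixel, endmember_a, endmember_b):
    """Least-squares fraction of endmember_a in a two-endmember linear mix.

    Model: pixel ~= f * endmember_a + (1 - f) * endmember_b, with f in [0, 1].
    """
    diff = [a - b for a, b in zip(endmember_a, endmember_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("endmembers must differ in at least one band")
    numer = sum((p - b) * d for p, b, d in zip(pixel, endmember_b, diff))
    fraction = numer / denom
    # Clip to the physically meaningful range of abundances.
    return min(1.0, max(0.0, fraction))


# Hypothetical three-band reflectance spectra.
clean_snow = [0.90, 0.90, 0.90]
algae = [0.50, 0.30, 0.60]
mixed = [0.25 * a + 0.75 * s for a, s in zip(algae, clean_snow)]
print(unmix_two_endmembers(mixed, algae, clean_snow))
```

Real scenes contain more than two materials, including dirt and mixed algal communities, which is one reason a simple linear model can underestimate a target class.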

Román et al. [34] conduct HSI spectral analyses to characterise penguin colonies using indices such as the Normalised Difference Vegetation Index (NDVI), Triangular Vegetation Index (TVI), and Pixel Purity Index (PPI), as well as Minimum Noise Fraction (MNF) and the Spectral Angle Mapper (SAM). The resulting SAM classification map demonstrates 92% overall accuracy. Furthermore, the authors resample HSI for multispectral analysis, obtaining 89% overall accuracy from the SAM thematic map. However, frequent misclassifications between certain vegetation species occur, where humidity and seawater impact their spectral behaviour. As some absorption features are only accessible in finer detailed spectra, this study shows greater classification accuracy from HSI than MSI.
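For readers unfamiliar with the Spectral Angle Mapper, the sketch below shows its core calculation: each spectrum is treated as a vector, and a pixel is assigned to the reference spectrum with the smallest angle, provided that angle falls under a threshold. The reference spectra, class names, and the `max_angle` value are hypothetical, and this is not the implementation used by Román et al. [34].

```python
import math


def spectral_angle(pixel, reference):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm_p = math.sqrt(sum(p * p for p in pixel))
    norm_r = math.sqrt(sum(r * r for r in reference))
    if norm_p == 0 or norm_r == 0:
        raise ValueError("spectra must be non-zero")
    # Guard against rounding slightly outside the valid acos domain.
    cosine = max(-1.0, min(1.0, dot / (norm_p * norm_r)))
    return math.acos(cosine)


def sam_classify(pixel, references, max_angle=0.1):
    """Return the label with the smallest angle, or None above max_angle."""
    best_label, best_angle = None, max_angle
    for label, reference in references.items():
        angle = spectral_angle(pixel, reference)
        if angle <= best_angle:
            best_label, best_angle = label, angle
    return best_label


# Hypothetical three-band reference spectra.
references = {"moss": [0.05, 0.08, 0.40], "guano": [0.30, 0.35, 0.40]}
brighter_moss = [0.10, 0.16, 0.80]  # same spectral shape, doubled brightness
print(sam_classify(brighter_moss, references))
```

Because the angle depends on spectral shape rather than magnitude, uniformly brighter or darker pixels of the same material map to the same class. Classes with similar spectral shapes, however, yield similar angles, which is consistent with the species-level confusion reported above.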

These works illustrate the development of computational methods for Antarctic monitoring and their importance for advancing conservation efforts. While statistical classifiers and spectral indices offer practical tools for specific Antarctic applications, they are challenged by their reliance on handcrafted features. Consequently, they have limited adaptability under noisy, sparse, or shifting data regimes, meaning new indices and features must be calculated for new datasets and sensors. This underscores the need for more flexible, data-efficient alternatives. Techniques to overcome data scarcity are key to realising the potential for ML in Antarctic RS studies and expanding towards more transferable learning approaches.

4.2. Traditional Machine Learning

Building on the limitations of statistical classifiers and manual feature engineering methods, Antarctic RS studies have adopted traditional ML methods, including Support Vector Machines (SVMs), Random Forests (RFs), and k-Nearest Neighbours (KNNs), as well as ensemble approaches such as eXtreme Gradient Boosting (XGBoost). These algorithms are generally effective for small or medium-sized datasets and are valued for their flexibility compared to statistical classifiers. However, their performance depends heavily on the quality of input features, as they cannot automatically extract complex representations from raw data.

Román et al. [33] investigate four different classification algorithms for mapping guano, moss species, and terrain features in penguin colonies. By comparing SVMs, Spectral Angle (SAC), Maximum Likelihood Classifier (MLC), and RFs, Román et al. [33] show that non-parametric classification algorithms obtain the highest overall accuracy, with SVM achieving 93% accuracy. However, misclassification in moss species is attributed to spectral similarity between vegetation classes [33]. Additionally, SVM is noted to perform well on small datasets, but the generalisability and robustness of the algorithm across diverse or more complex scenarios may be limited.

Sandino et al. [4] developed a workflow for vegetation classification of moss health using XGBoost [49]. Their four supervised ML classifiers integrate feature selection and spectral indices to ensure that the workflow can be systematically applied to diverse vegetation types in polar regions. Moreover, the authors fuse HSI and MSI to obtain georeferenced HSI scans for robust classification maps. These techniques improve classification accuracy and reduce computational demands, achieving accuracy ranging from 95% to 98%. Although the inclusion of several custom spectral indices clearly improves classification performance, risks of multicollinearity are noted, which limit model interpretability. Sandino et al. [4] attribute a degree of their model’s exceptional accuracy to the distinct spectral signatures of classes in the dataset. This highlights the need to expand the dataset to facilitate rigorous testing and enable greater adaptability.

Raniga et al. [2] expand on previous work by comparing XGBoost to a convolutional neural network-based classifier, U-Net [50]. While XGBoost alone exceeds 85% in precision, recall, and F1-score for vegetation classification, Raniga et al. [2] emphasise the challenges of limited data. Unlike Sandino et al. [4], who combine HSI and MSI, Raniga et al. [2] use only MSI in this work; the limited labelled data is insufficient to effectively train the deep learning model. This hinders the ability of U-Net to distinguish complex spectral patterns to discern moss health. Despite this, incorporating XGBoost predictions as inputs to U-Net demonstrates a substantial improvement in precision, recall, and F1-score for classification compared to U-Net alone. These results illustrate the potential for deep learning in Antarctic environments, highlighting the need for techniques to overcome the scarcity of labelled data.
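One simple way to feed a classical classifier's output into a segmentation network is to append its per-pixel class probabilities to the image bands as extra input channels. The sketch below illustrates only that stacking step. The exact encoding used by Raniga et al. [2] is not described here, so the function name, array layout, and values are assumptions.

```python
def stack_prior_channels(image, prior_probs):
    """Append per-pixel class probabilities to per-pixel band values.

    image:       H x W x B nested lists of band values
    prior_probs: H x W x K nested lists of class probabilities
    returns:     H x W x (B + K) nested lists
    """
    if len(image) != len(prior_probs):
        raise ValueError("image and prior must have the same height")
    stacked = []
    for row_pixels, row_priors in zip(image, prior_probs):
        if len(row_pixels) != len(row_priors):
            raise ValueError("image and prior must have the same width")
        stacked.append([list(bands) + list(probs)
                        for bands, probs in zip(row_pixels, row_priors)])
    return stacked


# Hypothetical 2 x 2 scene with 5 multispectral bands and 3 classes.
image = [[[0.1] * 5, [0.2] * 5],
         [[0.3] * 5, [0.4] * 5]]
priors = [[[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
          [[0.3, 0.3, 0.4], [0.05, 0.05, 0.9]]]

fused = stack_prior_channels(image, priors)  # each pixel now has 8 channels
```

With this layout the downstream network starts from the classifier's estimate instead of learning the spectral decision boundaries from scratch, which is one plausible reason such hybrids help when labels are scarce.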

The relatively small data requirements of traditional ML methods make them compatible with small-scale UAV-based Antarctic RS datasets, but these methods are task-specific and offer limited flexibility. They cannot be easily adapted to different contexts, as they require retraining for each new dataset or reconfiguring parameters for each sensor. Raniga et al. [2] illustrate a hybrid approach by combining traditional machine learning and deep learning techniques, and acknowledge the need for more labelled data. Consequently, studies appear to be shifting towards deep learning approaches, which leverage different learning types to address data challenges.

4.3. Deep Learning

Deep learning has shown promising performance in many RS applications and spectral image analysis. However, adoption in Antarctic contexts has been limited as these methods depend on large datasets that are difficult to obtain in polar environments. Despite this gap, reviewing deep learning approaches, even in other domains, helps illustrate the challenge of large labelled dataset dependence that constrains their applicability to Antarctic studies. Furthermore, recent developments in self-supervised learning and large-scale pretraining offer potential opportunities to mitigate the reliance on large labelled samples, making these techniques valuable to review in data-scarce contexts.

In this section, we first review convolutional neural networks (CNNs), which have been widely utilised for hyperspectral and multispectral data. This is followed by an overview of transformer-based methods, which have shown advances for modelling long-range dependencies under limited annotations. Although the studies discussed are not exclusively applied to polar contexts, they outline useful strategies that suggest promising directions for addressing data scarcity and label scarcity through large-scale pretraining.

4.3.1. CNNs for Spectral Data

Following their success in computer vision, CNNs have been tailored for MSI and HSI in RS applications [51,52,53]. As illustrated in Figure 6, their ability to extract local spatial and spectral features makes them well-suited for tasks such as HSI classification, object detection, semantic segmentation, and spectral reconstruction, where they often outperform traditional ML approaches such as SVM, KNNs, and RFs in many benchmark tests [51,54,55].

To accommodate the structure of spectral data, CNN architectures have evolved into 1D, 2D, and 3D variants. One-dimensional CNNs process spectral pixel vectors, while 2D CNNs incorporate spatial–spectral relationships and consider local contextual information. Three-dimensional CNNs apply convolutions across both spatial and spectral dimensions to improve performance [51,53]. More advanced architectural features, including residual blocks, dense connections, and U-Net style multi-scale modules, have also been explored to enhance feature learning, reduce feature contamination, and support network convergence [13,56,57].
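The difference between the three variants can be seen from the axes each kernel slides along. The sketch below computes output shapes for unpadded, stride-1 convolutions and compares the weight count of a 2D and a 3D layer with equal channel counts. The patch size, band count, kernel sizes, and channel counts are hypothetical choices for illustration only.

```python
from math import prod


def valid_output_shape(input_shape, kernel_shape):
    """Output size of an unpadded, stride-1 convolution along each axis."""
    return tuple(n - k + 1 for n, k in zip(input_shape, kernel_shape))


def conv_weights(kernel_shape, in_channels, out_channels):
    """Number of weights plus biases in one convolution layer."""
    return prod(kernel_shape) * in_channels * out_channels + out_channels


H, W, B = 32, 32, 100  # hypothetical patch: 32 x 32 pixels, 100 bands

# 1D: slides along the spectral axis of a single pixel vector.
shape_1d = valid_output_shape((B,), (7,))
# 2D: slides over space; bands are treated as input channels and are mixed
# together in the first layer.
shape_2d = valid_output_shape((H, W), (3, 3))
# 3D: slides over space and along the spectral axis, which is preserved.
shape_3d = valid_output_shape((H, W, B), (3, 3, 7))

# With equal channel counts, adding a spectral kernel extent of 7 multiplies
# the number of kernel weights by 7.
params_2d = conv_weights((3, 3), in_channels=16, out_channels=16)
params_3d = conv_weights((3, 3, 7), in_channels=16, out_channels=16)
```

Under these assumptions the 3D layer keeps a 94-step spectral axis in its output but needs roughly seven times as many weights as the 2D layer, which is consistent with the larger data appetite of 3D CNNs.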

Notable works in this domain include that of Xiong et al. [58], who introduce HSCNN, a CNN-based hyperspectral deep learning framework that improves spectral reconstruction by recovering HSI data from RGB images. While outperforming traditional dictionary-based methods, the model relies on prior knowledge of the spectral response function (SRF), wavelength-dependent dispersion function, and transmission function. This limits general applicability, as the model must be retrained before it can be transferred to a new environment [58,59]. Shi et al. [59] expand on HSCNN through HSCNN+, incorporating residual and dense blocks to improve reconstruction performance when the SRF is unknown. However, these studies suggest that HSCNN models remain task-specific and are not easily transferable to downstream tasks beyond spectral reconstruction.

Recent work has further demonstrated the effectiveness of CNNs for semantic segmentation of hyperspectral data in Antarctic environments. Sandino et al. [9] evaluate U-Net [50], a well-established encoder–decoder CNN that continues to perform strongly and outperform more recent models for detailed vegetation mapping. Applied to UAV-based HSI data collected in East Antarctica, U-Net achieves superior results in classifying moss health and lichen. By contrast, models based on generalised gradient centralised convolution, such as G2C-Conv3D [60], are designed to capture robust spectral–spatial features by incorporating gradient-level information often overlooked by conventional convolutions. In these experiments, G2C-Conv3D produced broader, noisier predictions, lacking the fine-scale segmentation capabilities exhibited by U-Net. Nonetheless, both models demonstrate strong potential for hyperspectral applications in polar regions. Because the experiments were confined to a small subset of East Antarctica, further validation is required to evaluate generalisation across other tasks and more diverse Antarctic locations [9].

While CNN-based models have shown remarkable success, their high parameterisation makes them sensitive to limited labelled data. In data-scarce remote sensing applications, this can lead to overfitting and poor generalisation if regularisation or transfer learning strategies are not employed [51]. CNNs also struggle with high spectral dimensionality, often failing to capture long-range dependencies or subtle variations between spectrally similar classes [54]. Additionally, spectral distortion has been observed in sequential information, as CNNs can be biased towards spatial information [16]. Pixelwise CNN approaches can produce noisy classification maps, while 3D CNNs in particular require large parameter sets [51]. Although graph neural networks (GNNs) can improve relational modelling between samples, they may consequently introduce high computational overheads that are often impractical in resource-limited settings [51].

In summary, CNNs have marked a key shift towards deep learning in spectral RS, demonstrating clear advantages over earlier machine learning methods. However, their bias towards local spatial patterns and limited capacity to model long-range dependencies constrain their performance in complex spectral–spatial relationships. These limitations motivate the adoption of alternative architectures that improve robustness and generalisability. Transformer-based models have emerged as a promising direction, offering greater flexibility for capturing global context instead of local spatial/spectral patterns, and adapting to diverse learning strategies.

4.3.2. Transformers for Spectral Data

Transformers, originally developed for natural language processing tasks, have demonstrated a strong capacity for modelling long-range dependencies [61]. Their adaptation to computer vision via the Vision Transformer (ViT), illustrated in Figure 7, marks a shift away from convolution-based inductive biases toward attention-driven models [62]. These benefits have since extended to spectral data in RS, where transformers offer promising opportunities for overcoming key challenges such as spatial–spectral dependencies, limited annotations, and poor generalisability across sensors.

SpectralFormer [16] is a supervised model that adapts the ViT architecture for spectral image classification, but relies on labelled data. The model introduces group-wise spectral embeddings to extract features from neighbouring bands and propagate them through transformer layers, thereby improving classification accuracy with texture and edge details [16]. The SpectralFormer implementation indicates that the model outperforms conventional classifiers, classic CNNs, and the traditional ViT in HSI classification, validated on three datasets. However, its reliance on annotated datasets limits its applicability in data-scarce environments. SpectralFormer cannot be transferred to unseen domains where data is unlabelled, making it unsuitable for Antarctic environments with sparse ground truth.
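The idea of embedding groups of neighbouring bands, in place of one band at a time, can be sketched as follows. This is an illustrative simplification and not the SpectralFormer implementation: the edge handling (repeating the first and last band), the group size, and the fixed projection weights are all assumptions, and in a trained model the projection would be learned.

```python
def group_neighbouring_bands(spectrum, group_size=3):
    """Return one group of adjacent bands per band, centred on that band.

    Bands beyond either end of the spectrum are filled by repeating the
    nearest edge band (an assumed edge-handling choice).
    """
    if group_size < 1 or group_size % 2 == 0:
        raise ValueError("group_size must be a positive odd number")
    half = group_size // 2
    last = len(spectrum) - 1
    groups = []
    for centre in range(len(spectrum)):
        indices = [min(max(centre + offset, 0), last)
                   for offset in range(-half, half + 1)]
        groups.append([spectrum[i] for i in indices])
    return groups


def embed_groups(groups, weights):
    """Project each band group to a token with a shared linear map.

    weights has one row per token dimension and one column per band in a group.
    """
    return [[sum(w * v for w, v in zip(row, group)) for row in weights]
            for group in groups]


spectrum = [0.1, 0.2, 0.4, 0.3]          # hypothetical 4-band pixel
groups = group_neighbouring_bands(spectrum, group_size=3)
# Hand-picked stand-in weights: local mean and local slope of each group.
weights = [[1 / 3, 1 / 3, 1 / 3],
           [-1.0, 0.0, 1.0]]
tokens = embed_groups(groups, weights)    # one 2-dimensional token per band
```

Each token now summarises local spectral context, such as the slope between adjacent bands, which a single-band embedding cannot express.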

FactoFormer [15], in contrast, addresses this limitation by introducing a factorised self-supervised transformer framework. The model incorporates individual spectral and spatial transformers to reduce computational complexity while better leveraging large volumes of unlabelled HSI datasets. FactoFormer applies Masked Image Modelling (MIM) pretraining with tokenisation performed directly on raw inputs, followed by a fusion stage that integrates outputs from the spectral and spatial branches. This design enables the learning of meaningful representations and fine-grained spatial–spectral correlations, offering a viable strategy under annotation constraints. However, because the spatial and spectral transformers are pretrained independently, FactoFormer struggles to capture higher-order spatial–spectral interactions, motivating later works to explore more unified architectures.

Addressing this limitation, SFMIM [63] proposes a dual-domain masking strategy to jointly model spatial and spectral dependencies within hyperspectral data. The input HSI cube is divided into non-overlapping spatial patches, each containing the full spectral signature of its location. During pretraining, SFMIM applies Masked Image Modelling (MIM) in both the spatial and frequency domains, randomly masking selected spatial patches while removing portions of the spectral frequency components. The model is then trained to reconstruct the missing information, enabling the transformer-based encoder to capture higher-order spectral–spatial correlations. Compared with other masked modelling approaches, SFMIM achieves faster convergence and stronger fine-tuning performance, demonstrating the efficiency of its unified pretraining strategy. However, most of these methods require sensor-specific pretraining and struggle with cross-domain generalisation. This is problematic for data-constrained fields that select different sensors and platforms according to the research at hand.
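The two masking operations can be illustrated with a short standard-library sketch: one function selects spatial patches to hide, and another removes chosen frequency components from a pixel's spectral signature using a discrete Fourier transform. This is a sketch under stated assumptions and not the SFMIM code. The masking ratio, the choice of which frequencies to drop, and all function names are hypothetical.

```python
import cmath
import random


def dft(signal):
    """Discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(coeffs):
    """Inverse discrete Fourier transform."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def mask_spatial_patches(num_patches, ratio, rng):
    """Pick the indices of the spatial patches to hide from the encoder."""
    return sorted(rng.sample(range(num_patches), int(round(num_patches * ratio))))


def mask_spectral_frequencies(spectrum, drop):
    """Zero the listed frequency components of one spectral signature."""
    n = len(spectrum)
    coeffs = dft(spectrum)
    for k in drop:
        coeffs[k % n] = 0
        coeffs[(-k) % n] = 0  # also drop the conjugate partner so the result stays real
    return [value.real for value in idft(coeffs)]


rng = random.Random(0)
hidden_patches = mask_spatial_patches(num_patches=16, ratio=0.5, rng=rng)

spectrum = [0.2, 0.4, 0.3, 0.5, 0.6, 0.4, 0.3, 0.2]  # hypothetical 8-band signature
masked_spectrum = mask_spectral_frequencies(spectrum, drop=[3])
unmasked_spectrum = mask_spectral_frequencies(spectrum, drop=[])
```

A reconstruction objective would then compare the model's output against the original patches and spectra at the masked positions. That training step is omitted here.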

SLTN [64] focuses on spectral super-resolution (SSR) using a multispectral input to reconstruct hyperspectral imagery. The model integrates a multilevel feature extraction module (MFEM) with attention mechanisms guided by the spectral response function (SRF). This guidance improves spectral reconstruction under atmospheric degradation [64]. A spectral transformer block (SPTB) learns the correlation between adjacent spectral bands, while a Swin transformer block (SWTB) extracts spatial features. These blocks, composed in a nonlinear mapping learning module (NMLM), enable SLTN to effectively reconstruct hyperspectral images. Nonetheless, SLTN struggles with capturing long-range or non-adjacent spectral dependencies and requires significant spectral prior information. This limits its adaptability to environments with limited ground truth where the SRF cannot be provided.

HTA-SSRMSI [17] proposes a hybrid transformer architecture for reconstructing high-resolution spectral data from multispectral imagery. By combining intra-row, intra-column, and cross-row–column mechanisms, the model learns both local and global spectral–spatial dependencies. Despite improved classification accuracy, its evaluation is restricted to simulated data using the SRF of the Landsat satellite. This leaves its robustness against real-world sensor noise and artefacts unverified, while highlighting data scarcity as a challenge for model training and testing.

PatchOut [65] introduces a patch-free transformer–CNN hybrid framework for fine-grained land-cover classification on large-scale airborne hyperspectral imagery. The architecture integrates a reduced transformer block (RTB) mechanism with convolutional modules to capture both long-range dependencies and local spatial–spectral detail. This is supported by a multi-scale spatial–spectral feature fusion (MSSSFF) module and a feature reconstruction module (FRM) for enhanced consistency. PatchOut claims high classification accuracy and smoother results compared to other patch-free methods, while greatly improving inference efficiency compared to patch-based methods. However, the model is supervised, and the Qingpu-HSI dataset built to benchmark PatchOut is annotated manually. Although this patch-free framework reduces redundant computations, it may still be constrained by high memory usage, a need for extensive labelled data, and limited exploration of its adaptability to resource-constrained environments.

These transformer-based approaches demonstrate superior performance in MSI and HSI classification and spectral reconstruction compared to traditional classifiers and CNN-based methods. While early CNN-based models established the effectiveness of deep learning for spectral analysis, transformer architectures extend this capability by more effectively capturing long-range spatial–spectral dependencies. The potential of self-supervised learning in these frameworks remains promising for mitigating the scarcity of labelled data. However, their effectiveness is still constrained in domains where data collection is logistically difficult. Although transformer architectures offer strong generalisation in data-rich contexts, their high computational cost and propensity to overfit on small datasets make them challenging to adapt to data-scarce Antarctic contexts. For this reason, they are rarely compatible with UAV-based Antarctic datasets and often originate from satellite or general computer vision domains. Their challenges are further exacerbated by the spectral characteristics inherent to Antarctic environments, such as spectrally monotonous backgrounds where snow and ice dominate, as well as low-contrast conditions such as blizzards and cloud shadows. Extreme weather and seasonal variations that differ from the training data make it challenging for transformer models to generalise to new, unseen environments [66]. High computational costs and concerns about model explainability further limit adaptability, as ecologists in smaller-scale research groups may require solutions with smaller compute requirements or prefer interpretable results [66]. To address this, recent studies have explored integrating domain knowledge and physics-based priors to reduce dependence on large labelled datasets.

4.4. Physics-Based and Prior-Based Methods

While transformers inherently rely on large datasets to overcome label scarcity, another complementary strategy leverages domain-specific priors to guide model learning. Physics-informed machine learning applies physical laws, mathematical models, or observational data to enhance model performance and interpretability [67,68,69]. Enforcing physics- and prior-based constraints has proven useful for yielding physically consistent, data-efficient models compared to data-driven methods [68,70,71]. Such approaches are compatible with Antarctic UAV-based RS, where acquiring large labelled datasets is challenging, but observational data may be collated from weather stations, UAV sensors, and satellite imagery. In unique polar environments, useful physical laws and priors may include the spectral response of ice and moss, snow mass balance, and atmospheric weather models [72,73]. These present opportunities for guiding model training or reducing data dependence when collecting sufficient raw samples is challenging.

The integration of physics priors into ML frameworks has been shown to improve model performance and generalisation [74]. Typical models can integrate physics priors through (1) observational bias, using multimodal data to reflect underlying physical principles; (2) learning bias, enforcing prior knowledge to guide the loss function; or (3) inductive bias, incorporating exact conservation laws through Hamiltonian or Lagrangian mechanics [74]. Characteristics such as light interactions, geometric transformations, and spectral correlations are useful for guiding the learning process towards physically consistent solutions, and reducing computational costs for high-dimensional problems such as HSI [71,74]. While studies have illustrated the use of noise diffusion and partial differential equations (PDEs) in ViTs, the spectral physics of MSI and HSI have not yet been fully adapted to their architectures.
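As an illustration of the learning-bias route, a physical constraint can be added to the training loss as a weighted penalty term. The sketch below uses a deliberately simple prior, namely that surface reflectance must lie between 0 and 1. The constraint, the penalty weight, and the function names are hypothetical and are not drawn from the cited studies.

```python
def mse(pred, target):
    """Mean squared error between predictions and labels."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def range_violation(pred, low=0.0, high=1.0):
    """Mean squared distance by which predictions leave the physical range."""
    return sum(max(low - p, 0.0) ** 2 + max(p - high, 0.0) ** 2 for p in pred) / len(pred)


def physics_guided_loss(pred, target, weight=10.0):
    """Data-fit term plus a weighted penalty for physically impossible outputs."""
    return mse(pred, target) + weight * range_violation(pred)


target = [0.2, 0.5, 1.0]            # hypothetical reflectance labels
feasible = [0.25, 0.45, 0.95]       # stays inside [0, 1]
infeasible = [0.2, 0.5, 1.2]        # third value exceeds 1.0

loss_feasible = physics_guided_loss(feasible, target)
loss_infeasible = physics_guided_loss(infeasible, target)
```

The penalty is zero for physically plausible predictions, so it only steers the model when the data-fit term alone would allow an impossible output. The penalty term needs no labels, so it can also be evaluated on unlabelled samples.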

Li et al. [75] propose a diffusion-enhanced masked autoencoder (DEMAE), which aims to incorporate diffusion enhancement into the masked autoencoder framework for HSI classification. Aimed at heuristically learning robust features and preventing overfitting, DEMAE utilises samples with differentiated noise during pretraining and samples with low noise during fine-tuning. The signal-to-noise ratio (SNR) is integrated into the conditional transformer, using prior knowledge of the sample quality to learn discriminative features. Through self-supervised learning, DEMAE achieves improved HSI classification with scarce labelled data. While DEMAE is an early attempt at cross-domain HSI classification, it is limited to similar scenarios and lacks applicability to other modalities.

Zhao et al. [20] present PINNsFormer, which applies a transformer architecture to physics-informed neural networks (PINNs) for solving partial differential equations (PDEs). Their study builds on a multilayer perceptron (MLP) framework and utilises multi-head attention to capture temporal dependencies. By incorporating PDEs into the loss function and replacing point-wise inputs and losses with sequential ones, PINNsFormer achieves improved generalisation and accuracy. These results are supported by the inclusion of a wavelet activation function, which anticipates Fourier decomposition. However, compared to MLP-based PINNs, PINNsFormer incurs additional computational overhead due to the transformer architecture [20].
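The general idea of placing a differential equation in the loss and evaluating it over a short sequence of time steps can be sketched with a one-dimensional stand-in. The example below penalises the residual of the decay equation du/dt + rate * u = 0 using central finite differences. It is not the PINNsFormer architecture, which uses a transformer and automatic differentiation. The equation, step size, sequence length, and function names are assumptions chosen to keep the example self-contained.

```python
import math


def pseudo_sequence(t, steps, dt):
    """Expand a single time point into a short sequence of later time points."""
    return [t + i * dt for i in range(steps)]


def decay_residuals(u, times, rate):
    """Residual of du/dt + rate * u = 0 at the interior points of the sequence."""
    residuals = []
    for i in range(1, len(times) - 1):
        dudt = (u(times[i + 1]) - u(times[i - 1])) / (times[i + 1] - times[i - 1])
        residuals.append(dudt + rate * u(times[i]))
    return residuals


def sequential_physics_loss(u, t, steps=5, dt=0.01, rate=1.0):
    """Mean squared residual over the whole sequence, not a single point."""
    residuals = decay_residuals(u, pseudo_sequence(t, steps, dt), rate)
    return sum(r * r for r in residuals) / len(residuals)


def exact_solution(t):
    return math.exp(-t)        # satisfies the equation for rate = 1


def poor_candidate(t):
    return 1.0 - 0.5 * t       # does not satisfy the equation


loss_exact = sequential_physics_loss(exact_solution, t=0.3)
loss_poor = sequential_physics_loss(poor_candidate, t=0.3)
```

The loss is close to zero for the true solution and much larger for the poor candidate, without any labelled observations being involved.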

Varshney et al. [76] apply physics-defined labels to estimate snow layer thickness, aiming to overcome the challenges of noisy and unlabelled radar data. In particular, ImageNet [77] initialisation proves insufficient for the task, necessitating a more adaptable framework. To address this, a physics-based atmospheric model is used to simulate ice layers. These physics-defined labels are combined with manual annotations, showing that pretraining a CNN architecture using physics simulations improves model accuracy and generalisability. Varshney et al. [76] note that this method enhances the reliability of black-box neural networks for ice layer tracking tasks.

Wang et al. [23] integrate physics- and prior-based techniques to mitigate the lack of reliable training data needed for deep learning sea ice classification tasks. Physical knowledge of the incidence angle of Sentinel-1 SAR is used to implement a physics-aware Gaussian Mixture Model (GMM) to derive training segments. This is followed by physics-based data augmentation to enrich the dataset and facilitate an even distribution across the training classes. The augmented dataset is then applied to a reduced U-Net architecture for pixel-wise sea ice classification. Although this model successfully delineates sea ice and water edges, Wang et al. [23] note the need for more comprehensive scenarios to improve inference.

Overall, these studies demonstrate the use of physics-based techniques to constrain deep learning models and guide them toward learning more discriminative features. The use of physics-based augmentation presents unique opportunities to enrich training datasets, while physics-based labels can simplify dependence on manually labelled ground-truth datasets. This allows physics- and prior-based methods to utilise UAV-based data as a complementary addition to the overall model. However, the examples shown are task-specific and were not tested on other sensor configurations. Opportunities to use physics-based ML approaches to generalise across unseen scenarios remain under-explored in Antarctic environments. Outside Antarctic domains, a growing shift toward large-scale, generalist models is evident. The persistent challenges of data scarcity in edge cases with limited samples have prompted increasing interest in foundation models and transfer learning. These methods seek to leverage large-scale pretrained knowledge for improved adaptability across environments and downstream tasks. Utilising zero-shot and few-shot learning in remote sensing allows for opportunities to reduce dependence on labelled datasets [78].

4.5. Foundation Models for Earth Observation

Foundation models represent a recent shift in ML towards highly adaptable and versatile models, characterised by pretraining on large-scale, diverse datasets and generalisability across multiple downstream tasks, as illustrated in Figure 8 [19,21,79]. Transfer learning extends applicability by fine-tuning pretrained models on new, often smaller or more specialised datasets, enabling adaptation to data-scarce domains [78]. These approaches are particularly relevant for Antarctic datasets, where ground-truth labels are difficult to obtain. Although the foundation models reviewed below are trained on satellite data and do not explicitly demonstrate compatibility with UAV-based remote sensing, we include them for their potentially transferable insights. Because models are evaluated on different benchmark datasets and under varying experimental configurations, direct comparisons across all foundation models are not always meaningful. Where possible, we report performance in settings where newer models are evaluated against established baselines, and we interpret these results from an adaptation and transferability perspective. By leveraging representations learned from broader EO datasets, foundation models offer opportunities to improve multimodal and multi-scale generalisation in polar environments.

SpectralGPT [18] introduces a foundational model architecture inspired by generative pretrained transformer (GPT) and masked autoencoder (MAE) architectures. It utilises 3D masking and tokenisation strategies, as well as sequential pretraining, to promote cross-resolution generalisation. Progressively feeding different types of pretraining input data enhances the diversity of the model, facilitating better flexibility. By dividing the input images into fixed-sized tokens, SpectralGPT is able to analyse data of varying input sizes, including those with different sensors at different spatial and spectral resolutions [18]. However, this has only been shown for satellite data, meaning the compatibility to integrate UAV-based data into SpectralGPT is uncertain. SpectralGPT demonstrates strong performance on classification, segmentation, and change detection tasks, illustrating its adaptability to different downstream tasks. However, it requires extensive pretraining and tends to overfit in change detection scenarios, indicating that generalisability across sensors and tasks remains an open challenge. SpectralGPT demonstrates the impacts of increased training data through several experiments. For instance, when testing single-label classification on the EuroSAT [80] dataset, it achieved 99.21% accuracy when pretrained on both fMoW [81] and BigEarthNet [82] datasets, compared to 99.15% when pretrained on fMoW alone. Moreover, ablation studies focusing on three ViT-based scales show that the Huge scale model outperformed the Large and Base models. Such results clearly indicate that the performance of SpectralGPT scales with both the number of parameters and the size of the pretraining dataset. Although SpectralGPT exhibits a data-intensive nature similar to conventional Vision Transformers (ViTs), its capacity to operate effectively across diverse datasets demonstrates unique opportunities for applying machine learning in noisy and heterogeneous environments.
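The effect of fixed-size 3D tokens on inputs of different sizes can be illustrated by counting tokens and masking a random subset of them. The token shape, image sizes, and masking ratio below are hypothetical and are not the settings reported for SpectralGPT [18].

```python
import random


def token_grid(shape, token_shape):
    """Number of tokens along height, width, and the spectral axis."""
    if any(n % t for n, t in zip(shape, token_shape)):
        raise ValueError("each dimension must be divisible by the token size")
    return tuple(n // t for n, t in zip(shape, token_shape))


def random_3d_mask(grid, ratio, seed=0):
    """Choose which spatial-spectral tokens are hidden during pretraining."""
    total = grid[0] * grid[1] * grid[2]
    rng = random.Random(seed)
    return set(rng.sample(range(total), int(total * ratio))), total


token_shape = (8, 8, 3)                               # 8 x 8 pixels by 3 bands
small_grid = token_grid((96, 96, 12), token_shape)    # hypothetical 12-band tile
large_grid = token_grid((128, 128, 12), token_shape)  # larger tile, same token size

masked, total = random_3d_mask(small_grid, ratio=0.75)
visible = total - len(masked)
```

Only the sequence length changes between the two inputs, which is why a transformer that operates on token sequences can accept both without architectural changes.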

Dynamic One-For-All (DOFA) [19] takes inspiration from two concepts in neuroplasticity: axon sprouting and synapse remodelling. Building on a shared ViT backbone with masked image modelling, DOFA adaptively alters network weights to dynamically process multimodal EO data. The wavelength-conditioned dynamic patch embedding and wavelength-conditioned dynamic decoder are key, as they enable DOFA to process input data without being constrained to a fixed number of spectral bands. DOFA exhibits robust performance across downstream tasks, including image-level classification and pixel-level segmentation. It consistently outperforms single-modality models, and its ability to adapt to unseen sensors illustrates its strength in leveraging multiple data modalities without pretraining specialised models for different sensors or tasks. However, its applicability remains limited to image-based modalities, with future work aimed at incorporating LiDAR point clouds, textual data, and time-series information to advance Earth system modelling. DOFA also shows that model size correlates with improved accuracy, as the ViT-Large backbone consistently outperforms the ViT-Base backbone, achieving 93.8% compared to 92.2% overall accuracy for classification on the EuroSAT [80] dataset. Challenges with domain adaptation persist, as DOFA achieves a comparatively lower accuracy of 64.4% on BigEarthNet [82]. Further refinement of pretraining strategies may support more robust performance across diverse datasets.
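The principle behind a wavelength-conditioned embedding can be sketched by generating each band's embedding weights from its central wavelength, so that the same code handles sensors with different band counts. The example is loosely inspired by the description above and is not DOFA's implementation: the sinusoidal wavelength features, the fixed `generator` matrix (a stand-in for a learned weight-generating network), and the wavelengths are all assumptions.

```python
import math


def wavelength_features(wavelength_um, num_freqs=2):
    """Sinusoidal features of a band's central wavelength in micrometres."""
    feats = []
    for i in range(1, num_freqs + 1):
        feats.append(math.sin(i * wavelength_um))
        feats.append(math.cos(i * wavelength_um))
    return feats


def generate_band_weights(wavelength_um, generator):
    """Map wavelength features to one embedding weight vector for that band."""
    feats = wavelength_features(wavelength_um, num_freqs=len(generator[0]) // 2)
    return [sum(g * f for g, f in zip(row, feats)) for row in generator]


def embed_pixel(values, wavelengths_um, generator):
    """Sum the band values, each weighted by its wavelength-specific vector."""
    token = [0.0] * len(generator)
    for value, wavelength in zip(values, wavelengths_um):
        weights = generate_band_weights(wavelength, generator)
        token = [t + value * w for t, w in zip(token, weights)]
    return token


# Stand-in for learned generator parameters: 3 output dimensions, 4 features.
generator = [[0.5, -0.2, 0.1, 0.3],
             [-0.4, 0.6, 0.2, -0.1],
             [0.3, 0.3, -0.5, 0.2]]

rgb_token = embed_pixel([0.3, 0.4, 0.2], [0.665, 0.560, 0.490], generator)
msi_token = embed_pixel([0.3, 0.4, 0.2, 0.6, 0.5],
                        [0.665, 0.560, 0.490, 0.842, 0.705], generator)
```

Both the three-band and the five-band pixel map to a token of the same length, so later layers do not need to know how many bands the sensor provided.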

HyperSIGMA [79] aims to unify HSI interpretation across different scenes and tasks with a scalable ViT-based model exceeding one billion parameters. It combines spectral and spatial features using a spectral enhancement module and Sparse Sampling Attention (SSA), mitigating redundancy in HSI data. Using MAE for pretraining on HyperGlobal-450K, HyperSIGMA performs well for both high-level classification and detection tasks and low-level denoising tasks. However, HSI data in HyperGlobal-450K are collected by different sensors, and HyperSIGMA cannot batch-process images with varying numbers of channels. This limitation is addressed by maintaining a consistent number of selected channels. Nonetheless, HyperSIGMA demonstrates impressive scalability. When testing HSI classification across six different datasets, HyperSIGMA consistently outperforms other methods, achieving 85.54% overall accuracy on the widely used Indian Pines dataset, compared to the 50.05% accuracy of the previously discussed SpectralFormer [16]. HyperSIGMA also demonstrates promising results across anomaly detection and change detection tests, offering opportunities for adaptation to various downstream tasks.

AnySat [22] addresses the rigidity of earlier models that could not adapt to diverse datasets. AnySat concentrates on adapting to new sensor configurations and heterogeneous datasets with different resolutions, scales, and modalities. It introduces a multimodal framework based on Joint Embedding Predictive Architecture (JEPA) with scale-adaptive spatial encoders, and is pretrained on GeoPlex, an aggregation of five datasets comprising 11 distinct sensors across five continents, excluding Antarctica. As such, AnySat avoids modality-specific decoders and supports adaptability to unseen sensors, provided they are not too different from those in the training data. For instance, Astruc et al. [22] apply a padding strategy that successfully manages missing bands between Landsat and Sentinel data, but this is not demonstrated for more complicated differences, such as band discrepancies between multispectral and hyperspectral imagery. Although this slightly limits flexibility, AnySat shows strong generalisation and impressive performance for self-supervised learning on heterogeneous datasets. When tested across classification, segmentation, change detection, and regression tasks, AnySat consistently achieved high accuracy. Self-supervised learning on the diverse dataset is supported by fine-tuning or linear probing on edge cases where data is less abundant, such as burn scars and flood event datasets. With linear probing, AnySat achieved 91.1% mIoU, with DOFA [19] lagging at 89.4% mIoU. AnySat presents useful opportunities for fine-tuning and adapting to domains with limited data availability.

Galileo [21] learns multimodal and multi-scale representations for EO tasks. The model integrates a variety of modalities, including multispectral, SAR, topography, weather, and population data. These data types, which are relevant to common RS tasks, are categorised into space–time varying, space varying, time varying, and static inputs. A key innovation lies in its use of local and global features by pretraining on shallow and deep targets. Local and global masking also enable the model to capture both fine-scale and large-scale features. This design supports applications ranging from object-level tasks, such as boat detection, to landscape-level analysis, such as glacier monitoring. Tseng et al. [21] position Galileo as a generalist model that outperforms specialist models in downstream tasks, including image classification, semantic segmentation, and pixel time-series classification. Galileo’s flexibility is further highlighted through its multiple model sizes, which can be adapted through either computationally expensive fine-tuning or lightweight linear probing. This extends the model’s applicability to compute-constrained environments, as well as those in which few labels are available. Providing different model sizes, enabling fine-tuning strategies with different computational costs, and pretraining on publicly available data are three strategies Tseng et al. [21] have adopted to facilitate the use of Galileo in a range of scenarios. However, results across these backbone sizes also show that larger models perform better than smaller ones. For EuroSAT image classification by kNN, Galileo’s ViT-Base model achieves 93.0% accuracy, outperforming the ViT-Tiny alternative by 2.9 percentage points and AnySat [22] by 10.8 percentage points. The inclusion of additional modalities appears to improve Galileo’s benchmark performance relative to AnySat, suggesting greater flexibility for remote sensing applications.
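Evaluation by kNN leaves the pretrained encoder frozen and classifies each test embedding by a vote among its nearest labelled embeddings, so no gradient updates are needed. A minimal sketch of that protocol follows. The two-dimensional embeddings and class names are hypothetical placeholders for encoder outputs, and the value of k is an arbitrary choice.

```python
import math
from collections import Counter


def knn_predict(query, train_embeddings, train_labels, k=3):
    """Label a query embedding by majority vote among its k nearest neighbours."""
    ranked = sorted(range(len(train_embeddings)),
                    key=lambda i: math.dist(query, train_embeddings[i]))
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def knn_accuracy(test_embeddings, test_labels, train_embeddings, train_labels, k=3):
    """Fraction of test embeddings whose predicted label matches the true label."""
    correct = sum(
        knn_predict(query, train_embeddings, train_labels, k) == label
        for query, label in zip(test_embeddings, test_labels)
    )
    return correct / len(test_labels)


# Hypothetical frozen-encoder embeddings for a handful of labelled tiles.
train_embeddings = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1],
                    [1.0, 0.9], [0.9, 1.0], [1.1, 1.0]]
train_labels = ["snow", "snow", "snow", "moss", "moss", "moss"]
test_embeddings = [[0.1, 0.1], [1.0, 1.0], [0.3, 0.2]]
test_labels = ["snow", "moss", "snow"]

score = knn_accuracy(test_embeddings, test_labels, train_embeddings, train_labels)
```

Because only distances between embeddings are computed, this kind of evaluation is cheap enough for compute-constrained groups and needs only a few labelled samples per class.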

TerraMind [24] focuses on multimodality by applying dual-scale early fusion with token-level and pixel-level data. Positioned as a large-scale any-to-any generative model, TerraMind introduces a unique Thinking in Modalities (TiM) ability to generate missing modalities using compact tokens. During pretraining on the TerraMesh dataset, the model learns correlations between satellite images and other modalities. This cross-modal learning enhances fine-tuning, where modalities such as land use maps, DEM, and vegetation indices can improve prediction performance on downstream tasks. This is particularly beneficial for applications where complementary information can help fill data gaps. Averaging over nine datasets and tests, TerraMind outperforms task-specific, unimodal, and multimodal U-Net baselines, as well as DOFA [19] and SpectralGPT [18], with TerraMindv1-L (large backbone) besting TerraMindv1-B (base model). Although TerraMind maintains high mIoU across most tests, the U-Net task-specific model achieved top results for burn scar and flood scenarios. These results demonstrate the advantages of TiM in enriching the dataset, but challenges remain regarding cross-domain adaptability for contexts where data is less abundant. TerraMind is well-equipped for reducing the reliance on extensive training datasets using zero-shot and few-shot learning. The model generalises well across agricultural and water-body mapping tasks, but extensions into multi-temporal, multi-resolution, and hyperspectral data have yet to be explored.
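Several of the comparisons in this section are reported as mean Intersection over Union (mIoU). As a reminder of what the metric measures, the following is a minimal sketch of a common definition: per-class IoU averaged over the classes that occur in either the labels or the predictions. The benchmarks cited here may differ in how they handle absent classes or ignored pixels, so this is an illustration rather than the protocol used in [24].

```python
from typing import Sequence


def mean_iou(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> float:
    """Mean of per-class IoU over classes present in labels or predictions."""
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union > 0:  # skip classes absent from both labels and predictions
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Flattened toy segmentation: class 0 IoU = 1/2, class 1 IoU = 2/3.
labels = [0, 0, 1, 1]
predictions = [0, 1, 1, 1]
print(mean_iou(labels, predictions, num_classes=2))
```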

AlphaEarth Foundations (AEF) [25] advances the concept of task-agnostic EO featurisation, aiming to create a universal feature space for sparse-data domains. It integrates multiple sources and modalities with explicit temporal encoding through the Space Time Precision (STP) encoder. AEF enables accurate extrapolation of annotations and field measurements, performing well in all classification evaluations. Through robust and efficient featurisation strategies, AEF highlights the importance of temporal dynamics and universality, showing promising results for diverse tasks without retraining. However, the reconstruction component of AEF introduces a bottleneck that limits efficiency, and its exclusion of surface categories such as ice, snow, and moss leaves its transferability to Antarctic contexts yet to be established.

The shift towards foundation models has accelerated, with most frameworks emerging within the past two years. Recent advances in Earth observation (EO) foundation models demonstrate a wide range of strategies, from multimodal and multi-scale learning to unified pretraining across vast and heterogeneous datasets. While such models benefit from the abundance of satellite imagery and open datasets, developing a foundation model specifically tailored to polar regions using UAV data remains challenging, as the limited data available are unlikely to reach the scale required for robust foundation models. The billions of parameters involved in pretraining foundation models also make them computationally expensive, which can constrain their usability in small-scale ecological studies. This motivates the exploration of more efficient corruption strategies for self-supervised network pretraining, such as the continuous masking approach introduced in ROPIM [83]. Unlike binary masking, ROPIM uses count sketching as a continuous projection to corrupt features in embedding space without sparse token handling or heavy decoder modules, pointing to a scalable direction for future pretraining algorithms, especially when prior- and physics-informed structures are beneficial.
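The difference between binary masking and a continuous projection can be illustrated with a generic count sketch. The code below is a sketch of the standard count-sketch operator only, and is not the ROPIM implementation: it assumes each input coordinate is hashed to one of `m` buckets with a random sign, and that a corrupted feature is obtained by sketching and then back-projecting. All function names and the toy dimensions are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def make_count_sketch(d: int, m: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Draw a random bucket in [0, m) and a random sign for each of d coordinates."""
    rng = random.Random(seed)
    buckets = [rng.randrange(m) for _ in range(d)]
    signs = [rng.choice((-1, 1)) for _ in range(d)]
    return buckets, signs


def sketch(x: Sequence[float], buckets: Sequence[int], signs: Sequence[int], m: int) -> List[float]:
    """Project x (length d) to length m by summing signed entries per bucket."""
    y = [0.0] * m
    for xi, b, s in zip(x, buckets, signs):
        y[b] += s * xi
    return y


def unsketch(y: Sequence[float], buckets: Sequence[int], signs: Sequence[int]) -> List[float]:
    """Back-project a sketch to length d by reading each coordinate's bucket."""
    return [s * y[b] for b, s in zip(buckets, signs)]


d, m = 8, 4
buckets, signs = make_count_sketch(d, m, seed=42)
feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

# Continuous corruption: every coordinate becomes a signed mixture of the
# coordinates that share its bucket, rather than being kept or zeroed.
corrupted = unsketch(sketch(feature, buckets, signs, m), buckets, signs)
print(corrupted)
```

In contrast to a binary mask, no coordinate is discarded outright. Information is mixed within buckets, and the compression ratio `m / d` controls how strongly the feature is corrupted.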

In addition, fine-tuning techniques, as well as zero-shot and few-shot learning, present promising opportunities for overcoming data scarcity [78]. This has been shown through examples that generalise to burn scar and flood use cases, which have less abundant samples than ordinary agricultural and urban land use classification tasks. Models can also support diverse downstream tasks without task-specific retraining and can combine several data types to enrich the training set [19,21,22,24]. Each foundation model discussed shows that performance and generalisability scale with the size, diversity, and quality of the pretraining dataset [82,84,85]. Additionally, their compatibility with UAV-based data and Antarctic scenes has not yet been shown. Future work may extend existing EO foundation models by exploring cross-domain adaptability to unseen scenarios.

5. Remote Sensing Datasets for Large-Scale Pretraining and Antarctic Fine-Tuning

Effective model development in RS relies not only on the algorithms and learning techniques outlined in Section 4, but also on the availability and quality of data [85]. Addressing data deficiency in any context requires leveraging available datasets to suit the purposes of the study. As outlined in Section 3, collecting large, well-annotated datasets for Antarctic RS is often impractical due to logistical constraints and environmental conditions. Publicly available datasets, especially those collected from satellites with vast coverage, provide extensive spectral imagery that can support model training and evaluation [84]. Leveraging these datasets may allow models to learn general representations before being adapted to specific domains and applications [85]. In this section, we survey key resources being leveraged in large-scale model training in Section 5.1 and Section 5.2, as well as the availability of Antarctic domain data in Section 5.3, thereby complementing the strategies discussed throughout Section 4.

5.1. Large-Scale EO Datasets

Large-scale public datasets have become a powerful driver for the development of EO deep learning techniques and RS foundation models [79,84,86]. Although the abundance of available satellite data enables pretraining at unprecedented scales, the effectiveness of such datasets depends not only on size. Factors such as modality, geographic distribution, and annotations are important for mitigating bias, supporting specific tasks, or encouraging generalisability [81,85]. The datasets presented in Section 5.1 have been summarised in Table 4.

The Functional Map of the World (fMoW) [81] is a comprehensive dataset that contains over 1 million satellite images collected from the DigitalGlobe constellation. It contains RGB, pansharpened RGB, and 4-band and 8-band multispectral imagery, spanning over 200 countries across 400 unique UTM zones to provide global coverage. Each image is annotated by GeoHIVE users with at least one bounding box from 63 categories. By framing the dataset for tasks between classification and object detection, fMoW avoids the prohibitive challenge of constructing a geographically diverse detection dataset with exhaustive annotations. This enables models that can be trained and transferred to real-world systems. Furthermore, to encourage global diversity, spatial density is reduced by removing neighbouring locations within a specified distance, preventing over-representation of particular regions. The 63 categories in fMoW emphasise land use scenarios, while temporal data and metadata provide a source of contextual reasoning. This can include distinguishing construction sites from impoverished land or differentiating office buildings from residential units. With these features, fMoW has the potential to support broader contexts and humanitarian efforts, such as disaster relief. Despite its diversity, some classes remain geographically biased, such as structures concentrated in Europe, while no imagery appears to be available from Antarctica.
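The spatial density reduction described above can be pictured as a greedy thinning step. The following is a minimal sketch under stated assumptions, not the fMoW pipeline: it assumes locations are (latitude, longitude) pairs in degrees, that distance is measured along a spherical Earth, and that a location is dropped when it lies within a chosen distance of one already kept. The threshold and function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two points on a spherical Earth."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def thin_locations(points: Sequence[Point], min_km: float) -> List[Point]:
    """Keep a point only if it is at least min_km from every point kept so far."""
    kept: List[Point] = []
    for p in points:
        if all(haversine_km(p, q) >= min_km for q in kept):
            kept.append(p)
    return kept


# The second point is about 1 km from the first and is removed.
candidates = [(0.0, 0.0), (0.0, 0.01), (0.0, 1.0)]
print(thin_locations(candidates, min_km=10.0))
```

The greedy pass is order dependent and quadratic in the number of points, which is acceptable for illustration. A spatial index would be preferable at the scale of a global dataset.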

EuroSAT [80] is a dataset on land use and land cover classification, containing over 27,000 labelled images collected from Sentinel-2 covering 13 spectral bands. With 10 classes, containing 2000 to 3000 patches per class, EuroSAT consists of images spanning 30 European countries recorded throughout multiple seasons for variability. Although geoinformation has been provided for each patch, atmospheric correction is not performed on the dataset. EuroSAT is benchmarked using two deep CNNs, GoogLeNet and ResNet-50, achieving 98.18% and 98.57% classification accuracy, respectively. However, transferability to other tasks and applications has not been shown. The dataset is freely accessible to enable a range of real-world applications but is limited in scope to European climates.

BigEarthNet [82] is a dataset containing approximately 590,000 Sentinel-2 patches annotated with 43 land-cover classes across 10 European countries. It aims to support multi-label scene classification, where single labels are insufficient for describing patches containing multiple land use concepts. Although the dataset contains patches collected over multiple seasons, images covered with snow, cloud, and cloud shadow are excluded; consequently, there are fewer images representing the winter season. The dataset is evaluated using shallow CNNs, showing that training on BigEarthNet yields higher classification accuracy than transfer learning from models trained on ImageNet. This supports the notion that transfer learning from ImageNet alone is insufficient to bridge the domain gap between natural images and RS-focused data.

To further support multimodal systems, BigEarthNet has been extended to BigEarthNet-MM [87]. This updated dataset includes the original Sentinel-2 patches from BigEarthNet, pairing each with Sentinel-1 patches. A new nomenclature is also established, simplifying the classes to 19 and eliminating those that cannot be identified using single-date images, as well as classes that are highly dependent on land use or need additional data to be discriminated. This aims to support RS systems that typically observe land cover, as opposed to land use, where limited spatial resolution may hinder models in recognising land-use based labels. Evaluation of BigEarthNet-MM maintains that training directly on BigEarthNet-MM yields better classification than transfer learning from ImageNet. However, the dataset remains limited in geographical coverage and lacks time-series data. Additionally, some Sentinel-1 patches may be contaminated by artefacts independent of the preprocessing applied.

SSL4EO-S12 [88] is a large-scale dataset that aims to support self-supervised learning for RS model development. The data is obtained from Sentinel-1 and two Sentinel-2 products using Google Earth Engine, where image patches surround the 10,000 most populated cities worldwide to promote global coverage. To extend this distribution, approximately 250,000 close-by locations are sampled using a Gaussian distribution centred on each city with a standard deviation of 50 km. At each location, four seasonal images facilitate year-round variability. The resulting images are filtered for cloud coverage below 10%, while a grid-search strategy is used to prevent overlapping and reduce redundancy. This results in approximately one million patches from each of the three Sentinel products, totalling more than three million images. SSL4EO-S12 is evaluated across four representative self-supervised frameworks (MoCo, DINO, MAE, and data2vec) and on three downstream tasks: scene classification, semantic segmentation, and change detection. Pretraining on SSL4EO-S12 enables models to outperform those pretrained on other datasets, including BigEarthNet and EuroSAT, while improving model transferability due to its wide geographical coverage. Wang et al. [88] argue that SSL4EO-S12 is comparable in scale to ImageNet, acting as a suitable baseline for RS model development. It is more appropriate than training RS models on ImageNet due to its inclusion of medium-resolution radar and MSI, which ImageNet lacks. However, SSL4EO-S12 does not include sensor variability and excludes other EO modalities. Furthermore, cloud filtering and the concentration on densely populated areas may introduce geographical bias. On the other hand, the sampling strategy avoids skewing the image distribution, which would otherwise be dominated by repetitive ocean, desert, and forest scenes. Wang et al. [88] note that SSL4EO-S12 contains little coverage of polar regions.
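The city-centred sampling described above helps explain why sparsely populated regions such as Antarctica receive little coverage. The sketch below is a simplified illustration and not the SSL4EO-S12 code: it assumes north and east offsets are drawn independently from a Gaussian with the stated 50 km standard deviation and converted to degrees using a flat-Earth approximation. The function name, the conversion constant, and the example city coordinates are hypothetical, and the overlap check and cloud filtering are omitted.

```python
import math
import random
from typing import List, Tuple

KM_PER_DEG_LAT = 111.19  # approximate length of one degree of latitude


def sample_near_city(
    city: Tuple[float, float],
    n: int,
    std_km: float = 50.0,
    seed: int = 0,
) -> List[Tuple[float, float]]:
    """Draw n (lat, lon) locations around a city with Gaussian offsets in km.

    Uses a local flat-Earth approximation; longitude wrap-around at the
    antimeridian is not handled in this sketch.
    """
    rng = random.Random(seed)
    lat0, lon0 = city
    km_per_deg_lon = KM_PER_DEG_LAT * math.cos(math.radians(lat0))
    samples = []
    for _ in range(n):
        north_km = rng.gauss(0.0, std_km)
        east_km = rng.gauss(0.0, std_km)
        lat = max(-89.9, min(89.9, lat0 + north_km / KM_PER_DEG_LAT))
        lon = lon0 + east_km / km_per_deg_lon
        samples.append((lat, lon))
    return samples


# Example: 25 candidate locations around one city (coordinates illustrative).
locations = sample_near_city((48.85, 2.35), n=25, std_km=50.0, seed=1)
print(locations[:3])
```

Under a scheme of this kind, a location several hundred kilometres from any large city is many standard deviations away from every sampling centre, so it is very unlikely to be drawn.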

Building on SSL4EO-S12 [88], SSL4EO-L [30] is a large-scale dataset derived from the Landsat satellites that focuses on pretraining foundation models with ResNet and ViT backbones. Stewart et al. [30] follow a similar data collection strategy to Wang et al. [88], but explicitly exclude nodata pixels from sampling on borders and employ a different overlap detection method for SSL4EO-L. This modification removes the 1–3% overlap observed in SSL4EO-S12. Stewart et al. [30] also increase the cloud coverage threshold for SSL4EO-L to 20% due to the coarser resolution of Landsat compared to Sentinel satellites. Landsat’s long history makes SSL4EO-L suitable for monitoring long-term changes and increasing the scope of potential EO applications. However, low light levels near the poles significantly limit Landsat coverage, and the satellites do not capture images at latitudes above 81.8 degrees. SSL4EO-L faces similar challenges to SSL4EO-S12, in which sampling around populated areas leads to limited coverage of polar regions, as well as tropical rainforests, thereby reducing the diversity and scope of the dataset.

HyperGlobal-450K [79] is a dataset targeted towards hyperspectral imagery, comprising approximately 450,000 images sourced from the Earth Observing-1 (EO-1) and Gaofen-5 (GF-5) satellites. EO-1 imagery is screened to exclude cloud coverage above 5%, while GF-5 data is inherently cloud-free. After removing water vapour absorption and noisy bands, EO-1 images contain 175 bands, while GF-5 images provide 150 bands. EO-1 offers global coverage, with data spanning all continents, while GF-5 imagery is primarily limited to Chinese provinces for observations focusing on vegetation. Despite its coverage being mostly concentrated over China, the scale of hyperspectral imagery made available through HyperGlobal-450K makes it a valuable baseline for hyperspectral representation learning. However, EO-1 was decommissioned in 2017, significantly restricting the potential for temporal data and long-term studies.

SpectralEarth [89] is a hyperspectral dataset built from the EnMAP satellite mission, containing over 500,000 image patches from more than 400,000 unique locations worldwide. Each image comprises 202 spectral bands after excluding those dominated by water absorption. The dataset curation ensures global distribution with cloud cover under 10%. Notably, 17.5% of sites include time-series data from 2022 to 2024, offering insight into landscape evolution and seasonal variability. The data is distributed relatively evenly across months, although affected by higher winter cloud cover and task scheduling constraints, as well as temporary satellite outages. SpectralEarth is well-positioned for hyperspectral deep learning, as shown through evaluation on three self-supervised learning algorithms: MoCo-V2, DINO, and MAE. Benchmarking shows promising performance on nine downstream tasks, including a range of classification and segmentation tasks. Additionally, cross-sensor transferability is tested using datasets not seen during pretraining. Evidently, the dataset’s global diversity in land-cover types and geographical distribution, as well as its significant scale, make SpectralEarth suitable for pretraining hyperspectral self-supervised learning models. However, its global distribution map suggests limited coverage of polar regions. Additionally, time-series data are limited and hinder a comprehensive evaluation of long-term landscape dynamics and trends.

EarthView [90] is a large and diverse heterogeneous dataset designed for self-supervised learning on EO tasks. It integrates imaging from multiple sensors, including SAR and multispectral imaging from Sentinel-1 and Sentinel-2; RGB and NIR data with temporal revisits from Satellogic; and 369 bands of high-resolution hyperspectral data with RGB and elevation data from NEON. Some temporality is also included. While Sentinel and Satellogic satellites offer global coverage, NEON is limited to ecological research sites throughout the United States, restricting its spatial diversity despite its rich spectral depth. Sampling is concentrated within 50 km of the largest cities worldwide, which reduces redundancy from repetitive ocean, desert, and forest scenes. Although this radius covers coastal, agricultural, and rural regions, Velazquez et al. [90] do not mention polar regions. Nonetheless, EarthView’s broad combination of modalities and resolutions across sensors makes it a useful resource for deep learning and self-supervised approaches. Velazquez et al. [90] introduce EarthMAE, which is a masked autoencoder with adjusted tokenisers and encoding. However, the authors describe EarthMAE as a model that leaves EarthView’s full potential untapped, as the dataset is a resource positioned to support exploration in self-supervised learning and RS.

The datasets outlined above illustrate the growing range of resources designed to support RS research and ML approaches. While they differ in scale, modality, and intended use case, they collectively highlight the importance of curated, large-scale data for advancing deep learning for EO tasks. These examples also illustrate that no single dataset is universally applicable, as their utility depends on the specific research question, geographic focus, and modality requirements. However, a shift towards large-scale datasets to support foundation model pretraining has been observed, which may alleviate the burden of data scarcity using fine-tuning for domain-specific applications. Transferability across tasks, modalities, and scenes has remained a key theme throughout the aforementioned studies.

5.2. Global Dataset Databases

In addition to these datasets proposed for foundation model pretraining, centralised databases and repositories provide a practical entry point for finding EO data. The IEEE GRSS Earth Observation Database (EOD, https://eod-grss-ieee.com/dataset-search, accessed on 27 October 2025) [91] offers a comprehensive catalogue of RS datasets, including widely used examples such as BigEarthNet and EuroSAT. Google provides two complementary resources for dataset searching. Google Dataset Search (https://datasetsearch.research.google.com/, accessed on 27 October 2025) is a web interface for discovering open datasets across a range of disciplines and modalities. The Earth Engine Data Catalog (https://developers.google.com/earth-engine/datasets/, accessed on 27 October 2025) gives access to an expansive variety of EO data, including climate and weather information, geospatial imagery, and geophysical data. These databases enable researchers to identify datasets suited to their specific use cases and training needs, whether they intend to design a shallow model, train a foundation model, or fine-tune for targeted downstream tasks. Similarly, NASA’s Earthdata Search (https://search.earthdata.nasa.gov/, accessed on 27 October 2025) hosts an abundance of EO datasets that can be searched and filtered by platforms, data formats, horizontal resolution, and other keywords. ESA’s data catalogues (https://earth.esa.int/eogateway/catalog, accessed on 27 October 2025) also provide access to EO data from a number of satellites and imaging instruments, including imaging radars, spectrometers, and other active and passive RS instruments. Data can be filtered and searched by several categories, such as missions and data applications.

5.3. Antarctic Databases

The techniques and large datasets discussed in this review can be leveraged for various specified research tasks. To complement studies in Antarctic and polar science, fine-tuning on domain-specific datasets is an important element of the pipeline. As discussed in Section 3, sourcing high resolution RS datasets in Antarctic environments remains a key challenge. Utilising existing datasets may be key for overcoming data and label scarcity.

The Scientific Committee on Antarctic Research (SCAR, https://scar.org/library-data/data, accessed on 27 October 2025) is an organisation of the International Science Council with a key role in Antarctic Research. SCAR promotes free and unrestricted access to Antarctic data and information, providing a curated list of data and databases. This list includes a range of biodiversity, oceanography, and iceberg databases and portals. As mentioned, NASA’s EarthData search (https://search.earthdata.nasa.gov/, accessed on 27 October 2025) hosts datasets, but also includes curated portals such as those by the Southern Ocean Observing System (SOOS, https://www.soos.aq/data, accessed on 27 October 2025) and the Antarctic Metadata Directory (AMD, https://scar.org/library-data/data/amd, accessed on 27 October 2025), also endorsed by SCAR.

The Australian Antarctic Data Centre (AADC, https://data.aad.gov.au/, accessed on 27 October 2025) provides a repository of Antarctic scientific data of varying modalities, including satellite data, UAV-based data, and in situ measurements. Users can search using keywords, places, or feature types, and can view a range of maps and metadata. Users can also filter by instruments, sources, and additional keywords. The British Antarctic Survey (BAS, https://www.bas.ac.uk/, accessed on 27 October 2025) similarly curates and distributes polar datasets through the Antarctic Digital Database (ADD, https://www.bas.ac.uk/project/add/, accessed on 27 October 2025) catalog. The University of Colorado Boulder endorses the National Snow and Ice Data Center (NSIDC, https://nsidc.org/data/explore-data, accessed on 27 October 2025), which is a database focused on snow and sea ice analysis. It also includes parameters such as terrain elevation, vegetation height, and brightness temperature. The Norwegian Polar Data Centre (NPDC, https://data.npolar.no/dataset, accessed on 27 October 2025) serves as the National Antarctic Data Centre (NADC) for Norway. It is a repository containing topographic and geological map data, as well as environmental monitoring data and geophysical data from the polar regions. Locations include Antarctica, the Arctic Ocean, and Svalbard.

A number of studies have utilised these databases to support their research, but large-scale applications to Antarctic ML appear to be under-explored. Lowe et al. [92] and Cox et al. [93] access the AADC and ADD to source datasets, but focus on aggregating data of benthic ecosystems or creating geological maps instead of direct ML applications. Jiang et al. [94] use a dataset from the AADC, in addition to various others, to support their U-Net based method for mapping Antarctic lakes. Several studies have used sea ice data from NSIDC to support deep learning techniques for sea ice analysis, including a CNN-based technique [95], a U-Net based method [96], and a hybrid deep learning model comprising CNN- and ResNet-based techniques [97]. Although these studies use Antarctic data to support deep learning techniques, they mostly rely on in situ and observational data as ground truth. More training data may be required for larger-scale and more flexible ML methods on Antarctic datasets [98].

These datasets may suit different use cases and can be applied according to key research questions. While abundant satellite datasets may be more suited to large-scale pretraining of deep learning models, such as foundation models, Antarctic databases provide a useful avenue for locating specific and higher resolution data, particularly when collecting one’s own data is logistically challenging. This can support strategies such as shallow models or fine-tuning for downstream tasks, as well as few-shot or zero-shot inference. In domains where data collection faces many challenges, it is important to leverage the abundance of satellite data in out-of-domain environments and utilise publicly shared data where possible.

Collectively, these datasets and repositories provide the foundational material needed to support future benchmarking efforts in data-scarce remote sensing, as discussed in Section 6. The availability of curated, georeferenced, and publicly accessible Antarctic datasets is essential for developing standardised evaluation protocols, especially for EO foundation models to become truly global. These resources can underpin comparative studies, enable consistent assessment of model generalisation in polar environments, and support the creation of unified datasets that reflect the unique challenges of Antarctic remote sensing.

6. Discussion

6.1. Key Unresolved Challenges

Monitoring Antarctic ecosystems through RS remains a critical yet challenging task. Harsh environmental conditions and logistical constraints limit opportunities for data collection. While HSI provides superior spectral resolution, MSI remains more practical given payload, cost, and operational constraints inherent to Antarctic campaigns. However, data scarcity of both HSI and MSI limits the performance of ML models. Statistical models and CNNs have shown success on smaller datasets, but lack flexibility and scalability. Alternatively, transformers have exhibited greater flexibility across downstream tasks, expanding applicability across environmental variables beyond that of conventional statistical methods. To support adaptability, physics-based priors can be integrated into model architectures, improving their robustness in data-scarce and challenging environments. Perhaps most notably, pretraining foundation models on large and diverse RS datasets can help lift barriers imposed by data scarcity by leveraging the abundance of EO satellite data.

The underrepresentation of Antarctic regions in large-scale global datasets and benchmarks remains a significant limitation. Current datasets and studies predominantly focus on urban, agricultural, and disaster-related applications, with sampling strategies concentrated around densely populated areas. While such strategies are designed to avoid over-representing spectrally homogeneous regions such as oceans and deserts, the similarly low spectral diversity of Antarctic landscapes poses additional challenges for ensuring sample diversity in satellite observations. Furthermore, limited spatial coverage and low solar elevation angles hinder the acquisition of high-quality data. Even benchmarks that aim for global inclusiveness and explicitly address data-scarce conditions often exclude snow and ice classes due to their extreme scarcity [99]. Together, these challenges mean that Antarctic remote sensing continues to lag behind global advances in Earth observation and machine learning.

6.2. Emerging Opportunities and Implications

A key theme throughout the reviewed studies is the movement from narrow, task-specific architectures towards generalist models that can support a wider range of data types and applications. This supports the notion that RS problems are rarely isolated and instead benefit from adaptation to a range of different modalities, resolutions, and downstream tasks. Such benefits have been facilitated by an abundance of satellite data, as well as techniques such as self-supervised learning that can leverage this unlabelled imagery [19,21,24]. Although these techniques have shown promise in general EO settings, their compatibility with polar regions and UAV-based datasets remains largely unexplored. The literature shows that Antarctic science is underrepresented in global datasets and studies focusing on foundation models for EO.

Several opportunities exist for adapting these approaches to data-scarce environments such as Antarctica. Multimodal architectures that combine SAR, MSI, HSI, and topography could provide richer context for feature learning that single-sensor methods lack. Effectively facilitating multimodal and multi-scale adaptation is an important avenue for better utilising existing MSI and HSI data in data-scarce domains where simply collecting enough data is an impractical solution. Pretraining strategies, particularly those demonstrated by foundation models leveraging self-supervised learning, offer a pathway to apply transfer learning and make use of unlabelled Antarctic imagery. UAVs also provide a valuable mechanism for generating ground-truth data at high spatial resolution. Unlike task-specific, single-sensor traditional methods, multimodal models illustrate techniques to complement satellite observations and support model evaluation across scales. To facilitate cross-domain transfer learning to Antarctic contexts, studies may consider geographical distributions that balance underrepresented environments, not just urban centres.

Barriers to adoption must be considered. The computational demands of foundation models limit their accessibility, particularly for smaller research groups conducting Antarctic environmental research. The environmental impact of repeatedly training large, computationally heavy models is also of concern. Tseng et al. [21] touch on this by providing a base pretrained model, as well as smaller models that can be adopted in compute-constrained environments. The push towards global-scale generalist models, which reduce the need for repeated pretraining, may alleviate computational and environmental costs associated with large specialist models.

Model interpretability, transparency, and explainability remain significant challenges that may slow the adoption of complex ML methods in Antarctic science. Black-box models can obscure the reasoning behind predictions, limiting trust and making it difficult for domain experts to assess whether outputs align with ecological understanding. Limited interpretability and transparency also restrict the ability to extract meaningful metrics, diagnose sources of bias or error, and quantify uncertainty, all of which are essential for producing credible predictions [86,100]. Relating ML techniques to field knowledge, and ensuring that model behaviour can be meaningfully inspected, will therefore be important for promoting the responsible use of advanced ML strategies. Additionally, the inherent challenges of polar environments, including low solar angles, persistent cloud cover, and unusual surface properties, mean that foundation models trained on global datasets may not generalise to polar contexts in a straightforward manner.

6.3. Priority Directions for Future Work

Cross-domain adaptation for Antarctic environments remains under-explored and uncertain. For global EO and climate-based efforts, it is crucial to address this gap by increasing generalisability to extreme domains. We first suggest including Antarctic datasets in model pretraining and benchmarking, as this is necessary for evaluating model adaptability to unseen Antarctic data. Similarly, future studies may investigate the inclusion of UAV-based data to complement existing input types and enrich satellite-based models. The integration of multimodal and multi-scale data could improve the detection of subtle ecosystem changes, while UAVs may play a key role in bridging gaps between satellite-based observations and ground-truth [101].

As Figure 9 shows, both global EO data and Antarctic data may be used in a cumulative pipeline to enable ML strategies and facilitate reliable domain adaptation. It is important to consider the research questions at hand, the resources available, and ways in which rapidly advancing ML techniques can be adjusted appropriately for the given context.

To facilitate this work, accessibility to Antarctic datasets is essential. We suggest curating large, polar-specific datasets to unlock the potential for extending pretraining capabilities to include Antarctic land. Building shared Antarctic datasets, similar to those surveyed in Section 5.1, could present a significant opportunity for extending the studies discussed. Given the importance of UAV-based data to appropriately resolve features in Antarctic landscapes, we recommend curating a Polar–UAV Benchmark (PUB) Dataset comprising UAV data collected over polar terrain. To effectively support foundation model fine-tuning, this dataset should contain a combination of scales and resolutions from both MSI and HSI. The captured features should also be diverse, ranging from Antarctic moss and lichen to aerial wildlife population surveys. Standardised labels and time-series metadata can further enable accurate classification results and support change detection tasks. Promoting open access to Antarctic data and collating an appropriately diverse dataset is expected to improve opportunities for foundation model cross-domain adaptation to data-scarce Antarctic environments.

Enhancing model interpretability, transparency, and explainability should also be a priority. For Antarctic applications, it is essential to verify that models are attending to meaningful morphological and environmental features so that scientific outcomes are credible. Integrating ecological expert knowledge and physics-based priors is useful for enabling interpretable models [102]. Beyond trust and reliability, model transparency is critical for diagnosing sources of error and bias, as well as supporting uncertainty quantification. These elements are fundamental for ensuring the integrity of model insights in disaster assessment, climate monitoring, and environmental policy-making contexts. Developing approaches that allow scientists to verify and understand model behaviour will be crucial for transforming ML outputs into robust scientific evidence in data-scarce and ecologically sensitive environments.
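As one illustration of the uncertainty quantification mentioned above, the entropy of an ensemble's averaged class probabilities can flag pixels where a classifier is unsure. This is a minimal sketch under stated assumptions: ensemble averaging with predictive entropy is a generic technique rather than a method taken from any study reviewed here, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def mean_probabilities(member_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average class probabilities over ensemble members for one pixel."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(m[c] for m in member_probs) / n_members for c in range(n_classes)]


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a class-probability vector."""
    # Zero-probability classes contribute nothing, so they are skipped.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def normalised_uncertainty(member_probs: Sequence[Sequence[float]]) -> float:
    """Entropy of the ensemble mean, scaled by log(number of classes).

    Assumes at least two classes; 0 means the ensemble mean is fully
    confident and 1 means it is uniform over the classes.
    """
    mean = mean_probabilities(member_probs)
    return predictive_entropy(mean) / math.log(len(mean))
```

A map of such scores could, for example, help direct expert labelling effort toward the tiles where the models disagree most.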

7. Conclusions

Remote sensing in Antarctica faces persistent challenges of data scarcity, harsh environmental conditions, and logistical constraints. This paper has reviewed the intersection of ML and UAV-based remote sensing in Antarctica under extreme data constraints. By framing these developments within the unique context of Antarctic remote sensing, this review addresses a gap that existing works have yet to cover. We have synthesised recent advances from approximately 2020 to 2025, highlighting strategies designed to address the challenges of sparse and unlabelled data. These include self-supervised learning, physics-informed modelling, and transformer-based architectures, which collectively advance the robustness and applicability of ML in data-constrained environments.

The effectiveness of these models depends greatly on the scale and quality of data. As such, it is challenging to adapt large-scale foundation models to data-scarce fields while relying only on domain-specific data. To enable these global advances to translate to Antarctic science, it is important to adapt them to the unique properties of polar environments. We suggest including Antarctic environments in global datasets and foundation model benchmarking. Additionally, we recommend curating a Polar–UAV Benchmark (PUB) Dataset to support the integration of UAV-based data to complement large-scale satellite models. We expect this to facilitate more diverse pretraining and testing datasets, while enabling better multimodal and multi-scale capabilities for improved cross-domain adaptation. This will also help address model interpretability and transparency, which are key for credible scientific outcomes. An integrated approach leveraging ML advances and UAV-based remote sensing is key for extending the frontier of environmental monitoring in one of Earth’s most inaccessible, data-scarce regions.

Author Contributions

Conceptualisation, B.G., J.S., P.M., F.G. and J.R.; methodology, B.G.; software, B.G.; validation, B.G.; investigation, B.G.; resources, P.M., F.G. and J.R.; data curation, B.G.; writing—original draft preparation, B.G.; writing—review and editing, B.G., J.S., P.M., F.G. and J.R.; visualisation, B.G.; supervision, J.S., P.M., F.G. and J.R.; project administration, P.M., F.G. and J.R.; funding acquisition, F.G. and J.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially funded by the Australian Research Council (ARC) SRIEAS Grant SR200100005 Securing Antarctica’s Environmental Future; the CSIRO’s Alberto Elfes Memorial Scholarship Fund; and CSIRO Robotics.

Data Availability Statement

No new data were created or analysed in this study. Data sharing is not applicable to this article.

Acknowledgments

The authors would like to acknowledge continued support from the Queensland University of Technology (QUT) Centre for Robotics, as well as the CSIRO Embodied AI Cluster. We would like to thank the Australian Antarctic Division (AAD) for field and other support through AAS Project 4628, which enabled Juan Sandino to collect the imagery shown in Figure 1a,b in ASPA 135, East Antarctica, during the 2022–2023 summer campaign.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

AADC	Australian Antarctic Data Centre
ADD	Antarctic Digital Database
AEF	AlphaEarth Foundations
AMD	Antarctic Metadata Directory
BAS	British Antarctic Survey
BRDF	Bidirectional Reflectance Distribution Function
CNN	Convolutional Neural Network
DEM	Digital Elevation Model
DOFA	Dynamic One-For-All
EO	Earth Observation
EOD	Earth Observation Database
EO-1	Earth Observing-1 satellite
ESA	European Space Agency
fMoW	Functional Map of the World
FRM	Feature Reconstruction Module
GCN	Graph Convolutional Network
GF	Gaofen-5 satellite
GMM	Gaussian Mixture Model
GPT	Generative Pretrained Transformer
GRSS	The Geoscience and Remote Sensing Society
HSI	Hyperspectral Imaging
IEEE	Institute of Electrical and Electronics Engineers
KNN	K-Nearest Neighbour
MAE	Masked Autoencoder
MIM	Masked Image Modelling
ML	Machine Learning
MLC	Maximum Likelihood Classifier
MLP	Multilayer Perceptron
MNF	Minimum Noise Fraction
MSI	Multispectral Imaging
MSSSFF	Multi-Scale Spatial–Spectral Feature Fusion
NADC	National Antarctic Data Centre
NASA	National Aeronautics and Space Administration
NIR	Near-Infrared
NDSI	Normalised Difference Snow Index
NDVI	Normalised Difference Vegetation Index
NSIDC	National Snow and Ice Data Center
NPDC	Norwegian Polar Data Centre
PDE	Partial Differential Equation
PPI	Pixel Purity Index
PIML	Physics-Informed Machine Learning
PINN	Physics-Informed Neural Network
RF	Random Forest
RGB	Red–Green–Blue
RS	Remote Sensing
RTB	Reduced Transformer Block
RTM	Radiative Transfer Model
SAC	Spectral Angle Classifier
SAM	Spectral Angle Mapper
SAR	Synthetic Aperture Radar
SCAR	Scientific Committee on Antarctic Research
SDI	Snow Darkening Index
SOOS	Southern Ocean Observing System
SRF	Spectral Response Function
SSR	Spectral Super-Resolution
STP	Space Time Precision
SVM	Support Vector Machine
TiM	Thinking in Modalities
TVI	Triangular Vegetation Index
UAV	Uncrewed Aerial Vehicle
UTM	Universal Transverse Mercator
ViT	Vision Transformer

References

Burrows, J.L.; Lee, J.R.; Wilson, K.A. Evaluating the Conservation Impact of Antarctica’s Protected Areas. Conserv. Biol. 2023, 37, e14059. [Google Scholar] [CrossRef]
Raniga, D.; Amarasingam, N.; Sandino, J.; Doshi, A.; Barthelemy, J.; Randall, K.; Robinson, S.A.; Gonzalez, F.; Bollard, B. Monitoring of Antarctica’s Fragile Vegetation Using Drone-Based Remote Sensing, Multispectral Imagery and AI. Sensors 2024, 24, 1063. [Google Scholar] [CrossRef]
Anderson, R.; Chown, S.; Leihy, R. Continent-Wide Analysis of Moss Diversity in Antarctica. Ecography 2025, 2025, e07353. [Google Scholar] [CrossRef]
Sandino, J.; Bollard, B.; Doshi, A.; Randall, K.; Barthelemy, J.; Robinson, S.A.; Gonzalez, F. A Green Fingerprint of Antarctica: Drones, Hyperspectral Imaging, and Machine Learning for Moss and Lichen Classification. Remote Sens. 2023, 15, 5658. [Google Scholar] [CrossRef]
Pertierra, L.R.; Convey, P.; Barbosa, A.; Biersma, E.M.; Cowan, D.; Diniz-Filho, J.A.F.; de los Ríos, A.; Escribano-Álvarez, P.; Fraser, C.I.; Fontaneto, D.; et al. Advances and Shortfalls in Knowledge of Antarctic Terrestrial and Freshwater Biodiversity. Science 2025, 387, 609–615. [Google Scholar] [CrossRef] [PubMed]
Pina, P.; Vieira, G. UAVs for Science in Antarctica. Remote Sens. 2022, 14, 1610. [Google Scholar] [CrossRef]
Bollard, B.; Doshi, A.; Gilbert, N.; Poirot, C.; Gillman, L. Drone Technology for Monitoring Protected Areas in Remote and Fragile Environments. Drones 2022, 6, 42. [Google Scholar] [CrossRef]
Lockhart, K.; Sandino, J.; Amarasingam, N.; Hann, R.; Bollard, B.; Gonzalez, F. Unmanned Aerial Vehicles for Real-Time Vegetation Monitoring in Antarctica: A Review. Remote Sens. 2025, 17, 304. [Google Scholar] [CrossRef]
Sandino, J.; Barthelemy, J.; Doshi, A.; Randall, K.; Robinson, S.A.; Bollard, B.; Gonzalez, F. Drone Hyperspectral Imaging and Artificial Intelligence for Monitoring Moss and Lichen in Antarctica. Sci. Rep. 2025, 15, 27244. [Google Scholar] [CrossRef]
Robinson, S.A.; Clarke, L.J.; King, D.; Ayre, D.J.; Hua, Q.; Fink, D.; Lucieer, A. Monitoring Impacts of a Changing Climate on Plant Communities of Continental Antarctica. In Proceedings of the British Ecological Society (BES) Annual Meeting 2010, Leeds, UK, 7–9 September 2010; University of Leeds: Leeds, UK, 2010. [Google Scholar]
Amarasingam, N.; Sandino, J.; Doshi, A.; King, D.; Blackman, E.; Barthelemy, J.; Bollard, B.; Robinson, S.A.; Gonzalez, F. Detection and Mapping of Antarctic Lichen Using Drones, Multispectral Cameras, and Supervised Deep Learning. Int. J. Appl. Earth Obs. Geoinf. 2025, 141, 104577. [Google Scholar] [CrossRef]
Platel, A.; Sandino, J.; Shaw, J.; Bollard, B.; Gonzalez, F. Advancing Sparse Vegetation Monitoring in the Arctic and Antarctic: A Review of Satellite and UAV Remote Sensing, Machine Learning, and Sensor Fusion. Remote Sens. 2025, 17, 1513. [Google Scholar] [CrossRef]
Zhu, L.; Wu, J.; Biao, W.; Liao, Y.; Gu, D. SpectralMAE: Spectral Masked Autoencoder for Hyperspectral Remote Sensing Image Reconstruction. Sensors 2023, 23, 3728. [Google Scholar] [CrossRef]
Yang, Y.; Zhao, H.; Huangfu, X.; Li, Z.; Wang, P. ViT-ISRGAN: A High-Quality Super-Resolution Reconstruction Method for Multispectral Remote Sensing Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 3973–3988. [Google Scholar] [CrossRef]
Mohamed, S.; Haghighat, M.; Fernando, T.; Sridharan, S.; Fookes, C.; Moghadam, P. FactoFormer: Factorized Hyperspectral Transformers with Self-Supervised Pretraining. IEEE Trans. Geosci. Remote Sens. 2023, 62, 5501614. [Google Scholar] [CrossRef]
Hong, D.; Han, Z.; Yao, J.; Gao, L.; Zhang, B.; Plaza, A.; Chanussot, J. SpectralFormer: Rethinking Hyperspectral Image Classification with Transformers. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5518615. [Google Scholar] [CrossRef]
Zhao, G.; He, Y.; Wang, Z.; Wu, H. Hybrid Transformer Architecture for Spectral Super-Resolution Reconstruction of Multispectral Images. In Proceedings of the IGARSS 2024—2024 IEEE International Geoscience and Remote Sensing Symposium, Athens, Greece, 7–12 July 2024; IEEE: New York, NY, USA, 2024; pp. 9468–9471. [Google Scholar] [CrossRef]
Hong, D.; Zhang, B.; Li, X.; Li, Y.; Li, C.; Yao, J.; Yokoya, N.; Li, H.; Ghamisi, P.; Jia, X.; et al. SpectralGPT: Spectral Remote Sensing Foundation Model. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 5227–5244. [Google Scholar] [CrossRef]
Xiong, Z.; Wang, Y.; Zhang, F.; Stewart, A.J.; Hanna, J.; Borth, D.; Papoutsis, I.; Saux, B.L.; Camps-Valls, G.; Zhu, X.X. Neural Plasticity-Inspired Multimodal Foundation Model for Earth Observation. arXiv 2024, arXiv:2403.15356. [Google Scholar] [CrossRef]
Zhao, Z.; Ding, X.; Prakash, B.A. PINNsFormer: A Transformer-Based Framework for Physics-Informed Neural Networks. arXiv 2024, arXiv:2307.11833. [Google Scholar] [CrossRef]
Tseng, G.; Fuller, A.; Reil, M.; Herzog, H.; Beukema, P.; Bastani, F.; Green, J.R.; Shelhamer, E.; Kerner, H.; Rolnick, D. Galileo: Learning Global & Local Features of Many Remote Sensing Modalities. arXiv 2025, arXiv:2502.09356. [Google Scholar] [CrossRef]
Astruc, G.; Gonthier, N.; Mallet, C.; Landrieu, L. AnySat: One Earth Observation Model for Many Resolutions, Scales, and Modalities. In Proceedings of the 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 10–17 June 2025; IEEE: New York, NY, USA, 2025; pp. 19530–19540. [Google Scholar] [CrossRef]
Wang, Q.; Doulgeris, A.P.; Eltoft, T. Physics-Aware Training Data to Improve Machine Learning for Sea Ice Classification from Sentinel-1 SAR Scenes. In Proceedings of the IGARSS 2022—2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 17–22 July 2022; IEEE: New York, NY, USA, 2022; pp. 4992–4995. [Google Scholar] [CrossRef]
Jakubik, J.; Yang, F.; Blumenstiel, B.; Scheurer, E.; Sedona, R.; Maurogiovanni, S.; Bosmans, J.; Dionelis, N.; Marsocci, V.; Kopp, N.; et al. TerraMind: Large-Scale Generative Multimodality for Earth Observation. arXiv 2025, arXiv:2504.11171. [Google Scholar] [CrossRef]
Brown, C.F.; Kazmierski, M.R.; Pasquarella, V.J.; Rucklidge, W.J.; Samsikova, M.; Zhang, C.; Shelhamer, E.; Lahera, E.; Wiles, O.; Ilyushchenko, S.; et al. AlphaEarth Foundations: An Embedding Field Model for Accurate and Efficient Global Mapping from Sparse Label Data. arXiv 2025, arXiv:2507.22291. [Google Scholar] [CrossRef]
Maslanik, J.A.; Barry, R.G. Remote Sensing in Antarctica and the Southern Ocean: Applications and Developments. Antarct. Sci. 1990, 2, 105–121. [Google Scholar] [CrossRef]
Husman, S.d.R.; Hu, Z.; Wouters, B.; Munneke, P.K.; Veldhuijsen, S.; Lhermitte, S. Remote Sensing of Surface Melt on Antarctica: Opportunities and Challenges. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 2462–2480. [Google Scholar] [CrossRef]
Marques, A.L.; Moraes, M.M.; Arantes, R.M.E. Mapping Research Paths and Perspectives over the Fieldwork of Human Physiology in Antarctica: Reflections on the Integration of Science, Environment, and Subjectivity. An. Acad. Bras. Ciênc. 2022, 94, e20210396. [Google Scholar] [CrossRef] [PubMed]
Ren, G.; Wang, J.; Lu, Y.; Wu, P.; Lu, X.; Chen, C.; Ma, Y. Monitoring Changes to Arctic Vegetation and Glaciers at Ny-Ålesund, Svalbard, Based on Time Series Remote Sensing. Remote Sens. 2021, 13, 3845. [Google Scholar] [CrossRef]
Stewart, A.; Lehmann, N.; Corley, I.; Wang, Y.; Chang, Y.C.; Ait Ali Braham, N.A.; Sehgal, S.; Robinson, C.; Banerjee, A. SSL4EO-L: Datasets and Foundation Models for Landsat Imagery. In Proceedings of the Advances in Neural Information Processing Systems; Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2023; Volume 36, pp. 59787–59807. [Google Scholar] [CrossRef]
Cannone, N.; Guglielmin, M.; Ponti, S. Suitability and Limitations of Ground-Based Imagery and Thermography for Long-Term Monitoring of Vegetation Changes in Victoria Land (Continental Antarctica). Ecol. Indic. 2023, 156, 111080. [Google Scholar] [CrossRef]
Li, Y.; Qiao, G.; Popov, S.; Cui, X.; Florinsky, I.V.; Yuan, X.; Wang, L. Unmanned Aerial Vehicle Remote Sensing for Antarctic Research: A Review of Progress, Current Applications, and Future Use Cases. IEEE Geosci. Remote Sens. Mag. 2023, 11, 73–93. [Google Scholar] [CrossRef]
Román, A.; Navarro, G.; Caballero, I.; Tovar-Sánchez, A. High-Spatial Resolution UAV Multispectral Data Complementing Satellite Imagery to Characterize a Chinstrap Penguin Colony Ecosystem on Deception Island (Antarctica). GISci. Remote Sens. 2022, 59, 1159–1176. [Google Scholar] [CrossRef]
Román, A.; Tovar-Sánchez, A.; Fernández-Marín, B.; Navarro, G.; Barbero, L. Characterization of an Antarctic Penguin Colony Ecosystem Using High-Resolution UAV Hyperspectral Imagery. Int. J. Appl. Earth Obs. Geoinf. 2023, 125, 103565. [Google Scholar] [CrossRef]
Moghadam, P.; Ward, D.; Goan, E.; Jayawardena, S.; Sikka, P.; Hernandez, E. Plant disease detection using hyperspectral imaging. In Proceedings of the 2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Sydney, Australia, 29 November–1 December 2017; IEEE: New York, NY, USA, 2017; pp. 1–8. [Google Scholar]
Jung, S.H.; Kwon, S.; Seo, I.W.; Kim, J.S. Comparison between Hyperspectral and Multispectral Retrievals of Suspended Sediment Concentration in Rivers. Water 2024, 16, 1275. [Google Scholar] [CrossRef]
Manolakis, D.G.; Lockwood, R.B.; Cooley, T.W. Hyperspectral Imaging Remote Sensing: Physics, Sensors, and Algorithms, 1st ed.; Cambridge University Press: Cambridge, UK, 2016. [Google Scholar] [CrossRef]
Fang, Z.; Savkin, A.V. Strategies for Optimized UAV Surveillance in Various Tasks and Scenarios: A Review. Drones 2024, 8, 193. [Google Scholar] [CrossRef]
Huang, W.; Yu, A.; Xu, Q.; Sun, Q.; Guo, W.; Ji, S.; Wen, B.; Qiu, C. Sea Ice Extraction via Remote Sensing Imagery: Algorithms, Datasets, Applications and Challenges. Remote Sens. 2024, 16, 842. [Google Scholar] [CrossRef]
Lampert, A.; Altstädter, B.; Bärfuss, K.; Bretschneider, L.; Sandgaard, J.; Michaelis, J.; Lobitz, L.; Asmussen, M.; Damm, E.; Käthner, R.; et al. Unmanned Aerial Systems for Investigating the Polar Atmospheric Boundary Layer—Technical Challenges and Examples of Applications. Atmosphere 2020, 11, 416. [Google Scholar] [CrossRef]
Lucieer, A.; Turner, D.; King, D.H.; Robinson, S.A. Using an Unmanned Aerial Vehicle (UAV) to Capture Micro-Topography of Antarctic Moss Beds. Int. J. Appl. Earth Obs. Geoinf. 2014, 27, 53–62. [Google Scholar] [CrossRef]
Attard, M.R.G.; Phillips, R.A.; Bowler, E.; Clarke, P.J.; Cubaynes, H.; Johnston, D.W.; Fretwell, P.T. Review of Satellite Remote Sensing and Unoccupied Aircraft Systems for Counting Wildlife on Land. Remote Sens. 2024, 16, 627. [Google Scholar] [CrossRef]
Bindschadler, R.; Vornberger, P.; Fleming, A.; Fox, A.; Mullins, J.; Binnie, D.; Paulsen, S.J.; Granneman, B.; Gorodetzky, D. The Landsat Image Mosaic of Antarctica. Remote Sens. Environ. 2008, 112, 4214–4226. [Google Scholar] [CrossRef]
Black, M.; Fleming, A.; Riley, T.; Ferrier, G.; Fretwell, P.; McFee, J.; Achal, S.; Diaz, A.U. On the Atmospheric Correction of Antarctic Airborne Hyperspectral Data. Remote Sens. 2014, 6, 4498–4514. [Google Scholar] [CrossRef]
Maniraj, S.P.; Rose, J.D.; Arunachalam, R.; Rangasamy, K.; Patil, V.R.; Kathirvelu, S. Polar Region Climate Dynamics: Deep Learning and Remote Sensing Integration for Monitoring Arctic and Antarctic Changes. Remote Sens. Earth Syst. Sci. 2024, 7, 582–595. [Google Scholar] [CrossRef]
Zhao, C.; Kang, S.; Fan, Y.; Wang, Y.; He, Z.; Tan, Z.; Gao, Y.; Zhang, T.; He, Y.; Fan, Y.; et al. Unmanned Aerial Vehicle Technology for Glaciology Research in the Third Pole. Drones 2025, 9, 254. [Google Scholar] [CrossRef]
Qi, M.; Gadd, M.; De Martini, D.; Davis, K.J.; Xiong, B.; Rosen, A.; Krishna Moorthy, S.M.; Hawes, N.; Salguero-Gómez, R. Biodiversity Research Requires More Motors in Air, Water and on Land. Methods Ecol. Evol. 2025, 1–15. [Google Scholar] [CrossRef]
Huovinen, P.; Ramírez, J.; Gómez, I. Remote Sensing of Albedo-Reducing Snow Algae and Impurities in the Maritime Antarctica. ISPRS J. Photogramm. Remote Sens. 2018, 146, 507–517. [Google Scholar] [CrossRef]
Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 13–17 August 2016; ACM: New York, NY, USA, 2016; pp. 785–794. [Google Scholar] [CrossRef]
Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015); Springer: Berlin/Heidelberg, Germany, 2015; Volume 9351, pp. 234–241. [Google Scholar] [CrossRef]
Hong, D.; Gao, L.; Yao, J.; Zhang, B.; Plaza, A.; Chanussot, J. Graph Convolutional Networks for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2021, 59, 5966–5978. [Google Scholar] [CrossRef]
Blair, J.D.; Gaynor, K.M.; Palmer, M.S.; Marshall, K.E. A Gentle Introduction to Computer Vision-Based Specimen Classification in Ecological Datasets. J. Anim. Ecol. 2024, 93, 147–158. [Google Scholar] [CrossRef] [PubMed]
Wang, Y.; Wen, M.; Zhang, H.; Sun, J.; Yang, Q.; Zhang, Z.; Lu, H. HSIMAE: A Unified Masked Autoencoder with Large-Scale Pretraining for Hyperspectral Image Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 14064–14079. [Google Scholar] [CrossRef]
Ibañez, D.; Fernandez-Beltran, R.; Pla, F.; Yokoya, N. Masked Auto-Encoding Spectral–Spatial Transformer for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5542614. [Google Scholar] [CrossRef]
Li, J.; Wu, C.; Song, R.; Li, Y.; Liu, F. Adaptive Weighted Attention Network with Camera Spectral Sensitivity Prior for Spectral Reconstruction from RGB Images. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; IEEE: New York, NY, USA, 2020; pp. 1894–1903. [Google Scholar] [CrossRef]
Mahendren, S.; Fernando, T.; Sridharan, S.; Moghadam, P.; Fookes, C. Reduction of feature contamination for hyper spectral image classification. In Proceedings of the 2021 Digital Image Computing: Techniques and Applications (DICTA), Gold Coast, Australia, 29 November–1 December 2021; IEEE: New York, NY, USA, 2021; pp. 1–8. [Google Scholar]
Zhao, Y.; Po, L.M.; Yan, Q.; Liu, W.; Lin, T. Hierarchical Regression Network for Spectral Reconstruction from RGB Images. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; IEEE: New York, NY, USA, 2020; pp. 1695–1704. [Google Scholar] [CrossRef]
Xiong, Z.; Shi, Z.; Li, H.; Wang, L.; Liu, D.; Wu, F. HSCNN: CNN-Based Hyperspectral Image Recovery from Spectrally Undersampled Projections. In Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), Venice, Italy, 22–29 October 2017; IEEE: New York, NY, USA, 2017; pp. 518–525. [Google Scholar] [CrossRef]
Shi, Z.; Chen, C.; Xiong, Z.; Liu, D.; Wu, F. HSCNN+: Advanced CNN-Based Hyperspectral Recovery from RGB Images. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; IEEE: New York, NY, USA, 2018; pp. 1052–10528. [Google Scholar] [CrossRef]
Roy, S.K.; Kar, P.; Hong, D.; Wu, X.; Plaza, A.; Chanussot, J. Revisiting Deep Hyperspectral Feature Extraction Networks via Gradient Centralized Convolution. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5516619. [Google Scholar] [CrossRef]
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17), Long Beach, CA, USA, 4–9 December 2017; NIPS: La Jolla, CA, USA, 2017; pp. 6000–6010. [Google Scholar]
Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image Is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. arXiv 2021, arXiv:2010.11929. [Google Scholar] [CrossRef]
Mohamed, S.; Fernando, T.; Sridharan, S.; Moghadam, P.; Fookes, C. Dual-Domain Masked Image Modeling: A Self-Supervised Pretraining Strategy Using Spatial and Frequency Domain Masking for Hyperspectral Data. In Proceedings of the IGARSS 2025—IEEE International Geoscience and Remote Sensing Symposium, Brisbane, Australia, 3–8 August 2025; IEEE: New York, NY, USA, 2025. [Google Scholar]
Li, Z.; Li, L.; Liu, B.; Cao, Y.; Zhou, W.; Ni, W.; Yang, Z. Spectral-Learning-Based Transformer Network for the Spectral Super-Resolution of Remote-Sensing Degraded Images. IEEE Geosci. Remote Sens. Lett. 2023, 20, 5505705. [Google Scholar] [CrossRef]
Ji, R.; Tan, K.; Wang, X.; Tang, S.; Sun, J.; Niu, C.; Pan, C. PatchOut: A Novel Patch-Free Approach Based on a Transformer-CNN Hybrid Framework for Fine-Grained Land-Cover Classification on Large-Scale Airborne Hyperspectral Images. Int. J. Appl. Earth Obs. Geoinf. 2025, 138, 104457. [Google Scholar] [CrossRef]
Raza, A.; Hanif, F.; Mohammed, H.A. Analyzing the Enhancement of CNN-YOLO and Transformer Based Architectures for Real-Time Animal Detection in Complex Ecological Environments. Sci. Rep. 2025, 15, 39142. [Google Scholar] [CrossRef]
Karniadakis, G.E.; Kevrekidis, I.G.; Lu, L.; Perdikaris, P.; Wang, S.; Yang, L. Physics-Informed Machine Learning. Nat. Rev. Phys. 2021, 3, 422–440. [Google Scholar] [CrossRef]
Nghiem, T.X.; Drgoňa, J.; Jones, C.; Nagy, Z.; Schwan, R.; Dey, B.; Chakrabarty, A.; Di Cairano, S.; Paulson, J.A.; Carron, A.; et al. Physics-Informed Machine Learning for Modeling and Control of Dynamical Systems. In Proceedings of the 2023 American Control Conference (ACC), San Diego, CA, USA, 31 May–2 June 2023; IEEE: New York, NY, USA, 2023; pp. 3735–3750. [Google Scholar] [CrossRef]
Yu, R.; Qiu, C.; Ladwig, R.; Hanson, P.; Xie, Y.; Jia, X. Physics-Guided Foundation Model for Scientific Discovery: An Application to Aquatic Science. In Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence and Thirty-Seventh Conference on Innovative Applications of Artificial Intelligence and Fifteenth Symposium on Educational Advances in Artificial Intelligence (AAAI’25/IAAI’25/EAAI’25); AAAI Press: Washington, DC, USA, 2025; Volume 39, pp. 28548–28556. [Google Scholar] [CrossRef]
Bai, J.; Alzubaidi, L.; Wang, Q.; Kuhl, E.; Bennamoun, M.; Gu, Y. Utilising Physics-Guided Deep Learning to Overcome Data Scarcity. arXiv 2022, arXiv:2211.15664. [Google Scholar] [CrossRef]
Zhang, J.; Su, R.; Fu, Q.; Ren, W.; Heide, F.; Nie, Y. A Survey on Computational Spectral Reconstruction Methods from RGB to Hyperspectral Imaging. Sci. Rep. 2022, 12, 11905. [Google Scholar] [CrossRef] [PubMed]
Liu, Z.; Rahnemoonfar, M. Physics-Informed Spatio-Temporal Graph Neural Network for Efficient Deep Ice Layer Thickness Estimation in Radar Imagery. In Proceedings of the 2025 IEEE International Radar Conference (RADAR), Atlanta, GA, USA, 3–9 May 2025; IEEE: New York, NY, USA, 2025; pp. 1–6. [Google Scholar] [CrossRef]
Rahnemoonfar, M.; Zalatan, B. Physics-Informed Machine Learning for Deep Ice Layer Tracing in SAR Images. In Proceedings of the IGARSS 2024—2024 IEEE International Geoscience and Remote Sensing Symposium, Athens, Greece, 7–12 July 2024; IEEE: New York, NY, USA, 2024; pp. 6938–6942. [Google Scholar] [CrossRef]
Banerjee, C.; Nguyen, K.; Fookes, C.; George, K. Physics-Informed Computer Vision: A Review and Perspectives. ACM Comput. Surv. 2024, 57, 17:1–17:38. [Google Scholar] [CrossRef]
Li, Z.; Xue, Z.; Jia, M.; Nie, X.; Wu, H.; Zhang, M.; Su, H. DEMAE: Diffusion-Enhanced Masked Autoencoder for Hyperspectral Image Classification with Few Labeled Samples. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5527616. [Google Scholar] [CrossRef]
Varshney, D.; Ibikunle, O.; Paden, J.; Rahnemoonfar, M. Learning Snow Layer Thickness Through Physics Defined Labels. In Proceedings of the IGARSS 2022—2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 17–22 July 2022; IEEE: New York, NY, USA, 2022; pp. 1233–1236. [Google Scholar] [CrossRef]
Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; IEEE: New York, NY, USA, 2009; pp. 248–255. [Google Scholar] [CrossRef]
Lee, G.Y.; Dam, T.; Ferdaus, M.M.; Poenar, D.P.; Duong, V.N. Unlocking the Capabilities of Explainable Few-Shot Learning in Remote Sensing. Artif. Intell. Rev. 2024, 57, 169. [Google Scholar] [CrossRef]
Wang, D.; Hu, M.; Jin, Y.; Miao, Y.; Yang, J.; Xu, Y.; Qin, X.; Ma, J.; Sun, L.; Li, C.; et al. HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 6427–6444. [Google Scholar] [CrossRef]
Helber, P.; Bischke, B.; Dengel, A.; Borth, D. Introducing Eurosat: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification. In Proceedings of the IGARSS 2018—2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; IEEE: New York, NY, USA, 2018; pp. 204–207. [Google Scholar] [CrossRef]
Christie, G.; Fendley, N.; Wilson, J.; Mukherjee, R. Functional Map of the World. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; IEEE: New York, NY, USA, 2018; pp. 6172–6180. [Google Scholar] [CrossRef]
Sumbul, G.; Charfuelan, M.; Demir, B.; Markl, V. Bigearthnet: A Large-Scale Benchmark Archive for Remote Sensing Image Understanding. In Proceedings of the IGARSS 2019—2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; IEEE: New York, NY, USA, 2019; pp. 5901–5904. [Google Scholar] [CrossRef]
Haghighat, M.; Moghadam, P.; Mohamed, S.; Koniusz, P. Pre-training with random orthogonal projection image modeling. In Proceedings of the 12th International Conference on Learning Representations, ICLR 2024, Vienna, Austria, 7–11 May 2024. [Google Scholar]
Schmitt, M.; Ahmadi, S.A.; Xu, Y.; Taşkin, G.; Verma, U.; Sica, F.; Hänsch, R. There Are No Data Like More Data: Datasets for Deep Learning in Earth Observation. IEEE Geosci. Remote Sens. Mag. 2023, 11, 63–97. [Google Scholar] [CrossRef]
Roscher, R.; Russwurm, M.; Gevaert, C.; Kampffmeyer, M.; Dos Santos, J.A.; Vakalopoulou, M.; Hänsch, R.; Hansen, S.; Nogueira, K.; Prexl, J.; et al. Better, Not Just More: Data-centric Machine Learning for Earth Observation. IEEE Geosci. Remote Sens. Mag. 2024, 12, 335–355. [Google Scholar] [CrossRef]
Xiao, A.; Xuan, W.; Wang, J.; Huang, J.; Tao, D.; Lu, S.; Yokoya, N. Foundation Models for Remote Sensing and Earth Observation: A Survey. IEEE Geosci. Remote Sens. Mag. 2025, 13, 297–324. [Google Scholar] [CrossRef]
Sumbul, G.; de Wall, A.; Kreuziger, T.; Marcelino, F.; Costa, H.; Benevides, P.; Caetano, M.; Demir, B.; Markl, V. BigEarthNet-MM: A Large-Scale, Multimodal, Multilabel Benchmark Archive for Remote Sensing Image Classification and Retrieval [Software and Data Sets]. IEEE Geosci. Remote Sens. Mag. 2021, 9, 174–180. [Google Scholar] [CrossRef]
Wang, Y.; Braham, N.A.A.; Xiong, Z.; Liu, C.; Albrecht, C.M.; Zhu, X.X. SSL4EO-S12: A Large-Scale Multimodal, Multitemporal Dataset for Self-Supervised Learning in Earth Observation [Software and Data Sets]. IEEE Geosci. Remote Sens. Mag. 2023, 11, 98–106.
Braham, N.A.A.; Albrecht, C.M.; Mairal, J.; Chanussot, J.; Wang, Y.; Zhu, X.X. SpectralEarth: Training Hyperspectral Foundation Models at Scale. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 16780–16797.
Velazquez, D.; López, P.R.; Alonso, S.; Gonfaus, J.M.; Gonzalez, J.; Richarte, G.; Marin, J.; Bengio, Y.; Lacoste, A. EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision. In Proceedings of the 2025 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW), Tucson, AZ, USA, 28 February–4 March 2025; IEEE: New York, NY, USA, 2025; pp. 1138–1147.
Schmitt, M.; Ghamisi, P.; Yokoya, N.; Hänsch, R. EOD: The IEEE GRSS Earth Observation Database. arXiv 2022, arXiv:2209.12480.
Lowe, S.C.; Misiuk, B.; Xu, I.; Abdulazizov, S.; Baroi, A.R.; Bastos, A.C.; Best, M.; Ferrini, V.; Friedman, A.; Hart, D.; et al. BenthicNet: A Global Compilation of Seafloor Images for Deep Learning Applications. Sci. Data 2025, 12, 230.
Cox, S.C.; Smith Lyttle, B.; Elkind, S.; Smith Siddoway, C.; Morin, P.; Capponi, G.; Abu-Alam, T.; Ballinger, M.; Bamber, L.; Kitchener, B.; et al. A Continent-Wide Detailed Geological Map Dataset of Antarctica. Sci. Data 2023, 10, 250.
Jiang, A.; Meng, X.; Huang, Y.; Shi, G. Using Deep Learning and Multi-Source Remote Sensing Images to Map Landlocked Lakes in Antarctica. Cryosphere 2024, 18, 5347–5364.
Xie, H.; He, S.; Cheng, X. A Convolution Neural Network-based Method for Sea Ice Remote Sensing Using GNSS-R Data. In Proceedings of the 2022 4th International Conference on Communications, Information System and Computer Engineering (CISCE), Shenzhen, China, 27–29 May 2022; IEEE: New York, NY, USA, 2022; pp. 284–289.
Ali, S.; Wang, J. MT-IceNet - A Spatial and Multi-Temporal Deep Learning Model for Arctic Sea Ice Forecasting. In Proceedings of the 2022 IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT), Vancouver, WA, USA, 6–9 December 2022; IEEE: New York, NY, USA, 2022; pp. 1–10.
Ren, Y.; Li, X. Predicting the Daily Sea Ice Concentration on a Subseasonal Scale of the Pan-Arctic During the Melting Season by a Deep Learning Model. IEEE Trans. Geosci. Remote Sens. 2023, 61, 4301315.
Shen, X.; Ke, C.Q.; Li, H. Snow Depth Product over Antarctic Sea Ice from 2002 to 2020 Using Multisource Passive Microwave Radiometers. Earth Syst. Sci. Data 2022, 14, 619–636.
Marsocci, V.; Jia, Y.; Bellier, G.L.; Kerekes, D.; Zeng, L.; Hafner, S.; Gerard, S.; Brune, E.; Yadav, R.; Shibli, A.; et al. PANGAEA: A Global and Inclusive Benchmark for Geospatial Foundation Models. arXiv 2025, arXiv:2412.04204.
Ghamisi, P.; Yu, W.; Marinoni, A.; Gevaert, C.M.; Persello, C.; Selvakumaran, S.; Girotto, M.; Horton, B.P.; Rufin, P.; Hostert, P.; et al. Responsible Artificial Intelligence for Earth Observation: Achievable and Realistic Paths to Serve the Collective Good. IEEE Geosci. Remote Sens. Mag. 2025, 13, 72–96.
Wehner, H.; Dietz, A.; Kounev, S.; Kuenzer, C. Systematic Review of Satellite-Based Earth Observation Applications for Wildlife Ecology Research in Terrestrial Polar and Mountain Regions. Remote Sens. 2025, 17, 2780.
Klein, N.; Carr, A.; Hampel-Arias, Z.; Zastrow, A.; Ziemann, A.; Flynn, E. Hyperspectral Target Identification Using Physics-Guided Neural Networks with Explainability and Feature Attribution. In Proceedings of the IGARSS 2023 - 2023 IEEE International Geoscience and Remote Sensing Symposium, Pasadena, CA, USA, 16–21 July 2023; IEEE: New York, NY, USA, 2023; pp. 946–949.

Figure 1. Comparison of Antarctic landscapes to demonstrate the scale and resolution required to distinguish Antarctic vegetation. (a) Moss bed appears abundant up close, but lichens are relatively small, and moss health segmentation requires close attention. (b) Moss beds are surrounded by rocks and snow in the remote Antarctic landscape; moss health and lichens are practically impossible to distinguish. (c) The ortho-rectified patch, extracted from [9], illustrates the scale of similar moss beds.

Figure 2. Scales of remote sensing platforms. Each platform provides a range of spatial resolutions and survey altitudes, which must be selected to suit the research at hand.

Figure 3. Three-dimensional representation of hyperspectral imaging (HSI) data and an example of how spectral information can support material identification. Continuous, high-resolution spectral profiles enable the discrimination of features such as ice, soil, and vegetation.
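
One simple way to see how a spectral profile can support material identification is to compare a pixel spectrum against reference spectra using the spectral angle, a common spectral similarity measure. The sketch below is illustrative only: the five-band reflectance values, the class names in `REFERENCES`, and the helper names `spectral_angle` and `classify` are assumptions made for this example, not data or methods taken from the reviewed studies.

```python
import math

# Hypothetical 5-band reflectance signatures (illustrative values only).
REFERENCES = {
    "ice": [0.90, 0.88, 0.85, 0.60, 0.30],
    "soil": [0.10, 0.15, 0.20, 0.28, 0.35],
    "vegetation": [0.04, 0.08, 0.05, 0.45, 0.30],
}


def spectral_angle(a, b):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos_theta)


def classify(pixel, references=REFERENCES):
    """Return the reference label with the smallest spectral angle."""
    return min(references, key=lambda name: spectral_angle(pixel, references[name]))


# A pixel that is a dimmer (scaled) copy of the vegetation signature.
pixel = [0.5 * v for v in REFERENCES["vegetation"]]
label = classify(pixel)
```

Because the angle depends on the shape of the spectrum rather than its overall magnitude, a uniformly darker pixel still matches its reference, which is one reason angle-based matching is often used under variable illumination.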

Figure 4. Comparison of spectral resolution between hyperspectral (HSI) and multispectral (MSI) imagery. HSI sensors capture hundreds of narrow bands for a more continuous spectral profile, enabling enhanced segmentation maps. Concept adapted from [36].

Figure 5. Conceptual hierarchy of ML methods according to data needs and task flexibility. Generalisability typically scales with data dependence: foundation models can perform well when pretrained on large volumes of data, whereas traditional methods perform well with limited data but are often narrow or task-specific.

Figure 6. General architecture of a deep CNN applied to hyperspectral imaging (HSI). Instead of a standard RGB input, the model receives a pixel-wise hyperspectral vector. Stacked convolutional and pooling layers extract spectral features of increasing abstraction, compressing the input into a discriminative feature vector. A classifier with softmax activation outputs class probabilities for pixel-wise classification.
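
The data flow in this caption can be sketched as a forward pass over a single pixel spectrum. This is a minimal, untrained illustration under stated assumptions: it uses one convolution and pooling stage rather than a deep stack, random weights in place of learned ones, and made-up sizes (`N_BANDS`, `N_KERNELS`, `KERNEL_SIZE`, `N_CLASSES`). It is not the architecture of any specific model discussed in this review.

```python
import math
import random


def conv1d(signal, kernels):
    """Valid 1D cross-correlation; returns one feature map per kernel."""
    k = len(kernels[0])
    return [
        [sum(w * signal[i + j] for j, w in enumerate(kernel))
         for i in range(len(signal) - k + 1)]
        for kernel in kernels
    ]


def relu(feature_maps):
    return [[max(0.0, v) for v in fmap] for fmap in feature_maps]


def max_pool(feature_maps, size=2):
    """Non-overlapping max pooling along the spectral axis."""
    return [
        [max(fmap[i:i + size]) for i in range(0, len(fmap) - size + 1, size)]
        for fmap in feature_maps
    ]


def softmax(logits):
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(spectrum, kernels, dense_weights):
    """Conv -> ReLU -> pool -> flatten -> dense -> softmax for one pixel."""
    pooled = max_pool(relu(conv1d(spectrum, kernels)))
    features = [v for fmap in pooled for v in fmap]  # flattened feature vector
    logits = [sum(w * f for w, f in zip(row, features)) for row in dense_weights]
    return softmax(logits)


rng = random.Random(0)
N_BANDS, N_KERNELS, KERNEL_SIZE, N_CLASSES = 32, 4, 5, 3  # assumed toy sizes
spectrum = [rng.random() for _ in range(N_BANDS)]  # stand-in for one HSI pixel
kernels = [[rng.uniform(-1, 1) for _ in range(KERNEL_SIZE)] for _ in range(N_KERNELS)]
n_features = N_KERNELS * ((N_BANDS - KERNEL_SIZE + 1) // 2)
dense_weights = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(N_CLASSES)]
probabilities = forward(spectrum, kernels, dense_weights)
```

Here `probabilities` holds one value per class and sums to one; in practice the kernels and dense weights would be learned from labelled spectra, and several convolution and pooling stages would be stacked.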

Figure 7. General architecture of the standard Vision Transformer (ViT). The input image is divided into fixed-size patches, each linearly embedded and combined with positional information. A learnable classification token is prepended to the sequence and processed through stacked Transformer encoder layers. The final output is produced via a multilayer perceptron (MLP) head, yielding class probabilities.
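
The input stage described in this caption (patch splitting, linear embedding, class token, positional information) can be sketched as follows. This is a toy illustration under stated assumptions: a single-channel image whose sides are divisible by the patch size, random values standing in for learned parameters, and hypothetical helper names (`patchify`, `linear_embed`, `build_tokens`). The Transformer encoder layers and the MLP head are omitted.

```python
import random


def patchify(image, patch):
    """Split an H x W single-channel image into flattened patch x patch tiles.

    Assumes H and W are both divisible by `patch`.
    """
    height, width = len(image), len(image[0])
    return [
        [image[r + dr][c + dc] for dr in range(patch) for dc in range(patch)]
        for r in range(0, height, patch)
        for c in range(0, width, patch)
    ]


def linear_embed(vector, weights):
    """Project a flattened patch to the embedding dimension (no bias)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def build_tokens(image, patch, weights, cls_token, positions):
    """Embed patches, prepend the class token, then add positional embeddings."""
    embedded = [linear_embed(p, weights) for p in patchify(image, patch)]
    sequence = [cls_token] + embedded
    return [[t + p for t, p in zip(tok, pos)] for tok, pos in zip(sequence, positions)]


rng = random.Random(0)
SIZE, PATCH, DIM = 8, 4, 6  # assumed toy sizes
image = [[rng.random() for _ in range(SIZE)] for _ in range(SIZE)]
weights = [[rng.uniform(-1, 1) for _ in range(PATCH * PATCH)] for _ in range(DIM)]
cls_token = [rng.uniform(-1, 1) for _ in range(DIM)]
n_patches = (SIZE // PATCH) ** 2
positions = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(n_patches + 1)]
tokens = build_tokens(image, PATCH, weights, cls_token, positions)
```

The resulting `tokens` sequence has one entry for the class token plus one per patch, each of length `DIM`; this sequence is what the stacked encoder layers would process.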

Figure 8. Overview of foundation models for EO. The key benefit of foundation models for EO is their ability to ingest various types of data that can be used for a range of downstream tasks. The data types and subsequent tasks differ according to the model design.

Figure 9. Potential workflow integrating global EO data with Antarctic data for ML strategies to address domain-specific data scarcity. Combining multimodal EO data with domain-specific data may support deep learning strategies that depend on large volumes of data, while enabling greater potential for specific domain adaptation to Antarctic contexts that are often left out of large-scale EO foundation models. This pipeline could facilitate improved performance and applicability to more downstream tasks than traditional, task-specific methods.

Table 1. Comparison of platforms for Antarctic remote sensing. As spatial resolution increases, coverage typically decreases, so each platform is suited to different research contexts.

Platform	Spatial Resolution	Coverage and Accessibility	Key Characteristics
Satellite	Low to Moderate	High; global and repeatable	Broad spatial and temporal coverage; limited spatial detail; restricted polar revisit rates and cloud interference.
Manned aircraft	Moderate	Moderate to high; regional campaigns	High-quality data over targeted areas; supports heavy sensors; costly and logistically demanding; weather- and safety-dependent.
UAV	High	Moderate; local operations	Flexible deployment; very high spatial detail; constrained by battery life, payload, and environmental conditions.
Handheld Sensors	Very High	Low; point-based	Ground-truth precision; limited spatial extent; labour-intensive and weather-sensitive.

Table 2. Overview of data constraints in Antarctic remote sensing. Environmental and technical challenges directly impact the ability to collect imagery and ground-truth labels. These have subsequent effects on the applicability of ML methods according to their data requirements, reliance on quality labels, and computational overhead.

Constraint	Impact on Data	Implication for ML
Harsh weather, cloud cover	Incomplete or inconsistent imagery	Sparse, noisy input data
Short field seasons	Minimal opportunities to collect ground truth	Very limited labelled datasets
UAV battery, icing risk	Limited spatial coverage	Smaller datasets, high variation
Environmental regulations	Limited ground-based annotation	Lack of verified labels
Sensor trade-offs (HSI/MSI)	No sensor provides full resolution and coverage	Heterogeneous, unbalanced datasets

Table 3. Taxonomy of machine learning methods for remote sensing data. In this section, we discuss categories of ML methods that range in their applicability to data-scarce environments. While earlier approaches can perform well with fewer samples, they are more task-specific than large models that rely on large volumes of data for generalisability.

Category	Description
Section 4.1: Manual Feature Engineering and Rule-Based	Early approaches that rely on predefined indices, ratios, or spectral similarity measures. These methods use expert knowledge to design features and apply thresholding or matching rules for classification.
Section 4.2: Traditional Machine Learning	Algorithms that learn from handcrafted features to perform classification or clustering. They generally require smaller datasets and offer interpretability, but struggle with complex spectral–spatial patterns.
Section 4.3: Deep Learning	Data-driven methods that automatically learn hierarchical spectral–spatial features. This category includes CNNs, RNNs, and transformers, which offer high accuracy and flexibility but require larger training datasets.
Section 4.4: Physics-Based and Prior-Based Methods	Approaches grounded in physical laws and domain knowledge. They exploit physical principles to invert surface parameters, simulate spectral responses, or guide loss functions.
Section 4.5: Foundation Models	Large pretrained models designed for generalisation across tasks, domains, and modalities. They leverage massive datasets and can be adapted to new applications with minimal labelled data.

Table 4. Summary of large-scale datasets discussed in Section 5.1. The abundance of satellite data has enabled these datasets to reach impressive scales, with geographic coverage spanning the globe. However, some are limited to European climates or urban areas, while all have minimal to no polar coverage.

Dataset	Key Features	Limitations/Relevance to Antarctic Work
fMoW (Functional Map of the World) [81]	1M+ satellite images; global RGB and multispectral coverage; labelled for land use/object detection across 63 categories.	Excellent geographic diversity but no Antarctic imagery; biased toward urban and temperate regions.
EuroSAT [80]	Sentinel-2 multispectral imagery with 13 bands; 27k labelled patches for land-use/land-cover classification.	Limited to European climates; lacks high-latitude or polar scenes.
BigEarthNet/BigEarthNet-MM [82,87]	590k Sentinel-based image pairs (S1 + S2); supports multimodal and multi-label learning.	Europe-centric; no time-series data; minor sensor artefacts; minimal relevance for Antarctic domains.
SSL4EO-S12 [88]	Self-supervised dataset (∼3M images) from Sentinel-1/2; covers multiple seasons and tasks.	Sparse polar coverage; urban sampling bias; limited sensor/modal diversity.
SSL4EO-L [30]	Landsat-based SSL dataset (>1M patches) supporting long-term temporal analysis.	Minimal polar coverage and therefore limited Antarctic relevance.
HyperGlobal-450K [79]	∼450k hyperspectral images (EO-1 + GF-5); cloud-filtered with 150–175 bands.	Regionally biased toward China; limited temporal continuity.
SpectralEarth [89]	500k+ EnMAP hyperspectral patches (202 bands); partial 2022–2024 time-series data.	Sparse polar coverage; limited multi-year time depth.
EarthView [90]	Multimodal global dataset (SAR, RGB, NIR, HSI, elevation); designed for self-supervised learning.	Urban-area focus; minimal to no Antarctic representation.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

