1. Introduction
Oceanic islands are the sub-aerial summits of submarine volcanoes (seamounts) that rise from the ocean floor, typically from depths of 1000–4000 m below sea level [1]. When such volcanoes are active, dynamic phenomena such as magmatic outflow and violent explosions may occur. In these scenarios, seamounts can emerge above sea level to form visible islands, whose growth and collapse may occur within a few weeks or months. The volcanic activity of seamounts poses significant threats; maritime traffic, aviation, and residents are alerted using a 4-level scale that defines the minimum distance to be kept from the island [2].
On existing ocean islands, one premonitory sign of volcanic unrest is ground uplift, which manifests as an increase in sub-aerial land surface. Such morphological changes can be effectively observed and monitored using satellite instruments. The objective of this study is to develop an automatic method for detecting and monitoring changes on oceanic islands, a capability that could become a vital component of a global volcanic hazard early warning system. The short revisit time of satellite imagery (5 days for the Sentinel-2 constellation) has the potential to enable the issuing of warnings for navigation, where knowledge of the location of shoals, islands, and volcanic hazards is of primary importance.
Existing literature already provides automated deformation (InSAR) [3] and thermal anomaly [4] monitoring. However, these systems are optimized for continental volcanoes and large volcanic centers. Small oceanic islands present unique technical challenges, such as mixed land–sea pixels, tidal and sea-level effects, and rapid coastal morphology changes. To address these limitations, this work develops a near-real-time approach tailored to detecting the emergence and evolution of seamounts and volcanic islands, taking advantage of the 10 m spatial resolution of Sentinel-2 imagery in combination with land–water segmentation. The utility of Sentinel-2 for mapping shoreline changes has been demonstrated in several recent studies [5,6].
In this study, we present an automated workflow for monitoring the emergence and morphological evolution of volcanic islands in the Tonga Archipelago. The procedure analyzes Sentinel-2 multispectral imagery to extract land–water boundaries via convolutional neural network (CNN) semantic segmentation, and applies a custom change detection module to identify land changes between pairs of images. To show the potential of our methodology, we analyze multi-year data focused on Metis Shoal and Home Reef islands, belonging to the Kingdom of Tonga. We also test the procedure over a polygonal area of 24,000 km2 in the Tonga Archipelago, to assess operative feasibility on a larger scale.
2. Related Work
Land–water segmentation of remote sensing multispectral images has been a topic of research for many years. Among the simplest methods are thresholding techniques based on indices such as NDWI (Normalized Difference Water Index) or NDVI (Normalized Difference Vegetation Index) [7]. The formula for NDWI is
NDWI = (Green − NIR) / (Green + NIR),
where Green and NIR represent a Green and a Near Infrared band, respectively. NDWI is sufficient in a number of scenarios; however, it has important limitations hindering larger-scale applications. It is overly sensitive to vegetation and is inaccurate in the presence of shadows. Moreover, optimal thresholds vary greatly between regions, as well as with weather and light conditions [8]. NDVI is expressed similarly to NDWI, as the normalized difference between a Red and a Near Infrared band. Although primarily used to detect the presence of vegetation, it can also be applied to distinguish land from water, but it suffers from limitations similar to those of NDWI. Learning-based algorithms provide better solutions for the segmentation of remote sensing images; in recent years, deep learning methods have become an active research topic in this field.
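To make the index-based approach concrete, NDWI thresholding can be sketched in a few lines. The threshold value and toy arrays below are illustrative; in practice, the reflectances would come from Sentinel-2 bands B3 (Green) and B8 (NIR), and the threshold would need per-region tuning as discussed above.

```python
import numpy as np

def ndwi_water_mask(green, nir, threshold=0.0):
    """Classify pixels as water (True) where NDWI exceeds a threshold.

    `green` and `nir` are float arrays of reflectance values. The
    default threshold of 0.0 is a common starting point, but it must
    be tuned per region and per acquisition conditions.
    """
    ndwi = (green - nir) / (green + nir + 1e-10)  # small eps avoids /0
    return ndwi > threshold

# Toy example: water absorbs NIR (high NDWI), land reflects it (low NDWI).
green = np.array([[0.10, 0.08], [0.12, 0.05]])
nir   = np.array([[0.30, 0.02], [0.25, 0.01]])
mask = ndwi_water_mask(green, nir)
```

The hard-coded threshold is exactly the weakness noted above: a value that separates land from water in one scene may fail under different lighting or turbidity, which motivates the learning-based approaches discussed next.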
Wieland et al. [9] compared the use of U-Net [10] and DeepLabV3+ [11] with different choices of encoder backbones (MobileNet-V3, ResNet-50, and EfficientNet-B4), applied to water body segmentation of RGB images captured by satellite and airborne cameras. The study suggested U-Net with MobileNet-V3 as the best architecture, based on their test data. They observed that the addition of an Infrared band, if available, slightly improved the accuracy. Sun et al. [12] proposed a model architecture based on DeepLabV3+, with a new fusion mechanism for high- and low-level features. In their network—which is intended for segmentation of lakes and rivers—they re-designed the Atrous Spatial Pyramid Pooling (ASPP) module to improve discrimination between adjacent objects of similar colors.
In 2018, Hu et al. [13] proposed a novel residual network block architecture, named Squeeze-and-Excitation (SE) blocks. These adjust channel-wise feature responses by capturing and leveraging the relationships between different channels. The Squeeze mechanism tackles the filters’ limited receptive fields by pooling global information into a channel descriptor. This descriptor is used in the Excitation step via a gating mechanism. SE blocks can be added to a residual network, helping it focus on specific regions based on channel dependencies. Zhang et al. [14] applied an SE Residual Network (SE-ResNet) to land cover segmentation of high-resolution remote sensing images, showing it to be more powerful than other networks such as U-Net, ResNet50, and DeepLabV3+. SE-ResNet can also be employed as an encoder backbone for U-Net.
A different attention mechanism was introduced by Oktay et al. [15]. In their architecture, called Attention U-Net, they employ Attention Gates (AG), which learn to suppress irrelevant regions of the image. This unit is able to focus on target objects of different shapes and scales. The authors integrated AGs into the U-Net’s skip connections, for finer control over the information flow between encoder and decoder. Their tests showed that Attention U-Net outperformed standard U-Net in medical image multiclass segmentation over different datasets and training sizes. More recently, Ghaznavi et al. [16] compared the performance of a simple U-Net, Attention U-Net, and a U-Net with VGG16 encoder backbone, with the objective of extracting inland water bodies from RGB satellite images. For this goal, VGG16 U-Net achieved the highest accuracy scores, though Attention U-Net was very close.
Hybrid approaches to model architectures are also possible. For example, Cui et al. [17] propose a modified U-Net model, with a CNN-based encoder and a decoder based on the Mamba architecture [18]. CNN–transformer hybrids have also been explored [19], where a convolutional encoder is combined with transformer-based modules, enabling the model to capture both local spatial detail and long-range semantic dependencies.
4. Methods
The proposed workflow accesses Sentinel-2 Level-1C imagery through the GEE API. The user-defined ROI is subdivided into 256 × 256 pixel tiles, which are processed iteratively. For each tile, the two most recent cloud-free images are retrieved, and land–water segmentation is applied. A dedicated change detection module then evaluates whether significant changes in subaerial land area have occurred between the two images.
Figure 2 illustrates this Section’s structure.
4.1. Segmentation
4.1.1. Segmentation Models
We use a U-Net-type model for segmenting multispectral images. U-Net is one of the most widely used models for semantic segmentation and is considered a benchmark among computer vision models. It is usually applied to gray-scale or RGB images, but can be extended to work with multi-channel images. U-Net is a Fully Convolutional Neural Network (FCN), based on an encoder-decoder structure. The encoder is a typical convolutional network consisting of a series of convolutions, each followed by a Rectified Linear Unit (ReLU) and a max pooling operation. Through this downsampling path, the image’s size is decreased while its number of channels is increased. In other words, spatial information is reduced, and feature information is increased, helping the network learn dense image features and capture context. In the symmetric decoder block, pooling operations are replaced by up-sampling operations, reconstructing the desired resolution. Their output is concatenated with features from the corresponding encoder layer, through so-called skip connections. Skip connections are one of U-Net’s defining features, and are crucial in recovering spatial details that would be lost due to downsampling. Different backbone architectures can be inserted as the encoder.
For this study, the models tested are U-Net with ResNet34 backbone, U-Net with SE-ResNet50 backbone, and Attention U-Net with ResNet101V2 backbone. The last two models feature attention mechanisms, as explained in Section 2. ResNetV2 [40] is a variation on ResNet [41], which changes the order of operations within the residual blocks, improving the way data flows through the network. This facilitates the training of very deep networks by allowing gradients to flow more easily during backpropagation. In this work, the backbone is a lightweight version of the full ResNet101V2, which shares the same design principles while cutting down on model complexity.
A diagram of the adopted Attention U-Net architecture is shown in Figure 3. The structure of U-Net with SE-ResNet50 backbone is similar to that of U-Net, but the plain encoder is replaced by an SE-ResNet50 encoder, adding residual blocks and Squeeze-and-Excitation blocks. All models used have an encoder depth of 5, although the filter sizes differ between the ResNet34/SE-ResNet50 variants and Attention U-Net. The number of trainable parameters for U-Net with ResNet34 backbone, U-Net with SE-ResNet50 backbone, and Attention U-Net is 24.5 million, 35.1 million, and 36.8 million, respectively.
4.1.2. Training Setup and Metrics
The models are implemented in Python using the Keras framework [42]. The U-Net implementation is taken from the segmentation_models library (version 1.0.1) [43], while Attention U-Net is taken from the keras-unet-collection library (version 0.1.13) [44]. Training uses early stopping to prevent overfitting (keras.callbacks.EarlyStopping), with a patience of 40 or 50 epochs and a warm-up stage of 20 epochs. A second callback (keras.callbacks.ModelCheckpoint) saves the best model in real time, which allows more models to be retained than just the best overall. In both callbacks, the monitored metric is the validation loss. A batch size of 16 is used.
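A minimal sketch of this callback configuration follows. The checkpoint filename pattern and epoch count are illustrative assumptions; `start_from_epoch` (available in recent Keras versions) is one way to implement the warm-up stage described above.

```python
from tensorflow import keras

# Early stopping on validation loss with a patience of 40 epochs;
# start_from_epoch delays monitoring for the 20-epoch warm-up stage.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=40, start_from_epoch=20)

# Save the model whenever validation loss improves; the epoch-numbered
# filename (illustrative) keeps more checkpoints than just the single
# best overall.
checkpoint = keras.callbacks.ModelCheckpoint(
    "unet_epoch{epoch:03d}.keras", monitor="val_loss",
    save_best_only=True)

# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           batch_size=16, epochs=1000,
#           callbacks=[early_stop, checkpoint])
```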
To assess the relative contribution of individual spectral bands and vegetation/water indices to model predictions, we apply a gradient-based feature importance analysis. Specifically, we compute the channel-wise mean absolute gradient of the model output with respect to each input band, a method commonly used to approximate input sensitivity in deep learning models [45,46]. To ensure comparability across bands and models, we normalize the feature importance vectors from each trained model to unit norm. We train four instances of a baseline U-Net model with a ResNet34 backbone, and compute a normalized importance vector for each. We then average these vectors to obtain a more robust estimate of per-band importance (Figure 4).
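The idea of channel-wise input sensitivity can be illustrated with a simplified stand-in that approximates per-channel gradients by central finite differences on a toy linear model. This is only a sketch: the actual analysis differentiates the trained U-Net with respect to its inputs (e.g., via tf.GradientTape), and the model, weights, and batch below are illustrative.

```python
import numpy as np

def channel_importance(model_fn, batch, eps=1e-4):
    """Approximate per-channel sensitivity of a scalar-output model by
    central finite differences: each channel is shifted up/down and the
    output difference is measured. A numpy stand-in for the
    gradient-based computation used with the trained network."""
    imp = np.zeros(batch.shape[-1])
    for ch in range(batch.shape[-1]):
        plus, minus = batch.copy(), batch.copy()
        plus[..., ch] += eps
        minus[..., ch] -= eps
        imp[ch] = abs(model_fn(plus) - model_fn(minus)) / (2 * eps)
    return imp / np.linalg.norm(imp)  # normalize to unit norm

# Toy linear "model" that weights channels unequally; the recovered
# importance vector should mirror the weights.
weights = np.array([3.0, 1.0, 0.0])
model_fn = lambda x: float((x * weights).sum())
batch = np.random.default_rng(0).random((2, 4, 4, 3))
imp = channel_importance(model_fn, batch)
```

Here the zero-weight channel receives zero importance and the remaining channels are ranked in proportion to their weights, mirroring how low-sensitivity Sentinel-2 bands can be identified for exclusion.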
The resulting mean feature importance vector is used to guide the selection of a compact yet spectrally informative subset of Sentinel-2 bands. Specifically, we select all bands except B6, B7, B8A, and NDVI, for a total of 8 bands. This choice is made to retain the full core spectral range of Sentinel-2, while minimizing redundancy from adjacent or low-importance channels. B6, B7, and B8A are excluded due to their lower relative importance in the gradient-based analysis, whereas B5 and B8, which fall in the same VNIR spectral region, are retained. NDVI is excluded from the selected subset due to its limited relevance to the primary task of island segmentation. In contrast, NDWI, which emphasizes land–water boundaries, is retained due to its higher feature importance and direct relevance to distinguishing coastal and aquatic regions.
A weighted Binary Cross-Entropy (BCE) loss is adopted to improve segmentation of small volcanic features such as newly emerged seamounts. These create challenges in segmentation due to their small size and their spectral similarity with water. Such thematic features are collected in a dataset (see Section 3) to monitor the performance of models on this important task. Weight maps are constructed such that all pixels on small islands receive a weight of 8, while in other land regions, only shoreline pixels are up-weighted to 5 and all others retain a weight of 1. Shoreline pixels are defined as land pixels whose square neighborhood includes at least one water pixel. The loss function is calculated by element-wise multiplication of the BCE loss matrix with the weight map matrix.
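The element-wise weighting can be sketched in numpy as follows (in training this would be implemented as a Keras loss operating on tensors; the arrays below are illustrative).

```python
import numpy as np

def weighted_bce(y_true, y_pred, weights, eps=1e-7):
    """Weighted binary cross-entropy: the per-pixel BCE matrix is
    multiplied element-wise by the weight map, then averaged.
    Small-island pixels (weight 8) and shoreline pixels (weight 5)
    thus contribute more to the loss than ordinary pixels (weight 1)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    bce = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(weights * bce))
```

A mistake of the same magnitude is penalized eight times more heavily on a small island than in open water, which is exactly the behaviour the weight maps are designed to induce.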
In order to evaluate the performance of the models under consideration, we adopt the following performance metrics: Precision, Recall, F1 score, Intersection-over-Union score (IoU), and Cohen’s kappa. Precision is the fraction of relevant instances among the retrieved instances, while Recall is the fraction of relevant instances that have been retrieved. Using True Positives (TPs), False Positives (FPs), and False Negatives (FNs), these are formulated as follows:
Precision = TP / (TP + FP), Recall = TP / (TP + FN).
The F1 score is the harmonic mean of Precision and Recall:
F1 = 2 · Precision · Recall / (Precision + Recall).
This provides a balanced view of segmentation accuracy, which is useful in the case of imbalanced class distributions such as in our data.
IoU (or Jaccard index) is used in different fields to measure the similarity of sample sets. In our discussion, IoU is scaled by 100, and is defined for two sets A and B as follows:
IoU(A, B) = 100 · |A ∩ B| / |A ∪ B|,
where |S| denotes the cardinality of set S. In our case, IoU is associated with the land class, and therefore measures the proportion of land overlap relative to the total land area, over two images. For validating the segmentation models, we consider IoU between the predicted segmentation map and the ground truth mask. The closer this value is to 100, the more accurate the segmentation.
Finally, Cohen’s kappa quantifies the agreement between predicted and reference labels while accounting for chance agreement. It ranges from −1 (complete disagreement) to 1 (perfect agreement), with 0 indicating agreement equivalent to random chance. Its formulation is
κ = (p_o − p_e) / (1 − p_e),
where p_o is the observed agreement, i.e., the fraction of pixels where the predicted and reference classes match, and p_e is the expected agreement by chance, computed from the marginal distributions of the two classifications.
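All five metrics can be computed directly from the binary confusion counts; a sketch:

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Land-class Precision, Recall, F1, IoU (scaled by 100), and
    Cohen's kappa from two binary masks (1 = land, 0 = water)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    tn = np.sum(~pred & ~truth)
    n = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = 100 * tp / (tp + fp + fn)
    p_o = (tp + tn) / n  # observed agreement
    # chance agreement from the marginal land/water frequencies
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (p_o - p_e) / (1 - p_e)
    return precision, recall, f1, iou, kappa
```

Note that Precision, Recall, F1, and IoU here are computed for the land class only, consistent with the evaluation choice discussed below.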
All metrics, except for Cohen’s kappa, are calculated relative to the land class. As shown in Section 3.3, land pixels represent a smaller fraction of each image than water. Measuring performance over the water class—for example, using Overall Accuracy or mean IoU—would inflate scores and mask errors in detecting land. Focusing on the land class therefore provides a more meaningful assessment of the model’s ability to accurately delineate island regions, which is the primary objective of the segmentation task.
4.2. Automation Framework
4.2.1. Tiling
The region of interest is tessellated into tiles, which are then passed to the trained segmentation model. Optionally, the set of tiles can be duplicated with an offset of 128 pixels in both directions and both orientations, introducing overlap and improving the accuracy of segmentation (areas near the edges of a tile are more susceptible to errors, as part of the context is missing). We thus define a “primary” and a “secondary” set of tiles. The resulting set of tiles is then used to analyze the region of interest.
The rule applied for gathering overlapping segmentation maps consists of updating the full segmentation map via a logical OR operation. In this way, each pixel is 0, unless any of the segmentation maps has a value 1 for that pixel. This choice is aimed at mitigating border effects in which land area is often underestimated when occurring across adjacent tiles.
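The OR-merge rule can be sketched as follows (tile offsets and shapes are illustrative):

```python
import numpy as np

def merge_tile_masks(full_shape, tiles):
    """Combine per-tile binary segmentation maps into a full map via
    logical OR: a pixel is land (True) if any overlapping tile says so.

    `tiles` is a list of (row, col, mask) with top-left pixel offsets."""
    full = np.zeros(full_shape, dtype=bool)
    for r, c, mask in tiles:
        h, w = mask.shape
        full[r:r + h, c:c + w] |= mask.astype(bool)
    return full
```

Because each pixel only needs one positive vote, land detected near a tile edge in the offset (secondary) tile survives even if the primary tile missed it, which is the border-effect mitigation described above.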
4.2.2. Image Selection and Cloudiness Filter
Cloudy pixels are identified using the Cloud Score+ Sentinel-2 product (see Section 3.2). To filter out cloudy images, the percentage of cloudy pixels within a region of interest is computed and compared to a defined cloud cover threshold. In our methodology, we apply this filter to each tile while traversing the region, thereby selecting for each tile the most recent clear image.
4.2.3. Preprocessing
Co-registration of images is not feasible in this work due to the lack of reference features in open ocean settings. We instead rely on the positional accuracy of the published Sentinel-2 products. Preprocessing of the images for the change detection workflow requires resampling all bands to 10 m grid cells, which is done by bilinear interpolation of the four nearest pixels. In the GEE Python API, this logic is encapsulated in the ee.data.computePixels function.
4.3. Change Detection
4.3.1. Conceptual Formulation
To detect change between two images at different times, we consider the two corresponding segmentation maps, which are combined to produce a change mask. We should keep in mind that when more land is present (in particular with longer shorelines), more absolute change is inherent, mostly due to tidal effects. On the other hand, less absolute change is expected when dealing with very small land cover. It seems logical, then, to count the number of changed pixels and perform some normalization step. Two options are:
Normalize the number of changed pixels relative to the size of the landmass, providing consistency and sensitivity across varying land areas.
Calculate IoU for class land between the two segmentation maps and set a threshold for evaluation. IoU represents the proportion of land overlap relative to the total area classified as land: a measure of landmass similarity between the images.
We choose to use IoU thresholding as the change detection method. It is preferred because of its inherent normalization and compactness. Accordingly, we define the change detection criterion between two segmentation maps A and B as follows:
change ⇔ IoU(A, B) ≤ T,
where T is a threshold value.
By looking at IoU results from the segmentation of sample images, we observe that smaller islands often have relatively low IoU values even when their real shape is unchanged from one image to the next. These cases can be attributed to model behaviour, which is sensitive to contextual features such as brightness, color, and boundary information, e.g., the amount of ocean break (waves breaking near and crashing on the shore). In several cases, the segmentation maps for images of Metis Shoal shifted the island by a short distance or slightly reduced or increased its scale. Since segmentation is less consistent for smaller islands, we opt to model the IoU threshold as an increasing function of the land area per single tile.
The threshold is defined by the parametric logarithmic function
T(x) = A · log(1 + B · x),
where x represents the land area in pixels. With this choice of function, the threshold when no land is present is 0, in which case any appearance of land will give a positive change classification (since both IoU and threshold will be zero). In cases where land is present in the earlier of the two images, the threshold value depends on the parameters A and B.
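A sketch of the resulting criterion is given below. The logarithmic form follows the constraints stated above (T(0) = 0, increasing in x); the parameter values A and B used here are illustrative placeholders, not the fitted values.

```python
import numpy as np

def iou_land(map_a, map_b):
    """Land-class IoU (scaled by 100) between two binary maps; two
    all-water maps count as perfect agreement."""
    a, b = map_a.astype(bool), map_b.astype(bool)
    union = np.sum(a | b)
    return 100.0 if union == 0 else 100.0 * np.sum(a & b) / union

def threshold(x, A=10.0, B=0.01):
    """Logarithmic threshold T(x) = A*log(1 + B*x). T(0) = 0, so any
    newly appearing land is flagged. A and B are illustrative here."""
    return A * np.log1p(B * x)

def change_detected(map_a, map_b):
    """Change criterion: IoU at or below the area-dependent threshold,
    with the land area x taken from the earlier image (map_a)."""
    x = np.sum(map_a.astype(bool))
    return bool(iou_land(map_a, map_b) <= threshold(x))
```

With no land in the earlier image, both IoU and threshold are zero, so any emergence is flagged; a stable island yields an IoU near 100, well above the threshold.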
These are set by fitting T(x) to a synthetic dataset, consisting of 700 automatically generated pairs of island masks, annotated with land area x, IoU, and a manually assigned change label indicating the presence or absence of meaningful landmass change. Synthetic islands are generated by growing land pixel by pixel from a random seed, adding pixels adjacent to existing land to form contiguous regions. After a fixed number of additions, enclosed water areas are filled to complete the landmass. Changes are introduced via morphological operations (dilation, erosion), addition of new land, or removal of existing land. Labels are assigned through visual inspection of each pair. Sixty-three data points related to real cases of volcanic islands were added as well. The threshold model was trained using a cross-entropy loss with a sharp sigmoid applied to T(x) − IoU, treating cases where IoU ≤ T(x) as indicative of change. The sigmoid sharpness parameter k was tuned by maximizing the F1 score over the training data, resulting in the optimal value reported in Figure 5.
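The fitting step can be sketched with a simple grid search in place of a gradient-based optimizer. The grid ranges, sharpness k, and synthetic data points below are illustrative assumptions; only the loss structure (a sigmoid of T(x) − IoU matched to the change labels via cross-entropy) follows the description above.

```python
import numpy as np

def fit_threshold(x, iou, labels, k=0.5):
    """Grid-search A, B minimizing the cross-entropy between the
    predicted change probability sigmoid(k*(T(x) - IoU)) and the
    binary change labels. A sketch of the fitting procedure."""
    best, best_loss = None, np.inf
    for A in np.linspace(1, 30, 30):
        for B in np.logspace(-4, 0, 30):
            t = A * np.log1p(B * x)                       # T(x)
            p = 1.0 / (1.0 + np.exp(-k * (t - iou)))      # P(change)
            p = np.clip(p, 1e-9, 1 - 1e-9)
            loss = -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))
            if loss < best_loss:
                best, best_loss = (A, B), loss
    return best
```

On a toy dataset where small, low-IoU islands are labeled as changed and large, high-IoU islands are not, the fitted curve separates the two groups.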
4.3.2. Algorithmic Implementation
The main component of the methodology involves comparing two images taken at different times to determine if a change has occurred. The size of the images may vary if overlaps are adopted.
Figure 6 shows the flowchart for the change detection algorithm.
The boolean output variable is named change. The algorithm segments both images using a trained segmentation model and calculates the IoU for class 1 (land) from the segmentation maps. The threshold for IoU is calculated via the logarithmic function defined above, modeled as a function of the total land area in the first image under the assumption that the image size is exactly 256 × 256 pixels. To apply this function to images of different sizes (if overlaps are adopted), we calculate the average land area per 256 × 256 tile by dividing the total land area by the coverage in terms of single tiles, i.e.,
x̄ = 256² · L / P,
where x̄ is the average land area per tile, L is the total number of land pixels (across all tiles), and P is the total number of unique pixels covered by all tiles. We can now compare IoU with the threshold:
(a) if IoU is smaller than the threshold, a possible change is detected. To verify if this is due to cloud presence, we calculate the percentage of pixels with detected change that are cloudy in either image, using the CloudScore+ dataset. If this percentage exceeds 25%, the detected change is taken as dubious, i.e., a possible false positive due to cloud artifacts, and we conservatively set change to False. If the percentage is below 25%, we accept the IoU as indicating a change.
(b) If IoU is greater than the threshold, we initially set change to False. Optionally, an additional step can be performed to detect small but significant morphological changes. For instance, we want to be able to reject normal tidal effects, but detect the emergence of a small land mass next to an already existing one. Both these events could produce a high IoU value, and be rejected by the steps described above.
To perform this check, we create a change mask from the segmentation maps, where the mask has a value of 0 where the maps are identical and 1 otherwise. This mask is then downsampled by a factor of 16, resulting in an image with pixel values in the range [0, 1] and a pixel size of 160 m (from the original 10 m). We search for pairs of adjacent pixels with high values (greater than 0.8), which indicate significant changes over areas of 51,200 m² (0.0512 km²). If such changes are identified, we perform the cloudiness verification step again to rule out cloud artifacts. If this test is passed, change is set to True.
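This optional verification can be sketched as follows (the cloudiness re-check is omitted; the factor-16 block averaging and the 0.8 level follow the description above):

```python
import numpy as np

def significant_small_change(map_a, map_b, factor=16, level=0.8):
    """Detect small but dense morphological changes: build the binary
    change mask, block-average it by `factor` (10 m -> 160 m pixels),
    then look for a pair of adjacent coarse pixels both above `level`."""
    change = (map_a != map_b).astype(float)
    h, w = change.shape
    trimmed = change[:h - h % factor, :w - w % factor]
    coarse = trimmed.reshape(h // factor, factor,
                             w // factor, factor).mean(axis=(1, 3))
    dense = coarse > level
    # adjacency check: horizontally or vertically neighboring pairs
    horiz = dense[:, :-1] & dense[:, 1:]
    vert = dense[:-1, :] & dense[1:, :]
    return bool(horiz.any() or vert.any())
```

Sparse, scattered disagreements (such as tidal flicker along a shoreline) average out below the level, while a compact new landmass concentrates changed pixels into adjacent coarse cells.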
4.4. Regional Monitoring Algorithm
We can now describe the main algorithm for monitoring a user-defined region for changes in real time. The procedure can be executed at regular time intervals, as new images become available. The set of primary tiles covering the ROI is looped through, iteratively applying the change detection module described in Section 4.3, which fetches the two latest cloud-free images and assigns a positive or negative value of change to each visited tile. Results are written to a log file, in the minimal form of the following:
tile number: left to right, bottom to top,
coordinates: top-left coordinates for this tile,
last analyzed image: date of the last image that was analyzed during a run, for this tile,
change: whether change was detected in this tile during the last run.
Other data from the run can be saved, for the purpose of analyzing the results.
We assume in the following that tiling includes overlapping “secondary” tiles. The algorithm iterates over the set of primary tiles. For each tile, we fetch a collection of images, filtering by a chosen cloudy pixel percentage threshold, and obtain the two latest images. By reading the date of the last analyzed image in the log file, we verify whether newer images are available. If none are, the tile is either skipped, or the cloudiness filter threshold is raised until a new image is found. If a new image is available, information from all tiles intersecting the central tile is included, meaning that the central tile is joined with all intersecting tiles in the secondary set, using the same Sentinel-2 image. Then, the change detection procedure described above is applied, producing a True/False change value. Results for this tile are written to the log file, and we move to the next primary tile, until all have been visited.
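The minimal per-tile log record can be sketched as a CSV writer (field names and the coordinate convention are illustrative; any append-only tabular format would serve):

```python
import csv

def write_log(path, records):
    """Append per-tile monitoring results in the minimal form described
    above: tile number, top-left coordinates, date of the last analyzed
    image, and the change flag. A header is written once, on creation."""
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(
            f, fieldnames=["tile", "lon", "lat", "last_image", "change"])
        if f.tell() == 0:
            writer.writeheader()
        writer.writerows(records)
```

On each run, the monitoring loop reads the `last_image` date back from this file to decide whether a newer acquisition is available for the tile.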
4.5. Parallelization
To parallelize the code, we adopted a CPU-based strategy using Python’s multiprocessing module. This approach is well-suited to our workflow, since the data can be subdivided into independent tiles that are processed without interdependence, making task parallelism on multi-core CPUs efficient and straightforward to implement. This also provides a scalable solution to be deployed on a high-performance computing (HPC) cluster, for rapid testing of the procedure. GPU acceleration should be sought for further optimization.
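The tile-level task parallelism can be sketched with multiprocessing.Pool. Here process_tile is a placeholder for the actual fetch–segment–compare pipeline, and its dummy output exists only to make the sketch self-contained:

```python
from multiprocessing import Pool

def process_tile(tile_id):
    """Placeholder for the per-tile change detection pipeline (fetch the
    two latest clear images, segment, compare). Tiles are independent,
    so they map cleanly onto worker processes. The dummy change flag
    below is for illustration only."""
    return tile_id, tile_id % 7 == 0

def run_region(tile_ids, workers=4):
    """Distribute tiles over a pool of worker processes and collect
    {tile_id: change} results."""
    with Pool(processes=workers) as pool:
        return dict(pool.map(process_tile, tile_ids))
```

Because each tile carries its own image fetches and inference, the workload is embarrassingly parallel, and the same mapping scales from a multi-core workstation to an HPC node by changing the worker count.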
The procedures were run either locally (when focused on a single tile) or on the Demetra HPC cluster of the University of Trieste’s Department of Mathematics, Informatics and Geosciences. Analysis was run using a DELL PowerEdge R7525 server equipped with two AMD EPYC 7542 processors (32 cores each), 768 GB of RAM, 5.6 TB local storage, and two NVIDIA A100 GPUs. In our case, only CPU resources were used for analysis.
6. Discussion
This work proposes an automatic procedure for the detection of new volcanic islands in the Tonga archipelago region, and the monitoring of their surface through time. At the core of the work is a U-Net type convolutional neural network for semantic segmentation of 256 × 256 pixel Sentinel-2 images. While simple methods for land–water segmentation—such as NDWI thresholding—exist, these are often inadequate. Convolutional neural networks represent a more principled choice of architecture for this problem, as they are able to leverage the contextual information contained in each pixel’s surroundings.
The models explored are variants of the well-known U-Net architecture, introduced in Ronneberger et al. [10]. The two best models obtained are U-Net with SE-ResNet50 backbone and Attention U-Net with ResNet101V2 backbone. Both architectures include attention mechanisms, which direct the model to focus on relevant image features. In SE-ResNet50, Squeeze-and-Excitation (SE) blocks calibrate channel-wise feature responses by explicitly modelling interdependencies between channels. In the case of Attention U-Net, Attention Gates (AG) are inserted in skip connections, and learn to suppress irrelevant portions of the image. Both models have similar complexity in terms of the number of trainable parameters.
The models were trained on a dataset that we collected using the GEE Python API. The dataset contains numpy arrays of 424 12-channel Sentinel-2 Level 1C images, with their respective ground truth arrays and weight map arrays. Ground truth arrays were manually annotated by identifying land and water regions within the images. Weight maps were applied in the calculation of a weighted Binary Cross-Entropy loss, which targets the issues specific to the set task and geography. The models were trained using a selection of the available channels, with the addition of NDWI, based on a feature importance test.
The trained models were compared on the Test set and on a subset of the whole dataset consisting of small volcanic island images. The IoU accuracy scores were good overall, but we preferred Attention U-Net for its overall edge in performance, especially on the Small Islands set. Less consistency was observed when dealing with very small islands such as Metis Shoal. Barrier reefs, typical of the Tonga region, sometimes caused nearby pixels to be classified as land, most likely due to the presence of an ocean break. These artifacts are functionally reasonable as they are associated with features typical of the land–water interface.
Change detection between two images was defined using the segmentation masks. Our proposed approach involves calculating the IoU of the two segmentation maps and determining whether it falls below a certain threshold. The threshold is an increasing logarithmic function of land size, which allows more inconsistency for smaller islands. When IoU is lower than the threshold, the method verifies whether the detected change may be due to cloud presence. Change detection can also be refined by downsampling the change mask to look for regions with a high density of changed pixels. The change detection method is applied to the monitoring of a region of interest of variable size. A tiling process is needed to obtain images of size 256 × 256 pixels. Once the tiling is defined, the region is analyzed by looping through the individual tiles and performing change detection on each one individually.
We applied the described procedure to known test cases of volcanic activity that caused changes in the shape of Metis Shoal and Home Reef islands. Although the methods were effective in identifying actual changes, their temporal resolution was at times limited by the availability of clear images. The use of MODIS thermal anomaly data can add context where data is missing, as the thermal infrared wavelengths used can penetrate some types of clouds. When using a higher cloudy pixel percentage threshold for image selection, we can obtain more data points at the cost of some noise. However, the verification step described in Section 4.3.2 often correctly detects cloud artifacts, preventing the detection of false changes. We analyzed time series for the two islands. In all cases, known events were successfully detected. However, some noise was present due to cloudiness and smoke from volcanic activity. Although the Precision of the change detection method was low (20.00% and 58.06% for the Metis Shoal and Home Reef series), the higher Recall (80.00% and 85.71%) is essential for the detection of dangerous cases; False Positives can be managed through manual follow-up assessments. While there is still a need for human verification in dubious cases (if not in all change classification cases), this task is not particularly cumbersome when the analyzed region is not extremely large. Out of 3055 256 × 256 pixel tiles analyzed for the Tonga arc area, 153 were False Positives. Thus, we believe that the procedure can be successfully applied to the Tonga region, albeit in a semi-automatic fashion.
We demonstrated the practical feasibility of the algorithm by testing it over a large area using an HPC cluster. The processing times per tile were measured both with and without overlapping tiles. The False Positive Rate of 5.01% indicates good stability over open ocean areas.
Possible Improvements
The accuracy of the method is strongly related to that of the segmentation model. Our approach leveraged U-Net-type architectures, which are recognized as strong choices for semantic segmentation. However, various and more recent architecture types can be explored, such as CNN-transformer hybrids.
The current thresholding technique for change detection can be further refined. For example, a neural network could be trained with before-and-after images and their corresponding segmentation maps to detect changes. Depending on the architecture, these images can be concatenated into a single image of shape (256, 256, 2(C + 1)), where C is the number of channels of each image and 1 is the added dimension for each segmentation map, or they can be processed separately in a multi-input neural network. The network could be trained to output a binary value indicating whether a significant change has occurred between the two images, or, if more spatial detail is required, it could generate a full change mask.
Furthermore, the change detection procedure could be extended to analyze sequences of images rather than just pairs. By examining a series of images, it may be possible to track more gradual changes. In this paper, however, the focus was on detecting abrupt changes resulting from volcanic activity, particularly for hazard warning purposes. Thus, we limited our approach to using image pairs. Nonetheless, the techniques presented here could be adapted for contexts where the objective is to monitor slower changes, by extension to multi-temporal data.
The segmentation models are specific to the region of Oceania. In general, it is easier to achieve strong model performance when the problem is clearly defined and restricted to a specific task or domain. However, some level of generalization can still be attempted to broaden the model’s applicability. The models demonstrated moderate generalization capability on the SNOWED dataset, although not sufficient to be applied on a global scale. Training a U-Net model (or any deep learning model) for land–water segmentation on a global scale is challenging because of the extreme variability in environmental features, such as coastlines, vegetation, terrain, water types, and seasonal differences across different regions.
To enhance generalization capabilities and make the approach applicable on a global scale, it would be necessary to incorporate additional datasets. A few global datasets for land–water segmentation do exist, such as those presented in Wieland et al. [53] and Li et al. [54]. However, such datasets do not always share the same band combination, and the wavelength ranges of corresponding bands are not always consistent. It is recommended to use data that include at least one infrared band in addition to RGB. If broader geographic coverage or feature diversity is needed, existing datasets can be manually expanded, as demonstrated in this work. Additionally, integrating auxiliary features such as geographic coordinates may enhance model performance across diverse environments.
Finally, it is worth considering the use of alternative satellite platforms. Sentinel-2 offers freely accessible data, but is limited in its horizontal resolution and revisit time. Newer satellite missions provide higher spatial resolution and more frequent revisits, which would increase the likelihood of obtaining cloud-free images and improve the ability to track rapid morphological changes. However, the use of such platforms is often constrained by service costs, which must be carefully considered. In its current form, this study lays the groundwork for the development of a monitoring tool for volcanic islands.
7. Conclusions
In the present study, we presented a methodology for monitoring active volcanic regions, with the goal of detecting the emergence and change of volcanic islands. The methodology was applied to two important cases: Metis Shoal and Home Reef islands. In both cases, the procedure was able to successfully capture significant events.
This application is timely due to the ongoing intense activity at sites like Home Reef volcano. During 2024, Home Reef doubled in size, reaching a surface area of around 165.9 m2 on 21 January 2025. The use of Sentinel-2 imagery, with a revisit time of 5 days, allows fast detection of volcanic unrest, providing valuable data for the issuing of hazard warnings for navigation safety.