1. Introduction
Natural or white hydrogen has gained increasing interest as a clean and carbon-free energy source. Its generation can occur through various mechanisms depending on the geological setting, but in some natural emanations, subcircular surface patterns, known as fairy circles (FC), have been identified [1,2,3,4,5,6,7,8,9,10,11,12,13,14]. These FCs documented in the literature exhibit a range of sizes, often large enough to be detected using satellite imagery. Their identification has incorporated local geological information, terrain topography, and the computation of spectral indices derived from satellite data [14,15,16,17,18,19,20,21].
In studies focused on FC-related structures, satellite imagery constitutes the most commonly used input, from which a variety of spectral indices are computed. Previous works relying primarily on optical satellite data have considered indices such as the Normalised Difference Vegetation Index (NDVI), Atmospherically Resistant Vegetation Index (ARVI), Green Normalised Difference Vegetation Index (GNDVI), Near-Infrared and Shortwave Infrared Burn Ratio Index (NBRI), Green Leaf Index (GLI), Enhanced Vegetation Index (EVI), and the Normalised Difference Water Index (NDWI), as used in the works of [15,18]. In addition to these indices, other studies have incorporated Digital Elevation Models (DEMs) and topographic inputs [15,18,20,21], geophysical information [17], as well as statistical and morphological analyses of the subcircular structures [19,20]. All of these works emphasise the importance of understanding the geological context, including the identification of source rocks, potential reservoir rocks, and faulting, in a manner analogous to petroleum exploration. Among the key morphometric characteristics, the slope of the structures is notable, as it tends to be lower than that of depressions associated with karst features [20]. Technologies such as Light Detection and Ranging (LiDAR) are highly useful for identifying areas with potential FCs, as they enable improved geomorphological characterisation relative to DEMs [21]. Field measurements have shown that hydrogen concentration varies within FC structures, with localised zones of higher concentration inside the surface area of each feature [7,18,21].
Despite their widespread use, spectral indices are inherently dependent on local environmental conditions [22]. Consequently, although spectral indices have been applied in FC exploration, their individual contribution to FC identification has rarely been isolated or systematically assessed. Vegetation stress or anomalous spectral responses may arise from multiple factors unrelated to hydrogen emissions, such as soil moisture conditions or precipitation variability, complicating the interpretation of index-based signals.
In recent years, with the aim of automating analytical workflows, machine learning (ML) tools have been increasingly integrated with satellite imagery for various classification tasks. These include land-cover classification using random forests [23], pixel-based convolutional neural networks (CNNs) for the extraction of water bodies [24], and comparative evaluations of deep learning models against traditional approaches such as random forests and decision trees for soil classification [25]. Regarding the prediction of fairy circles, Nigar et al. [26] implemented a U-Net model for FC identification, using WorldView-2 image data. Across these studies, in addition to the spectral bands of the respective satellite platforms, researchers have incorporated spectral indices, elevation information, terrain slope, image texture metrics, and other ancillary variables. Satellite imagery provides continuous temporal coverage and allows for the analysis of large territorial extents, making it one of the primary inputs used in exploration workflows [27,28,29,30].
McMahon et al. [7] compiled global locations where natural hydrogen emissions have been identified but are not necessarily associated with FCs. Among regions where FC-like surface expressions have been reported, the Carolina Bays on the eastern United States coast stand out for the abundance and clear manifestation of subcircular structures, motivating their selection as the study area. Although previous research has demonstrated the utility of satellite imagery for FC-related studies, a systematic evaluation of how individual spectral bands and derived indices contribute to FC detection remains limited, particularly when using multispectral data alone. This lack of systematic assessment hinders a clear understanding of which spectral inputs are most informative and how they influence model performance in exploratory workflows for natural hydrogen.
To address this gap, this study proposes a progressive machine learning framework to evaluate the contribution of Landsat-8 spectral bands and normalised indices for FC detection in the Carolina Bays. The approach begins with traditional pixel-based classifiers, which enable variable-importance analysis, and subsequently uses the most informative inputs to guide experiments with deep learning models. Leveraging recent advances in ML and computational tools, we assess and compare the performance of logistic regression, random forest, multilayer perceptron, CNN, and U-Net architectures for identifying FC-associated areas of interest using multispectral satellite imagery as the sole input. The purpose of this work is to present a reproducible and extensible workflow that can be complemented with non-spectral variables in future studies in order to support future natural hydrogen exploration strategies through the prioritisation of predictors according to local conditions and ML model responses.
2. Study Area
Carolina Bays correspond to a set of shallow depressions 1 to 3 m deep, with elliptical to ovoid shapes, an average width-to-length ratio of approximately 0.58, elevated sandy rims, and muddy or organic infill, distributed along the Atlantic Coastal Plain from New Jersey to Georgia, United States (Figure 1). Their origin has been attributed to potential meteorite impacts as well as to wind- and wave-driven processes within the Earth system [31,32,33]. The bays are characterised by their high surface density, preferential NW-SE orientation, and smooth geometry, making them one of the most representative examples worldwide of subcircular patterns observable in remote sensing data [12,21,34]. The elliptical depressions exhibit a similarly well-defined geometric precision, suggesting that they may have formed contemporaneously during a common formative event. Comparable terrestrial landforms with the same characteristics are not widely reported in other regions of the world [35].
Zgonnik et al. [21] measured natural hydrogen concentrations along profiles that crossed each structure, starting outside their margins at a distance equivalent to half the diameter of each feature. Along these profiles, they identified the areas where hydrogen concentrations began to increase. Due to the nature of these natural depressions, commonly filled with water and organic deposits (peat), it has been suggested that the hydrogen source could be related to biological activity [36]. However, Zgonnik et al. [21] found that the highest hydrogen concentrations were located along the rims of the structures and at depths of up to 5 m, reducing the likelihood of a purely biogenic origin.
To analyse both areas with and without subcircular structures, the Landsat-8 scene was subdivided into four Areas of Interest (AOIs). AOIs 01 and 02 are located in regions with a high density of subcircular depressions and therefore represent positive scenarios with a clear surface expression of FCs. In contrast, AOIs 03 and 04 are situated toward the northwest portion of the Landsat-8 scene, in areas where no subcircular structures have been mapped. These two AOIs act as negative or control areas, enabling an evaluation of the model’s ability to minimise false detections in environments lacking any surface evidence of FCs.
3. Results
3.1. Exploratory Data Analysis
The training data for AOI 01 and AOI 03 revealed the presence of outliers in the box-and-whisker plots for both classes (Appendix A). In the histograms, it was also observed that pixels associated with FCs did not exhibit a strong spectral distinction relative to non-associated pixels (Appendix B). In both types of graphical representations, the two classes shared similar value ranges across the variables, with the primary difference being the imbalance in sample counts; Class 0 (non-FC) contained substantially more samples than Class 1 (FC). The total number of pixels considered for training the traditional models was approximately 103.3 million, of which about 91.7 million correspond to Class 0 (non-FC) and 11.6 million to Class 1 (FC). This represents a class imbalance in which Class 0 contains approximately 7.9 times more samples than Class 1.
An examination of the descriptive statistics (Table 1) further illustrates the strong overlap between classes. For example, Band 3 shows a mean value of 0.13 for Class 0 and 0.10 for Class 1; for Band 4, mean values are 0.12 and 0.09 (Class 0/Class 1); and for Band 6, 0.24 and 0.21 (Class 0/Class 1). For the NUI B5-B4, which is analogous to the traditional NDVI as it uses the same spectral bands but rescales values to the 0–1 range, the mean values are 0.82 for Class 0 and 0.84 for Class 1. Similarly, the NUI B6-B4 presents mean values of 0.75 and 0.77 for Class 0 and Class 1, respectively. Overall, the statistical measures exhibit comparable behaviour, reflecting similar value distributions across both classes for the 28 available variables derived from the seven Landsat 8 bands. This overlap in feature distributions helps explain the limited separability observed between FC and non-FC classes in the traditional ML models.
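The exact NUI formula is not restated here, but given that it uses the same bands as NDVI and rescales values to the 0–1 range, a plausible NumPy sketch is the normalised difference mapped linearly from [−1, 1] to [0, 1]; the epsilon guard against division by zero is an assumption of this sketch.

```python
import numpy as np

def nui(band_a: np.ndarray, band_b: np.ndarray) -> np.ndarray:
    """Normalised difference of two bands, rescaled from [-1, 1] to [0, 1]."""
    nd = (band_a - band_b) / (band_a + band_b + 1e-12)  # epsilon: assumed guard
    return (nd + 1.0) / 2.0

# Illustrative reflectance values for Bands 5 (NIR) and 4 (red)
b5 = np.array([0.30, 0.40])
b4 = np.array([0.05, 0.08])
print(nui(b5, b4))  # values remain within [0, 1]
```

Under this assumed formulation, NUI B5-B4 values around 0.82–0.84, as reported in Table 1, correspond to conventional NDVI values around 0.64–0.68.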
In the analysis of Pearson’s linear correlation (Appendix C) between variables (spectral bands and NUI), strong relationships were observed among some variables that share a common band used in the NUI formulation. For example, Bands 3 and 4 exhibit a correlation coefficient of 1.00, while Bands 6 and 4 show a coefficient of 0.84. The NUI B5-B4 displays a correlation of −0.82 with Band 4 and −0.39 with Band 5. Relying solely on the quantitative Pearson coefficient can be misleading; therefore, it should be complemented with scatterplot representations to verify the presence of linear relationships between variables. As in the histograms, in the scatterplots (Appendix D), Class 1 (associated with FCs) falls within the distribution of Class 0 (absence of FCs), indicating that both classes share similar characteristics within certain ranges of the variables.
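A minimal example of why the coefficient alone can mislead: a perfect quadratic dependence between two variables yields a Pearson coefficient of zero, a relationship that only a scatterplot would reveal.

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2                    # perfect, but non-linear, dependence
r = np.corrcoef(x, y)[0, 1]   # Pearson captures only the linear component
print(r)                      # 0.0
```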
Since no clear separation between classes was achieved using the original variables, a PCA (Appendix E) was performed to reduce the number of variables or dimensions. The PCA results yield new components, and it was found that 90% of the variance of the original data can be explained using the first four principal components. The first principal component alone accounts for about 68% of the variance. An examination of the absolute values of the variable loadings for this component indicates that the six most influential variables are Bands 1, 2, 3, and 4, together with the NUIs B6-B1 and B7-B1. In contrast, the four variables contributing least to this component are the NUIs B2-B1, B4-B3, B6-B5, and B7-B6. The NUI B5-B4 ranks eleventh in terms of contribution to the first component, followed by B6-B4 in twelfth position.
However, when these new components were plotted in scatterplots and coloured by class, it was still not possible to distinguish clear boundaries between them (Appendix F); what was achieved instead was a partial removal of the linear dependence among the original variables. Although PCA effectively reduced the number of variables for training, these new components were not used because an additional objective of this study was to facilitate the interpretation of how the original input variables influence model performance during training. PCA generates new components that combine information from multiple original variables [43]. In remote sensing, it is commonly used to reduce the dimensionality of multispectral and hyperspectral imagery, condensing the most relevant information into a few components [44,45].
3.2. Training
A total of 75 traditional models were trained to evaluate the presence of FCs at the pixel level. These comprised 30 logistic regression models, 30 random forest models, and 15 MLP classifier models. All traditional models were implemented using the Scikit-Learn library [46]. For each algorithm, the official documentation was consulted, and a small set of hyperparameters was adjusted, while the remaining settings were kept at their default values. For LR, only the solver was modified (solver = ‘saga’). For the RF classifier, the following parameters were used: n_estimators = 30, max_depth = 10, min_samples_split = 0.05, min_samples_leaf = 0.01, bootstrap = True, and max_samples = 0.7. Of the 15 MLP classifier models, 10 used a hidden-layer architecture of {28, 14, 7, 3}, while the remaining 5 employed {112, 56, 28, 14, 7, 3}. For both architectures, the remaining parameters were consistent: activation = ‘relu’, solver = ‘adam’, max_iter = 100, learning_rate = ‘adaptive’, learning_rate_init = 0.001, tol = 0.001, early_stopping = True, shuffle = True, n_iter_no_change = 10, and validation_fraction = 0.2.
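For reproducibility, the reported settings map directly onto Scikit-Learn constructors. The sketch below instantiates the three traditional models with the hyperparameters listed above, leaving all other arguments at their defaults (the simpler MLP architecture is shown).

```python
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

# Only the hyperparameters reported above are changed; everything else
# remains at the library defaults.
lr = LogisticRegression(solver="saga")
rf = RandomForestClassifier(n_estimators=30, max_depth=10,
                            min_samples_split=0.05, min_samples_leaf=0.01,
                            bootstrap=True, max_samples=0.7)
mlp = MLPClassifier(hidden_layer_sizes=(28, 14, 7, 3), activation="relu",
                    solver="adam", max_iter=100, learning_rate="adaptive",
                    learning_rate_init=0.001, tol=0.001, early_stopping=True,
                    shuffle=True, n_iter_no_change=10, validation_fraction=0.2)
```

The fractional values of min_samples_split and min_samples_leaf are interpreted by Scikit-Learn as proportions of the training samples, and max_samples = 0.7 draws 70% of the samples for each bootstrap tree.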
For the deep learning experiments, the CNN architecture consisted of a convolutional feature extractor followed by a fully connected classifier. The network receives a seven-channel input tile and applies successive 3 × 3 convolutions with ReLU activations, using 32, 64, and 128 feature maps. Two 2 × 2 max-pooling operations are employed to reduce spatial dimensionality. The resulting feature maps are then flattened and passed through a dense layer of 128 neurons with ReLU activation, followed by an output layer that produces class scores for the two target categories (FC vs. non-FC).
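Tracing tensor shapes through this architecture clarifies the dimensionality reduction. The sketch below assumes 'same' padding for the 3 × 3 convolutions and a conv–pool–conv–pool–conv ordering; neither detail is stated explicitly above, so the trace is an illustration rather than the exact network.

```python
def conv_same(c_in, h, w, c_out):
    # 3x3 convolution with assumed 'same' padding: spatial size preserved
    return c_out, h, w

def pool(c, h, w):
    # 2x2 max-pooling halves each spatial dimension
    return c, h // 2, w // 2

shape = (7, 64, 64)              # seven-channel input tile
shape = conv_same(*shape, 32)    # 32 feature maps
shape = pool(*shape)
shape = conv_same(*shape, 64)    # 64 feature maps
shape = pool(*shape)
shape = conv_same(*shape, 128)   # 128 feature maps

flat = shape[0] * shape[1] * shape[2]
print(shape, flat)               # (128, 16, 16) 32768 -> dense(128) -> 2 scores
```

Under these assumptions, the flattened feature vector feeding the 128-neuron dense layer has 32,768 elements.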
For the 64 × 64-pixel U-Net, the model follows an encoder–decoder scheme based on repeated convolutional blocks. Each encoder block comprises two 3 × 3 convolutions with ReLU activation, followed by a 2 × 2 max-pooling operation. The number of channels increases progressively through the network (from the input depth of 7 or 12, then projected to 16, 32, and 64). The decoder reconstructs the output by upsampling and concatenating, at each level, the corresponding encoder feature maps via skip connections. The output layer is a 1 × 1 convolution that produces a binary segmentation mask (FC vs. non-FC). The second U-Net configuration, using 256 × 256-pixel tiles, takes a seven-channel input tensor but employs a deeper and wider encoder. It comprises four hierarchical levels with progressively increasing channel depths (64, 128, 256, and 512), followed by a bottleneck block with 1024 channels. At each level, two 3 × 3 convolutions with ReLU activation are applied, followed by 2 × 2 max-pooling. The decoder mirrors this process using 2 × 2 transposed convolutions for upsampling and concatenates encoder feature maps via skip connections. The final layer is a 1 × 1 convolution that outputs the full-resolution binary segmentation mask.
The training time per cycle (including dataset creation and model training) for the LR models ranged from 5.4 min (323 s) to 6.4 min (383 s), with an average of 5.9 min (351 s) per model. For the RF models, training cycles ranged from 7.8 min (467 s) to 8.5 min (508 s), with an average of 8.0 min (482 s) per model. The simpler MLP architecture required between 22.9 min (1376 s) and 113.5 min (6807 s) per cycle, with an average of 60.6 min (3633 s). The more complex MLP architecture required between 52.2 min (3133 s) and 105.0 min (6298 s) per cycle, with an average of 84.5 min (5071 s).
As increasingly complex models were implemented, training times increased and the performance metrics improved; this is evident in the fact that the MLP models required more time to train but achieved better performance on both the training and test sets. Considering that training times grew with ML model complexity, and given the preliminary runtimes observed during the initial deep learning experiments, only a limited number of CNN and U-Net models were ultimately trained for the analysis (Appendix G), as will be presented later.
The Scikit-Learn library used for traditional models provides a score value, which corresponds to the accuracy metric. Accuracy ranges from 0.0 to 1.0, where 1.0 indicates perfect classification. The LR models achieved a mean training accuracy of 0.65 (the 25th, 50th, and 75th percentiles were also 0.65) and a mean test accuracy of 0.59 (with the 25th, 50th, and 75th percentiles likewise equal to 0.59). The RF models achieved a mean training accuracy of 0.66 (25th, 50th, and 75th percentiles all 0.66) and a mean test accuracy of 0.61, with test set percentiles of 0.60, 0.61, and 0.61, respectively. The MLP classifier models with the simpler architecture produced a mean training accuracy of 0.66 (25th, 50th, and 75th percentiles of 0.66, 0.67, and 0.67, respectively) and a mean test accuracy of 0.57 (25th, 50th, and 75th percentiles of 0.56, 0.57, and 0.57, respectively).
For the CNN models, the training curves indicate loss values below approximately 0.2 from around epoch 15 onwards for both the training and validation datasets, except for one model that reached this threshold at approximately epoch 30. Accuracy values exceeded 0.90 in all cases, with no systematic divergence between the training and validation curves. This behaviour suggests that neither overfitting nor underfitting occurred during training. Although greater variability is observed in the validation curves, particularly during the early epochs, this is expected, as these data were not seen by the models during training.
The U-Net models generally exhibit similar behaviour in their loss curves, achieving values below 0.2 for both the training and validation datasets, albeit with noticeable variability in the validation loss. For models trained over a larger number of epochs, the loss curves progressively decreased to values below 0.10, in some cases approaching 0.05. However, these longer training runs also display occasional sharp increases or spikes in the loss values, suggesting that further fine-tuning of the training hyperparameters would be required to stabilise convergence.
Nevertheless, considering the objectives of this study (to evaluate the performance of different classification and segmentation models using satellite imagery as an initial exploratory input, rather than to identify and optimise a single “best” model), it is evident that deep learning approaches outperform traditional ML models in terms of predictive performance. This improvement, however, comes at the cost of substantially longer training times and the need for more complex model architectures.
3.3. Feature Importance: Logistic Regression and Random Forest Classification
The logistic regression and random forest classification algorithms allow an approximation of variable importance during model training (Appendix H). In the case of logistic regression, this is expressed through the weights or coefficients assigned to each variable, while for random forest classification, it is given by the feature importance values. Because LR coefficients can be either positive or negative, we used their absolute values to make feature contributions comparable across variables. The most influential predictor was Band 3, with a mean absolute coefficient of 57.55 (25th, 50th, and 75th percentiles: 57.49, 57.58, 57.63). This was followed by the NUI B5-B2 (mean absolute coef. = 35.48; percentiles 35.43, 35.49, 35.53) and B7-B4 (mean absolute coef. = 35.18; percentiles 35.14, 35.19, 35.23). For RF, the mean feature importance values do not show a marked contrast among predictors; however, their distributions are noticeably more dispersed than in LR, indicating stronger variability across model realisations. The three highest-ranked predictors were all NUIs: B4-B2 (mean importance = 0.12; percentiles 0.10, 0.12, 0.17), B6-B4 (mean = 0.10; percentiles 0.08, 0.10, 0.12), and B4-B3 (mean = 0.09; percentiles 0.08, 0.09, 0.11). When jointly ranking variable importance from both algorithms (Table 2), ordered from highest to lowest importance, it is observed that, within the top half of the training variables, Landsat 8 Band 3 and Band 6 are consistently among the most relevant.
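Both importance measures can be extracted with a few lines of Scikit-Learn. The toy example below, in which only the third feature is informative, illustrates the absolute-coefficient and feature-importance mechanics rather than the study's models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 5))
y = (X[:, 2] > 0).astype(int)                # only feature 2 is informative

lr = LogisticRegression(solver="saga", max_iter=2000).fit(X, y)
lr_rank = np.abs(lr.coef_).ravel()           # absolute coefficients
rf = RandomForestClassifier(n_estimators=30, random_state=0).fit(X, y)
rf_rank = rf.feature_importances_            # impurity-based importances

print(lr_rank.argmax(), rf_rank.argmax())    # both point to feature 2
```

Note that impurity-based RF importances and LR coefficient magnitudes measure different things, which is one reason the two rankings in Table 2 only partially agree.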
Band 3 corresponds to the green portion of the spectrum, and Band 6 to the first shortwave infrared band (SWIR1), which may serve as a criterion for future selection of spectral indices. For example, NDVI is computed using Bands 5 and 4 of Landsat 8, and while NDVI is one of the most commonly used indices for FC identification in the literature, the results here suggest that it may not be the most relevant for ML model performance. The NUI B5-B4, which is analogous to NDVI, ranks 19th out of 28 variables in the LR analysis, with a mean absolute coefficient of 13.46 (25th, 50th, and 75th percentiles: 13.40, 13.46, 13.51). Similarly, in the RF feature importance ranking, B5-B4 occupies the 21st position out of 28, with a mean importance value of 0.01 (25th, 50th, and 75th percentiles: 0.01, 0.02, 0.02). These results indicate that, despite its widespread use, NDVI-like indices contribute relatively little to class discrimination in this specific ML framework. Bands 3 and 6 also contribute to several of the NUI combinations calculated as complementary training variables. The five NUIs that appear as important in both the logistic regression and random forest models correspond to the relationships between bands B6-B4, B4-B3, B6-B5, B4-B1, and B7-B3.
Based on the joint ranking of the 28 training variables, we selected the variables that were common to both algorithms and located within the upper half of the ranked list for training the deep learning architectures (CNNs and U-Net). In total, seven shared variables were identified: Bands B3 and B6, and the five NUI combinations B6-B4, B4-B3, B6-B5, B4-B1, and B7-B3.
3.4. Predictions from Traditional Models
The precision, recall, F1, and AUC metrics for the test data were computed using the pixels associated with AOI 02 and AOI 04. Precision is defined as the proportion of true positives relative to the sum of true positives and false positives. Recall (or sensitivity) is defined as the proportion of true positives relative to the sum of true positives and false negatives, while the F1 score represents the balance between precision and recall. The AUC corresponds to the area under the Receiver Operating Characteristic (ROC) curve, which is constructed by plotting the true positive rate against the false positive rate.
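These definitions can be checked on a toy confusion matrix (TP = 2, FP = 1, FN = 1, TN = 2) using Scikit-Learn's metric functions; the labels and scores below are illustrative only.

```python
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

y_true  = [1, 1, 1, 0, 0, 0]
y_pred  = [1, 1, 0, 1, 0, 0]            # TP=2, FP=1, FN=1, TN=2
y_score = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]

prec = precision_score(y_true, y_pred)  # TP / (TP + FP) = 2/3
rec  = recall_score(y_true, y_pred)     # TP / (TP + FN) = 2/3
f1   = f1_score(y_true, y_pred)         # harmonic mean of precision and recall
auc  = roc_auc_score(y_true, y_score)   # 8 of 9 positive/negative pairs ranked correctly
print(prec, rec, f1, auc)
```

The AUC equals the fraction of positive–negative pairs in which the positive sample receives the higher score, here 8/9 ≈ 0.89.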
Across the test set, performance was broadly similar among the traditional ML approaches. Precision remained low for all methods, with mean values of 0.12 for LR (25th/50th/75th percentiles = 0.12/0.12/0.12), 0.13 for RF (0.13/0.13/0.13), 0.12 for the simpler MLP (0.12/0.12/0.12), and 0.12 for the more complex MLP (0.11/0.12/0.12). Recall showed variation among models, with LR achieving the highest mean recall of 0.69 (0.68/0.69/0.69), followed by the more complex MLP with 0.65 (0.60/0.67/0.68), RF with 0.61 (0.61/0.61/0.62), and the simpler MLP with 0.60 (0.56/0.60/0.62). F1 scores were consistently low and nearly identical across methods, with mean values of 0.21 for LR (0.21/0.21/0.21) and RF (0.21/0.21/0.21), 0.20 for the simpler MLP (0.20/0.20/0.21), and 0.20 for the more complex MLP (0.19/0.20/0.20). AUC values were also comparable, with LR slightly higher at a mean of 0.65 (0.65/0.65/0.65), followed by RF at 0.64 (0.64/0.64/0.65), the simpler MLP at 0.64 (0.63/0.64/0.65), and the more complex MLP at 0.63 (0.63/0.63/0.63).
The predictions generated using the AOI 02 image set, an area where FCs are present, do not show a clear delineation of the FCs, but they do succeed in highlighting zones of potential interest that correspond to the target layer. This limited ability to outline their shape is because, in this first stage, the models make predictions based solely on pixel-level probabilities. The mean values and standard deviations (with standard deviations close to zero) obtained from the 30 logistic regression models (Figure 2) indicate highly consistent behaviour among all models. The coefficients of these 30 models for each training variable show negligible variation, reflecting strong similarity among the logistic regression models, despite efforts to introduce randomness during the construction of the training subsets.
The predictions generated using the 30 random forest models do show noticeable differences among them, reflected in higher standard deviation values, reaching up to 5% deviation in their predictions. The spatial distribution of the mean probability is similar to that observed in the logistic regression models: although the FC boundaries cannot be clearly delineated, areas of potential interest are consistently highlighted (Figure 3).
For the 15 MLP models, differences in predictions are also observed, similar to the random forest results, with standard deviation values reaching up to 20%, although values around 5% are more typical. It should be noted that, due to the relatively small number of MLP models, outliers can have a stronger effect on the reliability of the mean and standard deviation estimates. For both hidden-layer architectures, the shallower MLPs (Figure 4) and those with a greater number of hidden layers (Figure 5), the predicted probabilities tend to be higher, indicating greater confidence in identifying pixels that may belong to an FC.
When integrating the predictions from all 75 traditional ML models, the standard deviations generally remain below 5%, although some pixels reach values up to 20%. The mean probability maps show a diffuse distribution, limiting the ability to clearly delineate FCs but still highlighting potential areas of interest (Figure 6). When predictions are generated for tiled subsets of AOI 04 (Figure 7), which contains no FCs, the models occasionally highlight a few pixels as potential FC candidates. However, in all cases, these predictions appear more as noise than as meaningful signals, and they do not highlight coherent zones of interest as observed in the AOI 02 predictions.
3.5. CNN Models
A total of four models were trained using 64 × 64-pixel tiles and the 7 Landsat 8 bands. An initial resolution of 32 × 32 pixels had been considered, but the models were unable to generate accurate predictions, leading to an increase in tile size. This difficulty is likely related to the satellite’s spatial resolution and the initial tile size, which may have been too small to capture sufficient information to determine whether an FC structure was present within the tile. Only a limited number of models were trained because CNNs predict only the probability that the input tile does or does not contain an FC, without providing a visual representation of its spatial location (Figure 8). For this reason, only the influence of tile size was considered for future training stages. Nevertheless, the results represent an improvement compared with the predictions produced by the traditional pixel-based models used in the first stage.
3.6. U-Net Models
Deep learning models can be applied to both image classification (CNN models) and image segmentation (U-Net models). Classification assigns a single label to the entire input image, without explicitly accounting for the spatial extent of the labelled object within that image. By contrast, segmentation estimates the object’s extent by performing pixel-wise classification, thereby enabling the delineation of the object and its internal structure within the image [47].
Given that the CNN results showed improved predictions, largely because CNNs process entire images rather than individual pixels and are designed to extract spatial features for classification rather than segmentation, a U-Net architecture was adopted for the next stage. Since U-Net models can generate pixel-level segmentation masks of subcircular structures associated with FCs, a total of eleven models were trained. Five of these models used 64 × 64-pixel input images, while the remaining six used 256 × 256-pixel images. The number of training variables varied among models. For the 64 × 64-pixel models, some were trained using only the seven Landsat 8 bands. Others used seven variables corresponding to those identified as important in the first stage (Bands B3 and B6, and the five NUI combinations B6-B4, B4-B3, B6-B5, B4-B1, and B7-B3). A third group was trained with twelve variables: the seven Landsat bands plus the five NUI variables identified in the first stage.
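Tile preparation of this kind can be sketched as a simple windowing over the band stack. The function below is an illustrative assumption (non-overlapping tiles, partial tiles at the edges dropped), not the study's preprocessing code.

```python
import numpy as np

def tile_stack(stack: np.ndarray, tile: int):
    """Split a (bands, H, W) array into non-overlapping (bands, tile, tile)
    windows, dropping incomplete tiles at the right/bottom edges."""
    _, h, w = stack.shape
    return [stack[:, r:r + tile, c:c + tile]
            for r in range(0, h - tile + 1, tile)
            for c in range(0, w - tile + 1, tile)]

scene = np.zeros((7, 256, 256))       # e.g. a 7-band Landsat 8 subset
tiles = tile_stack(scene, 64)
print(len(tiles), tiles[0].shape)     # 16 tiles of shape (7, 64, 64)
```

The same function yields 256 × 256-pixel tiles directly when `tile=256`, matching the second U-Net configuration.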
Predictions for AOI 02 (Figure 9), where FCs are present, and AOI 04 (Figure 10), where no FCs occur, show clear improvements, with more continuous areas being delineated and sharper boundaries around potential structures. In AOI 02, the predictions resemble subcircular FC-like patterns, especially when using the seven variables that include the NUI combinations. In AOI 04, the models produce scattered areas of potential interest, but without subcircular patterns, consistent with the expectation that AOI 04 contains no FC structures.
These results, obtained using 64 × 64-pixel input images, motivated the training of models with a resolution of 256 × 256 pixels, grouped into two configurations. The first 256 × 256-pixel group used only the seven Landsat 8 bands, while the second group used seven variables corresponding to Bands B3 and B6, together with the five NUI combinations B6-B4, B4-B3, B6-B5, B4-B1, and B7-B3. For both groups, the predictions already exhibit structures resembling FCs; however, in AOI 02, where FCs are present, the models using Bands B3 and B6 plus the five NUIs produce more conservative predictions, with a spatial distribution more similar to the target layer (Figure 11). In AOI 04, where FCs are absent, the models that incorporate the NUIs also yield more conservative outputs, with fewer areas predicted as FCs (Figure 12). Nonetheless, in both model groups, FCs are still predicted in areas where they do not occur (AOI 04), likely owing to the presence of shadows and clouds in the imagery.
4. Discussion
Previous studies [26,48,49,50] have commonly applied ML approaches using predefined spectral inputs. By contrast, our study explicitly evaluates the contribution and interpretability of individual spectral bands and indices, derived from traditional ML analyses, to inform the subsequent design of deep learning models, with segmentation-based architectures providing the most informative spatial context. Recent segmentation-oriented ML architectures increasingly prioritise structural coherence, which is crucial for delineating geomorphological features, as demonstrated in applications such as alluvial mapping [51], rock glacier monitoring [52], and flood mapping [53].
Within this context, the progressive methodology adopted in this study provides a structured way to link variable interpretability and model complexity. By first analysing pixel-based classifiers, it becomes possible to assess the relative contribution of individual spectral bands and indices under a controlled and interpretable framework, and then to transfer this knowledge to deep learning architectures that better preserve spatial relationships.
The exploratory data analysis stratified by class revealed that both FC and non-FC pixels occupy largely overlapping ranges in the distributions of the training variables, which substantially hinders class separability. Traditional statistical methods may struggle as the number of variables or data dimensionality increases, leading to mathematical challenges, since not all measured variables necessarily contribute to understanding the underlying phenomena of interest [54]. Although PCA reduced linear dependence among some variables, the overlap between classes persisted. While no models were trained using dimensionality reduction, it is recommended that future studies consider this approach, as applying PCA reduced the original 28 variables to 5 components explaining at least 90% of the data variance, which could help decrease computational time during model training.
At the pixel level, the class overlap observed in the descriptive statistics persists even after applying PCA. For example, the NUI B5-B4 (analogous to NDVI) shows only a 0.05 difference in the first quartile, with higher values for Class 1, while the third quartile is identical for both classes (
Table 1). Similarly, the NUI B6-B4, identified as relevant in the variable-importance analysis, exhibits only a 0.04 difference in the first quartile (higher for Class 1) and a 0.01 difference in the third quartile (higher for Class 0). Overall, these small shifts imply that for roughly half of the samples the two classes are nearly indistinguishable based on these predictors and have very similar distributional summaries. Although PCA was used for dimensionality reduction, neither B5-B4 nor B6-B4 contributed strongly to the leading components; B5-B4 and B6-B4 rank eleventh and twelfth, respectively, in their contribution to the first principal component. This persistent overlap helps explain why linear projections such as PCA do not produce clear class separation and underscores the intrinsic difficulty of discriminating FC from non-FC pixels using spectral information alone.
Class imbalance is a well-known issue in machine learning, often causing models to primarily learn the majority class. This imbalance leads to biased decision thresholds in classification algorithms, resulting in poorly defined decision boundaries and misleading performance metrics. For this reason, balancing strategies such as downsampling the majority class, upsampling the minority class, or using class weights are recommended [
55,
56,
57] so that models pay equal attention to all classes, including, in this case, the minority class of subcircular structures associated with potential natural hydrogen sources. In this study, class imbalance was addressed through downsampling and the training of multiple LR, RF, and MLP models, which yielded favourable results when evaluating the study areas at the pixel level. This approach also allowed the computation of additional statistics, such as the standard deviation and mean of the predictions across the 75 models.
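A minimal sketch of this balancing-and-repetition scheme, with `fit_fn` and `predict_fn` standing in for any of the LR, RF, or MLP learners (hypothetical names, not the study's actual code):

```python
import numpy as np

def balanced_downsample(X, y, rng):
    """Randomly downsample the majority class to the minority-class size."""
    idx0 = np.flatnonzero(y == 0)
    idx1 = np.flatnonzero(y == 1)
    minority, majority = (idx1, idx0) if len(idx1) < len(idx0) else (idx0, idx1)
    keep = rng.choice(majority, size=len(minority), replace=False)
    idx = np.concatenate([minority, keep])
    rng.shuffle(idx)
    return X[idx], y[idx]

def ensemble_predict(fit_fn, predict_fn, X, y, X_new, n_models=75, seed=0):
    """Train `n_models` classifiers on independent balanced subsets and
    return the per-pixel mean and standard deviation of their predictions."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_models):
        Xb, yb = balanced_downsample(X, y, rng)
        model = fit_fn(Xb, yb)
        preds.append(predict_fn(model, X_new))
    preds = np.stack(preds)
    return preds.mean(axis=0), preds.std(axis=0)
```

The mean map summarises the ensemble prediction, while the standard-deviation map highlights pixels on which the repeated models disagree.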
For LR and RF, it is possible to derive information on variable importance. In the case of LR, the estimated coefficients showed little to no variation among models, reflecting the simplicity and linear nature of the algorithm, which limits its ability to capture complex relationships. In contrast, RF models exhibited variation in the ranking of feature importance while still revealing a consistent pattern in terms of which variables were more relevant than others. Although variable importance measures in RF are useful for feature selection, they may be affected by differences in measurement scales or in the number of categories [
58]. In this study, all input variables were normalised to a 0–1 range, and bootstrap sampling prior to training was used to mitigate these issues.
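A simple way to summarise the between-model variation in feature importance is to aggregate the per-model importance vectors, ranking features by their mean and reporting the standard deviation as a measure of stability; an illustrative sketch (the function name is hypothetical):

```python
import numpy as np

def rank_importances(importance_matrix, names):
    """Aggregate per-model feature importances (rows = models) and rank
    features by mean importance; std captures between-model variation."""
    imp = np.asarray(importance_matrix, dtype=float)
    mean, std = imp.mean(axis=0), imp.std(axis=0)
    order = np.argsort(mean)[::-1]  # descending by mean importance
    return [(names[i], float(mean[i]), float(std[i])) for i in order]
```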
Band 3 and Band 6 provided the largest information gain in the classical models, suggesting that spectral indices incorporating these bands may be particularly informative. At this initial stage, the objective of the pixel-based classifiers was to assign each pixel to the presence or absence of FC-like structures, and vegetation-based indices are commonly used in the literature for FC detection. However, the contribution of any given index is likely to be site-dependent: local factors such as climatic variability, soil properties, and their temporal dynamics can amplify or constrain its predictive value [
22]. Consistent with this observation, ANH and UPTC [
59] performed pixel-level mapping of targets for natural hydrogen prospecting using satellite imagery and included terrain slope as an additional predictor; in their random forest training, slope ranked among the most influential variables for distinguishing FC pixels.
In the Carolina Bays region, Lundine et al. [
34] applied deep learning models to morphometric analyses based on DEM data. They also evaluated pixel-based machine learning algorithms and reported limitations in consistently detecting subcircular structures. In particular, some classifiers were unable to clearly distinguish these structures from a stream segment present in the DEM. Moreover, several methods produced a salt-and-pepper classification pattern, in which pixels within the bays were incorrectly labelled as non-bay areas, a behaviour typical of pixel-based approaches. By contrast, deep learning models trained on LiDAR-derived elevation data (using elevation information only) yielded a more coherent identification of these landforms [
34]. These findings suggest that topography, and more broadly terrain-derived variables, can provide complementary information for representing the geometry of subcircular structures, potentially supporting their delineation and interpretation in studies aiming to identify potential FC.
Most of these traditional spectral indices are vegetation-related, and in many cases, they must be adjusted or recalibrated for a new study area [
60,
61,
62]. In this context, a decrease in vegetation response does not necessarily imply the presence of hydrogen emissions. Likewise, an increase or decrease in hydrogen emissions would not immediately translate into a direct response in vegetation; such effects may take some time to become apparent. Establishing a robust relationship between these processes therefore requires dedicated field investigations and multitemporal monitoring. Accordingly, this study considered a broad temporal window of satellite imagery (March 2013 to January 2025) to maximise the amount of available data and to allow the models to learn which spectral variables contribute most to the classification task. In addition, we trained 60 classical models (30 LR and 30 RF models) for which feature importance can be readily extracted, enabling us to quantify the variability in the estimated importance of the training variables.
When performance was compared across the two test areas (AOI 02 and AOI 04), the traditional ML models exhibited only modest differences in their mean metrics. Precision remained consistently low and broadly similar across methods. Recall showed greater variation: logistic regression achieved the highest average recall (0.69), followed by the more complex MLP models (0.65), whereas the random forest and simpler MLP architectures yielded slightly lower values. F1 scores remained uniformly low (mean 0.20–0.21), indicating a limited balance between precision and recall. AUC values were moderate and comparable across approaches (mean 0.63–0.65), suggesting that no traditional model clearly outperforms the others. Overall, these results indicate a high false-positive rate, driven largely by pronounced class imbalance. The limited performance is also consistent with the substantial class overlap observed in the exploratory analysis, which makes discrimination based on pixel-level spectral predictors particularly challenging for these models.
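The pattern of high recall combined with low precision yielding a low F1 follows directly from the metric definitions, as the following sketch illustrates for the positive (FC) class:

```python
import numpy as np

def prf1(y_true, y_pred):
    """Precision, recall and F1 for the positive (FC) class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, a classifier that finds all 10 FC pixels in a 100-pixel tile but also flags 30 non-FC pixels attains perfect recall yet a precision of only 0.25 and an F1 of 0.4.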
However, qualitative visual inspection of the spatial predictions reveals a clear mismatch between the metric-based evaluation and the models’ practical behaviour. Although LR achieves marginally higher average scores, its spatial outputs, together with the mean probability and standard deviation maps, are comparable to those of the other traditional models and still fail to delineate FC-like subcircular structures. The predicted probabilities are diffuse and show little correspondence with the expected edges or geometry of these features. By contrast, the deep learning models produce markedly more coherent spatial representations. In particular, the U-Net improves boundary delineation of FC-like structures, even when the subcircular geometry is not recovered perfectly. CNN models, while often outperforming traditional approaches at the tile (image) level, remain largely limited to indicating the presence or absence of FC-like patterns and do not provide explicit localisation.
The probability of recognising FC-like structures is strongly influenced by tile size, that is, by the spatial context of the image, as a larger spatial extent allows the geometry of the object to be described in greater detail. In a complementary manner, the spatial resolution of the image or scene directly affects the ability to characterise its extent, smooth edges, and closed contours [
26,
34,
48,
63,
64,
65]. When analyses are performed at the pixel level or using small tiles, the available information is essentially local, which limits spatial context and can lead to misclassifications, as well as to diffuse probability maps and fragmented detections. By contrast, the use of larger tiles enables a more complete representation of FC structures, preserving boundary continuity and the contrast between the interior and exterior of the object. Consequently, for future studies or applications in other regions where FCs are present, it is essential to select tile sizes and spatial resolutions that preserve the key geometric characteristics of these structures.
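Tile extraction itself is straightforward; the following sketch (assuming non-overlapping tiles, with incomplete border tiles discarded) makes explicit how the tile size controls the spatial context each sample carries:

```python
import numpy as np

def tile_image(img, tile):
    """Split an (H, W, C) array into non-overlapping tile x tile patches,
    discarding any incomplete tiles along the borders."""
    n_rows = img.shape[0] // tile
    n_cols = img.shape[1] // tile
    tiles = [img[i * tile:(i + 1) * tile, j * tile:(j + 1) * tile]
             for i in range(n_rows) for j in range(n_cols)]
    return np.stack(tiles)  # (n_tiles, tile, tile, C)
```

With 30 m Landsat 8 pixels, a 64 × 64 tile covers roughly 1.9 km on a side, whereas a 256 × 256 tile covers roughly 7.7 km, enough to contain an entire FC together with its surroundings.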
The objective of this research was not to optimise either the traditional models or the deep learning models (CNNs and U-Nets), but rather to evaluate the influence of satellite image data on the identification of potential FCs. As expected, and consistent with previous studies [
26,
48,
65], the best results were obtained using the more complex models, specifically U-Net architectures with an input resolution of 256 × 256 pixels, integrating Bands B3 and B6 with the five NUIs (B6-B4, B4-B3, B6-B5, B4-B1, and B7-B3). The use of a higher input resolution led to improvements in the detection of potential FCs, yielding structures with a more clearly subcircular geometry. The size of the input image directly controls the amount of spatial context available to the model by expanding the neighbourhood around each pixel, thereby improving the delineation of extended structures and reducing ambiguity in areas with diffuse boundaries [
63,
64].
Despite these improvements, U-Net models still predicted FC-like structures in areas where no FCs are present. This behaviour may be partly explained by the fact that, although cloud and shadow masks were applied during preprocessing, they were intentionally not reapplied to the model outputs during prediction. This decision allowed us to assess model robustness to residual cloud and shadow contamination and to explore whether FCs could be detected without an additional postprocessing step, such as reapplying the cloud and shadow masks to the predicted outputs. Reintroducing these masks during postprocessing could help mitigate false positives. More broadly, these challenges, together with the artefacts caused by noise and by cloud- and shadow-related errors in satellite imagery, underscore the importance of appropriate masking strategies to minimise their impact on model performance [
66,
67,
68].
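Reapplying the masks at the postprocessing stage could be as simple as zeroing the predicted probabilities over contaminated pixels; a minimal sketch (the boolean mask convention, True for cloud or shadow, is an assumption):

```python
import numpy as np

def mask_predictions(prob_map, cloud_shadow_mask):
    """Zero out predicted FC probabilities under clouds and shadows.
    `cloud_shadow_mask` is True where a pixel is contaminated."""
    return np.where(cloud_shadow_mask, 0.0, prob_map)
```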
By contrast, the models trained in the initial stage, although unable to clearly delineate FC structures, were comparatively more stable in areas without FCs and avoided widespread false predictions. Consequently, the use of multiple complementary models, i.e., ensemble learning [
69,
70,
71,
72], should be considered for future investigations.
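One simple ensemble scheme consistent with this recommendation is a hard majority vote across the probability maps of complementary models; a minimal sketch:

```python
import numpy as np

def majority_vote(prob_maps, threshold=0.5):
    """Hard majority vote across probability maps from complementary models:
    a pixel is flagged as FC only when at least half of the models agree."""
    votes = (np.asarray(prob_maps) >= threshold).astype(int)
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

Such a vote would let the more conservative pixel-level models veto the widespread false positives produced by the segmentation models in FC-free areas.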
Hydrogen occurrence is not exclusively associated with FCs, as evidenced by the compilation presented in McMahon et al. [
7]. It is therefore important to incorporate complementary information into the models, such as morphometric and regional characteristics of the study area. These variables may differ in spatial resolution from satellite imagery or may sometimes be difficult to obtain. In this study, such additional potential input variables were not included; instead, satellite images were prioritised because they exhibit higher temporal variability and capture changes in vegetation and soil [
73,
74], whereas variables such as topography and geology do not show significant temporal variation.
Our methodology, iteratively training multiple models for variable selection, is consistent with practical workflows in supervised classification, where different predictor subsets are evaluated using a representative, class-balanced dataset to identify informative feature combinations [
75]. Dimensionality reduction techniques can also help extract salient information without substantially compromising classification performance. In this context, supervised dimensionality-reduction methods often achieve higher accuracy than unsupervised alternatives, although unsupervised feature extraction can still provide satisfactory results in some settings [
76].
Satellite imagery plays a central role in exploration, planning, and monitoring of renewable energy projects, including environmental impact assessment and verification of emission reductions [
30]. These satellite data also contribute to characterising the geological features of a region [
77], although such characteristics do not vary significantly over time. Nonetheless, there are still technical, social, and structural barriers that hinder their use and the understanding of study areas [
30]. Specifically in hydrogen exploration, previous studies [
20,
21,
59] have shown that incorporating terrain morphometric variables, such as slope, provides additional information on the shape of the structures and helps to contextualise and identify FCs [
26].
Potential future work includes exploring recent techniques that optimise these models for specific image-processing tasks. These include transfer learning, deep residual networks, attention mechanisms, transformers, generative and adversarial networks, and multimodal models. Such approaches could be applied, for example, to enhance image resolution in order to better capture relevant details in tasks such as semantic segmentation [
49,
50,
78,
79,
80].
5. Conclusions
The variable importance analysis highlighted the relevance of Landsat 8 Bands B3 (green) and B6 (shortwave infrared 1), together with several Normalised Unit Indices derived from them (B6-B4, B4-B3, B6-B5, B4-B1, and B7-B3). Incorporating indices based on these bands can enhance the sensitivity of the models for detecting subcircular structures associated with potential hydrogen emissions. In this regard, future research should incorporate morphometric variables of the subcircular structures, since terrain geometry and slope can provide key complementary information for differentiating patterns associated with hydrogen emanations linked to fairy circles.
The proposed feature selection strategy, based on a progression from traditional models to more complex architectures, demonstrates that input selection should not be limited to a few traditional indices commonly reported in the literature. Instead, it requires a more robust criterion tailored to the local context. By integrating additional variables, it is possible to perform progressive training, moving from simple to more complex models, thereby identifying the most representative variables for each specific case study.
Traditional ML models, including logistic regression, random forest, and MLP, provided stable initial estimates at the pixel level, whereas more complex architectures such as CNNs and U-Nets showed a greater ability to represent subcircular structures. Nevertheless, limitations were observed both in the precise delineation of these structures and in the occurrence of false detections in areas without fairy circles. These findings suggest that combining complementary models within ensemble learning schemes may be a promising strategy to improve the reliability of predictions. Finally, it is essential to complement these approaches with targeted field campaigns to acquire measurements and spectral signatures directly linked to hydrogen occurrence, thereby strengthening interpretation and supporting the verification of both the models and their input data.